This needs the requests module, which you import much like including a header file in C++:
import requests
You also need a User-Agent header (it tells the server which operating system and browser the request claims to come from).
Google Chrome (Windows):
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36
Mozilla Firefox (Windows):
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:86.0) Gecko/20100101 Firefox/86.0
Microsoft Edge (Windows):
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.774.63 Safari/537.36 Edg/89.0.774.63
These three (Chrome, Firefox, and Edge) are the common ones. I am using Edge here.
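To make it easier to switch between the browsers listed above, the strings can be kept in a dictionary and turned into header dictionaries on demand. This is only a sketch: `USER_AGENTS`, `build_headers`, and `candidate_headers` are names made up for illustration, and the Firefox entry assumes the version token is `Firefox/86.0`.

```python
# User-Agent strings taken from the list above.
# Assumption: the Firefox version token is "Firefox/86.0".
USER_AGENTS = {
    "chrome": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36",
    "firefox": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:86.0) "
               "Gecko/20100101 Firefox/86.0",
    "edge": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            "(KHTML, like Gecko) Chrome/89.0.774.63 Safari/537.36 Edg/89.0.774.63",
}


def build_headers(browser):
    """Return a headers dict that can be passed as requests.get(url, headers=...)."""
    return {"User-Agent": USER_AGENTS[browser]}


# One headers dict per browser, in case you want to try each of them in turn.
candidate_headers = [build_headers(name) for name in USER_AGENTS]
```

Calling `build_headers("edge")` gives a dictionary with a single `User-Agent` key, which is the shape that the `headers=` argument of `requests.get` expects.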
headers_list = [{'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.774.63 Safari/537.36 Edg/89.0.774.63'}]
Once you have found the page you want, you can scrape its content (url is the address of that page):
with open('response.txt', 'w', encoding='utf-8') as file:
    for headers in headers_list:
        # Send the request
        response = requests.get(url, headers=headers)
        # Print the status code
        print(f'Sent request with header: {headers["User-Agent"]}, Status code: {response.status_code}')
        # If the request succeeded, save the returned content
        if response.status_code == 200:
            file.write(f'Response with header: {headers["User-Agent"]}\n')
            file.write(response.text)
        else:
            file.write(f'Failed request with header: {headers["User-Agent"]}, Status code: {response.status_code}\n')
print('All requests finished!')
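The loop above assumes every request comes back with a response, but a request can also time out or fail to connect. Below is a rough sketch of the same loop with a timeout and basic error handling. `fetch_all` and its `get` parameter are hypothetical names, not part of requests; `get` stands in for `requests.get` so the loop can be tried without a network connection.

```python
def fetch_all(get, url, headers_list, out_path, timeout=10):
    """Request `url` once per headers dict and log each outcome to `out_path`.

    `get` is any callable that accepts (url, headers=..., timeout=...) like
    requests.get does (hypothetical parameter, introduced for illustration).
    Returns the list of status codes, with None for requests that raised.
    """
    statuses = []
    with open(out_path, 'w', encoding='utf-8') as file:
        for headers in headers_list:
            ua = headers['User-Agent']
            try:
                response = get(url, headers=headers, timeout=timeout)
            except Exception as exc:  # with requests: requests.exceptions.RequestException
                file.write(f'Request with header {ua} raised: {exc}\n')
                statuses.append(None)
                continue
            statuses.append(response.status_code)
            if response.status_code == 200:
                # Save the header used, then the page body
                file.write(f'Response with header: {ua}\n')
                file.write(response.text + '\n')
            else:
                file.write(f'Failed request with header: {ua}, Status code: {response.status_code}\n')
    return statuses
```

With requests installed, something like `fetch_all(requests.get, url, headers_list, 'response.txt')` should write the same kind of file as the loop above, plus a line for any request that raised an exception.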
The complete code is as follows:
import requests

# Define the URL to request
url = 'http://baidu.com'  # replace with the site you want to visit

# Define the User-Agent headers to try (one dict per request)
headers_list = [
    {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36 Edg/126.0.0.0'},
]

# Create a text file to store the returned content
with open('response.txt', 'w', encoding='utf-8') as file:
    for headers in headers_list:
        # Send the request
        response = requests.get(url, headers=headers)
        # Print the status code
        print(f'Sent request with header: {headers["User-Agent"]}, Status code: {response.status_code}')
        # If the request succeeded, save the returned content
        if response.status_code == 200:
            file.write(f'Response with header: {headers["User-Agent"]}\n')
            file.write(response.text)
        else:
            file.write(f'Failed request with header: {headers["User-Agent"]}, Status code: {response.status_code}\n')
print('All requests finished!')
The result is as follows:
The saved text is as follows: