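Requests is a very widely used HTTP library for Python, mainly used for web crawling and automated API testing. The examples below use the latest release at the time of writing (2.25.0), installed with pip install requests.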
pip install requests
Collecting requests
  Downloading requests-2.25.0-py2.py3-none-any.whl (61 kB)
     |████████████████████████████████| 61 kB 99 kB/s
Requirement already satisfied: urllib3<1.27,>=1.21.1 in ./venv/lib/python3.9/site-packages (from requests) (1.26.2)
Collecting certifi>=2017.4.17
  Downloading certifi-2020.12.5-py2.py3-none-any.whl (147 kB)
     |████████████████████████████████| 147 kB 6.7 kB/s
Collecting chardet<4,>=3.0.2
  Using cached chardet-3.0.4-py2.py3-none-any.whl (133 kB)
Collecting idna<3,>=2.5
  Downloading idna-2.10-py2.py3-none-any.whl (58 kB)
     |████████████████████████████████| 58 kB 5.2 kB/s
Installing collected packages: idna, chardet, certifi, requests
Successfully installed certifi-2020.12.5 chardet-3.0.4 idna-2.10 requests-2.25.0
Check the locally installed version
>>> import requests
>>> requests.__version__
'2.25.0'
Let's write a request to the Baidu home page. There are two cases, a request without parameters and a request with parameters:
https://www.baidu.com/
https://www.baidu.com/s?wd=suv%E6%B1%BD%E8%BD%A6 (the wd value is the URL-encoded search term "suv汽车", i.e. "SUV cars")
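Rather than percent-encoding the query string by hand, the search term can be passed through the params argument and Requests will encode it. The sketch below only prepares the request without sending it, so it shows how the final URL is built; the variable names are illustrative.

```python
import requests

# Build (but do not send) a GET request so the encoded URL can be inspected.
request = requests.Request(
    "GET",
    "https://www.baidu.com/s",
    params={"wd": "suv汽车"},  # query parameters as a plain dict
)
prepared = request.prepare()

# Non-ASCII characters in the query are percent-encoded as UTF-8.
print(prepared.url)
```

Passing the same params dictionary to requests.get("https://www.baidu.com/s", params=...) would send the request to that URL.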
Two examples of GET requests with Requests
import requests

url = "https://www.baidu.com/s?wd=Springboot"
payload = {}
# Headers copied from a browser session so the request looks like it comes from Chrome.
headers = {
    'Connection': 'keep-alive',
    'Accept': '*/*',
    'is_xhr': '1',
    'X-Requested-With': 'XMLHttpRequest',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36',
    'is_pbs': '%E5%BE%AE%E4%BF%A1%E5%85%AC%E4%BC%97%E5%B9%B3%E5%8F%B0',
    'Sec-Fetch-Site': 'same-origin',
    'Sec-Fetch-Mode': 'cors',
    'Sec-Fetch-Dest': 'empty',
    'Referer': 'https://www.baidu.com/s?ie=utf-8&f=8&rsv_bp=1&rsv_idx=1&tn=baidu&wd=%E5%BE%AE%E4%BF%A1%E5%85%AC%E4%BC%97%E5%B9%B3%E5%8F%B0&fenlei=256&oq=%25E5%25BE%25AE%25E4%25BF%25A1%25E5%2585%25AC%25E4%25BC%2597%25E5%25B9%25B3%25E5%258F%25B0&rsv_pq=dbe6f9780003d2c2&rsv_t=d1ddM3MDeEzN0o9%2BO4RzSnpDa8%2Bpu3avFyNpR4YZC3hodmvp3wBjm9N5k0s&rqlang=cn&rsv_enter=0&rsv_dl=tb&rsv_btype=t&inputT=5425&rsv_sug3=5&rsv_sug1=4&rsv_sug7=100&rsv_sug4=12430&rsv_sug=1',
    'Accept-Language': 'en-US,en;q=0.9',
    # The cookie is one long string; adjacent literals are joined by Python.
    'Cookie': (
        'BIDUPSID=54C0826149B7299E360B557AB5A497A6; PSTM=1602209293; BAIDUID=54C0826149B7299ED5ED2B69EA7EAE01:FG=1; BD_UPN=12314753; MCITY=-%3A; BDORZ=B490B5EBF6F3CD402E515D22BCDA1598; sug=3; sugstore=1; ORIGIN=0; bdime=0; BDSFRCVID=DUCOJeC62ZuoAecrzNDIhH2qlmwUBOTTH6aotFZDEgCmWtkRMW_mEG0P8f8g0KAbGdi6ogKK3mOTHR8F_2uxOjjg8UtVJeC6EG0Ptf8g0f5; H_BDCLCKID_SF=tbCq_IKafCP3HJ84q465bPPJqxbXqM3P02OZ0l8KtfchDpjN-n50Xx0PM-AHb47LbaTG2MbmWIQHDPbDWxOpbUtD3pOWQfrUXKj4KKJxfnLWeIJoLt5nb-cBhUJiB5JLBan7bDnIXKohJh7FM4tW3J0ZyxomtfQxtNRJ0DnjtnLhbRO4-TFKjj5LjUK; BAIDUID_BFESS=54C0826149B7299ED5ED2B69EA7EAE01:FG=1; __yjsv5_shitong=1.0_7_98da4595c8bb2361eb889d3a64fd8fc8e3d5_300_1607497358701_112.5.168.233_fb665284; delPer=0; BD_CK_SAM=1; BD_HOME=1; PSINO=6; H_PS_PSSID=1466_33225_33058_33259_33236_33099_33101_26350_33199_33144_33148; H_PS_645EC=d1ddM3MDeEzN0o9%2BO4RzSnpDa8%2Bpu3avFyNpR4YZC3hodmvp3wBjm9N5k0s; BA_HECTOR=2k0k2l2l84al2k8llc1ft394d0q; COOKIE_SESSION=82578_0_8_4_2_2_0_0_8_2_1_0_0_0_0_0_1607423260_0_1607574668%7C9%2326670_117_1607417652%7C9; '
        'Hm_lvt_aec699bb6442ba076c8981c6dc490771=1606455659,1606789969,1607417005,1607574727; Hm_lpvt_aec699bb6442ba076c8981c6dc490771=1607574727; BDSVRTM=0; WWW_ST=1607574827950'
    ),
}
# Send the GET request and print the response headers.
response = requests.request("GET", url, headers=headers, data=payload)
print(response.headers)
The response headers printed by the request
{'Bdpagetype': '3', 'Bdqid': '0x8e8822890000da2a', 'Cache-Control': 'private', 'Ckpacknum': '2', 'Ckrndstr': '90000da2a', 'Connection': 'keep-alive', 'Content-Encoding': 'gzip', 'Content-Type': 'text/html;charset=utf-8', 'Date': 'Thu, 10 Dec 2020 05:39:06 GMT', 'Server': 'BWS/1.1', 'Set-Cookie': 'delPer=0; path=/; domain=.baidu.com, BD_CK_SAM=1;path=/, PSINO=6; domain=.baidu.com; path=/, BDSVRTM=19; path=/, H_PS_PSSID=1466_33225_33058_33259_33236_33099_33101_26350_33199_33144_33149; path=/; domain=.baidu.com', 'Strict-Transport-Security': 'max-age=172800', 'Traceid': '1607578746025443994610270496922024335914', 'Vary': 'Accept-Encoding', 'X-Ua-Compatible': 'IE=Edge,chrome=1', 'Transfer-Encoding': 'chunked'}
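More complex usage of Requests
More complex requests usually involve request headers, proxy IPs, certificate verification, cookies and similar features. Requests simplifies all of these: each feature is passed as a parameter when the request is sent and applied to that request.
1. Adding request headers
Request headers are expressed as a dictionary and passed through the headers parameter when sending the request. Setting request headers effectively disguises the program as a browser. The fields that matter most are User-Agent and Referer, because many sites' anti-crawler checks rely on these two to decide whether a request is legitimate.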
import requests

url = "https://www.baidu.com/s?wd=Springboot"
payload = {}
# A minimal set of browser-like headers.
headers = {
    'Connection': 'keep-alive',
    'X-Requested-With': 'XMLHttpRequest',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36',
}
response = requests.request("GET", url, headers=headers, data=payload)
print(response.text)
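When several requests share the same headers, one option is to set them once on a Session instead of repeating the dictionary. This is only a sketch: it prepares a request without sending it, to show which headers would go out, and the Referer value is an assumption chosen for illustration.

```python
import requests

session = requests.Session()
# Headers set here are merged into every request made through the session.
session.headers.update({
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36"
    ),
    "Referer": "https://www.baidu.com/",  # illustrative value
})

# Per-request headers are merged on top of the session headers.
request = requests.Request(
    "GET",
    "https://www.baidu.com/s",
    params={"wd": "Springboot"},
    headers={"X-Requested-With": "XMLHttpRequest"},
)
prepared = session.prepare_request(request)

# Show every header that would be sent with this request.
for name, value in prepared.headers.items():
    print(f"{name}: {value}")
```

Calling session.send(prepared), or simply session.get(...), would then send the request with these headers.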
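2. Using a proxy IP
This works the same way as request headers: just set the proxies parameter. proxies is a dictionary whose keys are usually http and https, the URL schemes of the requests to be proxied (plain HTTP and HTTP over TLS), and whose values are the address of a reachable proxy server. Free proxy IPs can be found online, although many of them do not work.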
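A minimal sketch of the proxies parameter follows. The proxy address is hypothetical and must be replaced with one that is actually reachable; the helper name fetch_via_proxy is also just an illustration, not part of Requests.

```python
import requests

# Hypothetical proxy address; replace it with a proxy you can actually reach.
PROXY = "http://10.10.1.10:3128"

# Keys are the URL schemes of the target site; values are proxy URLs.
proxies = {
    "http": PROXY,
    "https": PROXY,
}


def fetch_via_proxy(url, proxies, timeout=5):
    """Send a GET request through the given proxies and return the status code."""
    response = requests.get(url, proxies=proxies, timeout=timeout)
    return response.status_code


# Example call (needs a working proxy):
# print(fetch_via_proxy("https://www.baidu.com/", proxies))
```

If the proxy is unreachable, the call is expected to fail with an exception such as requests.exceptions.ProxyError or a timeout rather than fall back to a direct connection.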
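3. Certificate verification
When a site presents an invalid certificate, setting verify=False turns off certificate verification. The verify parameter defaults to True. To use a specific certificate file, set verify to the path of that file.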
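The three forms of verify can be sketched as below. The helper name fetch_status and the certificate path in the comments are assumptions for illustration; the helper returns None when verification fails instead of letting the exception propagate.

```python
import requests


def fetch_status(url, verify=True, timeout=5):
    """GET url and return the status code, or None on a certificate error.

    verify may be True (default CA bundle), False (skip verification),
    or a path to a CA bundle file.
    """
    try:
        response = requests.get(url, verify=verify, timeout=timeout)
    except requests.exceptions.SSLError:
        # Raised when the server certificate cannot be verified.
        return None
    return response.status_code


# Example calls (these need network access; the path is hypothetical):
# fetch_status("https://www.baidu.com/")                          # default checks
# fetch_status("https://www.baidu.com/", verify=False)            # no verification
# fetch_status("https://www.baidu.com/", verify="/path/to/ca.pem")
```

With verify=False, Requests still connects over TLS but no longer checks who it is talking to, so this form is best kept to test environments.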
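4. Timeout settings
After a request is sent, network and server conditions cause a delay between the request and the response. To stop the program from waiting too long, or to allow a longer wait, set the timeout parameter to the number of seconds to wait. Once that time is exceeded, Requests stops waiting for the response and raises an exception. Usage is as follows: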
requests.get("https://www.baidu.com/", timeout=1)
requests.post("https://www.baidu.com/", timeout=1)
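Since a timeout raises an exception, the caller usually wants to catch it. The sketch below assumes a helper named fetch_text (not part of Requests) and uses the tuple form of timeout, which sets the connect and read timeouts separately; the specific values are arbitrary.

```python
import requests


def fetch_text(url, timeout=(3.05, 10)):
    """Return the response body, or None if the request timed out.

    timeout is either a single number of seconds or a
    (connect timeout, read timeout) tuple.
    """
    try:
        response = requests.get(url, timeout=timeout)
    except requests.exceptions.Timeout:
        # Covers both ConnectTimeout and ReadTimeout.
        return None
    return response.text


# Example call (needs network access):
# body = fetch_text("https://www.baidu.com/", timeout=1)
```

Returning None is one possible design; retrying or re-raising may suit a crawler better, depending on how failures should be handled.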
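5. Using Cookies
Using cookies in a request likewise only requires setting the cookies parameter. Cookies identify the user, and Requests accepts them as either a dictionary or a RequestsCookieJar object. They are usually copied from a browser or produced while the program runs. The example below shows in more detail how to work with cookies.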
print(requests.utils.dict_from_cookiejar(response.cookies))
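To round this out, here is a small sketch of converting between a dictionary and a RequestsCookieJar and of attaching cookies to a request. The cookie values are illustrative, and the request is only prepared, not sent.

```python
import requests

# Illustrative cookie values; real ones would come from a browser or a login response.
cookie_dict = {"BD_HOME": "1", "delPer": "0"}

# Convert a plain dict to a RequestsCookieJar and back again.
jar = requests.utils.cookiejar_from_dict(cookie_dict)
print(type(jar).__name__)
print(requests.utils.dict_from_cookiejar(jar))

# Either form can be passed as cookies=...; preparing the request
# (without sending it) shows the resulting Cookie header.
prepared = requests.Request(
    "GET", "https://www.baidu.com/", cookies={"BD_HOME": "1"}
).prepare()
print(prepared.headers["Cookie"])
```

For a real response, response.cookies is already a RequestsCookieJar, so it can be passed straight to the next request or converted with dict_from_cookiejar as shown above.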