文章目录
- 一.简介
- 二.基本GET请求
- 1.最基本的GET请求--直接用get方法
- 2.添加 headers 和查询参数parmas
- 3.通过requests获取网络上图片的大小
- 三.基本POST请求
- 1.传入data数据
- 四.代理(proxies参数)
- 五.私密代理
- 六.web客户端验证
- 七.Cookies 和 Sission
- 1.Cookies
- 2.Session
- 八.处理HTTPS请求 SSL证书验证
一.简介
虽然Python的标准库中 urllib 模块已经包含了平常我们使用的大多数功能,但是它的 API 使用起来让人感觉不太好,而 Requests 自称 “HTTP for Humans”,说明使用更简洁方便。
requests 的底层实现其实就是 urllib,它继承了urllib的所有特性。Requests支持HTTP连接保持和连接池,支持使用cookie保持会话,支持文件上传,支持自动确定响应内容的编码,支持国际化的 URL 和 POST 数据自动编码。
利用 pip 安装 或者利用 easy_install 都可以完成安装:
$ pip install requests$ easy_install requests
二.基本GET请求
1.最基本的GET请求–直接用get方法
response.text和response.content的区别:
response.content:这个是直接从网络上面抓取的数据。没有经过任何解码。所以是一个bytes类型。其实在硬盘上和在网络上传输的字符串都是bytes类型。
response.text:这个是str的数据类型,是requests库将response.content进行解码的字符串。解码需要指定一个编码方式,requests会根据自己的猜测来判断编码的方式。所以有时候可能会猜测错误,就会导致解码产生乱码。这时候就应该使用response.content.decode(‘utf-8’)进行手动解码。
import requestsresponse = requests.get('http://www.baidu.com/')print('----------')
print(response.request.headers)
print('----------')
print(response.text)
print('----------')
print(response.content)
输出结果:
----------
{'User-Agent': 'python-requests/2.22.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
----------
<!DOCTYPE html>
<!--STATUS OK--><html> <head><meta http-equiv=content-type content=text/html;charset=utf-8><meta http-equiv=X-UA-Compatible content=IE=Edge><meta content=always name=referrer><link rel=stylesheet type=text/css href=http://s1.bdstatic.com/r/www/cache/bdorz/baidu.min.css><title>ç™¾åº¦ä¸€ä¸‹ï¼Œä½ å°±çŸ¥é“</title></head> <body link=#0000cc> <div id=wrapper> <div id=head> <div class=head_wrapper> <div class=s_form> <div class=s_form_wrapper> <div id=lg> <img hidefocus=true src=//www.baidu.com/img/bd_logo1.png width=270 height=129> </div> <form id=form name=f action=//www.baidu.com/s class=fm> <input type=hidden name=bdorz_come value=1> <input type=hidden name=ie value=utf-8> <input type=hidden name=f value=8> <input type=hidden name=rsv_bp value=1> <input type=hidden name=rsv_idx value=1> <input type=hidden name=tn value=baidu><span class="bg s_ipt_wr"><input id=kw name=wd class=s_ipt value maxlength=255 autocomplete=off autofocus></span><span class="bg s_btn_wr"><input type=submit id=su value=百度一下 class="bg s_btn"></span> </form> </div> </div> <div id=u1> <a href=http://news.baidu.com name=tj_trnews class=mnav>æ–°é—»</a> <a href=http://www.hao123.com name=tj_trhao123 class=mnav>hao123</a> <a href=http://map.baidu.com name=tj_trmap class=mnav>地图</a> <a href=http://v.baidu.com name=tj_trvideo class=mnav>视频</a> <a href=http://tieba.baidu.com name=tj_trtieba class=mnav>è´´å§</a> <noscript> <a href=http://www.baidu.com/bdorz/login.gif?login&tpl=mn&u=http%3A%2F%2Fwww.baidu.com%2f%3fbdorz_come%3d1 name=tj_login class=lb>登录</a> </noscript> <script>document.write('<a href="http://www.baidu.com/bdorz/login.gif?login&tpl=mn&u='+ encodeURIComponent(window.location.href+ (window.location.search === "" ? "?" : "&")+ "bdorz_come=1")+ '" name="tj_login" class="lb">登录</a>');</script> <a href=//www.baidu.com/more/ name=tj_briicon class=bri style="display: block;">更多产å“</a> </div> </div> </div> <div id=ftCon> <div id=ftConw> <p id=lh> <a href=http://home.baidu.com>å
³äºŽç™¾åº¦</a> <a href=http://ir.baidu.com>About Baidu</a> </p> <p id=cp>©2017 Baidu <a href=http://www.baidu.com/duty/>使用百度å‰å¿
读</a> <a href=http://jianyi.baidu.com/ class=cp-feedback>æ„è§å馈</a> 京ICPè¯030173å· <img src=//www.baidu.com/img/gs.gif> </p> </div> </div> </div> </body> </html>----------
b'<!DOCTYPE html>\r\n<!--STATUS OK--><html> <head><meta http-equiv=content-type content=text/html;charset=utf-8><meta http-equiv=X-UA-Compatible content=IE=Edge><meta content=always name=referrer><link rel=stylesheet type=text/css href=http://s1.bdstatic.com/r/www/cache/bdorz/baidu.min.css><title>\xe7\x99\xbe\xe5\xba\xa6\xe4\xb8\x80\xe4\xb8\x8b\xef\xbc\x8c\xe4\xbd\xa0\xe5\xb0\xb1\xe7\x9f\xa5\xe9\x81\x93</title></head> <body link=#0000cc> <div id=wrapper> <div id=head> <div class=head_wrapper> <div class=s_form> <div class=s_form_wrapper> <div id=lg> <img hidefocus=true src=//www.baidu.com/img/bd_logo1.png width=270 height=129> </div> <form id=form name=f action=//www.baidu.com/s class=fm> <input type=hidden name=bdorz_come value=1> <input type=hidden name=ie value=utf-8> <input type=hidden name=f value=8> <input type=hidden name=rsv_bp value=1> <input type=hidden name=rsv_idx value=1> <input type=hidden name=tn value=baidu><span class="bg s_ipt_wr"><input id=kw name=wd class=s_ipt value maxlength=255 autocomplete=off autofocus></span><span class="bg s_btn_wr"><input type=submit id=su value=\xe7\x99\xbe\xe5\xba\xa6\xe4\xb8\x80\xe4\xb8\x8b class="bg s_btn"></span> </form> </div> </div> <div id=u1> <a href=http://news.baidu.com name=tj_trnews class=mnav>\xe6\x96\xb0\xe9\x97\xbb</a> <a href=http://www.hao123.com name=tj_trhao123 class=mnav>hao123</a> <a href=http://map.baidu.com name=tj_trmap class=mnav>\xe5\x9c\xb0\xe5\x9b\xbe</a> <a href=http://v.baidu.com name=tj_trvideo class=mnav>\xe8\xa7\x86\xe9\xa2\x91</a> <a href=http://tieba.baidu.com name=tj_trtieba class=mnav>\xe8\xb4\xb4\xe5\x90\xa7</a> <noscript> <a href=http://www.baidu.com/bdorz/login.gif?login&tpl=mn&u=http%3A%2F%2Fwww.baidu.com%2f%3fbdorz_come%3d1 name=tj_login class=lb>\xe7\x99\xbb\xe5\xbd\x95</a> </noscript> <script>document.write(\'<a href="http://www.baidu.com/bdorz/login.gif?login&tpl=mn&u=\'+ encodeURIComponent(window.location.href+ (window.location.search === "" ? "?" : "&")+ "bdorz_come=1")+ \'" name="tj_login" class="lb">\xe7\x99\xbb\xe5\xbd\x95</a>\');</script> <a href=//www.baidu.com/more/ name=tj_briicon class=bri style="display: block;">\xe6\x9b\xb4\xe5\xa4\x9a\xe4\xba\xa7\xe5\x93\x81</a> </div> </div> </div> <div id=ftCon> <div id=ftConw> <p id=lh> <a href=http://home.baidu.com>\xe5\x85\xb3\xe4\xba\x8e\xe7\x99\xbe\xe5\xba\xa6</a> <a href=http://ir.baidu.com>About Baidu</a> </p> <p id=cp>©2017 Baidu <a href=http://www.baidu.com/duty/>\xe4\xbd\xbf\xe7\x94\xa8\xe7\x99\xbe\xe5\xba\xa6\xe5\x89\x8d\xe5\xbf\x85\xe8\xaf\xbb</a> <a href=http://jianyi.baidu.com/ class=cp-feedback>\xe6\x84\x8f\xe8\xa7\x81\xe5\x8f\x8d\xe9\xa6\x88</a> \xe4\xba\xacICP\xe8\xaf\x81030173\xe5\x8f\xb7 <img src=//www.baidu.com/img/gs.gif> </p> </div> </div> </div> </body> </html>\r\n'
2.添加 headers 和查询参数parmas
如果想添加 headers,可以传入headers参数来增加请求头中的headers信息。如果要将参数放在url中传递,可以利用 params 参数。
import requestskw = {'wd':'长城'}headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36"}# params 接收一个字典或者字符串的查询参数,字典类型自动转换为url编码,不需要urlencode()
response = requests.get("http://www.baidu.com/s?", params = kw, headers = headers)print('-----1-----')
print(response.request.headers)# 查看响应内容,response.text 返回的是Unicode格式的数据
print('-----2-----')
print (response.text)# # 查看响应内容,response.content返回的字节流数据
print('-----3-----')
print (response.content)print('-----4-----')
print (response.content.decode())# 查看完整url地址
print('----------------------5----------------------------')
print (response.url)# 查看响应头部字符编码
print('-----6-----')
print (response.encoding)# 查看响应码
print('-----7-----')
print (response.status_code)
输出结果:
-----1-----
{'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
-----2-----
<!DOCTYPE html>
<html lang="zh-CN">
<head><meta charset="utf-8"><title>百度安å
¨éªŒè¯</title><meta http-equiv="Content-Type" content="text/html; charset=utf-8"><meta name="apple-mobile-web-app-capable" content="yes"><meta name="apple-mobile-web-app-status-bar-style" content="black"><meta name="viewport" content="width=device-width, user-scalable=no, initial-scale=1.0, minimum-scale=1.0, maximum-scale=1.0"><meta name="format-detection" content="telephone=no, email=no"><link rel="shortcut icon" href="https://www.baidu.com/favicon.ico" type="image/x-icon"><link rel="icon" sizes="any" mask href="https://www.baidu.com/img/baidu.svg"><meta http-equiv="X-UA-Compatible" content="IE=Edge"><meta http-equiv="Content-Security-Policy" content="upgrade-insecure-requests"><link rel="stylesheet" href="https://wappass.bdimg.com/static/touch/css/api/mkdjump_8befa48.css" />
</head>
<body><div class="timeout hide"><div class="timeout-img"></div><div class="timeout-title">网络ä¸ç»™åŠ›ï¼Œè¯·ç¨åŽé‡è¯•</div><button type="button" class="timeout-button">返回首页</button></div><div class="timeout-feedback hide"><div class="timeout-feedback-icon"></div><p class="timeout-feedback-title">问题å馈</p></div><script src="https://wappass.baidu.com/static/machine/js/api/mkd.js"></script>
<script src="https://wappass.bdimg.com/static/touch/js/mkdjump_6003cf3.js"></script>
</body>
</html>
-----3-----
b'<!DOCTYPE html>\n<html lang="zh-CN">\n<head>\n <meta charset="utf-8">\n <title>\xe7\x99\xbe\xe5\xba\xa6\xe5\xae\x89\xe5\x85\xa8\xe9\xaa\x8c\xe8\xaf\x81</title>\n <meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n <meta name="apple-mobile-web-app-capable" content="yes">\n <meta name="apple-mobile-web-app-status-bar-style" content="black">\n <meta name="viewport" content="width=device-width, user-scalable=no, initial-scale=1.0, minimum-scale=1.0, maximum-scale=1.0">\n <meta name="format-detection" content="telephone=no, email=no">\n <link rel="shortcut icon" href="https://www.baidu.com/favicon.ico" type="image/x-icon">\n <link rel="icon" sizes="any" mask href="https://www.baidu.com/img/baidu.svg">\n <meta http-equiv="X-UA-Compatible" content="IE=Edge">\n <meta http-equiv="Content-Security-Policy" content="upgrade-insecure-requests">\n <link rel="stylesheet" href="https://wappass.bdimg.com/static/touch/css/api/mkdjump_8befa48.css" />\n</head>\n<body>\n <div class="timeout hide">\n <div class="timeout-img"></div>\n <div class="timeout-title">\xe7\xbd\x91\xe7\xbb\x9c\xe4\xb8\x8d\xe7\xbb\x99\xe5\x8a\x9b\xef\xbc\x8c\xe8\xaf\xb7\xe7\xa8\x8d\xe5\x90\x8e\xe9\x87\x8d\xe8\xaf\x95</div>\n <button type="button" class="timeout-button">\xe8\xbf\x94\xe5\x9b\x9e\xe9\xa6\x96\xe9\xa1\xb5</button>\n </div>\n <div class="timeout-feedback hide">\n <div class="timeout-feedback-icon"></div>\n <p class="timeout-feedback-title">\xe9\x97\xae\xe9\xa2\x98\xe5\x8f\x8d\xe9\xa6\x88</p>\n </div>\n\n<script src="https://wappass.baidu.com/static/machine/js/api/mkd.js"></script>\n<script src="https://wappass.bdimg.com/static/touch/js/mkdjump_6003cf3.js"></script>\n</body>\n</html>'
-----4-----
<!DOCTYPE html>
<html lang="zh-CN">
<head><meta charset="utf-8"><title>百度安全验证</title><meta http-equiv="Content-Type" content="text/html; charset=utf-8"><meta name="apple-mobile-web-app-capable" content="yes"><meta name="apple-mobile-web-app-status-bar-style" content="black"><meta name="viewport" content="width=device-width, user-scalable=no, initial-scale=1.0, minimum-scale=1.0, maximum-scale=1.0"><meta name="format-detection" content="telephone=no, email=no"><link rel="shortcut icon" href="https://www.baidu.com/favicon.ico" type="image/x-icon"><link rel="icon" sizes="any" mask href="https://www.baidu.com/img/baidu.svg"><meta http-equiv="X-UA-Compatible" content="IE=Edge"><meta http-equiv="Content-Security-Policy" content="upgrade-insecure-requests"><link rel="stylesheet" href="https://wappass.bdimg.com/static/touch/css/api/mkdjump_8befa48.css" />
</head>
<body><div class="timeout hide"><div class="timeout-img"></div><div class="timeout-title">网络不给力,请稍后重试</div><button type="button" class="timeout-button">返回首页</button></div><div class="timeout-feedback hide"><div class="timeout-feedback-icon"></div><p class="timeout-feedback-title">问题反馈</p></div><script src="https://wappass.baidu.com/static/machine/js/api/mkd.js"></script>
<script src="https://wappass.bdimg.com/static/touch/js/mkdjump_6003cf3.js"></script>
</body>
</html>
----------------------5----------------------------
https://wappass.baidu.com/static/captcha/tuxing.html?&ak=c27bbc89afca0463650ac9bde68ebe06&backurl=https%3A%2F%2Fwww.baidu.com%2Fs%3Fwd%3D%25E9%2595%25BF%25E5%259F%258E&logid=9931754313911722977&signature=e44e6266c70740aa0955def31c517835×tamp=1587817039
-----6-----
ISO-8859-1
-----7-----
200
3.通过requests获取网络上图片的大小
from io import BytesIO,StringIO
import requests
from PIL import Image
img_url = "http://imglf1.ph.126.net/pWRxzh6FRrG2qVL3JBvrDg==/6630172763234505196.png"
response = requests.get(img_url)
f = BytesIO(response.content)
img = Image.open(f)
print(img.size)
输出结果:
(500, 262)
理解一下 BytesIO 和StringIO
很多时候,数据读写不一定是文件,也可以在内存中读写。
StringIO顾名思义就是在内存中读写str。
BytesIO 就是在内存中读写bytes类型的二进制数据
例子中如果使用StringIO 即f = StringIO(response.text)会产生"cannot identify image file"的错误
当然上述例子也可以把图片存到本地之后再使用Image打开来获取图片大小
三.基本POST请求
1.传入data数据
对于 POST 请求来说,我们一般需要为它增加一些参数。那么最基本的传参方法可以利用 data 这个参数。
import requestsformdata = {"type":"AUTO","i":"i love python","doctype":"json","xmlVersion":"1.8","keyfrom":"fanyi.web","ue":"UTF-8","action":"FY_BY_ENTER","typoResult":"true"
}url = "http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule&smartresult=ugc&sessionFrom=null"headers={ "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"}response = requests.post(url, data = formdata, headers = headers)print ('-------------1-----------')
print (response.text)# 如果是json文件可以直接显示
print ('-------------2----------')
print (response.json())
运行结果:
{"type":"EN2ZH_CN","errorCode":0,"elapsedTime":3,"translateResult":[[{"src":"i love python","tgt":"我喜欢python"}]],"smartResult":{"type":1,"entries":["","肆文","","","高德纳","",""]}}{'type': 'EN2ZH_CN', 'errorCode': 0, 'elapsedTime': 3, 'translateResult': [[{'src': 'i love python', 'tgt': '我喜欢python'}]], 'smartResult': {'type': 1, 'entries': ['', '肆文', '', '', '高德纳', '', '']}}
四.代理(proxies参数)
如果需要使用代理,你可以通过为任意请求方法提供 proxies 参数来配置单个请求:
import requests# 根据协议类型,选择不同的代理
proxies = {"http": "http://12.34.56.79:9527","https": "http://12.34.56.79:9527",
}response = requests.get("http://www.baidu.com", proxies = proxies)
print response.text
也可以通过本地环境变量 HTTP_PROXY 和 HTTPS_PROXY 来配置代理:
export HTTP_PROXY="http://12.34.56.79:9527"
export HTTPS_PROXY="https://12.34.56.79:9527"
五.私密代理
import requests# 如果代理需要使用HTTP Basic Auth,可以使用下面这种格式:
proxy = { "http": "mr_mao_hacker:sffqry9r@61.158.163.130:16816" }response = requests.get("http://www.baidu.com", proxies = proxy)print (response.text)
六.web客户端验证
如果是Web客户端验证,需要添加 auth = (账户名, 密码)
import requestsauth=('test', '123456')response = requests.get('http://192.168.199.107', auth = auth)print (response.text)
七.Cookies 和 Sission
1.Cookies
如果一个响应中包含了cookie,那么我们可以利用 cookies参数拿到:
import requestsresponse = requests.get("http://www.baidu.com/")# 7\. 返回CookieJar对象:
cookiejar = response.cookies# 8\. 将CookieJar转为字典:
cookiedict = requests.utils.dict_from_cookiejar(cookiejar)print (cookiejar)print (cookiedict)
运行结果:
<RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>{'BDORZ': '27315'}
2.Session
在 requests 里,session对象是一个非常常用的对象,这个对象代表一次用户会话:从客户端浏览器连接服务器开始,到客户端浏览器与服务器断开。
会话能让我们在跨请求时候保持某些参数,比如在同一个 Session 实例发出的所有请求之间保持 cookie 。
实现人人网登录:
import requests# 1\. 创建session对象,可以保存Cookie值
ssion = requests.session()# 2\. 处理 headers
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36"}# 3\. 需要登录的用户名和密码
data = {"email":"mr_mao_hacker@163.com", "password":"alarmchime"} # 4\. 发送附带用户名和密码的请求,并获取登录后的Cookie值,保存在ssion里
ssion.post("http://www.renren.com/PLogin.do", data = data)# 5\. ssion包含用户登录后的Cookie值,可以直接访问那些登录后才可以访问的页面
response = ssion.get("http://www.renren.com/410043129/profile")# 6\. 打印响应内容
print (response.text)
八.处理HTTPS请求 SSL证书验证
Requests也可以为HTTPS请求验证SSL证书:
要想检查某个主机的SSL证书,你可以使用 verify 参数(也可以不写)
import requests
response = requests.get("https://www.baidu.com/", verify=True)# 也可以省略不写
# response = requests.get("https://www.baidu.com/")
print (r.text)
运行结果:
<!DOCTYPE html>
<!--STATUS OK--><html> <head><meta http-equiv=content-type
content=text/html;charset=utf-8><meta http-equiv=X-UA-Compatible
content=IE=Edge>百度一下,你就知道 ....
如果SSL证书验证不通过,或者不信任服务器的安全证书,则会报出SSLError