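Motivation (purpose):
Web scraping is really a collection of small, scattered tasks. Small tools are like small screws, so whenever there is time it is worth tidying up the toolbox.
Process:
- Customize the request headers and set Accept-Language to US English. This matters especially when scraping foreign websites; otherwise the pages come back with very odd Chinese.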
headers = {
    # Set Accept-Language to US English.
    'Accept-Language': 'en-US,en;q=0.9',
    # The default language is Chinese:
    # 'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36'
}
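Since the language value is the part that gets toggled, a small helper can keep both variants in one place. This is only a sketch: `make_headers`, `LANGUAGES` and `USER_AGENT` are names made up for illustration, while the header values themselves are the ones from the snippet above.

```python
# User-Agent string copied from the headers snippet above.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36"
)

# Accept-Language values copied from the headers snippet above.
LANGUAGES = {
    "en": "en-US,en;q=0.9",
    "zh": "zh-CN,zh;q=0.9,en;q=0.8",
}


def make_headers(lang="en"):
    """Build request headers for a language key (hypothetical helper)."""
    return {
        "Accept-Language": LANGUAGES[lang],
        "User-Agent": USER_AGENT,
    }


headers = make_headers("en")
print(headers["Accept-Language"])
```

The resulting dict is passed to requests in the usual way, for example `requests.get(url, headers=make_headers("en"), timeout=3)`.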
- Print the local IP and the proxy IP
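import datetime
import requests


# With a global (system-wide) proxy enabled, requests uses it automatically by default.
def check_proxy_ip():
    url = 'https://httpbin.org/ip'
    today = datetime.datetime.now().strftime('%Y-%m-%d')

    # Request without passing proxies explicitly, then log the reported IP.
    response = requests.get(url, timeout=3)
    line = f"Today is: {today}  no explicit proxy, ip: {response.json()}"
    print(line)
    with open("ip_log.txt", "a", encoding="utf-8") as f:
        f.write(line)
        f.write("\n")

    # Same request again, this time through the local proxy.
    proxies = {
        'http': 'http://127.0.0.1:10809',
        'https': 'http://127.0.0.1:10809',
    }
    try:
        response2 = requests.get(url, proxies=proxies, timeout=3)
        print("Using proxy: ", response2.json())
    except requests.exceptions.RequestException as e:
        # Covers timeouts as well as refused connections and proxy errors.
        print(e)
        print("Something is wrong with the proxy.")
    print()


if __name__ == '__main__':
    check_proxy_ip()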
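Because a global proxy is picked up automatically, the "no proxy" request may report the proxy's IP as well, so it can help to compare the two results instead of only printing them. Below is a minimal sketch, assuming both responses have the `{"origin": "..."}` shape that httpbin.org/ip returns. `proxy_changes_ip` is a hypothetical helper name, and the sample IPs come from ranges reserved for documentation.

```python
def proxy_changes_ip(direct_json, proxied_json):
    """Return True when the two httpbin.org/ip payloads report different IPs."""
    direct_ip = direct_json.get("origin")
    proxied_ip = proxied_json.get("origin")
    if not direct_ip or not proxied_ip:
        raise ValueError("both payloads need an 'origin' field")
    return direct_ip != proxied_ip


# Sample payloads (made-up addresses, not real measurements).
different = proxy_changes_ip({"origin": "203.0.113.5"}, {"origin": "198.51.100.7"})
same = proxy_changes_ip({"origin": "203.0.113.5"}, {"origin": "203.0.113.5"})
print(different, same)
```

Inside `check_proxy_ip` this would be called as `proxy_changes_ip(response.json(), response2.json())`. A result of `False` suggests that both requests left through the same exit, which is what happens when the global proxy is already on.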
- Sending a POST with parameters using requests
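import requests


# 1. POST with parameters
def send_msg(msg):
    # The SMS gateway expects GBK-encoded content.
    msg = str(msg).encode("gbk")
    # receiver = "183***40985"
    receiver = "15******03"
    params = {
        "method": "sendSMS",
        "extenno": "22",
        "isLongSms": "0",
        "username": "15xxxx206",
        "password": "MTxxxxxyMDY=",
        "smstype": "0",
        "content": msg,
        "mobile": receiver,
    }
    u = "http://120xxxxx08/sxxxxrviceAPI"
    # params= puts the values in the URL query string, not in the request body.
    ret = requests.post(u, params=params)
    print(ret.url)   # final URL including the encoded query string
    print(ret)       # response object with the status code
    print(ret.text)  # response body


send_msg("test")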
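To see what the GBK-encoded `content` looks like once it lands in the query string, the encoding step can be reproduced offline with the standard library. This is a sketch that approximates what `params=` produces rather than a trace of requests itself. `build_sms_query` and the mobile number are placeholders invented for the example, and only three of the fields are included.

```python
from urllib.parse import unquote_to_bytes, urlencode


def build_sms_query(content, mobile, charset="gbk"):
    """Return a query string with the message content encoded in `charset`."""
    params = {
        "method": "sendSMS",
        "content": content.encode(charset),  # bytes get percent-encoded as-is
        "mobile": mobile,
    }
    return urlencode(params)


query = build_sms_query("测试", "15000000000")  # placeholder number
print(query)

# Decode the content field again to confirm the round trip through GBK.
raw_content = query.split("content=")[1].split("&")[0]
print(unquote_to_bytes(raw_content).decode("gbk"))
```

If the gateway expects the fields in the request body instead of the URL, requests accepts the same dict through `data=` instead of `params=`.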