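To download a file from a REST URL in Python, the requests library makes both downloading and saving simple. The example below shows how to download a file from a given REST API or URL and save it locally.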
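1. Background
We need to write a script that downloads a batch of files from a website that serves them through REST URLs. The site's GET request looks like this: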
GET /test/download/id/5774/format/testTitle HTTP/1.1
Host: testServer.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Cookie: __utma=11863783.1459862770.1379789243.1379789243.1379789243.1; __utmb=11863783.28.9.1379790533699; __utmc=11863783; __utmz=11863783.1379789243.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); PHPSESSID=fa844952890e9091d968c541caa6965f; loginremember=Qraoz3j%2BoWXxwqcJkgW9%2BfGFR0SDFLi1FLS7YVAfvbcd9GhX8zjw4u6plYFTACsRruZM4n%2FpX50%2BsjXW5v8vykKw2XNL0Vqo5syZKSDFSSX9mTFNd5KLpJV%2FFlYkCY4oi7Qyw%3D%3D; ma-refresh-storage=1; ma-pref=KLSFKJSJSD897897; skipPostLogin=0; pp-sid=hlh6hs1pnvuh571arl59t5pao0; __utmv=11863783.|1=MemberType=Yearly=1; nats_cookie=http%253A%252F%252Fwww.testServer.com%252F; nats=NDc1NzAzOjQ5MzoyNA%2C74%2C0%2C0%2C0; nats_sess=fe3f77e6e326eb8d18ef0111ab6f322e; __utma=163815075.1459708390.1379790355.1379790355.1379790355.1; __utmb=163815075.1.9.1379790485255; __utmc=163815075; __utmz=163815075.1379790355.1.1.utmcsr=ppp.contentdef.com|utmccn=(referral)|utmcmd=referral|utmcct=/postlogin; unlockedNetworks=%5B%22rk%22%2C%22bz%22%2C%22wkd%22%5D
Connection: close
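The Cookie line in this request carries the whole browser session. To reuse those cookies from a script, the header can be split into name/value pairs, for example to pass as the cookies argument of requests.get. Below is a minimal sketch. parse_cookie_header is a hypothetical helper, not part of requests. It splits on ';' and splits each pair only at the first '=', because values such as __utmv contain '=' themselves.

```python
def parse_cookie_header(header: str) -> dict[str, str]:
    """Turn a raw Cookie header value into a {name: value} dict (hypothetical helper).

    Pairs are separated by ';'. Only the first '=' separates name from value,
    because values like '11863783.|1=MemberType=Yearly=1' contain '='.
    If a name appears more than once, the later value wins.
    """
    cookies: dict[str, str] = {}
    for pair in header.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing ';' or empty segments
        name, _, value = pair.partition("=")
        cookies[name] = value
    return cookies


raw = "PHPSESSID=fa844952890e9091d968c541caa6965f; __utmv=11863783.|1=MemberType=Yearly=1"
print(parse_cookie_header(raw))
```

The request above sends __utma, __utmb, __utmc and __utmz twice. A dict therefore keeps only the later value of each name. If the server needs both copies, send the original header string unchanged instead.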
If the request succeeds, the server returns a 302 response like the following:
HTTP/1.1 302 Found
Date: Sat, 21 Sep 2013 19:32:37 GMT
Server: Apache
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
location: http://downloads.test.stuff.com/5774/stuff/picture.jpg?wed=20130921152237&wer=20130922153237&hash=0f20f4a6d0c9f1720b0b6
Vary: User-Agent,Accept-Encoding
Content-Length: 0
Connection: close
Content-Type: text/html; charset=UTF-8
We need to check whether the response is a 302. If it is not, we move on; if it is, we need to extract the location header shown here:
location: http://downloads.test.stuff.com/5774/stuff/picture.jpg?wed=20130921152237&wer=20130922153237&hash=0f20f4a6d0c9f1720b0b6
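The check itself fits in a small pure function, which can be tested without a network. The sketch below assumes the status code and headers arrive as an int and a plain mapping. redirect_target and local_filename are hypothetical names. The header lookup is case-insensitive because this server spells the header location in lower case. The headers object in requests already behaves that way.

```python
import posixpath
from urllib.parse import urljoin, urlsplit


def redirect_target(request_url, status_code, headers):
    """Return the absolute download URL if the response is a 302, else None."""
    if status_code != 302:
        return None  # not a redirect: the caller just moves on
    for name, value in headers.items():
        if name.lower() == "location":  # servers vary in header capitalization
            # Location may be relative; resolve it against the original request URL
            return urljoin(request_url, value)
    return None


def local_filename(file_url):
    """Last path segment of the URL, with the query string dropped."""
    return posixpath.basename(urlsplit(file_url).path)
```

For the response above, redirect_target(url, 302, {"location": loc}) returns loc unchanged. local_filename(loc) gives picture.jpg, with the wed, wer and hash query parameters stripped.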
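Once we have the location value, we send another GET request to that URL to download the file. The cookies also have to be kept in the session so that the download is authorized.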
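The same two-step flow can also be sketched with only the standard library, as an alternative to requests. It uses urllib.request with an http.cookiejar.CookieJar. NoRedirect and download_via_redirect are hypothetical names. A handler that refuses to follow redirects makes urllib raise an HTTPError for the 302. The Location header can then be read from that error. Both requests share one cookie jar.

```python
import urllib.error
import urllib.parse
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following 3xx responses so the caller can inspect them."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None makes urllib raise HTTPError for the 3xx response
        return None


def download_via_redirect(url, dest_path, cookie_jar, chunk_size=8192):
    """Request url; on a 302, download the Location target into dest_path.

    Both requests share cookie_jar, so cookies set by the first response are
    sent with the second request when their domain and path match.
    Returns the resolved file URL, or None if the first response was not a 302.
    """
    first = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(cookie_jar), NoRedirect)
    try:
        with first.open(url, timeout=30):
            return None  # 2xx: no redirect, nothing to download
    except urllib.error.HTTPError as err:
        if err.code != 302:
            raise  # real errors (403, 404, 500...) propagate
        location = err.headers.get("Location")  # case-insensitive lookup
        err.close()
    if not location:
        raise ValueError("302 response without a Location header")
    file_url = urllib.parse.urljoin(url, location)

    second = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(cookie_jar))
    with second.open(file_url, timeout=30) as resp, open(dest_path, "wb") as out:
        # Read fixed-size chunks so the whole file is never held in memory
        while chunk := resp.read(chunk_size):
            out.write(chunk)
    return file_url
```

To start from the browser's existing session, its cookies would first have to be added to the jar or sent as a Cookie header. The sketch only shows how cookies set during the exchange are carried over to the second request.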
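2. Solution
We can use the requests library for this task. requests is a third-party HTTP client library. It sends requests and gives convenient access to the response's status code, headers, and body.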
First, install requests:
pip install requests
Then we can use requests to send the GET request:
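import requests

# Headers copied from the browser request shown above. The Cookie header is sent
# verbatim instead of as a dict: the browser sends __utma, __utmb, __utmc and
# __utmz twice, and a dict would silently keep only one value per name.
URL = "http://testServer.com/test/download/id/5774/format/testTitle"
COOKIE = (
    "__utma=11863783.1459862770.1379789243.1379789243.1379789243.1; "
    "__utmb=11863783.28.9.1379790533699; __utmc=11863783; "
    "__utmz=11863783.1379789243.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); "
    "PHPSESSID=fa844952890e9091d968c541caa6965f; "
    "loginremember=Qraoz3j%2BoWXxwqcJkgW9%2BfGFR0SDFLi1FLS7YVAfvbcd9GhX8zjw4u6plYFTACsRruZM4n%2FpX50%2BsjXW5v8vykKw2XNL0Vqo5syZKSDFSSX9mTFNd5KLpJV%2FFlYkCY4oi7Qyw%3D%3D; "
    "ma-refresh-storage=1; ma-pref=KLSFKJSJSD897897; skipPostLogin=0; "
    "pp-sid=hlh6hs1pnvuh571arl59t5pao0; __utmv=11863783.|1=MemberType=Yearly=1; "
    "nats_cookie=http%253A%252F%252Fwww.testServer.com%252F; "
    "nats=NDc1NzAzOjQ5MzoyNA%2C74%2C0%2C0%2C0; nats_sess=fe3f77e6e326eb8d18ef0111ab6f322e; "
    "__utma=163815075.1459708390.1379790355.1379790355.1379790355.1; "
    "__utmb=163815075.1.9.1379790485255; __utmc=163815075; "
    "__utmz=163815075.1379790355.1.1.utmcsr=ppp.contentdef.com|utmccn=(referral)|utmcmd=referral|utmcct=/postlogin; "
    "unlockedNetworks=%5B%22rk%22%2C%22bz%22%2C%22wkd%22%5D"
)
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0",
    "Cookie": COOKIE,
}

# A Session applies the same headers (including the cookies) to both requests
with requests.Session() as session:
    session.headers.update(HEADERS)
    # Do not follow redirects automatically, so we can check for the 302 ourselves
    response = session.get(URL, allow_redirects=False, timeout=30)
    if response.status_code != 302:
        print(f"No redirect (HTTP {response.status_code}), skipping {URL}")
    else:
        # response.headers is case-insensitive, so this also matches "location"
        file_url = response.headers["Location"]
        filename = file_url.split("?", 1)[0].rsplit("/", 1)[-1]  # e.g. picture.jpg
        # stream=True fetches the body in chunks instead of loading it all into memory
        with session.get(file_url, stream=True, timeout=30) as download:
            download.raise_for_status()  # fail on 4xx/5xx instead of saving an error page
            with open(filename, "wb") as f:
                for chunk in download.iter_content(chunk_size=8192):
                    f.write(chunk)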
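This short Python script downloads a file from a REST URL and saves it locally. It includes basic error handling and writes the file in streamed chunks to keep memory usage low.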
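Streaming keeps memory use low only if chunks are written out as they arrive. The sketch below saves them a little more defensively. It assumes a temporary ".part" file next to the target is acceptable, and write_chunks is a hypothetical helper. Data goes to the temporary file and is renamed into place only after the last chunk. A dropped connection therefore never leaves a truncated file under the final name.

```python
import os


def write_chunks(chunks, path):
    """Write an iterable of byte chunks to path; return the number of bytes written.

    Only one chunk is held in memory at a time. The data is written to
    '<path>.part' (an assumed naming scheme) and renamed to path only after
    every chunk has been written; on any error the partial file is removed.
    """
    tmp_path = path + ".part"
    written = 0
    try:
        with open(tmp_path, "wb") as f:
            for chunk in chunks:
                if chunk:  # skip empty keep-alive chunks
                    f.write(chunk)
                    written += len(chunk)
        os.replace(tmp_path, path)  # atomic rename on the same filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise  # let the original error reach the caller
    return written
```

With requests it would be called on a response opened with stream=True, for example write_chunks(download.iter_content(chunk_size=8192), filename).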