Today I'd like to share my approach: a Python crawler.
In the CSDN community I've browsed plenty of crawler code, but every snippet has its own drawbacks; some require you to supply a song ID first, which is a hassle. So I wrote a program that needs nothing more than a song name: it searches, scrapes, and downloads the matching tracks for you.
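To make that concrete, here is a minimal sketch of the two endpoints the full script below relies on: the web search API turns a song name into a song id, and the outer/url redirect turns that id into a direct audio link. The URLs and JSON fields are the same ones the full script uses; the sample keyword and the limit=1 parameter are just for illustration.

import requests

headers = {"User-Agent": "Mozilla/5.0"}

# Step 1: search by name; the JSON result carries the song id and artist info
keyword = "Hello"
search_url = f'https://music.163.com/api/search/get/web?csrf_token=hlpretag=&hlposttag=&s={keyword}&type=1&offset=0&total=true&limit=1'
song = requests.get(search_url, headers=headers).json()['result']['songs'][0]

# Step 2: the id alone is enough to build a downloadable audio URL
audio_url = f'http://music.163.com/song/media/outer/url?id={song["id"]}'
print(song['name'], audio_url)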
Without further ado, go ahead and copy the program below. If you find it useful, please follow me, leave a like, and bookmark this post; writing it up takes real effort!
Remember: wait until the program has completely finished running before you stop it to listen, otherwise the files may be corrupted and unplayable! (A sketch after the code shows how to watch the downloads finish.)
Here is the Python code:
from lxml import etree
import requests
import json
from concurrent.futures import ThreadPoolExecutor

# Create a thread pool so several songs can download at the same time
pool = ThreadPoolExecutor(max_workers=10)

# Request headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.25 Safari/537.36 Core/1.70.3741.400 QQBrowser/10.5.3863.400"
}


def download(song_id, name):
    # Build the download link
    url = f'http://music.163.com/song/media/outer/url?id={song_id}'
    # Send the download request
    response = requests.get(url=url, headers=headers).content
    # Write the response body to an .mp3 file
    with open(name + '.mp3', 'wb') as f:
        f.write(response)
    # Print a completion message
    print(name, 'download complete')


def get_id(url):
    # Fetch the page content
    response = requests.get(url=url, headers=headers).text
    # Parse the page with XPath
    page_html = etree.HTML(response)
    # Extract the pre-rendered song list data
    id_list = page_html.xpath('//textarea[@id="song-list-pre-data"]/text()')[0]
    # Parse the song list and submit each download task to the thread pool
    for i in json.loads(id_list):
        name = i['name']
        song_id = i['id']
        author = i['artists'][0]['name']
        pool.submit(download, song_id, name + '-' + author)
    # Wait for every download to finish, then close the pool
    pool.shutdown()


if __name__ == '__main__':
    # Ask the user for a song keyword
    keyword = input("Enter a song name: ")
    # Build the search URL
    search_url = f'https://music.163.com/api/search/get/web?csrf_token=hlpretag=&hlposttag=&s={keyword}&type=1&offset=0&total=true&limit=5'
    # Send the search request and parse the JSON response
    response = requests.get(url=search_url, headers=headers).json()
    # Extract the song list from the result
    song_list = response['result']['songs']
    # Submit a download task for each song
    for song in song_list:
        name = song['name']
        song_id = song['id']
        author = song['artists'][0]['name']
        pool.submit(download, song_id, name + '-' + author)
    # Wait for every download to finish, then close the pool
    pool.shutdown()
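If you would rather watch the downloads complete one by one instead of waiting in silence (and be sure no half-written file is left behind), the submit loop in the main block can be swapped for a version that keeps the Future objects and iterates them with as_completed. This is only a sketch; it reuses the pool, download, and song_list names from the script above.

from concurrent.futures import as_completed

# Keep the Future for every submitted download
futures = [
    pool.submit(download, song['id'], song['name'] + '-' + song['artists'][0]['name'])
    for song in song_list
]

# Report each song as it finishes; the loop ends only when every file is fully written
for future in as_completed(futures):
    future.result()  # re-raises any download error instead of silently swallowing it

print('All downloads finished, it is now safe to close the program.')
pool.shutdown()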
Go give it a try! Just type in the name of the song you want to grab!