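This post introduces a hot search board I wrote in Python. It aggregates the trending lists from Baidu, Toutiao, Weibo, Zhihu and CSDN. The tool runs in a terminal such as cmder, PowerShell or Git Bash, which makes it a handy companion for a quick break at work.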
1. Running the tool
1.1 Project code
The code is hosted on Gitee. Repository: https://gitee.com/shawn_chen_rtz/hot_billboard.git (stars are welcome).
Code structure: app.py is the entry point. Run python app.py and follow the prompts.
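1.2 What it looks like
On startup the tool prints a menu. Enter a number to open the corresponding site's hot search list; enter q or Q to exit.
For example, entering 3 shows the Weibo hot search list.
Once a list has been printed, enter an item's number to get its link.
Entering 5 shows the CSDN hot list in the same way.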
1.3 The app.py entry point
The app.py program:
# -*- coding:utf-8 -*-
from baidu_hot import get_baidu_hot
from toutiao_hot import get_toutiao_hot
from weibo_hot import get_weibo_hot
from zhihu_hot import get_zhihu_hot
from csdn_hot import get_csdn_hot
import time

print("Welcome back! Enter a number to browse a hot search list")
on = True
while on:
    user_input = input("1-baidu;2-toutiao;3-weibo;4-zhihu;5-CSDN;q/Q-quit; enter: ")
    if user_input == '1':
        get_baidu_hot()
    elif user_input == '2':
        get_toutiao_hot()
    elif user_input == '3':
        get_weibo_hot()
    elif user_input == '4':
        get_zhihu_hot()
    elif user_input == '5':
        get_csdn_hot()
    elif user_input == 'q' or user_input == 'Q':
        on = False
    else:
        # Anything else is rejected; pause briefly, then show the menu again
        print("Invalid input; refreshing in 3s, please choose again")
        time.sleep(3)
print("Exited successfully. See you next time")
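The program is a single while loop: each iteration reads the user's input and branches on it to call the matching function. Any input other than 1 to 5, q or Q prints a warning, waits 3 seconds and shows the menu again.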
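The if/elif chain can also be written as a lookup table. The following is a minimal sketch of that alternative, not the code in the repository; the function name dispatch and the return values "ok", "quit" and "invalid" are made up for illustration, and the handlers in the demo are stand-ins for get_baidu_hot() and friends.

```python
def dispatch(user_input, handlers):
    """Run the handler registered for user_input.

    Returns "quit" for q/Q, "ok" if a handler ran, "invalid" otherwise.
    (Hypothetical helper; the original app.py uses an if/elif chain.)
    """
    choice = user_input.strip()
    if choice.lower() == "q":
        return "quit"
    handler = handlers.get(choice)
    if handler is None:
        return "invalid"
    handler()
    return "ok"


# Demo with stand-in handlers so the sketch runs without network access.
visited = []
demo_handlers = {
    "1": lambda: visited.append("baidu"),
    "3": lambda: visited.append("weibo"),
}
for text in ["3", "9", "q"]:
    print(text, "->", dispatch(text, demo_handlers))
```

With this shape, adding a sixth site means adding one entry to the dictionary rather than another elif branch.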
2. Baidu hot search
2.1 Modules used
The Baidu implementation imports requests, BeautifulSoup, re and time.
2.2 Endpoint
Baidu hot search endpoint:
https://top.baidu.com/board?tab=realtime
2.3 Implementation
The implementation:
import requests
from bs4 import BeautifulSoup
import re  # imported in the original; not used in this function
import time

def get_baidu_hot():
    while True:
        baidu_top = "https://top.baidu.com/board?tab=realtime"
        resp = requests.get(baidu_top)
        resp.encoding = 'utf-8'
        html = resp.text
        soup = BeautifulSoup(html, 'html.parser')
        news = soup.find_all(class_="content_1YWBm")
        # Reverse the page order so the first entry on the page is printed last,
        # just above the prompt
        news.reverse()
        i = 0
        news_ls = []
        for new in news:
            i = i + 1
            url = new.find('a').attrs['href']
            text = new.find(class_="c-single-text-ellipsis").text
            news_ls.append({"text": text.strip(), "url": url})
            # White item number centred in a row of asterisks, then the title in cyan
            print(('\033[1;37m' + str(i) + '\033[0m').center(50, "*"))
            print("\033[1;36m" + text.strip() + "\033[0m")
        # news_ls.reverse()
        user_input = input("Enter an item number to get its link, q/Q to quit, r/R to refresh the list: ")
        if user_input == 'q' or user_input == 'Q':
            break
        elif user_input == 'r' or user_input == 'R':
            continue
        elif user_input in [str(i) for i in range(1, len(news_ls) + 1)]:
            # Input is already known to be a valid number, so int() is enough
            news_index = int(user_input) - 1
            print(news_ls[news_index].get('url'))
            print("\033[1;33m" + "Hold Ctrl and click the link to open it" + "\033[0m")
            print('\033[5;31m' + 'Refreshing the list in 10s' + '\033[0m')
            time.sleep(10)
            continue
        else:
            print("Invalid User Input.")
            print('\033[5;31m' + "Refreshing the list in 3s" + '\033[0m')
            time.sleep(3)
            continue
    print("Over, exiting Baidu hot search!")
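Note that how BeautifulSoup is used depends on the page the endpoint actually returns. The class names content_1YWBm and c-single-text-ellipsis come from Baidu's markup at the time of writing, so the selectors have to be adjusted if the page changes.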
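Every fetch function in this post ends with the same prompt handling: quit, refresh, or a number within range. A sketch of how that check could be factored into one helper follows; parse_choice and its return tuples are hypothetical names, not part of the repository.

```python
def parse_choice(user_input, item_count):
    """Classify what the user typed at the list prompt.

    Returns one of ("quit", None), ("refresh", None),
    ("open", zero_based_index) or ("invalid", None).
    (Hypothetical helper for illustration only.)
    """
    text = user_input.strip()
    if text.lower() == "q":
        return ("quit", None)
    if text.lower() == "r":
        return ("refresh", None)
    # Accept ASCII digits only; note that "03" is accepted as 3 here,
    # whereas a strict string-membership test would reject it.
    if text.isascii() and text.isdigit() and 1 <= int(text) <= item_count:
        return ("open", int(text) - 1)
    return ("invalid", None)


for sample in ["3", "R", "q", "99", "abc"]:
    print(repr(sample), "->", parse_choice(sample, 10))
```

Each site's function could then branch on the first element of the tuple and keep only its own fetching and printing logic.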
3. Toutiao hot search
3.1 Modules used
The Toutiao implementation imports requests and time.
3.2 Endpoint
Toutiao hot search endpoint:
https://www.toutiao.com/hot-event/hot-board/?origin=toutiao_pc
3.3 Implementation
The implementation:
import requests
import time

def get_toutiao_hot():
    while True:
        url = "https://www.toutiao.com/hot-event/hot-board/?origin=toutiao_pc"
        resp = requests.get(url)
        resp.encoding = 'utf-8'
        resp = resp.json()
        news_ls = []
        i = 0
        news = resp.get('data')
        # Reverse so the first entry in the response is printed last
        news.reverse()
        for new in news:
            i += 1
            print(('\033[1;37m' + str(i) + '\033[0m').center(50, '*'))
            news_ls.append({'title': new.get('Title'), 'url': new.get('Url')})
            print('\033[1;36m' + new.get('Title') + '\033[0m')
        # The pinned entry is returned separately; append it as the last item
        fixed_top_data = resp.get('fixed_top_data')
        fixed_top_data = fixed_top_data[0]
        news_ls.append({'title': fixed_top_data.get('Title'), 'url': fixed_top_data.get('Url')})
        print(('\033[1;37m' + str(i + 1) + '\033[0m').center(50, '*'))
        print('\033[1;36m' + news_ls[-1].get('title') + '\033[0m')
        user_input = input("Enter an item number to get its link, q/Q to quit, r/R to refresh the list: ")
        if user_input == 'q' or user_input == 'Q':
            break
        elif user_input == 'r' or user_input == 'R':
            continue
        elif user_input in [str(i) for i in range(1, len(news_ls) + 1)]:
            news_index = int(user_input) - 1
            print(news_ls[news_index].get('url'))
            print("\033[1;33m" + "Hold Ctrl and click the link to open it" + "\033[0m")
            print('\033[5;31m' + 'Refreshing the list in 10s' + '\033[0m')
            time.sleep(10)
            continue
        else:
            print("Invalid User Input.")
            print('\033[5;31m' + "Refreshing the list in 3s" + '\033[0m')
            time.sleep(3)
            continue
    print("Over, exiting Toutiao hot search!")
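Unlike Baidu, this endpoint returns JSON rather than HTML source, so BeautifulSoup and re are not needed to locate page elements, and handling the response is comparatively simple.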
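To make the JSON handling concrete, here is a small sketch that works on a hand-written sample instead of a live response. The keys data, fixed_top_data, Title and Url are the ones used in the code above; the function name extract_toutiao_items, the sample values, and the tolerance for a missing fixed_top_data are assumptions added for illustration.

```python
import json


def extract_toutiao_items(payload):
    """Turn a parsed hot-board response into a list of {title, url} dicts.

    Regular entries are reversed and the pinned entry, if any, is appended
    last, mirroring the order used by get_toutiao_hot().
    """
    items = [
        {"title": entry.get("Title"), "url": entry.get("Url")}
        for entry in payload.get("data") or []
    ]
    items.reverse()
    pinned = payload.get("fixed_top_data") or []
    if pinned:
        items.append({"title": pinned[0].get("Title"), "url": pinned[0].get("Url")})
    return items


# Hand-written sample in the same shape as the response (values are made up).
sample = json.loads("""
{
  "data": [
    {"Title": "First topic", "Url": "https://example.com/1"},
    {"Title": "Second topic", "Url": "https://example.com/2"}
  ],
  "fixed_top_data": [
    {"Title": "Pinned topic", "Url": "https://example.com/pinned"}
  ]
}
""")
for item in extract_toutiao_items(sample):
    print(item["title"], item["url"])
```

Keeping the extraction separate from the network call also makes it possible to check the parsing logic offline.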
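4. Weibo hot search
4.1 Modules used
The Weibo implementation imports requests, time and BeautifulSoup.
4.2 Endpoint
Weibo hot search endpoint:
https://s.weibo.com/top/summary?cate=realtimehot
Note that this endpoint needs request headers carrying a Cookie; without one the request does not return the expected page.
The cookie value in this section's code is a placeholder. To get a real one, open https://s.weibo.com/top/summary?cate=realtimehot in a browser, press F12, find that request in the developer tools and copy its Cookie header.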
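Rather than pasting the cookie into the source file, one option is to read it from the environment. This is only a sketch: the helper name build_weibo_headers and the environment variable WEIBO_COOKIE are hypothetical and do not exist in the repository.

```python
import os


def build_weibo_headers(cookie=None):
    """Return request headers carrying the Weibo cookie.

    The cookie is taken from the argument if given, otherwise from the
    WEIBO_COOKIE environment variable (a name chosen for this sketch).
    """
    cookie = cookie or os.environ.get("WEIBO_COOKIE", "")
    if not cookie:
        raise ValueError("No cookie given and WEIBO_COOKIE is not set")
    return {"Cookie": cookie}


# Example with an obviously fake value; a real value comes from the browser.
print(build_weibo_headers("SUB=placeholder;"))
```

The returned dictionary can be passed as the headers argument of requests.get(), which is how the cookie is sent in the code in section 4.3.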
4.3 Implementation
The implementation:
import requests
import time
from bs4 import BeautifulSoup

def get_weibo_hot():
    while True:
        url = "https://s.weibo.com/top/summary?cate=realtimehot"
        # Placeholder cookie; replace it with the value copied from your browser
        headers = {"Cookie": "SUB=_2AxxxxxxxxxNxqwJxxx3dtWXlM5SjftExkMQK6NASTHqZWXWFEB;"}
        resp = requests.get(url=url, headers=headers)
        resp.encoding = 'utf-8'
        html = resp.text
        soup = BeautifulSoup(html, 'html.parser')
        news = soup.find_all(class_='td-02')
        news.reverse()
        base_url = "https://s.weibo.com"
        news_ls = []
        i = 0
        for new in news:
            i = i + 1
            # The href in the page is relative, so prefix it with the site root
            url = base_url + new.find('a').attrs['href']
            # print(url)
            title = new.find('a').text
            print(('\033[1;37m' + str(i) + '\033[0m').center(50, '*'))
            print('\033[1;36m' + title + '\033[0m')
            news_ls.append({"title": title, "url": url})
        news_length = len(news_ls)
        # news_ls.reverse()
        user_input = input("Enter an item number to get its link, q/Q to quit, r/R to refresh the list: ")
        if user_input == 'q' or user_input == 'Q':
            break
        elif user_input == 'r' or user_input == 'R':
            continue
        elif user_input in [str(i) for i in range(1, news_length + 1)]:
            news_index = int(user_input) - 1
            print(news_ls[news_index].get('url'))
            print("\033[1;33m" + "Hold Ctrl and click the link to open it" + "\033[0m")
            print('\033[5;31m' + 'Refreshing the list in 10s' + '\033[0m')
            time.sleep(10)
            continue
        else:
            print("Invalid User Input.")
            print('\033[5;31m' + "Refreshing the list in 3s" + '\033[0m')
            time.sleep(3)
            continue
    print("Over, exiting Weibo hot search!")
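As with Baidu, the response is HTML, so BeautifulSoup is used to locate the hot search entries in it. BeautifulSoup does much of the work when processing scraped pages and is worth learning well.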
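One detail specific to the Weibo page is that the links BeautifulSoup finds are relative, which is why the code prefixes them with https://s.weibo.com. A sketch of the same step using the standard library's urljoin instead of string concatenation is shown below; the helper name absolute_link and the sample href are made up for illustration.

```python
from urllib.parse import urljoin

BASE_URL = "https://s.weibo.com"


def absolute_link(href, base=BASE_URL):
    """Resolve an href from the page against the site root.

    Relative paths get the base prepended; hrefs that are already
    absolute URLs are returned unchanged.
    """
    return urljoin(base, href)


print(absolute_link("/weibo?q=example"))
print(absolute_link("https://example.com/page"))
```

Unlike plain concatenation, this does not produce a broken URL when an entry already carries a full address.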
5. Zhihu hot search
5.1 Modules used
The Zhihu implementation imports requests, time, BeautifulSoup and json.
5.2 Endpoint
Zhihu hot search endpoint:
https://www.zhihu.com/billboard
5.3 Implementation
The implementation:
import requests
import time
from bs4 import BeautifulSoup
import json

def get_zhihu_hot():
    while True:
        url = "https://www.zhihu.com/billboard"
        resp = requests.get(url)
        resp.encoding = 'utf-8'
        html = resp.text
        soup = BeautifulSoup(html, 'html.parser')
        # Titles come from the rendered list items
        news = soup.find_all(class_='HotList-itemTitle')
        # print(len(news))
        news_ls = []
        title_ls = []
        for new in news:
            title = new.text
            # print(title)
            title_ls.append(title)
        # Links come from the JSON state embedded in the page's script tag
        js_text_dict = json.loads(soup.find('script', {'id': 'js-initialData'}).get_text())
        # print(js_text_dict['initialState']['topstory']['hotList'])
        js_text_dict = js_text_dict['initialState']['topstory']['hotList']
        url_ls = []
        for new in js_text_dict:
            url = new['target']['link']['url']
            url_ls.append(url)
        # Pair titles and links by position
        news_ls = [{'title': title_ls[i], 'url': url_ls[i]} for i in range(len(title_ls))]
        news_ls.reverse()
        # print(news_ls)
        i = 0
        for new in news_ls:
            i += 1
            print(('\033[1;37m' + str(i) + '\033[0m').center(50, "*"))
            print('\033[1;36m' + new.get('title') + '\033[0m')
        news_length = len(news_ls)
        # news_ls.reverse()
        user_input = input("Enter an item number to get its link, q/Q to quit, r/R to refresh the list: ")
        if user_input == 'q' or user_input == 'Q':
            break
        elif user_input == 'r' or user_input == 'R':
            continue
        elif user_input in [str(i) for i in range(1, news_length + 1)]:
            news_index = int(user_input) - 1
            print(news_ls[news_index].get('url'))
            print("\033[1;33m" + "Hold Ctrl and click the link to open it" + "\033[0m")
            print('\033[5;31m' + 'Refreshing the list in 10s' + '\033[0m')
            time.sleep(10)
            continue
        else:
            print("Invalid User Input.")
            print('\033[5;31m' + "Refreshing the list in 3s" + '\033[0m')
            time.sleep(3)
            continue
    print("Over, exiting Zhihu hot search!")
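The function above reads the links from a JSON document embedded in the script tag with id js-initialData, following the path initialState, topstory, hotList. The sketch below isolates that step on a hand-written page. To stay within the standard library it locates the tag with a regular expression instead of BeautifulSoup, which is a substitution made for this example and not what the original code does; extract_hot_urls and the sample HTML are likewise made up.

```python
import json
import re

# Matches the contents of <script id="js-initialData" ...>...</script>
SCRIPT_RE = re.compile(
    r'<script[^>]*\bid="js-initialData"[^>]*>(.*?)</script>', re.DOTALL
)


def extract_hot_urls(html):
    """Return the link of every entry in the embedded hot list, in page order."""
    match = SCRIPT_RE.search(html)
    if match is None:
        return []
    state = json.loads(match.group(1))
    hot_list = state["initialState"]["topstory"]["hotList"]
    return [entry["target"]["link"]["url"] for entry in hot_list]


# Hand-written sample page in the same shape (values are made up).
sample_html = (
    '<html><body><script id="js-initialData" type="text/json">'
    '{"initialState": {"topstory": {"hotList": ['
    '{"target": {"link": {"url": "https://example.com/question/1"}}},'
    '{"target": {"link": {"url": "https://example.com/question/2"}}}'
    ']}}}'
    '</script></body></html>'
)
print(extract_hot_urls(sample_html))
```

Because titles and links are collected from two different places in the page, it is worth checking that both lists have the same length before pairing them by index.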
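6. CSDN hot search
6.1 Modules used
The CSDN implementation imports requests and time.
6.2 Endpoint
CSDN hot search endpoint:
https://blog.csdn.net/phoenix/web/blog/hot-rank?page=0&pageSize=50
https://blog.csdn.net/phoenix/web/blog/hot-rank?page=1&pageSize=50
Note: this endpoint returns a lot of data and is paginated with the page and pageSize parameters; set page to the page number you want, for example 0 or 1. It also requires request headers (a User-Agent), otherwise it does not return the correct data.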
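The two URLs above differ only in the page parameter, so they can be generated rather than written out. A minimal sketch using urllib.parse.urlencode follows; the function name page_urls and its default arguments are illustrative choices, while the base address and parameter names are the ones listed above.

```python
from urllib.parse import urlencode

BASE = "https://blog.csdn.net/phoenix/web/blog/hot-rank"


def page_urls(pages=2, page_size=50):
    """Build one hot-rank URL per page, starting at page 0."""
    return [
        f"{BASE}?{urlencode({'page': page, 'pageSize': page_size})}"
        for page in range(pages)
    ]


for url in page_urls():
    print(url)
```

Fetching more entries then only requires a larger pages value, assuming the endpoint serves further pages in the same way.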
6.3 Implementation
The implementation:
import requests
import time

def get_csdn_hot():
    while True:
        news_ls = []
        # Fetch two pages of 50 entries each
        for i in range(2):
            url = "https://blog.csdn.net/phoenix/web/blog/hot-rank?page=" + str(i) + "&pageSize=50"
            # print(url)
            # CSDN validates the request: the User-Agent header must be set to get content back
            headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36"}
            resp = requests.get(url, headers=headers)
            resp = resp.json()
            news = resp['data']
            for new in news:
                news_ls.append({"title": new.get('articleTitle'), "url": new.get('articleDetailUrl')})
        i = 0
        news_ls.reverse()
        for new in news_ls:
            i += 1
            print(("\033[1;37m" + str(i) + "\033[0m").center(50, "*"))
            print("\033[1;36m" + new.get('title') + "\033[0m")
        news_length = len(news_ls)
        # news_ls.reverse()
        user_input = input("Enter an item number to get its link, q/Q to quit, r/R to refresh the list: ")
        if user_input == 'q' or user_input == 'Q':
            break
        elif user_input == 'r' or user_input == 'R':
            continue
        elif user_input in [str(i) for i in range(1, news_length + 1)]:
            news_index = int(user_input) - 1
            print(news_ls[news_index].get('url'))
            print("\033[1;33m" + "Hold Ctrl and click the link to open it" + "\033[0m")
            print('\033[5;31m' + 'Refreshing the list in 10s' + '\033[0m')
            time.sleep(10)
            continue
        else:
            print("Invalid User Input.")
            print('\033[5;31m' + "Refreshing the list in 3s" + '\033[0m')
            time.sleep(3)
            continue
    print("Over, exiting CSDN hot search!")
Follow the author's WeChat official account for more content like this.