Hi everyone, 魔王 here ❤ ~!
Environment:
- Python 3.10
- PyCharm
Modules:
- requests >>> pip install requests
- csv (part of the standard library, nothing to install)
Data visualization:
- pandas >>> pip install pandas
- pyecharts >>> pip install pyecharts
Basic crawler workflow
Which data do you want -> where does that data live?
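1. Data source analysis
- Define the requirements
  Decide which website to collect from and which data to extract
  URL: https://changsha.yiche.taocheche.com/buycar/pges9bxcdzaoqtrnml/
  Data: vehicle information: model, price, mileage, city…
- Packet capture analysis
  Use the browser's developer tools to work out which URL has to be requested to get the data we want
  - Open the developer tools
    F12 / right-click, choose Inspect, then open the Network tab
  - Refresh the page
    This reloads the page's data (only then do the corresponding packets show up in the developer tools)
  - Search by keyword to locate the packet that carries the data
  - Packet URL: https://proconsumer.taocheche.com/c-car-consumer/carsource/getUcarLocalList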
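The keyword search done in the Network panel can also be repeated on a response you have saved. The sketch below is a hypothetical helper (`find_paths` is not part of any library) that walks a parsed JSON object and reports where a keyword shows up. The sample dict is made up; it only mimics the nesting that this article reads later on.

```python
def find_paths(obj, keyword, path=''):
    """Return the path of every key or string value that contains keyword."""
    hits = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            child = f"{path}['{key}']"
            if keyword in str(key):
                hits.append(child)
            hits.extend(find_paths(value, keyword, child))
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            hits.extend(find_paths(value, keyword, f"{path}[{i}]"))
    elif isinstance(obj, str) and keyword in obj:
        hits.append(path)
    return hits


# Made-up sample that imitates the shape of the real response
sample = {
    'data': {
        'uCarBasicInfoList': {
            'dataList': [{'mainBrandName': '奥迪', 'purchaseCityName': '长沙'}]
        }
    }
}

for hit in find_paths(sample, '奥迪'):
    print('json_data' + hit)
```

Searching for a value you can see on the page (a brand or a city name) tells you which chain of keys leads to it.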
2. Implementation steps
- Import the modules
# Import the HTTP request module
import requests
# Import the pretty-printing module
from pprint import pprint
# Import the csv module
import csv
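- Save the data
  Save to a spreadsheet file: csv / Excel
# Open the output file; newline='' keeps the csv module from writing blank lines between rows on Windows
f = open('data.csv', mode='w', encoding='utf-8', newline='')
# Column names: title, brand, model, year, mileage, city, price, down payment
csv_writer = csv.DictWriter(f, fieldnames=['标题', '品牌', '车型', '年份', '里程', '城市', '售价', '首付'])
csv_writer.writeheader()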
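As a small variation on the file setup above, here is a sketch that wraps the header and row writing in a function. `write_rows` and the sample row are illustrative names and made-up data, not part of the original script. An in-memory buffer stands in for the file so the example runs without touching the disk.

```python
import csv
import io

# Same column names as in the article:
# title, brand, model, year, mileage, city, price, down payment
FIELDNAMES = ['标题', '品牌', '车型', '年份', '里程', '城市', '售价', '首付']


def write_rows(fileobj, rows):
    """Write a header line, then one CSV line per dict in rows."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


# Made-up row, for illustration only
sample_row = {'标题': 'demo', '品牌': '奥迪', '车型': 'A4L', '年份': 2020,
              '里程': '3.5', '城市': '长沙', '售价': '18.8', '首付': '5.6'}

buffer = io.StringIO(newline='')
write_rows(buffer, [sample_row])
lines = buffer.getvalue().splitlines()
print(lines)
```

With a real file, opening it in a `with open('data.csv', mode='w', encoding='utf-8', newline='') as f:` block and calling `write_rows(f, rows)` inside it closes the file automatically, even if the scraping code raises an error halfway through.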
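- Send the request
  Simulate a browser and send a request to the URL
  - Simulate a browser: a simple way around basic anti-scraping checks (a dict of headers)
    The value can be copied straight from the developer tools -> click the packet -> Headers -> Request Headers -> User-Agent (UA)
  - Request URL
  - Send the request
    Request method: GET / POST
    GET: fetches data from the server
    POST: submits parameters to the server in the request body (here, a JSON payload)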
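To make the GET / POST difference concrete, here is a sketch that only builds the two kinds of request objects and never sends them. It uses the standard library's `urllib.request.Request` rather than requests so that it runs offline. The query string on the GET request is hypothetical and only shows the shape of such a request; the POST body contains just two of the payload keys.

```python
import json
from urllib.request import Request

url = 'https://proconsumer.taocheche.com/c-car-consumer/carsource/getUcarLocalList'

# GET: parameters, if any, travel in the URL itself.
# The query string below is hypothetical and purely illustrative.
get_req = Request(url + '?pageIndex=1', method='GET')

# POST: parameters travel in the request body, here serialized as JSON.
body = json.dumps({'pageIndex': 1, 'pageSize': 20}).encode('utf-8')
post_req = Request(
    url,
    data=body,
    headers={'Content-Type': 'application/json'},
    method='POST',
)

# Nothing has been sent; we only inspect what would go over the wire.
print(get_req.get_method(), get_req.data)
print(post_req.get_method(), post_req.data)
```

Passing `json=...` to `requests.post` does the same serialization for you and sets the JSON content type.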
# Simulate a browser
headers = {
    # User-Agent: tells the server which browser is making the request
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36'
}
# Request URL
url = 'https://proconsumer.taocheche.com/c-car-consumer/carsource/getUcarLocalList'
for page in range(1, 101):
    print(f'============ Collecting page {page} ============')
    # Request parameters (copied from the packet's payload; pageIndex selects the page)
    data = {
        "liveSwitch": 1, "terminal": 40, "aggreCarSeries": 0, "aggreCarbrands": 0,
        "bangMai": "false", "bangMaiChe": "false", "baseScore": 0, "bigArea": 0,
        "brandId": 0, "brandPro": 0, "canNonLocal": 2, "carAgeId": 0,
        "carBasicId": 0, "carLevel": 0, "carType": 0, "cityId": 1301,
        "color": 0, "commonFlag": 4, "country": 0, "curCity": 0,
        "customizeSortFlag": 0, "days": 0, "directSaleCar": 0, "distanceKm": 0,
        "districtId": 0, "drivingMileageId": 0, "exhaust": 0,
        "financialPriceHigh": 0, "financialPriceLower": 0, "firstPic": 0,
        "gearBoxType": 0, "highAge": 0, "highDrivingMileage": 0, "highPrice": 0,
        "isAuthenticated": 0, "isCarId": 0, "isCheckReportJson": 0,
        "isDealerAuthorized": 0, "isDealerRecommend": 0, "isExcludeYDG": 0,
        "isJDActivity": 0, "isLicensePhoto": 0, "isLicensed": 0, "isNeglect": 0,
        "isNewCar": 0, "isShowMr": 0, "isShowRecom": 0, "isVideo": 0,
        "isWarranty": 0, "level": 0, "licenseCityId": 0, "liveBroadcast": 0,
        "loanFirstPayHigh": 0, "loanFirstPayLower": 0, "loanMonthPayHigh": 0,
        "loanMonthPayLower": 0, "loanUserid": 0, "lowAge": 0,
        "lowDrivingMileage": 0, "lowPrice": 0, "mainBrandId": 0,
        "newCarHighPrice": 0, "newCarLowPrice": 0, "noAudit": "false",
        "notCity": 0, "notUcarID": 0, "orderDirection": 0, "pageIndex": page,
        "pageSize": 20, "picCount": 0, "price": 0, "provinceId": 0,
        "publishTimeStatus": 0, "purchaseCityId": 0, "regions": "false",
        "requestReferer": 0, "requestSource": 0, "returnCaryears": "false",
        "score": 0, "scorePerformance": 0, "seatNumHigh": 0, "seatNumLower": 0,
        "seriesId": 0, "showPosition": 0, "siteIds": "5", "sortBoostFlag": 0,
        "sourceType": 0, "splitFlowAlgorithm": "", "startNum": 0,
        "supperiorId": 0, "uCarID": 0, "uCarStatus": "1",
        "useBlackUserList": "false", "userID": 0, "userType": 1001,
        "warrantyType": 0,
    }
    # Send the request
    response = requests.post(url=url, json=data, headers=headers)
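The loop above always asks for 100 pages and sends them back to back. Below is a sketch of a gentler paging loop. It rests on an assumption this article does not verify: that a page past the end of the results comes back with an empty list. `iter_pages` and `fetch` are hypothetical names; `fetch` stands for any function that takes a page number and returns that page's list of records, so the example can run with a fake one and no network.

```python
import time
from typing import Callable, Iterator


def iter_pages(fetch: Callable[[int], list],
               max_pages: int = 100,
               delay: float = 0.0) -> Iterator[list]:
    """Yield one list of records per page, stopping at the first empty page."""
    for page in range(1, max_pages + 1):
        rows = fetch(page)
        if not rows:
            # Assumption: an empty page means there is nothing left to collect
            break
        yield rows
        # Pause between requests so the server is not flooded
        time.sleep(delay)


# Fake fetch function with made-up data: two pages, then nothing
fake_pages = {1: ['car-a', 'car-b'], 2: ['car-c']}


def fake_fetch(page: int) -> list:
    return fake_pages.get(page, [])


collected = list(iter_pages(fake_fetch))
print(collected)
```

In the real script, `fetch` would wrap the `requests.post` call and return the extracted list, and a `delay` of a second or so is a reasonable starting point.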
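- Get the data
  Read the response data returned by the server
  - response.text
    The response body as text (a string)
  - response.json()
    The response body parsed as JSON (a dict); the body must be complete, valid JSON
  - response.content
    The response body as raw bytes; typically used when saving files (images / audio / video / other binary formats…)
    # Still inside the for loop: parse the JSON response into a dict
    json_data = response.json()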
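The three views of a response listed above are related in a simple way, which the offline sketch below imitates with the standard library: bytes play the role of `response.content`, the decoded string plays the role of `response.text`, and `json.loads` plays the role of `response.json()`. The body is a made-up sample and `safe_json` is a hypothetical helper, not a requests function.

```python
import json

# Stand-in for response.content: the raw bytes of a (made-up) response body
content = '{"data": {"uCarBasicInfoList": {"dataList": []}}}'.encode('utf-8')

# Stand-in for response.text: the same bytes decoded into a string
text = content.decode('utf-8')

# Stand-in for response.json(): the string parsed into Python objects
parsed = json.loads(text)
print(type(content).__name__, type(text).__name__, type(parsed).__name__)


def safe_json(body: str):
    """Return the parsed JSON, or None when body is not valid JSON."""
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return None


# An HTML error page is a typical example of a body that is not JSON
print(safe_json('<html>blocked</html>'))
```

If the server answers with an error page instead of JSON, parsing fails, so checking for that case before indexing into the result saves a confusing traceback.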
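- Parse the data
  The previous step (getting the response data) returned a dict
  Extract the data by looking up values by key
    # Get the list of vehicle records, dataList
    dataList = json_data['data']['uCarBasicInfoList']['dataList']
    # Loop over the list and handle each element
    for index in dataList:
        # Collect the fields of one vehicle in a dict
        # dit is just a variable name of our own choosing
        dit = {
            '标题': index['showShortTitle'],
            '品牌': index['mainBrandName'],
            # Strip the brand name from the series name to leave only the model
            '车型': index['serialName'].replace(index['mainBrandName'], ''),
            '年份': index['buyCarYear'],
            # Strip the unit (万公里, ten thousand km) so only the number is left
            '里程': index['drivingMileageText'].replace('万公里', ''),
            '城市': index['purchaseCityName'],
            '售价': index['activityPrice'],
            # Strip the unit (万, ten thousand yuan) so only the number is left
            '首付': index['loanFirstPayText'].replace('万', ''),
        }
        csv_writer.writerow(dit)
        print(dit)
# After the page loop has finished, close the file so every row is flushed to disk
f.close()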
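Indexing with `index['...']` raises a `KeyError` as soon as one record lacks a field. The sketch below shows a more forgiving variant of the same extraction. `to_row` and `_text` are hypothetical helper names, the sample record is made up, and whether real records ever miss fields is an assumption, not something observed in this article.

```python
def _text(item, key):
    """Return item[key] as a string, or '' when the key is missing or null."""
    value = item.get(key)
    return '' if value is None else str(value)


def to_row(item):
    """Turn one raw vehicle record into the flat dict written to the CSV."""
    brand = _text(item, 'mainBrandName')
    return {
        '标题': _text(item, 'showShortTitle'),
        '品牌': brand,
        # Drop the brand prefix from the series name, leaving the model
        '车型': _text(item, 'serialName').replace(brand, ''),
        '年份': item.get('buyCarYear', ''),
        # Drop the units so only the numbers remain
        '里程': _text(item, 'drivingMileageText').replace('万公里', ''),
        '城市': _text(item, 'purchaseCityName'),
        '售价': item.get('activityPrice', ''),
        '首付': _text(item, 'loanFirstPayText').replace('万', ''),
    }


# Made-up record that uses the field names from the article
sample_item = {
    'showShortTitle': '奥迪A4L 2020款',
    'mainBrandName': '奥迪',
    'serialName': '奥迪A4L',
    'buyCarYear': 2020,
    'drivingMileageText': '3.5万公里',
    'purchaseCityName': '长沙',
    'activityPrice': '18.8',
    'loanFirstPayText': '5.6万',
}
print(to_row(sample_item))
print(to_row({'mainBrandName': None}))
```

A record with missing fields then produces empty cells in the CSV instead of stopping the whole run.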
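Data visualization
The code in this part is usually written in Jupyter
- jupyter notebook
  If you installed Anaconda, Jupyter is already included
  If you installed plain Python, install Jupyter from the command line (cmd):
  pip install jupyter notebook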
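import pandas as pd
from pyecharts.globals import CurrentConfig, NotebookType
# Tell pyecharts which front end renders the charts: JupyterLab here
# (for the classic Jupyter Notebook, use NotebookType.JUPYTER_NOTEBOOK)
CurrentConfig.NOTEBOOK_TYPE = NotebookType.JUPYTER_LAB
# Load the scraped data
df = pd.read_csv('data.csv')
# Preview the first five rows
df.head()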
- Official pyecharts gallery and documentation: https://gallery.pyecharts.org/#/README
# Import the chart options
from pyecharts import options as opts
# Import the pie chart
from pyecharts.charts import Pie
# Brand names and the number of listings per brand, most common first
info = df['品牌'].value_counts().index.to_list()
num = df['品牌'].value_counts().to_list()
# Chart configuration
c = (
    Pie()
    .add(
        "",
        [
            list(z)
            for z in zip(
                info,  # labels
                num,   # counts
            )
        ],
        center=["40%", "50%"],
    )
    .set_global_opts(
        # Title
        title_opts=opts.TitleOpts(title="Used-car listings by brand"),
        legend_opts=opts.LegendOpts(type_="scroll", pos_left="80%", orient="vertical"),
    )
    .set_series_opts(label_opts=opts.LabelOpts(formatter="{b}: {c}"))
    # To save the chart as an HTML file instead, uncomment:
    # .render("pie_scroll_legend.html")
)
# Display the chart in Jupyter
c.load_javascript()
c.render_notebook()
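If it is unclear what `info` and `num` contain, the following sketch reproduces the same idea with the standard library only. `top_counts` is a hypothetical helper and the brand list is made up. `collections.Counter.most_common` plays the role that `value_counts()` plays in the pandas code: it counts each value and sorts by frequency, highest first.

```python
from collections import Counter


def top_counts(values, n=None):
    """Return (labels, counts) sorted by count, optionally only the top n."""
    pairs = Counter(values).most_common(n)
    labels = [label for label, _ in pairs]
    counts = [count for _, count in pairs]
    return labels, counts


# Made-up column of brand names
brands = ['奥迪', '宝马', '奥迪', '大众', '奥迪', '宝马']

info, num = top_counts(brands)
# Pair labels with counts, the shape the pie chart's add() call receives
data_pairs = [list(z) for z in zip(info, num)]
print(data_pairs)

# Keeping only the two most common entries, like slicing with [:2]
print(top_counts(brands, n=2))
```

Each inner list is one slice of the pie: a label and its count.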
# Import the chart options
from pyecharts import options as opts
# Import the pie chart
from pyecharts.charts import Pie
# The 10 cities with the most listings and their counts
info = df['城市'].value_counts().index.to_list()[:10]
num = df['城市'].value_counts().to_list()[:10]
# Chart configuration
c = (
    Pie()
    .add(
        "",
        [
            list(z)
            for z in zip(
                info,  # labels
                num,   # counts
            )
        ],
        center=["40%", "50%"],
    )
    .set_global_opts(
        # Title
        title_opts=opts.TitleOpts(title="Used-car listings: top 10 cities"),
        legend_opts=opts.LegendOpts(type_="scroll", pos_left="80%", orient="vertical"),
    )
    .set_series_opts(label_opts=opts.LabelOpts(formatter="{b}: {c}"))
    # To save the chart as an HTML file instead, uncomment:
    # .render("pie_scroll_legend.html")
)
c.render_notebook()
# Import the chart options
from pyecharts import options as opts
# Import the pie chart
from pyecharts.charts import Pie
# The 10 most common models and their counts
info = df['车型'].value_counts().index.to_list()[:10]
num = df['车型'].value_counts().to_list()[:10]
# Chart configuration
c = (
    Pie()
    .add(
        "",
        [
            list(z)
            for z in zip(
                info,  # labels
                num,   # counts
            )
        ],
        center=["40%", "50%"],
    )
    .set_global_opts(
        # Title
        title_opts=opts.TitleOpts(title="Used-car listings: top 10 models"),
        legend_opts=opts.LegendOpts(type_="scroll", pos_left="80%", orient="vertical"),
    )
    .set_series_opts(label_opts=opts.LabelOpts(formatter="{b}: {c}"))
    # To save the chart as an HTML file instead, uncomment:
    # .render("pie_scroll_legend.html")
)
c.render_notebook()
import pyecharts.options as opts
from pyecharts.charts import Line
# The 10 most common models and their counts
info = df['车型'].value_counts().index.to_list()[:10]
num = df['车型'].value_counts().to_list()[:10]
c = (
    Line()
    .add_xaxis(info)
    .add_yaxis("Model", num, is_connect_nones=True)
    .set_global_opts(title_opts=opts.TitleOpts(title="Used-car listings: top 10 models (line chart)"))
    # To save the chart as an HTML file instead, uncomment:
    # .render("line_connect_null.html")
)
c.render_notebook()
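Closing words
Thanks for reading~ this flight ends here 🛬
I hope this article was helpful 🎉 and that you picked up something new~
Even the stars in hiding 🍥 are working hard to shine, so keep going too (let's keep at it together).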