[Python Web Scraping in Practice] Scraping Book Titles and Prices from a Bookstore Website (with Detailed Comments)

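Source of the idea: the Bilibili video "[Python + Web Scraping] Two months of all-nighters! Please like, coin, and favorite! This is absolutely the most carefully made (bar none) open Python + web scraping course on all of Bilibili, from getting started to (not) going to jail!" https://b23.tv/M79rxMd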

See episode 9 of the video above for details.

Target website: All products | Books to Scrape - Sandbox (https://books.toscrape.com/)

The scraper code is below. Nearly every statement is commented, based on my own understanding.

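#!/usr/bin/env python
from bs4 import BeautifulSoup
import requests

if __name__ == '__main__':
    # Fetch the page and parse it
    content = requests.get("https://books.toscrape.com/").text  # send a GET request; .text returns the body as a string
    soup = BeautifulSoup(content, "html.parser")  # parse with Python's built-in "html.parser"

    # Get the prices: every price sits in a <p class="price_color">
    all_prices = soup.find_all("p", attrs={"class": "price_color"})

    # Looking at the HTML, the titles have no distinguishing attribute of their own,
    # so a single query cannot pin them down the way it does for the prices.
    # Instead, narrow the search step by step: find_all for the <h3> tags first,
    # then a for loop with find inside each of them.
    all_titles = soup.find_all("h3")  # find_all returns an iterable list of tags, so a for loop can walk it

    # To keep the columns (the prices in particular) aligned across rows,
    # find the longest title and pad every title to that width when printing.
    max_title_len = max(len(title.string) for title in all_titles)

    # Print one row per book
    for title, price in zip(all_titles, all_prices):
        # Note: a for loop walks one iterable at a time; zip() combines several into one
        link = title.find("a")  # each <h3> contains exactly one <a> tag, so find is enough
        # [2:] drops the two leading currency-symbol characters so only the number is printed
        print(link.string.ljust(max_title_len + 1), '\t', price.string[2:])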
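
The alignment step can be tried offline, without sending any request. The following is only a minimal sketch: the sample titles and prices are made-up stand-ins for the scraped strings, and `extract_amount` and `format_rows` are hypothetical helper names that do not appear in the script above. It also swaps the fixed `[2:]` slice for a regular expression, on the assumption that the number of characters in front of the amount can vary with how the response text was decoded.

```python
import re

# Hypothetical sample data standing in for the scraped link texts and price strings.
titles = ["A Light in the ...", "Tipping the Velvet", "Soumission"]
prices = ["£51.77", "Â£53.74", "£50.10"]


def extract_amount(price_text):
    """Return the numeric part of a price string, ignoring any currency prefix."""
    match = re.search(r"\d+(?:\.\d+)?", price_text)
    return match.group(0) if match else ""


def format_rows(titles, prices):
    """Pair each title with its price and pad titles to the longest title's width."""
    width = max(len(t) for t in titles)
    return [
        f"{t.ljust(width + 1)}\t{extract_amount(p)}"
        for t, p in zip(titles, prices)
    ]


rows = format_rows(titles, prices)
for row in rows:
    print(row)
```

Because every title is padded to the same width before the tab, the prices start in the same column on every row. Note that `zip()` stops at the shorter of the two lists, so a missing price would silently drop the last titles rather than raise an error.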

Run in VS Code, the script prints one line per book: the title padded to a common width, followed by the price.