What Is a Web Crawler? A Complete Guide from Technical Principles to Real-World Applications, Part V
21. Cloud-Native Crawler Architecture Design
21.1 Serverless Crawler (AWS Lambda)
# lambda_function.py
import boto3
import requests
from bs4 import BeautifulSoup

s3 = boto3.client('s3')

def lambda_handler(event, context):
    # Fetch the target page
    headers = {'User-Agent': 'AWS-Lambda-Crawler/1.0'}
    response = requests.get('https://news.example.com/latest', headers=headers)
    # Parse the content
    soup = BeautifulSoup(response.text, 'html.parser')
    articles = []
    for item in soup.select('.news-item'):
        articles.append({
            'title': item.select_one('h2').