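I. The concept and purpose of regular expressions
Concept: a regular expression is a pattern used to match strings.
Purpose:
- Check whether a string contains a substring of a given form
- Replace the matching substrings
- Extract the matching substrings from a string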
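The three uses listed above map onto functions in Python's standard `re` module. The following is only a minimal sketch; the sample string and the variable names are made up for illustration.

```python
import re

text = "order 66 shipped, order 7 pending"  # hypothetical sample string

# 1. Check whether the string contains a substring matching the pattern
has_number = re.search(r"\d+", text) is not None

# 2. Replace every matching substring
masked = re.sub(r"\d+", "#", text)

# 3. Extract every matching substring
numbers = re.findall(r"\d+", text)

print(has_number)  # True
print(masked)      # order # shipped, order # pending
print(numbers)     # ['66', '7']
```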
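II. Common regular expression syntax
Character | Description | Example pattern | Matches |
---|---|---|---|
Ordinary character | Matches itself | beyond | beyond |
. | Matches any character except the newline \n | a.c | aac, abc or acc |
\ | Escape character: changes the meaning of the character that follows. To match a literal * in a string, use \* or the character set [*] | a\.c, a\\c | a.c, a\c |
[...] | Character set (character class). The corresponding position may hold any one character from the set. Characters can be listed one by one or given as a range, e.g. [abc] or [a-c]. If the first character is ^, the set is negated, e.g. [^abc] matches any character other than a, b or c. Most special characters lose their special meaning inside a set. To use ], - or ^ literally inside a set, put a backslash in front of it, or place ] or - first and ^ anywhere but first | a[bcd]e | abe, ace, ade |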
# Import the module
import re

# Character matching
rs1 = re.findall('abc', 'beyondabcbeyondyanyu')
print(rs1)  # Result: ['abc']
rs2 = re.findall('a.c', '111a*c')
print(rs2)  # Result: ['a*c']
rs3 = re.findall('a.c', '111abc')
print(rs3)  # Result: ['abc']
rs4 = re.findall('a.c', '222a.c')
print(rs4)  # Result: ['a.c']
rs5 = re.findall('a.c', 'a\nc')
print(rs5)  # Result: [] because . does not match the newline
# r'...' is a raw string (see section IV); it passes the backslash to the regex engine unchanged
rs6 = re.findall(r'a\.c', '111a.c')
print(rs6)  # Result: ['a.c']
rs7 = re.findall(r'a\.c', '111abc')
print(rs7)  # Result: []
rs8 = re.findall('a[bd]c', 'abcqq')
print(rs8)  # Result: ['abc']
rs9 = re.findall('a[bd]c', 'adcq')
print(rs9)  # Result: ['adc']
rs0 = re.findall('a[bd]c', 'afc')
print(rs0)  # Result: []
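The rules for special characters inside a character set can be checked with a few short calls. This is a sketch with made-up sample strings and variable names, not part of the original examples.

```python
import re

# Inside [...], * is an ordinary character
star = re.findall(r'a[*]c', 'a*c abc')
print(star)      # ['a*c']

# A leading ^ negates the set: any character except a, b or c
negated = re.findall(r'x[^abc]', 'xa xd x1')
print(negated)   # ['xd', 'x1']

# ], - and ^ can be escaped with a backslash to be used literally
escaped = re.findall(r'[\]\-\^]', 'a]b-c^d')
print(escaped)   # [']', '-', '^']

# A - between two characters denotes a range
ranged = re.findall(r'[a-c]', 'a-d')
print(ranged)    # ['a']
```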
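Predefined character classes (can also be used inside a character set [...])
Each predefined character class matches exactly one character. Several predefined classes can be placed inside the same brackets, in which case a character that satisfies any one of them matches. The bracketed equivalents in the table are the ASCII forms; for str patterns Python 3 also matches the corresponding Unicode characters unless the re.ASCII flag is set.
Character | Description | Example pattern | Matches |
---|---|---|---|
\d | Digit: [0-9] | a\dc | a5c, a2c or a9c |
\D | Non-digit: [^\d] | a\Dc | abc, ayc or a*c |
\s | Whitespace: [<space>\t\r\n\f\v] | a\sc | a c |
\S | Non-whitespace: [^\s] | a\Sc | abc, a8c or a*c |
\w | Word character: [A-Za-z0-9_] | a\wc | abc, aDc or a8c |
\W | Non-word character: [^\w] | a\Wc | a c, a&c or a)c |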
# Import the module
import re

# Character matching (r'...' is a raw string, see section IV)
rs1 = re.findall(r'\d', 'q1s74d')  # \d is a digit: [0-9]
print(rs1)  # Result: ['1', '7', '4']
rs2 = re.findall(r'\w', 'q!#1s7_4d$')  # \w is a word character: [A-Za-z0-9_]
print(rs2)  # Result: ['q', '1', 's', '7', '_', '4', 'd']
rs3 = re.findall(r'[\s\d]', 'q 1s7_ 4d $')  # \s is whitespace, \d is a digit: [0-9]
print(rs3)  # Result: [' ', '1', '7', ' ', '4', ' ']
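The negated classes \D, \S and \W from the table are not exercised above. A small sketch on a made-up sample string shows how each one complements its lowercase counterpart.

```python
import re

sample = 'a1 b_2!'  # hypothetical sample string

non_digits = re.findall(r'\D', sample)  # everything that is not a digit
non_spaces = re.findall(r'\S', sample)  # everything that is not whitespace
non_words = re.findall(r'\W', sample)   # everything that is not a word character

print(non_digits)  # ['a', ' ', 'b', '_', '!']
print(non_spaces)  # ['a', '1', 'b', '_', '2', '!']
print(non_words)   # [' ', '!']
```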
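Quantifiers (placed after a character or a (...) group)
Character | Description | Example pattern | Matches |
---|---|---|---|
* | Matches the preceding character zero or more times | abc* | ab, abccccc |
+ | Matches the preceding character one or more times | abc+ | abc, abcccccc |
? | Matches the preceding character zero times or once | abc? | ab, abc |
{m} | Matches the preceding character exactly m times | ab{2}c | abbc |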
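# Import the module
import re

# Character matching (r'...' is a raw string, see section IV)
# \d is a digit [0-9]; * lets \d occur zero or more times, so y may be followed by any number of digits
rs1 = re.findall(r'y\d*', 'ay0522y12y24yy')
print(rs1)  # Result: ['y0522', 'y12', 'y24', 'y', 'y']; the two trailing y's appear because * allows zero occurrences
# + lets \d occur one or more times, so y must be followed by at least one digit
rs2 = re.findall(r'y\d+', 'bey12ond0522yan123yu')
print(rs2)  # Result: ['y12']
# ? lets \d occur zero times or once, so y may be followed by at most one digit
rs3 = re.findall(r'y\d?', 'ty12ond0522y3an123yu')
print(rs3)  # Result: ['y1', 'y3', 'y']
# {m} makes \d occur exactly m times, so here y must be followed by three digits
rs4 = re.findall(r'y\d{3}', 'ty1332ond0522y3any123yu')
print(rs4)  # Result: ['y133', 'y123']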
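The heading notes that a quantifier can also follow a (...) group, which the examples above do not show. The sketch below uses `re.search(...).group()` rather than `findall` so that the whole matched text is printed; the sample strings are made up.

```python
import re

# + applies only to the single character c
m1 = re.search(r'abc+', 'abcccabc')
print(m1.group())  # abccc

# + applies to the whole group (abc)
m2 = re.search(r'(abc)+', 'abcabcab')
print(m2.group())  # abcabc

# {2} after a group: the group must occur exactly twice in a row
m3 = re.search(r'(ab){2}', 'ababab')
print(m3.group())  # abab
```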
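III. The re.findall() method
re.findall(pattern, string, flags=0)
Scans the whole string and returns a list of all non-overlapping matches of pattern.
Parameters:
- pattern: the regular expression
- string: the string to search
- flags: the matching mode
Returns:
- a list of the substrings of string that match pattern
For example:
- re.findall(r"\d", "b1e1y2o3n4d")
- returns ['1', '1', '2', '3', '4']
1. findall returns a list of matches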
# Import the module
import re

rs1 = re.findall(r'\d', 'b12e3y4o56n777d')  # \d: a single digit [0-9]
print(rs1)  # Result: ['1', '2', '3', '4', '5', '6', '7', '7', '7']
rs2 = re.findall(r'\d+', 'b12e3y4o56n777d')  # \d+: \d may occur one or more times, i.e. a run of digits
print(rs2)  # Result: ['12', '3', '4', '56', '777']
2. The flags parameter of findall
re.DOTALL and its short alias re.S make . match every character, including the newline \n.
# Import the module
import re

rs1 = re.findall('a.bc', 'a\nbc')
print(rs1)  # Result: []
rs2 = re.findall('a.bc', 'a\nbc', re.DOTALL)
print(rs2)  # Result: ['a\nbc']
rs3 = re.findall('a.bc', 'a\nbc', re.S)
print(rs3)  # Result: ['a\nbc']
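The same flag matters whenever the text to extract spans several lines. A minimal sketch on a made-up two-line snippet, passing the flag by keyword:

```python
import re

html = '<p>line one\nline two</p>'  # hypothetical sample string

# Without the flag, . stops at the newline and nothing matches
without_flag = re.findall(r'<p>.+</p>', html)

# With re.DOTALL, . also matches the newline
with_flag = re.findall(r'<p>.+</p>', html, flags=re.DOTALL)

print(without_flag)       # []
print(with_flag)          # ['<p>line one\nline two</p>']
print(re.S == re.DOTALL)  # True: re.S is just the short name
```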
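3. Using groups in findall
A group is written with parentheses.
In the pattern a(.+)bc, the leading a and the trailing bc locate the match, and findall then returns only the text matched by (.+).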
# Import the module
import re

rs1 = re.findall('a.+bc', 'yya\nbc', re.DOTALL)
print(rs1)  # Result: ['a\nbc']
rs2 = re.findall('a(.+)bc', 'yya\nbc', re.DOTALL)
print(rs2)  # Result: ['\n']
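If the regular expression contains no (), findall returns a list of the substrings matched by the whole expression.
If the regular expression contains (), findall returns a list of the text matched inside the (); everything outside the parentheses only serves to locate the data to extract.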
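When a pattern contains more than one group, `findall` returns a list of tuples with one entry per group. A short sketch with a made-up sample string compares the three cases:

```python
import re

text = 'x=1, y=22'  # hypothetical sample string

no_group = re.findall(r'\w=\d+', text)        # whole match
one_group = re.findall(r'\w=(\d+)', text)     # only the group's text
two_groups = re.findall(r'(\w)=(\d+)', text)  # one tuple per match

print(no_group)    # ['x=1', 'y=22']
print(one_group)   # ['1', '22']
print(two_groups)  # [('x', '1'), ('y', '22')]
```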
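IV. Using raw strings (the r prefix) in regular expressions
Raw strings exist to solve the escape-character problem: in a raw string Python leaves backslashes untouched, so the pattern reaches the regex engine exactly as written.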
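# Import the module
import re

rs1 = re.findall('a\nbc', 'a\nbc')  # pattern and string both contain a real newline
print(rs1)  # Result: ['a\nbc']
rs2 = re.findall('a\\nbc', 'a\\nbc')  # the regex \n matches a newline, but the string holds a literal backslash and n
print(rs2)  # Result: []
rs3 = re.findall('a\\\nbc', 'a\\nbc')  # the pattern is a backslash plus a real newline, which still matches only a newline
print(rs3)  # Result: []
rs4 = re.findall('a\\\\nbc', 'a\\nbc')  # four backslashes: the regex sees \\, i.e. one literal backslash
print(rs4)  # Result: ['a\\nbc']
rs5 = re.findall(r'a\nbc', 'a\nbc')  # raw pattern: the regex \n matches the newline
print(rs5)  # Result: ['a\nbc']
rs6 = re.findall(r'a\\nbc', 'a\\nbc')  # raw pattern: the regex \\ matches one literal backslash
print(rs6)  # Result: ['a\\nbc']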
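What the r prefix changes can be seen without any regex at all by comparing string lengths. This is a small sketch; the path value is a made-up example.

```python
import re

newline_len = len('\n')     # 1: a single newline character
raw_len = len(r'\n')        # 2: a backslash followed by n
same = (r'\d' == '\\d')     # the raw form equals the doubled-backslash form

path = 'C:\\temp'           # hypothetical string holding C:\temp
raw_hits = re.findall(r'\\', path)      # raw pattern: two backslashes
plain_hits = re.findall('\\\\', path)   # plain pattern: four backslashes

print(newline_len, raw_len, same)  # 1 2 True
print(raw_hits == plain_hits)      # True
print(len(raw_hits))               # 1
```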
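V. Extracting the latest epidemic data as a JSON string
Steps:
- Request the content of the epidemic home page
- Extract the epidemic information for each region from the script tag
- Extract the JSON string for each region from that information
As before, the data source is the home page of DXY's (丁香园) real-time COVID-19 epidemic tracker.
url: https://ncov.dxy.cn/ncovh5/view/pneumonia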
# 1. Import the required modules
import requests
import re
from bs4 import BeautifulSoup

# 2. Send the request and fetch the content of the epidemic home page
response = requests.get('https://ncov.dxy.cn/ncovh5/view/pneumonia')
home_page = response.content.decode()
# print(home_page)

# 3. Use Beautiful Soup to extract the epidemic data (the 'lxml' parser requires the lxml package)
soup = BeautifulSoup(home_page, 'lxml')
script = soup.find(id='getAreaStat')
text = script.text
# print(text)
'''
try { window.getAreaStat = [{"provinceName":"香港","provinceShortName":"香港","currentConfirmedCount":5990,"confirmedCount":22468,"suspectedCount":181,"curedCount":16190,"deadCount":288,"comment":"疑似 1 例","locationId":810000,"statisticsData":"https://file1.dxycdn.com/2020/0223/331/3398299755968040033-135.json","highDangerCount":0,"midDangerCount":0,"detectOrgCount":0,"vaccinationOrgCount":0,"cities":[],"dangerAreas":[]},{"provinceName":"台湾","provinceShortName":"台湾","currentConfirmedCount":5413,"confirmedCount":20007,"suspectedCount":485,"curedCount":13742,"deadCount":852,"comment":"","locationId":710000,"statisticsData":"https://file1.dxycdn.com/2020/0223/045/3398299749526003760-135.json","highDangerCount":0,"midDangerCount":0,"detectOrgCount":0,"vaccinationOrgCount":0,"cities":[],"dangerAreas":[]},{"provinceName":"浙江省","provinceShortName":"浙江","currentConfirmedCount":388,"confirmedCount":2255,"suspectedCount":68,"curedCount":1866,"deadCount":1,"comment":"2月10日通报核减的12例在浙江省治愈的外省病例,根据国家最新要求重新纳入累计病例。","locationId":330000,"statisticsData":"https://file1.dxycdn.com/2020/0223/537/3398299755968455045-135.json","highDangerCount":0,"midDangerCount":0,"detectOrgCount":519,"vaccinationOrgCount":217,"cities":[{"cityName":"杭州","currentConfirmedCount":143,"confirmedCount":328,"suspectedCount":0,"curedCount":185,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":330100,"currentConfirmedCountStr":"143"},{"cityName":"境外输入","currentConfirmedCount":119,"confirmedCount":387,"suspectedCount":68,"curedCount":268,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":0,"currentConfirmedCountStr":"119"},{"cityName":"宁波","currentConfirmedCount":110,"confirmedCount":269,"suspectedCount":0,"curedCount":159,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":330200,"currentConfirmedCountStr":"110"},{"cityName":"绍兴","currentConfirmedCount":38,"confirmedCount":430,"suspectedCount":0,"curedCount":392,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locatio
nId":330600,"currentConfirmedCountStr":"38"},{"cityName":"金华","currentConfirmedCount":2,"confirmedCount":57,"suspectedCount":0,"curedCount":55,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":330700,"currentConfirmedCountStr":"2"},{"cityName":"温州","currentConfirmedCount":0,"confirmedCount":504,"suspectedCount":0,"curedCount":503,"deadCount":1,"highDangerCount":0,"midDangerCount":0,"locationId":330300,"currentConfirmedCountStr":"0"},{"cityName":"台州","currentConfirmedCount":0,"confirmedCount":147,"suspectedCount":0,"curedCount":147,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":331000,"currentConfirmedCountStr":"0"},{"cityName":"嘉兴","currentConfirmedCount":0,"confirmedCount":46,"suspectedCount":0,"curedCount":46,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":330400,"currentConfirmedCountStr":"0"},{"cityName":"省十里丰监狱","currentConfirmedCount":0,"confirmedCount":36,"suspectedCount":0,"curedCount":36,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":0,"currentConfirmedCountStr":"0"},{"cityName":"丽水","currentConfirmedCount":0,"confirmedCount":17,"suspectedCount":0,"curedCount":17,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":331100,"currentConfirmedCountStr":"0"},{"cityName":"衢州","currentConfirmedCount":0,"confirmedCount":14,"suspectedCount":0,"curedCount":14,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":330800,"currentConfirmedCountStr":"0"},{"cityName":"湖州","currentConfirmedCount":0,"confirmedCount":10,"suspectedCount":0,"curedCount":10,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":330500,"currentConfirmedCountStr":"0"},{"cityName":"舟山","currentConfirmedCount":0,"confirmedCount":10,"suspectedCount":0,"curedCount":10,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":330900,"currentConfirmedCountStr":"0"},{"cityName":"待明确地区","currentConfirmedCount":-24,"confirmedCount":0,"suspectedCount":0,"curedCount":24,"deadCount":0,"highDang
erCount":0,"midDangerCount":0,"locationId":0,"notShowCurrentConfirmedCount":true,"currentConfirmedCountStr":"-"}],"dangerAreas":[]},{"provinceName":"广东省","provinceShortName":"广东","currentConfirmedCount":362,"confirmedCount":4163,"suspectedCount":25,"curedCount":3793,"deadCount":8,"comment":"广东卫健委未明确部分治愈病例的地市归属,因此各地市的现存确诊存在一定偏差。","locationId":440000,"statisticsData":"https://file1.dxycdn.com/2020/0223/281/3398299758115524068-135.json","highDangerCount":0,"midDangerCount":3,"detectOrgCount":120,"vaccinationOrgCount":42,"cities":[{"cityName":"深圳","currentConfirmedCount":145,"confirmedCount":798,"suspectedCount":3,"curedCount":650,"deadCount":3,"highDangerCount":0,"midDangerCount":3,"locationId":440300,"currentConfirmedCountStr":"145"},{"cityName":"广州","currentConfirmedCount":103,"confirmedCount":2205,"suspectedCount":3,"curedCount":2101,"deadCount":1,"highDangerCount":0,"midDangerCount":0,"locationId":440100,"currentConfirmedCountStr":"103"},{"cityName":"东莞","currentConfirmedCount":31,"confirmedCount":202,"suspectedCount":1,"curedCount":170,"deadCount":1,"highDangerCount":0,"midDangerCount":0,"locationId":441900,"currentConfirmedCountStr":"31"},{"cityName":"佛山","currentConfirmedCount":28,"confirmedCount":318,"suspectedCount":1,"curedCount":290,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":440600,"currentConfirmedCountStr":"28"},{"cityName":"阳江","currentConfirmedCount":23,"confirmedCount":51,"suspectedCount":0,"curedCount":28,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":441700,"currentConfirmedCountStr":"23"},{"cityName":"珠海","currentConfirmedCount":15,"confirmedCount":169,"suspectedCount":2,"curedCount":153,"deadCount":1,"highDangerCount":0,"midDangerCount":0,"locationId":440400,"currentConfirmedCountStr":"15"},{"cityName":"惠州","currentConfirmedCount":7,"confirmedCount":71,"suspectedCount":0,"curedCount":64,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":441300,"currentConfirmedCountStr":"7"},{"cityName":"江门","
currentConfirmedCount":7,"confirmedCount":47,"suspectedCount":0,"curedCount":40,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":440700,"currentConfirmedCountStr":"7"},{"cityName":"云浮","currentConfirmedCount":7,"confirmedCount":7,"suspectedCount":0,"curedCount":0,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":445300,"currentConfirmedCountStr":"7"},{"cityName":"中山","currentConfirmedCount":4,"confirmedCount":80,"suspectedCount":0,"curedCount":76,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":442000,"currentConfirmedCountStr":"4"},{"cityName":"湛江","currentConfirmedCount":2,"confirmedCount":43,"suspectedCount":2,"curedCount":41,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":440800,"currentConfirmedCountStr":"2"},{"cityName":"河源","currentConfirmedCount":1,"confirmedCount":6,"suspectedCount":0,"curedCount":5,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":441600,"currentConfirmedCountStr":"1"},{"cityName":"肇庆","currentConfirmedCount":0,"confirmedCount":47,"suspectedCount":1,"curedCount":46,"deadCount":1,"highDangerCount":0,"midDangerCount":0,"locationId":441200,"currentConfirmedCountStr":"0"},{"cityName":"汕头","currentConfirmedCount":0,"confirmedCount":26,"suspectedCount":0,"curedCount":26,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":440500,"currentConfirmedCountStr":"0"},{"cityName":"清远","currentConfirmedCount":0,"confirmedCount":23,"suspectedCount":0,"curedCount":23,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":441800,"currentConfirmedCountStr":"0"},{"cityName":"梅州","currentConfirmedCount":0,"confirmedCount":19,"suspectedCount":0,"curedCount":19,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":441400,"currentConfirmedCountStr":"0"},{"cityName":"茂名","currentConfirmedCount":0,"confirmedCount":17,"suspectedCount":0,"curedCount":17,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":440900,"currentConfirmedCountSt
r":"0"},{"cityName":"揭阳","currentConfirmedCount":0,"confirmedCount":11,"suspectedCount":0,"curedCount":11,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":445200,"currentConfirmedCountStr":"0"},{"cityName":"韶关","currentConfirmedCount":0,"confirmedCount":10,"suspectedCount":0,"curedCount":9,"deadCount":1,"highDangerCount":0,"midDangerCount":0,"locationId":440200,"currentConfirmedCountStr":"0"},{"cityName":"潮州","currentConfirmedCount":0,"confirmedCount":7,"suspectedCount":0,"curedCount":7,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":445100,"currentConfirmedCountStr":"0"},{"cityName":"汕尾","currentConfirmedCount":0,"confirmedCount":6,"suspectedCount":0,"curedCount":6,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":441500,"currentConfirmedCountStr":"0"},{"cityName":"待明确地区","currentConfirmedCount":-11,"confirmedCount":0,"suspectedCount":0,"curedCount":11,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":0,"notShowCurrentConfirmedCount":true,"currentConfirmedCountStr":"-"}],"dangerAreas":[{"cityName":"深圳","areaName":"龙岗区坂田街道马安堂社区侨联东10巷1号顺兴楼","dangerLevel":2},{"cityName":"深圳","areaName":"罗湖区东门街道新园路明华广场1至6楼(含6A与M层)商业区","dangerLevel":2},{"cityName":"深圳","areaName":"中兴路高时石材B区A钢构厂房","dangerLevel":2}]},{"provinceName":"广西壮族自治区","provinceShortName":"广西","currentConfirmedCount":319,"confirmedCount":1028,"suspectedCount":0,"curedCount":707,"deadCount":2,"comment":"","locationId":450000,"statisticsData":"https://file1.dxycdn.com/2020/0223/536/3398299758115523880-135.json","highDangerCount":1,"midDangerCount":10,"detectOrgCount":270,"vaccinationOrgCount":15,"cities":[{"cityName":"百色","currentConfirmedCount":227,"confirmedCount":274,"suspectedCount":0,"curedCount":47,"deadCount":0,"highDangerCount":1,"midDangerCount":10,"locationId":451000,"currentConfirmedCountStr":"227"},{"cityName":"境外输入","currentConfirmedCount":91,"confirmedCount":482,"suspectedCount":0,"curedCount":391,"deadCount":0,"highDangerCount":0,"midDa
ngerCount":0,"locationId":0,"currentConfirmedCountStr":"91"},{"cityName":"南宁","currentConfirmedCount":1,"confirmedCount":57,"suspectedCount":0,"curedCount":56,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":450100,"currentConfirmedCountStr":"1"},{"cityName":"北海","currentConfirmedCount":0,"confirmedCount":44,"suspectedCount":0,"curedCount":43,"deadCount":1,"highDangerCount":0,"midDangerCount":0,"locationId":450500,"currentConfirmedCountStr":"0"},{"cityName":"防城港","currentConfirmedCount":0,"confirmedCount":39,"suspectedCount":0,"curedCount":39,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":450600,"currentConfirmedCountStr":"0"},{"cityName":"桂林","currentConfirmedCount":0,"confirmedCount":32,"suspectedCount":0,"curedCount":32,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":450300,"currentConfirmedCountStr":"0"},{"cityName":"河池","currentConfirmedCount":0,"confirmedCount":28,"suspectedCount":0,"curedCount":27,"deadCount":1,"highDangerCount":0,"midDangerCount":0,"locationId":451200,"currentConfirmedCountStr":"0"},{"cityName":"柳州","currentConfirmedCount":0,"confirmedCount":24,"suspectedCount":0,"curedCount":24,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":450200,"currentConfirmedCountStr":"0"},{"cityName":"玉林","currentConfirmedCount":0,"confirmedCount":11,"suspectedCount":0,"curedCount":11,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":450900,"currentConfirmedCountStr":"0"},{"cityName":"来宾","currentConfirmedCount":0,"confirmedCount":11,"suspectedCount":0,"curedCount":11,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":451300,"currentConfirmedCountStr":"0"},{"cityName":"钦州","currentConfirmedCount":0,"confirmedCount":8,"suspectedCount":0,"curedCount":8,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":450700,"currentConfirmedCountStr":"0"},{"cityName":"贵港","currentConfirmedCount":0,"confirmedCount":8,"suspectedCount":0,"curedCount":8,"deadCount":0,"hi
ghDangerCount":0,"midDangerCount":0,"locationId":450800,"currentConfirmedCountStr":"0"},{"cityName":"梧州","currentConfirmedCount":0,"confirmedCount":5,"suspectedCount":0,"curedCount":5,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":450400,"currentConfirmedCountStr":"0"},{"cityName":"贺州","currentConfirmedCount":0,"confirmedCount":4,"suspectedCount":0,"curedCount":4,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":451100,"currentConfirmedCountStr":"0"},{"cityName":"崇左","currentConfirmedCount":0,"confirmedCount":1,"suspectedCount":0,"curedCount":1,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":451400,"currentConfirmedCountStr":"0"}],"dangerAreas":[{"cityName":"百色","areaName":"德保县都安乡伏计村陇意屯","dangerLevel":1},{"cityName":"百色","areaName":"城关镇隆盛社区东蒙荣盛二巷25号","dangerLevel":2},{"cityName":"百色","areaName":"城关镇隆盛社区盛象名都小区","dangerLevel":2},{"cityName":"百色","areaName":"都安乡坡那村多麦屯","dangerLevel":2},{"cityName":"百色","areaName":"德保县都安乡福记村山金屯","dangerLevel":2},{"cityName":"百色","areaName":"德保县维也纳酒店(德保腾飞广场店)","dangerLevel":2},{"cityName":"百色","areaName":"东凌镇登限村念洞屯","dangerLevel":2},{"cityName":"百色","areaName":"敬德镇陇正村多果屯","dangerLevel":2},{"cityName":"百色","areaName":"靖西市武平镇大道街大定屯","dangerLevel":2},{"cityName":"百色","areaName":"莲城社区德立山庄","dangerLevel":2},{"cityName":"百色","areaName":"弄贴村新村屯","dangerLevel":2}]},{"provinceName":"上海市","provinceShortName":"上海","currentConfirmedCount":209,"confirmedCount":4003,"suspectedCount":393,"curedCount":3787,"deadCount":7,"comment":"因未公布分区死亡和治愈,仅展示累计确诊和现存确诊","locationId":310000,"statisticsData":"https://file1.dxycdn.com/2020/0223/128/3398299755968454977-135.json","highDangerCount":0,"midDangerCount":0,"detectOrgCount":130,"vaccinationOrgCount":17,"cities":[{"cityName":"境外输入","currentConfirmedCount":208,"confirmedCount":3611,"suspectedCount":8,"curedCount":3403,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":0,"currentConfirmedCountStr":"208"},{"cityName":"奉贤区","currentConfirmedCount":1
,"confirmedCount":11,"suspectedCount":0,"curedCount":10,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":310120,"currentConfirmedCountStr":"1"},{"cityName":"外地来沪","currentConfirmedCount":0,"confirmedCount":113,"suspectedCount":0,"curedCount":112,"deadCount":1,"highDangerCount":0,"midDangerCount":0,"locationId":0,"currentConfirmedCountStr":"0"},{"cityName":"浦东新区","currentConfirmedCount":0,"confirmedCount":82,"suspectedCount":0,"curedCount":81,"deadCount":1,"highDangerCount":0,"midDangerCount":0,"locationId":310115,"currentConfirmedCountStr":"0"},{"cityName":"宝山区","currentConfirmedCount":0,"confirmedCount":27,"suspectedCount":0,"curedCount":26,"deadCount":1,"highDangerCount":0,"midDangerCount":0,"locationId":310113,"currentConfirmedCountStr":"0"},{"cityName":"黄浦区","currentConfirmedCount":0,"confirmedCount":22,"suspectedCount":0,"curedCount":22,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":310101,"currentConfirmedCountStr":"0"},{"cityName":"闵行区","currentConfirmedCount":0,"confirmedCount":19,"suspectedCount":0,"curedCount":19,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":310112,"currentConfirmedCountStr":"0"},{"cityName":"徐汇区","currentConfirmedCount":0,"confirmedCount":18,"suspectedCount":0,"curedCount":17,"deadCount":1,"highDangerCount":0,"midDangerCount":0,"locationId":310104,"currentConfirmedCountStr":"0"},{"cityName":"静安区","currentConfirmedCount":0,"confirmedCount":17,"suspectedCount":0,"curedCount":16,"deadCount":1,"highDangerCount":0,"midDangerCount":0,"locationId":310106,"currentConfirmedCountStr":"0"},{"cityName":"松江区","currentConfirmedCount":0,"confirmedCount":16,"suspectedCount":0,"curedCount":16,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":310117,"currentConfirmedCountStr":"0"},{"cityName":"长宁区","currentConfirmedCount":0,"confirmedCount":14,"suspectedCount":0,"curedCount":14,"deadCount":... (remaining content omitted for length)
'''
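# 4. Use a regular expression to extract the JSON string
# Square brackets are special characters in a regex, so each needs a backslash in front of it.
# findall returns its results in a list, so [0] picks out the complete JSON text.
json_str = re.findall(r'\[.+\]', text)[0]
print(json_str)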
'''
[{"provinceName":"香港","provinceShortName":"香港","currentConfirmedCount":5990,"confirmedCount":22468,"suspectedCount":181,"curedCount":16190,"deadCount":288,"comment":"疑似 1 例","locationId":810000,"statisticsData":"https://file1.dxycdn.com/2020/0223/331/3398299755968040033-135.json","highDangerCount":0,"midDangerCount":0,"detectOrgCount":0,"vaccinationOrgCount":0,"cities":[],"dangerAreas":[]},{"provinceName":"台湾","provinceShortName":"台湾","currentConfirmedCount":5413,"confirmedCount":20007,"suspectedCount":485,"curedCount":13742,"deadCount":852,"comment":"","locationId":710000,"statisticsData":"https://file1.dxycdn.com/2020/0223/045/3398299749526003760-135.json","highDangerCount":0,"midDangerCount":0,"detectOrgCount":0,"vaccinationOrgCount":0,"cities":[],"dangerAreas":[]},{"provinceName":"浙江省","provinceShortName":"浙江","currentConfirmedCount":388,"confirmedCount":2255,"suspectedCount":68,"curedCount":1866,"deadCount":1,"comment":"2月10日通报核减的12例在浙江省治愈的外省病例,根据国家最新要求重新纳入累计病例。","locationId":330000,"statisticsData":"https://file1.dxycdn.com/2020/0223/537/3398299755968455045-135.json","highDangerCount":0,"midDangerCount":0,"detectOrgCount":519,"vaccinationOrgCount":217,"cities":[{"cityName":"杭州","currentConfirmedCount":143,"confirmedCount":328,"suspectedCount":0,"curedCount":185,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":330100,"currentConfirmedCountStr":"143"},{"cityName":"境外输入","currentConfirmedCount":119,"confirmedCount":387,"suspectedCount":68,"curedCount":268,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":0,"currentConfirmedCountStr":"119"},{"cityName":"宁波","currentConfirmedCount":110,"confirmedCount":269,"suspectedCount":0,"curedCount":159,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":330200,"currentConfirmedCountStr":"110"},{"cityName":"绍兴","currentConfirmedCount":38,"confirmedCount":430,"suspectedCount":0,"curedCount":392,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":330600,"currentConfirm
edCountStr":"38"},{"cityName":"金华","currentConfirmedCount":2,"confirmedCount":57,"suspectedCount":0,"curedCount":55,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":330700,"currentConfirmedCountStr":"2"},{"cityName":"温州","currentConfirmedCount":0,"confirmedCount":504,"suspectedCount":0,"curedCount":503,"deadCount":1,"highDangerCount":0,"midDangerCount":0,"locationId":330300,"currentConfirmedCountStr":"0"},{"cityName":"台州","currentConfirmedCount":0,"confirmedCount":147,"suspectedCount":0,"curedCount":147,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":331000,"currentConfirmedCountStr":"0"},{"cityName":"嘉兴","currentConfirmedCount":0,"confirmedCount":46,"suspectedCount":0,"curedCount":46,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":330400,"currentConfirmedCountStr":"0"},{"cityName":"省十里丰监狱","currentConfirmedCount":0,"confirmedCount":36,"suspectedCount":0,"curedCount":36,"deadCount":0,"highDangerCount":0,"midDangerCount":0,"locationId":0,"currentConfirmedCountStr":"0"},{"cityName":"丽水",... (and so on)
'''
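The extraction step can be tried offline on a shortened stand-in for the script text. In the sketch below the sample text (including its `}catch(e){}` tail), the empty-result guard and the `json.loads` check are assumptions added for illustration; they are not taken from the page itself.

```python
import json
import re

# Hypothetical, heavily shortened stand-in for the script text
text = ('try { window.getAreaStat = [{"provinceName":"A","cities":[]},'
        '{"provinceName":"B","cities":[]}]}catch(e){}')

# .+ is greedy, so the match runs from the first [ to the last ],
# which keeps the nested "cities":[] arrays inside the result
matches = re.findall(r'\[.+\]', text)

# Guard against an empty list before indexing with [0]
json_str = matches[0] if matches else '[]'

# Parsing the string confirms that the extracted text is valid JSON
provinces = json.loads(json_str)
names = [p['provinceName'] for p in provinces]
print(names)  # ['A', 'B']
```

Indexing with `[0]` directly, as in step 4, raises an IndexError if the page layout changes and nothing matches, which is why the sketch checks the list first.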