Regular Expressions

Table of Contents

1. Importing the re module
2. Basic regular expression patterns
3. Main functions and methods of the re module
4. Examples

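A regular expression (often abbreviated regex or regexp) is a powerful text-processing tool: a special sequence of characters that describes a pattern, used to check whether a string matches that pattern. Python's built-in re module provides full regular expression support.

The following is a detailed tutorial on regular expressions in Python: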

1. Importing the re module

First, import Python's re module so that you can use regular expressions.


import re

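2. Basic regular expression patterns

Character matching:
- .: matches any character except a newline (with the re.DOTALL flag it matches a newline as well)
- [abc]: matches any one of the characters inside the brackets
- [^abc]: matches any one character that is not inside the brackets
- [a-z]: matches any lowercase ASCII letter
- [A-Z]: matches any uppercase ASCII letter
- [a-zA-Z]: matches any ASCII letter
- [0-9]: matches any ASCII digit
- \d: matches any decimal digit; for str patterns this includes Unicode digits, so it is equivalent to [0-9] only when the re.ASCII flag is set
- \D: matches any character that is not a digit (the inverse of \d)
- \w: matches any word character, that is, a letter, digit, or underscore; for str patterns this includes Unicode letters and digits, so it is equivalent to [a-zA-Z0-9_] only when the re.ASCII flag is set
- \W: matches any character that is not a word character (the inverse of \w)
- \s: matches any whitespace character, including space, tab, newline, carriage return, form feed, and vertical tab
- \S: matches any non-whitespace character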
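
The character classes above can be tried out with re.findall(). The following is a minimal sketch; the sample text and variable names are made up for illustration, and the last two lines assume a Python 3 str pattern, where \d also accepts non-ASCII digits unless re.ASCII is passed.

```python
import re

# Hypothetical sample text; "\t" is a real tab character.
text = "User_42 paid $3.50\ton 2024-01-15"

# \d+ collects runs of digits.
digits = re.findall(r"\d+", text)
# ['42', '3', '50', '2024', '01', '15']

# \w+ collects runs of letters, digits, and underscores.
words = re.findall(r"\w+", text)
# ['User_42', 'paid', '3', '50', 'on', '2024', '01', '15']

# A negated class: any single character that is neither a word
# character nor whitespace, i.e. the punctuation in this text.
punctuation = re.findall(r"[^a-zA-Z0-9_\s]", text)
# ['$', '.', '-', '-']

# \s matches each space and the tab.
whitespace = re.findall(r"\s", text)
# [' ', ' ', '\t', ' ']

# U+0663 is ARABIC-INDIC DIGIT THREE: a digit, but not in [0-9].
unicode_digit = "\u0663"
matches_by_default = re.fullmatch(r"\d", unicode_digit) is not None       # True
matches_with_ascii = re.fullmatch(r"\d", unicode_digit, re.ASCII) is not None  # False

print(digits, words, punctuation, whitespace, sep="\n")
```

If the input must contain only the digits 0 to 9, writing [0-9] explicitly or passing re.ASCII avoids the surprise shown in the last two lines.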
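
Quantifiers:
- *: matches the preceding subexpression zero or more times
- +: matches the preceding subexpression one or more times
- ?: matches the preceding subexpression zero or one time
- {n}: matches the preceding subexpression exactly n times
- {n,}: matches the preceding subexpression at least n times
- {n,m}: matches the preceding subexpression at least n times and at most m times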
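
To see how the quantifiers differ, the sketch below applies each one to the letter b in the same family of strings. The candidate list and the helper matching() are hypothetical; re.fullmatch() is used so that a pattern counts only if it matches the entire string.

```python
import re

# Hypothetical candidates: "a", then zero to four "b", then "c".
candidates = ["ac", "abc", "abbc", "abbbc", "abbbbc"]


def matching(pattern):
    """Return the candidates that the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]


zero_or_more = matching(r"ab*c")       # all five candidates
one_or_more = matching(r"ab+c")        # ['abc', 'abbc', 'abbbc', 'abbbbc']
zero_or_one = matching(r"ab?c")        # ['ac', 'abc']
exactly_two = matching(r"ab{2}c")      # ['abbc']
at_least_two = matching(r"ab{2,}c")    # ['abbc', 'abbbc', 'abbbbc']
two_to_three = matching(r"ab{2,3}c")   # ['abbc', 'abbbc']

print(two_to_three)
```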
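
Boundary matching:
- ^: matches the start of the string (with the re.MULTILINE flag, also the start of each line)
- $: matches the end of the string, or the position just before a newline at the end of the string (with the re.MULTILINE flag, also the end of each line)
- \b: matches a word boundary
- \B: matches a position that is not a word boundary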
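
Anchors match positions rather than characters, which is easiest to see by recording where each pattern matches. This is a small sketch with a made-up two-line string; the helper starts() is hypothetical and returns the start index of every match.

```python
import re

# Hypothetical two-line text: line 1 is "cat concat", line 2 is "cat cats".
text = "cat concat\ncat cats"


def starts(pattern, flags=0):
    """Return the start index of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text, flags)]


whole_word = starts(r"\bcat\b")               # [0, 11]: "cat" as a whole word
inside_word = starts(r"\Bcat")                # [7]: the "cat" inside "concat"
start_default = starts(r"^cat")               # [0]: start of the string only
start_multiline = starts(r"^cat", re.MULTILINE)  # [0, 11]: start of each line
end_default = starts(r"cat$")                 # []: the string ends with "cats"
end_multiline = starts(r"cat$", re.MULTILINE)    # [7]: line 1 ends with "cat"

print(whole_word, start_multiline, end_multiline)
```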
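
Alternation, grouping, and references:
- |: alternation; matches either the expression on its left or the expression on its right
- (): capturing group; groups part of the pattern and captures the text it matches
- (?:…): non-capturing group; groups part of the pattern without capturing the text it matches
- \number (for example \1, \2): backreference to the text matched by the capturing group with that number, where the number is a positive integer

Escape character:
- \: escapes a special character so that it is matched literally. Writing patterns as raw strings (r'...') keeps Python from interpreting the backslashes before the re module sees them.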
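
The grouping constructs and the escape character can be combined in a short sketch like the one below. All input strings are invented for illustration.

```python
import re

# Alternation inside a non-capturing group: the group limits the scope
# of | without creating a capture.
color = re.search(r"gr(?:a|e)y", "a grey cat")
color_text = color.group()      # 'grey'
color_groups = color.groups()   # (): nothing was captured

# Capturing groups expose the pieces of the match.
date = re.search(r"(\d{4})-(\d{2})-(\d{2})", "released 2024-01-15")
date_parts = date.groups()      # ('2024', '01', '15')

# Backreference: \1 must repeat whatever group 1 matched, so this
# pattern finds a word that is immediately repeated.
repeated = re.search(r"\b(\w+) \1\b", "this is is a typo")
repeated_word = repeated.group(1)   # 'is'

# Escaping: an unescaped dot matches any character, an escaped dot
# matches only a literal ".".
unescaped_dot = re.search(r"3.50", "3x50") is not None   # True
escaped_dot = re.search(r"3\.50", "3x50") is not None    # False

print(color_text, date_parts, repeated_word)
```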

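3. Main functions and methods of the re module

re.match(pattern, string, flags=0): tries to match the pattern at the start of the string only; returns a Match object on success and None otherwise.
re.search(pattern, string, flags=0): scans through the string and returns a Match object for the first location where the pattern matches, or None if there is no match.
re.findall(pattern, string, flags=0): returns a list of all non-overlapping matches in the string, or an empty list if there are none. If the pattern contains capturing groups, the list holds the group text (tuples when there are several groups) instead of the whole match.
re.finditer(pattern, string, flags=0): like findall, but returns an iterator that yields Match objects.
re.split(pattern, string, maxsplit=0, flags=0): splits the string at each match of the pattern and returns the pieces as a list; maxsplit limits the number of splits (0 means no limit).
re.sub(pattern, repl, string, count=0, flags=0): replaces the matches of the pattern in the string with repl and returns the new string; repl may be a string or a function, and count limits the number of replacements (0 means replace all).
re.compile(pattern, flags=0): compiles the pattern into a Pattern object. Its methods (match(), search(), findall(), finditer(), split(), sub(), and others) mirror the module-level functions, and the object can be reused across calls.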

4. Examples

Here are some examples of using regular expressions in Python:

re.match()


import re

# Match a pattern at the start of the string
pattern = r'Hello'
string = 'Hello, world!'
match = re.match(pattern, string)
if match:
    print('Found match:', match.group())  # Output: Found match: Hello
else:
    print('No match found.')
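
Because re.match() only looks at the start of the string, it is worth seeing what happens when the pattern occurs later on. The sketch below uses a made-up string in which "Hello" starts at index 4 and compares re.match() with re.search().

```python
import re

# Hypothetical input: "Hello" is present, but not at index 0.
text = "say Hello, world!"

at_start = re.match(r"Hello", text)     # None: no match at index 0
anywhere = re.search(r"Hello", text)    # Match object starting at index 4
anchored = re.search(r"^Hello", text)   # None: ^ pins the search to the start

print(at_start)           # None
print(anywhere.start())   # 4
print(anchored)           # None
```

In other words, re.match(p, s) behaves much like re.search() with the pattern anchored at the start of the string.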

re.search()


import re

# Search the whole string for the pattern
pattern = r'\d+'  # Match one or more digits
string = 'The price is 123 dollars.'
search = re.search(pattern, string)
if search:
    print('Found match:', search.group())  # Output: Found match: 123
else:
    print('No match found.')
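
A Match object carries more than the matched text. As a rough sketch, the pattern below (an illustrative variation with two capturing groups, not taken from the example above) shows how group() and span() expose the pieces and the position of a match.

```python
import re

text = "The price is 123 dollars."

# Two capturing groups: the amount and the currency word.
m = re.search(r"(\d+) (dollars|euros)", text)

if m is not None:           # always check for None before using the match
    whole = m.group(0)      # '123 dollars': the entire match
    amount = int(m.group(1))   # 123
    currency = m.group(2)   # 'dollars'
    span = m.span()         # (13, 24): start and end index in text
    print(whole, amount, currency, span)
```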

re.findall()


import re

# Find all substrings that match the pattern
pattern = r'\b\w+\b'  # Match whole words delimited by word boundaries
string = 'Hello world, this is a Python tutorial.'
matches = re.findall(pattern, string)
print('Matches:', matches)  # Output: Matches: ['Hello', 'world', 'this', 'is', 'a', 'Python', 'tutorial']
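
The return value of re.findall() changes shape when the pattern contains capturing groups, which is a common source of confusion. A minimal sketch with an invented key=value string:

```python
import re

# Hypothetical input made of key=value pairs.
text = "width=80, height=24"

# No groups: each item is the whole match.
no_groups = re.findall(r"\w+=\d+", text)
# ['width=80', 'height=24']

# One group: each item is just the text of that group.
one_group = re.findall(r"\w+=(\d+)", text)
# ['80', '24']

# Several groups: each item is a tuple with one entry per group.
two_groups = re.findall(r"(\w+)=(\d+)", text)
# [('width', '80'), ('height', '24')]

settings = {key: int(value) for key, value in two_groups}
print(settings)
```

When grouping is needed only for structure, a non-capturing group (?:…) keeps findall() returning whole matches.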

re.finditer()


import re

# Find all substrings that match the pattern, as an iterator of Match objects
pattern = r'\d+'
string = 'The numbers are 123 and 456.'
matches = re.finditer(pattern, string)
for match in matches:
    print('Found match:', match.group())  # Prints "Found match: 123", then "Found match: 456"
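
Since re.finditer() yields Match objects rather than plain strings, it is the natural choice when the position of each match matters. A small sketch that collects the text together with its start and end index:

```python
import re

text = "The numbers are 123 and 456."

# Each Match object knows its text and where it was found.
positions = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\d+", text)]
# [('123', 16, 19), ('456', 24, 27)]

for number, start, end in positions:
    print(f"{number!r} at text[{start}:{end}]")
```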

re.split()


import re

# Split a string using a pattern
pattern = r'\s+'  # Match one or more whitespace characters
string = 'This is a test string.'
split_string = re.split(pattern, string)
print('Split string:', split_string)  # Output: Split string: ['This', 'is', 'a', 'test', 'string.']
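
re.split() has two behaviors that the basic example does not show: the maxsplit argument, and the fact that capturing groups in the pattern are returned as part of the result. The sketch below uses a made-up string with mixed separators.

```python
import re

# Hypothetical input with inconsistent separators and spacing.
text = "a, b;c ,d"

# Split on a comma or semicolon, swallowing surrounding whitespace.
parts = re.split(r"\s*[,;]\s*", text)
# ['a', 'b', 'c', 'd']

# maxsplit=1 stops after the first split; the rest is left untouched.
first_split = re.split(r"\s*[,;]\s*", text, maxsplit=1)
# ['a', 'b;c ,d']

# A capturing group around the separator keeps it in the result.
with_separators = re.split(r"\s*([,;])\s*", text)
# ['a', ',', 'b', ';', 'c', ',', 'd']

print(parts, first_split, with_separators, sep="\n")
```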

re.sub()


import re

# Replace the substrings that match the pattern
pattern = r'\d+'
repl = 'NUMBER'
string = 'The price is 123 dollars and the code is 456.'
new_string = re.sub(pattern, repl, string)
print('New string:', new_string)  # Output: New string: The price is NUMBER dollars and the code is NUMBER.
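
Beyond a fixed replacement string, re.sub() also accepts a count, backreferences inside repl, and a function as repl. The following sketch reuses the same sentence; the helper double() is a hypothetical name chosen for illustration.

```python
import re

text = "The price is 123 dollars and the code is 456."

# count=1 replaces only the first match.
first_only = re.sub(r"\d+", "NUMBER", text, count=1)
# 'The price is NUMBER dollars and the code is 456.'

# \1 in the replacement inserts the text captured by group 1.
bracketed = re.sub(r"(\d+)", r"<\1>", text)
# 'The price is <123> dollars and the code is <456>.'


def double(match):
    """Return the matched number doubled, as a string."""
    return str(int(match.group()) * 2)


# A function as repl is called once per match with the Match object.
doubled = re.sub(r"\d+", double, text)
# 'The price is 246 dollars and the code is 912.'

print(first_only, bracketed, doubled, sep="\n")
```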

re.compile()


import re

# Compile the regular expression into a Pattern object that can be reused
pattern = re.compile(r'\b\w+\b')
string = 'Hello world, this is a Python tutorial.'
matches = pattern.findall(string)
print('Matches:', matches)  # Output: Matches: ['Hello', 'world', 'this', 'is', 'a', 'Python', 'tutorial']
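
A compiled Pattern object also remembers its flags, so they do not have to be repeated on every call. As a minimal sketch, assuming a case-insensitive search over a few invented lines:

```python
import re

# Compile once with a flag; every method call below uses it.
word = re.compile(r"python", re.IGNORECASE)

# Hypothetical input lines.
lines = ["Python tutorial", "learn PYTHON fast", "no match here"]

# The same Pattern object is reused for searching and for substitution.
hits = [line for line in lines if word.search(line)]
# ['Python tutorial', 'learn PYTHON fast']

normalized = [word.sub("Python", line) for line in lines]
# ['Python tutorial', 'learn Python fast', 'no match here']

print(hits)
print(normalized)
```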