Python Regular Expressions

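match() and search() are two functions in the re module for regular expression matching. Note that the name is search(); the module has no function called research().

match() starts matching at the beginning of the string. It returns a match object only when the pattern matches starting at position 0. If the pattern cannot match from the start of the string, it returns None.

search() scans the whole string for the first position where the pattern matches and returns a match object for it. The pattern does not have to match at the start of the string; a match anywhere in the string is enough.

The following example shows the difference between the two: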

import re

pattern = r'abc'
string = 'xabcdefg'

# Match with match()
match_result = re.match(pattern, string)
print(match_result)  # None, because the pattern does not match at the start of the string

# Match with search()
search_result = re.search(pattern, string)
print(search_result)  # <re.Match object; span=(1, 4), match='abc'>, the match succeeds and a match object is returned

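Summary:

match() matches from the start of the string. It returns a match object on success and None otherwise.
search() looks through the whole string for the first position where the pattern matches. It returns a match object on success and None otherwise.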
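
Because both functions return None when nothing matches, calling .group() directly on the result raises AttributeError in the no-match case. The sketch below shows one way to guard against that. The helper names leading_digits and first_digits are made up for this illustration and are not part of the re module.

```python
import re

def leading_digits(text):
    """Return the digits at the very start of text, or None (uses match())."""
    m = re.match(r"\d+", text)
    return m.group() if m is not None else None

def first_digits(text):
    """Return the first run of digits anywhere in text, or None (uses search())."""
    m = re.search(r"\d+", text)
    return m.group() if m is not None else None

print(leading_digits("42 apples"))  # 42
print(leading_digits("apples 42"))  # None
print(first_digits("apples 42"))    # 42
print(first_digits("apples"))       # None
```

Checking the result against None before using it keeps the two functions interchangeable at the call site.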

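Matching flags for compile()
In Python, re.compile() compiles a regular expression pattern into a pattern object so that the same pattern can be reused for many matches. It also accepts flags that change how the pattern matches. Below are some commonly used flags, each with an example:

re.IGNORECASE: case-insensitive matching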

import re

pattern = re.compile(r"hello", re.IGNORECASE)
result = pattern.search("Hello World")
print(result.group())  # Output: Hello

re.MULTILINE: multi-line matching, so that ^ and $ match at the start and end of every line

import re

pattern = re.compile(r"^hello", re.MULTILINE)
result = pattern.findall("hello world\nhello, python")
print(result)  # Output: ['hello', 'hello']

re.DOTALL: makes . match any character, including the newline \n

import re

pattern = re.compile(r"hello.*world", re.DOTALL)
result = pattern.search("hello\nworld")
print(result.group())  # Output: hello and world on two separate lines (the match contains the newline)

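re.ASCII: makes \w, \W, \b, \B, \d, \D, \s and \S match ASCII characters only, instead of the full Unicode range

import re

pattern = re.compile(r"[a-z]", re.ASCII)
result = pattern.findall("Hello, Python")
print(result)  # Output: ['e', 'l', 'l', 'o', 'y', 't', 'h', 'o', 'n']
# Note: [a-z] is already an ASCII-only range, so the flag does not change this result; it matters for classes such as \w and \d.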

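re.DEBUG: prints debugging information about the compiled pattern, which helps when debugging a regular expression

import re

pattern = re.compile(r"\d{3}-\d{4}", re.DEBUG)
result = pattern.search("My phone number is 123-4567")
# The debug information is printed when the pattern is compiled.
# The parse tree looks roughly like this (the exact format depends on the
# Python version, and newer versions also print the compiled opcodes):
# MAX_REPEAT 3 3
#   IN
#     CATEGORY CATEGORY_DIGIT
# LITERAL 45
# MAX_REPEAT 4 4
#   IN
#     CATEGORY CATEGORY_DIGIT
print(result.group())  # Output: 123-4567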

These are only some of the commonly used flags; re.compile() supports others as well. See the official Python documentation for the details.
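
Flags can be combined with the bitwise OR operator |, and a compiled pattern can be reused for as many calls as needed. The following is a minimal sketch of both ideas. The name ERROR_LINE and the sample log text are invented for the example.

```python
import re

# Compile once; IGNORECASE and MULTILINE are combined with |.
ERROR_LINE = re.compile(r"^error: (.+)$", re.IGNORECASE | re.MULTILINE)

log = "ok: started\nERROR: disk full\nError: retrying\nok: done"

# Reuse the same compiled object for several calls.
messages = ERROR_LINE.findall(log)
print(messages)                               # ['disk full', 'retrying']
print(ERROR_LINE.search("no problems here"))  # None
```

Without re.MULTILINE the ^ anchor would only match at the very start of the log, so neither error line would be found.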

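Using groups in Python
Grouping in a regular expression means enclosing part of the pattern in parentheses (). The concept and its usage are as follows:

Concept: a group combines sub-patterns according to their logical relationship into a single unit. The unit can then be matched, replaced, or extracted as a whole. Groups can be nested.
Usage:
- Matching with groups: enclose the part of the pattern to match in parentheses to form a group. Use "|" inside the group to choose between alternatives.
- Replacing with groups: enclose the parts to be reused in parentheses, then refer to each group in the replacement string as "\number", where number is the group's index.
- Extracting with groups: enclose the part to extract in parentheses, then read the captured text from the match object.
Group numbering: every capturing group has a unique number, assigned from left to right in the order of the opening parentheses, starting from 1.
Group references: "\number" refers to a group by its number. For example, \1 in a replacement string stands for the text matched by group 1.
Named groups: a group can be given a name to make it easier to reference. The syntax is (?P<name>pattern), where name is the group name and pattern is the sub-pattern.
Group modifiers: a prefix inside the parentheses changes how the group behaves.
- ?: marks a non-capturing group. The group still takes part in matching, but its text is not saved and it gets no number.
- ?= marks a positive lookahead. The match succeeds only if the text that follows matches the lookahead pattern, and that text is not consumed.
- ?! marks a negative lookahead. The match succeeds only if the text that follows does not match the lookahead pattern.
Group examples: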

import re

# Matching with a group
pattern = r"(dog|cat)"
text = "I have a dog and a cat"
result = re.findall(pattern, text)
print(result)  # ['dog', 'cat']

# Replacing with groups
pattern = r"(\d+)-(\d+)-(\d+)"
text = "Today is 2021-01-01"
result = re.sub(pattern, r"\3/\2/\1", text)
print(result)  # Today is 01/01/2021

# Extracting with groups
pattern = r"(\d+)-(\d+)-(\d+)"
text = "Today is 2021-01-01"
result = re.search(pattern, text)
if result:
    year = result.group(1)
    month = result.group(2)
    day = result.group(3)
    print(year, month, day)  # 2021 01 01
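
Named groups and backreferences are described above but not exercised in the example, so here is a short sketch of both. The variable names and sample strings are chosen only for illustration.

```python
import re

# Named groups: (?P<name>pattern)
date_pattern = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

m = date_pattern.search("Today is 2021-01-01")
if m:
    print(m.group("year"), m.group("month"), m.group("day"))  # 2021 01 01
    print(m.groupdict())  # {'year': '2021', 'month': '01', 'day': '01'}

# In a replacement string, \g<name> refers to a named group.
swapped = date_pattern.sub(r"\g<day>/\g<month>/\g<year>", "Today is 2021-01-01")
print(swapped)  # Today is 01/01/2021

# Inside the pattern itself, \1 matches the same text that group 1 matched.
# Here it finds words that are immediately repeated.
doubled = re.findall(r"\b(\w+) \1\b", "this is is a test test")
print(doubled)  # ['is', 'test']
```

Named groups tend to read better than numbers once a pattern has more than two or three groups.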

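That covers the concept and usage of groups in Python regular expressions. Groups make a regular expression more flexible and convenient, and let it handle more complex matching, replacement, and extraction tasks.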
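
The group modifiers ?:, ?= and ?! are easiest to compare side by side on the same input. The sketch below does that with findall(). The sample text and variable names are invented for the example.

```python
import re

text = "price: 100USD, 200EUR, 300USD"

# Capturing group: findall() returns only what the group captured.
captured = re.findall(r"\d+(USD|EUR)", text)
print(captured)     # ['USD', 'EUR', 'USD']

# Non-capturing group (?:...): findall() returns the whole match.
whole = re.findall(r"\d+(?:USD|EUR)", text)
print(whole)        # ['100USD', '200EUR', '300USD']

# Positive lookahead (?=...): digits that are followed by USD.
# USD is checked but not included in the match.
usd_amounts = re.findall(r"\d+(?=USD)", text)
print(usd_amounts)  # ['100', '300']

# Negative lookahead (?!...): digits that are not followed by USD.
# The \d alternative stops the engine from backtracking to a shorter
# run such as "10" inside "100USD".
non_usd = re.findall(r"\d+(?!\d|USD)", text)
print(non_usd)      # ['200']
```

The plain form \d+(?!USD) would return ['10', '200', '30'] here, because the engine is allowed to give back one digit to satisfy the lookahead. That is a common surprise with negative lookaheads.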

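findall() and finditer()
In Python, findall() and finditer() are both functions of the regular expression module re. Both find every match of a pattern in a string and return the results.

Differences:

findall(): returns a list of all matches, one list element per match. If the pattern contains exactly one capturing group, the list holds that group's text; if it contains several groups, each element is a tuple with one item per group.
finditer(): returns an iterator that yields one match object per match. Each match object carries the matched text plus related information such as the start and end positions.

Usage example: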

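import re

# The string to search
text = 'hello world, hello python, hello regex'

# Use findall() to match every word that starts with hello.
# \w* is used instead of \w+ because each hello is followed by a space, so \w+ would find nothing.
result_findall = re.findall(r'hello\w*', text)
print(result_findall)  # ['hello', 'hello', 'hello']

# Use finditer() to match every word that starts with hello
result_finditer = re.finditer(r'hello\w*', text)
for match in result_finditer:
    print(match.group())  # prints hello three times, once per match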

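Note that findall() returns a list, while finditer() returns an iterator whose results have to be fetched one at a time in a loop. If you only need to iterate over the matches, finditer() can reduce memory use because it does not build the whole list up front.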
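
How the return value of findall() changes with the number of groups is easier to see in code. The sketch below uses a made-up key=value string and also shows the position information that finditer() exposes through span().

```python
import re

text = "alice=30, bob=25"

# No groups: a list of whole matches.
whole = re.findall(r"\w+=\d+", text)
print(whole)   # ['alice=30', 'bob=25']

# One group: a list of that group's text.
single = re.findall(r"\w+=(\d+)", text)
print(single)  # ['30', '25']

# Several groups: a list of tuples, one item per group.
pairs = re.findall(r"(\w+)=(\d+)", text)
print(pairs)   # [('alice', '30'), ('bob', '25')]

# finditer() yields match objects, so positions are available too.
spans = []
for m in re.finditer(r"(\w+)=(\d+)", text):
    spans.append(m.span())
    print(m.group(1), m.group(2), m.span())
# alice 30 (0, 8)
# bob 25 (10, 16)
```

When positions or named groups are needed, finditer() is usually the more convenient choice, since findall() discards everything except the text.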

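re.sub() vs re.subn(): differences and examples
sub() replaces the matches of a pattern in a string. By default it replaces every match; the optional count argument limits the number of replacements. Examples: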

import re

# Replace the matched text with a given string
text = "Hello, my name is Alice."
new_text = re.sub(r"Alice", "Bob", text)
print(new_text)
# Output: Hello, my name is Bob.

# Every match is replaced by default
text = "Hello, my name is Alice. Alice is my friend."
new_text = re.sub(r"Alice", "Bob", text)
print(new_text)
# Output: Hello, my name is Bob. Bob is my friend.

# Limit the number of replacements
text = "Hello, my name is Alice. Alice is my friend."
new_text = re.sub(r"Alice", "Bob", text, count=1)
print(new_text)
# Output: Hello, my name is Bob. Alice is my friend.

subn() does the same job as sub(), but returns a tuple containing the new string and the number of replacements made. Example:

import re

text = "Hello, my name is Alice. Alice is my friend."
new_text, count = re.subn(r"Alice", "Bob", text)
print(new_text)
# Output: Hello, my name is Bob. Bob is my friend.
print(count)
# Output: 2
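
One practical use of the count returned by subn() is to find out whether anything changed at all. The sketch below wraps that in a small helper. The function name collapse_spaces is hypothetical and exists only for this illustration.

```python
import re

def collapse_spaces(text):
    """Replace each run of two or more spaces with one space.

    Returns (new_text, count), where count is the number of runs replaced.
    """
    return re.subn(r" {2,}", " ", text)

print(collapse_spaces("a  b   c d"))  # ('a b c d', 2)
print(collapse_spaces("a b"))         # ('a b', 0)
```

A count of 0 means the input was already clean, which sub() alone cannot report without comparing the two strings.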