Regular Expressions (the re Module) Explained in Detail

Original

Published 2025-08-04 21:06:50


License: CC 4.0 BY-SA

Article tags:

#regular-expressions #python #database

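Python's re module is a powerful tool for string pattern matching. It is widely used for web scraping, data cleaning, and log extraction. This tutorial works from the basics up to advanced topics and explains each part of the regular expression syntax and how to use it.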

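1: Basic matching with re.search()

2: Getting all matches with re.findall()

3: Character sets and ranges [abc], [a-z]

4: Boundary matching with ^ and $

5: Groups () and group(n)

6: Named groups (?P<name>...)

9: Lazy matching *?, +? (### important ###)

10: Replacing text with re.sub()

11: Precompiling patterns with re.compile()

12: Zero-width assertions (lookahead/lookbehind)

13: DOTALL and IGNORECASE modes

Summary table of regular expression features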

1: Basic matching with `re.search()`

import re

text = "Hello, my number is 12345."

# Find the first run of digits
match = re.search(r'\d+', text)
print(match.group())  # Output: 12345

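\d+: matches one or more digits.
re.search(): scans the string and returns a Match object for the first match it finds. If nothing matches, it returns None.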
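re.search() returns None when there is no match, so real code usually checks the result before calling .group(). The minimal sketch below reuses the text from above and shows three things:

- `span()`, which gives the match position.
- What happens when nothing matches.
- How re.search() differs from re.match(), which only tries to match at the very beginning of the string.

```python
import re

text = "Hello, my number is 12345."

m = re.search(r'\d+', text)
if m:  # re.search() returns None when nothing matches, so check first
    print(m.group(), m.span())  # 12345 (20, 25)

# No match anywhere: the result is None (calling .group() on it would raise AttributeError)
print(re.search(r'\d+', "no digits here"))  # None

# re.match() only tries position 0, so it fails here even though digits appear later
print(re.match(r'\d+', text))  # None
```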

2: Getting all matches with `re.findall()`

text = "Phone: 12345, Fax: 67890"
numbers = re.findall(r'\d+', text)
print(numbers)  # ['12345', '67890']

re.findall() returns a list of all non-overlapping matches, which makes it well suited to extracting multiple values.
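One detail often trips people up: what re.findall() returns depends on how many capturing groups the pattern has. The sketch below uses the same sample text. It also shows re.finditer(), which yields Match objects instead of strings, so match positions stay available.

```python
import re

text = "Phone: 12345, Fax: 67890"

# No groups: a list of the full matched strings
print(re.findall(r'\d+', text))           # ['12345', '67890']

# Exactly one group: only that group's text is returned
print(re.findall(r'(\w+): \d+', text))    # ['Phone', 'Fax']

# Two or more groups: a list of tuples, one tuple per match
print(re.findall(r'(\w+): (\d+)', text))  # [('Phone', '12345'), ('Fax', '67890')]

# re.finditer() yields Match objects, so start/end positions are available too
for m in re.finditer(r'\d+', text):
    print(m.group(), m.start())
```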

3: Character sets and ranges `[abc]`, `[a-z]`

text = "a1 b2 c3 d4"
result = re.findall(r'[a-c]\d', text)
print(result)  # ['a1', 'b2', 'c3']

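[a-c]: matches any single character in the range a to c.
[0-9]: the digit range, commonly treated as equivalent to \d. Strictly speaking, for str patterns \d matches any Unicode decimal digit, and it is limited to [0-9] only when the re.ASCII flag is set.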
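The sketch below shows a few more character-set forms:

- A negated set `[^...]`.
- Several ranges combined in one set.
- The \d vs [0-9] difference, using the non-ASCII digit U+0663 (ARABIC-INDIC DIGIT THREE), chosen purely for illustration.

```python
import re

text = "a1 b2 c3 d4"

# A ^ right after [ negates the set: any character that is not a-c and not whitespace
print(re.findall(r'[^a-c\s]\d', text))  # ['d4']

# Several ranges can be combined inside one set
print(re.findall(r'[A-Za-z]+', "abc DEF 123"))  # ['abc', 'DEF']

# \d vs [0-9] on a non-ASCII decimal digit (U+0663)
digit = "\u0663"
print(bool(re.fullmatch(r'\d', digit)))                  # True
print(bool(re.fullmatch(r'[0-9]', digit)))               # False
print(bool(re.fullmatch(r'\d', digit, flags=re.ASCII)))  # False
```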

4: Boundary matching with `^` and `$`

text = "hello123"
print(bool(re.match(r'^\w+$', text)))  # True

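^: matches the start of the string; $: matches the end. (re.match() already anchors at the start, so ^ is redundant there but harmless.)
\w+: one or more letters, digits, or underscores.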
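Two boundary details matter in practice:

- `$` also matches just before a single trailing newline, so `^\w+$` accepts "hello123\n". re.fullmatch() is a stricter way to require that the whole string matches.
- With the re.MULTILINE flag, `^` and `$` match at every line boundary. This is useful for log text.

The log lines below are made up for illustration.

```python
import re

# $ also matches just before a trailing newline at the end of the string
print(bool(re.match(r'^\w+$', "hello123\n")))   # True
# re.fullmatch() requires the pattern to consume the entire string
print(bool(re.fullmatch(r'\w+', "hello123\n")))  # False
print(bool(re.fullmatch(r'\w+', "hello123")))    # True

# With re.MULTILINE, ^ and $ match at the start and end of every line
log = "ok line1\nerror line2\nok line3"
print(re.findall(r'^ok.*$', log, flags=re.MULTILINE))  # ['ok line1', 'ok line3']
```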

5: Groups with `()` and `group(n)`

text = "Name: John, Age: 28"
match = re.search(r'Name: (\w+), Age: (\d+)', text)
print(match.group(1))  # John
print(match.group(2))  # 28

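Parentheses () capture part of the match as a group.
.group(1) returns the text captured by the first pair of parentheses, .group(2) the second, and so on.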
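A few more Match methods are handy with groups. The sketch below reuses the example above to show:

- group(0), which is the whole match.
- groups(), which returns all captured groups at once.
- A non-capturing group `(?:...)`, which groups alternatives without taking a group number.

The "Nick" input is a made-up example.

```python
import re

text = "Name: John, Age: 28"
m = re.search(r'Name: (\w+), Age: (\d+)', text)

print(m.group(0))     # Name: John, Age: 28  (group 0 is the whole match)
print(m.groups())     # ('John', '28')
print(m.group(1, 2))  # ('John', '28')

# Captured groups are always strings, so convert explicitly when you need numbers
age = int(m.group(2))
print(age + 1)        # 29

# (?:...) groups without capturing, so (\w+) is still group 1
m2 = re.search(r'(?:Name|Nick): (\w+)', "Nick: Jo")
print(m2.group(1))    # Jo
```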

6: Named groups `(?P<name>...)`

text = "Product: Apple, Price: $99"
match = re.search(r'Product: (?P<item>\w+), Price: \$(?P<price>\d+)', text)
print(match.group('item'))   # Apple
print(match.group('price'))  # 99

(?P<name>...) defines a named group. You can then retrieve its value by name, which makes the code easier to read.
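Named groups have a few more conveniences, shown in the sketch below with the same product example:

- groupdict() collects every named group into a dictionary.
- Match objects can be indexed by group name (supported since Python 3.6).
- Named groups still get numbers.
- (?P=name) is a backreference that matches the same text the named group captured. Here it finds repeated words in a made-up sentence.

```python
import re

text = "Product: Apple, Price: $99"
m = re.search(r'Product: (?P<item>\w+), Price: \$(?P<price>\d+)', text)

print(m.groupdict())  # {'item': 'Apple', 'price': '99'}
print(m['item'])      # Apple  (indexing a Match by group name)
print(m.group(1))     # Apple  (named groups are numbered as well)

# (?P=word) must match exactly what (?P<word>...) captured, so this finds doubled words
print(re.findall(r'\b(?P<word>\w+) (?P=word)\b', "this is is a test test"))  # ['is', 'test']
```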
