Common Regular Expression Functions in Python's re Library

Python regular expressions: the basic functions and how to use the Pattern object

Latest recommended article published 2025-09-10 10:01:43

License: CC 4.0 BY-SA

Tags: #python

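This article walks through the commonly used functions of re, Python's regular expression library: re.split for splitting strings, re.match and re.search for finding a match, re.findall and re.finditer for collecting every match, re.sub for replacing matches, and the Pattern object and its methods.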

1. `re.split(pattern, string, maxsplit=0, flags=0)`

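Splits the string at each match of the pattern. If maxsplit is nonzero, at most maxsplit splits are made, so the result holds at most maxsplit+1 substrings, with the unsplit remainder of the string as the last element.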

import re  
  
text = "one,two three.four:five"  
result = re.split(r'[,\.\s:]+', text)  
print(result)  # ['one', 'two', 'three', 'four', 'five']
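The effect of maxsplit is easier to see on the same input. The following is a minimal sketch; the variable names `limited` and `with_separators` are illustrative only. It also shows that a capturing group in the pattern keeps the separators in the result.

```python
import re

text = "one,two three.four:five"

# With maxsplit=2 only the first two separators are used;
# the rest of the string comes back unsplit as the last element.
limited = re.split(r'[,\.\s:]+', text, maxsplit=2)
print(limited)  # ['one', 'two', 'three.four:five']

# A capturing group in the pattern keeps the separators in the result.
with_separators = re.split(r'([,\.\s:])', "a,b c")
print(with_separators)  # ['a', ',', 'b', ' ', 'c']
```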

2. `re.match(pattern, string, flags=0)`

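Tries to match the pattern at the beginning of the string. If the pattern does not match at position 0, match() returns None, even if it would match somewhere later in the string.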

import re  
  
text = "The rain in SPAIN stays mainly in the plain"  
match = re.match(r"The rain", text)  
if match:  
    print(match.group())  # The rain  
else:  
    print("No match")  
  
# Use re.IGNORECASE to ignore case
match = re.match(r"the rain", text, re.IGNORECASE)  
if match:  
    print(match.group())  # The rain  
else:  
    print("No match")
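To make the "start of the string" rule concrete, here is a small sketch on the same sentence. The names `not_at_start` and `m` are illustrative; the second half reads capturing groups back from the match object.

```python
import re

text = "The rain in SPAIN stays mainly in the plain"

# "rain" occurs in the text, but not at position 0, so match() returns None.
not_at_start = re.match(r"rain", text)
print(not_at_start)  # None

# Capturing groups can be read back from the match object.
m = re.match(r"(\w+) (\w+)", text)
if m:
    print(m.group(1))  # The
    print(m.group(2))  # rain
    print(m.span())    # (0, 8)
```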

3. `re.search(pattern, string, flags=0)`

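Scans through the whole string and returns a match object for the first position where the pattern matches, or None if there is no match anywhere.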

import re  
  
text = "Search a pattern within this text"  
match = re.search(r"pattern", text)  
if match:  
    print(match.group())  # pattern  
else:  
    print("No match")
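The difference from re.match can be checked directly on the same text. This is a short sketch with illustrative variable names (`found`, `first`); it also shows that only the first of several possible matches is returned.

```python
import re

text = "Search a pattern within this text"

# match() only looks at position 0, so it fails here.
print(re.match(r"pattern", text))  # None

# search() scans forward until the pattern matches.
found = re.search(r"pattern", text)
print(found.span())  # (9, 16)

# Two words start with "t" ("this" and "text"); only the first is returned.
first = re.search(r"\bt\w+", text)
print(first.group())  # this
```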

4. `re.findall(pattern, string, flags=0)`

Finds all non-overlapping matches of the pattern in the string and returns them as a list.

import re  
  
text = "Find every pattern that matches this pattern in the text"
matches = re.findall(r"pattern", text)
print(matches)  # ['pattern', 'pattern']
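What findall returns depends on how many capturing groups the pattern has. The sketch below uses a made-up input string to show the three cases, plus what "non-overlapping" means in practice.

```python
import re

text = "width=10, height=20"

# No groups: a list of whole matches.
whole = re.findall(r"\w+=\d+", text)
print(whole)  # ['width=10', 'height=20']

# One group: a list of that group's text.
values = re.findall(r"\w+=(\d+)", text)
print(values)  # ['10', '20']

# Two or more groups: a list of tuples, one per match.
pairs = re.findall(r"(\w+)=(\d+)", text)
print(pairs)  # [('width', '10'), ('height', '20')]

# Matches do not overlap: scanning resumes after the end of each match.
non_overlapping = re.findall(r"aa", "aaaaa")
print(non_overlapping)  # ['aa', 'aa']
```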

5. `re.finditer(pattern, string, flags=0)`

Returns an iterator that yields a match object for each non-overlapping match of the pattern in the string. For each match, match.group() returns the matched text.

import re  
  
text = "Find all occurrences of the word 'apple' in this text"  
pattern = r"apple"  
matches = re.finditer(pattern, text)  
for match in matches:  
    print(match.group(), "found at position", match.start())
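Because finditer yields match objects rather than plain strings, positions are available for every hit. A minimal sketch with an illustrative input containing several occurrences:

```python
import re

text = "apple pie, apple tart, pineapple"

# Collect the matched text together with its start and end offsets.
positions = [(m.group(), m.start(), m.end()) for m in re.finditer(r"apple", text)]
print(positions)
# [('apple', 0, 5), ('apple', 11, 16), ('apple', 27, 32)]
```

Note that the last hit is inside "pineapple"; adding `\b` around the pattern would restrict matches to whole words.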

6. `re.sub(pattern, repl, string, count=0, flags=0)`

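Replaces all matches of the pattern in the string with repl (or only the first count matches, if count is nonzero) and returns the modified string.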

import re  
  
text = "Replace foo with bar in this text"  
result = re.sub(r"foo", "bar", text)  
print(result)  # Replace bar with bar in this text  
  
# Replace only the first match (pass count as a keyword; positional use is deprecated since Python 3.13)
result = re.sub(r"bar", "baz", result, count=1)
print(result)  # Replace baz with bar in this text
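repl does not have to be a fixed string. The sketch below shows two other forms: a replacement template with group backreferences and a replacement function. It also shows re.subn, which additionally reports the number of replacements. The helper name `double` and the sample strings are illustrative.

```python
import re

# Swap "first last" into "last, first" using group backreferences.
swapped = re.sub(r"(\w+) (\w+)", r"\2, \1", "Ada Lovelace")
print(swapped)  # Lovelace, Ada

# repl may be a function that receives the match object and returns a string.
def double(match):
    return str(int(match.group()) * 2)

doubled = re.sub(r"\d+", double, "3 apples and 10 pears")
print(doubled)  # 6 apples and 20 pears

# subn() returns (new_string, number_of_replacements).
counted = re.subn(r"foo", "bar", "foo foo foo", count=2)
print(counted)  # ('bar bar foo', 2)
```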

7. `re.compile(pattern, flags=0)`

Compiles a regular expression and returns a Pattern object, whose methods such as match() and search() can then be used for matching.

import re  
  
pattern = re.compile(r'\d+')  # match one or more digits
text = "The sequence is 1, 2, 3, and so on."  
matches = pattern.findall(text)  
print(matches)  # ['1', '2', '3']
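Compiling is mostly useful when the same pattern is applied many times. The following sketch reuses one compiled pattern across several lines. The names `word`, `lines` and `hits` are illustrative; the flags given to re.compile are stored on the Pattern object.

```python
import re

# Flags are fixed at compile time and apply to every later call.
word = re.compile(r"python", re.IGNORECASE)

lines = ["Python is fun", "I like PYTHON", "no snakes here"]
hits = [line for line in lines if word.search(line)]
print(hits)  # ['Python is fun', 'I like PYTHON']

# The original pattern string and flags can be inspected afterwards.
print(word.pattern)                      # python
print(bool(word.flags & re.IGNORECASE))  # True
```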

8. `re.escape(pattern)`

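Escapes every character in the string that could be interpreted as a regular expression operator, so the result matches the original text literally.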

import re  
  
text = "This is a string with special chars: .^$*+?{}\\|[]"  
escaped_text = re.escape(text)  
# Output on Python 3.7+ (older versions also escaped ':')
print(escaped_text)  # This\ is\ a\ string\ with\ special\ chars:\ \.\^\$\*\+\?\{\}\\\|\[\]
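A typical reason to call re.escape is searching for text that comes from outside the program and must be matched literally. This is a minimal sketch; `user_input` stands in for such a value, and the sample text is made up.

```python
import re

user_input = "1+1=2"
text = "We know that 1+1=2, but 11=2 is wrong."

# Unescaped, "+" is a quantifier: the pattern means "one or more 1s, then 1=2".
unescaped_hits = re.findall(user_input, text)
print(unescaped_hits)  # ['11=2']

# Escaped, every character is matched literally.
escaped_hits = re.findall(re.escape(user_input), text)
print(escaped_hits)  # ['1+1=2']
```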

9. `re.purge()`

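Clears the regular expression cache. To improve performance, the re module caches the compiled form of recently used patterns, whether they were passed to re.compile() or to module-level functions such as re.search(). re.purge() empties this cache, but in most cases you do not need to call it yourself.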

import re  
  
# Suppose you have already compiled many regular expressions
# ...

# Now clear the cache
re.purge()

10. Methods of the `Pattern` object

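Calling re.compile() on a regular expression returns a Pattern object. This object has methods of its own, such as match(), search(), findall(), finditer(), sub() and subn(). They behave like the re module functions of the same name, but operate on the already compiled pattern, so the pattern argument is omitted.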

import re  
  
pattern = re.compile(r'\d+')  # match one or more digits
text = "The numbers are 123, 456, and 789"

# Use the Pattern object's methods
match = pattern.match(text)  # match only at the start of the string
if match:
    print(match.group())
else:
    print("No match")  # printed: the text starts with "The", not a digit

search = pattern.search(text)  # search the whole string
if search:
    print(search.group())  # 123

findall = pattern.findall(text)  # find all matches
print(findall)  # ['123', '456', '789']

# Use sub() to replace the matches
replaced_text = pattern.sub('NUMBER', text)
print(replaced_text)  # The numbers are NUMBER, NUMBER, and NUMBER
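Pattern methods also accept an optional pos argument that the module-level functions lack, which lets match() start somewhere other than index 0. The sketch below shows this together with subn() and finditer() on the same text; the names `start`, `at_pos`, `counted` and `spans` are illustrative.

```python
import re

pattern = re.compile(r'\d+')
text = "The numbers are 123, 456, and 789"

# pos tells match() where to begin; here it is the index where "123" starts.
start = text.index("123")
at_pos = pattern.match(text, start)
print(at_pos.group())  # 123

# subn() returns the new string and the number of replacements.
counted = pattern.subn('NUMBER', text)
print(counted)  # ('The numbers are NUMBER, NUMBER, and NUMBER', 3)

# finditer() yields match objects, so positions are available.
spans = [found.span() for found in pattern.finditer(text)]
print(spans)  # [(16, 19), (21, 24), (30, 33)]
```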