Applications of the Python re Module (Part 1)

Article link: https://blog.youkuaiyun.com/msssssss/article/details/103834375

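A regular expression is a common tool for string processing, typically used to search for or replace text that matches a given pattern. The re module is widely used in Python, for example to filter and extract data from web pages. The re module has been part of the standard library since Python 1.5 and provides Perl-style regular expression patterns. To use it, simply import the module with the following statement: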

import re

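In Python, the re library is mainly used in two ways:

Extracting data that follows a given text format, for example collecting every phone number on a web page
Replacing text: first locate specific text in the source data, then replace it with the desired content or with an empty string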

Commonly used functions in Python's re library

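Unless stated otherwise, the parameters below have the following meanings:

pattern: the regular expression to match
string: the source text to match against
flags: flags that control how the regular expression is matched, such as case sensitivity or multi-line matching.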
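To make the effect of flags more concrete, here is a minimal sketch using two flags from the standard library, re.IGNORECASE and re.MULTILINE. The sample text is made up for illustration.

```python
import re

# Hypothetical sample text spanning two lines.
text = "Hello\nhello world"

# No flags: matching is case-sensitive, so only the lowercase word is found.
plain = re.findall(r"hello", text)

# re.IGNORECASE: case is ignored, so both words are found.
ignore_case = re.findall(r"hello", text, flags=re.IGNORECASE)

# Without re.MULTILINE, ^ only matches at the very start of the string.
start_only = re.findall(r"^hello", text, flags=re.IGNORECASE)

# With re.MULTILINE, ^ also matches right after each newline.
every_line = re.findall(r"^hello", text, flags=re.IGNORECASE | re.MULTILINE)

print(plain)        # ['hello']
print(ignore_case)  # ['Hello', 'hello']
print(start_only)   # ['Hello']
print(every_line)   # ['Hello', 'hello']
```

Flags are combined with the bitwise OR operator, as in the last call.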

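1. The re.match function
re.match tries to match a pattern at the start of the string. If the pattern does not match at the starting position, match() returns None. Its usage is: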

re.match(pattern, string, flags=0)

On success, re.match returns a match object; otherwise it returns None.
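As a small illustration of what can be done with the returned match object, the sketch below reads the matched text, the captured groups, and the match position. The date-like input string is an invented example.

```python
import re

# Hypothetical input: a date prefix followed by other text.
m = re.match(r"(\d{4})-(\d{2})", "2020-01-04 blog post")

# Always check for None before using the match object.
if m is not None:
    whole = m.group(0)  # the entire matched text
    year = m.group(1)   # first capturing group
    month = m.group(2)  # second capturing group
    span = m.span()     # (start, end) indices of the match
    print(whole, year, month, span)  # 2020-01 2020 01 (0, 7)

# The digits are not at the start of the string, so re.match fails.
no_match = re.match(r"\d{4}", "posted 2020")
print(no_match)  # None
```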

2. The re.search function

re.search scans through the whole string and returns the first successful match.

re.search(pattern, string, flags=0)

On success, re.search returns a match object; otherwise it returns None.
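A brief sketch of re.search on made-up text: it finds the first occurrence anywhere in the string and ignores later ones. The phone-number format here is an assumption chosen only for the example.

```python
import re

# Hypothetical text containing two phone-like numbers.
text = "Call 555-1234 or 555-9876"

first = re.search(r"\d{3}-\d{4}", text)
if first is not None:
    found = first.group()  # text of the first match only
    start = first.start()  # index where that match begins
    print(found, start)    # 555-1234 5

# No digits at all: re.search returns None.
missing = re.search(r"\d{3}-\d{4}", "no numbers here")
print(missing)  # None
```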

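3. The re.findall function
Finds all substrings of the string that match the regular expression and returns them as a list. If no match is found, an empty list is returned.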

re.findall(pattern, string, flags=0)
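One detail worth knowing is that the shape of the returned list depends on the capturing groups in the pattern. The sketch below, on an invented key=value string, shows the three cases.

```python
import re

# Hypothetical input with two key=value pairs.
text = "alice=30, bob=25"

# No groups: each item is the whole matched substring.
whole = re.findall(r"\w+=\d+", text)

# One group: each item is the text of that group only.
ages = re.findall(r"\w+=(\d+)", text)

# Several groups: each item is a tuple with one entry per group.
pairs = re.findall(r"(\w+)=(\d+)", text)

print(whole)  # ['alice=30', 'bob=25']
print(ages)   # ['30', '25']
print(pairs)  # [('alice', '30'), ('bob', '25')]
```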

4. The re.sub function
Used for string replacement.

re.sub(pattern, repl, string, count=0, flags=0)

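Returns the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string with repl. If the pattern is not found, string is returned unchanged. repl can be a string or a function. If count is nonzero, at most count replacements are made.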
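The following sketch shows the three variations described above: a plain string replacement, a limited number of replacements via count, and a function as repl. The input string and the helper name double are made up for illustration.

```python
import re

text = "a1b22c333"  # hypothetical input

# Replace every run of digits with '#'.
all_replaced = re.sub(r"\d+", "#", text)

# count=1 replaces only the leftmost match.
first_only = re.sub(r"\d+", "#", text, count=1)

# repl may be a function: it receives the match object and
# returns the replacement string.
def double(m):
    return str(int(m.group()) * 2)

doubled = re.sub(r"\d+", double, text)

print(all_replaced)  # a#b#c#
print(first_only)    # a#b22c333
print(doubled)       # a2b44c666
```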

5. The re.split function
Used for splitting strings.

re.split(pattern, string, maxsplit=0, flags=0)

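Splits string by the occurrences of pattern. If capturing parentheses are used in pattern, the text of all groups is also included in the resulting list. If maxsplit is nonzero, at most maxsplit splits are made, and the remainder of the string is returned as the final element of the list.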
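A short sketch of the behaviours just described, using an invented string with mixed separators: a plain split, a split that keeps the separators through a capturing group, and a split limited by maxsplit.

```python
import re

text = "a, b;c"  # hypothetical input with two kinds of separators

# Split on a comma or semicolon followed by optional whitespace.
parts = re.split(r"[,;]\s*", text)

# With a capturing group, the separators are kept in the result.
with_seps = re.split(r"([,;])\s*", text)

# maxsplit=1 stops after the first split; the rest stays together.
limited = re.split(r"[,;]\s*", text, maxsplit=1)

print(parts)      # ['a', 'b', 'c']
print(with_seps)  # ['a', ',', 'b', ';', 'c']
print(limited)    # ['a', 'b;c']
```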

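6. re.compile
Compiles a regular expression pattern into a regular expression object (a pattern object) that can then be used for matching. If the same expression is used many times in a program, calling re.compile() once and keeping the resulting object for reuse makes the program more efficient.
For example: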

prog = re.compile(pattern)
result = prog.match(string)

is equivalent to

result = re.match(pattern, string)
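As a sketch of the reuse described above, the example below compiles one pattern and applies it to several made-up lines. It also uses the optional pos argument, which the methods of a compiled pattern accept but the module-level functions do not.

```python
import re

# Compile once; flags are passed at compile time.
word = re.compile(r"[a-z]+", re.IGNORECASE)

# Hypothetical lines of input.
lines = ["Hello world", "123", "re module"]

# Reuse the same compiled object for every line.
counts = [len(word.findall(line)) for line in lines]
print(counts)  # [2, 0, 2]

# Pattern.match takes an optional start position (pos).
m = word.match("123abc", 3)
tail = m.group() if m is not None else None
print(tail)  # abc
```

Note that the re module also caches recently used patterns internally, so the gain from compiling by hand is usually modest for programs that use only a few expressions. That cache is what re.purge() clears.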

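Other functions:
re.purge() clears the regular expression cache.
re.escape(pattern) escapes the special characters in pattern. This is useful when you want to match an arbitrary literal string that may contain regular expression metacharacters. For example: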

print(re.escape('http://www.python.org'))
# http://www\.python\.org  (Python 3.7+; earlier versions also escape ':' and '/')
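To show why escaping matters, the sketch below searches for a URL inside a made-up string. Without re.escape, each '.' in the URL acts as a wildcard and also matches an unintended lookalike.

```python
import re

url = "http://www.python.org"
# Hypothetical text: the real URL plus a lookalike with other characters
# in place of the dots.
text = "see http://www.python.org and http://wwwXpythonYorg"

# Unescaped: '.' matches any character, so both candidates are found.
loose = re.findall(url, text)

# Escaped: '.' is treated literally, so only the exact URL is found.
strict = re.findall(re.escape(url), text)

print(loose)   # ['http://www.python.org', 'http://wwwXpythonYorg']
print(strict)  # ['http://www.python.org']
```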

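Notes

Difference between re.match and re.search

re.match only matches at the beginning of the string: if the start of the string does not fit the regular expression, the match fails and the function returns None. re.search scans the whole string until it finds a match.

Difference between re.findall and re.match / re.search

re.findall returns every non-overlapping match in the string, scanning through to the end of the text, whereas re.match and re.search stop as soon as the first match is found.

Test code for the functions above:
The original code is available on my GitHub page.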

import re
pattern_1 = 'abcd'
pattern_2 = 'saih'
string = 'saih-dwohaf-abcd-oergwheihrfgqhdpfcshdshu-abcd-wfe1wr56145102154'

match_1 = re.match(pattern_1,string)
match_2 = re.match(pattern_2,string)

print(match_1)
print(match_2)

'''
Output (Python 3.6 and earlier; Python 3.7+ prints <re.Match object; ...> instead):
None
<_sre.SRE_Match object; span=(0, 4), match='saih'>
'''

search_1 = re.search(pattern_1,string)
search_2 = re.search(pattern_2,string)
print(search_1)
print(search_2)

'''
Output:
<_sre.SRE_Match object; span=(12, 16), match='abcd'>
<_sre.SRE_Match object; span=(0, 4), match='saih'>
'''

findall_ = re.findall(pattern_1,string)
print(findall_)

'''
Output:
['abcd', 'abcd']
'''
repl = 'dcba'
string_2 = re.sub(pattern_1,repl,string)
print(string_2)

'''
Output:
saih-dwohaf-dcba-oergwheihrfgqhdpfcshdshu-dcba-wfe1wr56145102154
'''

split_ = re.split(pattern_1,string)
print(split_)

'''
Output:
['saih-dwohaf-', '-oergwheihrfgqhdpfcshdshu-', '-wfe1wr56145102154']
'''

prog = re.compile(pattern_1)
result_1= prog.search(string)
print(result_1)
result_2 = prog.findall(string)
print(result_2)

'''
Output:
<_sre.SRE_Match object; span=(12, 16), match='abcd'>
['abcd', 'abcd']
'''