Python regular expressions: practical essentials

Taonny

Copyright: CC 4.0 BY-SA

Tags: regular expressions, python, programming languages

Article link: https://blog.youkuaiyun.com/weixin_44940593/article/details/126950546

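This article explains how to use regular expressions in Python: the special matching characters, the matching flags and the commonly used functions such as findall, finditer, search and match, with examples of matching, searching and replacing text in strings.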


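Matching characters in re
.     matches any single character except a newline
\d    matches a digit, i.e. 0-9 (with str patterns, any Unicode decimal digit)
\D    matches any character that is not a digit
\s    matches a whitespace character, e.g. a space, a tab or a newline
\S    matches any character that is not whitespace
\w    matches a word character: a letter, a digit or the underscore
\W    matches any character that is not a word character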
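A quick way to see these character classes in action is to run them over one string. This is a small sketch; the sample text is made up purely for illustration.

```python
import re

# Hypothetical sample text with letters, digits, an underscore, spaces and a tab
text = "Order 66 shipped\tto Ann_B at 9am"

digits = re.findall(r"\d", text)       # every single digit
words = re.findall(r"\w+", text)       # runs of word characters (letters, digits, _)
spaces = re.findall(r"\s", text)       # whitespace: five spaces and one tab
only_digits = re.sub(r"\D", "", text)  # delete every non-digit character
dot_newline = re.match(r".", "\nx")    # "." does not match a newline

print(digits)       # ['6', '6', '9']
print(words)        # ['Order', '66', 'shipped', 'to', 'Ann_B', 'at', '9am']
print(len(spaces))  # 6
print(only_digits)  # 669
print(dot_newline)  # None
```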

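?      matches the preceding element 0 or 1 time
*      matches the preceding element 0 or more times
+      matches the preceding element 1 or more times
{m}    matches the preceding expression exactly m times
{m,}   matches the preceding expression at least m times
{,n}   matches the preceding expression at most n times
{m,n}  matches the preceding expression at least m and at most n times
()     captures the content inside the parentheses as a group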
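The quantifiers and the group syntax can be checked with a few one-liners. A minimal sketch; all the input strings are invented examples.

```python
import re

# ?    : the "u" is optional (0 or 1 time)
optional = re.findall(r"colou?r", "color colour")
# {3}  : exactly three digits; {2,3}: two to three digits, as many as possible
exact = re.findall(r"\d{3}", "12 345 6789")
ranged = re.findall(r"\d{2,3}", "1 22 333 4444")
# + needs at least one b, * also accepts none
plus = re.findall(r"ab+", "a ab abb")
star = re.findall(r"ab*", "a ab abb")
# () captures parts of the match as numbered groups
m = re.search(r"(\d+)-(\d+)", "call 010-12345 now")

print(optional)                            # ['color', 'colour']
print(exact)                               # ['345', '678']
print(ranged)                              # ['22', '333', '444']
print(plus, star)                          # ['ab', 'abb'] ['a', 'ab', 'abb']
print(m.group(0), m.group(1), m.group(2))  # 010-12345 010 12345
```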
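Matching flags of the re module
re.I    (re.IGNORECASE) ignore case
re.L    (re.LOCALE) make \w, \W, \b, \B and case-insensitive matching depend on the current locale; in Python 3 this flag can only be used with bytes patterns
re.M    (re.MULTILINE) multiline mode: ^ and $ also match at the start and end of every line, not only of the whole string
re.S    (re.DOTALL) make . match any character including a newline (by default . matches anything except a newline)
re.U    (re.UNICODE) make \w, \W, \b, \B, \d, \D, \s, \S depend on the Unicode character database; in Python 3 this is already the default for str patterns
re.X    (re.VERBOSE) for readability, ignore unescaped whitespace in the pattern and treat text after an unescaped # as a comment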
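The sketch below exercises re.I, re.M, re.S and re.X on a made-up two-line string, and shows that flags can be combined with `|`. re.L is left out because it only works with bytes patterns.

```python
import re

text = "Apple pie\napple tart"  # made-up sample text, two lines

ignore_case = re.findall(r"apple", text, re.I)      # both spellings match
string_start = re.findall(r"^\w+", text)            # ^ only at the start of the string
line_starts = re.findall(r"^\w+", text, re.M)       # ^ at the start of every line
no_dotall = re.search(r"pie.apple", text)           # None: . does not cross the newline
dotall = re.search(r"pie.apple", text, re.S)        # . now matches the newline too

verbose = re.compile(r"""
    (\w+)   # first word
    \s+     # whitespace between the words
    (\w+)   # second word
""", re.X)
first_two = verbose.match(text).groups()

combined = re.findall(r"^a\w+", text, re.I | re.M)  # flags combine with |

print(ignore_case)          # ['Apple', 'apple']
print(string_start)         # ['Apple']
print(line_starts)          # ['Apple', 'apple']
print(no_dotall)            # None
print(repr(dotall.group())) # 'pie\napple'
print(first_two)            # ('Apple', 'pie')
print(combined)             # ['Apple', 'apple']
```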
Some functions of the re module
Example: find all words in the string s that contain the letter a
s = "life can be dreams,life can be great thoughts,life can mean a person, sitting in a court,address"

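1) re.compile()
The compile function compiles a regular expression into a Pattern object. It is typically used like this:
re.compile(pattern, flags=0)
Here pattern is the regular expression written as a string, and flags is an optional argument that sets the matching mode, such as ignoring case or multiline mode.
e.g.:
import re
pattern = re.compile(r'\d+')
Once the regular expression has been compiled into a Pattern object, we can use the Pattern object's methods to match and search text.
Commonly used Pattern methods: findall, finditer, search, match, sub, subn, split. Each of them also exists as a module-level function in re that takes the pattern as its first argument, as the "Or:" variants below show.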
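As a minimal sketch of why compiling can help: one Pattern object is reused across several inputs, and it remembers its own pattern string and flags. The input lines are invented sample data.

```python
import re

# Compile once and reuse the Pattern object (made-up input lines)
number = re.compile(r"\d+")
lines = ["id=42 size=7", "no numbers here", "port 8080"]
found = [number.findall(line) for line in lines]
print(found)                        # [['42', '7'], [], ['8080']]

# A compiled Pattern keeps its source string and flags
word_a = re.compile(r"\w*a\w*", re.I)
print(word_a.pattern)               # \w*a\w*
print((word_a.flags & re.I) != 0)   # True

# The method call and the module-level call give the same result
compiled_result = word_a.findall("Apple and Banana")
module_result = re.findall(r"\w*a\w*", "Apple and Banana", re.I)
print(compiled_result == module_result)  # True
```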
1. re.findall()
  Finds every substring that the regular expression matches and returns them in a list; if nothing matches, it returns an empty list.
  
  r = re.compile(r'\w*[a]\w*')
  m = r.findall(s)
  print(m)
  # Output: ['can', 'dreams', 'can', 'great', 'can', 'mean', 'a', 'a', 'address']
  Or:
  m = re.findall(r'\w*[a]\w*', s)
  print(m)
  # Output: ['can', 'dreams', 'can', 'great', 'can', 'mean', 'a', 'a', 'address']
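What findall returns depends on the capturing groups in the pattern: with no group you get whole matches, with one group only that group's text, and with several groups a list of tuples. A short sketch on the same sentence s:

```python
import re

s = "life can be dreams,life can be great thoughts,life can mean a person, sitting in a court,address"

no_group = re.findall(r"\w*a\w*", s)          # whole matches
one_group = re.findall(r"(\w+) can", s)       # only the captured word before " can"
two_groups = re.findall(r"(\w+) can (\w+)", s)  # one tuple per match
nothing = re.findall(r"xyz", s)               # no match: empty list

print(no_group[:3])   # ['can', 'dreams', 'can']
print(one_group)      # ['life', 'life', 'life']
print(two_groups)     # [('life', 'be'), ('life', 'be'), ('life', 'mean')]
print(nothing)        # []
```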
2. re.finditer()
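  Like findall, it finds every substring that the regular expression matches, but returns them as an iterator of Match objects instead of a list.
  
  r = re.compile(r'\w*[a]\w*')
  m2 = r.finditer(s)  # m2 is an iterator, e.g. <callable_iterator object at 0x0000022F012B22C8>
  for m in m2:
      print(m.group())
  # Output, one match per line: can, dreams, can, great, can, mean, a, a, address
  Or:
  m2 = re.finditer(r'\w*[a]\w*', s)
  for m in m2:
      print(m.group())
  The output is the same as above.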
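Because finditer yields Match objects, each item also carries its position, and the iterator is lazy and can be consumed only once. A small sketch with a made-up string:

```python
import re

text = "cat bat rat"

# Each Match object knows where it was found
spans = [(m.group(), m.span()) for m in re.finditer(r"\wat", text)]
print(spans)   # [('cat', (0, 3)), ('bat', (4, 7)), ('rat', (8, 11))]

# finditer is lazy: once an item is read, it is gone from the iterator
it = re.finditer(r"\wat", text)
first = next(it).group()
rest = [m.group() for m in it]
print(first, rest)   # cat ['bat', 'rat']
```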
3. re.search()
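  Scans through the whole string and returns the first successful match as a Match object; if there is no match, it returns None.
  
  r = re.compile(r'\w*[a]\w*')
  m = r.search(s)
  print(m.group())
  # Output: can
  Or:
  m = re.search(r'\w*[a]\w*', s)
  print(m.group())
  # Output: can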
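Since search returns None when nothing matches, calling `.group()` directly on the result fails in that case. A minimal sketch of the usual None check, wrapped in a hypothetical helper `first_word_with` (not part of re):

```python
import re

def first_word_with(letter, text):
    """Hypothetical helper: return the first word in text containing letter, or None."""
    m = re.search(r"\w*" + re.escape(letter) + r"\w*", text)
    if m is None:          # search() returns None when nothing matches
        return None
    return m.group()

hit = first_word_with("a", "life can be dreams")
miss = first_word_with("z", "life can be dreams")
print(hit)    # can
print(miss)   # None
```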
4. re.match()
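  Tries to match only at the start of the string: on success it returns a Match object, on failure it returns None.
  
  m = r.match(s)  # "life" at position 0 contains no a, so the match fails
  print(m)
  # Output: None
  
  If the string starts with a word containing a instead:
  
  s = 'happy life can be '
  m = r.match(s)
  print(m)
  print(m.group())
  # Output:
  # <re.Match object; span=(0, 5), match='happy'>
  # happy
  m = re.match(r'\w*[a]\w*', s)
  print(m.group())
  # Output: happy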
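The difference between match and search is easiest to see side by side on the same string. This sketch also uses the optional `pos` argument of `Pattern.match`, which starts matching at a given index instead of 0.

```python
import re

s = "life can be dreams"
pattern = re.compile(r"\w*a\w*")

no_match = pattern.match(s)             # "life" at position 0 has no a
found = pattern.search(s).group()       # search scans forward to "can"
from_pos = pattern.match(s, 5).group()  # index 5 is where "can" begins
anchored = re.search(r"^\w*a\w*", s)    # ^ makes search behave like match here

print(no_match)   # None
print(found)      # can
print(from_pos)   # can
print(anchored)   # None
```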
5. re.sub()
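  sub is short for "substitute": it replaces the matched substrings.
  Syntax: re.sub(pattern, repl, string, count=0, flags=0)
  Parameters:
  pattern: required, the regular expression pattern string
  repl: required, the replacement; either a string or a function (the function receives each Match object and must return a string)
  string: required, the string in which the replacements are made
  count: optional, the maximum number of replacements; it must be a non-negative integer. If it is omitted or set to 0, every match is replaced.
  flags: optional, flags that control how the pattern matches, e.g. case-insensitive or multiline matching
  
  The examples below use the original sentence s again:
  
  m = r.sub('may', s, count=1)  # replace only the first match
  print(m)
  # Output: life may be dreams,life can be great thoughts,life can mean a person, sitting in a court,address
  Or:
  m = re.sub(r'\w*[a]\w*', 'may', s, count=2)  # count=2, so two matches are replaced
  print(m)
  # Output: life may be may,life can be great thoughts,life can mean a person, sitting in a court,address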
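The repl argument can also be a function, and a replacement string can refer to captured groups with `\1`, `\2`. A short sketch on the example sentence; the uppercasing is just an illustrative choice.

```python
import re

s = "life can be dreams,life can be great thoughts,life can mean a person, sitting in a court,address"

# repl as a function: it gets each Match object and returns the replacement text
shout = re.sub(r"\w*a\w*", lambda m: m.group().upper(), s, count=3)
print(shout)
# life CAN be DREAMS,life CAN be great thoughts,life can mean a person, sitting in a court,address

# Backreferences: \1 and \2 in the replacement refer to the captured groups
swapped = re.sub(r"(\w+) can (\w+)", r"\2 can \1", "life can be")
print(swapped)   # be can life
```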
6. re.subn()
  Behaves like sub(), but returns a tuple (new string, number of replacements).
  
  r = re.compile(r'\w*[a]\w*')
  m = r.subn('may', s, count=2)
  print(m)
  # Output: ('life may be may,life can be great thoughts,life can mean a person, sitting in a court,address', 2)
  Or:
  m = re.subn(r'\w*[a]\w*', 'may', s, count=1)
  print(m)
  # Output: ('life may be dreams,life can be great thoughts,life can mean a person, sitting in a court,address', 1)
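Unpacking the tuple from subn gives the replacement count directly, which is handy for checking whether anything changed. A minimal sketch on a made-up date string:

```python
import re

date = "2022-09-20"  # made-up sample value

new_date, n = re.subn(r"-", "/", date)
print(new_date, n)          # 2022/09/20 2

# A count of 0 means nothing matched, without comparing strings
unchanged, n_none = re.subn(r"x", "y", date)
print(unchanged, n_none)    # 2022-09-20 0
```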
7. re.split()
  Splits the string at every match and returns the pieces as a list.
  
  r = re.compile(r'\w*[a]\w*')
  m = r.split(s, maxsplit=1)  # split at the first match only
  print(m)
  # Output: ['life ', ' be dreams,life can be great thoughts,life can mean a person, sitting in a court,address']
  Or:
  m = re.split(r'\w*[a]\w*', s)  # maxsplit is not set, so it defaults to 0: split at every match
  print(m)
  # Output: ['life ', ' be ', ',life ', ' be ', ' thoughts,life ', ' ', ' ', ' person, sitting in ', ' court,', '']
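A more common use of split is cutting text on several delimiters at once. This sketch splits on commas or runs of whitespace, and shows that a capturing group in the pattern keeps the separators in the result.

```python
import re

s = "life can be dreams,life can be great thoughts"

# Split on commas or runs of whitespace (a character class with +)
parts = re.split(r"[,\s]+", s)
print(parts)
# ['life', 'can', 'be', 'dreams', 'life', 'can', 'be', 'great', 'thoughts']

# With a capturing group, the separators are kept in the result
kept = re.split(r"(,)", "a,b,c")
print(kept)   # ['a', ',', 'b', ',', 'c']
```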
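Greedy and non-greedy matching in Python
Quantifiers in Python are greedy by default (in a few languages they can be non-greedy by default): they always try to match as many characters as possible. Non-greedy quantifiers do the opposite and match as few characters as possible.
For example:

import re
s = 'abbbc'
res = re.findall(r'ab*', s)  # greedy by default
print(res)
# Output: ['abbb']
res = re.findall(r'ab*?', s)  # adding ? after * makes it non-greedy
print(res)
# Output: ['a']

Note: non-greedy quantifiers are commonly used for extracting text, because they stop at the first occurrence of the closing delimiter instead of the last one.
Appending ? to *, ?, + or {m,n} (giving *?, ??, +?, {m,n}?) turns greedy matching into non-greedy matching.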
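The extraction case is where the difference matters most. A minimal sketch on a made-up HTML-like snippet, followed by the other non-greedy quantifiers:

```python
import re

html = "<b>bold</b> and <b>more</b>"  # made-up snippet

greedy = re.findall(r"<b>(.*)</b>", html)   # .* runs on to the LAST </b>
lazy = re.findall(r"<b>(.*?)</b>", html)    # .*? stops at the FIRST </b>
print(greedy)   # ['bold</b> and <b>more']
print(lazy)     # ['bold', 'more']

# The other non-greedy forms: +?, ?? and {m,n}?
plus_lazy = re.match(r"a+?", "aaa").group()      # one a is enough
q_lazy = re.match(r"a??", "aaa").group()         # zero is preferred: empty string
range_lazy = re.match(r"a{2,3}?", "aaa").group() # the lower bound wins
print(plus_lazy, repr(q_lazy), range_lazy)       # a '' aa
```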