Python Study Notes 4: Sections 7.4-7.11, Finding Text with Regular Expressions

Latest recommended article published on 2025-12-05 17:02:52

License: CC 4.0 BY-SA

Tags: #python

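This post looks at further uses of Python regular expressions: greedy and non-greedy matching, the findall() method, character classes, the wildcard character, and other special characters, with examples showing how each is used to search text.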

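7.4 Greedy and Non-Greedy Matching
7.5 The findall() Method
7.6 Character Classes
7.7 Making Your Own Character Classes
7.8 The Caret and Dollar Sign Characters
7.9 The Wildcard Character
- 7.9.1 Matching Everything with Dot-Star
- 7.9.2 Matching Newlines with the Dot Character
7.11 Case-Insensitive Matching
Source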

These are notes I took while learning Python programming. They are intended for study and discussion only.

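7.4 Greedy and Non-Greedy Matching

Python regular expressions are greedy by default: when a pattern could match strings of several lengths, it matches the longest one possible. For the non-greedy version, which matches the shortest string possible, put a question mark after the closing curly brace.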

>>> import re
>>> greedyHaRegex=re.compile(r'(Ha){3,5}')
>>> mo1=greedyHaRegex.search('HaHaHaHaHa')
>>> mo1.group()
'HaHaHaHaHa'
>>> nongreedyHaRegex=re.compile(r'(Ha){3,5}?')
>>> mo2=nongreedyHaRegex.search('HaHaHaHaHa')
>>> mo2.group()
'HaHaHa'
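
As a minimal sketch of the same idea outside the REPL, the script below compares the greedy and non-greedy forms side by side. The `\d+` and `\d+?` pair is my own addition (not from the book), included on the assumption that it helps to see the `?` suffix applied to a quantifier other than curly braces.

```python
import re

text = 'HaHaHaHaHa'

# Greedy: {3,5} takes as many repetitions as it can (5 here).
greedy = re.search(r'(Ha){3,5}', text).group()
# Non-greedy: {3,5}? stops at the minimum number of repetitions (3).
lazy = re.search(r'(Ha){3,5}?', text).group()

# The same ? suffix also works after the + quantifier.
digits_greedy = re.search(r'\d+', '12345').group()
digits_lazy = re.search(r'\d+?', '12345').group()

print(greedy, lazy)                # HaHaHaHaHa HaHaHa
print(digits_greedy, digits_lazy)  # 12345 1
```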

7.5 The findall() Method

search() returns a Match object for only the first occurrence of the matching text.

>>> phoneNumRegex=re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
>>> mo=phoneNumRegex.search('Cell:415-555-9999,work:215-555-0000')
>>> mo.group()
'415-555-9999'

findall() returns every match in the searched text. When the regex has no groups, the result is a list of strings.

>>> phoneNumRegex=re.compile(r'\d\d\d-\d\d\d-\d\d\d\d') #has no groups
>>> phoneNumRegex.findall('Cell:415-555-9999,work:215-555-0000')
['415-555-9999', '215-555-0000']

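If the regex has groups, findall() returns a list of tuples instead. Each tuple represents one match and holds one string per group. (With exactly one group, the result is a list of that group's strings rather than one-element tuples.)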

>>> phoneNumRegex=re.compile(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)') #has groups
>>> phoneNumRegex.findall('Cell:415-555-9999 work:212-555-0000')
[('415', '555', '9999'), ('212', '555', '0000')]
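
The following sketch shows two cases the transcripts above do not cover: a pattern with a single group, and a way to get the full match together with its groups. It uses `re.finditer()`, a standard `re` function that these notes do not otherwise discuss, so treat that part as an aside rather than as the book's approach.

```python
import re

text = 'Cell:415-555-9999 work:212-555-0000'

# One group: findall() returns a list of strings holding that group only.
area_codes = re.findall(r'(\d\d\d)-\d\d\d-\d\d\d\d', text)

# finditer() yields Match objects, so the full match and each group are both available.
numbers = [(m.group(), m.group(1))
           for m in re.finditer(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)', text)]

print(area_codes)  # ['415', '212']
print(numbers)     # [('415-555-9999', '415'), ('212-555-0000', '212')]
```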

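7.6 Character Classes

Shorthand character class	Represents
\d	Any digit from 0 to 9
\D	Any character that is not a digit from 0 to 9
\w	Any letter, digit, or the underscore character (think of this as matching "word" characters)
\W	Any character that is not a letter, digit, or the underscore character
\s	Any space, tab, or newline character (think of this as matching "space" characters)
\S	Any character that is not a space, tab, or newline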

>>> xmasRegex=re.compile(r'\d+\s\w+')
>>> xmasRegex.findall('12 drummers,11 pipers,10 lords,9 ladies,8 maids,7 swans,6 geese,5 rings,4 birds,3 hen,2 doves,1 partridge')
['12 drummers', '11 pipers', '10 lords', '9 ladies', '8 maids', '7 swans', '6 geese', '5 rings', '4 birds', '3 hen', '2 doves', '1 partridge']
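
The transcript above only exercises \d, \s and \w. The sketch below tries the negated shorthands from the table. The sample strings are made up for illustration, and `re.sub()` is a standard `re` function that these notes have not introduced.

```python
import re

phone = '415-555-9999'
# \D matches every non-digit, so removing those characters leaves only the digits.
digits_only = re.sub(r'\D', '', phone)

# \S+ matches runs of non-whitespace, which splits text on spaces, tabs and newlines.
words = re.findall(r'\S+', 'one two\tthree\nfour')

# \W matches anything that is not a letter, digit or underscore.
punctuation = re.findall(r'\W', 'Hi, Bob_1!')

print(digits_only)  # 4155559999
print(words)        # ['one', 'two', 'three', 'four']
print(punctuation)  # [',', ' ', '!']
```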

7.7 Making Your Own Character Classes

You can define your own character class with square brackets.

>>> vowelRegex=re.compile(r'[aeiouAEIOU]')
>>> vowelRegex.findall('RObocop eats baby food. Baby FOOD.')
['O', 'o', 'o', 'e', 'a', 'a', 'o', 'o', 'a', 'O', 'O']

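A hyphen inside the brackets defines a range: the character class [a-zA-Z0-9] matches all lowercase letters, uppercase letters, and digits. The example below uses the narrower class [a-zA-C], so among the uppercase letters only A, B, and C match.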

>>> allAlphbtaRegex=re.compile(r'[a-zA-C]')
>>> allAlphbtaRegex.findall('What is your name? AHOW')
['h', 'a', 't', 'i', 's', 'y', 'o', 'u', 'r', 'n', 'a', 'm', 'e', 'A']

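Placing a caret (^) right after the opening square bracket gives a negative character class, which matches every character that is not in the class.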

>>> consonantRegex=re.compile(r'[^aeiouAEIOU]')
>>> consonantRegex.findall('Roboco eats baby food.BABY FOOD.')
['R', 'b', 'c', ' ', 't', 's', ' ', 'b', 'b', 'y', ' ', 'f', 'd', '.', 'B', 'B', 'Y', ' ', 'F', 'D', '.']
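
A small sanity check, sketched under my own choice of sample sentence: a class and its negation together should account for every character in the string. The `[0-5]` line is an extra illustration of a range.

```python
import re

sentence = 'Robocop eats baby food.'

vowels = re.findall(r'[aeiouAEIOU]', sentence)
non_vowels = re.findall(r'[^aeiouAEIOU]', sentence)

# Every character is either in the class or in its negation.
assert len(vowels) + len(non_vowels) == len(sentence)

# A hyphen inside the brackets defines a range of characters.
low_digits = re.findall(r'[0-5]', '0123456789')

print(len(vowels), len(non_vowels))  # 8 15
print(low_digits)                    # ['0', '1', '2', '3', '4', '5']
```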

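7.8 The Caret and Dollar Sign Characters

A caret (^) at the start of a regex means the match must occur at the beginning of the searched text.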

>>> beginsWithHello=re.compile(r'^Hello')
>>> beginsWithHello.search('Hello world!')
<re.Match object; span=(0, 5), match='Hello'>
>>> beginsWithHello.search('He said Hello.')==None
True

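A dollar sign ($) at the end of a regex means the text must end with the pattern. The regex r'\d$' matches strings that end with a digit from 0 to 9.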

>>> endsWithNumber=re.compile(r'\d$')
>>> endsWithNumber.search('Your number 007 is 42')
<re.Match object; span=(20, 21), match='2'>
>>> endsWithNumber.search('Your number 007 is forty two,')==None
True

The regex r'^\d+$' matches strings that consist entirely of digits from beginning to end.

>>> wholeStringIsNum=re.compile(r'^\d+$')
>>> wholeStringIsNum.search('1234567890')
<re.Match object; span=(0, 10), match='1234567890'>
>>> wholeStringIsNum.search('123456xyz')==None
True
>>> wholeStringIsNum.search('123456 789')==None
True
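
The `==None` checks above can be wrapped in a small validation function. This is a sketch only: the name `is_all_digits` is hypothetical (mine, not the book's), and it uses `is not None`, the more idiomatic way to test for a missing match.

```python
import re

# Hypothetical helper; the name is an assumption for illustration.
def is_all_digits(text):
    """Return True if text consists only of digits from start to end."""
    return re.search(r'^\d+$', text) is not None

print(is_all_digits('1234567890'))  # True
print(is_all_digits('123456xyz'))   # False
print(is_all_digits('123456 789'))  # False
print(is_all_digits(''))            # False
```

The empty string is rejected because `\d+` requires at least one digit.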

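7.9 The Wildcard Character

The period (.) is called the wildcard. It matches any character except a newline. A period matches exactly one character, which is why 'flat' in the example below is matched only as 'lat'.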

>>> atRegex=re.compile(r'.at')
>>> atRegex.findall('The cat in the hat sat on the flat mat.')
['cat', 'hat', 'sat', 'lat', 'mat']
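
Two properties of the wildcard can be checked with a short sketch (sample strings are my own): it does not cross a newline, and a backslash-escaped period matches only a literal period.

```python
import re

# The period does not match a newline, so the 'at' after '\n' is not found.
across_lines = re.findall(r'.at', 'cat\nat')

# An unescaped period matches any character; an escaped one matches only '.'.
wildcard = re.findall(r'.at', 'cat .at')
literal = re.findall(r'\.at', 'cat .at')

print(across_lines)  # ['cat']
print(wildcard)      # ['cat', '.at']
print(literal)       # ['.at']
```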

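7.9.1 Matching Everything with Dot-Star

The dot-star (.*) matches zero or more of any character except a newline. It is greedy by default; add a question mark (.*?) for the non-greedy version.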

>>> nameRegex=re.compile(r'First Name:(.*) Last Name:(.*)')
>>> mo=nameRegex.search('First Name:AL Last Name:Sweigart')
>>> mo.group(1)
'AL'
>>> mo.group(2)
'Sweigart'

>>> nongreedyRegex=re.compile(r'<.*?>')
>>> mo=nongreedyRegex.search('<To server a man>for dinner.')
>>> mo.group()
'<To server a man>'

>>> greedyRegex=re.compile(r'<.*>')
>>> mo=greedyRegex.search('<To server a man>for dinner.>')
>>> mo.group()
'<To server a man>for dinner.>'
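
The difference is easier to see when both patterns run against the same string. A minimal sketch, assuming a made-up HTML-like string and using findall() to collect every match:

```python
import re

html = '<b>bold</b> and <i>italic</i>'

# Greedy: .* runs to the last '>' in the string, so there is a single long match.
greedy_tags = re.findall(r'<.*>', html)
# Non-greedy: .*? stops at the first '>' it can, so each tag is matched separately.
lazy_tags = re.findall(r'<.*?>', html)

print(greedy_tags)  # ['<b>bold</b> and <i>italic</i>']
print(lazy_tags)    # ['<b>', '</b>', '<i>', '</i>']
```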

7.9.2 Matching Newlines with the Dot Character

Passing re.DOTALL as the second argument to re.compile() makes the period match every character, including newlines.

>>> noNewlineRegex=re.compile('.*')
>>> noNewlineRegex.search('server the public trust.\nprotect the innocent.\n uphold the law.').group()
'server the public trust.'

>>> newlineRegex=re.compile('.*',re.DOTALL)
>>> newlineRegex.search('server the public trust.\nprotect the innocent.\n uphold the law')
<re.Match object; span=(0, 62), match='server the public trust.\nprotect the innocent.\n>
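
The Match repr above truncates the matched text, so it is hard to tell how far the match really goes. This sketch (with my own sample text) compares the two results directly:

```python
import re

text = 'Serve the public trust.\nProtect the innocent.\nUphold the law.'

# Without re.DOTALL, .* stops at the first newline.
first_line = re.compile(r'.*').search(text).group()
# With re.DOTALL, .* runs through the newlines to the end of the string.
everything = re.compile(r'.*', re.DOTALL).search(text).group()

print(first_line)          # Serve the public trust.
print(everything == text)  # True
```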

7.11 Case-Insensitive Matching

To make a regex case-insensitive, pass re.IGNORECASE or re.I as the second argument to re.compile().

>>> robocop=re.compile(r'robocop',re.I)
>>> robocop.search('RoBocop is part man,part machine,all cop.').group()
'RoBocop'
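
A short sketch with findall() shows that every capitalization is matched. The second half goes slightly beyond these notes: it combines re.IGNORECASE with re.DOTALL using the bitwise-or operator, which the `re` module supports for flags. The sample text is my own.

```python
import re

text = 'RoboCop protects.\nROBOCOP serves.\nrobocop upholds.'

# re.I is shorthand for re.IGNORECASE.
names = re.compile(r'robocop', re.I).findall(text)

# Flags can be combined with the bitwise-or operator |.
rest = re.compile(r'robocop.*', re.IGNORECASE | re.DOTALL).search(text).group()

print(names)         # ['RoboCop', 'ROBOCOP', 'robocop']
print(rest == text)  # True
```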

Source

[1] Al Sweigart. Automate the Boring Stuff with Python (Chinese edition) [M]. Translated by Wang Haipeng. Beijing: Posts & Telecom Press, 2016.7, pp. 123-129.