CH7 - Python Quick-Start Programming: Regular Expressions


License: CC 4.0 BY-SA



Finding Patterns of Text with Regular Expressions

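1) Regex matching basics
1. Import the regex module with import re.
2. Create a Regex object with re.compile() (remember to use a raw string).
3. Pass the string you want to search to the Regex object's search() method. It returns a Match object, or None if the pattern is not found.
4. Call the Match object's group() method to get a string of the actual matched text.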

>>> import re
>>> phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
>>> mo = phoneNumRegex.search('My number is 415-555-2456')
>>> print('Phone number found: ' + mo.group())
Phone number found: 415-555-2456
>>> mo.group(1)
'415'
>>> mo.group(2)
'555-2456'

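The parentheses in the pattern create groups: group(1) returns the text matched by the first group, group(2) the text matched by the second, and group() (or group(0)) the entire match.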

To retrieve all the groups at once, use the groups() method (note the plural name). It returns a tuple:

>>> mo.groups()
('415', '555-2456')
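Because groups() returns a tuple, it can be unpacked with multiple assignment. Since search() returns None when nothing matches, it is also worth checking before calling group(). A minimal sketch; the second sample string is just illustrative:

```python
import re

phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')

# search() returns a Match object on success and None on failure.
mo = phoneNumRegex.search('My number is 415-555-2456')
if mo is not None:
    # groups() returns a tuple, so it can be unpacked directly.
    areaCode, mainNumber = mo.groups()
    print(areaCode)    # 415
    print(mainNumber)  # 555-2456

# No phone number here, so search() returns None; calling .group()
# on None would raise AttributeError.
noMatch = phoneNumRegex.search('No number here')
print(noMatch is None)  # True
```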

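If the text you want to match contains characters that have a special meaning in regular expressions, such as parentheses, escape them with a backslash: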

>>> phoneNumRegex = re.compile(r'\((\d\d\d)\)-(\d\d\d-\d\d\d\d)')
>>> mo = phoneNumRegex.search('My number is (415)-555-2456')
>>> print('Phone number found: ' + mo.group())
Phone number found: (415)-555-2456
>>> mo.group(1)
'415'
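When the literal text comes from elsewhere (for example, user input), the standard library's re.escape() can insert the backslashes for you. A small sketch that rebuilds the escaped pattern from the literal '(415)':

```python
import re

# re.escape() backslash-escapes regex metacharacters such as ( and ).
literal = re.escape('(415)')
print(literal)  # \(415\)

# Build an equivalent pattern from the escaped literal.
phoneNumRegex = re.compile(literal + r'-(\d\d\d-\d\d\d\d)')
mo = phoneNumRegex.search('My number is (415)-555-2456')
print(mo.group())   # (415)-555-2456
print(mo.group(1))  # 555-2456
```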

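2) Matching multiple groups with the pipe
The | character is called a pipe. Use it when you want to match any one of several expressions; it means "or". If more than one alternative occurs in the searched string, the first occurrence is returned as the Match object.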

>>> heroRegex = re.compile(r'Batman|Tina Fey')
>>> mo1 = heroRegex.search('Batman and Tina Fey.')
>>> mo1.group()
'Batman'

>>> heroRegex = re.compile(r'Batman|Tina Fey')
>>> mo1 = heroRegex.search('Tina Fey and Batman.')
>>> mo1.group()
'Tina Fey'
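search() stops at the leftmost match, so only one hero is returned. To collect every occurrence, the findall() method described later in this chapter works with the same pattern. A short sketch:

```python
import re

heroRegex = re.compile(r'Batman|Tina Fey')
text = 'Batman and Tina Fey.'

# search() returns only the leftmost match.
print(heroRegex.search(text).group())  # Batman

# findall() returns every non-overlapping match, left to right.
print(heroRegex.findall(text))  # ['Batman', 'Tina Fey']
```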

You can also use the pipe to match one of several patterns as part of a larger regex. For example:

>>> batRegex = re.compile(r'Bat(man|mobile|copter|bat)')
>>> mo = batRegex.search('Batmobile lost a wheel')
>>> mo.group()
'Batmobile'
>>> mo.group(1)
'mobile'

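mo.group(1) returns only the text matched inside the first set of parentheses, 'mobile'. Combining the pipe with grouping parentheses lets you specify several alternative patterns for the regex to match.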

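3) Optional matching with the question mark
The ? character marks the group before it as optional in the pattern. For example, enter the following in the interactive shell: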

>>> batRegex = re.compile(r'Bat(wo)?man')
>>> mo = batRegex.search('The adventures of Batman')
>>> mo.group()
'Batman'
>>> mo1 = batRegex.search('The adventures of Batwoman')
>>> mo1.group()
'Batwoman'

The (wo) part is optional: the regex matches text in which wo appears zero times or once.

The same idea makes the area code of a phone number optional:

>>> phoneNumRegex = re.compile(r'(\d\d\d-)?(\d\d\d-\d\d\d\d)')
>>> mo = phoneNumRegex.search('My number is 555-2456')
>>> mo.group()
'555-2456'
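When an optional group does not take part in the match, the corresponding group() call returns None. A quick check, reusing the pattern above on numbers with and without an area code:

```python
import re

phoneNumRegex = re.compile(r'(\d\d\d-)?(\d\d\d-\d\d\d\d)')

withArea = phoneNumRegex.search('My number is 415-555-2456')
print(withArea.group(1))  # 415-

withoutArea = phoneNumRegex.search('My number is 555-2456')
# The optional first group did not participate, so group(1) is None.
print(withoutArea.group(1))  # None
print(withoutArea.group(2))  # 555-2456
```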

4) Matching zero or more with the star
The * (star) means the group before it can appear zero or more times.

>>> batRegex = re.compile(r'Bat(wo)*man')
>>> mo = batRegex.search('The adventures of Batman')
>>> mo.group()
'Batman'
>>> mo = batRegex.search('The adventures of Batwoman')
>>> mo.group()
'Batwoman'
>>> mo = batRegex.search('The adventures of Batwowowowoman')
>>> mo.group()
'Batwowowowoman'

5) Matching one or more with the plus
The + (plus) means the group before it must appear at least once.

>>> batRegex = re.compile(r'Bat(wo)+man')
>>> mo = batRegex.search('The adventures of Batwoman')
>>> mo.group()
'Batwoman'
>>> mo = batRegex.search('The adventures of Batman')
>>> mo is None
True
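To compare ?, * and + side by side, the sketch below tries each quantifier on the same three strings; the loop and the results dictionary are only for illustration:

```python
import re

texts = ['Batman', 'Batwoman', 'Batwowowowoman']
results = {}
for quantifier in ['?', '*', '+']:
    regex = re.compile(r'Bat(wo)' + quantifier + 'man')
    # Record whether each string contains a match (None means no match).
    results[quantifier] = [regex.search(t) is not None for t in texts]

print(results['?'])  # [True, True, False]
print(results['*'])  # [True, True, True]
print(results['+'])  # [False, True, True]
```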

6) Matching specific repetitions with braces
(Ha){3,5} matches 'HaHaHa', 'HaHaHaHa', or 'HaHaHaHaHa'.

(Ha){3} is equivalent to (Ha)(Ha)(Ha).

>>> haRegex = re.compile(r'(Ha){3}')
>>> mo = haRegex.search('HaHaHa')
>>> mo.group()
'HaHaHa'
>>> mo1 = haRegex.search('Ha')
>>> mo1 is None
True
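Braces also accept open-ended ranges: {3,} means three or more, and {,n} means zero to n. A sketch on the same 'Ha' pattern against five repetitions:

```python
import re

text = 'HaHaHaHaHa'  # five repetitions of 'Ha'

print(re.search(r'(Ha){3,5}', text).group())  # HaHaHaHaHa
print(re.search(r'(Ha){3,}', text).group())   # HaHaHaHaHa
print(re.search(r'(Ha){,2}', text).group())   # HaHa
print(re.search(r'(Ha){6,}', text))           # None
```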

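7) Greedy matching by default
Python's regular expressions are greedy by default: when a pattern could match strings of different lengths, as with the braces above, it matches the longest one possible.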
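Appending ? to a quantifier (for example {3,5}? or .*?) makes it non-greedy, so the shortest possible match is taken instead. A small comparison of both behaviors:

```python
import re

greedyHaRegex = re.compile(r'(Ha){3,5}')
nongreedyHaRegex = re.compile(r'(Ha){3,5}?')
print(greedyHaRegex.search('HaHaHaHaHa').group())     # HaHaHaHaHa
print(nongreedyHaRegex.search('HaHaHaHaHa').group())  # HaHaHa

# The same applies to .* versus .*?
text = '<To serve man> for dinner.>'
print(re.search(r'<.*>', text).group())   # <To serve man> for dinner.>
print(re.search(r'<.*?>', text).group())  # <To serve man>
```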

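8) The findall() method
findall() returns every match in the searched string rather than just the first one. To summarize its return value, remember two points:
1. Called on a regex with no groups, it returns a list of the matched strings.
2. Called on a regex with groups, it returns a list of tuples of strings, one string per group. (With exactly one group, it returns a list of that group's strings instead.)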

>>> phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
>>> phoneNumRegex.findall('cell:415-555-9878 work: 415-555-2456')
[('415', '555-9878'), ('415', '555-2456')]
>>> phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
>>> phoneNumRegex.findall('cell:415-555-9878 work: 415-555-2456')
['415-555-9878', '415-555-2456']
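A quick check of the single-group case, plus finditer(), which yields Match objects and is handy when both the full match and the groups are needed:

```python
import re

text = 'cell:415-555-9878 work: 415-555-2456'

# One group: a list of strings holding only that group's text.
areaCodeRegex = re.compile(r'(\d\d\d)-\d\d\d-\d\d\d\d')
print(areaCodeRegex.findall(text))  # ['415', '415']

# finditer() yields Match objects, so the full match is still available.
fullMatches = []
for mo in re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)').finditer(text):
    fullMatches.append(mo.group())
    print(mo.group(), mo.groups())
```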

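Creating your own character classes
Square brackets define your own character class: the regex matches any single character listed inside them.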

>>> import re
>>> vowelRegex = re.compile(r'[aeiouAEIOU]')
>>> vowelRegex.findall('RoBOCup eats baby food.BABAY FOOD')
['o', 'O', 'u', 'e', 'a', 'a', 'o', 'o', 'A', 'A', 'O', 'O']
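Inside square brackets, a hyphen gives a range (a-z, 0-9), and a leading ^ negates the class so it matches any character not listed. A short sketch of both:

```python
import re

# Ranges: runs of lowercase letters and digits.
print(re.findall(r'[a-z0-9]+', 'Agent 007 was here'))  # ['gent', '007', 'was', 'here']

# Negated class: every character that is not a vowel (spaces and '.' included).
text = 'RoboCop eats baby food.'
consonantRegex = re.compile(r'[^aeiouAEIOU]')
print(''.join(consonantRegex.findall(text)))  # RbCp ts bby fd.
```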

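The caret and dollar sign
^ at the start of a regex means the match must occur at the beginning of the searched text.
$ at the end of a regex means the match must occur at the end of the searched text.
If you use both, the entire string must match the regex.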

>>> beginsWithHello = re.compile(r'^Hello')
>>> beginsWithHello.search(' Hello World!') is None
True

The search fails because the string begins with a space, not with Hello.

>>> wholeStringIsNum = re.compile(r'^\d+$')
>>> wholeStringIsNum.search('1234567891')
<re.Match object; span=(0, 10), match='1234567891'>
>>> wholeStringIsNum.search('123456789+') is None
True
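Since Python 3.4 the standard library also offers re.fullmatch(), which requires the whole string to match without writing ^ and $. One subtlety worth knowing: $ also matches just before a trailing newline, whereas fullmatch() does not. A sketch:

```python
import re

wholeStringIsNum = re.compile(r'^\d+$')

print(wholeStringIsNum.search('1234567891') is not None)  # True
print(re.fullmatch(r'\d+', '1234567891') is not None)     # True

# $ can match right before a final '\n', so search() still succeeds here...
print(wholeStringIsNum.search('123\n') is not None)       # True
# ...but fullmatch() demands that the match cover every character.
print(re.fullmatch(r'\d+', '123\n') is not None)          # False
```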

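Matching newlines with the dot character
By default the . (dot) matches any character except a newline, so .* stops at the end of the first line. Passing re.DOTALL as the second argument to re.compile() makes the dot match every character, including newlines.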

>>> import re
>>> Regex = re.compile('.*')
>>> Regex.search('serve the public trust.\nProtect the innocent.\nUphold the law.\n').group()
'serve the public trust.'
>>> Regex = re.compile('.*',re.DOTALL)
>>> Regex.search('serve the public trust.\nProtect the innocent.\nUphold the law.\n').group()
'serve the public trust.\nProtect the innocent.\nUphold the law.\n'
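The difference also shows up with findall(): without re.DOTALL, .+ stops at every newline and yields one match per line; with it, a single match spans the whole text. A sketch on the same string:

```python
import re

text = 'serve the public trust.\nProtect the innocent.\nUphold the law.\n'

print(re.findall(r'.+', text))
# ['serve the public trust.', 'Protect the innocent.', 'Uphold the law.']

print(re.findall(r'.+', text, re.DOTALL))
# ['serve the public trust.\nProtect the innocent.\nUphold the law.\n']
```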

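Case-insensitive matching
Pass re.IGNORECASE (or its short form re.I) as the second argument to re.compile() to ignore case: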

>>> Regex = re.compile(r'robocop',re.I)
>>> Regex.search('ROBOCOP protects the innocent.').group()
'ROBOCOP'
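Combined with findall(), a case-insensitive regex collects every casing of a word. A short sketch, also confirming that re.I is just an alias for re.IGNORECASE:

```python
import re

robocopRegex = re.compile(r'robocop', re.IGNORECASE)
print(robocopRegex.findall('RoboCop, robocop and ROBOCOP'))
# ['RoboCop', 'robocop', 'ROBOCOP']
print(re.I == re.IGNORECASE)  # True
```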

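Substituting strings with the sub() method
The sub() method takes two arguments: the replacement string and the string to search. It returns a new string in which every match has been replaced: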

>>> Regex = re.compile(r'Agent \w+')
>>> Regex.sub('Alex','Agent Alice gave the secret to Agent Bob')
'Alex gave the secret to Alex'
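The replacement string can also refer back to groups in the pattern with \1, \2 and so on. For instance, to censor the agents' names while keeping their first letter:

```python
import re

agentNamesRegex = re.compile(r'Agent (\w)\w*')
# \1 in the replacement is replaced by the text of group 1 (the first letter).
censored = agentNamesRegex.sub(r'\1****', 'Agent Alice gave the secret to Agent Bob')
print(censored)  # A**** gave the secret to B****
```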

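Managing complex regexes
Pass re.VERBOSE as the second argument to re.compile(). Verbose mode ignores whitespace and # comments inside the pattern string, so a long regex can be spread over several lines: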

>>> phoneRegex = re.compile(r'''(
...     (\d{3}|\(\d{3}\))?            # area code
...     (\s|-|\.)?                    # separator
...     \d{3}                         # first 3 digits
...     (\s|-|\.)                     # separator
...     \d{4}                         # last 4 digits
...     (\s*(ext|x|ext\.)\s*\d{2,5})? # extension
...     )''', re.VERBOSE)
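Flags can be combined with the bitwise | operator, for example re.VERBOSE | re.IGNORECASE so that an uppercase 'EXT' is also accepted. A sketch reusing the pattern above on two sample numbers:

```python
import re

phoneRegex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?             # optional area code
    (\s|-|\.)?                     # optional separator
    \d{3}                          # first 3 digits
    (\s|-|\.)                      # separator
    \d{4}                          # last 4 digits
    (\s*(ext|x|ext\.)\s*\d{2,5})?  # optional extension
    )''', re.VERBOSE | re.IGNORECASE)

print(phoneRegex.search('Call 415.555.1234 EXT 99').group())  # 415.555.1234 EXT 99
print(phoneRegex.search('Call (415) 555-1234').group())       # (415) 555-1234
```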

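Project: Phone Number and Email Address Extractor
The script below finds phone numbers and email addresses in the text on the clipboard and copies the results back to the clipboard. It requires the third-party pyperclip module.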

#! python3
# Finds phone numbers and email addresses in the clipboard text.
import pyperclip, re

# Create phone regex.
phoneRegex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?                # area code
    (\s|-|\.)?                        # separator
    (\d{3})                           # first 3 digits
    (\s|-|\.)                         # separator
    (\d{4})                           # last 4 digits
    (\s*(ext|x|ext\.)\s*(\d{2,5}))?   # extension
    )''', re.VERBOSE)

# Create email regex.
emailRegex = re.compile(r'''(
    [a-zA-Z0-9._%+-]+      # username
    @                      # @ symbol
    [a-zA-Z0-9.-]+         # domain name
    (\.[a-zA-Z]{2,4})      # dot-something
    )''', re.VERBOSE)

# Find matches in clipboard text.
text = str(pyperclip.paste())
matches = []
for groups in phoneRegex.findall(text):
    # groups[1], [3], [5] are area code, first 3 digits, last 4 digits.
    phoneNum = '-'.join([groups[1], groups[3], groups[5]])
    if groups[8] != '':
        phoneNum += ' x' + groups[8]   # groups[8] holds the extension digits
    matches.append(phoneNum)
for groups in emailRegex.findall(text):
    matches.append(groups[0])          # groups[0] is the whole email address

# Copy results to the clipboard.
if len(matches) > 0:
    pyperclip.copy('\n'.join(matches))
    print('Copied to clipboard:')
    print('\n'.join(matches))
else:
    print('No phone numbers or email addresses found.')
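Because the script reads from and writes to the clipboard, it is awkward to test directly. One option, sketched below, is to move the matching logic into a plain function that takes a string; the name extract_contacts is hypothetical, and pyperclip is left out entirely:

```python
import re

phoneRegex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?                # area code
    (\s|-|\.)?                        # separator
    (\d{3})                           # first 3 digits
    (\s|-|\.)                         # separator
    (\d{4})                           # last 4 digits
    (\s*(ext|x|ext\.)\s*(\d{2,5}))?   # extension
    )''', re.VERBOSE)

emailRegex = re.compile(r'''(
    [a-zA-Z0-9._%+-]+      # username
    @                      # @ symbol
    [a-zA-Z0-9.-]+         # domain name
    (\.[a-zA-Z]{2,4})      # dot-something
    )''', re.VERBOSE)


def extract_contacts(text):
    """Hypothetical helper: return normalized phone numbers, then emails, found in text."""
    matches = []
    for groups in phoneRegex.findall(text):
        phoneNum = '-'.join([groups[1], groups[3], groups[5]])
        if groups[8] != '':
            phoneNum += ' x' + groups[8]
        matches.append(phoneNum)
    for groups in emailRegex.findall(text):
        matches.append(groups[0])
    return matches


print(extract_contacts('Call 415.555.1234 x 22 or email info@example.com'))
# ['415-555-1234 x22', 'info@example.com']
```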
