Python Regular Expressions: Hands-On Practice Exercises
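Regular expressions are one of the most powerful tools for text processing, and mastering them makes text-handling work much more efficient. This article is based on a set of Python regular expression practice exercises. It walks through a series of carefully designed problems that help readers build up the core concepts and techniques of regular expressions step by step.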
Basic Exercises
1. Detecting a Hexadecimal Value
Check whether a string contains the hexadecimal value 0xB0:
import re
line1 = 'start address: 0xA0, func1 address: 0xC0'
line2 = 'end address: 0xFF, func2 address: 0xB0'
print(bool(re.search(r'0xB0', line1))) # False
print(bool(re.search(r'0xB0', line2))) # True
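To find every hexadecimal literal instead of testing for one fixed value, a character class for hex digits can be combined with word boundaries. This is a small sketch that assumes hex values are always written with a lowercase 0x prefix; the name hex_pat is illustrative only.

```python
import re

line1 = 'start address: 0xA0, func1 address: 0xC0'
line2 = 'end address: 0xFF, func2 address: 0xB0'

# Illustrative generalisation: any 0x-prefixed run of hex digits.
# \b on both sides keeps the match from starting or ending inside a longer word.
hex_pat = re.compile(r'\b0x[0-9A-Fa-f]+\b')

print(hex_pat.findall(line1))  # ['0xA0', '0xC0']
print(hex_pat.findall(line2))  # ['0xFF', '0xB0']
```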
2. Replacing Digits
Replace every occurrence of the digit 5 with five:
ip = 'They ate 5 apples and 5 oranges'
print(re.sub(r'5', 'five', ip))
# Output: 'They ate five apples and five oranges'
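When you also need to know how many replacements happened, re.subn performs the same substitution as re.sub but returns a (new_string, count) tuple. A minimal sketch on the same input:

```python
import re

ip = 'They ate 5 apples and 5 oranges'

# re.subn returns both the substituted text and the number of replacements made
new_text, n = re.subn(r'5', 'five', ip)
print(new_text)  # They ate five apples and five oranges
print(n)         # 2
```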
3. Replacing Only the First Match
Replace only the first occurrence of the digit 5:
ip = 'They ate 5 apples and 5 oranges'
print(re.sub(r'5', 'five', ip, count=1))
# Output: 'They ate five apples and 5 oranges'
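The count argument caps the number of substitutions; its default of 0 means "replace every match". This sketch compares a few values side by side:

```python
import re

ip = 'They ate 5 apples and 5 oranges'

# count=0 (the default) replaces all matches; a positive count is an upper limit
results = {c: re.sub(r'5', 'five', ip, count=c) for c in (0, 1, 2)}
for c, out in results.items():
    print(c, out)
# 0 They ate five apples and five oranges
# 1 They ate five apples and 5 oranges
# 2 They ate five apples and five oranges
```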
4. Filtering Out Elements That Contain a Character
Keep only the list elements that do not contain the letter e:
items = ['goal', 'new', 'user', 'sit', 'eat', 'dinner']
print([w for w in items if not re.search(r'e', w)])
# Output: ['goal', 'sit']
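The same filter can also be phrased positively: the whole string must consist only of characters other than e. A sketch of that alternative using re.fullmatch and a negated character class:

```python
import re

items = ['goal', 'new', 'user', 'sit', 'eat', 'dinner']

# [^e]* matches any run of non-e characters; fullmatch requires it to cover the entire string
no_e = [w for w in items if re.fullmatch(r'[^e]*', w)]
print(no_e)  # ['goal', 'sit']
```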
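Anchor Exercises
1. Checking the Start of a String
Check whether each string starts with be. Without the re.MULTILINE flag, ^ matches only at the very beginning of the string, so the be that follows the newline in line4 does not count: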
line1 = 'be nice'
line2 = '"best!"'
line3 = 'better?'
line4 = 'oh no\nbear spotted'
pat = re.compile(r'^be')
print(bool(pat.search(line1))) # True
print(bool(pat.search(line2))) # False
print(bool(pat.search(line3))) # True
print(bool(pat.search(line4))) # False
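Turning on re.MULTILINE changes that result, while \A keeps meaning "start of the whole string" regardless of flags. A minimal sketch:

```python
import re

line4 = 'oh no\nbear spotted'

# With re.MULTILINE, ^ also matches right after every newline
multi = re.compile(r'^be', flags=re.MULTILINE)
print(bool(multi.search(line4)))  # True

# \A ignores re.MULTILINE and only matches at the start of the string
print(bool(re.search(r'\Abe', line4, flags=re.MULTILINE)))  # False
```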
2. Whole-Word Replacement
Replace red with brown only where it appears as a whole word:
words = 'bred red spread credible red.'
print(re.sub(r'\bred\b', 'brown', words))
# Output: 'bred brown spread credible brown.'
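When the word comes from a variable, it is safer to pass it through re.escape before wrapping it in \b. The helper below, replace_whole_word, is hypothetical and assumes the target word begins and ends with word characters (otherwise \b behaves differently at its edges):

```python
import re

def replace_whole_word(text, old, new):
    # Hypothetical helper: re.escape makes metacharacters in `old` (such as '.') literal
    return re.sub(r'\b' + re.escape(old) + r'\b', new, text)

words = 'bred red spread credible red.'
print(replace_whole_word(words, 'red', 'brown'))  # bred brown spread credible brown.

# Without re.escape, the '.' in 'a.b' would also match 'axb'
print(replace_whole_word('a.b axb', 'a.b', 'X'))  # X axb
```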
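3. Checking the Characters Around a Number
Keep the elements in which 42 has a word character immediately on both sides (\B asserts a position that is not a word boundary):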
words = ['hi42bye', 'nice1423', 'bad42', 'cool_42a', '42fake', '_42_']
print([w for w in words if re.search(r'\B42\B', w)])
# Output: ['hi42bye', 'nice1423', 'cool_42a', '_42_']
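Because 4 and 2 are themselves word characters, \B42\B here is equivalent to requiring a word character directly before and after 42. The sketch below spells that out with lookbehind and lookahead and checks it against the \B version:

```python
import re

words = ['hi42bye', 'nice1423', 'bad42', 'cool_42a', '42fake', '_42_']

# (?<=\w) and (?=\w) require a word character on each side without consuming it
look = [w for w in words if re.search(r'(?<=\w)42(?=\w)', w)]
print(look)  # ['hi42bye', 'nice1423', 'cool_42a', '_42_']
```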
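Grouping and Alternation Exercises
1. Filtering on Multiple Conditions
Keep the elements that start with den or end with ly. The pattern uses \A and \Z rather than ^ and $: $ also matches just before a trailing newline, so ly$ would wrongly accept 'fly\n':
items = ['lovely', '1\ndentist', '2 lonely', 'eden', 'fly\n', 'dent']
print([e for e in items if re.search(r'\Aden|ly\Z', e)])
# Output: ['lovely', '2 lonely', 'dent']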
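The difference between $ and \Z only shows up on strings with trailing newlines, and it is easy to check directly. A small sketch:

```python
import re

samples = ['fly', 'fly\n', 'fly\n\n']

# $ matches at the end OR just before a single final newline; \Z matches only at the very end
results = {s: (bool(re.search(r'ly$', s)), bool(re.search(r'ly\Z', s))) for s in samples}
for s, (dollar, z) in results.items():
    print(repr(s), dollar, z)
# 'fly' True True
# 'fly\n' True False
# 'fly\n\n' False False
```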
2. Replacing Several Patterns
Replace every occurrence of removed, reed, received, or refused with X:
s1 = 'creed refuse removed read'
s2 = 'refused reed redo received'
pat = re.compile(r're(mov|ceiv|fus|)ed')  # the empty alternative also matches reed, so a separate |reed branch is unnecessary
print(pat.sub('X', s1)) # 'cX refuse X read'
print(pat.sub('X', s2)) # 'X X redo X'
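A capturing group also changes what re.findall returns: with one group, findall yields the group's text rather than the whole match. This sketch uses that to make the empty alternative visible, and shows a non-capturing group (?:...) as the way to get full matches back:

```python
import re

s2 = 'refused reed redo received'
pat = re.compile(r're(mov|ceiv|fus|)ed')

# One capturing group: findall returns group 1; '' is where the empty alternative matched 'reed'
print(pat.findall(s2))                         # ['fus', '', 'ceiv']

# finditer keeps the full match available through group(0)
print([m.group(0) for m in pat.finditer(s2)])  # ['refused', 'reed', 'received']

# A non-capturing group makes findall return whole matches directly
print(re.findall(r're(?:mov|ceiv|fus|)ed', s2))  # ['refused', 'reed', 'received']
```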
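Escaping Metacharacters
1. Handling Special Characters
Replace the literal text (9-2)*5 with 35. The parentheses and * must be escaped so they are matched literally instead of being treated as a group and a quantifier: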
str1 = '(9-2)*5+qty/3-(9-2)*7'
str2 = '(qty+4)/2-(9-2)*5+pq/4'
pat = re.compile(r'\(9-2\)\*5')
print(pat.sub('35', str1)) # '35+qty/3-(9-2)*7'
print(pat.sub('35', str2)) # '(qty+4)/2-35+pq/4'
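Hand-escaping is error-prone; re.escape produces an equivalent literal pattern automatically. A minimal sketch on the same inputs:

```python
import re

str1 = '(9-2)*5+qty/3-(9-2)*7'
str2 = '(qty+4)/2-(9-2)*5+pq/4'

# re.escape backslash-escapes every character that is special in a regex
pat = re.compile(re.escape('(9-2)*5'))
print(pat.sub('35', str1))  # 35+qty/3-(9-2)*7
print(pat.sub('35', str2))  # (qty+4)/2-35+pq/4
```

For a purely literal replacement like this one, str.replace would also work; the regex form becomes useful once the literal has to be combined with anchors or other pattern syntax.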
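2. Replacing Only at the Start or End
Replace (4)\| with 2 only when it appears at the very beginning or the very end of the string. Each anchor needs its own alternative: writing ^(A|A$) would apply ^ to both branches, so a match at the end of the string would never be found. \Z is used instead of $ so that the trailing newline in s3 prevents the replacement:
s1 = r'2.3/(4)\|6 foo 5.3-(4)\|'
s2 = r'(4)\|42 - (4)\|3'
s3 = 'two - (4)\\|\n'
pat = re.compile(r'\A\(4\)\\\||\(4\)\\\|\Z')
# repr() makes the backslashes and the newline visible
print(repr(pat.sub('2', s1))) # '2.3/(4)\\|6 foo 5.3-2'
print(repr(pat.sub('2', s2))) # '242 - (4)\\|3'
print(repr(pat.sub('2', s3))) # 'two - (4)\\|\n'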
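Alternation has the lowest precedence of any regex operator, which is why anchors placed outside a group behave differently from anchors written inside each branch. A minimal sketch on a made-up string:

```python
import re

text = 'ab xx ab'

# '^ab|ab$' parses as (^ab) OR (ab$): both ends can match
both_ends = re.sub(r'^ab|ab$', 'X', text)
print(both_ends)   # X xx X

# '^(?:ab|ab$)' puts ^ in front of BOTH branches: only the start can match
start_only = re.sub(r'^(?:ab|ab$)', 'X', text)
print(start_only)  # X xx ab
```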
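Quantifiers and Greedy Matching
1. Greedy vs. Non-Greedy Matching
Delete each <...> tag that has at least one character between the angle brackets, leaving empty <> pairs in place. A non-greedy quantifier alone is not enough here, because . can also match >:
ip = 'a<apple> 1<> b<bye> 2<> c<cat>'
# Incorrect: on '<>', .+? must consume at least one character, so it swallows the '>' and runs on to the next '>'
print(re.sub(r'<.+?>', '', ip)) # Output: 'a 1 2'
# Correct: the negated character class [^>] can never cross a '>'
print(re.sub(r'<[^>]+>', '', ip)) # Output: 'a 1<> b 2<> c'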
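Putting the greedy, lazy, and negated-class variants next to each other makes the differences concrete. A small comparison sketch on the same input:

```python
import re

ip = 'a<apple> 1<> b<bye> 2<> c<cat>'
patterns = [
    r'<.+>',     # greedy: from the first '<' to the last '>'        -> 'a'
    r'<.+?>',    # lazy, but needs >= 1 char, so '<>' overshoots      -> 'a 1 2'
    r'<.*?>',    # lazy and allows zero chars: also deletes '<>'      -> 'a 1 b 2 c'
    r'<[^>]+>',  # negated class cannot cross '>': keeps '<>'         -> 'a 1<> b 2<> c'
]
results = {p: re.sub(p, '', ip) for p in patterns}
for p, out in results.items():
    print(f'{p:10} -> {out!r}')
```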
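2. Equivalent Forms of Quantifiers
Each basic quantifier is shorthand for a brace quantifier:
? is equivalent to {0,1}, * is equivalent to {0,}, and + is equivalent to {1,}.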
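The equivalences can be spot-checked by running each shorthand and its brace form against the same sample strings. This only tests a handful of inputs, so treat it as an illustration rather than a proof:

```python
import re

pairs = [('a?', 'a{0,1}'), ('a*', 'a{0,}'), ('a+', 'a{1,}')]
samples = ['', 'a', 'aa', 'aaa', 'b']

checks = []
for short, brace in pairs:
    # Equivalent on these samples if fullmatch accepts and rejects exactly the same strings
    same = all(
        bool(re.fullmatch(short, s)) == bool(re.fullmatch(brace, s))
        for s in samples
    )
    checks.append(same)
    print(f'{short:3} vs {brace:7}: {same}')  # True for all three pairs
```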
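Working with the Matched Portion
1. Extracting a Range
Extract everything from the first is through the last t. Because .* is greedy, the match extends to the last t in the string: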
str1 = 'This the biggest fruit you have seen?'
str2 = 'Your mission is to read and practice consistently'
pat = re.compile(r'is.*t')
print(pat.search(str1).group()) # 'is the biggest fruit'
print(pat.search(str2).group()) # 'ission is to read and practice consistent'
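Making the quantifier non-greedy flips the behavior: .*? stops at the first t after is instead of the last one. A minimal sketch on the same strings:

```python
import re

str1 = 'This the biggest fruit you have seen?'
str2 = 'Your mission is to read and practice consistently'

# Non-greedy .*? expands only until the first following t can match
lazy = re.compile(r'is.*?t')
print(lazy.search(str1).group())  # is t
print(lazy.search(str2).group())  # ission is t
```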
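2. First Occurrence of Several Patterns
Find the starting index of the first occurrence of any of is, the, was, or to. search() returns the leftmost match, regardless of which alternative produced it: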
s1 = 'match after the last newline character'
s2 = 'and then you want to test'
s3 = 'this is good bye then'
s4 = 'who was there to see?'
pat = re.compile(r'is|the|was|to')
print(pat.search(s1).start()) # 12
print(pat.search(s2).start()) # 4
print(pat.search(s3).start()) # 2
print(pat.search(s4).start()) # 4
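The order of alternatives still matters at a single position: they are tried left to right, and the first one that succeeds wins even if a later one would match more text. The leftmost position, however, always takes priority. A small sketch on s4:

```python
import re

s4 = 'who was there to see?'

# At the same position, the first alternative that matches is used
print(re.search(r'the|there', s4).group())  # the
print(re.search(r'there|the', s4).group())  # there

# The leftmost match wins no matter how the alternatives are ordered
print(re.search(r'to|was', s4).start())     # 4
```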
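Working through these exercises in order covers the core features of Python regular expressions: literal search and substitution, anchors, word boundaries, alternation and grouping, escaping, and quantifiers. Run each example yourself and try modifying the patterns to see how the results change; that is the most direct way to understand how the regex engine actually behaves.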
Disclosure: parts of this article were generated with AI assistance (AIGC) and are provided for reference only.



