Using the negative look-behind pattern in Python regular expressions

caimouse

License: CC 4.0 BY-SA

Category: milang(小语). Tags: python, tensorflow, regular expressions

Article link: https://blog.youkuaiyun.com/caimouse/article/details/78481084

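This article introduces the negative look-behind pattern in regular expressions and uses an example to show how it excludes particular matches, such as filtering out noreply e-mail addresses.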

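The previous article covered the negative look-ahead pattern. There is also a negative look-behind pattern: once the preceding part of the string has been matched, the engine looks back from the current position and checks that the text immediately before it is not the unwanted string. The syntax is (?<!pattern). Because this construct examines text that has already been matched, the pattern must have a fixed length, so open-ended quantifiers such as * and + are not allowed inside it. This is where it differs from look-ahead. Group references were also not allowed inside a look-behind before Python 3.5; from 3.5 on, a reference to a fixed-length group is accepted. An example follows: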

# Python 3.6
# Author: 蔡军生
# http://blog.youkuaiyun.com/caimouse/article/details/51749579
#
import re

# A raw string keeps \w, \d and \. from being treated as string escapes.
address = re.compile(
    r'''
    ^

    # An address: username@domain.tld

    [\w\d.+-]+       # username

    # Ignore noreply addresses
    (?<!noreply)

    @
    ([\w\d.]+\.)+    # domain name prefix
    (com|org|edu)    # limit the allowed top-level domains

    $
    ''',
    re.VERBOSE)

candidates = [
    'first.last@example.com',
    'noreply@example.com',
]

for candidate in candidates:
    print('Candidate:', candidate)
    match = address.search(candidate)
    if match:
        print('  Match:', candidate[match.start():match.end()])
    else:
        print('  No match')

The output is as follows:
Candidate: first.last@example.com
  Match: first.last@example.com
Candidate: noreply@example.com
  No match
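To see what the look-behind actually tests, it may help to try a couple more addresses. The sketch below reuses the article's pattern as a raw string; the two extra candidates are hypothetical and not part of the original example. It suggests that the check applies to the seven characters directly before the "@", wherever the username starts.

```python
import re

# Same pattern as the article's example, written as a raw string.
address = re.compile(
    r'''
    ^
    [\w\d.+-]+       # username
    (?<!noreply)     # text just before the at-sign must not be noreply
    @
    ([\w\d.]+\.)+    # domain name prefix
    (com|org|edu)    # limit the allowed top-level domains
    $
    ''',
    re.VERBOSE)

# Hypothetical extra candidates, not from the original article.
extra_candidates = [
    'my-noreply@example.com',     # username ends with "noreply"
    'noreply.team@example.com',   # "noreply" is not directly before "@"
]

# Map each candidate to True (matched) or False (rejected).
results = {c: bool(address.search(c)) for c in extra_candidates}
for candidate, matched in results.items():
    print(candidate, '->', 'match' if matched else 'no match')
```

With this pattern the first address is rejected and the second is accepted, so the rule is better read as "the username must not end with noreply" than as "the username must not be noreply".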

Here, too, the no-reply e-mail address is recognized and filtered out.
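The fixed-length restriction on look-behind patterns can be observed directly. The following is a minimal sketch, assuming the standard re module; the sample sentence and the names price_free, numbers and compiled are made up for illustration. A fixed-length look-behind compiles normally, while one containing an open-ended quantifier is rejected at compile time.

```python
import re

# Fixed-length look-behind: match whole numbers not preceded by "$".
price_free = re.compile(r'(?<!\$)\b\d+')
numbers = price_free.findall('cost $30 for 42 items')
print(numbers)

# Variable-length look-behind: "o+" has no fixed width,
# so re.compile raises re.error instead of returning a pattern.
try:
    re.compile(r'(?<!no+reply)@')
    compiled = True
except re.error as exc:
    compiled = False
    print('re.error:', exc)
```

Only 42 is returned by the first pattern, and the second pattern never compiles. A repetition with an exact count, such as \d{3}, has a fixed width and is accepted inside a look-behind.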