Chapter 1: Text - re: Regular Expressions - Search Options (3)

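This article looks at how regular expressions handle Unicode in Python 3, contrasts ASCII and Unicode matching modes, and uses an example to show how text matching behaves in each mode.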

1.3.7.3 Unicode
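In Python 3, str objects use the full Unicode character set, and regular expression processing on a str assumes that both the pattern and the input text are Unicode. The escape codes described earlier are also defined in terms of Unicode by default. As a result, the pattern \w+ matches both "French" and "Français" as whole words. To restrict the escape codes to the ASCII character set, as was the default in Python 2, pass the ASCII flag (re.ASCII) when compiling the pattern or when calling the module-level functions search() and match().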

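import re

# Python 3 string literals are already Unicode, so no u'' prefix is needed.
text = 'Français österreich'
pattern = r'\w+'

# Same pattern compiled twice: once restricted to ASCII, once with the Unicode default.
ascii_pattern = re.compile(pattern, re.ASCII)
unicode_pattern = re.compile(pattern)

print('Text    :', text)
print('Pattern :', pattern)
# With re.ASCII, \w stops at non-ASCII letters, so each word is split around ç and ö.
print('ASCII   :', ascii_pattern.findall(text))
# By default, \w matches Unicode letters, so each word is returned whole.
print('Unicode :', unicode_pattern.findall(text))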
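
The text above also mentions passing the ASCII flag to the module-level functions, which the example does not exercise. The following is a minimal sketch of that usage, together with the inline flag (?a) as an alternative way to request ASCII-only matching; the variable names are illustrative and are not taken from the original article.

```python
import re

text = 'Français österreich'

# Module-level functions accept the flag as an extra argument.
first_ascii = re.search(r'\w+', text, re.ASCII)
first_unicode = re.search(r'\w+', text)

# The inline flag (?a) turns on ASCII-only matching from inside the pattern.
inline_ascii = re.findall(r'(?a)\w+', text)

print(first_ascii.group())    # Fran
print(first_unicode.group())  # Français
print(inline_ascii)           # ['Fran', 'ais', 'sterreich']
```

The inline form may be convenient when the pattern is stored as a plain string (for example in a configuration file) and the calling code cannot easily pass flags.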