Regular Expressions in Python 3 (Part 1)

License: CC 4.0 BY-SA

1. What Is a Regular Expression?

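A regular expression (also called a pattern) is a string built from characters that carry special meanings. It describes a pattern of text and is typically used to find, match, and replace text.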

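Regular expressions are not specific to Python. They have their own syntax and are executed by a separate matching engine; in Python, that engine sits behind the re module.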
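Here is a first example, a minimal sketch with a made-up sample sentence. The same pattern syntax can be used both to find text and to replace it with Python's re module:

```python
import re

text = "Order 66 shipped on 2019-10-01"

# \d+ is a pattern meaning "one or more digits"
first_number = re.search(r"\d+", text).group()  # the first run of digits in the text
masked = re.sub(r"\d", "#", text)               # replace every single digit with '#'

print(first_number)  # 66
print(masked)        # Order ## shipped on ####-##-##
```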

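2. Why Use Regular Expressions?

A single pattern can replace many lines of hand-written string-handling code. This reduces complexity during development and improves productivity.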

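3. How Matching Works

The engine compares the characters of the pattern with the characters of the text, one at a time. If every character matches, the match succeeds. As soon as one character fails to match, the attempt fails. This is a simplified picture: for quantifiers and alternatives the engine backtracks and tries other possibilities, and search() repeats the attempt at each later position in the text.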
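The comparison loop described above can be sketched in plain Python for literal patterns only, meaning patterns without any metacharacters. The helpers `literal_match_at` and `literal_search` are hypothetical names used for illustration. Python's actual `re` engine is far more elaborate, since it supports metacharacters and backtracking:

```python
def literal_match_at(pattern: str, text: str, start: int = 0) -> bool:
    """Compare pattern and text one character at a time, beginning at `start`."""
    if start + len(pattern) > len(text):
        return False  # not enough text left to match every pattern character
    for i, ch in enumerate(pattern):
        if text[start + i] != ch:
            return False  # the first mismatching character ends this attempt
    return True  # every character matched


def literal_search(pattern: str, text: str) -> int:
    """Try each starting position in turn (like re.search); return -1 if none matches."""
    for start in range(len(text) - len(pattern) + 1):
        if literal_match_at(pattern, text, start):
            return start
    return -1


print(literal_match_at("www", "www.baidu.com"))  # True, like re.match
print(literal_match_at("www", "baidu.com.www"))  # False
print(literal_search("www", "baidu.com.www"))    # 10, like re.search
```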

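4. Python's re Module

To make regular expressions convenient to use, Python provides the re module, which implements regular-expression support. Its module-level functions, such as match(), search(), and sub(), work like the corresponding methods of compiled pattern objects (see compile() below). The difference is that they take the pattern string as their first argument.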

>>> import re
>>> dir(re)
['A', 'ASCII', 'DEBUG', 'DOTALL', 'I', 'IGNORECASE', 'L', 'LOCALE', 'M', 'MULTILINE', 'RegexFlag', 'S', 'Scanner', 'T', 'TEMPLATE', 'U', 'UNICODE', 'VERBOSE', 'X', '_MAXCACHE', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', '__version__', '_alphanum_bytes', '_alphanum_str', '_cache', '_compile', '_compile_repl', '_expand', '_locale', '_pattern_type', '_pickle', '_subx', 'compile', 'copyreg', 'enum', 'error', 'escape', 'findall', 'finditer', 'fullmatch', 'functools', 'match', 'purge', 'search', 'split', 'sre_compile', 'sre_parse', 'sub', 'subn', 'template']

re.match() tries to match the pattern at the start of the string. If the pattern does not match at the start, match() returns None.

match(pattern, string, flags=0)
    Try to apply the pattern at the start of the string, returning
    a match object, or None if no match was found.

To retrieve the matched text, call the match object's group(num) or groups() method.

>>> match=re.match("www","www.baidu.com")
>>> match.group()
'www'
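When the pattern contains parentheses (capturing groups), group(num) returns a single group and groups() returns all of them as a tuple. Here is a minimal sketch, using a pattern made up for this example:

```python
import re

m = re.match(r"(\w+)\.(\w+)\.(\w+)", "www.baidu.com")

whole = m.group()        # same as m.group(0): the entire match
second = m.group(2)      # text captured by the second pair of parentheses
all_groups = m.groups()  # every captured group, as a tuple

print(whole)       # www.baidu.com
print(second)      # baidu
print(all_groups)  # ('www', 'baidu', 'com')
```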

re.search() scans through the whole string and returns the first successful match.

search(pattern, string, flags=0)
    Scan through string looking for a match to the pattern, returning
    a match object, or None if no match was found.

>>> match=re.search("www","www.baidu.com.www")
>>> match.group()
'www'

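Difference between search() and match():

match() only matches at the beginning of the string. If the beginning of the string does not fit the pattern, the match fails and None is returned.

search() scans the whole string until it finds a match.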

>>> re.match("www","baidu.com.www")    # "www" is not at the start, so match() returns None (nothing is printed)
>>> re.search("www","baidu.com.www")
<_sre.SRE_Match object; span=(10, 13), match='www'>

sub() replaces the matches in a string.

sub(pattern, repl, string, count=0, flags=0)

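Parameters:
pattern : the pattern string of the regular expression.
repl : the replacement string; it can also be a function.
string : the original string to search and replace in.
count : the maximum number of replacements; the default 0 replaces all matches.
flags : the matching flags used when the pattern is compiled, given as an integer (e.g. re.IGNORECASE).
The first three parameters are required; the last two are optional.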

Example (remove the "#"):
>>> re.sub("#","","87+13498727431 #this is a phone number")
'87+13498727431 this is a phone number'
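The count parameter and a function as repl can be sketched as follows. The sample string and the helper `double` are made up for illustration. When repl is a function, re.sub() calls it with each match object and uses its return value as the replacement:

```python
import re

text = "a1b22c333"

# count limits the number of replacements (0, the default, means "all")
all_replaced = re.sub(r"\d+", "#", text)
first_only = re.sub(r"\d+", "#", text, count=1)

def double(m):
    # receives a match object; returns the replacement string
    return str(int(m.group()) * 2)

doubled = re.sub(r"\d+", double, text)

print(all_replaced)  # a#b#c#
print(first_only)    # a#b22c333
print(doubled)       # a2b44c666
```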

compile() compiles a regular expression into a pattern object. That object provides match(), search(), and the other matching methods.

compile(pattern, flags=0)
    Compile a regular expression pattern, returning a pattern object.
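Here is a minimal sketch of reusing a compiled pattern object. The pattern is borrowed from the phone-number exercise later in this article. Compiling mainly helps readability when one pattern is used many times. The module-level functions also keep a small internal cache of compiled patterns (the `_cache` entry in the dir() output above):

```python
import re

# Compile once; the pattern object offers match(), search(), findall(), sub(), ...
phone = re.compile(r"1[3-9]\d{9}")

starts_with_phone = phone.match("13457824875") is not None
found = phone.search("call 13457824875 now").group()
all_numbers = phone.findall("13457824875, 15800000000")

# Flags are given once, at compile time
www = re.compile(r"www", re.IGNORECASE)
upper_ok = www.match("WWW.baidu.com") is not None

print(starts_with_phone)  # True
print(found)              # 13457824875
print(all_numbers)        # ['13457824875', '15800000000']
print(upper_ok)           # True
```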

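findall() finds every substring matched by the regular expression and returns them as a list. If there are no matches, it returns an empty list.

match() and search() find at most one match, whereas findall() finds all of them.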

findall(pattern, string, flags=0)
    Return a list of all non-overlapping matches in the string.

    If one or more capturing groups are present in the pattern, return
    a list of groups; this will be a list of tuples if the pattern
    has more than one group.

    Empty matches are included in the result.
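The docstring above describes how capturing groups change findall()'s return value. This short sketch shows the three cases (the sample string is made up):

```python
import re

s = "id=7, id=42, age=13"

no_groups = re.findall(r"\w+=\d+", s)        # no groups: whole matches
one_group = re.findall(r"id=(\d+)", s)       # one group: only the captured text
two_groups = re.findall(r"(\w+)=(\d+)", s)   # several groups: one tuple per match
nothing = re.findall(r"x", s)                # no match: empty list

print(no_groups)   # ['id=7', 'id=42', 'age=13']
print(one_group)   # ['7', '42']
print(two_groups)  # [('id', '7'), ('id', '42'), ('age', '13')]
print(nothing)     # []
```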

finditer() finds every match of the regular expression in the string and returns them as an iterator of match objects.

finditer(pattern, string, flags=0)
    Return an iterator over all non-overlapping matches in the
    string.  For each match, the iterator returns a match object.

    Empty matches are included in the result.
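Because finditer() yields match objects rather than plain strings, position information is still available. Here is a minimal sketch with a made-up sample sentence:

```python
import re

text = "2019 is the 70th anniversary"

# Collect (matched text, span) pairs from the iterator of match objects
results = [(m.group(), m.span()) for m in re.finditer(r"\d+", text)]

for m in re.finditer(r"\d+", text):
    print(m.group(), m.span())
# 2019 (0, 4)
# 70 (12, 14)
```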

re.split() splits the string wherever the pattern matches and returns a list of the pieces.

split(pattern, string, maxsplit=0, flags=0)
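Parameters:
pattern    # the regular expression to split on
string     # the string to split
maxsplit   # maximum number of splits: maxsplit=1 splits once; the default 0 means no limit
flags      # flags that control matching, e.g. case-insensitive or multiline matching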
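This short sketch shows re.split() with a character class as the separator, and with maxsplit (sample strings are made up). It also shows one documented behavior not covered above: if the pattern contains a capturing group, the separators are kept in the result.

```python
import re

s = "a, b;c  d"

parts = re.split(r"[,;\s]+", s)                   # split on any run of ',', ';' or whitespace
first_only = re.split(r"[,;\s]+", s, maxsplit=1)  # split once, keep the rest intact
kept = re.split(r"([,;])", "a,b;c")               # a capturing group keeps the separators

print(parts)       # ['a', 'b', 'c', 'd']
print(first_only)  # ['a', 'b;c  d']
print(kept)        # ['a', ',', 'b', ';', 'c']
```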

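5. Metacharacters

\w and \W

>>> re.match(r"\w","0xad29c320")    # does the string start with a word character (letter, digit, or underscore)?
<_sre.SRE_Match object; span=(0, 1), match='0'>
>>> re.match(r"\w","+0xad29c320")   # '+' is not a word character, so None is returned
>>> re.match(r"\W","+0xad29c320")   # does it start with a non-word character? \W is the exact opposite of \w
<_sre.SRE_Match object; span=(0, 1), match='+'>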

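\s and \S

>>> re.match(r"\s"," hello python")   # \s matches one whitespace character
<_sre.SRE_Match object; span=(0, 1), match=' '>
>>> re.match(r"\S"," hello python")   # \S matches one non-whitespace character; the string starts with a space, so None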

\d and \D

>>> re.match(r"\d","my age is 13")     # \d matches one digit; the string starts with 'm', so None
>>> re.match(r"\D","my age is 13")     # \D matches one non-digit
<_sre.SRE_Match object; span=(0, 1), match='m'>
>>> re.match(r"\d","2019 is the 70th anniversary of the PRC")
<_sre.SRE_Match object; span=(0, 1), match='2'>
>>> re.match(r"\d\d","2019 is the 70th anniversary of the PRC")
<_sre.SRE_Match object; span=(0, 2), match='20'>
>>> re.findall(r"\d","2019 is the 70th anniversary of the PRC")
['2', '0', '1', '9', '7', '0']
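One caveat about \w (and similarly \d and \s): for str patterns in Python 3 these classes are Unicode-aware by default, so \w also matches Chinese characters. The re.ASCII flag restricts \w to [a-zA-Z0-9_]. Here is a minimal sketch with a made-up string:

```python
import re

text = "year年份2019"

unicode_words = re.findall(r"\w+", text)          # default for str patterns: Unicode-aware
ascii_words = re.findall(r"\w+", text, re.ASCII)  # \w limited to ASCII letters, digits, _

print(unicode_words)  # ['year年份2019']
print(ascii_words)    # ['year', '2019']
```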

^ and $

>>> re.match("^i am","i am zhang")    # starts with "i am"
<_sre.SRE_Match object; span=(0, 4), match='i am'>
>>> re.match(".*g$","i am zhang")     # ends with "g"
<_sre.SRE_Match object; span=(0, 10), match='i am zhang'>
>>> re.match("^i am.*g$","i am zhang")   # starts with "i am" and ends with "g"
<_sre.SRE_Match object; span=(0, 10), match='i am zhang'>

[]

>>> re.match("[0-9]","0xa543cbf0")    # [0-9] matches any one digit: does the string start with a digit?
<_sre.SRE_Match object; span=(0, 1), match='0'>
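Here are a few more character-set forms, sketched on the same hex string. Several ranges can be combined inside [], and a ^ placed right after [ negates the set (sample strings are illustrative):

```python
import re

hex_run = re.match(r"[0-9a-fA-F]+", "0xa543cbf0")       # stops at 'x', which is not in the set
after_prefix = re.match(r"0x([0-9a-f]+)", "0xa543cbf0")  # capture the hex digits after "0x"
not_digit = re.match(r"[^0-9]", "x1")                     # [^0-9]: any character except a digit

print(hex_run.group())        # 0
print(after_prefix.group(1))  # a543cbf0
print(not_digit.group())      # x
```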

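6. Escaping Characters

Suppose you want to find a literal . or *. This causes a problem, because . and * have special meanings and would be interpreted as metacharacters. In that case, put \ in front of them to cancel their special meaning, e.g. \. and \*. To match a backslash itself, the pattern must contain \\. In Python code there are two layers of escaping: the string literal is processed first, then the regular expression. Raw strings (r"...") avoid the first layer.

>>> re.match("c:\\","c:\\a\\b")      # wrong: the literal "c:\\" is the 3-character string c:\ , so the regex ends in a lone backslash and re.error is raised

>>> re.match("c:\\\\","c:\\a\\b")    # correct: the regex receives c:\\ , which matches one literal backslash
<_sre.SRE_Match object; span=(0, 3), match='c:\\'>
>>> re.match(r"c:\\","c:\\a\\b")     # correct: the r prefix makes a raw string, so Python passes both backslashes to the regex engine
<_sre.SRE_Match object; span=(0, 3), match='c:\\'>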
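When the text to match comes from elsewhere (user input, file names), re.escape() can add the backslashes for you. Here is a minimal sketch with made-up sample strings:

```python
import re

text = "a.b axb"

# Unescaped, "." is a metacharacter that matches any character
unescaped = re.findall(r"a.b", text)
# re.escape() puts a backslash before special characters, so they match literally
escaped = re.findall(re.escape("a.b"), text)

# It also handles backslashes: the escaped pattern matches one literal backslash
drive = re.match(re.escape("c:\\"), "c:\\a\\b").group()

print(unescaped)         # ['a.b', 'axb']
print(escaped)           # ['a.b']
print(drive == "c:\\")   # True
```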

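7. Repetition

Quantifiers:

+       # one or more
        Example:
        >>> re.findall(r"\d+","I am 0 years old now and will turn 1 soon")
        ['0', '1']
*       # zero or more
?       # zero or one
{m}     # exactly m times
        Example:
        >>> re.findall(r"\d{4}","I am 0 years old now and will turn 1 soon; this year is 2019, the 70th anniversary of the PRC")
        ['2019']
{m,}    # at least m times
        Example:
        >>> re.findall(r"\d{2,}","I am 0 years old now and will turn 1 soon; this year is 2019, the 70th anniversary of the PRC")
        ['2019', '70']
{m,n}   # at least m and at most n times
        Example:
        >>> re.findall(r"\d{1,4}","I am 0 years old now and will turn 1 soon; this year is 2019, the 70th anniversary of the PRC")
        ['0', '1', '2019', '70']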
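The ? quantifier has no example above. This minimal sketch covers it and also * (zero or more), which appears in the .* used in the ^ and $ examples. The sample strings are made up:

```python
import re

# ? makes the preceding item optional: 0 or 1 occurrence of "u"
colors = re.findall(r"colou?r", "color colour colouur")
# * allows 0 or more occurrences of "b"
runs = re.findall(r"ab*", "a ab abbb")

print(colors)  # ['color', 'colour']
print(runs)    # ['a', 'ab', 'abbb']
```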

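Exercise: check whether a string is an 11-digit mobile phone number (1, then a digit from 3 to 9, then 9 more digits):

>>> re.match(r"^1[3456789]\d{9}$","13457824875")    # the trailing $ ensures no extra characters follow
<_sre.SRE_Match object; span=(0, 11), match='13457824875'>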
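Without a trailing $, re.match() would also accept strings with extra characters after the 11 digits. One alternative is re.fullmatch() (available since Python 3.4), which requires the whole string to match. It is sketched here with a hypothetical helper `is_phone`. fullmatch() also rejects a trailing newline, which ^...$ lets through because $ can match just before a final newline:

```python
import re

# Hypothetical validator; the pattern is the one from the exercise above
PHONE = re.compile(r"1[3-9]\d{9}")

def is_phone(s: str) -> bool:
    # fullmatch() succeeds only if the entire string matches the pattern
    return PHONE.fullmatch(s) is not None

print(is_phone("13457824875"))    # True
print(is_phone("134578248751"))   # False: 12 digits
print(is_phone("12457824875"))    # False: second digit is 2
print(is_phone("13457824875\n"))  # False: trailing newline

# For comparison: ^...$ with re.match() accepts the trailing newline
print(re.match(r"^1[3-9]\d{9}$", "13457824875\n") is not None)  # True
```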

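Exercise: match python.com e-mail addresses with 4 to 20 word characters before the @, such as 123456@python.com:

>>> re.findall(r"^\w{4,20}@python\.com$","zhang_0@python.com")
['zhang_0@python.com']
>>> re.findall(r"^\w{4,20}@python\.com$","zha@python.com")    # only 3 characters before the @
[]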
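Two details matter in the e-mail pattern. First, an unescaped . matches any character. Second, without ^, a match found by findall() or search() may start in the middle of a longer name. This minimal sketch compares a loose and a strict version (the sample addresses are made up):

```python
import re

loose = r"\w{4,20}@python.com$"      # no ^ anchor, and "." matches any character
strict = r"^\w{4,20}@python\.com$"   # anchored at the start, literal dot

samples = ["zhang_0@python.com", "zhang_0@pythonXcom", "a" * 25 + "@python.com"]

loose_results = [bool(re.search(loose, s)) for s in samples]
strict_results = [bool(re.search(strict, s)) for s in samples]

for s, lo, st in zip(samples, loose_results, strict_results):
    print(s, lo, st)
# zhang_0@python.com True True
# zhang_0@pythonXcom True False
# aaaaaaaaaaaaaaaaaaaaaaaaa@python.com True False
```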