Python Regular Expressions: An Introduction

TheMinority

License: CC 4.0 BY-SA

Category: Coder Life. Tags: python, regular expressions, string, regex, object, list

Article link: https://blog.youkuaiyun.com/TheMinority/article/details/7629227

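This article covers the basic symbols and usage of regular expressions, along with the core functions and methods of Python's re module. Examples show how to use these tools to match, search, find, split, and replace strings, followed by a concrete application.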

1. Special symbols and characters used in regular expressions

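Notation	Description	Example regex
literal	Match the literal string value	foo
re1|re2	Match regular expression re1 or re2	foo|bar
.	Match any character (except newline)	b.b
^	Match the start of the string	^Dear
$	Match the end of the string	/bin/*sh$
*	Match zero or more occurrences of the preceding regular expression	[A-Za-z0-9]*
+	Match one or more occurrences of the preceding regular expression	[a-z]+\.com
?	Match zero or one occurrence of the preceding regular expression	goo?
{N}	Match exactly N occurrences of the preceding regular expression	[0-9]{3}
{M,N}	Match from M to N occurrences of the preceding regular expression	[0-9]{5,9}
[…]	Match any single character from the character class	[aeiou]
[..x-y..]	Match any single character in the range x to y	[0-9], [A-Za-z]
[^…]	Match any single character NOT in the character set, including any ranges that appear in the set	[^aeiou], [^A-Za-z0-9_]
(*|+|?|{})?	Apply the "non-greedy" version of any of the repetition symbols above	.*?[a-z]
(…)	Match the enclosed regular expression (RE) and save it as a subgroup	([0-9]{3})?, f(oo|u)bar
Special characters
\d	Match any decimal digit; same as [0-9] (\D is the inverse of \d: any non-digit)	data\d.txt
\w	Match any alphanumeric character or underscore; same as [A-Za-z0-9_] (\W is the inverse of \w)	[A-Za-z_]\w+
\s	Match any whitespace character; same as [ \n\t\r\v\f] (\S is the inverse of \s)	of\sthe
\b	Match a word boundary (\B is the inverse of \b)	\bThe\b
\nn	Match the saved subgroup nn (see the (…) notation above)	price:\16
\c	Match the special character c literally (that is, cancel its special meaning)	\., \\, \*
\A (\Z)	Match the start (end) of the string	\ADear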
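
Several of the symbols in the table can be checked directly in Python. The following is a minimal sketch using the standard re module; the sample strings and variable names are made up for illustration and do not come from the original article.

```python
import re

# Alternation: matches either "foo" or "bar".
alternation = re.findall(r'foo|bar', 'foo, baz, bar')

# {N} repetition inside a saved subgroup.
area_code = re.search(r'([0-9]{3})-', '555-1212').group(1)

# Greedy vs. non-greedy: ".*" takes as much as possible, ".*?" as little.
greedy = re.search(r'<.*>', '<a><b>').group()
non_greedy = re.search(r'<.*?>', '<a><b>').group()

# Backreference: \1 must repeat whatever subgroup 1 matched.
doubled = re.search(r'\b(\w+) \1\b', 'it is is fine').group(1)

# \b word boundaries: "The" as a whole word only, not inside "Theory".
whole_word = re.findall(r'\bThe\b', 'The Theory of The')

print(alternation, area_code, greedy, non_greedy, doubled, whole_word)
```

Raw strings (the r'' prefix) are used so that backslashes such as \b and \1 reach the regex engine unchanged.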

2. Python's re module: core functions and methods

Module function

compile(pattern, flags=0)

compile RE pattern with any optional flags and return a regex object
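
As a rough illustration of compile(), the sketch below compiles a pattern once and reuses the resulting regex object. The name day_re and the sample lines are assumptions chosen for this example; re.IGNORECASE is one of the module's optional flags.

```python
import re

# Compile once, reuse many times; re.IGNORECASE is an optional flag.
day_re = re.compile(r'^(mon|tue|wed|thu|fri|sat|sun)', re.IGNORECASE)

lines = ['Tue Nov  7 18:38:48 1995', 'no day here']

# The regex object's match() method takes only the string to test.
days = [m.group(1) for m in map(day_re.match, lines) if m]
print(days)
```

Compiling ahead of time is mainly a convenience when the same pattern is applied to many strings, as in a loop over the lines of a file.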

re module functions and regex object methods

match(pattern, string, flags=0)	attempt to match RE pattern at the start of string with optional flags; return match object on success, None on failure
search(pattern, string, flags=0)	search for first occurrence of RE pattern within string with optional flags; return match object on success, None on failure
findall(pattern, string, flags=0)	look for all (non-overlapping) occurrences of pattern in string; return a list of matches
split(pattern, string, maxsplit=0)	split string into a list using RE pattern as the delimiter and return the list of pieces, splitting at most maxsplit times (the default, 0, splits at every occurrence)
sub(pattern, repl, string, count=0)	replace occurrences of the RE pattern in string with repl, substituting all occurrences unless count is provided (also see subn(), which additionally returns the number of substitutions made)
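
The functions above can be compared side by side. This is a minimal sketch, assuming one line in the same format as the sample data used later in the article; the variable names are illustrative only.

```python
import re

line = 'Thu Apr 13 15:40:25 1972::uqkrtf@dqtnunm.com::71998825-6-7'

# match() only succeeds at the start of the string; search() scans forward.
at_start = re.match(r'\d+', line)               # None: the line starts with "Thu"
first_number = re.search(r'\d+', line).group()

# split() cuts the string at every delimiter unless maxsplit limits it.
fields = re.split(r'::', line)
first_two = re.split(r'::', line, maxsplit=1)

# findall() returns every non-overlapping match as a list of strings.
numbers = re.findall(r'\d+', fields[0])

# sub() replaces every match; subn() also reports how many were replaced.
masked = re.sub(r'\d', '#', fields[2])
masked_some, replaced = re.subn(r'\d', '#', fields[2], count=3)

print(first_number, fields, numbers, masked, masked_some, replaced)
```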

Match object methods

group(num=0)	return entire match (or specific subgroup num)
groups()	return all matching subgroups in a tuple (empty if there weren't any)
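
A short sketch of how group() and groups() behave on a match object. The input strings are assumptions picked to resemble the sample data in the next section.

```python
import re

m = re.search(r'(\d+)-(\d+)-(\d+)$', 'knqz@cofegju.edu::759974322-4-7')

whole = m.group()        # entire match, same as m.group(0)
second = m.group(2)      # text captured by the second subgroup
all_groups = m.groups()  # every subgroup as a tuple

# A pattern without parentheses has no subgroups, so groups() is empty.
no_groups = re.search(r'\d+', 'abc123').groups()

print(whole, second, all_groups, no_groups)
```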

3. Using regular expressions

Input file of strings to filter (data.log)

Tue Nov  7 18:38:48 1995::kmfxps@vixmsfphpvxh.gov::815740728-6-12
Mon Dec 12 15:09:50 1977::goprf@pivcuqfecxxf.edu::250758590-5-12
Thu Apr 13 15:40:25 1972::uqkrtf@dqtnunm.com::71998825-6-7
Mon Jan 31 07:58:42 1994::knqz@cofegju.edu::759974322-4-7
Mon Nov 16 12:30:48 1970::edlprre@omwmoaqhqjb.net::27577848-7-11
Fri Jul  5 01:07:39 1996::uhjwm@gffttoky.edu::836500059-5-8

Python script

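import re

# Scan each log line and print the middle number of the trailing "N-N-N" field.
with open('data.log') as f:
    for line in f:
        line = line.rstrip()
        # Other common re calls to try in place of the search() below:
        # result = re.findall(r'.*(h).*(h).*', line)
        # result = re.split(r'::|\n', line)
        # result = re.match(r'^(Mon|Tue|Wed|Thu|Fri|Sat|Sun)', line)
        # result = re.search(r'.+?(\d+-\d+-\d+)', line)
        result = re.search(r'-(\d)-', line)
        # findall() and split() return lists, so test those with: if result:
        if result is not None:
            # print(result)
            # print(result.group())
            print(result.group(1))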

The commented-out statements show some other common uses of the re module.
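
As one possible extension, the whole line can be pulled apart in a single pass with named groups, written (?P<name>…), which the re module supports. This is only a sketch: the function parse_line and the group names (timestamp, user, domain, number, mid, last) are hypothetical labels chosen here, since the article does not say what the three trailing numbers mean.

```python
import re

# Hypothetical field names; the article does not define the trailing numbers.
LINE_RE = re.compile(
    r'^(?P<timestamp>.+?)::(?P<user>[^@]+)@(?P<domain>[^:]+)::'
    r'(?P<number>\d+)-(?P<mid>\d+)-(?P<last>\d+)$'
)

def parse_line(line):
    """Return a dict of named fields, or None if the line does not fit."""
    m = LINE_RE.match(line.rstrip())
    return m.groupdict() if m else None

sample = 'Mon Jan 31 07:58:42 1994::knqz@cofegju.edu::759974322-4-7\n'
record = parse_line(sample)
print(record['domain'], record['mid'])
```

Returning None for lines that do not fit mirrors how match() and search() signal failure, so callers can use the same "is not None" check as in the script above.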