1. Special symbols and characters used in regular expressions
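Notation | Description | Example regex |
literal | Matches the literal string value | foo |
re1|re2 | Matches regular expression re1 or re2 | foo|bar |
. | Matches any single character except a newline | b.b |
^ | Matches the start of the string | ^Dear |
$ | Matches the end of the string | /bin/*sh$ |
* | Matches zero or more occurrences of the preceding regular expression | [A-Za-z0-9]* |
+ | Matches one or more occurrences of the preceding regular expression | [a-z]+\.com |
? | Matches zero or one occurrence of the preceding regular expression | goo? |
{N} | Matches exactly N occurrences of the preceding regular expression | [0-9]{3} |
{M,N} | Matches from M to N occurrences of the preceding regular expression | [0-9]{5,9} |
[…] | Matches any single character from the character class | [aeiou] |
[..x-y..] | Matches any single character in the range from x to y | [0-9], [A-Za-z] |
[^…] | Matches any single character not in the character class, including any ranges it lists | [^aeiou], [^A-Za-z0-9_] |
(*|+|?|{})? | Non-greedy versions of the repetition symbols above (*, +, ?, {}) | .*?[a-z] |
(…) | Matches the enclosed regular expression (RE) and saves it as a subgroup | ([0-9]{3})?, f(oo|u)bar |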
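A few of the notations above can be tried directly with Python's re module. This is a minimal sketch rather than a complete tour; every sample string and variable name is made up for illustration.

```python
import re

# All sample strings below are made up for illustration.

# Alternation: either alternative can match.
alternation = re.findall(r'foo|bar', 'foo, baz, bar')

# '.' matches any character except a newline.
any_char = re.findall(r'b.b', 'bob bab b\nb')

# '^' anchors the pattern at the start of the string.
starts_with_dear = re.search(r'^Dear', 'Dear reader') is not None
dear_in_middle = re.search(r'^Dear', 'My Dear') is not None

# Repetition: {3} needs exactly three digits, '?' makes the last 'o' optional.
three_digits = re.findall(r'[0-9]{3}', '12 345 6789')
optional_o = re.findall(r'goo?', 'go good g')

# Character classes: [^aeiou] matches anything that is not a listed vowel.
non_vowels = re.findall(r'[^aeiou]', 'audio')

# Greedy '.*' grabs as much as it can; non-greedy '.*?' as little as it can.
greedy = re.search(r'.*[a-z]', '12ab').group()
non_greedy = re.search(r'.*?[a-z]', '12ab').group()

# Parentheses save the matched alternative as subgroup 1.
subgroup = re.search(r'f(oo|u)bar', 'fubar').group(1)

print(alternation, any_char, three_digits, optional_o)
print(non_vowels, greedy, non_greedy, subgroup)
```

The patterns are written as raw strings (r'...') so that backslashes reach the regex engine unchanged.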
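Special characters | | |
\d | Matches any decimal digit, same as [0-9] (\D is the inverse of \d: any non-digit) | data\d.txt |
\w | Matches any alphanumeric character or underscore, same as [A-Za-z0-9_] (\W is the inverse of \w) | [A-Za-z_]\w+ |
\s | Matches any whitespace character, same as [ \n\t\r\v\f] (\S is the inverse of \s) | of\sthe |
\b | Matches a word boundary (\B is the inverse of \b) | \bThe\b |
\N | Matches the saved subgroup numbered N (see the (…) notation above) | price:\16 |
\c | Matches the special character c literally, that is, without its special meaning | \., \\, \* |
\A (\Z) | Matches the start (end) of the string | \ADear |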
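The special characters can be explored the same way. The sketch below uses made-up strings and variable names. Note that the example data\d.txt leaves its dot unescaped, so the dot matches any character, not only a literal period.

```python
import re

# Sample strings are made up for illustration.

# \d matches one digit; the unescaped '.' still matches any character.
digit_file = re.search(r'data\d.txt', 'data7.txt') is not None
loose_dot = re.search(r'data\d.txt', 'data7_txt') is not None
strict_dot = re.search(r'data\d\.txt', 'data7_txt') is not None

# \w: identifier-like tokens that start with a letter or underscore.
identifiers = re.findall(r'[A-Za-z_]\w+', 'x1 = _tmp + 42')

# \s matches whitespace, so runs of spaces and tabs act as one separator.
words = re.split(r'\s+', 'of  the\tpeople')

# \b restricts the match to the whole word 'the'.
whole_words = re.findall(r'\bthe\b', 'the other then the')

# \1 refers back to whatever subgroup 1 matched (here, a repeated word).
repeated = re.search(r'(\w+) \1', 'hello hello world').group()

# \A matches only at the very start, even when re.MULTILINE lets '^'
# match after every newline.
text = 'Dear A\nDear B'
caret_hits = re.findall(r'^Dear', text, re.MULTILINE)
slash_a_hits = re.findall(r'\ADear', text, re.MULTILINE)

print(identifiers, words, whole_words, repeated)
print(len(caret_hits), len(slash_a_hits))
```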
2. Python's re module: core functions and methods
Module function
compile(pattern, flags=0) | Compiles the RE pattern with any optional flags and returns a regex object |
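Compiling is mainly useful when the same pattern is applied many times. A minimal sketch, assuming a weekday pattern like the one used later in these notes; the name weekday_re and the second and third test lines are invented for the example.

```python
import re

# Compile the weekday pattern once; re.IGNORECASE is an optional flag.
weekday_re = re.compile(r'^(Mon|Tue|Wed|Thu|Fri|Sat|Sun)', re.IGNORECASE)

lines = [
    'Tue Nov 7 18:38:48 1995::kmfxps@vixmsfphpvxh.gov::815740728-6-12',
    'mon dec 12 15:09:50 1977',   # lower-cased variant, made up to show the flag
    'no weekday here',            # made-up non-matching line
]

# Methods on the compiled object take the string only, not the pattern.
weekdays = []
for line in lines:
    m = weekday_re.match(line)
    weekdays.append(m.group(1) if m else None)

print(weekdays)
```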
re module functions and regex object methods
match(pattern, string, flags=0) | Attempts to match the RE pattern at the start of string, with optional flags; returns a match object on success, None on failure |
search(pattern, string, flags=0) | Searches for the first occurrence of the RE pattern anywhere in string, with optional flags; returns a match object on success, None on failure |
findall(pattern, string, flags=0) | Finds all non-overlapping occurrences of pattern in string; returns a list of matches (new as of Python 1.5.2) |
split(pattern, string, maxsplit=0, flags=0) | Splits string at each occurrence of the RE pattern delimiter and returns the resulting list of substrings, splitting at most maxsplit times (the default, 0, splits at every occurrence) |
sub(pattern, repl, string, count=0, flags=0) | Replaces occurrences of the RE pattern in string with repl, replacing all of them unless count is given (see also subn(), which additionally returns the number of substitutions made) |
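The differences between these functions are easiest to see side by side. The sketch below runs them on one of the sample log lines from section 3; the variable names are invented for the example.

```python
import re

line = 'Thu Apr 13 15:40:25 1972::uqkrtf@dqtnunm.com::71998825-6-7'

# match() only succeeds at the start of the string; search() scans forward.
at_start = re.match(r'\d+', line)
first_number = re.search(r'\d+', line).group()

# findall() returns every non-overlapping match as a list.
numbers = re.findall(r'\d+', '71998825-6-7')

# split() cuts the string at each delimiter; maxsplit limits the cuts.
fields = re.split(r'::', line)
head_and_rest = re.split(r'::', line, maxsplit=1)

# sub() replaces matches; count limits how many. subn() also reports the total.
masked = re.sub(r'\d', 'X', '15:40:25')
masked_two = re.sub(r'\d', 'X', '15:40:25', count=2)
masked_with_total = re.subn(r'\d', 'X', '15:40:25')

print(at_start, first_number, numbers)
print(fields, head_and_rest)
print(masked, masked_two, masked_with_total)
```

Because the line starts with a weekday name, match() with \d+ returns None, while search() finds the first run of digits further along.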
Match object methods
group(num=0) | Returns the entire match (or the specific subgroup num) |
groups() | Returns all matching subgroups in a tuple (empty if there were none) |
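A short sketch of group() and groups() on a match object, using one of the sample log lines from section 3. The three-part pattern and the variable names are assumptions made for the example.

```python
import re

line = 'Tue Nov 7 18:38:48 1995::kmfxps@vixmsfphpvxh.gov::815740728-6-12'

# Three subgroups, one per dash-separated number at the end of the line.
m = re.search(r'(\d+)-(\d+)-(\d+)$', line)

whole = m.group()      # the entire match
middle = m.group(2)    # only the second subgroup
parts = m.groups()     # all subgroups as a tuple

# A pattern without parentheses has no subgroups, so groups() is empty.
no_groups = re.search(r'\d+', line).groups()

print(whole, middle, parts, no_groups)
```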
3. Using regular expressions
Tue Nov 7 18:38:48 1995::kmfxps@vixmsfphpvxh.gov::815740728-6-12
Mon Dec 12 15:09:50 1977::goprf@pivcuqfecxxf.edu::250758590-5-12
Thu Apr 13 15:40:25 1972::uqkrtf@dqtnunm.com::71998825-6-7
Mon Jan 31 07:58:42 1994::knqz@cofegju.edu::759974322-4-7
Mon Nov 16 12:30:48 1970::edlprre@omwmoaqhqjb.net::27577848-7-11
Fri Jul 5 01:07:39 1996::uhjwm@gffttoky.edu::836500059-5-8
Python script
import re

with open('data.log') as f:
    for line in f:
        # result = re.findall(r'.*(h).*(h).*', line)
        line = line.rstrip()
        # result = re.split(r'::|\n', line)
        # result = re.match(r'^(Mon|Tue|Wed|Thu|Fri|Sat|Sun)', line)
        # result = re.search(r'.+?(\d+-\d+-\d+)', line)
        # Capture the single-digit middle field of the trailing number group.
        result = re.search(r'-(\d)-', line)
        # if len(result) != 0:   # for findall()/split(), which return lists
        if result is not None:
            # print(result)
            # print(result.group())
            print(result.group(1))
The commented-out statements show some other common uses of the re module.
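To see what each of those statements returns without needing a data.log file, the sketch below runs them on two of the sample lines held in memory. The helper try_all, the dictionary keys, and the extra greedy variant of the search pattern are additions made for illustration and are not part of the original script.

```python
import re

# Two of the sample lines from above, held in memory instead of data.log.
LINES = [
    'Tue Nov 7 18:38:48 1995::kmfxps@vixmsfphpvxh.gov::815740728-6-12\n',
    'Mon Dec 12 15:09:50 1977::goprf@pivcuqfecxxf.edu::250758590-5-12\n',
]

def try_all(line):
    """Run each statement from the script on one line (hypothetical helper)."""
    # findall() with two groups returns a list of 2-tuples, or [] if no match.
    two_h = re.findall(r'.*(h).*(h).*', line)
    line = line.rstrip()
    return {
        'findall': two_h,
        'split': re.split(r'::|\n', line),
        'match': re.match(r'^(Mon|Tue|Wed|Thu|Fri|Sat|Sun)', line).group(),
        # Non-greedy '.+?' stops as early as possible, keeping the full number.
        'search_lazy': re.search(r'.+?(\d+-\d+-\d+)', line).group(1),
        # Greedy '.+' (added for contrast) swallows most of the first number.
        'search_greedy': re.search(r'.+(\d+-\d+-\d+)', line).group(1),
        'middle': re.search(r'-(\d)-', line).group(1),
    }

results = [try_all(line) for line in LINES]
for r in results:
    print(r)
```

The contrast between the lazy and greedy search shows why the script uses .+? in front of the captured group: with a greedy .+ only the last digit of the first number is left for the group to capture.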