import re
s = 'Hello from csev@umich.edu to cwen@iupui.edu about the meeting @2PM'
# findall() returns every non-overlapping match as a list of strings
lst = re.findall(r'\S+@\S+', s)
print(lst)
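Note that \S+ matches one or more non-whitespace characters. The output of the code above is ['csev@umich.edu', 'cwen@iupui.edu']; '@2PM' is not matched because no non-whitespace character comes before its @.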
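To make the behaviour of \S+ concrete, here is a minimal sketch. The second input string and the stricter pattern are illustrative assumptions, not part of the original notes; they only show that \S+ is greedy and will also pick up punctuation that touches an address.

```python
import re

s = 'Hello from csev@umich.edu to cwen@iupui.edu about the meeting @2PM'
# '@2PM' is skipped: \S+ needs at least one non-whitespace character before '@'
original_matches = re.findall(r'\S+@\S+', s)
print(original_matches)

# Hypothetical input with punctuation directly after each address
text = 'Write to csev@umich.edu, or to cwen@iupui.edu; ignore the lone @ sign'

# \S+ is greedy, so it also swallows the trailing ',' and ';'
loose = re.findall(r'\S+@\S+', text)
print(loose)

# Assumed stricter pattern: start on a letter or digit, end on a letter
strict = re.findall(r'[a-zA-Z0-9]\S*@\S*[a-zA-Z]', text)
print(strict)
```

With the looser pattern the comma and semicolon end up inside the results; the stricter one trims them. Whether that stricter pattern is appropriate depends on what the real data looks like.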
Combining searching and extracting:
import re
target='//home//moon//Desktop//mbox-short.txt'
mes=open(target)
for line in mes:
    line=line.rstrip()
    # keep lines that start with X, then non-whitespace characters, ': ', and a number
    if re.search(r'^X\S*: [0-9.]+',line):
        print(line)
Running this program filters the data so that only lines like the following remain:
X-DSPAM-Confidence: 0.8475
X-DSPAM-Probability: 0.0000
X-DSPAM-Confidence: 0.6178
X-DSPAM-Probability: 0.0000
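The filtering step can be tried without the mbox file. The sketch below uses a few made-up header lines (an assumption standing in for mbox-short.txt) and a hypothetical helper, filter_lines, to compare a pattern that only requires spaces after the colon with one that requires a number there.

```python
import re

# Made-up sample lines standing in for mbox-short.txt (illustrative assumption)
sample_lines = [
    'X-Sieve: CMU Sieve 2.3',
    'X-DSPAM-Result: Innocent',
    'X-DSPAM-Confidence: 0.8475',
    'X-DSPAM-Probability: 0.0000',
    'Subject: [sakai] svn commit',
]

def filter_lines(pattern, lines):
    """Return the lines in which re.search() finds the pattern."""
    return [line for line in lines if re.search(pattern, line)]

# Only requires one or more spaces after the colon: every X- header matches
any_x_header = filter_lines(r'^X\S*: +', sample_lines)

# Requires digits or dots after ': ', so only the numeric headers remain
numeric_x_header = filter_lines(r'^X\S*: [0-9.]+', sample_lines)

for line in numeric_x_header:
    print(line)
```

On these sample lines the first pattern keeps all four X- headers, while the second keeps only the Confidence and Probability lines, which is the shape of output shown above.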
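Parentheses in regular expressions: they are not part of the match itself, but mark which portion of the matched string findall() should return.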
import re
target='//home//moon//Desktop//mbox-short.txt'
mes=open(target)
for line in mes:
    line=line.rstrip()
    # the parentheses make findall() return only the number, not the whole line
    x=re.findall(r'^X\S*: ([0-9.]+)',line)
    if len(x)>0:
        print(x)
The program produces the following output:
['0.8475']
['0.0000']
['0.6178']
['0.0000']
['0.6961']
['0.0000']
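As a small follow-up sketch, the effect of the parentheses can be seen by running the same pattern with and without them. The sample lines are made up for illustration, and converting the captured strings to float is an added assumption about what one would typically do next, not something the original program does.

```python
import re

# Made-up sample lines (illustrative assumption, not read from the real file)
sample_lines = [
    'X-DSPAM-Confidence: 0.8475',
    'X-DSPAM-Probability: 0.0000',
    'X-DSPAM-Result: Innocent',
    'From: stephen.marquard@uct.ac.za',
]

# Without parentheses, findall() returns the whole matched text
whole = re.findall(r'^X\S*: [0-9.]+', sample_lines[0])

# With parentheses, findall() returns only the captured group
captured = re.findall(r'^X\S*: ([0-9.]+)', sample_lines[0])

# Collect the captured numbers from every line as floats
values = []
for line in sample_lines:
    found = re.findall(r'^X\S*: ([0-9.]+)', line)
    if found:
        values.append(float(found[0]))

print(whole)
print(captured)
print(values)
```

Because findall() hands back strings, a conversion such as float() is needed before doing any arithmetic on the extracted values.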