To learn Python, see this site:
https://pythonprogramming.net/
It covers many topics, each with a video, and I think the explanations are good.
The Python module for regular expressions is: re
Common regular expression rules:
Identifiers:
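- \d = any digit (0-9)
- \D = anything but a digit
- \s = any whitespace character (space, tab, newline, ...)
- \S = anything but whitespace
- \w = any word character (letter, digit, or underscore)
- \W = anything but a word character
- . = any character, except for a new line
- \b = word boundary (the position between a word character and a non-word character)
- \. = a literal period. Must use a backslash, because . normally means any character.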
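A quick sketch of how these identifiers behave in practice. The sample sentence and variable names are made up for illustration; only re.findall() from the standard library is used.

```python
import re

text = "Order 66 shipped on 2024-01-05. Total: 3.50 USD"

# \d matches one digit; the + after it means "one or more"
digits = re.findall(r'\d+', text)
print(digits)      # ['66', '2024', '01', '05', '3', '50']

# \w matches letters, digits and underscore, so numbers count as "words" too
words = re.findall(r'\w+', text)
print(words)

# \b marks a word boundary, so only the standalone word "on" is matched
on_matches = re.findall(r'\bon\b', text)
print(on_matches)  # ['on']

# \. matches a literal period (an unescaped . would match any character)
price = re.findall(r'\d+\.\d+', text)
print(price)       # ['3.50']
```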
Modifiers:
- {1,3} = expect 1 to 3 repetitions of the preceding element (e.g. \d{1,3} matches 1 to 3 digits)
- + = match 1 or more
- ? = match 0 or 1 repetitions.
- * = match 0 or MORE repetitions
- $ = matches at the end of string
- ^ = matches start of a string
- | = matches either/or. Example x|y = will match either x or y
- [] = character set: matches any one of the characters or ranges listed inside
- {x} = expect exactly x repetitions of the preceding element
- {x,y} = expect between x and y repetitions of the preceding element
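The modifiers above can be tried out one by one. This is only a small sketch with invented sample strings; re.search() is used alongside re.findall() to test the anchors.

```python
import re

# {1,3}: between 1 and 3 repetitions, as many as possible (greedy)
short_runs = re.findall(r'\d{1,3}', "7 42 2024")
print(short_runs)   # ['7', '42', '202', '4']

# ?: the preceding character is optional
spellings = re.findall(r'colou?r', "color colour")
print(spellings)    # ['color', 'colour']

# *: zero or more repetitions, +: one or more repetitions
star = re.findall(r'ab*', "a ab abbb")
plus = re.findall(r'ab+', "a ab abbb")
print(star)         # ['a', 'ab', 'abbb']
print(plus)         # ['ab', 'abbb']

# ^ and $ anchor the pattern to the start and end of the string
is_all_digits = bool(re.search(r'^\d+$', "12345"))
has_other = bool(re.search(r'^\d+$', "123a5"))
print(is_all_digits, has_other)   # True False

# |: either/or
pets = re.findall(r'cat|dog', "cat, dog, bird")
print(pets)         # ['cat', 'dog']
```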
White Space Characters:
- \n = new line
- \s = any whitespace character
- \t = tab
- \e = escape (in some regex flavors; Python's re does not accept \e, use \x1b instead)
- \f = form feed
- \r = carriage return
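A small sketch of the whitespace characters in action. The sample string is invented; the last lines assume you want to match the escape character in Python, where it has to be written as \x1b.

```python
import re

raw = "name:\tAlice\nage:\t30\r\n"

# \S+ grabs every run of non-whitespace, so tabs, \n and \r all act as separators
tokens = re.findall(r'\S+', raw)
print(tokens)      # ['name:', 'Alice', 'age:', '30']

# \t matches only the tab character
tab_count = len(re.findall(r'\t', raw))
print(tab_count)   # 2

# The escape character (ASCII 27) is written as \x1b in a Python pattern
esc_found = re.findall(r'\x1b', "a\x1bb")
print(len(esc_found))   # 1
```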
Characters to REMEMBER TO ESCAPE IF USED!
- . + * ? [ ] $ ^ ( ) { } | \
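As a sketch of what escaping looks like: the text below is a made-up example containing several of these special characters. re.escape() is a standard-library helper that adds the backslashes for you when the whole string should be matched literally.

```python
import re

text = "Price: $3.50 (was $4.00)"

# $ and . are escaped by hand so they match the literal characters
prices = re.findall(r'\$\d+\.\d+', text)
print(prices)   # ['$3.50', '$4.00']

# re.escape() escapes every special character in a literal string
literal = "(was $4.00)"
pattern = re.escape(literal)
found = re.findall(pattern, text)
print(found)    # ['(was $4.00)']
```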
Brackets:
- [] = quant[ia]tative = will match either quantitative or quantatative.
- [a-z] = matches any single lowercase letter a-z
- [1-5a-qA-Z] = matches any single digit 1-5, lowercase letter a-q, or uppercase letter A-Z
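The bracket rules can be checked with a few short calls. The test strings are invented for illustration.

```python
import re

# [ia] matches exactly one character: either "i" or "a"
hits = re.findall(r'quant[ia]tative', "quantitative quantatative quantotative")
print(hits)    # ['quantitative', 'quantatative']

# [a-z] matches one lowercase letter; with + it matches whole lowercase runs
lower = re.findall(r'[a-z]+', "abc DEF ghi 123")
print(lower)   # ['abc', 'ghi']

# Several ranges can be combined inside one pair of brackets
mixed = re.findall(r'[1-5a-qA-Z]', "7 r 3 b Z")
print(mixed)   # ['3', 'b', 'Z']
```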
Example:
import re
exampleString = '''
Jessica is 15 years old, and Daniel is 27 years old.
Edward is 97 years old, and his grandfather, Oscar, is 102.
'''
ages = re.findall(r'\d{1,3}', exampleString)
names = re.findall(r'[A-Z][a-z]*', exampleString)
print(ages)
#print is:['15', '27', '97', '102']
print(names)
#print is:['Jessica', 'Daniel', 'Edward', 'Oscar']
ageDict = {}
x = 0
for eachName in names:
    # pair each name with the age found at the same position
    ageDict[eachName] = ages[x]
    x += 1
print(ageDict)
#print is: {'Jessica': '15', 'Daniel': '27', 'Edward': '97', 'Oscar': '102'}
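As an alternative sketch, the counter-based loop can be replaced with the built-in zip(), which pairs the two lists position by position. This assumes both lists have the same length and order, as they do for this sample text.

```python
import re

exampleString = '''
Jessica is 15 years old, and Daniel is 27 years old.
Edward is 97 years old, and his grandfather, Oscar, is 102.
'''

ages = re.findall(r'\d{1,3}', exampleString)
names = re.findall(r'[A-Z][a-z]*', exampleString)

# zip() yields (name, age) pairs; dict() turns them into a mapping
ageDict = dict(zip(names, ages))
print(ageDict)
```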
The example above uses only re.findall(); the re module has many other functions.
re.findall() returns a list.
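To give a rough idea of a couple of those other functions, here is a sketch using re.search() and re.match(), plus re.findall() with capture groups, in which case the returned list holds tuples instead of plain strings. The sample sentence is borrowed from the example above.

```python
import re

text = "Jessica is 15 years old, and Daniel is 27 years old."

# re.search() returns a match object for the first match anywhere (or None)
first = re.search(r'\d+', text)
first_age = first.group()
print(first_age)    # 15

# re.match() only tries to match at the very beginning of the string
start = re.match(r'[A-Z][a-z]*', text)
first_name = start.group()
print(first_name)   # Jessica

# With two capture groups, re.findall() returns a list of (name, age) tuples
pairs = re.findall(r'([A-Z][a-z]*) is (\d{1,3})', text)
print(pairs)        # [('Jessica', '15'), ('Daniel', '27')]
```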
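Example 2:
This one uses the re.sub() function.
re.sub() does replacement driven by a regular expression, which is more powerful than the plain string replace();
Suppose the input string is: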
inputStr = "hello 123 world 456"
and you want to replace both 123 and 456 with 222.
That calls for re.sub, which uses a regular expression to do this relatively complex string replacement:
replacedStr = re.sub(r"\d+", "222", inputStr)
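Of course, real-world cases can be more complex than this example, with all kinds of special situations, and those are where re.sub is really needed.
So, what re.sub does is:
Given an input string, it uses a regular expression (with all its string-processing power) to perform a (relatively complex) replacement, and returns the string after replacement.
re.sub also supports other parameters, such as count, which limits how many replacements are made.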
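A short sketch of re.sub with and without count, plus a replacement function. The helper name double is made up for illustration; passing a function as the replacement is a standard feature of re.sub.

```python
import re

inputStr = "hello 123 world 456"

# Replace every run of digits
all_replaced = re.sub(r'\d+', '222', inputStr)
print(all_replaced)   # hello 222 world 222

# count=1 replaces only the first match
first_only = re.sub(r'\d+', '222', inputStr, count=1)
print(first_only)     # hello 222 world 456

# The replacement can also be a function that receives each match object
def double(match):
    return str(int(match.group()) * 2)

doubled = re.sub(r'\d+', double, inputStr)
print(doubled)        # hello 246 world 912
```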