正则表达式
awk 命令,正则表达式放在两个斜杠之间: /regex/
javascript 语言,也是如此。
sed 和 vi 命令,正则表达分隔符是自定义的:sed:sed 's/regex/replace/,s 's,regex,replace,g; vi :%s/reg/rep/,:%s,reg,rep,g
转义字符
C语言:b t n r f v ' " \ a,\ooo \xhh,比如 '\0','\000, \x30'
awk:
\\
A literal backslash, ‘\’.
\a
The “alert” character, Ctrl-g, ASCII code 7 (BEL). (This often makes some sort of audible noise.)
\b
Backspace, Ctrl-h, ASCII code 8 (BS).
\f
Formfeed, Ctrl-l, ASCII code 12 (FF).
\n
Newline, Ctrl-j, ASCII code 10 (LF).
\r
Carriage return, Ctrl-m, ASCII code 13 (CR).
\t
Horizontal TAB, Ctrl-i, ASCII code 9 (HT).
\v
Vertical TAB, Ctrl-k, ASCII code 11 (VT).
\nnn
The octal value nnn, where nnn stands for 1 to 3 digits between ‘0’ and ‘7’.
For example, the code for the ASCII ESC (escape) character is ‘\033’.
\xhh…
The hexadecimal value hh, where hh stands for a sequence of hexadecimal digits
(‘0’–‘9’, and either ‘A’–‘F’ or ‘a’–‘f’). A maximum of two digts are allowed after the ‘\x’.
空白字符
空白字符包括:空格、水平制表符(tab)、换行。
不同的场景下,对空白的定义不大一样,上述是一般情况下的定义。
C、gawk
字符类
POSIX 正则表达式定义了字符类,12个

[:alnum:] Alphanumeric characters
[:alpha:] Alphabetic characters
[:upper:] Uppercase alphabetic characters
[:lower:] Lowercase alphabetic characters
[:digit:] Numeric characters
[:xdigit:] Characters that are hexadecimal digits
[:space:] Space characters (these are: space, TAB, newline, carriage return, formfeed and vertical tab)
[:blank:] Space and TAB characters
[:cntrl:] Control characters
[:print:] Printable characters (characters that are not control characters)
[:graph:] Characters that are both printable and visible (a space is printable but not visible, whereas an ‘a’ is both)
[:punct:] Punctuation characters (characters that are not letters, digits, control characters, or space characters)
特殊运算符
Java、gawk支持
\w
\s 空白:
\d
POSIX 的 ERE 不支持这些东西。
字段分隔符
- awk、bash默认用空白字符序列把一行文本分割成各个字段。
空白序列做分割,意思是两个连续的空格不会分出一个空白字符串。
Fields are normally separated by whitespace sequences (spaces, TABs, and newlines), not by single spaces. Two spaces in a row do not delimit an empty field.
- C 的strtok
本文深入探讨了正则表达式的使用方法,涵盖了多种编程语言如awk、JavaScript、sed及vi等的正则表达式语法,并详细解释了转义字符、空白字符、字符类以及特殊运算符的功能与应用。
1万+

被折叠的 条评论
为什么被折叠?



