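Summary

This post summarizes how regular expressions are written in several tools and languages, including awk, JavaScript, sed, and vi, and covers escape sequences, whitespace characters, character classes, and special operators.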

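Regular expressions

In awk, a regular expression is written between two slashes: /regex/
JavaScript regex literals use the same form.
In sed and vi, the delimiter of the substitute command can be chosen freely. sed: sed 's/regex/replace/' or sed 's,regex,replace,g'; vi: :%s/reg/rep/ or :%s,reg,rep,g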
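
A common reason to pick another delimiter in sed or vi is a pattern that itself contains slashes. As a rough comparison, here is a minimal Python sketch of the same substitution; Python passes the pattern as a string, so no delimiter is involved at all. The sample path is made up for illustration.

```python
import re

# Hypothetical sample input: a PATH-like string containing slashes.
path = "/usr/local/bin:/usr/local/lib"

# Roughly what sed 's,/usr/local,/opt,g' does: replace every occurrence.
result = re.sub(r"/usr/local", "/opt", path)
print(result)
```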

Escape sequences

C: \b \t \n \r \f \v \' \" \\ \a, plus octal \ooo and hexadecimal \xhh, for example '\0', '\000', '\x30'
awk:

\\

    A literal backslash, ‘\’.
\a

    The “alert” character, Ctrl-g, ASCII code 7 (BEL). (This often makes some sort of audible noise.)
\b

    Backspace, Ctrl-h, ASCII code 8 (BS).
\f

    Formfeed, Ctrl-l, ASCII code 12 (FF).
\n

    Newline, Ctrl-j, ASCII code 10 (LF).
\r

    Carriage return, Ctrl-m, ASCII code 13 (CR).
\t

    Horizontal TAB, Ctrl-i, ASCII code 9 (HT).
\v

    Vertical TAB, Ctrl-k, ASCII code 11 (VT).
\nnn

    The octal value nnn, where nnn stands for 1 to 3 digits between ‘0’ and ‘7’. 
    For example, the code for the ASCII ESC (escape) character is ‘\033’.
\xhh…

    The hexadecimal value hh, where hh stands for a sequence of hexadecimal digits 
    (‘0’–‘9’, and either ‘A’–‘F’ or ‘a’–‘f’). A maximum of two digits are allowed after the ‘\x’.
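
The codes in this list can be checked quickly. The sketch below uses Python string literals, which happen to accept the same escape sequences; it is only a cross-check of the ASCII values, not a statement about how awk parses them. One difference to keep in mind: Python's \x takes exactly two hexadecimal digits. The names ESCAPES and mismatches are made up for this example.

```python
# (label, character, expected ASCII code) for each escape sequence listed above.
ESCAPES = [
    (r"\\", "\\", 92),
    (r"\a", "\a", 7),
    (r"\b", "\b", 8),
    (r"\f", "\f", 12),
    (r"\n", "\n", 10),
    (r"\r", "\r", 13),
    (r"\t", "\t", 9),
    (r"\v", "\v", 11),
    (r"\033", "\033", 27),  # octal form of ESC
    (r"\x1b", "\x1b", 27),  # hexadecimal form of ESC
]


def mismatches(table):
    """Return the labels whose character does not have the expected code."""
    return [label for label, ch, code in table if ord(ch) != code]


for label, ch, code in ESCAPES:
    print(f"{label:<5} ASCII {ord(ch)}")
```

An empty list from mismatches(ESCAPES) means every entry agrees with the table.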

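Whitespace characters

Whitespace characters include the space, the horizontal tab, and the newline.
Different contexts define whitespace somewhat differently; the above is the usual definition.
C, gawk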

Character classes

POSIX regular expressions define 12 character classes:

[:alnum:] Alphanumeric characters
[:alpha:] Alphabetic characters
[:upper:] Uppercase alphabetic characters
[:lower:] Lowercase alphabetic characters
[:digit:] Numeric characters
[:xdigit:] Characters that are hexadecimal digits
[:space:] Space characters (these are: space, TAB, newline, carriage return, formfeed and vertical tab)
[:blank:] Space and TAB characters
[:cntrl:] Control characters
[:print:] Printable characters (characters that are not control characters)
[:graph:] Characters that are both printable and visible (a space is printable but not visible, whereas an ‘a’ is both)
[:punct:] Punctuation characters (characters that are not letters, digits, control characters, or space characters)
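
Python's re module does not accept the [:name:] syntax, so the sketch below spells out a few of these classes as explicit ASCII ranges. This is an assumption that only holds in the C/POSIX locale; in other locales the POSIX classes can match more characters. POSIX_ASCII and posix_class are hypothetical names used only here.

```python
import re
import string

# ASCII-only stand-ins for some POSIX classes (assumes the C locale).
POSIX_ASCII = {
    "alnum": "A-Za-z0-9",
    "alpha": "A-Za-z",
    "upper": "A-Z",
    "lower": "a-z",
    "digit": "0-9",
    "xdigit": "0-9A-Fa-f",
    "blank": r" \t",
    "space": r" \t\n\r\f\v",
}


def posix_class(name):
    """Compile a pattern that matches one character of the named class."""
    return re.compile("[" + POSIX_ASCII[name] + "]")


# string.whitespace holds the same six characters listed for [:space:].
print(all(posix_class("space").fullmatch(c) for c in string.whitespace))
print(bool(posix_class("blank").fullmatch("\n")))
```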

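Special operators

Supported by Java; gawk supports \w and \s, but not \d:
\w  a word character: a letter, a digit, or an underscore
\s  a whitespace character
\d  a digit

POSIX ERE does not define these operators.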
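
To illustrate the three shorthands, here is a small sketch using Python's re module, which also supports them. The sample string is invented. The re.ASCII flag restricts the shorthands to ASCII, which makes them comparable to the explicit bracket expressions one would write where the shorthands are unavailable.

```python
import re

# Hypothetical sample text: an identifier, a tab, a word, a space, a digit.
text = "id_42\tok 7"

words = re.findall(r"\w+", text, flags=re.ASCII)   # runs of word characters
digits = re.findall(r"\d+", text, flags=re.ASCII)  # runs of digits
spaces = re.findall(r"\s", text, flags=re.ASCII)   # single whitespace characters

# Explicit bracket expressions that stand in for the shorthands.
words_explicit = re.findall(r"[A-Za-z0-9_]+", text)
digits_explicit = re.findall(r"[0-9]+", text)

print(words, digits, spaces)
```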

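Field separators

  1. By default, awk and bash split a line of text into fields on runs of whitespace. Splitting on a run means that two consecutive spaces do not produce an empty field between them.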

Fields are normally separated by whitespace sequences (spaces, TABs, and newlines), not by single spaces. Two spaces in a row do not delimit an empty field.

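  2. C's strtok behaves similarly: a run of consecutive delimiter characters counts as a single separator, so it never returns an empty token.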
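
The difference between splitting on whitespace runs and splitting on single spaces can be seen in a short Python sketch. str.split() with no argument follows the same "runs of whitespace" rule described above, while split(" ") treats every single space as a separator. The sample line is made up.

```python
# Hypothetical input line: two spaces, then a tab, then a trailing newline.
line = "alpha  beta\tgamma\n"

# No argument: split on runs of whitespace, ignoring leading/trailing runs.
fields = line.split()

# Explicit single-space separator: two spaces in a row yield an empty field.
naive = "alpha  beta".split(" ")

print(fields)
print(naive)
```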