Learn Regular Expressions in 30 Minutes

Latest recommended article published on 2022-04-28 15:02:18

License: CC 4.0 BY-SA

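A regular expression is a way of describing and processing strings. Tools that use it work line by line: with the help of a few special characters, a user can easily search for, delete, or replace a specific string.
A regular expression is essentially a notation. Any tool that supports the notation can be used for regular-expression processing.
Note: regular expressions and wildcards are completely different things. Wildcards (globbing) are a feature of the bash shell interface, whereas regular expressions are a method of processing strings.
When using regular expressions, pay close attention to the locale of the current environment; otherwise you may get different matches than other people do.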

Aside:  # echo $LANG
              zh_CN.UTF-8     Simplified Chinese
              zh_TW.UTF-8     Traditional Chinese
              en_US.UTF-8     English
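
A minimal sketch of why the locale matters, assuming a POSIX shell and a grep that honors LC_ALL. The sample text and the variable names are made up for illustration. Forcing the C locale makes a range such as [a-z] cover only the lowercase ASCII letters, whatever $LANG happens to be.

```shell
# Hypothetical sample: one lowercase line and one uppercase line.
sample=$(printf 'apple\nAPPLE\n')

# With LC_ALL=C, [a-z] is a plain ASCII range, so only "apple" matches.
c_locale_hits=$(printf '%s\n' "$sample" | LC_ALL=C grep '^[a-z]')
echo "$c_locale_hits"
```

Setting LC_ALL=C for a single command, as here, leaves the rest of the session's locale untouched.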
        
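Special characters
[[:alnum:]] letters and digits, i.e. 0-9, A-Z, a-z
[[:alpha:]] any letter, i.e. A-Z, a-z
[[:space:]] any whitespace character, including the space bar, [Tab], and so on
[[:digit:]] digits, i.e. 0-9
[[:lower:]] lowercase letters, i.e. a-z
[[:upper:]] uppercase letters, i.e. A-Z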
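
The classes above can be tried out as follows. This is only a sketch: the sample lines are invented and are not taken from regular_express.txt.

```shell
# Hypothetical sample lines.
sample=$(printf 'abc\nABC\n123\na b\n')

# Lines containing at least one digit.
digit_lines=$(printf '%s\n' "$sample" | grep '[[:digit:]]')
# Lines that start with an uppercase letter.
upper_lines=$(printf '%s\n' "$sample" | grep '^[[:upper:]]')
# Lines containing whitespace (a space or a tab).
space_lines=$(printf '%s\n' "$sample" | grep '[[:space:]]')

echo "$digit_lines"   # 123
echo "$upper_lines"   # ABC
echo "$space_lines"   # a b
```

Note the double brackets: the outer pair is the bracket expression and the inner [:name:] selects the class.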

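grep
-A n  print the matching line and the n lines after it
-B n  print the matching line and the n lines before it
--color  highlight the matched text in color
eg1 Use dmesg to list kernel messages, then use grep to find the lines containing eth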

[root@liu ~]# dmesg | grep 'eth'
[    1.883825] e1000 0000:02:01.0 eth0: (PCI:66MHz:32-bit) 00:0c:29:70:a3:b8
[    1.883831] e1000 0000:02:01.0 eth0: Intel(R) PRO/1000 Network Connection

Continuing from above, also show the three lines before and the two lines after each eth match

[root@liu ~]# dmesg | grep -A2 -B3 'eth'
[    1.752168] sr 2:0:0:0: Attached scsi CD-ROM sr0
[    1.758979] hub 2-2:1.0: USB hub found
[    1.760394] hub 2-2:1.0: 7 ports detected
[    1.883825] e1000 0000:02:01.0 eth0: (PCI:66MHz:32-bit) 00:0c:29:70:a3:b8
[    1.883831] e1000 0000:02:01.0 eth0: Intel(R) PRO/1000 Network Connection
[    2.051199] usb 2-2.1: new full-speed USB device number 4 using uhci_hcd
[    2.079742] SGI XFS with ACLs, security attributes, no debug enabled
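
The same context options can be tried without dmesg. The sketch below assumes a grep that supports -A and -B (GNU and BSD grep both do) and uses five invented lines in place of a kernel log.

```shell
# Five numbered lines stand in for a log (hypothetical data).
context=$(printf 'line1\nline2\nline3\nline4\nline5\n' | grep -B1 -A2 'line3')

# Prints line2 (one line before), line3 (the match), line4 and line5 (two after).
echo "$context"
```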

The following examples operate on a prepared file, regular_express.txt
eg2 Find the lines containing the and show their line numbers

[root@liu ~]# cat regular_express.txt | grep -n 'the'
8:I can't finish the test.^M
12:the symbol '*' is represented as start.
15:You are the best is mean you are the no. 1.
16:The world <Happy> is the same with "glad".
18:google is the best tools for search keyword.

Inverted match

[root@liu ~]# cat regular_express.txt | grep -vn 'the'
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.this dress doesn't fit me.
4:this dress doesn't fit me
5:However, this dress is about $ 3183 dollars.^M
6:GNU is free air not free beer.^M
7:Her hair is very beauty.^M
9:Oh! The soup taste good.^M
10:motorcycle is cheap than car.
11:This window is clear.
13:Oh! My god!
14:The gd software is a library for drafting programs.^M
17:I like dog.
19:goooooogle yes!
20:go! go! Let's go.
21:# I am Vbird
An inverted match is selected by adding the -v option

Match the while ignoring case

[root@liu ~]# cat regular_express.txt | grep -i "the"
I can't finish the test.^M
Oh! The soup taste good.^M
the symbol '*' is represented as start.
The gd software is a library for drafting programs.^M
You are the best is mean you are the no. 1.
The world <Happy> is the same with "glad".
google is the best tools for search keyword.
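
For comparison, the three options used so far (-n, -v, -i) can be run side by side on a tiny input. This is a sketch with invented sample lines, not the contents of regular_express.txt.

```shell
# Hypothetical three-line sample.
sample=$(printf 'The cat\nthe dog\na bird\n')

# -n: prefix each match with its line number (case-sensitive).
numbered=$(printf '%s\n' "$sample" | grep -n 'the')
# -v: keep only the lines that do NOT match.
inverted=$(printf '%s\n' "$sample" | grep -vn 'the')
# -i: ignore case, so "The" matches as well.
ignorecase=$(printf '%s\n' "$sample" | grep -in 'the')

echo "$numbered"     # 2:the dog
echo "$inverted"     # 1:The cat and 3:a bird
echo "$ignorecase"   # 1:The cat and 2:the dog
```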

eg3 Use [ ] to match a set of characters
>>3.1 Find the two words test and taste

[root@liu ~]# grep 't[ae]st' regular_express.txt
I can't finish the test.^M
Oh! The soup taste good.^M
this is a test
[ ] matches exactly one character, which may be any of the characters listed inside it (a logical OR)
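
A small sketch of the bracket set, using four invented words. Only the words where t is followed by a or e and then st are expected to match.

```shell
# Hypothetical word list.
sample=$(printf 'test\ntaste\ntoast\ntwist\n')

# t, then exactly one of a or e, then st.
set_hits=$(printf '%s\n' "$sample" | grep 't[ae]st')

echo "$set_hits"   # test and taste; toast and twist do not match
```

However many characters are listed, the brackets always consume a single character of the input.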

Find the lines containing 'oo'

[root@liu ~]# grep "oo" regular_express.txt
"Open Source" is a good mechanism to develop programs.
apple is my favorite food.
Football game is not use feet only.this dress doesn't fit me.
Oh! The soup taste good.^M
google is the best tools for search keyword.
goooooogle yes!

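Find oo that is not preceded by g (a line still matches if any oo in it follows a non-g character, as in tools or goooooogle)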

[root@liu ~]# grep -n '[^g]oo' regular_express.txt
2:apple is my favorite food.
3:Football game is not use feet only.this dress doesn't fit me.
18:google is the best tools for search keyword.
19:goooooogle yes!
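 ^ outside [ ] anchors the match to the start of the line
 [^ ] with ^ as the first character inside the brackets negates the set
 For example, list only the lines that begin with the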
 [root@liu ~]# grep -n '^the' regular_express.txt
12:the symbol '*' is represented as start.
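
The two meanings of ^ are easy to confuse, so here is a sketch that shows both on the same invented input.

```shell
# Hypothetical sample lines.
sample=$(printf 'good\nfood\ngooood\nthe end\nat the end\n')

# [^g]oo: "oo" preceded by one character that is not g.
# "gooood" matches because its second o is followed by "oo".
not_g=$(printf '%s\n' "$sample" | grep '[^g]oo')

# ^the: "the" only at the very start of the line.
at_start=$(printf '%s\n' "$sample" | grep '^the')

echo "$not_g"      # food and gooood
echo "$at_start"   # the end
```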

List the lines that begin with a lowercase letter

[root@liu ~]# cat regular_express.txt | grep -n '^[a-z]'
2:apple is my favorite food.
4:this dress doesn't fit me
10:motorcycle is cheap than car.
12:the symbol '*' is represented as start.
18:google is the best tools for search keyword.
19:goooooogle yes!
20:go! go! Let's go.
23:this is a test

List the lines that end with a period

[root@liu ~]# grep -n '\.$' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.this dress doesn't fit me.
10:motorcycle is cheap than car.
11:This window is clear.
12:the symbol '*' is represented as start.
15:You are the best is mean you are the no. 1.
16:The world <Happy> is the same with "glad".
17:I like dog.
18:google is the best tools for search keyword.
20:go! go! Let's go.
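
Lines that end in a period but were shown earlier with a trailing ^M (5 to 9 and 14) are absent from this result, presumably because the carriage return of a DOS line ending sits between the period and the end of the line. The sketch below reproduces the effect on invented data, assuming a Unix-like system with tr available; make_sample is a hypothetical helper.

```shell
# Hypothetical helper: one Unix line and one DOS line (\r\n), both ending in a period.
make_sample() { printf 'unix line.\ndos line.\r\n'; }

# The \r hides the period from the $ anchor, so only one line is counted.
before=$(make_sample | grep -c '\.$')
# Deleting carriage returns first lets both lines match.
after=$(make_sample | tr -d '\r' | grep -c '\.$')

echo "$before $after"   # 1 2
```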

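Any single character and repeated characters
* repeats the preceding character zero or more times; unlike the shell wildcard, it does not stand for arbitrary characters by itself
. stands for exactly one arbitrary character
eg4 Find strings of the form g??d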

[root@liu ~]# grep -n 'g..d' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
9:Oh! The soup taste good.^M
16:The world <Happy> is the same with "glad".
[root@liu ~]#

Find the lines that contain a digit

[root@liu ~]# grep -n '[0-9][0-9]*' regular_express.txt
5:However, this dress is about $ 3183 dollars.^M
15:You are the best is mean you are the no. 1.
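
A sketch that contrasts . and * on a handful of invented lines. The last command illustrates a common surprise: because * allows zero repetitions, a pattern such as o* matches every line.

```shell
# Hypothetical sample lines.
sample=$(printf 'gd\ngood\nglad\ngoooood\nno digits\nroom 42\n')

# g..d: g, exactly two arbitrary characters, then d.
two_any=$(printf '%s\n' "$sample" | grep 'g..d')
# go*d: g, zero or more o, then d.
star=$(printf '%s\n' "$sample" | grep 'go*d')
# [0-9][0-9]*: one digit followed by zero or more digits.
digits=$(printf '%s\n' "$sample" | grep '[0-9][0-9]*')
# o* can match the empty string, so all six lines are counted.
all_lines=$(printf '%s\n' "$sample" | grep -c 'o*')

echo "$two_any"     # good and glad
echo "$star"        # gd, good and goooood
echo "$digits"      # room 42
echo "$all_lines"   # 6
```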

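eg5 Limit the number of consecutive repetitions with { }
In grep's basic regular expressions, { and } only act as a repetition count when escaped, so they are written as \{ and \}
>>eg5.1 Find two consecutive o's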

[root@liu ~]# grep -n 'o\{2\}' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.this dress doesn't fit me.
9:Oh! The soup taste good.^M
18:google is the best tools for search keyword.
19:goooooogle yes!

 >>eg5.2 Find two to five consecutive o's (write 'go\{2,5\}' to require a g immediately before them)

[root@liu ~]# grep -n 'o\{2,5\}' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.this dress doesn't fit me.
9:Oh! The soup taste good.^M
18:google is the best tools for search keyword.
19:goooooogle yes!

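 * in a shell wildcard stands for zero or more arbitrary characters; in a regular expression it repeats the preceding character zero or more times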
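
To see the interval do real work, it helps to delimit the run of o's on both sides. The sketch below uses invented words and also shows the extended-regex spelling with grep -E, where the braces need no backslashes.

```shell
# Hypothetical word list with 1, 2, 3 and 6 o's between the two g's.
sample=$(printf 'gogle\ngoogle\ngooogle\ngoooooogle\n')

# Basic regex: g, two to five o's, then g.
bre=$(printf '%s\n' "$sample" | grep 'go\{2,5\}g')
# Extended regex: the same pattern without backslashes.
ere=$(printf '%s\n' "$sample" | grep -E 'go{2,5}g')
# Without delimiters, a run of six o's still contains a run of two to five.
loose=$(printf '%s\n' "$sample" | grep 'o\{2,5\}')

echo "$bre"     # google and gooogle
echo "$ere"     # google and gooogle
echo "$loose"   # google, gooogle and goooooogle
```

This is why goooooogle appears in the output of eg5.1 and eg5.2: nothing in those patterns says what must come before or after the o's.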