01 - The Three Musketeers of Shell Text Processing: grep

1 What does grep mean?

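grep: Global search REgular expression and Print out the line.
A text search tool: it filters the target text against a user-specified pattern and prints the lines that the pattern matches.
Heh, I'd say that learning grep is really learning pattern matching, in other words regular expressions.
Let's start with a quick experiment to see how grep is used: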

[root@hadoop1 hadoop]# cat /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
uucp:x:10:14:uucp:/var/spool/uucp:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
gopher:x:13:30:gopher:/var/gopher:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
nobody:x:99:99:Nobody:/:/sbin/nologin
dbus:x:81:81:System message bus:/:/sbin/nologin
usbmuxd:x:113:113:usbmuxd user:/:/sbin/nologin
vcsa:x:69:69:virtual console memory owner:/dev:/sbin/nologin
rtkit:x:499:497:RealtimeKit:/proc:/sbin/nologin
avahi-autoipd:x:170:170:Avahi IPv4LL Stack:/var/lib/avahi-autoipd:/sbin/nologin
abrt:x:173:173::/etc/abrt:/sbin/nologin
haldaemon:x:68:68:HAL daemon:/:/sbin/nologin
gdm:x:42:42::/var/lib/gdm:/sbin/nologin
ntp:x:38:38::/etc/ntp:/sbin/nologin
apache:x:48:48:Apache:/var/www:/sbin/nologin
saslauth:x:498:76:"Saslauthd user":/var/empty/saslauth:/sbin/nologin
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
pulse:x:497:496:PulseAudio System Daemon:/var/run/pulse:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
tcpdump:x:72:72::/:/sbin/nologin
itcast01:x:500:500:itcast01:/home/itcast01:/bin/bash
hadoop:x:501:501::/home/hadoop:/bin/bash
mysql:x:27:27:MySQL Server:/var/lib/mysql:/bin/bash
[root@hadoop1 hadoop]# cat /etc/passwd |grep root
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
[root@hadoop1 hadoop]# cat /etc/passwd |grep --color root
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
[root@hadoop1 hadoop]#
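
The pipe through cat is not actually required, because grep accepts file names as arguments. Here is a minimal sketch, assuming a small passwd-style sample file created on the fly (the temporary file and its three entries are made up for illustration, not taken from the machine above):

```shell
# Build a tiny passwd-style sample so the example does not depend on the real /etc/passwd.
sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'bin:x:1:1:bin:/bin:/sbin/nologin' \
  'operator:x:11:0:operator:/root:/sbin/nologin' > "$sample"

# grep opens the file itself, so "cat file |" can be dropped.
matches=$(grep root "$sample")
echo "$matches"

# -c counts the matching lines; -n prefixes each match with its line number.
count=$(grep -c root "$sample")
numbered=$(grep -n root "$sample")
echo "$count"
echo "$numbered"

rm -f "$sample"
```

Both forms find the same lines. Passing the file name directly just saves one process and lets options such as -n report positions in the file.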
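2 Regular expressions
grep itself is simple, but pattern matching is not, so everything from here on is really about regular expressions. Why bother learning them? In big-data work, data cleaning, web crawling and similar tasks all rely on regular expressions, so they pay off quickly.
Regular expression: a pattern written with a class of characters, some of which do not stand for their literal meaning but act as control or wildcard characters (metacharacters).
The same metacharacter can mean different things, which gives two flavors: basic regular expressions and extended regular expressions. Always be clear about which flavor you are writing.
Always wrap the pattern in single quotes ' ', for example grep --color 'r..t' /etc/passwd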

[root@hadoop1 hadoop]# grep --color 'r..t' /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
[root@hadoop1 hadoop]#
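
Why the single quotes matter: without them the shell may expand characters such as * before grep ever sees the pattern. The following is a sketch under stated assumptions: the scratch directory, the file named test and the empty file named x123y are all invented for this demonstration.

```shell
# Work in a scratch directory so the glob only sees the files created here.
orig_dir=$PWD
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 'xy' 'xxy' 'x123y' > test
: > x123y    # an empty file whose name happens to match the shell glob x*y

# Unquoted: the shell replaces x*y with the file name x123y,
# so grep searches for the literal text "x123y".
unquoted=$(grep x*y test)

# Quoted: grep receives x*y unchanged and treats * as a regex quantifier.
quoted=$(grep 'x*y' test)

echo "unquoted: $unquoted"
echo "quoted:"
echo "$quoted"

cd "$orig_dir" || exit 1
rm -rf "$workdir"
```

The unquoted run prints only x123y, while the quoted run prints all three lines, since each of them contains a y preceded by zero or more x characters.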
2.1 Character matching
.: matches any single character
[]: matches any single character in the specified set
[[:digit:]], [0-9]
[[:lower:]], [a-z]
[[:upper:]], [A-Z]
[[:alpha:]], [a-zA-Z]
[[:alnum:]], [0-9a-zA-Z]
[[:space:]]
[[:punct:]]

[root@hadoop1 shelltest]# grep --color 'abcdef[[:digit:]][[:digit:]][0-9]' test
abcdef123
[root@hadoop1 shelltest]# grep --color 'abcdef[[:digit:]]' test
abcdef123
[root@hadoop1 shelltest]# grep --color 'abc' test
abcdef123
[root@hadoop1 shelltest]# grep -o --color 'abc' test
abc
[root@hadoop1 shelltest]#
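
A few more character-class queries, as a rough sketch. The sample lines below are invented, and -o is assumed to be available (it is in GNU grep, which the session above uses). Named classes such as [[:digit:]] are generally the safer choice, because ranges like [a-z] can behave differently depending on the locale.

```shell
sample=$(mktemp)
printf '%s\n' 'abcdef123' 'xielaoshi' 'Hello World 42' > "$sample"

# Lines that contain at least one digit.
with_digit=$(grep '[[:digit:]]' "$sample")

# Lines that contain an upper-case letter directly followed by a lower-case one.
capitalized=$(grep '[[:upper:]][[:lower:]]' "$sample")

# -o prints only the matched text: here, each run of one or more digits.
digit_runs=$(grep -o '[[:digit:]][[:digit:]]*' "$sample")

echo "$with_digit"
echo "$capitalized"
echo "$digit_runs"

rm -f "$sample"
```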
[^]: matches any single character outside the specified set

[root@hadoop1 shelltest]# grep --color 'abcdef[^[:digit:]]' test
[root@hadoop1 shelltest]#
[root@hadoop1 shelltest]# grep --color 'xielaoshi[^a-z]' test
xielaoshi121314
xiexiexielaoshi133
[root@hadoop1 shelltest]#
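
Note that a negated set still has to consume one character, which is why the line xielaoshi (with nothing after it) is missing from the output above. The sketch below contrasts this with the -v option, which inverts the whole match instead. The three sample lines are made up for illustration.

```shell
sample=$(mktemp)
printf '%s\n' 'abcdef123' 'abcdefg' 'abcdef' > "$sample"

# [^[:digit:]] needs an actual character after "abcdef" that is not a digit,
# so the bare line "abcdef" does not match.
non_digit_after=$(grep 'abcdef[^[:digit:]]' "$sample")

# -v selects the lines that do NOT contain "abcdef" followed by a digit,
# which includes the bare line "abcdef".
not_followed_by_digit=$(grep -v 'abcdef[[:digit:]]' "$sample")

echo "$non_digit_after"
echo "$not_followed_by_digit"

rm -f "$sample"
```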
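2.2 Match counts:
These limit how many times the immediately preceding character may occur. In basic regular expressions the syntax is:
*: matches the preceding character any number of times: 0, 1 or more;
Example: grep 'x*y'
Test strings: xy, xxy, xxxy, y (all match)
\?: matches the preceding character 0 or 1 times;
Example: grep 'x\?y'
Test strings: xy, xxy, y, xxxxxy, aby (every one contains a match; in xxxxxy only the final xy is matched, in aby only the y)
\+: matches the preceding character at least once;
\{m\}: matches the preceding character exactly m times;
Example: grep 'x\{2\}y'
Test strings: xy, xxy, y, xxxxxy, aby (only xxy and xxxxxy match)
\{m,n\}: matches the preceding character at least m and at most n times;
Example: grep 'x\{2,4\}y'
Test strings: xy, xxy, y, xxxxxxy, aby (only xxy and xxxxxxy match)
grep 'x\{0,4\}y'
Test strings: xy, xxy, y, xxxxxxxxxy, aby (all match)
grep 'x\{2,\}y'
Test strings: xy, xxy, y, xxxxxy (only xxy and xxxxxy match)
.*: matches any characters, of any length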

Let's try it out:

[root@hadoop1 shelltest]# vi test
[root@hadoop1 shelltest]# cat test
xielaoshi121314
xiexiexielaoshi133
xie123laoshi
xielaoshi
abcdef123

xy
xxy
xxxxy
y
aby
[root@hadoop1 shelltest]# grep --color 'x*y' test
xy
xxy
xxxxy
y
aby
[root@hadoop1 shelltest]#
[root@hadoop1 shelltest]# grep --color 'x\+y' test
xy
xxy
xxxxy
[root@hadoop1 shelltest]# grep --color 'x\{2\}y' test
xxy
xxxxy
[root@hadoop1 shelltest]# grep --color 'x\{0,4\}y' test
xy
xxy
xxxxy
y
aby
[root@hadoop1 shelltest]# grep --color 'x\{2,\}y' test
xxy
xxxxy
[root@hadoop1 shelltest]# grep --color 'x.*i' test
xielaoshi121314
xiexiexielaoshi133
xie123laoshi
xielaoshi
[root@hadoop1 shelltest]#
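If you are typing these commands along, you may wonder what the "\" is for. It is simply the escape character: in a basic regular expression, characters such as +, ?, { and } only act as quantifiers when a backslash precedes them.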
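
The quantifiers can be checked with a short script. This is only a sketch: the sample lines mirror part of the test file used in this section, -o is assumed to be available, and \{1,\} is used in place of \+ because the interval form is part of POSIX while \+ and \? are GNU extensions.

```shell
sample=$(mktemp)
printf '%s\n' 'xy' 'xxy' 'xxxxy' 'y' 'aby' > "$sample"

# Exactly two x characters directly before y (the lines xxy and xxxxy contain "xxy").
two_x=$(grep -c 'x\{2\}y' "$sample")

# At least one x before y; in GNU grep this selects the same lines as 'x\+y'.
one_or_more_x=$(grep -c 'x\{1,\}y' "$sample")

# -o shows what x*y actually consumed on each line:
# the longest possible run of x, or just "y" when no x precedes it.
consumed=$(grep -o 'x*y' "$sample")

echo "$two_x"
echo "$one_or_more_x"
echo "$consumed"

rm -f "$sample"
```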

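2.3 Position anchors:

^: start-of-line anchor
written at the far left of the pattern
$: end-of-line anchor
written at the far right of the pattern
^$: empty line
\<: word-start anchor, also \b
placed to the left of the word pattern being searched for; \<char
\>: word-end anchor, also \b
placed to the right of the word pattern being searched for; char\>
\<pattern\>: matches a whole word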
[root@hadoop1 shelltest]# grep --color '\<r' /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
rtkit:x:499:497:RealtimeKit:/proc:/sbin/nologin
pulse:x:497:496:PulseAudio System Daemon:/var/run/pulse:/sbin/nologin
[root@hadoop1 shelltest]# grep --color '\<ha' /etc/passwd
halt:x:7:0:halt:/sbin:/sbin/halt
haldaemon:x:68:68:HAL daemon:/:/sbin/nologin
hadoop:x:501:501::/home/hadoop:/bin/bash
[root@hadoop1 shelltest]# grep --color 'tor\>' /etc/passwd
operator:x:11:0:operator:/root:/sbin/nologin
[root@hadoop1 shelltest]# grep --color '\<root\>' /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
[root@hadoop1 shelltest]#
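
To see how each anchor narrows a search, here is a small sketch that counts matches in an invented sample. The chroot account and the empty lines are hypothetical, added only to make the differences visible. The word anchors \< and \> are assumed to be supported, as they are in GNU grep.

```shell
sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  '' \
  'operator:x:11:0:operator:/root:/sbin/nologin' \
  '' \
  'chroot:x:600:600::/home/chroot:/bin/bash' > "$sample"

anywhere=$(grep -c 'root' "$sample")          # "root" anywhere in the line
at_start=$(grep -c '^root' "$sample")         # line begins with "root"
bash_users=$(grep -c '/bin/bash$' "$sample")  # line ends with "/bin/bash"
empty_lines=$(grep -c '^$' "$sample")         # nothing between start and end
whole_word=$(grep -c '\<root\>' "$sample")    # "root" as a word; "chroot" is excluded

echo "$anywhere $at_start $bash_users $empty_lines $whole_word"

rm -f "$sample"
```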
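2.4 Grouping
\(\)

Back-references: if a pattern uses \(\) to form a group and, while a line is being checked, that group matches some text, the matched text can be referenced later in the same pattern;
\1, \2, \3
Counting from left to right, \# refers to the text matched by the pattern between the #-th left parenthesis and its matching right parenthesis;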
[root@hadoop1 shelltest]# cat test
xielaoshi121314
xiexiexielaoshi133
xie123laoshi
xielaoshi
abcdef123

xy
xxy
xxxxy
y
aby
abababy
by
bby
[root@hadoop1 shelltest]#
[root@hadoop1 shelltest]# grep --color 'ab*y' test
aby
abababy
[root@hadoop1 shelltest]# grep --color 'ab\{1\}y' test
aby
abababy
[root@hadoop1 shelltest]# vi test
[root@hadoop1 shelltest]# grep --color 'ab\{1,\}' test
abcdef123
aby
abbbby
abababy
[root@hadoop1 shelltest]# grep --color '\(ab\)\{1,\}y' test
aby
abababy
[root@hadoop1 shelltest]#
Back-reference:

[root@hadoop1 shelltest]# grep --color '\(ab\)\{1,\}y\1' test
abababyab
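
A back-reference matches the same text the group captured, not just the same pattern. The sketch below illustrates this with two hypothetical checks: a line made of a two-character unit repeated once, and a passwd entry whose home directory under /home/ is named after the user. The sample lines are assumptions chosen for the demonstration.

```shell
sample=$(mktemp)
printf '%s\n' \
  'abab' \
  'abba' \
  'hadoop:x:501:501::/home/hadoop:/bin/bash' \
  'mysql:x:27:27:MySQL Server:/var/lib/mysql:/bin/bash' > "$sample"

# \(..\) captures any two characters; \1 requires exactly those two again.
# "abab" qualifies, "abba" does not ("ab" is followed by "ba").
doubled=$(grep '^\(..\)\1$' "$sample")

# Capture the user name at the start of the line, then require /home/<same name>:
home_named_after_user=$(grep '^\([^:]*\):.*:/home/\1:' "$sample")

echo "$doubled"
echo "$home_named_after_user"

rm -f "$sample"
```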
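3 grep options
-v: invert the match (print the lines that do not match)
-o: print only the matched text
-i: ignore case
-E: use extended regular expressions
-A #: also print the # lines after each matching line
-B #: also print the # lines before each matching line
-C #: also print the # lines before and after each matching line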

[root@hadoop1 shelltest]# grep -A 2 'abababyab' test
abababyab
by
bby
[root@hadoop1 shelltest]# grep -B 2 'abababyab' test
aby
abbbby
abababyab
[root@hadoop1 shelltest]# grep -C 2 'abababyab' test
aby
abbbby
abababyab
by
bby
[root@hadoop1 shelltest]#
These three options (-A, -B and -C) are very handy when digging through logs.
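
As a rough illustration of the log use case, the sketch below runs the context options against an invented five-line log (the file contents and the ERROR/INFO format are assumptions, not taken from a real application). It also combines -v with -i to drop the routine lines.

```shell
log=$(mktemp)
printf '%s\n' \
  'INFO start' \
  'INFO load config' \
  'ERROR disk full' \
  'INFO retry' \
  'INFO done' > "$log"

after=$(grep -A 1 'ERROR' "$log")    # the error plus the line that follows it
before=$(grep -B 1 'ERROR' "$log")   # the line leading up to the error, then the error
around=$(grep -C 1 'ERROR' "$log")   # one line of context on both sides

# -i ignores case and -v inverts the match: everything that is not an info line.
not_info=$(grep -v -i 'info' "$log")

echo "$around"
echo "$not_info"

rm -f "$log"
```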

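4 egrep and extended regular expressions
Metacharacters of extended regular expressions:
Character matching:
.
[]
[^]
Match-count limits:
*
?: matches the preceding character 0 or 1 times;
+: matches the preceding character at least once;
{m}: matches the preceding character exactly m times;
{m,n}: also {m,} and {0,n}
Anchors:
^
$
\<, \>: \b
Grouping:
()

        back-references are supported: \1, \2, ...
    Alternation:
        a|b: a or b
        ab|cd: ab or cd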

# grep -E 'pattern' file...
# egrep 'pattern' file...

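In practice, the main difference is that extended regular expressions drop the backslash '\' that basic regular expressions need in front of ?, +, {}, () and |.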

[root@hadoop1 shelltest]# grep --color 'xie[1x]' test
xiexiexielaoshi133
xie123laoshi
[root@hadoop1 shelltest]# egrep --color 'xie1|x' test
xielaoshi121314
xiexiexielaoshi133
xie123laoshi
xielaoshi
xy
xxy
xxxxy
[root@hadoop1 shelltest]# grep --color 'xie1\|x' test
xielaoshi121314
xiexiexielaoshi133
xie123laoshi
xielaoshi
xy
xxy
xxxxy
[root@hadoop1 shelltest]# grep -E --color 'xie1|x' test
xielaoshi121314
xiexiexielaoshi133
xie123laoshi
xielaoshi
xy
xxy
xxxxy
[root@hadoop1 shelltest]# grep --color 'xie\(1\|x\)' test
xiexiexielaoshi133
xie123laoshi
[root@hadoop1 shelltest]#
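
The two flavors can be compared side by side. This sketch assumes GNU grep, since \| in a basic regular expression is a GNU extension, and uses sample lines borrowed from the test file. It writes grep -E rather than egrep because, as far as I know, recent GNU grep releases print an obsolescence warning for egrep.

```shell
sample=$(mktemp)
printf '%s\n' \
  'xielaoshi121314' \
  'xiexiexielaoshi133' \
  'xie123laoshi' \
  'abababy' \
  'aby' > "$sample"

# Alternation inside a group: escaped in BRE, bare in ERE.
bre_alt=$(grep 'xie\(1\|x\)' "$sample")
ere_alt=$(grep -E 'xie(1|x)' "$sample")

# A repeated group: "ab" at least twice, then y.
bre_rep=$(grep '\(ab\)\{2,\}y' "$sample")
ere_rep=$(grep -E '(ab){2,}y' "$sample")

echo "$ere_alt"
echo "$ere_rep"

rm -f "$sample"
```

Each pair selects the same lines. Only the amount of escaping differs.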

————————————————
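Copyright notice: this is an original article by CSDN blogger 「自我再教育」, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.
Original link: https://blog.youkuaiyun.com/qq_29622761/article/details/51601740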
