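Project 5: The sed Stream Editor and the awk Text-Processing Tool
Task 1: Extracting Text with Regular Expressions
1. In the user's home directory, create the test file file.txt.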
[root@redhat04 ~]# vim file.txt
[root@redhat04 ~]# cat file.txt
This is a sample text for testing regular expressions.
The quick brown fox jumps over the lazy dog.
1234567890
E-mail: example@example.com
Regular expressions is too sample
2. Extract the lines that contain the letter "o" followed by any single character, and display them in the terminal.
[root@redhat04 ~]# grep "o." file.txt
This is a sample text for testing regular expressions.
The quick brown fox jumps over the lazy dog.
E-mail: example@example.com
Regular expressions is too sample
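A minimal sketch of what the dot adds in step 2, using two hypothetical lines rather than file.txt: "o." needs one more character after the "o", so an "o" at the very end of a line does not match.

```shell
#!/usr/bin/env bash
# "o." requires an "o" followed by at least one more character on the same line.
# The two sample lines are hypothetical, chosen only to show the difference.
o_dot_demo() {
    printf 'echo\ndog.\n' | grep 'o.'
}

o_dot_demo    # prints "dog." only; the "o" in "echo" is the last character
```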
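3. Extract the lines that contain zero, one, or more instances of the letter "o", and display them in the terminal. Because zero instances always match, "o*" selects every line, including 1234567890.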
[root@redhat04 ~]# grep "o*" file.txt
This is a sample text for testing regular expressions.
The quick brown fox jumps over the lazy dog.
1234567890
E-mail: example@example.com
Regular expressions is too sample
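To see why step 3 returns every line, the sketch below (hypothetical input, one line of it empty) counts the matches of "o*": the pattern can match zero characters, so even the empty line is selected.

```shell
#!/usr/bin/env bash
# "o*" also matches the empty string, so every line is selected, even an empty one.
# The three sample lines (the middle one empty) are hypothetical.
count_o_star() {
    printf 'abc\n\nfoo\n' | grep -c 'o*'
}

count_o_star   # prints 3
```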
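4. Extract the lines that contain a "b" followed later on the same line by an "n", and display them in the terminal. The pattern is not anchored with ^ and $, so it does not require the line itself to start with "b" or end with "n".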
[root@redhat04 ~]# grep "b.*n" file.txt
The quick brown fox jumps over the lazy dog.
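Step 4 as written is not anchored. The sketch below, on three hypothetical lines, contrasts "b.*n" with "^b.*n$", which is what a literal reading of "begins with b and ends with n" would require.

```shell
#!/usr/bin/env bash
# Hypothetical sample lines; only the last two both start with "b" and end with "n".
sample='The quick brown fox
born again then
bacon'

count_matches() {
    printf '%s\n' "$sample" | grep -c -- "$1"
}

count_matches 'b.*n'     # 3: "b ... n" may appear anywhere in the line
count_matches '^b.*n$'   # 2: the whole line must run from "b" to "n"
```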
5. Extract the lines that contain one or more instances of the letter "u", and display them in the terminal.
[root@redhat04 ~]# egrep "u+" file.txt
This is a sample text for testing regular expressions.
The quick brown fox jumps over the lazy dog.
Regular expressions is too sample
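egrep is equivalent to grep -E, and GNU grep 3.8 and later print an obsolescence warning for egrep, so the sketch below uses grep -E. On hypothetical sample lines it also shows that "u+" on its own selects the same lines as a plain "u"; the "+" only matters inside a longer pattern.

```shell
#!/usr/bin/env bash
# Hypothetical sample lines. One "u" already satisfies "u+", so on its own
# the pattern selects exactly the lines that a plain "u" selects.
sample='quick
hello
vacuum'

plus_lines()  { printf '%s\n' "$sample" | grep -E 'u+'; }   # grep -E is the egrep equivalent
plain_lines() { printf '%s\n' "$sample" | grep 'u'; }

plus_lines    # quick, vacuum
```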
6. Extract the lines that contain "Thi" followed by zero or one letter "s" (that is, "Thi" or "This"), and display them in the terminal.
[root@redhat04 ~]# egrep "This?" file.txt
This is a sample text for testing regular expressions.
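In step 6 the "?" binds only to the final "s", so the pattern stands for "Thi" or "This". A sketch on hypothetical lines; adding grep's -w option limits matches to whole words.

```shell
#!/usr/bin/env bash
# "This?" means "Thi" optionally followed by "s". Sample lines are hypothetical.
sample='Thin ice
This line
That one'

q_count()      { printf '%s\n' "$sample" | grep -cE 'This?'; }
q_word_count() { printf '%s\n' "$sample" | grep -cwE 'This?'; }   # whole words only

q_count        # 2: "Thin" contains "Thi"
q_word_count   # 1: only the standalone word "This"
```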
7. Extract the lines that begin with the word "The", and display them in the terminal.
[root@redhat04 ~]# grep "^The" file.txt
The quick brown fox jumps over the lazy dog.
8. Extract the lines that end with "com", and display them in the terminal.
[root@redhat04 ~]# grep "com$" file.txt
E-mail: example@example.com
9. Extract the lines that contain "fox" or "dog", and display them in the terminal.
[root@redhat04 ~]# egrep "fox|dog" file.txt
The quick brown fox jumps over the lazy dog.
10. Extract the lines that contain a lowercase vowel, and display them in the terminal.
[root@redhat04 ~]# grep "[aeiou]" file.txt
This is a sample text for testing regular expressions.
The quick brown fox jumps over the lazy dog.
E-mail: example@example.com
Regular expressions is too sample
11. Extract the lines that contain a period, and display them in the terminal.
[root@redhat04 ~]# grep "\." file.txt
This is a sample text for testing regular expressions.
The quick brown fox jumps over the lazy dog.
E-mail: example@example.com
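Step 11 escapes the dot because an unescaped "." matches any character. The sketch below compares the three spellings on two hypothetical lines; grep -F treats the pattern as a fixed string.

```shell
#!/usr/bin/env bash
# Hypothetical sample: one line without a period, one with.
sample='abc
a.b'

any_char()    { printf '%s\n' "$sample" | grep -c '.'; }    # 2: "." matches any character
escaped_dot() { printf '%s\n' "$sample" | grep -c '\.'; }   # 1: "\." matches a literal period
fixed_dot()   { printf '%s\n' "$sample" | grep -cF '.'; }   # 1: -F disables regex syntax

escaped_dot
```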
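12. Extract the lines that contain one to three consecutive letters "o", and display them in the terminal. Since o{1,3} is satisfied by any single "o", this selects every line that contains an "o" at all; it does not exclude longer runs.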
[root@redhat04 ~]# egrep "o{1,3}" file.txt
This is a sample text for testing regular expressions.
The quick brown fox jumps over the lazy dog.
E-mail: example@example.com
Regular expressions is too sample
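Step 12's "o{1,3}" never rejects a line for having too many o's, because the first one to three o's of a longer run already satisfy it. A sketch, on hypothetical input, of one way to require a run of at most three: demand a non-"o" character or a line edge on both sides.

```shell
#!/usr/bin/env bash
# Hypothetical sample lines: a run of five "o"s and a single "o".
sample='fooooo
fox'

# o{1,3} is satisfied by part of any longer run, so both lines match.
loose()   { printf '%s\n' "$sample" | grep -cE 'o{1,3}'; }
# A non-"o" (or the line edge) on both sides rejects runs longer than three.
bounded() { printf '%s\n' "$sample" | grep -cE '(^|[^o])o{1,3}([^o]|$)'; }

loose     # 2
bounded   # 1
```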
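13. Extract the lines that contain the letter "f", followed by zero or one letter "o", followed by the letter "x", and display them in the terminal.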
[root@redhat04 ~]# egrep "fo?x" file.txt
The quick brown fox jumps over the lazy dog.
14. Extract the lines that contain "fox jumps" or "dog jumps", and display them in the terminal.
[root@redhat04 ~]# egrep "(fox|dog) jumps" file.txt
The quick brown fox jumps over the lazy dog.
15. Extract the lines that contain two consecutive letters "o", and display them in the terminal.
[root@redhat04 ~]# egrep "o{2}" file.txt
Regular expressions is too sample
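16. Extract the lines that end with "dog.", and display them in the terminal. Nothing was printed in this run, although line 2 of file.txt appears to end with "dog."; because $ anchors at the very end of the line, a trailing space or a Windows carriage return after the period would prevent the match.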
[root@redhat04 ~]# egrep "dog\.$" file.txt
[root@redhat04 ~]#
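Step 16 printed nothing even though line 2 appears to end with "dog.". One plausible cause (an assumption; the session does not show it) is invisible trailing characters. The sketch below reproduces the symptom with two hypothetical lines carrying a carriage return and a trailing space, and shows a pattern that tolerates trailing whitespace. On GNU systems, cat -A file.txt would reveal such characters (^M for a carriage return, $ at each line end).

```shell
#!/usr/bin/env bash
# Two hypothetical lines that look like "...dog." but end with a carriage
# return and a trailing space respectively.
sample=$'lazy dog.\r\nlazy dog. '

strict()   { printf '%s\n' "$sample" | grep -cE 'dog\.$' || true; }   # grep exits 1 on no match
tolerant() { printf '%s\n' "$sample" | grep -cE 'dog\.[[:space:]]*$'; }

strict     # 0
tolerant   # 2
```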
Task 2: sed Examples
1. In the user's home directory, create the test file file1.txt.
[root@redhat04 ~]# vim file1.txt
[root@redhat04 ~]# cat file1.txt
Distribution Kernel Version Community
Ubuntu 6.3.4 Ubuntu Community
Fedora 6.3.4 Fedora Project
Debian 6.1.30 Debian Project
Arch Linux 6.2.16 Arch Linux Community
Mint 5.4.243 Mint Community
Manjaro 5.10.180 Manjaro Community
openSUSE 5.10.180 openSUSE Community
openEuler 6.2.16 openEuler Community
RHEL 5.15.113 Red Hat Inc.
2. Replace every "Ubuntu" in the text with "Ubuntu Linux", and display the result in the terminal.
[root@redhat04 ~]# sed 's/Ubuntu/Ubuntu Linux/g' file1.txt
Distribution Kernel Version Community
Ubuntu Linux 6.3.4 Ubuntu Linux Community
Fedora 6.3.4 Fedora Project
Debian 6.1.30 Debian Project
Arch Linux 6.2.16 Arch Linux Community
Mint 5.4.243 Mint Community
Manjaro 5.10.180 Manjaro Community
openSUSE 5.10.180 openSUSE Community
openEuler 6.2.16 openEuler Community
RHEL 5.15.113 Red Hat Inc.
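The g flag in step 2 is what converts the second "Ubuntu" on the Ubuntu line. A sketch on that single line contrasting the two forms; "&" in the replacement stands for the matched text, so "Ubuntu" need not be typed twice.

```shell
#!/usr/bin/env bash
# One line taken from file1.txt.
line='Ubuntu 6.3.4 Ubuntu Community'

first_only() { printf '%s\n' "$line" | sed 's/Ubuntu/& Linux/'; }    # no g: first match only
every()      { printf '%s\n' "$line" | sed 's/Ubuntu/& Linux/g'; }   # g: every match on the line

first_only   # Ubuntu Linux 6.3.4 Ubuntu Community
every        # Ubuntu Linux 6.3.4 Ubuntu Linux Community
```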
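3. Delete the lines that contain "Community", and display the result in the terminal. The header line also contains "Community", so it is deleted along with the matching distributions.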
[root@redhat04 ~]# sed '/Community/d' file1.txt
Fedora 6.3.4 Fedora Project
Debian 6.1.30 Debian Project
RHEL 5.15.113 Red Hat Inc.
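Step 3 also deletes the header row because it contains "Community". If the header should survive, one option (a sketch, not the only way) is to apply the delete only to lines other than line 1 with the "1!" address negation. The sample is the first three lines of file1.txt.

```shell
#!/usr/bin/env bash
# First three lines of file1.txt.
sample='Distribution Kernel Version Community
Ubuntu 6.3.4 Ubuntu Community
Fedora 6.3.4 Fedora Project'

# "1!" runs the braced commands on every line except line 1 (the header).
keep_header() { printf '%s\n' "$sample" | sed '1!{/Community/d;}'; }

keep_header
```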
4. Print only the lines that begin with "Mint", and display the result in the terminal.
[root@redhat04 ~]# sed -n '/^Mint/p' file1.txt
Mint 5.4.243 Mint Community
5. Append " - Stable Version" to the end of every line, and display the result in the terminal.
[root@redhat04 ~]# sed 's/$/ - Stable Version/' file1.txt
Distribution Kernel Version Community - Stable Version
Ubuntu 6.3.4 Ubuntu Community - Stable Version
Fedora 6.3.4 Fedora Project - Stable Version
Debian 6.1.30 Debian Project - Stable Version
Arch Linux 6.2.16 Arch Linux Community - Stable Version
Mint 5.4.243 Mint Community - Stable Version
Manjaro 5.10.180 Manjaro Community - Stable Version
openSUSE 5.10.180 openSUSE Community - Stable Version
openEuler 6.2.16 openEuler Community - Stable Version
RHEL 5.15.113 Red Hat Inc. - Stable Version
6. Insert a line reading "New Line" before the line that begins with "Debian", and display the result in the terminal.
[root@redhat04 ~]# sed '/^Debian/i New Line' file1.txt
Distribution Kernel Version Community
Ubuntu 6.3.4 Ubuntu Community
Fedora 6.3.4 Fedora Project
New Line
Debian 6.1.30 Debian Project
Arch Linux 6.2.16 Arch Linux Community
Mint 5.4.243 Mint Community
Manjaro 5.10.180 Manjaro Community
openSUSE 5.10.180 openSUSE Community
openEuler 6.2.16 openEuler Community
RHEL 5.15.113 Red Hat Inc.
7. From each line, extract the kernel version that matches the pattern 5.x.x, and display the result in the terminal.
[root@redhat04 ~]# sed -n 's/.*\(5\.[0-9]\+\.[0-9]\+\).*$/\1/p' file1.txt
5.4.243
5.10.180
5.10.180
5.15.113
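The greedy ".*" in step 7 lets the match start inside a longer number: "15.2.3" would yield "5.2.3". A sketch using a hypothetical input line, together with an extended-regex variant (sed -E) that requires the "5" to be preceded by the start of the line or by a character that is neither a digit nor a dot. The original pattern uses "\+", which is a GNU sed extension in basic regular expressions.

```shell
#!/usr/bin/env bash
# The step 7 pattern (GNU BRE, because of \+) and a bounded ERE variant.
orig_extract()    { sed -n 's/.*\(5\.[0-9]\+\.[0-9]\+\).*$/\1/p'; }
bounded_extract() { sed -nE 's/(^|.*[^0-9.])(5\.[0-9]+\.[0-9]+).*/\2/p'; }

echo 'Kernel 15.2.3' | orig_extract                   # 5.2.3 (tail of "15.2.3")
echo 'Kernel 15.2.3' | bounded_extract                # no output
echo 'RHEL 5.15.113 Red Hat Inc.' | bounded_extract   # 5.15.113
```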
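8. Write the sed command into a file, have sed read it with the -f option, and display the result in the terminal. With -n, the p command in the file prints each line exactly once.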
[root@redhat04 ~]# echo "p" > commands
[root@redhat04 ~]# sed -n -f commands file1.txt
Distribution Kernel Version Community
Ubuntu 6.3.4 Ubuntu Community
Fedora 6.3.4 Fedora Project
Debian 6.1.30 Debian Project
Arch Linux 6.2.16 Arch Linux Community
Mint 5.4.243 Mint Community
Manjaro 5.10.180 Manjaro Community
openSUSE 5.10.180 openSUSE Community
openEuler 6.2.16 openEuler Community
RHEL 5.15.113 Red Hat Inc.
9. Delete lines 1, 2, and 5, and display the result in the terminal.
[root@redhat04 ~]# sed -e '1d' -e '2d' -e '5d' file1.txt
Fedora 6.3.4 Fedora Project
Debian 6.1.30 Debian Project
Mint 5.4.243 Mint Community
Manjaro 5.10.180 Manjaro Community
openSUSE 5.10.180 openSUSE Community
openEuler 6.2.16 openEuler Community
RHEL 5.15.113 Red Hat Inc.
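The three -e expressions in step 9 can also be written as one script with a range address. A sketch comparing both forms on a six-line stream from seq:

```shell
#!/usr/bin/env bash
# Delete lines 1, 2 and 5 of the numbers 1..6.
separate() { seq 6 | sed -e '1d' -e '2d' -e '5d'; }
combined() { seq 6 | sed '1,2d;5d'; }   # a range plus a single line in one script

combined   # 3, 4, 6 (one per line)
```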
10. Replace every digit in the text with the letter "X", and display the result in the terminal.
[root@redhat04 ~]# sed 's/[0-9]/X/g' file1.txt
Distribution Kernel Version Community
Ubuntu X.X.X Ubuntu Community
Fedora X.X.X Fedora Project
Debian X.X.XX Debian Project
Arch Linux X.X.XX Arch Linux Community
Mint X.X.XXX Mint Community
Manjaro X.XX.XXX Manjaro Community
openSUSE X.XX.XXX openSUSE Community
openEuler X.X.XX openEuler Community
RHEL X.XX.XXX Red Hat Inc.
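11. Extract the second blank-separated column of each line, and display the result in the terminal. Because "Arch Linux" is two words, its second column is "Linux" rather than the kernel version.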
[root@redhat04 ~]# sed 's/^[^[:blank:]]\+[[:blank:]]\+\([^[:blank:]]\+\).*/\1/' file1.txt
Kernel
6.3.4
6.3.4
6.1.30
Linux
5.4.243
5.10.180
5.10.180
6.2.16
5.15.113
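Step 11's column approach breaks on the two-word name "Arch Linux". If the goal is really the kernel version, a sketch that extracts the first dotted number on each line avoids depending on column position (the sample lines are taken from file1.txt):

```shell
#!/usr/bin/env bash
# Print the first dotted number on each line; lines without digits print nothing.
version_of() { sed -nE 's/^[^0-9]*([0-9]+(\.[0-9]+)+).*/\1/p'; }

printf '%s\n' 'Distribution Kernel Version Community' \
              'Arch Linux 6.2.16 Arch Linux Community' \
              'RHEL 5.15.113 Red Hat Inc.' | version_of
# 6.2.16
# 5.15.113
```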
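12. Prefix every line of the file with its line number, and display the result in the terminal. "sed =" prints each line number on its own line, and the second sed joins each number with the line that follows it.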
[root@redhat04 ~]# sed = file1.txt | sed 'N; s/\n/ /'
1 Distribution Kernel Version Community
2 Ubuntu 6.3.4 Ubuntu Community
3 Fedora 6.3.4 Fedora Project
4 Debian 6.1.30 Debian Project
5 Arch Linux 6.2.16 Arch Linux Community
6 Mint 5.4.243 Mint Community
7 Manjaro 5.10.180 Manjaro Community
8 openSUSE 5.10.180 openSUSE Community
9 openEuler 6.2.16 openEuler Community
10 RHEL 5.15.113 Red Hat Inc.
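The "sed =" pipeline in step 12 has a one-command awk equivalent, since NR holds the current line number. A sketch comparing both on two hypothetical lines:

```shell
#!/usr/bin/env bash
number_sed() { sed = | sed 'N; s/\n/ /'; }   # the step 12 pipeline, reading stdin
number_awk() { awk '{ print NR, $0 }'; }     # NR is the current line number

printf 'alpha\nbeta\n' | number_awk   # "1 alpha", then "2 beta"
```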
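Task 3: awk Examples
1. In the user's home directory, create the test file awkfile.txt. The outputs of steps 10 and 11 show that the copy used in this session also began with an empty line before the header row.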
[root@redhat04 ~]# cat awkfile.txt
EmployeeID,FirstName,LastName,Age,Position,Company,StartDate
101,John,Doe,30,Software Engineer,Huawei Technologies Co.Ltd.,2022-01-15
102,Alice,Smith,28,Data Scientist,Red Hat Inc.,2021-11-20
103,Bob,Johnson,35,Project Manager,CentOS Project,2022-03-10
104,Emma,Williams,25,UX Designer,Example Company,2020-09-05
105,David,Anderson,40,HR Manager,Another Company,2023-02-28
106,Tommy,Alex,45,HR Manager,Other Company,2019-02-28
2. Print the EmployeeID and FirstName fields, and display the result in the terminal.
[root@redhat04 ~]# awk -F',' '{print $1, $2}' awkfile.txt
EmployeeID FirstName
101 John
102 Alice
103 Bob
104 Emma
105 David
106 Tommy
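3. Print the records of employees aged 30 or older, and display the result in the terminal. The header row is printed as well: its Age field is the non-numeric string "Age", so awk compares it with 30 as a string, and "Age" sorts after "30".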
[root@redhat04 ~]# awk -F',' '$4 >= 30 {print}' awkfile.txt
EmployeeID,FirstName,LastName,Age,Position,Company,StartDate
101,John,Doe,30,Software Engineer,Huawei Technologies Co.Ltd.,2022-01-15
103,Bob,Johnson,35,Project Manager,CentOS Project,2022-03-10
105,David,Anderson,40,HR Manager,Another Company,2023-02-28
106,Tommy,Alex,45,HR Manager,Other Company,2019-02-28
4. Print the records of all employees whose Position is "Project Manager", and display the result in the terminal.
[root@redhat04 ~]# awk -F',' '$5 == "Project Manager" {print}' awkfile.txt
103,Bob,Johnson,35,Project Manager,CentOS Project,2022-03-10
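5. Print the average age of the employees, and display the result in the terminal. Only rows whose Age field is a number are counted; without that filter the header row (and the empty first line of the file used in this session) are counted too, and the command printed 25.375, that is 203/8 instead of 203/6.
[root@redhat04 ~]# awk -F',' '$4 ~ /^[0-9]+$/ {sum += $4; count++} END {if (count) print "Average Age:", sum/count}' awkfile.txt
Average Age: 33.8333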
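A sketch of the average-age computation that counts only rows whose Age field is numeric, so that neither the header nor an empty line enters the count. The rows are copied from awkfile.txt, the leading empty line imitates the blank first line observed in this session, and printf fixes the output to two decimals (203 / 6 = 33.83).

```shell
#!/usr/bin/env bash
# Rows copied from awkfile.txt.
csv='EmployeeID,FirstName,LastName,Age,Position,Company,StartDate
101,John,Doe,30,Software Engineer,Huawei Technologies Co.Ltd.,2022-01-15
102,Alice,Smith,28,Data Scientist,Red Hat Inc.,2021-11-20
103,Bob,Johnson,35,Project Manager,CentOS Project,2022-03-10
104,Emma,Williams,25,UX Designer,Example Company,2020-09-05
105,David,Anderson,40,HR Manager,Another Company,2023-02-28
106,Tommy,Alex,45,HR Manager,Other Company,2019-02-28'

avg_age() {
    # The extra leading "\n" mimics the empty first line seen in the session.
    printf '\n%s\n' "$csv" | awk -F',' '
        $4 ~ /^[0-9]+$/ { sum += $4; count++ }   # data rows only
        END { if (count) printf "Average Age: %.2f\n", sum / count }'
}

avg_age   # Average Age: 33.83
```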
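6. Print the records of employees who started after 2022-01-01, and display the result in the terminal. Dates in YYYY-MM-DD form compare correctly as strings; the header row also passes the test, because "StartDate" sorts after "2022-01-01".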
[root@redhat04 ~]# awk -F',' '$7 > "2022-01-01" {print}' awkfile.txt
EmployeeID,FirstName,LastName,Age,Position,Company,StartDate
101,John,Doe,30,Software Engineer,Huawei Technologies Co.Ltd.,2022-01-15
103,Bob,Johnson,35,Project Manager,CentOS Project,2022-03-10
105,David,Anderson,40,HR Manager,Another Company,2023-02-28
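7. Count the employees in each position and print the counts, and display the result in the terminal. awkfile.txt has no department column, so field 5 (Position) is used as the grouping key. Two of the result lines (the empty key and "Position") come from the empty first line and the header row, which the command does not skip, and for (d in dept) visits the keys in no particular order.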
[root@redhat04 ~]# awk -F',' '{dept[$5]++} END {for (d in dept) print "Department:", d, "Employee Count:", dept[d]}' awkfile.txt
Department: Employee Count: 1
Department: HR Manager Employee Count: 2
Department: UX Designer Employee Count: 1
Department: Software Engineer Employee Count: 1
Department: Project Manager Employee Count: 1
Department: Position Employee Count: 1
Department: Data Scientist Employee Count: 1
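A sketch of step 7 that skips non-data rows and sorts the result, because awk's "for (key in array)" loop has no defined order. It uses the rows of awkfile.txt, prints position,count pairs, and sorts them with LC_ALL=C so the order does not depend on the locale.

```shell
#!/usr/bin/env bash
# Rows copied from awkfile.txt.
csv='EmployeeID,FirstName,LastName,Age,Position,Company,StartDate
101,John,Doe,30,Software Engineer,Huawei Technologies Co.Ltd.,2022-01-15
102,Alice,Smith,28,Data Scientist,Red Hat Inc.,2021-11-20
103,Bob,Johnson,35,Project Manager,CentOS Project,2022-03-10
104,Emma,Williams,25,UX Designer,Example Company,2020-09-05
105,David,Anderson,40,HR Manager,Another Company,2023-02-28
106,Tommy,Alex,45,HR Manager,Other Company,2019-02-28'

count_positions() {
    printf '%s\n' "$csv" | awk -F',' '
        $1 ~ /^[0-9]+$/ { n[$5]++ }               # skip header and blank lines
        END { for (p in n) print p "," n[p] }' | LC_ALL=C sort
}

count_positions
```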
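8. Print the length of each employee's full name (FirstName and LastName joined without a space), and display the result in the terminal. Line 1 of the file used here is empty, so NR != 1 skips only that line and the header row still appears.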
[root@redhat04 ~]# awk -F',' ' NR != 1 {len = length($2 $3); print "EmployeeID:", $1, "FullName Length:", len}' awkfile.txt
EmployeeID: EmployeeID FullName Length: 17
EmployeeID: 101 FullName Length: 7
EmployeeID: 102 FullName Length: 10
EmployeeID: 103 FullName Length: 10
EmployeeID: 104 FullName Length: 12
EmployeeID: 105 FullName Length: 13
EmployeeID: 106 FullName Length: 9
9. Use the regular expression "[,.]" as the field separator, print fields 1 and 4, and display the result in the terminal.
[root@redhat04 ~]# awk -F'[,.]' '{print $1, $4}' awkfile.txt
EmployeeID Age
101 30
102 28
103 35
104 25
105 40
106 45
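Because step 9 also splits on ".", the Company value "Huawei Technologies Co.Ltd." is cut into several fields and every later field number shifts. Fields 1 and 4 come before Company, so step 9 is unaffected, but the sketch below (one row from awkfile.txt) shows the shift:

```shell
#!/usr/bin/env bash
# One row copied from awkfile.txt.
row='101,John,Doe,30,Software Engineer,Huawei Technologies Co.Ltd.,2022-01-15'

by_comma()   { printf '%s\n' "$row" | awk -F','    '{ print NF, $7 }'; }
by_bracket() { printf '%s\n' "$row" | awk -F'[,.]' '{ print NF, $7 }'; }

by_comma     # 7 2022-01-15
by_bracket   # 9 Ltd  ("Co", "Ltd" and an empty field between "." and ",")
```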
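10. Use the built-in variable NF to print the last field of each line, and display the result in the terminal. On the empty first line NF is 0, so $NF is $0 and the output is blank.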
[root@redhat04 ~]# awk -F',' '{print "Last Field:" $NF}' awkfile.txt
Last Field:
Last Field:StartDate
Last Field:2022-01-15
Last Field:2021-11-20
Last Field:2022-03-10
Last Field:2020-09-05
Last Field:2023-02-28
Last Field:2019-02-28
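On an empty line NF is 0, so $NF is the same as $0, which is why step 10 prints a bare "Last Field:". A sketch on hypothetical input that skips empty lines with an NF pattern, plus $(NF - 1) for the second-to-last field:

```shell
#!/usr/bin/env bash
# Hypothetical input: an empty line followed by one CSV row.
last_nonempty() { printf '\na,b,c\n' | awk -F',' 'NF { print "Last Field:", $NF }'; }
second_last()   { printf 'a,b,c\n'   | awk -F',' '{ print $(NF - 1) }'; }

last_nonempty   # Last Field: c
second_last     # b
```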
11. Use the built-in variable NR to print the number and content of each line, and display the result in the terminal.
[root@redhat04 ~]# awk '{ print "Line:", NR, "Content:", $0 }' awkfile.txt
Line: 1 Content:
Line: 2 Content: EmployeeID,FirstName,LastName,Age,Position,Company,StartDate
Line: 3 Content: 101,John,Doe,30,Software Engineer,Huawei Technologies Co.Ltd.,2022-01-15
Line: 4 Content: 102,Alice,Smith,28,Data Scientist,Red Hat Inc.,2021-11-20
Line: 5 Content: 103,Bob,Johnson,35,Project Manager,CentOS Project,2022-03-10
Line: 6 Content: 104,Emma,Williams,25,UX Designer,Example Company,2020-09-05
Line: 7 Content: 105,David,Anderson,40,HR Manager,Another Company,2023-02-28
Line: 8 Content: 106,Tommy,Alex,45,HR Manager,Other Company,2019-02-28
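12. Use the built-in function split to break each line into an array at the commas, and print the array elements. split returns the number of elements, so a counted loop prints them in field order; a for (i in arr) loop would visit them in an unspecified order.
[root@redhat04 ~]# awk '{ n = split($0, arr, ","); for (i = 1; i <= n; i++) print "Element:", arr[i] }' awkfile.txt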
Element: EmployeeID
Element: FirstName
Element: LastName
Element: Age
Element: Position
Element: Company
Element: StartDate
Element: 101
Element: John
Element: Doe
Element: 30
Element: Software Engineer
Element: Huawei Technologies Co.Ltd.
Element: 2022-01-15
Element: 102
Element: Alice
Element: Smith
Element: 28
Element: Data Scientist
Element: Red Hat Inc.
Element: 2021-11-20
Element: 103
Element: Bob
Element: Johnson
Element: 35
Element: Project Manager
Element: CentOS Project
Element: 2022-03-10
Element: 104
Element: Emma
Element: Williams
Element: 25
Element: UX Designer
Element: Example Company
Element: 2020-09-05
Element: 105
Element: David
Element: Anderson
Element: 40
Element: HR Manager
Element: Another Company
Element: 2023-02-28
Element: 106
Element: Tommy
Element: Alex
Element: 45
Element: HR Manager
Element: Other Company
Element: 2019-02-28
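13. Use the built-in function substr to print the first three characters of each FirstName, and display the result in the terminal. NR > 1 skips only the empty first line of the file used here, so the header row is included.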
[root@redhat04 ~]# awk -F',' 'NR > 1 {print $1, "FirstName (3 chars):", substr($2, 1, 3)}' awkfile.txt
EmployeeID FirstName (3 chars): Fir
101 FirstName (3 chars): Joh
102 FirstName (3 chars): Ali
103 FirstName (3 chars): Bob
104 FirstName (3 chars): Emm
105 FirstName (3 chars): Dav
106 FirstName (3 chars): Tom
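The original wording of step 13 mentions extracting the major version from a version number, which fits file1.txt rather than awkfile.txt. A sketch of that reading, assuming versions of the form X.Y.Z: match() sets RSTART and RLENGTH, substr() cuts the version out of the line, and a second substr() keeps the text before the first dot. The sample lines are taken from file1.txt.

```shell
#!/usr/bin/env bash
# Print the distribution's first word and the major part of its X.Y.Z version.
major_versions() {
    awk 'match($0, /[0-9]+\.[0-9]+\.[0-9]+/) {
             ver = substr($0, RSTART, RLENGTH)              # e.g. 5.10.180
             print $1, substr(ver, 1, index(ver, ".") - 1)  # text before the first dot
         }'
}

printf '%s\n' 'Distribution Kernel Version Community' \
              'Manjaro 5.10.180 Manjaro Community' \
              'Debian 6.1.30 Debian Project' | major_versions
# Manjaro 5
# Debian 6
```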
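14. Use the built-in function tolower to convert each FirstName to lowercase, and display the result in the terminal. (The output label "Lowercase Distribution" is carried over from the file1.txt exercise; the converted field is FirstName.)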
[root@redhat04 ~]# awk -F ',' '{ print "Lowercase Distribution:", tolower($2) }' awkfile.txt
Lowercase Distribution:
Lowercase Distribution: firstname
Lowercase Distribution: john
Lowercase Distribution: alice
Lowercase Distribution: bob
Lowercase Distribution: emma
Lowercase Distribution: david
Lowercase Distribution: tommy
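tolower is also handy for case-insensitive matching. A sketch on the header and three rows of awkfile.txt that selects every Position containing "manager" in any letter case and prints the FirstName in upper case with toupper:

```shell
#!/usr/bin/env bash
# Header plus three rows copied from awkfile.txt.
sample='EmployeeID,FirstName,LastName,Age,Position,Company,StartDate
103,Bob,Johnson,35,Project Manager,CentOS Project,2022-03-10
104,Emma,Williams,25,UX Designer,Example Company,2020-09-05
105,David,Anderson,40,HR Manager,Another Company,2023-02-28'

managers() {
    printf '%s\n' "$sample" | awk -F',' 'tolower($5) ~ /manager/ { print toupper($2) }'
}

managers   # BOB, DAVID
```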
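15. Create an awk script that uses a comma as the field separator, prints each EmployeeID, and calculates each employee's years of service. mktime and systime are GNU awk (gawk) extensions (awk on RHEL is gawk), and the result depends on the date the script is run.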
[root@redhat04 ~]# vim define-func.awk
[root@redhat04 ~]# cat define-func.awk
BEGIN {
    FS = ","
    print "EmployeeID,WorkYears"    # print the header row
}
$1 ~ /^[0-9]+$/ {
    # data rows only; this also skips the header row and any blank line
    # split the StartDate field into year, month, and day
    split($7, start_date, "-")
    start_timestamp = mktime(start_date[1] " " start_date[2] " " start_date[3] " 0 0 0")
    current_timestamp = systime()
    # approximate: every year is treated as 365 days
    work_years = int((current_timestamp - start_timestamp) / (365 * 24 * 3600))
    print $1, work_years
}
# Run the awk script
[root@redhat04 ~]# awk -f define-func.awk awkfile.txt
EmployeeID,WorkYears
101 2
102 2
103 2
104 4
105 1
106 5
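mktime and systime exist only in GNU awk, and dividing by 365 days drifts slightly across leap years. A sketch of a POSIX awk alternative that counts whole years between StartDate and a reference date passed with -v; the date 2024-10-01 below is a hypothetical "today", not the date of the original session.

```shell
#!/usr/bin/env bash
# Rows copied from awkfile.txt.
csv='EmployeeID,FirstName,LastName,Age,Position,Company,StartDate
101,John,Doe,30,Software Engineer,Huawei Technologies Co.Ltd.,2022-01-15
102,Alice,Smith,28,Data Scientist,Red Hat Inc.,2021-11-20
103,Bob,Johnson,35,Project Manager,CentOS Project,2022-03-10
104,Emma,Williams,25,UX Designer,Example Company,2020-09-05
105,David,Anderson,40,HR Manager,Another Company,2023-02-28
106,Tommy,Alex,45,HR Manager,Other Company,2019-02-28'

# Whole years of service up to the reference date given as $1 (YYYY-MM-DD).
work_years() {
    printf '%s\n' "$csv" | awk -F',' -v ref="$1" '
        BEGIN {
            OFS = ","
            split(ref, r, "-")          # r[1] = year, r[2] = month, r[3] = day
            ref_md = r[2] r[3]          # "MMDD" of the reference date
            print "EmployeeID", "WorkYears"
        }
        $1 ~ /^[0-9]+$/ {
            split($7, d, "-")
            years = r[1] - d[1]
            if (ref_md < (d[2] d[3])) years--   # anniversary not yet reached
            print $1, years
        }'
}

work_years 2024-10-01
```

Comparing "MMDD" strings works because both are zero-padded to the same width, which the YYYY-MM-DD format guarantees.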