46. A Practical Guide to sed and gawk

算法笑匠

License: CC 4.0 BY-SA

Column: Mastering Linux Commands and Scripts. Tags: sed, gawk, text processing

Article link: https://blog.youkuaiyun.com/1a2s3d4f5g/article/details/151349796

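1. sed Editing Basics

1.1 Inserting and Appending Text

The sed editor can insert and append lines of text to the data stream, and the two operations are easy to confuse:
- The insert (i) command adds a new line before the specified line.
- The append (a) command adds a new line after the specified line.

The standard format puts the new text on its own line after the command; GNU sed also accepts the one-line form used in the first example: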

sed '[address]command\
new line'

Example:

$ echo "Test Line 2" | sed 'i\Test Line 1'
Test Line 1
Test Line 2
$ echo "Test Line 2" | sed 'a\Test Line 1'
Test Line 2
Test Line 1

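To insert or append inside the data stream, use an address to say where. POSIX defines both commands with a single address, either a line number or a text pattern, not an address range. (GNU sed accepts a range as an extension and repeats the text for every line in it.)

Example: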

$ cat data6.txt
This is line number 1.
This is line number 2.
This is the 3rd line.
This is the 4th line.
$ sed '3i\
> This is an inserted line.
> ' data6.txt
This is line number 1.
This is line number 2.
This is an inserted line.
This is the 3rd line.
This is the 4th line.
$ sed '3a\
> This is an appended line.
> ' data6.txt
This is line number 1.
This is line number 2.
This is the 3rd line.
This is an appended line.
This is the 4th line.

To append a line at the end of the data stream, use the dollar sign ($) to address the last line; to add a line at the beginning, insert before line 1.
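
As a minimal sketch of both cases, the two addresses can be combined in one invocation. The header and footer strings are made up for illustration, and the multi-line form used here should work with any POSIX sed:

```shell
# Sample input; the header and footer text below is illustrative only.
input='This is line number 1.
This is line number 2.'

# 1i\ inserts before the first line; $a\ appends after the last line.
result=$(printf '%s\n' "$input" | sed -e '1i\
Header line.' -e '$a\
Footer line.')

printf '%s\n' "$result"
```

The header ends up before line 1 and the footer after the last line, regardless of how many lines the input has.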

To insert or append several lines of text, end every new line except the last with a backslash.

Example:

$ sed '1i\
> This is an inserted line.\
> This is another inserted line.
> ' data6.txt
This is an inserted line.
This is another inserted line.
This is line number 1.
This is line number 2.
This is the 3rd line.
This is the 4th line.

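1.2 Changing Lines

The change (c) command replaces the entire contents of a line in the data stream. It works like the insert and append commands: the new line is specified separately.

Example: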

$ sed '2c\
> This is a changed line of text.
> ' data6.txt
This is line number 1.
This is a changed line of text.
This is the 3rd line.
This is the 4th line.
$ sed '/3rd line/c\
> This is a changed line of text.
> ' data6.txt
This is line number 1.
This is line number 2.
This is a changed line of text.
This is the 4th line.

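The change command accepts an address range, but the result may not be what you expect: sed replaces the whole range with a single line of text rather than changing each line individually.

Example: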

$ sed '2,3c\
> This is a changed line of text.
> ' data6.txt
This is line number 1.
This is a changed line of text.
This is the 4th line.
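
If the goal is instead to rewrite each line in the range separately, one option (a sketch, not the only approach) is the substitute command with the same range, replacing the full contents of every addressed line:

```shell
input='This is line number 1.
This is line number 2.
This is the 3rd line.
This is the 4th line.'

# s/.*/.../ runs once per addressed line, so lines 2 and 3 are each replaced.
result=$(printf '%s\n' "$input" | sed '2,3s/.*/This is a changed line of text./')

printf '%s\n' "$result"
```

Unlike the c command with a range, this keeps the line count unchanged.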

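1.3 Transforming Characters

The transform (y) command is the only sed editor command that operates on single characters. Its format is:

[address]y/inchars/outchars/

The command maps inchars to outchars one to one: the first character in inchars becomes the first character in outchars, the second becomes the second, and so on. If inchars and outchars differ in length, sed reports an error.

Example: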

$ cat data9.txt
This is line 1.
This is line 2.
This is line 3.
This is line 4.
This is line 5.
This is line 1 again.
This is line 3 again.
This is the last file line.
$ sed 'y/123/789/' data9.txt
This is line 7.
This is line 8.
This is line 9.
This is line 4.
This is line 5.
This is line 7 again.
This is line 9 again.
This is the last file line.

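The transform (y) command is global: it converts every matching character in the line and cannot be limited to a particular occurrence of a character.

Example: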

$ echo "Test #1 of try #1." | sed 'y/123/678/'
Test #6 of try #6.
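
Although y cannot target one occurrence within a line, the optional address in its format still limits which lines it touches. A small sketch with made-up two-line input:

```shell
input='Test #1 of try #1.
Test #1 of try #1.'

# The address 2 restricts the transform to the second line only.
result=$(printf '%s\n' "$input" | sed '2y/123/678/')

printf '%s\n' "$result"
```

The first line passes through unchanged, while every 1 on the second line becomes 6.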

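1.4 Printing Commands

Besides the p flag of the substitute (s) command, which displays lines the sed editor has changed, three commands print information from the data stream:
- The print (p) command prints a line of text.
- The equal sign (=) command prints line numbers.
- The list (l) command lists a line.

1.4.1 Printing Lines

Used alone, the print (p) command prints each line a second time, because sed already prints every line by default. Its most common use, together with the -n option, is printing only the lines that match a text pattern.

Example: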

$ echo "This is a test." | sed 'p'
This is a test.
This is a test.
$ cat data6.txt
This is line number 1.
This is line number 2.
This is the 3rd line.
This is the 4th line.
$ sed -n '/3rd line/p' data6.txt
This is the 3rd line.

It is also a quick way to print a subset of lines from the data stream:

$ sed -n '2,3p' data6.txt
This is line number 2.
This is the 3rd line.

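You can also use it to view a line before it is modified, for example in combination with the substitute (s) or change (c) command: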

$ sed -n '/3/{ 
> p
> s/line/test/p
> }' data6.txt
This is the 3rd line.
This is the 3rd test.

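1.4.2 Printing Line Numbers

The equal sign (=) command prints the number of the current line in the data stream. Line numbers are determined by the newline characters in the stream.

Example: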

$ cat data1.txt
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
$ sed '=' data1.txt
1
The quick brown fox jumps over the lazy dog.
2
The quick brown fox jumps over the lazy dog.
3
The quick brown fox jumps over the lazy dog.
4
The quick brown fox jumps over the lazy dog.

Combined with the -n option, it displays both the line number and the text of the lines that match a text pattern:

$ cat data7.txt
This is line number 1.
This is line number 2.
This is the 3rd line.
This is the 4th line.
This is line number 1 again; we want to keep it.
This is more text we want to keep.
Last line in the file; we want to keep it.
$ sed -n '/text/{
> =
> p
> }' data7.txt
6
This is more text we want to keep.
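
Because = accepts an address, addressing the last line with $ gives a quick line count. A sketch using made-up three-line input:

```shell
# -n suppresses the text; $= prints only the number of the last line.
line_count=$(printf 'one\ntwo\nthree\n' | sed -n '$=')

echo "$line_count"
```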

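1.4.3 Listing Lines

The list (l) command prints both the text and the nonprintable characters in the data stream. Nonprintable characters are shown either as octal values preceded by a backslash or in standard C-style notation, such as \t for a tab. The end of each line is marked with a dollar sign ($).

Example: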

$ cat data10.txt
This    line    contains        tabs.
This line does contain tabs.
$ sed -n 'l' data10.txt
This\tline\tcontains\ttabs.$
This line does contain tabs.$

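If the data stream contains a nonprintable control character, the list (l) command displays it as an escape sequence; in this file it is the bell character, shown as \a: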

$ cat data11.txt
This line contains an escape character.
$ sed -n 'l' data11.txt
This line contains an escape character. \a$
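
To try this without preparing a data file, a line containing control characters can be generated with printf. This is only a sketch with made-up text:

```shell
# printf writes a tab (\t) and a bell (\a) into the line.
listed=$(printf 'col1\tcol2\a\n' | sed -n 'l')

# %s prints the result literally, backslashes included.
printf '%s\n' "$listed"
```

The tab and the bell come back as the visible sequences \t and \a, followed by the $ end-of-line marker.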

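1.5 Working with Files

1.5.1 Writing to a File

The write (w) command writes lines to a file. Its format is:

[address]w filename

The filename can be a relative or absolute pathname, but the user running the sed editor must have write permission for the file. The address can be any addressing method used in sed: a single line number, a text pattern, or a range of line numbers or text patterns.

Example: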

$ sed '1,2w test.txt' data6.txt
This is line number 1.
This is line number 2.
This is the 3rd line.
This is the 4th line.
$ cat test.txt
This is line number 1.
This is line number 2.

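To keep the lines from also appearing on standard output, add the -n option. The write command is a handy way to create data files from a master file based on a common text value.

Example: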

$ cat data12.txt
Blum, R       Browncoat
McGuiness, A  Alliance
Bresnahan, C  Browncoat
Harken, C     Alliance
$ sed -n '/Browncoat/w Browncoats.txt' data12.txt
$ cat Browncoats.txt
Blum, R       Browncoat
Bresnahan, C  Browncoat
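
The same idea extends to several output files in one pass. The sketch below recreates data12.txt in a temporary directory and uses two w commands; the Alliance.txt file name is an assumption chosen for this illustration:

```shell
workdir=$(mktemp -d)
cat > "$workdir/data12.txt" <<'EOF'
Blum, R       Browncoat
McGuiness, A  Alliance
Bresnahan, C  Browncoat
Harken, C     Alliance
EOF

# Each -e script holds one w command; the file name runs to the end of that script.
(cd "$workdir" &&
 sed -n -e '/Browncoat/w Browncoats.txt' -e '/Alliance/w Alliance.txt' data12.txt)

cat "$workdir/Alliance.txt"
```

Giving each w command its own -e matters because everything after w up to the end of the line is taken as the file name.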

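1.5.2 Reading Data from a File

The read (r) command inserts data from a separate file. Its format is:

[address]r filename

The filename parameter gives the absolute or relative pathname of the file that contains the data. As defined by POSIX, the read (r) command takes a single address, a line number or a text pattern, rather than an address range. The sed editor inserts the text from the file after the addressed line.

Example: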

$ cat data13.txt
This is an added line.
This is a second added line.
$ sed '3r data13.txt' data6.txt
This is line number 1.
This is line number 2.
This is the 3rd line.
This is an added line.
This is a second added line.
This is the 4th line.

To add the text at the end of the data stream, use the dollar sign ($) address:

$ sed '$r data13.txt' data6.txt
This is line number 1.
This is line number 2.
This is the 3rd line.
This is the 4th line.
This is an added line.
This is a second added line.

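A handy application of the read (r) command is to combine it with the delete (d) command to replace a placeholder in a file with data from another file.

Example: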

$ cat notice.std
Would the following people:
LIST
please report to the ship's captain.
$ sed '/LIST/{
> r data12.txt
> d
> }' notice.std
Would the following people:
Blum, R       Browncoat
McGuiness, A  Alliance
Bresnahan, C  Browncoat
Harken, C     Alliance
please report to the ship's captain.

1.6 Workflow Summary

The following mermaid flowchart shows the basic flow of common sed operations:

graph TD;
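    A[Start] --> B{Choose operation type};
    B --> |Insert/append text| C[Specify address and command];
    B --> |Change line| D[Specify address and new content];
    B --> |Transform characters| E[Specify address and character mapping];
    B --> |Print information| F{Choose print command};
    F --> |Print line| G[Specify address or pattern];
    F --> |Print line number| H[Specify address or pattern];
    F --> |List line| I[No extra parameters];
    B --> |Write to file| J[Specify address and filename];
    B --> |Read from file| K[Specify address and filename];
    C --> L[Run the operation];
    D --> L;
    E --> L;
    G --> L;
    H --> L;
    I --> L;
    J --> L;
    K --> L;
    L --> M[End];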

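1.7 Command Summary Table

Command	Function	Example
i	Insert a new line before the specified line	`sed '3i\This is an inserted line.' data.txt`
a	Append a new line after the specified line	`sed '3a\This is an appended line.' data.txt`
c	Change the contents of the specified line	`sed '2c\This is a changed line.' data.txt`
y	Transform characters	`sed 'y/123/789/' data.txt`
p	Print lines	`sed -n '/pattern/p' data.txt`
=	Print line numbers	`sed -n '/pattern/{=;p}' data.txt`
l	List lines, showing nonprintable characters	`sed -n 'l' data.txt`
w	Write lines to a file	`sed '1,2w test.txt' data.txt`
r	Read data from a file and insert it	`sed '3r data2.txt' data.txt`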

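2. A Practical sed and gawk Case

2.1 The Scenario

On Linux systems, the shebang of a shell script determines which shell runs it. Traditionally, Unix shell scripts use #!/bin/sh as the shebang. On most Linux distributions in the past, /bin/sh was linked to the Bash shell (/bin/bash), so using #!/bin/sh caused no problems.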

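However, on some Linux distributions (such as Ubuntu), /bin/sh is now linked to a different shell (such as the Dash shell). A shell script that uses #!/bin/sh as its shebang runs in the Dash shell on these systems, and the Bash-specific commands in it can fail.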
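
One quick way to see which shell /bin/sh points to on a given machine is to resolve the link. This is a sketch that assumes readlink from GNU coreutils is available; the result differs between distributions:

```shell
# Follow the /bin/sh symlink chain to the real executable.
sh_target=$(readlink -f /bin/sh)

echo "/bin/sh resolves to: $sh_target"
```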

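Suppose a company runs its servers on RHEL, and its Bash shell scripts use the old #!/bin/sh shebang. The company is now bringing in servers that run Ubuntu, and for the scripts to work on the new servers, their shebang must be changed to #!/bin/bash.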

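2.2 Solution Steps

2.2.1 Finding Scripts with the Old Shebang

First, use sed to build a list of the scripts that contain the old shebang. Start with the substitute (s) command and an address that restricts it to the first line of the script: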

$ sed '1s!/bin/sh!/bin/bash!' OldScripts/testAscript.sh
#!/bin/bash
[...]
echo "This is Test Script #1."
[...]
#
exit

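The substitution took effect on testAscript.sh, which shows that the script does use #!/bin/sh as its shebang. But we need to check every file in the directory, and we don't want to see the scripts' contents, so the command needs adjusting. The -s option tells sed to treat each file as a separate data stream, so that line 1 means the first line of each file, and the -n option suppresses the output: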

$ sed -sn '1s!/bin/sh!/bin/bash!' OldScripts/*.sh

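The command runs but prints nothing, and we still need the names of the script files to know which ones use the old shebang. This is where the sed F command (a GNU sed extension) comes in: it prints the name of the data file currently being processed, even when -n is in effect. Prefixing it with the address 1 prints each file name only once. Note that 1F prints the name of every file it processes, whether or not the substitution matches, so the list is accurate only if every script in the directory uses the old shebang: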

$ sed -sn '1F;
> 1s!/bin/sh!/bin/bash!' OldScripts/*.sh
OldScripts/backgroundoutput.sh
OldScripts/backgroundscript.sh
[...]
OldScripts/tryat.sh
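
If the directory may also hold scripts that already use another shebang, the F command can be made conditional on the content of line 1. The following is a sketch rather than the approach used in this article; it assumes GNU sed, an exact #!/bin/sh first line with no trailing options, and three made-up script files:

```shell
workdir=$(mktemp -d)
printf '#!/bin/sh\necho one\n'   > "$workdir/a.sh"
printf '#!/bin/bash\necho two\n' > "$workdir/b.sh"
printf '#!/bin/sh\necho three\n' > "$workdir/c.sh"

# On line 1 of each file (-s), print the file name (F) only when the line is exactly #!/bin/sh.
matches=$(cd "$workdir" && sed -sn '1{
/^#!\/bin\/sh$/F
}' *.sh)

printf '%s\n' "$matches"
rm -r "$workdir"
```

Only a.sh and c.sh are reported, because b.sh already starts with #!/bin/bash.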

2.2.2 Tidying Up the Report with gawk

To make the report easier to read, pipe the sed output to gawk:

$ sed -sn '1F;
> 1s!/bin/sh!/bin/bash!' OldScripts/*.sh |
> gawk 'BEGIN {print ""
> print "The following scripts have /bin/sh as their shebang:"
> print ""}
> {print $0}
> END {print "End of Report"}'

The result is a readable report: a heading, the script file names that sed printed, and a closing "End of Report" line.
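
The BEGIN and END blocks can carry more than fixed text. As a sketch, the variant below numbers each entry and prints a total using the built-in NR variable. The two file names are made up, and the fallback to plain awk is an assumption for systems without gawk:

```shell
# Prefer gawk, but fall back to any POSIX awk if gawk is not installed.
AWK=$(command -v gawk || command -v awk)

report=$(printf 'OldScripts/a.sh\nOldScripts/b.sh\n' |
"$AWK" 'BEGIN {print "The following scripts have /bin/sh as their shebang:"}
{print NR ". " $0}
END {print "Total: " NR " script(s)"}')

printf '%s\n' "$report"
```

NR holds the number of records read so far, so in the END block it equals the number of file names received.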

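2.2.3 Changing the Scripts' Shebang

Once you have confirmed that these scripts need a new shebang, sed and a for loop can do the work. First create a new directory to hold the modified scripts:

$ mkdir TestScripts

Then run the following loop: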

$ for filename in $(grep -l "/bin/sh" OldScripts/*.sh)
> do
> newFilename=$(basename $filename)
> cat $filename |
> sed '1c\#!/bin/bash' > TestScripts/$newFilename
> done

Finally, verify that the modified scripts use the new shebang:

$ grep "/bin/bash" TestScripts/*.sh
TestScripts/backgroundoutput.sh:#!/bin/bash
TestScripts/backgroundscript.sh:#!/bin/bash
[...]
TestScripts/tryat.sh:#!/bin/bash

2.3 The Complete Script

The following script carries out the whole task of finding the scripts and changing their shebang:

$ cat ChangeScriptShell.sh
#!/bin/bash
# Change the shebang used for a directory of scripts
#
################## Function Declarations ##########################
#
function errorOrExit {
        echo
        echo $message1
        echo $message2
        echo "Exiting script..."
        exit
}
#
function modifyScripts {
        echo
        read -p "Directory name in which to store new scripts? " newScriptDir
        #
        echo "Modifying the scripts started at $(date +%N) nanoseconds"
        #
        count=0
        for filename in $(grep -l "/bin/sh" "$scriptDir"/*.sh)
        do
                newFilename=$(basename "$filename")
                # Replace line 1 with the new shebang and save a copy in the new directory
                sed '1c\#!/bin/bash' "$filename" > "$newScriptDir/$newFilename"
                count=$((count + 1))
        done
        echo "$count modifications completed at $(date +%N) nanoseconds"
}
#
################# Check for Script Directory ######################
if [ -z "$1" ]
then
        message1="The name of the directory containing scripts to check"
        message2="is missing. Please provide the name as a parameter."
        errorOrExit
else
        scriptDir=$1
fi
#
################ Create Shebang Report ############################
#
sed -sn '1F;
1s!/bin/sh!/bin/bash!' $scriptDir/*.sh |
gawk 'BEGIN {print ""
print "The following scripts have /bin/sh as their shebang:"
print "==================================================="}
{print $0}
END {print ""
print "End of Report"}'
#
################## Change Scripts? #################################
#
#
echo
read -p "Do you wish to modify these scripts' shebang? (Y/n)? " answer
#
case $answer in
Y | y)
        modifyScripts
        ;;
N | n)
        message1="No scripts will be modified."
        message2="Run this script later to modify, if desired."
        errorOrExit
        ;;
*)
        message1="Did not answer Y or n."
        message2="No scripts will be modified."
        errorOrExit
        ;;
esac

2.4 Sample Run

Here is a sample run of the script:

$ mkdir NewScripts
$ ./ChangeScriptShell.sh OldScripts

The following scripts have /bin/sh as their shebang:
===================================================
OldScripts/backgroundoutput.sh
OldScripts/backgroundscript.sh
[...]
OldScripts/tryat.sh

End of Report

Do you wish to modify these scripts' shebang? (Y/n)? Y

Directory name in which to store new scripts? NewScripts
Modifying the scripts started at 168687219 nanoseconds
18 modifications completed at 266043476 nanoseconds

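2.5 Suggested Improvements

The script does the job, but there is room for improvement:
- Directory check: make sure the new directory for the modified Bash shell scripts exists.
- Overwrite check: check whether a file to be saved already exists in the directory, to avoid overwriting files by accident.
- Parameters and report saving: allow the new directory to be passed as a parameter, and possibly save the report produced by sed and gawk.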
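
The first two suggestions could be sketched as a small helper function. The name prepareTarget and its behavior (creating a missing directory rather than aborting) are assumptions made for this illustration, not part of the script above:

```shell
# Hypothetical helper: returns 0 only when it is safe to write file $2 into directory $1.
prepareTarget() {
    targetDir=$1
    targetFile=$2
    # Directory check: create the directory if it does not exist yet.
    mkdir -p "$targetDir" || return 1
    # Overwrite check: refuse to clobber an existing file.
    if [ -e "$targetDir/$targetFile" ]; then
        echo "Skipping $targetFile: already exists in $targetDir" >&2
        return 1
    fi
    return 0
}

# Demonstration in a temporary directory.
demoDir=$(mktemp -d)/NewScripts
prepareTarget "$demoDir" tryat.sh && first=ok || first=skipped
touch "$demoDir/tryat.sh"
prepareTarget "$demoDir" tryat.sh && second=ok || second=skipped

echo "first=$first second=$second"
```

Inside the modifyScripts loop, a call such as prepareTarget "$newScriptDir" "$newFilename" || continue would skip files that already exist instead of overwriting them.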

2.6 Workflow Summary

The following mermaid flowchart shows the workflow for fixing the script shebang problem with sed and gawk:

graph TD;
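    A[Start] --> B{Script directory argument};
    B --> |Provided| C[Generate report of scripts with the old shebang];
    B --> |Missing| D[Print error message and exit];
    C --> E{Modify the scripts};
    E --> |Yes| F[Enter directory for new scripts];
    E --> |No| G[Print no-modification message and exit];
    F --> H[Change script shebang and save to new directory];
    H --> I[Print completion message];
    D --> J[End];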
    G --> J;
    I --> J;

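2.7 Summary

This guide covered the sed editor's commands for inserting, appending, and changing lines, transforming characters, printing information, and working with files. A practical case then showed how to combine sed and gawk to fix a shell script shebang compatibility problem. Adapt these tools and techniques to your own needs as they arise.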