The Unix Text-Processing Tool sed

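Like awk, covered in the previous article, sed is a Unix text-processing tool. sed is short for Stream Editor. It filters text by pattern matching (that is, it finds the lines in a file that meet some condition) and modifies it (that is, it applies editing operations to the matched content).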

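1. sed command format

1.1 Basic format of the sed command

sed is mainly used in three forms:

  • sed 'editing command' file1 file2 ...: writes the processed result to standard output
  • sed -n 'editing command' file1 file2 ...: suppresses the default output, so only lines that the editing command prints explicitly (for example with p) are shown
  • sed -i 'editing command' file1 file2 ...: edits the text files in place (the files on disk are changed)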
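A minimal sketch of how the three forms differ, assuming a POSIX shell. The file name demo.txt and its contents are made up for illustration. The example uses -i.bak (in-place editing with a backup suffix) because the bare -i form behaves differently in GNU and BSD sed.

```shell
# Hypothetical scratch file, not part of the article's examples.
dir=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$dir/demo.txt"

# Form 1: every line is printed, and p prints line 2 a second time.
plain=$(sed '2p' "$dir/demo.txt")

# Form 2: -n turns off the automatic printing, so only line 2 appears.
quiet=$(sed -n '2p' "$dir/demo.txt")

# Form 3: -i rewrites the file itself; the .bak suffix keeps a backup copy.
sed -i.bak 's/two/TWO/' "$dir/demo.txt"
edited=$(cat "$dir/demo.txt")
backup=$(cat "$dir/demo.txt.bak")

printf '%s\n---\n%s\n---\n%s\n' "$plain" "$quiet" "$edited"
rm -rf "$dir"
```

Comparing the first two results shows why -n is almost always paired with p: without it, the selected lines are printed twice.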

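1.2 Editing commands

An editing command has two parts. First comes an optional address: either a single address, or two addresses separated by a comma that give the first and last lines to process. After it comes the operation to perform. The format is:

[start-address[,end-address]]operation

To use several editing commands in one sed invocation, separate them with ; inside a single quoted script. Alternatively, put each command in its own pair of single quotes and precede each one with -e. A simple example: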

$cat sed_test.txt
1 apple a,b,d,f
2 boy alsdjf,appleapple,kdjf
3 cat 163.2.201.1
4 dog www.google.com
5 eat http://blog.youkuaiyun.com/xia7139

$sed -n '2,5p' sed_test.txt
2 boy alsdjf,appleapple,kdjf
3 cat 163.2.201.1
4 dog www.google.com
5 eat http://blog.youkuaiyun.com/xia7139

$sed -n '5p' sed_test.txt
5 eat http://blog.youkuaiyun.com/xia7139

$sed -n -e '2p' -e '5p' sed_test.txt
2 boy alsdjf,appleapple,kdjf
5 eat http://blog.youkuaiyun.com/xia7139
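The -e form has a ; equivalent. A small sketch, assuming a copy of the five-line sed_test.txt is recreated in a temporary directory (the variable names are illustrative):

```shell
dir=$(mktemp -d)
cat > "$dir/sed_test.txt" <<'EOF'
1 apple a,b,d,f
2 boy alsdjf,appleapple,kdjf
3 cat 163.2.201.1
4 dog www.google.com
5 eat http://blog.youkuaiyun.com/xia7139
EOF

# Two commands in one script, separated by ;
with_semicolon=$(sed -n '2p;5p' "$dir/sed_test.txt")

# The same two commands, each passed with its own -e
with_e=$(sed -n -e '2p' -e '5p' "$dir/sed_test.txt")

[ "$with_semicolon" = "$with_e" ] && echo "same output"
rm -rf "$dir"
```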

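1.3 Operations

Commonly used sed operations:

Operation  Effect
p          print the line (print)
n          read the next line (next)
d          delete the line (delete)
s          substitute a string (substitute)
a          append new text (append)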
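Of these operations, n is the least obvious, so here is a small sketch of it. With -n, the script 'n;p' reads the next line and prints it, which amounts to printing the even-numbered lines. The input is made up for illustration.

```shell
# n replaces the pattern space with the next input line; p then prints it.
# On the last odd line there is no next line, so nothing more is printed.
evens=$(printf 'l1\nl2\nl3\nl4\nl5\n' | sed -n 'n;p')
printf '%s\n' "$evens"
```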

2. Examples

2.1 Inserting lines

Method 1:
Use the i command to insert a line. For example, to insert a line before line 2 of the file:

sed '2i\This is a new line.' file.txt

Method 2:
Use the a command to insert a line after a given line. For example, to insert a line after line 3:

sed '3a\This is a new line.' file.txt

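Method 3:
The i command also accepts a regular-expression address in place of a line number. For example, to insert a line before every line that starts with "apple":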

sed '/^apple/i\This is a new line.' file.txt

On macOS, a newline is required after the "\".
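A sketch of that multi-line form. GNU sed accepts it as well, so it can be used when a script has to run on both systems. The input lines are made up for illustration.

```shell
# The text to insert starts on the line after the backslash.
out=$(printf 'first\nsecond\nthird\n' | sed '2i\
This is a new line.')
printf '%s\n' "$out"
```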

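2.2 Deleting, substituting and filtering

The examples below all operate on the file sed_test.txt shown above:

Using regular expressions:
(1) Print from the first line containing kdjf through the last line ($ denotes the last line)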
$sed -n '/kdjf/,$p' sed_test.txt
2 boy alsdjf,appleapple,kdjf
3 cat 163.2.201.1
4 dog www.google.com
5 eat http://blog.youkuaiyun.com/xia7139
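(2) Print the lines containing the word apple
(Here a word means a string delimited on both sides by whitespace or punctuation. The pair \< and \> marks the start and end of a word; in sed the angle brackets must be escaped with a backslash. These anchors are a GNU extension and may not work in other sed implementations.)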
$sed -n '/\<apple\>/p' sed_test.txt
1 apple a,b,d,f

Deleting lines (there is no -i here, so the original file is untouched and the result is only written to the output):
(1) Delete lines 2 through 4
$sed '2,4d' sed_test.txt
1 apple a,b,d,f
5 eat http://blog.youkuaiyun.com/xia7139
(2) Delete the lines containing appleapple, and the last line ($)
$sed '/appleapple/d;$d' sed_test.txt
1 apple a,b,d,f
3 cat 163.2.201.1
4 dog www.google.com
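(3) Delete the lines that do not contain apple (! inverts the selection, choosing the lines that do not match), which leaves only the lines that contain apple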
$sed '/apple/!d' sed_test.txt
1 apple a,b,d,f
2 boy alsdjf,appleapple,kdjf

Substituting text:
(1) Replace apple with AMAZON on lines 1-4. s means substitute; g means replace every occurrence on a line rather than only the first.
$sed '1,4s/apple/AMAZON/g' sed_test.txt
1 AMAZON a,b,d,f
2 boy alsdjf,AMAZONAMAZON,kdjf
3 cat 163.2.201.1
4 dog www.google.com
5 eat http://blog.youkuaiyun.com/xia7139
(2) Comment out lines of a shell script (insert # at the start of each line)
$sed '1,3s/^/#/g' sed_test.txt
#1 apple a,b,d,f
#2 boy alsdjf,appleapple,kdjf
#3 cat 163.2.201.1
4 dog www.google.com
5 eat http://blog.youkuaiyun.com/xia7139
(3) Delete the string apple (when no start or end address is given, the command applies to every line)
$sed 's/apple//g' sed_test.txt
1  a,b,d,f
2 boy alsdjf,,kdjf
3 cat 163.2.201.1
4 dog www.google.com
5 eat http://blog.youkuaiyun.com/xia7139
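Two more features of s are handy with this sample data. Any character can serve as the delimiter in place of /, which avoids escaping the slashes in a URL, and & in the replacement stands for the matched text. A small sketch: the input line is copied from sed_test.txt, and the https replacement is only an illustration.

```shell
line='5 eat http://blog.youkuaiyun.com/xia7139'

# | as the delimiter, so the slashes in the URL need no escaping.
https=$(printf '%s\n' "$line" | sed 's|http://|https://|')

# & is replaced by whatever the pattern matched.
bracketed=$(printf '%s\n' "$line" | sed 's/eat/[&]/')

printf '%s\n%s\n' "$https" "$bracketed"
```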

These two articles have introduced two Unix text-processing tools, awk and sed. I hope they are useful.

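3. sed and regular expressions

Combining regular expressions with sed makes text processing much easier. For example:

Example 1: first steps with regular expressions.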
$ cat poem.txt 
  The choice
            By William Butler Yeats
  The intellect of man is forced to choose
  Perfection of life ,or of the work,
  And if take the second must refuse
  A heavenly mansion ,raging in the dark.
  When all that story 's finished ,what's the news?
  In luck or out the toil has left its mark:
  That old perplexity an empty purse,
  Or the day's vanity ,the night's remorse.
(1) Delete the leading whitespace of each line.
$ sed 's/^\s*//g' poem.txt 
The choice
By William Butler Yeats
The intellect of man is forced to choose
Perfection of life ,or of the work,
And if take the second must refuse
A heavenly mansion ,raging in the dark.
When all that story 's finished ,what's the news?
In luck or out the toil has left its mark:
That old perplexity an empty purse,
Or the day's vanity ,the night's remorse.
This also works (note that + must be escaped here, whereas the * above is not; \+ and \s are both GNU sed extensions):
$ sed 's/^\s\+//g' poem.txt 
The choice
By William Butler Yeats
The intellect of man is forced to choose
Perfection of life ,or of the work,
And if take the second must refuse
A heavenly mansion ,raging in the dark.
When all that story 's finished ,what's the news?
In luck or out the toil has left its mark:
That old perplexity an empty purse,
Or the day's vanity ,the night's remorse.

(2) Delete all whitespace in the text
$ sed 's/\s*//g' poem.txt 
Thechoice
ByWilliamButlerYeats
Theintellectofmanisforcedtochoose
Perfectionoflife,orofthework,
Andiftakethesecondmustrefuse
Aheavenlymansion,raginginthedark.
Whenallthatstory'sfinished,what'sthenews?
Inluckoroutthetoilhasleftitsmark:
Thatoldperplexityanemptypurse,
Ortheday'svanity,thenight'sremorse.
The following has the same effect:
$ sed 's/\s\+//g' poem.txt 
Thechoice
ByWilliamButlerYeats
Theintellectofmanisforcedtochoose
Perfectionoflife,orofthework,
Andiftakethesecondmustrefuse
Aheavenlymansion,raginginthedark.
Whenallthatstory'sfinished,what'sthenews?
Inluckoroutthetoilhasleftitsmark:
Thatoldperplexityanemptypurse,
Ortheday'svanity,thenight'sremorse.

The following commands do similar jobs:
$ sed 's/^[[:space:]]*//g' poem.txt    # delete leading whitespace
$ sed 's/^[ ]*//g' poem.txt            # delete leading spaces
$ sed 's/^ *//g' poem.txt              # delete leading spaces
$ sed '/^$/d' poem.txt                 # delete empty lines
$ sed '/^[ ]*$/d' poem.txt             # delete empty lines and lines containing only spaces
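A portable sketch that strips both leading and trailing whitespace using only the POSIX class [[:space:]], so it does not depend on GNU-only escapes. The sample text is made up for illustration.

```shell
# Two substitutions: one anchored at the start of the line, one at the end.
trimmed=$(printf '  The choice  \n\tBy William Butler Yeats\n' |
  sed 's/^[[:space:]]*//;s/[[:space:]]*$//')
printf '%s\n' "$trimmed"
```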

4. A first taste of sed's power

4.1 Removing unwanted tags

Suppose you have a file with the following content:

test.txt:
{'books/daglib/0015113': '<title>Scale-isometric polytopal graphs in hypercubes and Z<sub>n</sub>.</title>\n',
 'books/daglib/0097705': '<title>Discrete total l<sub>p</sub>-norm approximation problem for the function.</title>\n',
 'books/daglib/p/AveneauCFM11': '<title>A Framework for <i>n</i>-Dimensional Visibility Computations.</title>\n',
 'books/daglib/p/Carter11': '<title>Using <i>Dungeons and Dragons</i> to Integrate Curricula in Classroom.</title>\n',
 'books/daglib/p/CasolaLRV11': '<title>Access Control in Cloud-on-Grid Systems: The <i>PerfCloud</i> Case Study.</title>\n',
 'books/daglib/p/ChunKZDMZ11': '<title>Reverse Engineer of Gene Networks with Application <i>in silico</i> Network.</title>\n',
 'books/daglib/p/ChungK11': '<title>eQTL Mapping for Functional Classes of <i>Saccharomyces cerevisiae</i> Genes wssion.</title>\n',
 'books/daglib/p/Goldman11': '<title>A Model for Computer Graphics Based on Algebra for \xe2\x84\x9d<sup>3</sup>.</title>\n',
 'books/daglib/p/LiZ11': '<title>Line Geometry over \xe2\x84\x9d<sup>3, 3</sup>, and Stewart Platforms.</title>\n',
 'books/daglib/p/Liestol11': '<title><i>Situated Simulations</i> Between Reality and Designing a Narrative Space.</title>\n'}

To strip the tag-like markup from every line, a single sed command is enough:

$sed -e 's/<title>//g;s/<\/title>//g' -e 's/<i>//g;s/<\/i>//g' -e 's/<sub>//g;s/<\/sub>//g' -e 's/<sup>//g;s/<\/sup>//g' test.txt 
{'books/daglib/0015113': 'Scale-isometric polytopal graphs in hypercubes and Zn.\n',
 'books/daglib/0097705': 'Discrete total lp-norm approximation problem for the function.\n',
 'books/daglib/p/AveneauCFM11': 'A Framework for n-Dimensional Visibility Computations.\n',
 'books/daglib/p/Carter11': 'Using Dungeons and Dragons to Integrate Curricula in Classroom.\n',
 'books/daglib/p/CasolaLRV11': 'Access Control in Cloud-on-Grid Systems: The PerfCloud Case Study.\n',
 'books/daglib/p/ChunKZDMZ11': 'Reverse Engineer of Gene Networks with Application in silico Network.\n',
 'books/daglib/p/ChungK11': 'eQTL Mapping for Functional Classes of Saccharomyces cerevisiae Genes wssion.\n',
 'books/daglib/p/Goldman11': 'A Model for Computer Graphics Based on Algebra for \xe2\x84\x9d3.\n',
 'books/daglib/p/LiZ11': 'Line Geometry over \xe2\x84\x9d3, 3, and Stewart Platforms.\n',
 'books/daglib/p/Liestol11': 'Situated Simulations Between Reality and Designing a Narrative Space.\n'}

To modify the original file in place, just add the -i option.
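The four pairs of substitutions could also be collapsed into one pattern. A sketch, assuming that every <...> sequence in the file is markup to be dropped and that no tag contains a > inside it; the input line is copied from test.txt.

```shell
line="'books/daglib/p/Carter11': '<title>Using <i>Dungeons and Dragons</i> to Integrate Curricula in Classroom.</title>\n',"

# <[^>]*> matches a < followed by anything up to the nearest >.
stripped=$(printf '%s\n' "$line" | sed 's/<[^>]*>//g')
printf '%s\n' "$stripped"
```

The explicit per-tag version is safer when the text may contain a literal < or > that is not markup, since the generic pattern would remove that too.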

4.2 Substitution across multiple files

The temp directory contains three files, t1.txt, t2.txt and t3.txt, with the following contents:

$ cat t1.txt 
StevenJobs mac
i like
mill fuck
$ cat t2.txt 
you  i
 life 
 love 
 lol
 StevenJobs 
 sex
 friend
 one night stand
 for one night
$ cat t3.txt 
good night
nigtmare
fuck
StevenJobs StevenJobs
StevenJobsStevenJobs
mac

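Now we want to replace "StevenJobs" with "apple" in all three files. Doing it file by file is tedious, so here is a slightly simpler way that handles every file with one command.

Background

(1) Backquotes

A string enclosed in backquotes is treated by the shell as a command line. The shell runs that command first and substitutes its standard output for the whole backquoted part (including both backquotes). This lets the output of one command serve as the arguments of another. In bash, $() has the same effect. An example: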

$ echo `ls`
t1.txt t2.txt t3.txt
$ echo $(ls)
t1.txt t2.txt t3.txt
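A small sketch of command substitution used inside a script, assuming three empty files created in a temporary directory to stand in for t1.txt, t2.txt and t3.txt. It also shows that $() can be nested directly, whereas nested backquotes would need to be escaped.

```shell
dir=$(mktemp -d)
touch "$dir/t1.txt" "$dir/t2.txt" "$dir/t3.txt"

# The output of ls replaces the inner $(...) and is split into words for echo.
names=$(cd "$dir" && echo $(ls))

# Nested substitution: count the words that ls produced.
count=$(echo $(ls "$dir") | wc -w | tr -d ' ')

printf '%s\n%s\n' "$names" "$count"
rm -rf "$dir"
```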

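(2) The -r and -l options of grep

grep -r: Read all files under each directory, recursively.

That is, grep descends level by level through the directory and its subdirectories. Without -r, grep searches only the files named on the command line and does not descend into directories.

grep -l: Suppress normal output; instead print the name of each input file from which output would normally have been printed. The scanning will stop on the first match.

The man page wording is a little opaque. Put simply, -l prints only the names of the files that contain the pattern. Some grep examples: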

$ grep -r "Steven" .
./t1.txt:StevenJobs mac
./t2.txt: StevenJobs 
./t3.txt:StevenJobs StevenJobs
./t3.txt:StevenJobsStevenJobs
$ grep "Steven" .
# Without -r, grep prints no matches here, because . is a directory.
# The next command matches the same lines as grep -r above, but only looks at the files directly in the current directory (and prints the names without the ./ prefix):
$ grep  "Steven" *
t1.txt:StevenJobs mac
t2.txt: StevenJobs 
t3.txt:StevenJobs StevenJobs
t3.txt:StevenJobsStevenJobs
$ grep -rl "Steven" .
./t1.txt
./t2.txt
./t3.txt

With the commands above, the solution follows easily:

$ sed -i 's/StevenJobs/apple/g' `grep -rl StevenJobs .`
$ cat t*
apple mac
i like
mill fuck
you  i
 life 
 love 
 lol
 apple 
 sex
 friend
 one night stand
 for one night
good night
nigtmare
fuck
apple apple
appleapple
mac
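The backquote form splits the list of file names on whitespace, so it breaks on names that contain spaces. A more defensive sketch, assuming GNU grep (for -Z) and an xargs that supports -0; the directory and file names below are made up for illustration, and -i.bak is used so the command also works with BSD sed.

```shell
dir=$(mktemp -d)
printf 'StevenJobs mac\n' > "$dir/t1.txt"
printf 'StevenJobs StevenJobs\n' > "$dir/my notes.txt"   # name with a space
printf 'good night\n' > "$dir/t3.txt"

# -Z ends each file name with a NUL byte; xargs -0 splits on NUL only,
# so "my notes.txt" reaches sed as one argument.
grep -rlZ StevenJobs "$dir" | xargs -0 sed -i.bak 's/StevenJobs/apple/g'

first=$(cat "$dir/t1.txt")
spaced=$(cat "$dir/my notes.txt")
untouched=$(cat "$dir/t3.txt")
printf '%s\n%s\n%s\n' "$first" "$spaced" "$untouched"
rm -rf "$dir"
```

Only the files that grep lists are rewritten, so t3.txt gets neither a change nor a .bak copy.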

>.< Over!
