A Brief Introduction to sed & awk on Linux

朝闻道-夕死可矣

License: CC 4.0 BY-SA

Category: shell

Article link: https://blog.youkuaiyun.com/jc_benben/article/details/50403262

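This article introduces the basic usage and some more advanced techniques of sed and awk, two powerful text-processing tools. sed is suited to line-oriented processing, while awk excels at data analysis and report generation. Concrete examples show how to filter and transform text with sed and how to perform more complex text analysis with awk.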

sed && awk
1. sed is suited to line-oriented records and awk to everything else, although this is not a strict rule
NAME
       sed - stream editor for filtering and transforming text

SYNOPSIS
       sed [OPTION]... {script-only-if-no-other-script} [input-file]...

DESCRIPTION
       Sed is a stream editor. A stream editor is used to perform basic text transformations on an input
       stream (a file or input from a pipeline). While in some ways similar to an editor which permits
       scripted edits (such as ed), sed works by making only one pass over the input(s), and is consequently
       more efficient. But it is sed's ability to filter text in a pipeline which particularly distinguishes
       it from other types of editors.
Examples:
Some exotic examples:
* Centering lines::
* Increment a number::
* Rename files to lower case::
* Print bash environment::
* Reverse chars of lines::
For example, to strip leading spaces: sed 's/^ *//' file
Print odd-numbered lines:

sed -n 'p;n' file

Insert two blank lines after each line:
sed 'G;G' file
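
The three one-liners above can be tried on a small sample without touching any real file. This is only a minimal sketch: the sample text and the variable names are made up for illustration.

```shell
# Hypothetical sample input: four lines, two of them indented.
sample=$(printf '  one\ntwo\n   three\nfour')

# Strip leading spaces from every line.
stripped=$(printf '%s\n' "$sample" | sed 's/^ *//')

# Print odd-numbered lines: p prints the current line, n loads the next one,
# and -n keeps that next line from being printed automatically.
odd=$(printf '%s\n' "$sample" | sed -n 'p;n')

# Append two blank lines after each line: each G appends a newline plus the
# (empty) hold space to the pattern space.
spaced=$(printf 'a\nb\n' | sed 'G;G')

printf '%s\n' "$stripped" "---" "$odd" "---" "$spaced"
```

Note that `$(...)` drops trailing newlines, so the blank lines after the last input line are not visible in `spaced`.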

Common options:

-n, --quiet, --silent  suppress automatic printing of the pattern space

-i[SUFFIX], --in-place[=SUFFIX]  edit the file in place
                 edit files in place (makes backup if extension supplied).
                 The default operation mode is to break symbolic and hard links.
                 This can be changed with --follow-symlinks and --copy.
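
As a rough illustration of the backup behaviour described above, the sketch below edits a throwaway file in place and keeps a copy of the original. The directory comes from mktemp, and the file contents and variable names are assumptions made for the example.

```shell
# Work in a throwaway directory so no real file is modified.
workdir=$(mktemp -d)
printf 'aaa\nbbb\nccc\n' > "$workdir/1.txt"

# Edit in place; the .bak suffix asks sed to keep the original as 1.txt.bak.
sed -i.bak 's/bbb/ggg/' "$workdir/1.txt"

edited=$(cat "$workdir/1.txt")
backup=$(cat "$workdir/1.txt.bak")
printf 'edited:\n%s\nbackup:\n%s\n' "$edited" "$backup"

# Clean up the temporary directory.
rm -rf "$workdir"
```

Supplying a suffix is a cheap safety net: a mistaken script can be undone by copying the .bak file back.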

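Common commands:

a: append; the text after a is added as a new line after the current line
c: change; the text after c replaces the lines in the range n1,n2
d: delete
i: insert; the text after i is added as a new line before the current line
p: print; usually used together with sed -n
s: substitute; usually combined with regular expressions, which is where its power lies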
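
The difference between a, i, c and d is easiest to see side by side. The sketch below assumes GNU sed, which accepts the one-line form of a, i and c; the three-line input and the variable names are invented for the example.

```shell
# Hypothetical three-line input; assumes GNU sed's one-line a/i/c syntax.
input=$(printf 'line1\nline2\nline3')

appended=$(printf '%s\n' "$input" | sed '2a NEW')   # NEW is added after line 2
inserted=$(printf '%s\n' "$input" | sed '2i NEW')   # NEW is added before line 2
changed=$(printf '%s\n' "$input" | sed '2c NEW')    # line 2 is replaced by NEW
deleted=$(printf '%s\n' "$input" | sed '2d')        # line 2 is removed

printf '%s\n' "$appended" "---" "$inserted" "---" "$changed" "---" "$deleted"
```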

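# Some other commonly used commands
# Delete
sed '1d' 1.txt # delete the first line
sed '$d' 1.txt # delete the last line
sed '1,$d' 1.txt # delete every line, from the first to the last
# Delete matching lines
sed '/aa/d' 1.txt # delete the lines that contain aa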

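# Print
sed -n '1p' 1.txt # print the first line
sed -n '1,$p' 1.txt # print from the first line to the last line

# Print matching lines
sed -n '/aaa/p' 1.txt # print the lines that contain aaa
sed -n '/#/p' 1.txt # print the lines that contain #; # needs no escaping inside /.../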

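# Append lines
sed '1a aaa' 1.txt # append aaa after the first line
sed '1a aa\nbb' 1.txt # append aa and then bb on separate lines after the first line (GNU sed)
sed '1,$a aaa' 1.txt # append aaa after every line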

# Replace a line
sed '1c bbb' 1.txt # replace the first line with bbb

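# Replace a string
sed -n '/bbb/p' 1.txt | sed 's/bbb/ggg/g' # print only the lines containing bbb, with bbb replaced by ggg (-i cannot be used on piped input)
sed '/bbb/s/bbb/ggg/g' 1.txt # print every line, replacing bbb with ggg on the lines that contain bbb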
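
The trailing g in the s command matters whenever a line contains the pattern more than once. A small sketch on a made-up line (the variable names are illustrative only):

```shell
# Hypothetical line with three occurrences of bbb.
line='bbb bbb aaa bbb'

first_only=$(printf '%s\n' "$line" | sed 's/bbb/ggg/')    # no flag: first match only
every=$(printf '%s\n' "$line" | sed 's/bbb/ggg/g')        # g: every match on the line
second_only=$(printf '%s\n' "$line" | sed 's/bbb/ggg/2')  # 2: only the second match

printf '%s\n' "$first_only" "$every" "$second_only"
```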

# Any of the commands above can take the -i option to modify the file directly, for example:
sed -i '1d' 1.txt # delete the first line of the file

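2. awk:

awk is a powerful text-analysis tool. Where grep searches and sed edits, awk is at its strongest when analysing data and producing reports. Put simply, awk reads a file one line at a time, splits each line into fields using whitespace (spaces and tabs) as the default separator, and then processes those fields.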
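
A minimal sketch of that default splitting, using one invented record that mixes several spaces and a tab between its fields (the variable names are assumptions for the example):

```shell
# Hypothetical record: three spaces, then a tab, separate the three fields.
record=$(printf 'alpha   beta\tgamma')

field_count=$(printf '%s\n' "$record" | awk '{print NF}')   # number of fields
second=$(printf '%s\n' "$record" | awk '{print $2}')        # second field
last=$(printf '%s\n' "$record" | awk '{print $NF}')         # last field

printf '%s %s %s\n' "$field_count" "$second" "$last"
```

Runs of spaces and tabs count as a single separator here, so the record has three fields rather than five.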

NAME
       gawk - pattern scanning and processing language

SYNOPSIS
       gawk [ POSIX or GNU style options ] -f program-file [ -- ] file ...
       gawk [ POSIX or GNU style options ] [ -- ] program-text file ...

       pgawk [ POSIX or GNU style options ] -f program-file [ -- ] file ...
       pgawk [ POSIX or GNU style options ] [ -- ] program-text file ...

       dgawk [ POSIX or GNU style options ] -f program-file [ -- ] file ...

DESCRIPTION
       Gawk is the GNU Project's implementation of the AWK programming language. It conforms to the
       definition of the language in the POSIX 1003.1 Standard. This version in turn is based on the
       description in The AWK Programming Language, by Aho, Kernighan, and Weinberger. Gawk provides the
       additional features found in the current version of UNIX awk and a number of GNU-specific extensions.
For example:
Print columns 1 and 3 of a file:
awk '{print $1"\t"$3}' file
Print the second column onward:
awk '{for(i=2;i<=NF;i++) printf "%s ", $i; printf "\n"}' file
Print odd-numbered lines:

With awk: awk '{if (NR%2==1) print $0}' file

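# Other examples

# Show a specific column (the account names); -F sets the field separator
cat /etc/passwd |awk -F ':' '{print $1}'
# How awk works: it reads one record at a time (records are separated by the newline '\n'), then splits the record into fields using the field separator.
# $0 is the whole record, $1 the first field, and $n the nth field. The default field separator is whitespace (spaces or tabs).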
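
To see $0, $1 and $n with a custom separator without depending on the contents of the real /etc/passwd, the sketch below feeds awk two made-up lines in the same format. The sample data and variable names are assumptions.

```shell
# Two invented lines in /etc/passwd format (not read from the real file).
passwd_sample=$(printf 'root:x:0:0:root:/root:/bin/bash\ndaemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin')

# $1 is the first colon-separated field (the account name).
names=$(printf '%s\n' "$passwd_sample" | awk -F ':' '{print $1}')

# $1 and $7 together; the comma makes print join them with a space.
name_and_shell=$(printf '%s\n' "$passwd_sample" | awk -F ':' '{print $1, $7}')

# $0 is the whole, unsplit record.
whole=$(printf '%s\n' "$passwd_sample" | awk -F ':' 'NR==1 {print $0}')

printf '%s\n' "$names" "---" "$name_and_shell" "---" "$whole"
```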

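# Add custom header and footer lines
cat /etc/passwd |awk -F ':' 'BEGIN {print "name,shell"} {print $1","$7} END {print "noname,/bin/nosh"}'
# How awk works: it runs BEGIN first, then reads the file one record at a time (records are separated by the newline \n), splitting each record into fields using the field separator.
# $0 is the whole record, $1 the first field, and $n the nth field. It then runs the action of every pattern that matches and moves on to the next record,
# until all records have been read, and finally runs END.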
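
The order BEGIN, per-record action, END can be checked on a tiny input. In the sketch below the sample record is invented, and the second command reads from /dev/null to show that BEGIN and END still run when there are no records at all.

```shell
# One invented passwd-style record: header first, then the record, then the footer.
report=$(printf 'root:x:0:0:root:/root:/bin/bash\n' |
  awk -F ':' 'BEGIN {print "name,shell"} {print $1 "," $7} END {print "records:", NR}')

# Empty input: the middle action never runs, but BEGIN and END do.
empty=$(awk 'BEGIN {print "start"} {print "row"} END {print "end", NR}' < /dev/null)

printf '%s\n' "$report" "---" "$empty"
```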

# Print matching lines, or a chosen column of the matching lines
awk -F: '/root/' /etc/passwd
awk -F: '/root/{print $7}' /etc/passwd

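awk built-in variables:
ARGC               number of command-line arguments
ARGV               array of command-line arguments
ENVIRON            associative array of the environment variables
FILENAME           name of the file currently being read
FNR                record number within the current file
FS                 input field separator; same as the -F command-line option
NF                 number of fields in the current input record
NR                 total number of input records seen so far
OFS                output field separator
ORS                output record separator
RS                 input record separator
For example: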
awk -F ':' '{print "filename:" FILENAME ",linenumber:" NR ",columns:" NF ",linecontent:"$0}' /etc/passwd
awk -F ':' '{printf("filename:%10s,linenumber:%s,columns:%s,linecontent:%s\n",FILENAME,NR,NF,$0)}' /etc/passwd
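
NR and FNR only differ when awk reads more than one file, and OFS only shows up when print is given comma-separated arguments. The sketch below uses two temporary files with invented contents; the file names and variable names are assumptions made for the example.

```shell
# Two throwaway input files in a temporary directory.
workdir=$(mktemp -d)
printf 'a:1\nb:2\n' > "$workdir/first.txt"
printf 'c:3\n'      > "$workdir/second.txt"

# NR keeps counting across files; FNR restarts at 1 for each file.
counters=$(cd "$workdir" && awk -F ':' '{print FILENAME, NR, FNR, NF}' first.txt second.txt)

# OFS is placed between the comma-separated arguments of print.
rejoined=$(awk -F ':' 'BEGIN {OFS = "-"} {print $1, $2}' "$workdir/first.txt")

printf '%s\n' "$counters" "---" "$rejoined"
rm -rf "$workdir"
```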
Other examples:
awk 'BEGIN {count=0;print "[start]user count is ", count} {count=count+1;print $0;} END{print "[end]user count is ", count}' /etc/passwd
# Multiple statements inside an action are separated by semicolons

# Total size of the files in a directory
ls -l |awk 'BEGIN {size=0;} {size=size+$5;} END{print "[end]size is ", size/1024/1024,"M"}'

Variables can also be defined on the command line with -v:

[oracle@prod-db ~]$ cat 1.txt
aaa 111 china
bbb 222 usa
ccc 333 japan
[oracle@prod-db ~]$ awk -va=1 '{print $1,$2+a}' 1.txt
aaa 112
bbb 223
ccc 334
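
The same -v mechanism is a convenient way to hand a shell variable to awk without splicing it into the program text. A small sketch that reuses the 1.txt contents shown above; the shell variable offset is a made-up name.

```shell
# Hypothetical shell variable passed into awk as the awk variable a.
offset=10

result=$(printf 'aaa 111 china\nbbb 222 usa\nccc 333 japan\n' |
  awk -v a="$offset" '{print $1, $2 + a}')

printf '%s\n' "$result"
```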

# Total bytes used by the files in a directory, skipping entries whose size is exactly 4096 (usually directories):
ls -l |awk 'BEGIN {size=0;print "[start]size is ", size} {if($5!=4096){size=size+$5;}} END{print "[end]size is ", size/1024/1024,"M"}'
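
Because real ls -l output differs from machine to machine, the summation is easier to check on a fixed listing. The lines below are invented to look like ls -l output. Skipping size 4096 is only a heuristic, so the sketch also shows a variant that, as an alternative not used in the original, keeps just the lines whose mode column starts with - (regular files).

```shell
# Made-up listing in ls -l format: one directory and two regular files.
listing=$(printf '%s\n' \
  'total 12' \
  'drwxr-xr-x 2 user user 4096 Jan  1 00:00 dir' \
  '-rw-r--r-- 1 user user 1048576 Jan  1 00:00 big.bin' \
  '-rw-r--r-- 1 user user 2097152 Jan  1 00:00 bigger.bin')

# Same idea as the command above: skip entries whose size is exactly 4096.
total_mb=$(printf '%s\n' "$listing" |
  awk '$5 != 4096 {size += $5} END {print size / 1024 / 1024}')

# Alternative: count only regular files, identified by the leading "-".
regular_mb=$(printf '%s\n' "$listing" |
  awk '/^-/ {size += $5} END {print size / 1024 / 1024}')

printf '%s M, %s M\n' "$total_mb" "$regular_mb"
```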

# awk also supports arrays; here a for loop walks the array
awk -F ':' 'BEGIN {count=0;} {name[count] = $1;count++;}; END{for (i = 0; i < NR; i++) print i, name[i]}' /etc/passwd
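
awk arrays are associative, so they can also be indexed by strings. As a sketch of that, the example below counts how many accounts use each login shell. The three passwd-style lines are invented, and the output is piped through sort because the order of for (key in array) is unspecified.

```shell
# Count accounts per login shell (field 7) in three made-up passwd lines.
shell_counts=$(printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'alice:x:1000:1000::/home/alice:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' |
  awk -F ':' '{count[$7]++} END {for (s in count) print s, count[s]}' | sort)

printf '%s\n' "$shell_counts"
```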