Learning awk

flykobesummer

License: CC 4.0 BY-SA

Article link: https://blog.youkuaiyun.com/flykobesummer/article/details/3335933

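Addendum, 2008-11-17

Syntax:

Every awk rule consists of a pattern and an action. Either one may be omitted, but not both.

For example:

awk '/foo/' test.txt   # The action is omitted; the default action is print,
                       # so every matching line is printed in full.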

The full form is:

awk '/foo/ { print $0 }' test.txt

awk '/foo/ { print }' test.txt   # print outputs the whole line by default, so $0 can be omitted
     ----- ---------
       |       |
    pattern action
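The three shapes a rule can take can be tried on a few lines of inline input. This is a small sketch; the sample lines are invented for illustration.

```shell
# Sample input, invented for illustration.
input='foo one
bar two
foo three'

# Pattern only: the default action prints each matching line.
printf '%s\n' "$input" | awk '/foo/'

# Action only: with no pattern, the action runs for every line.
printf '%s\n' "$input" | awk '{ print $2 }'

# Pattern and action: print the second field of the matching lines.
printf '%s\n' "$input" | awk '/foo/ { print $2 }'
```

The first command prints the two foo lines, the second prints the second field of all three lines, and the third prints the second field of the foo lines only.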

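Running:

Put #!/bin/awk -f on the first line of a file and give the file execute (x) permission, and it can be run directly. Use the path where awk is actually installed; on many systems it is /usr/bin/awk. By convention such files get a .awk suffix.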
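As a rough sketch of the steps, the following creates a small script and runs it. The file name count.awk and the temporary directory are assumptions made for the example, not part of the original notes.

```shell
# Hypothetical script name; a temporary directory keeps the sketch self-contained.
dir=$(mktemp -d)
cat > "$dir/count.awk" <<'EOF'
#!/bin/awk -f
# Count the input lines and report the total at the end.
END { print NR " lines" }
EOF
chmod +x "$dir/count.awk"

# Running through awk -f works wherever awk is installed
# (the #! line is then just a comment).
printf 'a\nb\nc\n' | awk -f "$dir/count.awk"

# Running the file directly relies on the #! path matching the local awk:
# printf 'a\nb\nc\n' | "$dir/count.awk"

rm -r "$dir"
```

With the three input lines above, the script prints "3 lines".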

Some commonly used one-liners:

Print the length of the longest input line:

 awk '{ if (length($0) > max) max = length($0) } END { print max }' data

Print every line that is longer than 80 characters:

 awk 'length($0) > 80' data

The sole rule has a relational expression as its pattern and it has no action, so the default action, printing the record, is used.

Print the length of the longest line in data:

 expand data | awk '{ if (x < length()) x = length() } END { print "maximum line length is " x }'

The input is processed by the expand utility to change tabs into spaces, so the widths compared are actually the right-margin columns.

Print every line that has at least one field:

 awk 'NF > 0' data

This is an easy way to delete blank lines from a file (or rather, to create a new file similar to the old file but from which the blank lines have been removed).

Print seven random numbers from 0 to 100, inclusive:

 awk 'BEGIN { for (i = 1; i <= 7; i++) print int(101 * rand()) }'
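One thing to keep in mind with rand(): unless the generator is seeded, awk typically produces the same sequence on every run. A possible variant, assuming a time-based seed is acceptable, calls srand() first:

```shell
# srand() with no argument seeds the generator from the current time,
# so runs started at different seconds are expected to differ.
awk 'BEGIN {
    srand()
    for (i = 1; i <= 7; i++)
        print int(101 * rand())   # integer in the range 0..100
}'
```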

Print the total number of bytes used by files:

 ls -l files | awk '{ x += $5 } END { print "total bytes: " x }'

Print the total number of kilobytes used by files:

 ls -l files | awk '{ x += $5 } END { print "total K-bytes: " (x + 1023)/1024 }'

Print a sorted list of the login names of all users:

 awk -F: '{ print $1 }' /etc/passwd | sort

Count the lines in a file:

 awk 'END { print NR }' data

Print the even-numbered lines in the data file:

 awk 'NR % 2 == 0' data

-----------------------------------------------------------------------------

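awk is a powerful text-processing tool.

The notes below are based on the reposted article referenced further down, with some changes and a summary.

Select the lines whose second field is longer than the first: awk 'length($2) > length($1)' awk_test.txt (the shorter form '$2 > $1' compares the field values, not their lengths)

Sample file: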

he hhhhhhhh

hhhhhhhh dd
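A comparison such as $2 > $1 compares the field values themselves (as strings when they are not numbers), which only sometimes agrees with a comparison of lengths. The sketch below shows the difference; the third input line is an assumption added to make the two results diverge.

```shell
# The first two lines are the sample file; the third is added for illustration.
data='he hhhhhhhh
hhhhhhhh dd
zz aaaa'

# Compares the field values as strings: only "he hhhhhhhh" qualifies.
printf '%s\n' "$data" | awk '$2 > $1'

# Compares the field lengths: "he hhhhhhhh" and "zz aaaa" qualify.
printf '%s\n' "$data" | awk 'length($2) > length($1)'
```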

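Fields and records: by default, fields are separated by whitespace and records are separated by newlines. Both can be changed through awk's built-in variables FS and RS (these are awk variables, not shell environment variables).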
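A minimal sketch of changing both separators, using invented inline input:

```shell
# FS = ":" splits each record into fields at colons.
printf 'root:x:0\nmysql:x:27\n' | awk 'BEGIN { FS = ":" } { print $1, $3 }'

# RS = ";" splits the input into records at semicolons instead of newlines.
printf 'a b;c d;e f' | awk 'BEGIN { RS = ";" } { print NR ": " $1 }'
```

The first command prints "root 0" and "mysql 27"; the second prints "1: a", "2: c" and "3: e", one per line.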

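The $0 variable: it refers to the whole record. For example, $ awk '{print $0}' test prints every record in the file test.

The NR variable: a counter that is incremented by 1 for each record read. For example, $ awk '{print NR,$0}' test prints every record in the file test, each preceded by its record number.

$1 and $2 are the first and second fields. Note that awk can use several field separators at once, because the separator may be a regular expression.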
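Several separators at once can be expressed by giving the field separator as a bracket expression. The input below is made up for the example.

```shell
# -F'[:,]' treats both ":" and "," as field separators.
printf 'alice:30,paris\nbob:25,rome\n' | awk -F'[:,]' '{ print $1, $3 }'
```

This prints "alice paris" and "bob rome".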

awk '$1 ~ /^root/' test   # the first field starts with root; there is no action, so the record is printed by default

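Consider this example:

cat /etc/passwd | awk -F: ' \
NF != 7 { \
    printf("line %d does not have 7 fields: %s\n", NR, $0) \
} \
$1 !~ /[A-Za-z0-9]/ { \
    printf("line %d has no letter or digit in the user name: %s\n", NR, $0) \
} \
$2 == "*" { \
    printf("line %d has no password: %s\n", NR, $0) }'

A backslash (\) at the end of a line continues it on the next line, so the program can be entered over several lines. (Inside single quotes the shell keeps the newlines anyway, so the backslashes are optional here.) A brace-enclosed block that directly follows a comparison is executed only when the comparison is true, much like [ condition ] && { ...; } in bash. The example uses the variables NF, NR, $0 and $1: NF is the number of fields in the current record, NR is the record number, and $0 is the whole current record.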
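To make the "pattern acts like an if" point concrete, here is a small sketch that writes the same check both ways. The passwd-style lines are invented sample data, not real accounts.

```shell
# Invented sample: the first line has 7 fields, the second only 6.
data='root:x:0:0:root:/root:/bin/bash
broken:x:1:1:/home/broken:/bin/sh'

# Pattern form: the action runs only for records where the pattern is true.
printf '%s\n' "$data" | awk -F: 'NF != 7 { print "line " NR " has " NF " fields" }'

# Equivalent form: an explicit if inside an action that runs for every record.
printf '%s\n' "$data" | awk -F: '{ if (NF != 7) print "line " NR " has " NF " fields" }'
```

Both commands print "line 2 has 6 fields".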

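$ awk '/^(no|so)/' test ----- prints every line that begins with no or so.
$ awk '/^[ns]/{print $1}' test ----- if the record begins with n or s, prints its first field.
$ awk '$1 ~ /[0-9][0-9]$/{print $1}' test ----- if the first field ends with two digits, prints that field.
$ awk '$1 == 100 || $2 < 50' test ----- if the first field equals 100 or the second field is less than 50, prints the line.
$ awk '$1 != 10' test ----- if the first field is not equal to 10, prints the line.
$ awk '/test/{print $1 + 10}' test ----- if the record matches the regular expression test, adds 10 to the first field and prints the result.
$ awk '{print ($1 > 5 ? "ok "$1 : "error"$1)}' test ----- if the first field is greater than 5, prints the value of the expression after the question mark; otherwise prints the value of the expression after the colon.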
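A few of these patterns can be checked against a small input. This is only a sketch: the three data lines are assumptions chosen so that each command selects something different.

```shell
# Invented sample data: a number followed by a word.
data='3 apple
8 pear
12 test'

# Conditional expression: choose the label by comparing the first field with 5.
printf '%s\n' "$data" | awk '{ print ($1 > 5 ? "ok " $1 : "error " $1) }'

# Regular-expression pattern with arithmetic on the first field.
printf '%s\n' "$data" | awk '/test/ { print $1 + 10 }'

# Two comparisons joined with ||; the default action prints the line.
printf '%s\n' "$data" | awk '$1 == 8 || $1 < 5'
```

The first command prints "error 3", "ok 8" and "ok 12"; the second prints 22; the third prints the lines "3 apple" and "8 pear".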

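$ awk '/^root/,/^mysql/' test ----- prints every record in the range from a record that begins with root through the next record that begins with mysql. If another record beginning with root is found later, printing resumes and continues until the next record beginning with mysql, or until the end of the file.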
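The restart behaviour of a range pattern is easy to see on a short input. The line contents below are invented for the sketch.

```shell
# The range opens at the first "root", closes at "mysql",
# then opens again at the second "root" and runs to the end of the input.
printf '%s\n' daemon root bin mysql nobody root tail |
awk '/^root/,/^mysql/'
```

The output is root, bin, mysql, root, tail, one per line; daemon and nobody fall outside both ranges.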

awk '{print length}' awk_test.txt   # prints the length of each record

awk '{print length($1)}' awk_test.txt   # prints the length of the first field of each record
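Applied to the two sample lines shown earlier (supplied inline here rather than read from awk_test.txt), the two forms can be combined in one command:

```shell
# Print the record length followed by the length of the first field.
printf 'he hhhhhhhh\nhhhhhhhh dd\n' | awk '{ print length($0), length($1) }'
```

This prints "11 2" for the first line and "11 8" for the second.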

Reference: http://man.lupaworld.com/content/manage/ringkee/awk.htm

Unfortunately there is a length limit here, so the full article cannot be pasted.

Table 1. awk built-in variables

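Variable	Description
$n	The nth field of the current record; fields are separated by FS.
$0	The complete input record.
ARGC	Number of command-line arguments.
ARGIND	Position on the command line (counting from 0) of the file currently being processed (gawk extension).
ARGV	Array containing the command-line arguments.
CONVFMT	Conversion format for numbers (default %.6g).
ENVIRON	Associative array of environment variables.
ERRNO	Description of the last system error (gawk extension).
FIELDWIDTHS	Space-separated list of field widths (gawk extension).
FILENAME	Name of the current file.
FNR	Like NR, but relative to the current file.
FS	Field separator (default is a single space, which means any run of whitespace).
IGNORECASE	If true, matching ignores case (gawk extension).
NF	Number of fields in the current record.
NR	Number of records read so far.
OFMT	Output format for numbers (default %.6g).
OFS	Output field separator (default is a single space).
ORS	Output record separator (default is a newline).
RLENGTH	Length of the string matched by the match function.
RS	Record separator (default is a newline).
RSTART	Position of the first character of the string matched by the match function.
SUBSEP	Array subscript separator (default \034).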
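A short sketch exercising a few of the variables from the table (OFS, NF, RSTART, RLENGTH and SUBSEP). The input strings are arbitrary examples.

```shell
# OFS is written between the comma-separated arguments of print; NF is the field count.
printf 'a b c\n' | awk 'BEGIN { OFS = "-" } { print $1, $2, NF }'

# match() records where the match starts (RSTART) and how long it is (RLENGTH).
awk 'BEGIN { if (match("foobar123", /[0-9]+/)) print RSTART, RLENGTH }'

# A subscript written as ["x", "y"] is stored as "x" SUBSEP "y",
# so splitting the key on SUBSEP recovers the two parts.
awk 'BEGIN {
    a["x", "y"] = 1
    for (k in a) {
        split(k, parts, SUBSEP)
        print parts[1], parts[2]
    }
}'
```

The three commands print "a-b-3", "7 3" and "x y" respectively.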