awk basics
awk variables explained
RS The input record separator, by default a newline.
ORS The output record separator, by default a newline.
OFS The output field separator, a space by default.
FS The input field separator, a space by default. See Fields, below.
Fields
As each input record is read, gawk splits the record into fields, using the value of the FS variable as the field separator. If FS is a single character, fields are separated by that character. If FS is the null string, then each individual character becomes a separate field. Otherwise, FS is expected to be a full regular expression. In the special case that FS is a single space, fields are separated by runs of spaces and/or tabs and/or newlines. (But see the section POSIX COMPATIBILITY, below). NOTE: The value of IGNORECASE (see below) also affects how fields are split when FS is a regular expression, and how records are separated when RS is a regular expression.
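The three FS cases described above (default single space, single character, regular expression) can be compared side by side. This is a minimal sketch with made-up input strings; the variable names are only for illustration.

```shell
# Default FS (a single space): runs of spaces/tabs act as one separator
# and leading whitespace is ignored.
default_fs=$(printf '  a   b\tc\n' | awk '{print NF ":" $2}')

# Single-character FS: every comma separates, so the empty field is counted.
comma_fs=$(printf 'a,,c\n' | awk -F',' '{print NF ":" $3}')

# Regular-expression FS: a colon or a run of spaces.
regex_fs=$(printf 'x:y  z\n' | awk -F':| +' '{print NF ":" $3}')

printf '%s\n' "$default_fs" "$comma_fs" "$regex_fs"
```

Each line printed is the field count followed by one field, which makes it easy to see how the same kind of input is split differently depending on FS.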
IGNORECASE
Controls the case-sensitivity of all regular expression and string operations. If IGNORECASE has a non-zero value, then string comparisons and pattern matching in rules, field splitting with FS, record separating with RS, regular expression matching with ~ and !~, and the gensub(), gsub(), index(), match(), split(), and sub() built-in functions all ignore case when doing regular expression operations.
NOTE: Array subscripting is not affected. However, the asort() and asorti() functions are affected.
Thus, if IGNORECASE is not equal to zero, /aB/ matches all of the strings "ab", "aB", "Ab", and "AB". As with all AWK variables, the
initial value of IGNORECASE is zero, so all regular expression and string operations are normally case-sensitive. Under Unix, the full
ISO 8859-1 Latin-1 character set is used when ignoring case. As of gawk 3.1.4, the case equivalencies are fully locale-aware, based on
the C <ctype.h> facilities such as isalpha(), and toupper().
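IGNORECASE is a gawk extension; other awk implementations treat it as an ordinary variable with no special effect. The sketch below therefore uses the portable tolower() approach, which gives the same result as IGNORECASE for this simple pattern. The input lines are invented for the example.

```shell
# Portable case-insensitive match: lower-case the record, then match.
# With gawk, setting IGNORECASE=1 in BEGIN and matching /ab/ directly
# should count the same four lines.
matches=$(printf 'ab\naB\nAb\nAB\ncd\n' | awk 'tolower($0) ~ /ab/ {n++} END{print n}')
printf '%s\n' "$matches"
```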
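In short:
RS: Record Separator, separates input records
ORS: Output Record Separator, written after each output record
FS: Field Separator, separates input fields
OFS: Output Field Separator, written between output fields
Examples:
1 The difference between RS and ORS
Default RS
[root@hadoop-slave1 shell]# awk 'BEGIN{RS="\n";}{print $0}' 3.txt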
I am a student.
I am a worker.
I am a doctor.
[root@hadoop-slave1 shell]#
RS set to a space
[root@hadoop-slave1 shell]# awk 'BEGIN{RS=" ";}{print $0}' 3.txt
I
am
a
student.
I
am
a
worker.
I
am
a
doctor.
[root@hadoop-slave1 shell]#
ORS set to ---\n
[root@hadoop-slave1 shell]# awk 'BEGIN{ORS="---\n";}{print $0}' 3.txt
I am a student.---
I am a worker.---
I am a doctor.---
[root@hadoop-slave1 shell]#
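Note:
awk string literals such as the value assigned to RS in BEGIN must use double quotes, not single quotes. The whole awk program is already wrapped in single quotes for the shell, so an inner single quote ends the shell quoting and awk receives the value unquoted.
Single quotes produce an error: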
[root@hadoop-slave1 shell]# awk 'BEGIN{RS='---';}{print $0}'
awk: BEGIN{RS=---;}{print $0}
awk: ^ syntax error
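Double quotes work, and the separator can be any string (the Chinese text below means "I am the output separator"):
[root@hadoop-slave1 shell]# awk 'BEGIN{ORS="我是输出分隔符\n";}{print $0}' 3.txt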
I am a student.我是输出分隔符
I am a worker.我是输出分隔符
I am a doctor.我是输出分隔符
[root@hadoop-slave1 shell]#
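The semicolon after the last statement inside the BEGIN braces is optional; semicolons are only required to separate several statements on one line.
ORS, the output record separator, defaults to \n, which is why each record is printed on its own line.
RS, the input record separator, also defaults to \n, but it can be set to something else so that another character marks the end of a record instead of a newline.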
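As a minimal sketch of swapping separators, the first command below reads records separated by ";" and writes them one per line; the second does the reverse and joins lines with " | ". The input text and variable names are invented for illustration.

```shell
# RS=";" splits the input on semicolons; ORS="\n" writes one record per line.
split_out=$(printf 'I am a student.;I am a worker.;I am a doctor.' |
    awk 'BEGIN{RS=";"; ORS="\n"}{print $0}')

# ORS is appended after every record, including the last one,
# so the joined output ends with a trailing separator.
joined=$(printf 'a\nb\nc\n' | awk 'BEGIN{ORS=" | "}{print $0}')

printf '%s\n' "$split_out"
printf '[%s]\n' "$joined"
```

The trailing " | " in the second result is worth noticing: ORS is a terminator written after each record, not a separator placed between records.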
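2 OFS and FS
RS and ORS are a matched pair: RS splits the input into records, ORS terminates records on output.
OFS and FS are the same kind of pair for fields: FS splits a record into fields, OFS joins fields on output.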
[root@hadoop-slave1 shell]# echo "abc abc acb"|awk 'BEGIN{OFS="---";}{print $1 OFS $2}'
abc---abc
[root@hadoop-slave1 shell]#
The default OFS is a single space.
[root@hadoop-slave1 shell]# echo "abc abc acb"|awk 'BEGIN{OFS="---";}{NF=NF;print $0}'
abc---abc---acb
[root@hadoop-slave1 shell]# echo "abc abc acb"|awk 'BEGIN{OFS="---";}{print $0}'
abc abc acb
[root@hadoop-slave1 shell]#
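As you can see, the two results differ. Why? awk only rebuilds $0 with OFS when a field is assigned (gawk also does so when NF is assigned, which is what NF=NF triggers). Printing an unmodified $0 outputs the original line with its original separators.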
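The difference can be reproduced with a portable variant that assigns a field to itself ($1=$1) instead of NF=NF. This is a small sketch; the "-" separator and variable names are chosen only for the example.

```shell
line='abc abc acb'

# $0 is printed untouched, so OFS is never applied.
untouched=$(echo "$line" | awk 'BEGIN{OFS="-"}{print $0}')

# Assigning any field forces awk to rebuild $0 using OFS.
rebuilt=$(echo "$line" | awk 'BEGIN{OFS="-"}{$1=$1; print $0}')

# A comma in print also inserts OFS between the listed items.
commas=$(echo "$line" | awk 'BEGIN{OFS="-"}{print $1, $2}')

printf '%s\n' "$untouched" "$rebuilt" "$commas"
```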
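awk usage examples
- Get the IP address (multiple separators, pattern matching): ifconfig eth0|awk -F':| +' '/inet addr/{print $4}'
- Count network connections by state (arrays, pattern matching, loops): netstat -an | awk '/^tcp/{state[$NF]++}END{for(i in state){print i"\t"state[i]}}'
- Print the field that follows a matching field in the Java process list: ps aux | grep java| awk -F' +|=' -v reg='^.*base.*' '{for(i=1;i<=NF;i++)if($i~reg){print $(i+1)}}'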
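The connection-counting one-liner can be tried without a live system by feeding it canned input. The sample lines below are hypothetical stand-ins for netstat output, not real data, and the sort is added only because the order of a for-in loop over an awk array is not guaranteed.

```shell
# Hypothetical netstat-like lines; only the first and last columns matter here.
sample='tcp 0 0 10.0.0.1:22 10.0.0.2:50000 ESTABLISHED
tcp 0 0 10.0.0.1:80 10.0.0.3:50001 TIME_WAIT
tcp 0 0 10.0.0.1:80 10.0.0.4:50002 ESTABLISHED
udp 0 0 0.0.0.0:68 0.0.0.0:*'

# Count tcp lines per state (last field), then sort for stable output.
counts=$(printf '%s\n' "$sample" |
    awk '/^tcp/{state[$NF]++} END{for(s in state) print s "\t" state[s]}' |
    sort)

printf '%s\n' "$counts"
```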
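awk functions
int(x) Returns the integer part of x
sqrt(x) Returns the square root of x
rand() Returns a pseudo-random number r, where 0<=r<1
srand() Sets a new seed for rand(); if no seed is given, the current time of day is used. Example: awk 'BEGIN{srand();print rand()}'
sub(),gsub() Substitution functions: sub() replaces the first match, gsub() replaces all matches. Example: echo "hello world world" | awk '{gsub("world","longlong");print $0}'
index(s,t) Returns the position of t in string s, or 0 if t does not occur
length(s) Returns the length of string s; if s is omitted, returns the length of $0
match(s,r) Returns the starting position where regular expression r matches in s, or 0 if there is no match
split(s,a,sep) Splits string s into array a using sep; sep defaults to FS. Example (for-in order is not guaranteed): echo "00-11-22-33-44" |awk '{split($0,a,"-");for(i in a){print i":"a[i]}}'
tolower(s) Converts all uppercase letters in the string to lowercase
toupper(s) The opposite of tolower: converts lowercase letters to uppercase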
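A short sketch exercising several of the functions listed above on fixed input, so the results are reproducible. It uses the return value of split() with an index loop, which keeps the pieces in order, unlike a for-in loop. The variable names are illustrative only.

```shell
# split() returns the number of pieces; loop by index to keep them ordered.
pieces=$(echo '00-11-22-33-44' | awk '{
    n = split($0, a, "-")
    for (i = 1; i <= n; i++)
        printf "%s%d:%s", (i > 1 ? " " : ""), i, a[i]
    printf "\n"
}')

# index, length, match, gsub and toupper on one string.
info=$(echo 'hello world world' | awk '{
    first = index($0, "world")      # position of the first "world"
    len   = length($0)              # length of the whole record
    pos   = match($0, /wor+ld/)     # start of the first regex match
    count = gsub(/world/, "awk")    # number of replacements made
    print first, len, pos, count, toupper($0)
}')

# int() truncates, sqrt() takes the square root.
math=$(awk 'BEGIN{print int(3.9), sqrt(16)}')

printf '%s\n' "$pieces" "$info" "$math"
```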
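awk built-in variables
ARGC Number of command-line arguments
ARGV Array of command-line arguments
ENVIRON Associative array of the environment variables
FILENAME Name of the file awk is currently reading
FNR Record number within the current file
FS Input field separator; equivalent to the -F command-line option
NF Number of fields in the current record
NR Total number of records read so far
OFS Output field separator
ORS Output record separator
RS Input record separator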
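The following sketch shows a few of these variables in action: NR and NF on standard input, FNR versus NR across two files, and ENVIRON. The temporary files, their contents, and the GREETING variable are all hypothetical and exist only for this example.

```shell
# NR counts records, NF counts fields, $NF is the last field.
report=$(printf 'a b c\nd e\n' | awk '{print NR, NF, $NF} END{print "total", NR}')

# FNR restarts for each file while NR keeps counting across files.
tmp=$(mktemp -d)
printf '1\n2\n' > "$tmp/a.txt"
printf '3\n' > "$tmp/b.txt"
per_file=$(cd "$tmp" && awk '{print FILENAME, FNR, NR}' a.txt b.txt)
rm -rf "$tmp"

# ENVIRON exposes the environment of the awk process.
env_val=$(GREETING=hello awk 'BEGIN{print ENVIRON["GREETING"]}')

printf '%s\n' "$report" "$per_file" "$env_val"
```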
Conditional statements
if (expression) {
    statement;
    statement;
    … …
}

if (expression) {
    statement;
} else {
    statement2;
}

if (expression) {
    statement1;
} else if (expression1) {
    statement2;
} else {
    statement3;
}
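As a concrete instance of the if / else if / else form, here is a minimal sketch that labels each input number. The input values are arbitrary.

```shell
# Classify each number read from standard input.
labels=$(printf '%s\n' -5 0 7 | awk '{
    if ($1 < 0) {
        print $1, "negative"
    } else if ($1 == 0) {
        print $1, "zero"
    } else {
        print $1, "positive"
    }
}')

printf '%s\n' "$labels"
```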
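Loop statements
awk's loop statements are likewise borrowed from C: it supports while, do/while, for, break, and continue, and these keywords have the same semantics as in C. awk also has a for (key in array) form for iterating over array indices, as used in the netstat example above.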
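A small sketch that uses each loop keyword once inside a BEGIN block, so no input is needed. The variable names are arbitrary.

```shell
loops=$(awk 'BEGIN {
    # for with continue and break: collect odd numbers up to 7
    for (i = 1; i <= 10; i++) {
        if (i % 2 == 0) continue
        if (i > 7) break
        odd = odd i " "
    }
    # while: sum the numbers 1 to 4
    j = 1
    while (j <= 4) { sum += j; j++ }
    # do/while: the body runs once even though the condition is false
    do { ran++ } while (0)
    print odd "| " sum " | " ran
}')

printf '%s\n' "$loops"
```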