The basic rule of awk
pattern {action}
pattern {action}
...
In other words: when an input line matches a pattern, the corresponding action is executed.
Patterns
- BEGIN { statements } The statements are executed once before any input has been read.
- Runs before everything else.
- END { statements } The statements are executed once after all input has been read.
- Runs after everything else.
- expression { statements } The statements are executed at each input line where the expression is true, that is, nonzero or nonnull.
- The action runs when the expression is nonzero (or, for a string, non-empty).
- /regular expression/ { statements } The statements are executed at each input line that contains a string matched by the regular expression.
- The action runs when the input line matches the regular expression.
- compound pattern { statements } A compound pattern combines expressions with && (AND), || (OR), ! (NOT), and parentheses; the statements are executed at each input line where the compound pattern is true.
- Several patterns can be combined with &&, || and !.
- pattern 1 , pattern 2 { statements } A range pattern matches each input line from a line matched by pattern 1 to the next line matched by pattern 2, inclusive; the statements are executed at each matching line.
- [Important] Range match: the action runs on the line matching pattern 1, the line matching pattern 2, and every line between them.
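The pattern kinds listed above can be tried out together in one small program. This is only a sketch: the five input lines and the labels printed in front of each match are made up for illustration.

```shell
out=$(printf '%s\n' 'alpha 1' 'begin 2' 'beta 3' 'end 4' 'gamma 5' | awk '
BEGIN            { print "start" }          # once, before any input
$2 > 4           { print "expr: " $0 }      # expression pattern
/^al/            { print "regex: " $0 }     # regular-expression pattern
$2 > 1 && !/e/   { print "both: " $0 }      # compound pattern
/^begin/, /^end/ { print "range: " $0 }     # range pattern, inclusive
END              { print "lines: " NR }     # once, after all input
')
printf '%s\n' "$out"
```

Every rule is checked against every input line, in program order, so a single line (here the last one) can trigger more than one action.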
Action
The statements in actions can include:
- expressions, with constants, variables, assignments, function calls, etc.
- print expression-list
- printf(format, expression-list)
- if (expression) statement
- if (expression) statement else statement
- while (expression) statement
- for (expression ; expression ; expression) statement
- for (variable in array) statement
- do statement while (expression)
- break
- continue
- next
- exit
- exit expression
- { statements }
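As you can see, awk's basic syntax looks like a mix of C and shell. next skips the remaining rules and starts processing the next input record. Apart from exit, it is essentially the only keyword that controls awk's main input loop.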
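As a small illustration of next together with an if/else action, the sketch below skips comment lines and blank lines and sums the rest. The input lines and the variable names sum and n are invented for the example.

```shell
out=$(printf '%s\n' '# comment' '3' '' '4' | awk '
/^#/ || NF == 0 { next }        # skip this record; later rules never see it
{ sum += $1; n++ }              # only reached for the remaining records
END {
    if (n > 0)
        printf("n=%d sum=%d\n", n, sum)
    else
        print "no data"
}
')
printf '%s\n' "$out"
```

Because next abandons the current record immediately, the second rule does not need to repeat the filter condition.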
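Basic building blocks of awk programs
Variables
- awk variables have only two types: numbers and strings.
- awk converts variables between strings and numbers automatically as needed.
- An uninitialized awk variable has the numeric value 0 and the string value "" (the empty string).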
# 1. Test variable default init
END { printf("string = %s, number = %d\n", aaa, aaa) ;}
Output
string = , number = 0
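The automatic conversion between strings and numbers can be sketched as follows. The variable names s, n and u are arbitrary; u is deliberately never assigned.

```shell
out=$(awk 'BEGIN {
    s = "3.5 apples"        # a string with a numeric prefix
    n = 10
    print s + 1             # used as a number: the prefix 3.5 is taken, result 4.5
    print n " items"        # used as a string: the number becomes "10"
    a = (u == 0)            # an uninitialized variable compares equal to 0 ...
    b = (u == "")           # ... and also to the empty string
    print a, b
}')
printf '%s\n' "$out"
```

The context decides the conversion: arithmetic forces a number, concatenation forces a string.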
- Ordinary awk variables are global. A variable is local only when it is a function parameter.
# 2. Test use function parameter list to implement local variable
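function factorial( n ,_ARG_END_ , i , s ) { # _ARG_END_ marks that the names after it are not real arguments; they are listed only to serve as local variables.
# _ARG_END_ is not a macro defined by awk, just a convention that makes the code easier to read.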
s = 1;
for(i =1 ; i<= n ; i ++ ) {
s *= i;
}
return s;
}
END{
for(i = 0 ; i < 10 ; i ++ ){
printf("factorial( %d ) is %d \n" , i , factorial(i));
}
}
Output
factorial( 0 ) is 1
factorial( 1 ) is 1
factorial( 2 ) is 2
factorial( 3 ) is 6
factorial( 4 ) is 24
factorial( 5 ) is 120
factorial( 6 ) is 720
factorial( 7 ) is 5040
factorial( 8 ) is 40320
factorial( 9 ) is 362880
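If the function is declared as function factorial( n ) { instead, i and s become global and the loop inside the function overwrites the caller's i,
so the output becomes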
factorial( 0 ) is 1
factorial( 2 ) is 2
factorial( 4 ) is 24
factorial( 6 ) is 720
factorial( 8 ) is 40320
- With gawk, the --dump-variables option writes all global variables to a file (awkvars.out by default) when the script finishes. For example, the script above produces
ARGC: 2
ARGIND: 1
ARGV: array, 2 elements
BINMODE: 0
CONVFMT: "%.6g"
ENVIRON: array, 106 elements
ERRNO: ""
FIELDWIDTHS: ""
FILENAME: "source.txt"
FNR: 22
FPAT: "[^[:space:]]+"
FS: " "
IGNORECASE: 0
LINT: 0
NF: 1
NR: 22
OFMT: "%.6g"
OFS: " "
ORS: "\n"
PREC: 53
PROCINFO: array, 14 elements
RLENGTH: 0
ROUNDMODE: "N"
RS: "\n"
RSTART: 0
RT: "\n"
SUBSEP: "\034"
TEXTDOMAIN: "messages"
i: 10 <-- the only user-defined global variable
- Built-in variables
VARIABLE | MEANING | DEFAULT |
---|---|---|
ARGC | number of command-line arguments | |
ARGV | array of command-line arguments | |
FILENAME | name of current input file | |
FNR | record number in current file | |
FS | controls the input field separator | " " |
NF | number of fields in current record | |
NR | number of records read so far | |
OFMT | output format for numbers | "%.6g" |
OFS | output field separator | " " |
ORS | output record separator | "\n" |
RLENGTH | length of string matched by match function | |
RS | controls the input record separator | "\n" |
RSTART | start of string matched by match function | |
SUBSEP | subscript separator | "\034" |
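A few of these built-in variables in action, as a minimal sketch. The colon-separated input lines and the " -> " output separator are chosen only for the example.

```shell
out=$(printf '%s\n' 'root:x:0' 'daemon:x:1' | awk '
BEGIN { FS = ":"; OFS = " -> " }    # set the separators before any input is read
{ print NR, NF, $1, $NF }           # each comma in print is output as OFS
END { print "records: " NR }        # NR still holds the total in END
')
printf '%s\n' "$out"
```

Setting FS in BEGIN matters: if it were assigned inside an ordinary rule, the first record would already have been split with the default separator.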
Operators
- assignment operators = += -= *= /= %= ^=
- conditional expression operator ? :
- logical operators || (OR), && (AND), ! (NOT)
- matching operators ~ and !~
- relational operators < <= == != > >=
- concatenation (no explicit operator)
- strings are concatenated simply by writing them next to each other
- arithmetic operators + - * / % ^
- unary + and -
- increment and decrement operators ++ and -- (prefix and postfix)
- parentheses for grouping
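Several of the operators above, combined in one short sketch. All variable names (x, kind, msg, i, j, k) are placeholders made up for the example.

```shell
out=$(awk 'BEGIN {
    x = 7
    x ^= 2                                  # assignment operator: x becomes 49
    kind = (x % 2 == 0) ? "even" : "odd"    # conditional expression
    msg = "x=" x " is " kind                # concatenation: operands written side by side
    print msg
    if (msg ~ /odd$/ && msg !~ /even/)      # match and non-match operators
        print "matched"
    i = 5
    j = i++                                 # postfix: j gets the old value 5
    k = ++i                                 # prefix: i is incremented first, k gets 7
    print i, j, k
}')
printf '%s\n' "$out"
```

Since concatenation has no operator of its own, parentheses are a cheap way to keep it from mixing with arithmetic or comparisons in longer expressions.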
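Arrays
Arrays are the only data structure in awk.
- Array keys can be numbers or strings.
- Array values can be numbers or strings.
- A single array can mix numeric and string keys.
- An array value cannot itself be an array.
- Arrays support multi-part keys; the parts are really just joined with SUBSEP into a single string key.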
###########################################################################
# Test multikey array
END {
array[1,1,2,4] = 1; array[2] = 3 ;
for( a in array ) {
printf(" The array[%s] = %d \n", a ,array[a]);
o = split(a,x,SUBSEP);
printf("Multikeys %d :", o );
for(key in x){
printf(" %s ",x[key]);
}
printf("\n");
}
printf("length of array is %d \n",length(array)) ;
}
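- A for ... in loop iterates over the keys of an array
for( a in array) { print array[a]; }
- Test whether a key exists in an array
if( "test" in array ) { print array["test"] ; }
- The length function returns the number of elements in an array
- The delete keyword removes a whole array or a single element
# delete can only be applied to arrays; plain variables are not supported.
# delete only removes the contents; the name is still taken as an array and cannot be reused as a plain variable.
END {
delete array ; # delete the whole array
delete array[0] ; # delete a single element
}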
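The membership test and delete can be checked with a short sketch like the one below. The array name count and its keys are invented; the elements are counted with a loop rather than length so that the example does not depend on any particular awk implementation.

```shell
out=$(awk 'BEGIN {
    count["apple"] = 2; count["pear"] = 5; count[7] = "seven"
    if ("apple" in count) print "apple present"
    if (!("kiwi" in count)) print "kiwi absent"     # the in test does not create the key
    delete count["apple"]                           # remove a single element
    if (!("apple" in count)) print "apple deleted"
    n = 0
    for (k in count) n++                            # iteration order is unspecified
    print "remaining: " n
}')
printf '%s\n' "$out"
```

Testing with in is safer than comparing count["kiwi"] to an empty string, because merely referencing an element creates it.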
Functions
Built-in functions
- BUILT-IN ARITHMETIC FUNCTIONS
FUNCTION | VALUE RETURNED |
---|---|
atan2(y,x) | arctangent of y / x in the range -π to π |
cos(x) | cosine of x, with x in radians |
exp(x) | exponential function of x, e^x |
int(x) | integer part of x; truncated towards 0 |
log(x) | natural (base e) logarithm of x |
rand() | random number r, where 0 <= r < 1 |
sin(x) | sine of x, with x in radians |
sqrt(x) | square root of x |
srand(x) | x is new seed for rand() |
- BUILT-IN STRING FUNCTIONS
FUNCTION | DESCRIPTION |
---|---|
gsub(r, s) | substitute s for r globally in $0, return number of substitutions made |
gsub(r, s, t) | substitute s for r globally in string t, return number of substitutions made |
index(s, t) | return first position of string t in s, or 0 if t is not present |
length(s) | return number of characters in s |
match(s, r) | test whether s contains a substring matched by r, return index or 0; sets RSTART and RLENGTH |
split(s, a) | split s into array a on FS, return number of fields |
split(s, a, fs) | split s into array a on field separator fs, return number of fields |
sprintf(fmt, expr-list) | return expr-list formatted according to format string fmt |
sub(r, s) | substitute s for the leftmost longest substring of $0 matched by r, return number of substitutions made |
sub(r, s, t) | substitute s for the leftmost longest substring of t matched by r, return number of substitutions made |
substr(s, p) | return suffix of s starting at position p |
substr(s, p, n) | return substring of s of length n starting at position p |
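The string functions in the table can be exercised in one go. This is an illustrative sketch; the sample string and the names s, t, n, m and parts are made up.

```shell
out=$(awk 'BEGIN {
    s = "hello world, hello awk"
    print index(s, "world"), length(s)      # positions are counted from 1
    print substr(s, 7, 5)
    if (match(s, /awk/))
        print RSTART, RLENGTH               # set as a side effect of match
    t = s
    n = gsub(/hello/, "bye", t)             # replaces every match inside t
    print n, t
    sub(/bye/, "hi", t)                     # replaces only the first match
    print t
    m = split("a:b:c", parts, ":")
    print m, parts[1], parts[3]
    print sprintf("%05.1f|%-4s|", 3.14159, "ab")
}')
printf '%s\n' "$out"
```

Note that sub and gsub modify their target in place and return a count, so the original string is copied into t first to keep s intact.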
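User-defined functions
function name(parameter-list) {
statements
}
- Function parameters are local variables.
- A function returns its value with return.
- Functions are called the same way as in C.
The system function can run shell commands.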
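A user-defined function and system can be combined in a small sketch. The helper name run is hypothetical, and the two commands passed to it are chosen only because their exit status is predictable.

```shell
out=$(awk '
# run is a hypothetical helper; rc is listed as an extra parameter so that it is local
function run(cmd,    rc) {
    rc = system(cmd)        # hands cmd to the shell and returns its exit status
    return rc
}
BEGIN {
    print "ok:", run("true")
    print "failed:", run("exit 3")
}')
printf '%s\n' "$out"
```

system only returns the exit status; the output of the command goes straight to awk's standard output and is not captured in a variable.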
Output
- print is similar to echo in the shell
{
print $0 ;
}
- printf is similar to printf in C
printf("S3 = %s \n",S3);
- Use > and >> to redirect output to a file.
printf("S3 = %s \n",S3) > "outfile";
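- Use | to send output through a pipe.
# What follows | must be a string (a literal or a variable) naming the command that receives the data.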
print $0 | "sort"
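Redirection and pipes can be combined in one program, as in the sketch below. It assumes a mktemp command is available for creating a scratch file; the variable names tmp, f and file_contents are made up. Calling close on the pipe in END makes awk wait for sort to finish before printing the final line.

```shell
tmp=$(mktemp)
out=$(printf '%s\n' pear apple fig | awk -v f="$tmp" '
{ print $0 | "sort" }           # every record goes to the same sort process
{ print NR ": " $0 > f }        # > truncates f when it is first opened, later prints append
END { close("sort"); close(f); print "done" }
')
file_contents=$(cat "$tmp")
rm -f "$tmp"
printf '%s\n' "$out" "$file_contents"
```

The sorted lines appear only once the pipe is closed, because sort has to read all of its input before it can write anything.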