awk Programming

The basic awk rule

pattern {action}
pattern {action}
...

That is, when an input line matches a pattern, the corresponding action is executed.

Patterns

  1. BEGIN { statements } The statements are executed once before any input has been read.
    • Runs before every other rule.
  2. END { statements } The statements are executed once after all input has been read.
    • Runs after every other rule.
  3. expression { statements } The statements are executed at each input line where the expression is true, that is, nonzero or nonnull.
    • The action runs whenever the expression is nonzero or a non-empty string.
  4. /regular expression/ { statements } The statements are executed at each input line that contains a string matched by the regular expression.
    • The action runs when the input line matches the regular expression.
  5. compound pattern { statements } A compound pattern combines expressions with && (AND), || (OR), ! (NOT), and parentheses; the statements are executed at each input line where the compound pattern is true.
    • Several patterns are combined with &&, || and !.
  6. pattern1, pattern2 { statements } A range pattern matches each input line from a line matched by pattern1 to the next line matched by pattern2, inclusive; the statements are executed at each matching line.
    • [Important] Range match: the action runs on the line that matches pattern1, the line that matches pattern2, and every line between them.
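
The six pattern forms above can be tried together in one small script. This is a minimal sketch, assuming a POSIX awk on the PATH; the sample data and the shell function names (sample_scores, pattern_demo) are made up for illustration.

```shell
# Hypothetical sample data: a name and a score on each line.
sample_scores() {
    printf '%s\n' 'alice 90' 'bob 55' 'carol 72' 'dave 48' 'erin 81'
}

pattern_demo() {
    sample_scores | awk '
        BEGIN              { print "start" }            # before any input
        $2 >= 80           { print "high:", $1 }         # expression pattern
        /^c/               { print "regex:", $1 }        # regular expression pattern
        $2 < 60 && !/^d/   { print "low, not d:", $1 }   # compound pattern
        /^bob/, /^dave/    { print "range:", $1 }        # range pattern, inclusive
        END                { print "lines:", NR }        # after all input
    '
}

pattern_demo
```

Every rule is tested against every input line, in the order the rules are written, so one line can trigger several actions (bob matches both the compound pattern and the range).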

Action

The statements in actions can include:

  • expressions, with constants, variables, assignments, function calls, etc.
  • print expression-list
  • printf(format, expression-list)
  • if (expression) statement
  • if (expression) statement else statement
  • while (expression) statement
  • for (expression ; expression ; expression) statement
  • for (variable in array) statement
  • do statement while (expression)
  • break
  • continue
  • next
  • exit
  • exit expression
  • { statements }
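As the list shows, awk's basic syntax resembles a mix of C and shell. next stops processing the current input line and starts the next round of input from the first rule. Together with exit, it is the keyword that controls awk's main input loop.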
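
To make the effect of next (and exit) on the input loop concrete, here is a small sketch, assuming a POSIX awk; the input lines and the function name next_demo are invented for the example.

```shell
next_demo() {
    printf '%s\n' '# comment' 'one' '' 'two' 'STOP' 'three' | awk '
        /^#/ || NF == 0 { next }     # skip comments and blank lines
        $0 == "STOP"    { exit }     # stop reading input; END still runs
                        { count++; print count ": " $0 }
        END             { print "kept " count " lines" }
    '
}

next_demo
```

Lines skipped by next never reach the third rule, and nothing after STOP is read at all.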

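Basic building blocks of awk programs

Variables

  • awk has only two types of variable values: numbers and strings.
  • awk converts a variable between string and number automatically, as needed.
  • An uninitialized awk variable has the numeric value 0 and the string value "" (the empty string).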
# 1. Test variable default init
END { printf("string = %s, number = %d\n", aaa, aaa) ;}

Output

string = , number = 0
  • Ordinary awk variables are global. A variable is local only when it is a function parameter.
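# 2. Test: use the function parameter list to implement local variables
function  factorial( n ,_ARG_END_ , i , s ) { # Parameters after _ARG_END_ are never passed by callers;
                                              # they exist only to act as local variables. _ARG_END_ is not
                                              # an awk keyword or macro, just a convention for readability.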
    s = 1;
    for(i =1 ; i<= n ; i ++ ) {
        s *= i;
    }
    return s;
}


END{
    for(i = 0 ; i < 10 ; i ++ ){
        printf("factorial( %d ) is %d \n" , i  , factorial(i));
    }
}

Output with this definition

factorial( 0 ) is 1 
factorial( 1 ) is 1 
factorial( 2 ) is 2 
factorial( 3 ) is 6 
factorial( 4 ) is 24 
factorial( 5 ) is 120 
factorial( 6 ) is 720 
factorial( 7 ) is 5040 
factorial( 8 ) is 40320 
factorial( 9 ) is 362880 

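If the function is declared as function factorial( n ) { instead, i and s become global, so the loop inside factorial also advances the loop counter i used in END, and the output becomes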

factorial( 0 ) is 1 
factorial( 2 ) is 2 
factorial( 4 ) is 24 
factorial( 6 ) is 720 
factorial( 8 ) is 40320 
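  • With gawk, the --dump-variables option (short form -d) writes all global variables to a file when the script finishes; the default file is awkvars.out. Written with a single dash, -dump-variables is parsed as -d with the file name ump-variables, so the dump ends up in a file called ump-variables. For the script above the dump contains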
ARGC: 2
ARGIND: 1
ARGV: array, 2 elements
BINMODE: 0
CONVFMT: "%.6g"
ENVIRON: array, 106 elements
ERRNO: ""
FIELDWIDTHS: ""
FILENAME: "source.txt"
FNR: 22
FPAT: "[^[:space:]]+"
FS: " "
IGNORECASE: 0
LINT: 0
NF: 1
NR: 22
OFMT: "%.6g"
OFS: " "
ORS: "\n"
PREC: 53
PROCINFO: array, 14 elements
RLENGTH: 0
ROUNDMODE: "N"
RS: "\n"
RSTART: 0
RT: "\n"
SUBSEP: "\034"
TEXTDOMAIN: "messages"
i: 10   <-- the only user-defined global variable
  • Built-in variables
VARIABLE   MEANING                                      DEFAULT
ARGC       number of command-line arguments             -
ARGV       array of command-line arguments              -
FILENAME   name of current input file                   -
FNR        record number in current file                -
FS         controls the input field separator           " "
NF         number of fields in current record           -
NR         number of records read so far                -
OFMT       output format for numbers                    "%.6g"
OFS        output field separator                       " "
ORS        output record separator                      "\n"
RLENGTH    length of string matched by match function   -
RS         controls the input record separator          "\n"
RSTART     start of string matched by match function    -
SUBSEP     subscript separator                          "\034"

Operators

  • assignment operators = += -= *= /= %= ^=
  • conditional expression operator ? :
  • logical operators || (OR), && (AND), ! (NOT)
  • matching operators ~ and !~
  • relational operators < <= == != > >=
  • concatenation (no explicit operator)
    • Strings are joined simply by writing them next to each other.
  • arithmetic operators + - * / % ^
  • unary + and -
  • increment and decrement operators ++ and -- (prefix and postfix)
  • parentheses for grouping
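
A few of these operators are easy to misread on paper, in particular implicit concatenation, the matching operators and the automatic string/number conversion. The following is a minimal sketch, assuming a POSIX awk; the variable names and the function name operator_demo are illustrative only.

```shell
operator_demo() {
    awk 'BEGIN {
        s = "3" + 4                          # the string "3" is converted to a number
        t = 3 " apples"                      # the number 3 is converted to a string
        kind = (s % 2 == 0) ? "even" : "odd"
        m = ("awk-1988" ~ /[0-9]+$/) ? "match" : "no match"
        n = ("awk" !~ /^[0-9]/) ? "no digit first" : "digit first"
        p = 2 ^ 10                           # exponentiation
        print s; print t; print kind; print m; print n; print p
    }'
}

operator_demo
```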

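Arrays

Arrays are the only data structure in awk.

  • Array keys can be numbers or strings.
  • Array values can be numbers or strings.
  • A single array can mix numeric and string keys.
  • An array value cannot itself be an array (this holds for standard awk; gawk 4.0 and later adds arrays of arrays as an extension).
  • Arrays support multiple subscripts; in fact the subscripts are joined with SUBSEP into a single string key.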
###########################################################################
# Test multikey array

END {
    array[1,1,2,4] = 1; array[2] = 3 ;
    for( a  in array ) {
        printf(" The array[%s] = %d \n", a ,array[a]); 
        o = split(a,x,SUBSEP);
        printf("Multikeys %d :", o );
        for(key in x){
            printf(" %s ",x[key]);
        }
        printf("\n");
    }
printf("length of array is %d \n",length(array)) ;
}

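Findings from the array experiments

  • A for-in loop iterates over the keys of an array
for (a in array) { print array[a]; }
  • Testing whether a key exists in an array
if ("test" in array) { print array["test"]; }
  • The length function returns the number of elements in an array

  • The delete keyword removes a whole array or a single element

# delete must be applied to an array or an array element; it does not work on scalar variables.
# delete only removes the contents; the name remains an array and cannot be reused as a scalar.
END {
    delete array ;     # delete the whole array
    delete array[0] ;  # delete a single element
}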
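
The points about SUBSEP keys, the in test and delete can be checked without relying on the (unspecified) order of a for-in loop. This is a sketch under the assumption of a POSIX awk; the array name grid and the function name array_demo are made up for the example.

```shell
array_demo() {
    awk 'BEGIN {
        grid[1, 2] = "a"                    # stored under the key 1 SUBSEP 2
        key = 1 SUBSEP 2
        if (key in grid) print "joined key found"; else print "joined key missing"
        if ((1, 2) in grid) print "tuple found"; else print "tuple missing"

        n = split(key, parts, SUBSEP)       # recover the individual subscripts
        print n, parts[1], parts[2]

        delete grid[1, 2]                   # remove one element
        if ((1, 2) in grid) print "still there"; else print "deleted"

        x = grid[9]                         # merely referencing an element creates it
        if (9 in grid) print "9 was created by the reference"; else print "9 absent"
    }'
}

array_demo
```

The last two lines show why the in test is the safe way to check for a key: comparing array[key] against "" creates the element as a side effect.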

Functions

Built-in functions

  • BUILT-IN ARITHMETIC FUNCTIONS
FUNCTION      VALUE RETURNED
atan2(y,x)    arctangent of y/x in the range -π to π
cos(x)        cosine of x, with x in radians
exp(x)        exponential function of x, e^x
int(x)        integer part of x; truncated towards 0
log(x)        natural (base e) logarithm of x
rand()        random number r, where 0 <= r < 1
sin(x)        sine of x, with x in radians
sqrt(x)       square root of x
srand(x)      x is new seed for rand()
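
A quick way to confirm the behaviour listed in the table, in particular that int truncates towards zero and that atan2(0, -1) yields π, is the sketch below. It assumes a POSIX awk; the seed 42 and the function name arith_demo are arbitrary choices for the example.

```shell
arith_demo() {
    awk 'BEGIN {
        print int(3.9), int(-3.9)           # truncation towards zero
        print sqrt(16), exp(0), log(1)
        printf "%.4f\n", atan2(0, -1)       # pi, rounded to four decimals
        srand(42); r = rand()               # seed, then draw one number
        if (r >= 0 && r < 1) print "rand in [0,1)"; else print "rand out of range"
    }'
}

arith_demo
```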
  • TABLE 2-7. BUILT-IN STRING FUNCTIONS
FUNCTION                    DESCRIPTION
gsub(r, s)                  substitute s for r globally in $0, return number of substitutions made
gsub(r, s, t)               substitute s for r globally in string t, return number of substitutions made
index(s, t)                 return first position of string t in s, or 0 if t is not present
length(s)                   return number of characters in s
match(s, r)                 test whether s contains a substring matched by r, return index or 0; sets RSTART and RLENGTH
split(s, a)                 split s into array a on FS, return number of fields
split(s, a, fs)             split s into array a on field separator fs, return number of fields
sprintf(fmt, expr-list)     return expr-list formatted according to format string fmt
sub(r, s)                   substitute s for the leftmost longest substring of $0 matched by r, return number of substitutions made
sub(r, s, t)                substitute s for the leftmost longest substring of t matched by r, return number of substitutions made
substr(s, p)                return suffix of s starting at position p
substr(s, p, n)             return substring of s of length n starting at position p
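
The string functions in the table are easiest to understand on a concrete string. The following sketch assumes a POSIX awk; the test string and the function name string_demo are invented for illustration. Note that positions are 1-based.

```shell
string_demo() {
    awk 'BEGIN {
        s = "hello world"
        print length(s)                     # 11
        print index(s, "world")             # 7
        print substr(s, 7)                  # world
        print substr(s, 1, 5)               # hello

        if (match(s, /o+/)) print RSTART, RLENGTH    # 5 1

        t = s
        n = gsub(/o/, "0", t)               # replace every o in t
        print n, t                          # 2 hell0 w0rld

        u = s
        sub(/l+/, "L", u)                   # leftmost longest match only
        print u                             # heLo world

        k = split("a:b:c", parts, ":")
        print k, parts[1], parts[3]         # 3 a c
        print sprintf("%05.1f", 3.14159)    # 003.1
    }'
}

string_demo
```

sub and gsub modify their target in place and return a count, which is why the example copies s into t and u first.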

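User-defined functions

function name(parameter-list) {
    statements
}
  • Function parameters are local variables.
  • A function returns its value with return.
  • Functions are called the same way as in C.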
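
The following sketch illustrates the three points above, and adds one detail of standard awk that matters in practice: scalars are passed by value while arrays are passed by reference. It assumes a POSIX awk; the names bump, seen, tmp and function_demo are hypothetical.

```shell
function_demo() {
    awk '
        # The extra parameter tmp is never passed by callers; it acts as a local variable.
        function bump(n, arr,    tmp) {
            tmp = n + 1          # local: does not leak to the caller
            n = tmp              # scalars are passed by value
            arr["hits"]++        # arrays are passed by reference
            return n
        }
        BEGIN {
            seen["hits"] = 0
            x = 1
            y = bump(x, seen)
            print x, y, seen["hits"], (tmp == "" ? "tmp unset" : "tmp leaked")
        }
    '
}

function_demo
```

The caller's x is unchanged, the array seen is modified, and tmp stays uninitialized outside the function.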

The system function runs a shell command and returns its exit status.
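
A minimal sketch of system, assuming a POSIX awk and a POSIX shell behind it; the commands exit 3 and test -d / are chosen only because their results are predictable, and system_demo is an illustrative name.

```shell
system_demo() {
    awk 'BEGIN {
        status = system("exit 3")           # run a shell command, get its exit status
        print "exit status:", status
        ok = system("test -d /")            # 0 means the command succeeded
        print "root exists:", (ok == 0 ? "yes" : "no")
    }'
}

system_demo
```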

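Output

  • print is similar to the shell's echo
{
print $0 ;
}
  • printf is similar to C's printf
printf("S3 = %s \n",S3);
  • Use > and >> to redirect output to a file.
printf("S3 = %s \n",S3) > "outfile";
  • Use | to send output through a pipe
# The right-hand side of | must be a string naming the command that receives the data.
print $0 | "sort"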
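
Redirection and pipes can be combined in one script. The sketch below assumes a POSIX awk plus the sort and mktemp utilities; the sample words, the variable out and the function name output_demo are made up. It also uses close, which is not covered above, to make sure the pipe is flushed before END prints.

```shell
output_demo() {
    tmp=$(mktemp)
    printf '%s\n' 'pear' 'apple' 'fig' | awk -v out="$tmp" '
        { print $0 | "sort" }               # every line goes to the same sort process
        { print length($0), $0 > out }      # > truncates once, later prints append
        END {
            close("sort")                   # wait for sort to finish and print
            close(out)
            print "done"
        }
    '
    cat "$tmp"
    rm -f "$tmp"
}

output_demo
```

Within one awk run, > empties the file only when it is first opened; >> differs only in that it keeps whatever the file contained before the script started.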