simple awk tutorial

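This article introduces the basics of AWK, including processing text files, arithmetic, and the use of variables, and shows through examples how to combine it with other tools for more complex data processing.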


Reposted from: http://www.hcs.harvard.edu/~dholland/computers/awk.html

 

 


why awk?

awk is small, fast, and simple, unlike, say, perl. awk also has a clean comprehensible C-like input language, unlike, say, perl. And while it can't do everything you can do in perl, it can do most things that are actually text processing, and it's much easier to work with.

what do you do?

In its simplest usage awk is meant for processing column-oriented text data, such as tables, presented to it on standard input. The variables $1, $2, and so forth are the contents of the first, second, etc. column of the current input line. For example, to print the second column of a file, you might use the following simple awk script:

        awk < file '{ print $2 }'

This means "on every line, print the second field".

To print the second and third columns, you might use

        awk < file '{ print $2, $3 }'
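
As a quick check, here is a small sketch that feeds awk some made-up sample data on standard input through printf instead of reading a file. The names and values are invented for illustration only.

```shell
# Hypothetical sample input: three whitespace-separated columns per line.
out=$(printf 'alice 30 boston\nbob 25 denver\n' | awk '{ print $2, $3 }')
printf '%s\n' "$out"
# prints:
# 30 boston
# 25 denver
```

The comma in the print statement is what puts a space between the two fields in the output.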

Input separator

By default awk splits input lines into fields based on whitespace, that is, spaces and tabs. You can change this by using the -F option to awk and supplying another character. For instance, to print the home directories of all users on the system, you might do

        awk < /etc/passwd -F: '{ print $6 }'

since the password file has fields delimited by colons and the home directory is the 6th field.
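
To try this without depending on the contents of a real password file, the sketch below uses two invented lines in the same colon-separated layout and prints the login name next to the home directory.

```shell
# Two made-up lines in /etc/passwd format (seven colon-separated fields).
out=$(printf 'root:x:0:0:root:/root:/bin/sh\nalice:x:1000:1000:Alice:/home/alice:/bin/sh\n' |
    awk -F: '{ print $1, $6 }')
printf '%s\n' "$out"
# prints:
# root /root
# alice /home/alice
```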

Arithmetic

Awk is a weakly typed language; variables can be either strings or numbers, depending on how they're referenced. All numbers are floating-point. So to implement the fahrenheit-to-celsius calculator, you might write

        awk '{ print ($1-32)*(5/9) }'

which will convert fahrenheit temperatures provided on standard input to celsius until it gets an end-of-file.
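
For instance, feeding it three temperatures, one per line (the input values here are just sample data):

```shell
# 212 F is the boiling point of water, 32 F the freezing point,
# and 98.6 F is the usual figure for body temperature.
out=$(printf '212\n32\n98.6\n' | awk '{ print ($1-32)*(5/9) }')
printf '%s\n' "$out"
# prints:
# 100
# 0
# 37
```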

The selection of operators is basically the same as in C, although some of C's wilder constructs do not work. String concatenation is accomplished simply by writing two string expressions next to each other. '+' is always addition. Thus

        echo 5 4 | awk '{ print $1 + $2 }'

prints 9, while

        echo 5 4 | awk '{ print $1 $2 }'

prints 54. Note that 

        echo 5 4 | awk '{ print $1, $2 }'

prints "5 4".
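
Because a value can be used as either a string or a number, the result of a concatenation can go straight back into arithmetic. A small sketch (the variable name s is only for illustration):

```shell
# s becomes the string "54"; adding 1 then treats it as the number 54.
out=$(echo 5 4 | awk '{ s = $1 $2; print s + 1 }')
printf '%s\n' "$out"
# prints 55
```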

Variables

awk has some built-in variables that are automatically set; $1 and so on are examples of these. The other built-in variables most useful for beginners are NF, which holds the number of fields in the current input line (so $NF gives the last field), and $0, which holds the entire current input line.
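
A quick sketch of these three on a single sample line:

```shell
# NF is the field count, $NF the last field, $0 the whole line.
out=$(echo 'one two three' | awk '{ print NF; print $NF; print $0 }')
printf '%s\n' "$out"
# prints:
# 3
# three
# one two three
```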

You can make your own variables, with whatever names you like (except for reserved words in the awk language) just by using them. You do not have to declare variables. Variables that haven't been explicitly set to anything have the value "" as strings and 0 as numbers.

For example, the following code prints the average of all the numbers on each line:

        awk '{ tot=0; for (i=1; i<=NF; i++) tot += $i; print tot/NF; }'

Note the use of $i to retrieve the i'th field, and the for loop, which works like in C. The reason tot is explicitly initialized at the beginning is that this code is run for every input line; without the reset, tot would still hold the total from the first line when work starts on the second.
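
Run on two sample lines of numbers, the one-liner gives one average per line:

```shell
# Line 1 averages (1+2+3)/3, line 2 averages (10+20)/2.
out=$(printf '1 2 3\n10 20\n' |
    awk '{ tot=0; for (i=1; i<=NF; i++) tot += $i; print tot/NF; }')
printf '%s\n' "$out"
# prints:
# 2
# 15
```

A blank input line has NF equal to 0, so this one-liner would try to divide by zero there. The sample input above deliberately avoids that case.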

Blocks

Averaging each line separately might seem silly. More likely you have only one set of numbers to add up, so why not put each number on its own line? To do that, you need to be able to print the result once all the input has been read. You do it like this:

        awk '{ tot += $1; n += 1; }  END { print tot/n; }'

Note the use of two different block statements. The second one has END in front of it; this means to run the block once after all input has been processed. In fact, in general, you can put all kinds of things in front of a block, and the block will only run if they're satisfied. That is, you can say

        awk ' $1==0 { print $2 }'

which will print the second column for lines of input where the first column is 0. You can also supply regular expressions to match the whole line against:

        awk ' /^test/ { print $2 }'

If you put no expression, the block is run on every line of input. If multiple blocks have conditions that are true, they are all run. There is no particularly clean way I know of to get it to run exactly one of a bunch of possible blocks of code.
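
A short sketch of both behaviors, using an invented input line. In the first command both conditions hold for "test 0", so both blocks run. The second command shows one possible workaround for the exactly-one case: the standard next statement stops processing the current line and moves on to the following one, so ending each block with it means at most one block runs per line. Whether that counts as clean is a matter of taste.

```shell
# Both conditions are true for this line, so both blocks print.
all=$(printf 'test 0\n' |
    awk '/^test/ { print "starts with test" } $2==0 { print "second is zero" }')
printf '%s\n' "$all"
# prints:
# starts with test
# second is zero

# With next at the end of each block, only the first matching block runs.
first=$(printf 'test 0\n' |
    awk '/^test/ { print "starts with test"; next } $2==0 { print "second is zero"; next }')
printf '%s\n' "$first"
# prints:
# starts with test
```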

The block conditions BEGIN and END are special and are run before processing any input, and after processing all input, respectively.
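
Putting the three kinds of block together, here is a sketch that prints a heading before any input is read, accumulates a total on every line, and prints the average at the end. The heading text and the input values are made up.

```shell
# BEGIN runs once up front, the bare block runs per line, END runs once at the end.
out=$(printf '4\n8\n' |
    awk 'BEGIN { print "averaging" } { tot += $1; n += 1 } END { print tot/n }')
printf '%s\n' "$out"
# prints:
# averaging
# 6
```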

Other language constructs

As hinted at above, awk supports loop and conditional statements like in C, that is, for, while, do/while, if, and if/else.
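
As a small illustration of if/else and while together, the sketch below counts down from each positive number and reports anything else. The program is spread over several lines for readability; the variable names s and i and the output wording are arbitrary.

```shell
out=$(printf '3\n-2\n0\n' | awk '{
    if ($1 > 0) {
        # build the string "N ... 2 1 " with a while loop
        s = ""
        i = $1
        while (i > 0) { s = s i " "; i-- }
        print s "liftoff"
    } else {
        print $1, "is not positive"
    }
}')
printf '%s\n' "$out"
# prints:
# 3 2 1 liftoff
# -2 is not positive
# 0 is not positive
```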

printf

awk includes a printf statement that works essentially like C printf. This can be used when you want to format output neatly or combine things onto one line in more complex ways (print implicitly adds a newline; printf doesn't.)

Here's how you strip the first column off:

        awk '{ for (i=2; i<=NF; i++) printf "%s ", $i; printf "\n"; }'

Note the use of NF to iterate over all the fields and the use of printf to place newlines explicitly.
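
For the neat-formatting case, a sketch with invented data: the first field is left-justified in 8 columns and the second is printed as a number with two decimal places in a field 6 wide.

```shell
# %-8s pads the name on the right; %6.2f right-aligns the number.
out=$(printf 'apple 1.5\nkiwi 12\n' | awk '{ printf "%-8s|%6.2f\n", $1, $2 }')
printf '%s\n' "$out"
# prints:
# apple   |  1.50
# kiwi    | 12.00
```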

Combining awk with other tools

The strength of shell scripting is the ability to combine lots of tools together. Some things are most readily done with successive awk processes pipelined together. awk is also often combined with the sed utility, which does regular expression matching and substitution. You can actually do most common sed operations in awk, but it's usually more convenient to use sed.

In combination with sed, sort, and some other useful shell utilities like paste, awk is quite satisfactory for a lot of numerical data processing, as well as the maintenance of simple databases kept in column-tabular formats.
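
As one possible illustration of such a pipeline, the sketch below keeps a tiny made-up "database" of name:score records and lists the names in order of increasing score. The data and its layout are assumptions for the example, not anything prescribed by awk.

```shell
# Stages, in order:
#   sed   strips comments (everything from # to end of line)
#   awk   keeps only lines with exactly two colon-separated fields, score first
#   sort  orders the lines numerically by score
#   awk   keeps only the name
out=$(printf '# name:score\nalice:30\nbob:25\n\ncarol:35\n' |
    sed 's/#.*//' |
    awk -F: 'NF == 2 { print $2, $1 }' |
    sort -n |
    awk '{ print $2 }')
printf '%s\n' "$out"
# prints:
# bob
# alice
# carol
```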

awk is also extremely useful in a certain style of makefile writing for generating pieces of makefile to include.

Do not script in csh. Use sh, or if you must, ksh.

More stuff

This introduction intentionally covers only the absolute basics of awk. awk can do lots of other useful and powerful things. The manual page for GNU awk (gawk) is a reasonably good reference for looking things up once you know the basics.

The only thing seriously lacking in awk that I've yet run into is that there's no way to tell awk not to do buffering on its input and output, which makes it useless for assorted interactive or network-oriented applications it would otherwise be ideally suited for.
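
On the output side, many current implementations (gawk, mawk, and the BWK awk among them) provide an fflush() function that may help. This goes beyond what the paragraph above describes, and it is worth checking against the awk actually installed. A minimal sketch that flushes standard output after every line:

```shell
# fflush() with no arguments is assumed here to flush standard output;
# check your awk's manual page to confirm it is supported.
out=$(printf 'ping\npong\n' | awk '{ print "got", $1; fflush() }')
printf '%s\n' "$out"
# prints:
# got ping
# got pong
```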

 
