Classical 10 Examples for learning AWK

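This article is an in-depth guide to processing text files with AWK. Ten examples show how to format fields, select records, and test conditions, covering regular expressions, comparison operators, variables, built-in variables, comments, associative arrays, and data processing.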

Author: 柳大·Poechant (钟超)
Email: zhongchao.ustc#gmail.com (# -> @)
Blog: Blog.youkuaiyun.com/Poechant
Date: June 9th, 2012

Example 1: Formatting Fields Into Columns

The countries file:

Canada:3852:25:North America
USA:3615:237:North America
Brazil:3286:134:South America
England:94:56:Europe
France:211:55:Europe
Japan:144:120:Asia
Mexico:762:78:North America
China:3705:1032:Asia
India:1267:746:Asia

What do you want?

Print each line of the countries file in the specified column format.

Command:

awk -F: '{ printf "%-10s\t %d\t %d\t %15s\n",$1,$2,$3,$4 }' countries

Result:

Canada      3852    25        North America 
USA         3615    237       North America 
Brazil      3286    134       South America 
England     94      56               Europe
France      211     55               Europe 
Japan       144     120                Asia 
Mexico      762     78        North America 
China       3705    1032               Asia 
India       1267    746                Asia

Analysis:

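-F is followed by the field separator, which is a colon here.
A minus sign in a format specifier (as in %-10s) left-aligns the value within its width; without it the value is right-aligned (as in %15s).
%s formats a string, as in standard C printf.
%d formats an integer, as in standard C printf.
The AWK program is wrapped in single quotes so that the shell passes it to awk unchanged, and the action is wrapped in curly braces; later examples go into more detail.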
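
To make the alignment rules concrete, here is a minimal sketch that runs two sample records through printf. The square brackets are only there to make the padding visible; the widths (10, 6, 15) and the variable name out are arbitrary choices for illustration, not part of the original example.

```shell
# Two sample records in the same colon-separated layout as the countries file.
out=$(printf 'Canada:3852:25:North America\nJapan:144:120:Asia\n' |
  awk -F: '{ printf "[%-10s] [%6d] [%15s]\n", $1, $2, $4 }')

# %-10s pads on the right (left-aligned); %6d and %15s pad on the left (right-aligned).
printf '%s\n' "$out"
```

Removing the minus sign from %-10s would push the country names to the right edge of their column instead.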

Example 2: Selecting Records

The countries file: same as in Example 1.

What do you want?

Print every line that contains the keyword `Europe`.

Command:

 awk '/Europe/' countries

Result:

England:94:56:Europe
France:211:55:Europe

Analysis:

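/Europe/ selects the lines that contain Europe; because no action is given, awk prints each matching line.
A regular expression literal must be enclosed in slashes (//).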
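
A bare /regex/ pattern is tested against the whole record. As a small variation that is not in the original example, the ~ operator restricts the match to a single field. The sketch below uses inline sample data, and the variable name matches is chosen only for illustration.

```shell
# Print the records whose fourth field contains "America" (~ is the regex match operator).
matches=$(printf 'Canada:3852:25:North America\nFrance:211:55:Europe\nBrazil:3286:134:South America\n' |
  awk -F: '$4 ~ /America/')

printf '%s\n' "$matches"
```

Writing !~ instead of ~ would select the records whose fourth field does not match.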

Example 3: Comparators

The countries file: same as in Example 1.

What do you want?

Print every line whose third field equals 55.

Command:

 awk -F: '$3 == 55' countries

Result:

France:211:55:Europe

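Analysis:

$n refers to the n-th field (column) of the current record, so $3 is the third field.
== tests for equality. The other comparison operators are:
- != not equal to
- > greater than
- < less than
- >= greater than or equal to
- <= less than or equal to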
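
The operators listed above work on both numbers and strings. The following sketch, with inline sample data and illustrative variable names (big, not_europe), shows one numeric and one string comparison; it is a supplementary example rather than part of the original article.

```shell
# Numeric comparison: names of the records whose third field is greater than 100.
big=$(printf 'Canada:3852:25:North America\nUSA:3615:237:North America\nJapan:144:120:Asia\n' |
  awk -F: '$3 > 100 { print $1 }')
printf '%s\n' "$big"

# String comparison: a quoted constant is compared as text.
not_europe=$(printf 'France:211:55:Europe\nJapan:144:120:Asia\n' |
  awk -F: '$4 != "Europe" { print $1 }')
printf '%s\n' "$not_europe"
```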

Example 4: Using Logical Operators (and, or) to create multiple conditions

The cars file:

ford     mondeo  1990   5800
ford     fiesta  1991   4575
honda    accord  1991   6000
toyota   tercel  1992   6500
vaxhaull astra   1990   5950
vaxhaull carlton 1991   6450

Command:

awk '$3 >= 1991 && $4 < 6250' cars

Result:

ford     fiesta  1991   4575
honda    accord  1991   6000

Analysis:

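&& is logical AND: a record is selected only if both conditions are true.
|| is logical OR: a record is selected if at least one of the conditions is true.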
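
The command above only exercises &&. As a hedged sketch of the other operator, the snippet below uses || on a few rows in the same layout as the cars file, and then combines both operators with parentheses. The variable names (cars_data, either, grouped) are illustrative.

```shell
cars_data='ford     mondeo  1990   5800
ford     fiesta  1991   4575
honda    accord  1991   6000
toyota   tercel  1992   6500'

# Models that are either a honda OR were built in 1990.
either=$(printf '%s\n' "$cars_data" | awk '$1 == "honda" || $3 == 1990 { print $2 }')
printf '%s\n' "$either"

# Parentheses group conditions: 1991 cars that are a ford or cost at least 6000.
grouped=$(printf '%s\n' "$cars_data" |
  awk '$3 == 1991 && ($1 == "ford" || $4 >= 6000) { print $2 }')
printf '%s\n' "$grouped"
```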

Example 5: How to run an AWK program file?

Input file: cars, from the example above.

Program file: hello

#!/usr/bin/awk -f
{
    x = "hello"
    print x
}

Command:

awk -f hello cars

Result:

hello
hello
hello
hello
hello
hello

Analysis:

The program prints hello once for every record (line) of the input instead of printing the record itself.
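
For readers who want to try this without creating files by hand, here is a self-contained sketch of the same idea. It writes a one-rule program to a temporary file and runs it with -f; the temporary path from mktemp and the variable names (prog, greetings) are assumptions made for the illustration.

```shell
# Write a one-rule AWK program to a temporary file.
prog=$(mktemp)
cat > "$prog" <<'EOF'
# This rule runs once per input record and ignores the record's content.
{ print "hello" }
EOF

# -f tells awk to read the program from a file instead of the command line.
greetings=$(printf 'line one\nline two\nline three\n' | awk -f "$prog")
printf '%s\n' "$greetings"

rm -f "$prog"
```

Three input lines produce three lines of output, one per record.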

Reference:

http://www.delorie.com/gnu/docs/gawk/gawk_11.html

Example 6: Boost up your AWK program: Use Variables!

Input file: countries, from Example 1

Program file: hello

#!/usr/bin/awk -f
{
    x = $0     # store the current record in the variable x
    print x
}

Command:

awk -f hello countries

Result:

Canada:3852:25:North America
USA:3615:237:North America
Brazil:3286:134:South America
England:94:56:Europe
France:211:55:Europe
Japan:144:120:Asia
Mexico:762:78:North America
China:3705:1032:Asia
India:1267:746:Asia

Analysis:

$0 means the current record (the whole input line).
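
Variables can hold computed values as well as whole records. The sketch below is a supplementary illustration with inline sample data: name, total, and sums are hypothetical names, and adding the second and third fields has no meaning beyond showing arithmetic on variables.

```shell
# name and total are user-defined variables; awk creates them on first assignment.
sums=$(printf 'Canada:3852:25:North America\nFrance:211:55:Europe\n' |
  awk -F: '{ name = $1; total = $2 + $3; print name, total }')

printf '%s\n' "$sums"
```

No declaration is needed: a variable that has never been assigned behaves as an empty string or as 0, depending on how it is used.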

Example 7: Use internal variables!

Input file: countries, from Example 1

Program file: hello

#!/usr/bin/awk -f

{
    print FILENAME OFS \
        NR OFS \
        $1 OFS \
        $2 OFS \
        $3 OFS \
        $4 OFS \
        ORS 
}

Command:

awk -F: -f hello countries

Result:

countries 1 Canada 3852 25 North America

countries 2 USA 3615 237 North America 

countries 3 Brazil 3286 134 South America 

countries 4 England 94 56 Europe 

countries 5 France 211 55 Europe 

countries 6 Japan 144 120 Asia 

countries 7 Mexico 762 78 North America 

countries 8 China 3705 1032 Asia 

countries 9 India 1267 746 Asia

Analysis:

$0: the current record
FILENAME: the name of the current input file
NF: the number of fields in the current record
NR: the number of the current record (records read so far)
$n: the n-th field of the current record
FS: input field separator (default is a single space, which splits on runs of spaces and tabs)
OFS: output field separator (default is a single space)
RS: input record separator (default is a newline)
ORS: output record separator (default is a newline)
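
As a compact illustration of several of these variables working together, the following sketch sets FS and OFS inside a BEGIN block and prints NR, NF, the first field, and the last field ($NF). The " | " separator and the variable name report are arbitrary choices for this example.

```shell
# FS and OFS are set in BEGIN so that they apply from the first record onward.
report=$(printf 'Canada:3852:25:North America\nJapan:144:120:Asia\n' |
  awk 'BEGIN { FS = ":"; OFS = " | " } { print NR, NF, $1, $NF }')

printf '%s\n' "$report"
```

The commas in the print statement are what insert OFS between the values; items written side by side without commas are concatenated directly.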

Example 8: How to write comments in AWK program file?

Input file: countries

Program file: hello

{
    # Just test the AWK comment
    print $0
}

Command:

awk -F: -f hello countries

Result:

Canada:3852:25:North America
USA:3615:237:North America
Brazil:3286:134:South America
England:94:56:Europe
France:211:55:Europe
Japan:144:120:Asia
Mexico:762:78:North America
China:3705:1032:Asia
India:1267:746:Asia
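
A comment starts at # and runs to the end of the line, so it can also follow a statement. The sketch below is a supplementary example with one inline sample record; it additionally shows that a # inside a string is ordinary text. The variable name commented is illustrative.

```shell
commented=$(printf 'England:94:56:Europe\n' | awk -F: '
# A full-line comment: awk ignores it.
{
    print "#" NR, $1    # the # inside the string is literal text
}')

printf '%s\n' "$commented"
```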

Example 9: Associative Arrays? Yes!

Input file: countries

Program file: hello

#!/usr/bin/awk -f
{
    capitals["China"]="Beijing"
    print capitals["China"]
}

Command:

awk -F: -f hello countries

Result:

Beijing
Beijing
Beijing
Beijing
Beijing
Beijing
Beijing
Beijing
Beijing

Analysis:

How to traverse the associative array? Use a for-in loop; the order in which the keys are visited is unspecified.

  for (i in capitals)
      print i OFS capitals[i]
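
The traversal loop above becomes more useful when the keys come from the data. The following sketch counts records per continent, using the fourth field as the array key. The array name count and the variable name counts are illustrative, and the output is piped through sort only because for-in order is not guaranteed.

```shell
# Count records per continent; the array key is the text of the fourth field.
counts=$(printf 'Canada:3852:25:North America\nUSA:3615:237:North America\nFrance:211:55:Europe\nJapan:144:120:Asia\n' |
  awk -F: '{ count[$4]++ } END { for (c in count) print c ": " count[c] }' |
  LC_ALL=C sort)

printf '%s\n' "$counts"
```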

Example 10: Data Processing and Arithmetic

Input file: countries

Program file: hello

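#!/usr/bin/awk -f
# Demonstration awk program file

BEGIN {
    # Initialize the running totals and print the header once, before any input is read
    hours = 0
    gross = 0
    tax = 0

    print "NAME             RATE           HOURS        GROSS             TAX\n"
}

{
    printf "%-10s \t%8.2f \t%d \t%10.2f\t%10.2f \n", $1, $2, $3, $2*$3, $2*$3*0.25

    # Accumulate the totals for every record
    hours += $3
    gross += ($2 * $3)
    tax   += ($2 * $3) * 0.25
}

END {
    # Print the totals after the last record has been processed
    printf "\nTOTALS:\t\t\t\t%d \t%.2f\t%.2f \n", hours, gross, tax
}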

Command:

awk -F: -f hello countries

Result:

NAME            RATE    HOURS        GROSS         TAX

Canada       3852.00    25        96300.00    24075.00
USA          3615.00    237      856755.00   214188.75
Brazil       3286.00    134      440324.00   110081.00
England        94.00    56         5264.00     1316.00
France        211.00    55        11605.00     2901.25
Japan         144.00    120       17280.00     4320.00
Mexico        762.00    78        59436.00    14859.00
China        3705.00    1032    3823560.00   955890.00
India        1267.00    746      945182.00   236295.50

TOTALS:                 2483    6255706.00  1563926.50

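Analysis:

The BEGIN block runs once before the first record is read, so it is the place for initialization and headers.
The block without a pattern runs once for every record.
The END block runs once after the last record, so it is the place to print totals. Running totals have to be updated in the per-record block, because END is executed only once.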
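
The BEGIN, per-record, and END structure of this program can be reduced to a few lines. The sketch below sums the third field of three sample records and reports the count and average in END; the names sum and summary and the output format are assumptions made for the illustration.

```shell
summary=$(printf 'England:94:56:Europe\nFrance:211:55:Europe\nJapan:144:120:Asia\n' |
  awk -F: '
BEGIN { print "start" }             # runs before the first record is read
      { sum += $3 }                 # runs once per record
END   { printf "records=%d sum=%d avg=%.2f\n", NR, sum, sum / NR }')

printf '%s\n' "$summary"
```

Inside END, NR still holds the number of the last record read, which is why it can serve as the record count.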