Excerpted from Robbins A., Beebe N. - Classic Shell Scripting - 2005
Chapter 5.
Problem:
Given a text file and an integer n, you are to print the words (and their frequencies of occurrence) whose frequencies of occurrence are among the n largest, in order of decreasing frequency. (That is: find the n most frequently occurring words in a document and display their occurrence counts.)
McIlroy’s program illustrates the power of the Unix tools approach: break a complex problem into simpler parts that you already know how to handle. To solve the
word-frequency problem, McIlroy converted the text file to a list of words, one per
line (tr does the job), mapped words to a single lettercase (tr again), sorted the list
(sort), reduced it to a list of unique words with counts (uniq), sorted that list by
descending counts (sort), and finally, printed the first several entries in the list (sed,
though head would work too).
Example 5-5. Word-frequency filter
#! /bin/sh
# Read a text stream on standard input, and output a list of
# the n (default: 25) most frequently occurring words and
# their frequency counts, in order of descending counts, on
# standard output.
#
# Usage:
# wf [n]
tr -cs A-Za-z\' '\n' |    # Replace nonletters with newlines
tr A-Z a-z |              # Map uppercase to lowercase
sort |                    # Sort the words in ascending order
uniq -c |                 # Eliminate duplicates, showing their counts
sort -k1,1nr -k2 |        # Sort by descending count, and then by ascending word
sed ${1:-25}q             # Print only the first n (default: 25) lines; see Chapter 3
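As a quick sanity check, the same pipeline can be run inline on a small sample instead of a file. This sketch reproduces the steps of Example 5-5 with the count fixed at 3 (the sample sentence here is invented for illustration):

```shell
#!/bin/sh
# Inline run of the wf pipeline from Example 5-5 on a sample sentence.
printf 'The dog chased the cat. The cat ran. The dog barked.\n' |
tr -cs A-Za-z\' '\n' |    # one word per line
tr A-Z a-z |              # fold to lowercase
sort |                    # group identical words together
uniq -c |                 # count each distinct word
sort -k1,1nr -k2 |        # most frequent first; ties in alphabetical order
sed 3q                    # keep only the top 3
```

For this input the top three lines show "the" with count 4, then "cat" and "dog" tied at 2, with the tie broken alphabetically by the secondary `-k2` sort key.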