192到195题

Bash脚本实战技巧
本文介绍了使用Bash脚本来解决常见文本处理任务的方法,包括单词频率统计、验证电话号码格式、文件转置及特定行的提取等。通过具体实例展示了如何利用Bash内置命令高效完成工作。

192. Word Frequency

Write a bash script to calculate the frequency of each word in a text file words.txt.

For simplicity sake, you may assume:

  • words.txt contains only lowercase characters and space ' ' characters.
  • Each word must consist of lowercase characters only.
  • Words are separated by one or more whitespace characters.

For example, assume that words.txt has the following content:

the day is sunny the the
the sunny is is
Your script should output the following, sorted by descending frequency:
the 4
is 3
sunny 2
day 1

Note:
Don't worry about handling ties, it is guaranteed that each word's frequency count is unique.


统计各个单词的出现的次数。先用tr把单词都分成单独的行。然后用gawk统计每个单词的个数,最后用sort根据第二个值(也就是个数)从大到小排序:

cat $1 | tr -s ' ' '\n' | gawk '{count[$1]++}END{for(word in count) print word,count[word]}' | sort -rn -k2



193. Valid Phone Numbers

Given a text file file.txt that contains list of phone numbers (one per line), write a one liner bash script to print all valid phone numbers.

You may assume that a valid phone number must appear in one of the following two formats: (xxx) xxx-xxxx or xxx-xxx-xxxx. (x means a digit)

You may also assume each line in the text file must not contain leading or trailing white spaces.

For example, assume that file.txt has the following content:

987-123-4567
123 456 7890
(123) 456-7890
Your script should output the following valid phone numbers:
987-123-4567
(123) 456-7890

Subscribe to see which companies asked this question.


判断给出的号码是否是合法的号码。用gawk配合正则表达式来匹配:

cat file.txt | gawk --re-interval '/^(\([0-9]{3}\)[ ]|[0-9]{3}-)[0-9]{3}-[0-9]{4}$/{print $0}'


194. Transpose File

Given a text file file.txt, transpose its content.

You may assume that each row has the same number of columns and each field is separated by the ' ' character.

For example, if file.txt has the following content:

name age
alice 21
ryan 30

Output the following:

name alice ryan
age 21 30

Subscribe to see which companies asked this question.


相当于矩阵的转置,用line数组记录下每一列,然后输出每一列即可。这里要注意的是每一行最后不能有空格:

gawk '{
		for(i=1; i<=NF; ++i)
		{
			if(line[i] == "")
			{
				line[i]=$i
			}
			else
			{
				line[i]=line[i]" "$i;
			}
		}
	}
	END{
		for(i=1; i<=NF; ++i)
		{
			print line[i]
		}
	}
	' file.txt


195. Tenth Line

How would you print just the 10th line of a file?

For example, assume that file.txt has the following content:

Line 1
Line 2
Line 3
Line 4
Line 5
Line 6
Line 7
Line 8
Line 9
Line 10
Your script should output the tenth line, which is:
Line 10

[show hint]

Subscribe to see which companies asked this question.


输出文件中的第十行,用sed可以很方便的实现:

sed -n '10p' file.txt





评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值