A few shell script interview questions (Part 1)

License: CC 4.0 BY-SA

Tags: #shell #interview-questions #scripts #linux


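This article works through two shell scripting interview questions. The first counts IPs in two files (per file, in total, and those that appear in only one file) plus per-username statistics; the second is a small WordCount. It uses commands such as diff, uniq, and wc for the file handling, and some of the solutions are only partial.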

1. IpCount

The question:

A directory contains two files, a.txt and b.txt, each in the format (ip username), for example:

a.txt
127.0.0.1 zhangsan
127.0.0.1 wangxiaoer
127.0.0.2 lisi
127.0.0.3 wangwu    

b.txt
127.0.0.4 lixiaolu
127.0.0.1 lisi

Each file has at least one million lines. Using the Linux command line, do the following:

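1) the number of IPs in each file, and the total number of IPs
2) the IPs that appear in b.txt but not in a.txt
3) how many times each username appears, and how many IPs each username has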

1.1 Counting the IPs in each file, and the total number of IPs

Approach:

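Extract the first field from each file, remove duplicate lines with uniq, and write the results to ipA.txt and ipB.txt respectively. (uniq only collapses adjacent duplicate lines, which is why the awk output is sorted first.)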

cat a.txt | awk '{print $1}' | sort | uniq > ipA.txt
cat b.txt | awk '{print $1}' | sort | uniq > ipB.txt

Then all that's left is to count the lines of ipA.txt and ipB.txt. For this we use wc with -l, which prints the number of lines:

wc -l ipA.txt
wc -l ipB.txt

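So far we have only counted the IPs of a.txt and b.txt separately. For the total, the IPs have to be de-duplicated once more, because the same IP can appear in both files: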

cat ipA.txt ipB.txt > ip.txt
sort -u ip.txt | wc -l

So we can assemble these fragments into a script and automate the whole thing ^&^

In fact, we can simply stitch the commands above together.

#! /bin/bash

cat a.txt | awk '{print $1}' | sort | uniq > ipA.txt

cat b.txt | awk '{print $1}' | sort | uniq > ipB.txt

numA=`wc -l ipA.txt | awk '{print $1}'`
numB=`wc -l ipB.txt | awk '{print $1}'`

echo -e "There are \e[1;34m$numA\e[0m ip in a.txt"
echo -e "There are \e[1;34m$numB\e[0m ip in b.txt"

cat ipA.txt ipB.txt > ip.txt
totalNum=`sort -u ip.txt | wc -l`

echo -e "There are total \e[1;34m$totalNum\e[0m ip"

The execution trace (sh -x) looks like this:

[root@signal IpCount]# sh -x testing.sh 
+ uniq
+ sort
+ awk '{print $1}'
+ cat a.txt
+ sort
+ uniq
+ awk '{print $1}'
+ cat b.txt
++ awk '{print $1}'
++ wc -l ipA.txt
+ numA=3
++ awk '{print $1}'
++ wc -l ipB.txt
+ numB=2
+ echo -e 'There are \e[1;34m3\e[0m ip in a.txt'
There are 3 ip in a.txt
+ echo -e 'There are \e[1;34m2\e[0m ip in b.txt'
There are 2 ip in b.txt
+ cat ipA.txt ipB.txt
++ wc -l
++ sort -u ip.txt
+ totalNum=4
+ echo -e 'There are total \e[1;34m4\e[0m ip'
There are total 4 ip
[root@signal IpCount]#

The output:
[Figure: screenshot of the script's output]

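Looking at the script again, you'll notice a lot of repeated code, and repetition means something can be reused.
I won't go into detail here; feel free to improve on these fragments and take more cases into account.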
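One way to remove the repetition is to wrap the pipeline in functions. The sketch below is not the author's script: it assumes whitespace-separated "ip username" lines, skips the ipA.txt/ipB.txt intermediate files, drops the color codes, and the function names distinct_ips, count_ips, and report are hypothetical.

```shell
#!/bin/bash
# Sketch: IpCount from 1.1 with the repeated pipeline factored into functions.
# Assumes "ip username" lines separated by whitespace, as in the example files.
# Function names are hypothetical; no intermediate ipA.txt/ipB.txt files are written.

# Print the distinct IPs (first column) found in the given files.
distinct_ips() {
  awk '{print $1}' "$@" | sort -u
}

# Print how many distinct IPs the given files contain.
# tr strips the padding some wc implementations put in front of the number.
count_ips() {
  distinct_ips "$@" | wc -l | tr -d ' '
}

# Same three lines of output as the original script, without the color codes.
report() {
  local a=$1 b=$2
  echo "There are $(count_ips "$a") ip in $a"
  echo "There are $(count_ips "$b") ip in $b"
  # Passing both files at once de-duplicates across them, which gives the total.
  echo "There are total $(count_ips "$a" "$b") ip"
}
```

Usage: report a.txt b.txt. Passing both files to count_ips in a single call is what makes the total de-duplicate across files.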

1.2 IPs that appear in b.txt but not in a.txt

Approach:

The diff command

The official description reads:

NAME:
diff - compare files line by line

SYNOPSIS:
diff [options]… file1 file2

Let's try comparing ipA.txt and ipB.txt to see what diff actually prints:
[Figure: output of diff ipA.txt ipB.txt]

Then compare the two files by hand:
[Figure: contents of ipA.txt and ipB.txt]

So now we know what ">" and "<" mean.
"<" marks lines of ipA.txt that are not in ipB.txt; ">" marks lines of ipB.txt that are not in ipA.txt.

So we can write:

diff ipA.txt ipB.txt | grep \>

With that, the script can be written:

#! /bin/bash

diff ipA.txt ipB.txt | grep \> | awk '{print $2}'

^&^ Feels a bit odd, doesn't it? We do get the result; I'll leave the finishing touches to you!
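An alternative to filtering diff output is comm, which compares two sorted inputs and can print the lines unique to either one or common to both. A minimal sketch, assuming bash (for process substitution); ips_only_in_second is a hypothetical name:

```shell
#!/bin/bash
# Sketch: IPs that occur in the second file but not in the first, using comm.
# comm expects both inputs sorted the same way, so each side goes through sort -u.
# Uses bash process substitution <(...); the function name is hypothetical.
ips_only_in_second() {
  # -1 hides lines only in the first input, -3 hides lines common to both,
  # which leaves the lines that appear only in the second input.
  comm -13 <(awk '{print $1}' "$1" | sort -u) <(awk '{print $1}' "$2" | sort -u)
}
```

For the sample data, ips_only_in_second a.txt b.txt prints 127.0.0.4.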

1.3 How many times each username appears, and how many IPs each username has

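At first glance, counting username occurrences looks just like counting IPs, but that is completely wrong: this time we must not de-duplicate. Read the question again.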

Phew, I almost fell into a trap there. Having climbed back out, let's carry on.
Approach:

Collect all the usernames into a file, name.txt:

cat a.txt b.txt | awk '{print $2}' > name.txt

Read name.txt line by line …
I didn't manage to finish this one. Ten seconds of mourning.
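Since the original stops here, below is one possible way to finish 1.3 with awk associative arrays instead of reading name.txt line by line. It is a sketch, not the author's solution; it assumes that "the number of IPs each username has" means distinct IPs, and user_stats is a hypothetical name.

```shell
#!/bin/bash
# Sketch for 1.3: for every username, print "username occurrences distinct_ips".
# Assumption: "the number of IPs for each username" means distinct IPs;
# occurrences are counted on every line, with no de-duplication.
# The function name is hypothetical.
user_stats() {
  awk '
    {
      times[$2]++                      # one more occurrence of this username
      if (!seen[$2, $1]++) ips[$2]++   # first time this (username, ip) pair appears
    }
    END { for (u in times) print u, times[u], ips[u] }
  ' "$@" | sort
}
```

For the sample files, user_stats a.txt b.txt reports lisi 2 2, since lisi appears twice, once with 127.0.0.2 and once with 127.0.0.1; every other username appears once with one IP.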

2. WordCount

The question:

Given a text file a.txt, for example:
http://aaa.com
http://bbb.com
http://bbb.com
http://bbb.com
http://ccc.com
http://ccc.com

Write a shell script that counts them and finally outputs:
aaa 1
ccc 2
bbb 3

The results must also be sorted (by count, as in the example above).

Isn't this just WordCount? Online you can find versions that implement it with a mapper.sh and a reducer.sh written for Hadoop streaming. Here they are:

mapper.sh

#! /bin/bash
while read -r LINE; do
  for word in $LINE
  do
    echo "$word 1"
    # in streaming, we define counter by
    # [reporter:counter:<group>,<counter>,<amount>]
    # define a counter named counter_no, in group counter_group
    # increase this counter by 1
    # counter should be output through stderr
    echo "reporter:counter:counter_group,counter_no,1" >&2
    # a status message uses reporter:status:<message>, also on stderr
    echo "reporter:status:processing......" >&2
    echo "This is log for testing, will be printed in stderr file" >&2
  done
done

reducer.sh

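#! /bin/bash
# Expects input sorted by word (Hadoop streaming sorts the mapper output by key).
count=0
started=0   # becomes 1 once the first word has been read
word=""
while read -r LINE; do
  newword=$(echo "$LINE" | cut -d ' ' -f 1)
  if [ "$word" != "$newword" ]; then
    # A new word starts: print the previous word's total, except on the
    # very first line, when there is no previous word yet.
    # printf is used because bash's echo does not expand \t without -e.
    [ $started -ne 0 ] && printf '%s\t%s\n' "$word" "$count"
    word=$newword
    count=1
    started=1
  else
    count=$(( count + 1 ))
  fi
done
# Print the last word's total (skipped when there was no input at all).
[ $started -ne 0 ] && printf '%s\t%s\n' "$word" "$count"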

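So I used it in all seriousness, only to find that one line in the middle of reducer.sh was wrong. I then hunted around online for other versions, and they were all identical, each claiming to be original ^L^. I never did work out what the started variable was for, so I gave up on this approach.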
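For reference, the mapper.sh/reducer.sh pair is meant to run as map, then sort, then reduce: Hadoop streaming sorts the mapper output by key before the reducer reads it, and the reducer depends on that, since it prints a word's total only when a different word arrives. The sketch below simulates that flow locally, with awk substituted for the two scripts; map_words, reduce_counts, and wordcount are hypothetical names. In reduce_counts, the check NR > 1 does the job of started: it keeps the very first word change from printing an empty record.

```shell
#!/bin/bash
# Sketch: simulate Hadoop streaming locally as  map | sort | reduce.
# The sort stands in for the shuffle; the reducer relies on sorted input.

# Map: emit "<word> 1" for every whitespace-separated word on every line.
map_words() {
  awk '{ for (i = 1; i <= NF; i++) print $i, 1 }'
}

# Reduce: the same control flow as reducer.sh, written in awk.
# NR > 1 replaces $started: nothing is printed before the first word is seen.
reduce_counts() {
  awk '
    $1 != (word "") {            # compare as strings: a new word begins
      if (NR > 1) print word "\t" count
      word = $1
      count = 0
    }
    { count += $2 }              # add the value on this line (always 1 here)
    END { if (NR > 0) print word "\t" count }
  '
}

# Whole pipeline; usage: wordcount < a.txt
wordcount() {
  map_words | sort | reduce_counts
}
```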

So let me show you my own method instead, although I feel it drifts a little from the question. Details below.

2.1 Shortcut, part one: counting character offsets by hand

First we need to cut the part we want out of each line of the original text; this time the command is cut:

NAME:
cut - remove sections from each line of files
SYNOPSIS:
cut options… [file] …

[Figure: table of cut options, taken from the Linux命令大全 (Linux command reference) site; not the author's own]

That's all the usage I'll show here; the remaining options are covered on the Linux命令大全 (Linux command manual) website.

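This is where the shortcut comes in: in every line of a.txt, the part we want sits at characters 8 to 10 (right after the seven characters of "http://"), so we can write: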

[root@signal shellWordcount]# cut -c8-10 a.txt > tmp.txt

[root@signal shellWordcount]#

Then let's check whether tmp.txt really contains the field we want:
[Figure: contents of tmp.txt]

Just as we hoped: step one works!
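The fixed offsets only work while every line starts with exactly "http://" and the name has three letters. A sketch that cuts on delimiters instead, assuming each line looks like http://<name>.<tld> (a host such as www.aaa.com would yield www); extract_names is a hypothetical name:

```shell
#!/bin/bash
# Sketch: pull the name out of "http://<name>.<tld>" without fixed offsets.
# Splitting on "/" gives "http:", "", "<name>.<tld>", so field 3 is the host;
# a second cut on "." keeps the part before the first dot.
extract_names() {
  cut -d/ -f3 "$@" | cut -d. -f1
}
```

For example, http://bbbb.org yields bbbb, where cut -c8-10 would give bbb.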

2.2 Shortcut, part two: that'll do

Next come sort and uniq. We used them in IpCount as well; I didn't explain them in detail back then, and I'm not going to now either.

[root@signal shellWordcount]# cat tmp.txt | sort | uniq -c | sort
      1 aaa
      2 ccc
      3 bbb
[root@signal shellWordcount]#

Thrown roughly into a script, that becomes:

#! /bin/bash
file=a.txt

cut -c8-10 $file > tmp.txt

cat tmp.txt | sort | uniq -c | sort > result.txt

cat result.txt

The execution trace (sh -x) looks like this:

[root@signal shellWordcount]# sh -x myStep.sh 
+ file=a.txt
+ cut -c8-10 a.txt
+ sort
+ uniq -c
+ sort
+ cat tmp.txt
+ cat result.txt
      1 aaa
      2 ccc
      3 bbb
[root@signal shellWordcount]#

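And that's it, right? Except... the expected output has the name first and the count after it... I'll leave the rest to you! It's only a matter of this and that, right? ^o^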
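To get the name before the count, one option is to sort numerically and swap the two columns with awk. A sketch that keeps the cut -c8-10 shortcut, so the same assumptions about the line format apply; wordcount_names is a hypothetical name:

```shell
#!/bin/bash
# Sketch: finish 2.2 so each line reads "<name> <count>", sorted by count.
# Keeps the cut -c8-10 shortcut, so it still assumes every line starts with
# "http://" followed by a three-letter name. The function name is hypothetical.
wordcount_names() {
  # uniq -c prints "<count> <name>"; sort -n orders by that leading count,
  # then awk swaps the two columns.
  cut -c8-10 "$@" | sort | uniq -c | sort -n | awk '{print $2, $1}'
}
```

With the sample a.txt, wordcount_names a.txt prints aaa 1, ccc 2, and bbb 3 on separate lines. sort -n compares the leading counts as numbers rather than as text.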

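So, once again, I've led you all into a trap! Of course, there are other ways to do this: awk, sed, cut, wc, tr, and many more commands are extremely useful. Clone a virtual machine and go experiment.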
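As one example of the awk route, the counting can be done in a single pass with an associative array, which keeps one entry per distinct name and only sorts those few result lines instead of every input line. A sketch assuming lines of the form http://<name>.<tld>; wordcount_awk is a hypothetical name:

```shell
#!/bin/bash
# Sketch: WordCount in one awk pass with an associative array.
# Splitting on "/" and "." turns "http://aaa.com" into "http:", "", "aaa", "com",
# so the name is field 3 regardless of its length. The function name is hypothetical.
wordcount_awk() {
  # n[name] counts occurrences; the result lines are sorted by the count column.
  awk -F'[/.]' '{ n[$3]++ } END { for (k in n) print k, n[k] }' "$@" | sort -k2,2n
}
```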