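Requirement: given a file with one word per line, count how often each word occurs (count + word) and sort the result by count in descending order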
=======================================================================================
python:
Read the file into a list, build a dict d (word: count), then sort by the dict's values in descending order:
sorted(d.items(), key=lambda a: a[1], reverse=True)
wlist = ['a', 'b', 'b', 'a', 'b', 'c']
print(wlist)
d = {}
#------------------Method 1----------------
for w in wlist:
    if w not in d:
        count = wlist.count(w)
        d.setdefault(w, count)
# key p[1] with reverse=True sorts by value, descending
print(sorted(d.items(), key=lambda p: p[1], reverse=True))
#------------------Method 2----------------
d = {}  # start from an empty dict so Method 2 runs independently of Method 1
myset = set(wlist)
for w in myset:
    count = wlist.count(w)
    d.setdefault(w, count)
print(sorted(d.items(), key=lambda p: p[1], reverse=True))
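Both methods above call wlist.count(w) once per distinct word, so each call rescans the whole list. As a sketch of the full requirement (read the file, count, sort), the standard library's collections.Counter counts in a single pass. The function names word_frequencies and format_frequencies are hypothetical, chosen only for this illustration, and the file is assumed to hold one word per line.

```python
from collections import Counter


def word_frequencies(path):
    """Return (word, count) pairs sorted by count, highest first.

    Assumes the file at `path` holds one word per line; blank lines are skipped.
    """
    with open(path) as f:
        words = [line.strip() for line in f if line.strip()]
    # most_common() returns the pairs ordered by count, descending.
    return Counter(words).most_common()


def format_frequencies(pairs):
    """Render each pair as 'count word', the layout the requirement asks for."""
    return ["%d %s" % (count, word) for word, count in pairs]
```

Usage, assuming a file named count.txt in the current directory: print each line returned by format_frequencies(word_frequencies("count.txt")).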
=======================================================================================
shell:
[root@CentOS lxg]# cat count.txt
a
b
c
b
b
a
[root@CentOS lxg]# cat count.txt | sort | uniq -c   # -c prints how many times each line repeats consecutively, so the input must be sorted first
2 a
3 b
1 c
[root@CentOS lxg]# awk '{p[$1]++}END{for(a in p) print p[a],a}' count.txt
2 a
3 b
1 c
[root@CentOS lxg]#
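Why uniq -c needs sorted input can be mirrored in Python: itertools.groupby, like uniq, only merges equal items that are adjacent. A minimal sketch, where the helper name uniq_c is hypothetical and the list simply repeats the content of count.txt:

```python
from itertools import groupby


def uniq_c(lines):
    """Count runs of adjacent equal items, similar to `uniq -c`."""
    return [(len(list(group)), key) for key, group in groupby(lines)]


lines = ["a", "b", "c", "b", "b", "a"]  # same content as count.txt

# Unsorted input: the separate runs of "a" and "b" are counted separately.
print(uniq_c(lines))
# Sorted input: equal words become adjacent, so each word gets a single count.
print(uniq_c(sorted(lines)))
```

The awk version does not need the sort because its array p is keyed by word, much like a Python dict.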
=======================================================================================
Further exploration:
[root@CentOS lxg]# cat word.txt
a a c
b a
c c
c
[root@CentOS lxg]# str=$(cat word.txt)
[root@CentOS lxg]# echo $str
a a c b a c c c
[root@CentOS lxg]# echo $str | awk -F'a' '{print NF-1}'
3
[root@CentOS lxg]# echo $str | grep -o 'a' | wc -l
3
[root@CentOS lxg]# echo $str | tr "a" "\n" | wc -l
4
[root@CentOS lxg]# echo $(($(echo $str | tr "a" "\n" | wc -l)-1))
3
[root@CentOS lxg]# echo $str | tr " " ""
tr: when not truncating set1, string2 must be non-empty
[root@CentOS lxg]# echo $str | tr -d " "
aacbaccc
[root@CentOS lxg]# for w in $str; do echo $w; done | sort | uniq -c
3 a
1 b
4 c
[root@CentOS lxg]# echo $str | tr " " "\n" | sort | uniq -c
3 a
1 b
4 c
[root@CentOS lxg]# echo $str | tr " " "\n" | sort | uniq -c | sort -rn   # -n compares the counts numerically
4 c
3 a
1 b
[root@CentOS lxg]#
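For comparison, the same exploration can be sketched in Python. str.split() with no arguments splits on any run of whitespace, spaces and newlines alike, so it plays the role of tr " " "\n" here. The content of word.txt is inlined as a string for the example rather than read from disk.

```python
from collections import Counter

text = "a a c\nb a\nc c\nc\n"  # content of word.txt, inlined for the example

# Split on any whitespace, so words on the same line are separated too.
words = text.split()

# Occurrences of the character "a", comparable to grep -o 'a' | wc -l.
a_count = text.count("a")

# Sort by count descending, then by word, so ties come out in a fixed order.
freq = sorted(Counter(words).items(), key=lambda p: (-p[1], p[0]))
for word, count in freq:
    print(count, word)
```

Note that text.count("a") counts characters, not whole words, just as the awk, grep and tr commands above do; the two coincide here only because every word is a single letter.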