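Comprehensive exercise: word frequency counting

1. English word frequency count

Download the lyrics of an English song, or an English article.

Replace all separators such as , . ? ! ' : with spaces.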

news = '''
歌手:Avril Lavigne(艾薇儿)
歌词出处:http://www.5nd.com

╰☆╮Avril Lavigne - Smile╰☆╮
Lyrics by Judy @ LK歌词组 QQ群:43882929
You know that I'm a crazy bitch
I do what I want, when I feel like it
All I wanna do is lose control, oh oh
But you don't really give a shit
Ya go with it, go with it, go with it
'Cause you're fuckin' crazy Rock 'N' Roll
You-ou said "hey! what's your name?"
It took one look and now I'm not the same
Yeah, you said "Hey"
And since that day
You stole my heart and you're the one to blame
Yeahhh and that's why I smile
It's been a while
Since everyday and everything has felt this right
And now, you turn it all around
And suddenly you're all I need the reason why
I, I, I, I smile, ile, ile, ile
Last night I blacked out I think
What did you, what did you, put in my drink?
I remember making out and then oh, oh
I woke up with a new tattoo
Your name was on me and my name was on you
I would do it all over again
You-ou said "hey what's your name?"
It took one look and now I'm not the same
Yeah, you said "Hey" (Hey)
And since that day (and since that day)
You stole my heart and you're the one to blame
Yeahhh and that's why I smile
It's been a while
Since everyday and everything has felt this right
And now, you turn it all around
And suddenly you're all I need the reason why
I, I, I, I smile, ile, ile, ile
The reason why I, I, I, I smile, ile, ile, ile
You know that I'm a crazy bitch
I do what I want, when I feel like it
All I wanna do is lose control
You know that I'm a crazy bitch
I do what I want, when I feel like it
All I wanna do is lose control
And that's why I smile
It's been a while
Since everyday and everything has felt this right
And now, you turn it all around
And suddenly you're all I need the reason why
I, I, I, I smile, ile, ile, ile (the reason why)
The reason why I, I, I, I smile, ile, ile, ile
The reason why I, I, I, I smile, ile, ile, ile
【 Avril Lavigne - Smile 】
Lrc edited by Judy @ LK 歌词组
'''

sep = ''',.?!'":;,。?!:“”'''
exclude = {'the','and','of','to'}

for c in sep:
    news = news.replace(c,' ')
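
The replacement loop above can also be done in a single pass with `str.translate`. This is only a minimal sketch: the helper name `strip_separators` and the sample sentence are illustrative assumptions, not part of the original exercise.

```python
def strip_separators(text, separators):
    """Return text with every character in separators replaced by a space."""
    # str.maketrans maps each separator character to a single space.
    table = str.maketrans({c: ' ' for c in separators})
    return text.translate(table)


# Hypothetical sample line; the real input would be the `news` string.
sample = 'Yeah, you said "Hey"! What?'
cleaned = strip_separators(sample, ''',.?!'":;''')
print(cleaned.lower().split())
```

For a text as short as a song the loop and the translation table behave the same; the table just avoids building a new copy of the string for every separator.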

  

Convert everything to lowercase and build the word list

wordList = news.lower().split()
for w in wordList:
    print(w)

Generate the word frequency count

wordDist = {}
wordSet = set(wordList)
for w in wordSet:
    wordDist[w] = wordList.count(w)
 
for w in wordDist:
    print(w, wordDist[w])
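
Calling `wordList.count(w)` once per distinct word rescans the whole list each time. As a possible alternative, the same count can be sketched with `collections.Counter` from the standard library, which makes a single pass. The short `wordList` below is a stand-in for the list built above, and `wordCount` is a name chosen for this illustration.

```python
from collections import Counter

# Stand-in for the wordList produced by news.lower().split().
wordList = 'i i i i smile ile ile ile the reason why i smile'.split()

wordCount = Counter(wordList)          # maps each word to its frequency
for w, n in wordCount.most_common(3):  # most frequent first
    print(w, n)
```

`Counter` is a `dict` subclass, so `wordCount.items()` can be sorted the same way as `wordDist.items()`.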

  

Sort

# Sort the (word, count) pairs by count, largest first.
dictList = list(wordDist.items())
dictList.sort(key=lambda x:x[1],reverse=True)

  

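Exclude function words: pronouns, articles, conjunctions

exclude = {'the','of','and','s','to','which','will','as','on','is','by',}

# Recount from scratch so excluded words do not linger in wordDist, then re-sort.
wordDist = {}
wordSet=set(wordList)-exclude
for w in wordSet:
    wordDist[w]=wordList.count(w)
dictList = sorted(wordDist.items(), key=lambda x:x[1], reverse=True)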

  

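Output the TOP 20 most frequent words

# Slicing avoids an IndexError when there are fewer than 20 distinct words.
for item in dictList[:20]:
    print(item)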
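
The counting, exclusion, sorting and TOP-N steps can be gathered into one helper. The sketch below assumes a hypothetical function name `top_words` and a tiny sample word list; it is one way to arrange the steps, not the original author's code.

```python
def top_words(words, exclude, n=20):
    """Return up to n (word, count) pairs, most frequent first, skipping excluded words."""
    counts = {}
    for w in words:
        if w not in exclude:
            counts[w] = counts.get(w, 0) + 1
    ranked = sorted(counts.items(), key=lambda x: x[1], reverse=True)
    return ranked[:n]  # slicing is safe even with fewer than n distinct words


# Hypothetical sample; the real input would be wordList and exclude from above.
words = 'the reason why i smile and the reason i stay'.split()
print(top_words(words, {'the', 'and'}, n=3))
```

Filtering while counting means excluded words never enter the dictionary, so no second pass is needed to remove them.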

  

Save the text to be analyzed as a UTF-8 encoded file, and obtain the content for the word frequency analysis by reading that file.

 

f = open('songs.txt','r',encoding='UTF-8')
news = f.read()
f.close()
print(news)

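Write the sorted result to the file songscount.txt:

# Mode 'a' appends, so running this again adds a second copy of the results.
f = open('songscount.txt','a',encoding='utf-8')
for word, count in dictList[:20]:
    f.write(word+' '+str(count)+'\n')
f.close()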
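
A `with` block is a common way to make sure the output file is closed even if a write fails. The sketch below assumes a hypothetical helper `save_top`, made-up sample pairs, and a temporary demo path; it opens the file in `'w'` mode, which overwrites earlier results instead of appending to them.

```python
import os
import tempfile


def save_top(pairs, path, n=20):
    """Write up to n 'word count' lines to path, overwriting any previous content."""
    # The with-block closes the file automatically, even if a write raises.
    with open(path, 'w', encoding='utf-8') as f:
        for word, count in pairs[:n]:
            f.write(f'{word} {count}\n')


# Hypothetical sample data and a temporary output location.
demo_path = os.path.join(tempfile.gettempdir(), 'songscount_demo.txt')
save_top([('i', 40), ('smile', 12)], demo_path)
```

In the exercise itself the call would look like `save_top(dictList, 'songscount.txt')`.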

  

 

 

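2. Chinese word frequency count

Download a long Chinese article.

Read the text to be analyzed from a file.

# .read() returns the file's text; without it, news would be a file object.
news = open('gzccnews.txt','r',encoding = 'utf-8').read()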

Install jieba and use it for Chinese word segmentation.

pip install jieba

import jieba

# lcut already returns a list, so no extra list() call is needed.
jieba.lcut(news)

import jieba
file=open('hong.txt','r',encoding='utf-8')
word=file.read()
file.close()

  

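Generate the word frequency count

# Note: search-engine mode also emits the shorter words inside long words,
# which raises their counts; jieba.lcut(word) gives one segmentation only.
wordList=list(jieba.cut_for_search(word))
  
wordDist={}
for w in set(wordList):   # count each distinct word once
    wordDist[w] = wordList.count(w)
  
for w in wordDist:
    print(w, wordDist[w])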

  

Sort

dictList = list(wordDist.items())
dictList.sort(key = lambda x: x[1], reverse=True)

  

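Exclude function words: pronouns, articles, conjunctions

sep=''',。?“”:、?;!!'''
 
exclude ={' ','\n','了','的','\u3000','他','我','也','又','是','你','着','这','就','都','呢','只'}
 
# The text is already segmented, so drop punctuation and function words
# from the set of tokens, then recount and re-sort.
wordSet=set(wordList)-exclude-set(sep)
wordDist={w: wordList.count(w) for w in wordSet}
dictList=sorted(wordDist.items(), key=lambda x: x[1], reverse=True)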
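
Filtering and counting can also happen in one pass over the segmented tokens. The sketch below is an assumption-laden illustration: `segment` and `count_tokens` are hypothetical helper names, `segment` relies on the third-party `jieba.lcut` call, and the token list is a hand-written stand-in rather than real segmenter output.

```python
from collections import Counter


def segment(text):
    """Split Chinese text into words with jieba's default (precise) mode."""
    import jieba  # third-party; imported lazily so count_tokens works without it
    return jieba.lcut(text)


def count_tokens(tokens, exclude):
    """Count tokens, skipping excluded words and whitespace-only tokens."""
    # t.strip() is empty for ' ', '\n' and the full-width space '\u3000'.
    return Counter(t for t in tokens if t not in exclude and t.strip())


# Hand-written stand-in tokens, in the shape segment() would return.
tokens = ['宝玉', '笑', '道', ',', '我', '也', '笑', '了', '。', '\n']
counts = count_tokens(tokens, exclude={'了', '也', '我', ',', '。'})
print(counts.most_common(2))
```

With the real file, the call would be `count_tokens(segment(word), exclude | set(sep))`, followed by `most_common(20)` for the TOP 20.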

  

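Output the TOP 20 most frequent words (or save the result to a file)

# Write as UTF-8 so the Chinese words are stored the same way on every platform.
f=open('hongcount.txt','a',encoding='utf-8')
for word, count in dictList[:20]:
    f.write(word+' '+str(count)+'\n')
f.close()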

  

 

Reposted from: https://www.cnblogs.com/oechen/p/8666418.html
