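Download a long English novel and analyze its word frequencies.
1. Read in the string to be analyzed
2. Split the string and extract the words
3. Build a counting dictionary
4. Exclude grammatical (function) words
5. Sort
6. Output the TOP(20)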
# Write the sample text to book.txt (UTF-8, because the text contains curly apostrophes)
with open('book.txt','w',encoding='utf-8') as s:
    s.write('''New year is the great moment for people, and many families choose to go to the cinema and enjoy the hour. But recently, the news reported an unhappy incident that a woman was talking loudly while watching movie and an audience beat her for anger. The public criticized the woman’s impolite behavior, though the audience was rude. The impolite behavior in the cinema happens all the time. When watching the movie, I really hate people talk, or the kids share opinions with adults. They are disturbing the audience. Some people don’t talk, but they play smart phone, showing a light in the dark, it is very uncomfortable. Everybody goes to the movie to take relax, the one who doesn’t control their behavior will disturb others. It is everybody’s duty to self-behave. Parents need to educate their children, or set the good example to them. Foreigners always complain about the rude behavior on Chinese people. We have to admit our rude act, only in this way can we get improved. ''')

print('Read book.txt and split it into a list of words')
with open('book.txt','r',encoding='utf-8') as b:
    read=b.read()
read=read.lower()
for i in ',.!?:':
    read=read.replace(i,' ')
words=read.split() # extract words; split() also handles newlines and repeated spaces
print(words)

print('Exclude grammatical words with a set, then build the count dictionary:')
exp={'','and','the','to'}
keys=set(words)-exp # set of keys, excluding grammatical words
print(keys)

print('Sort:')
dic={}
for w in keys:
    dic[w]=words.count(w) # word-count dictionary
wc=list(dic.items()) # list of (word, count) tuples
wc.sort(key=lambda x:x[1],reverse=True) # sort by count, descending
print(wc)

print('Output TOP(20):')
for item in wc[:20]: # slicing avoids an IndexError if there are fewer than 20 words
    print(item)
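Steps 2 to 6 can also be sketched more compactly with the standard library's collections.Counter. This is only an illustrative alternative, not the script above: the function name top_words, the STOP_WORDS constant, the regular expression, and the sample sentence are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word set, mirroring the words excluded in the script above.
STOP_WORDS = {"and", "the", "to"}

def top_words(text, stop_words=STOP_WORDS, n=20):
    """Return up to n (word, count) pairs, most frequent first."""
    # Lowercase, then keep runs of letters; an apostrophe (straight or curly)
    # is allowed inside a word, so "don’t" stays a single token.
    words = re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())
    # Count every word that is not a stop word.
    counts = Counter(w for w in words if w not in stop_words)
    # most_common(n) sorts by count in descending order and truncates to n.
    return counts.most_common(n)

# Made-up sample sentence, used only to demonstrate the call.
sample = "The cat and the dog. The dog saw the cat, and the cat ran to the dog!"
for word, count in top_words(sample, n=3):
    print(word, count)
```

Compared with calling words.count(w) once per key, Counter makes a single pass over the word list, which matters more for a full-length novel than for a short essay.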
7. Brief explanation of the output.
This English essay is about behaving politely when watching a movie in the cinema.