Assignment source: https://edu.cnblogs.com/campus/gzcc/GZCC-16SE1/homework/2753
1. How to add, delete, modify, look up, and iterate over lists, tuples, dictionaries, and sets.
Lists:
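A minimal sketch of the common list operations; the list name and values are made up for illustration:

```python
# Add
fruits = ["apple", "banana"]
fruits.append("cherry")          # add one item at the end
fruits.insert(0, "mango")        # add one item at a given position
fruits.extend(["kiwi", "plum"])  # add several items at the end

# Look up
first = fruits[0]                # by index
pos = fruits.index("banana")     # index of the first matching value
has_kiwi = "kiwi" in fruits      # membership test

# Modify
fruits[1] = "pear"               # replace the item at index 1

# Delete
fruits.remove("plum")            # by value (first match)
last = fruits.pop()              # by index; defaults to the last item
del fruits[0]                    # by index

# Iterate, with positions
for i, fruit in enumerate(fruits):
    print(i, fruit)
```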
Tuples:
Add, look up, iterate:
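Because a tuple is immutable, "adding" can only mean building a new tuple, for example by concatenation; lookups work as they do on lists. A minimal sketch with made-up values:

```python
t = (1, 2, 3)

# "Add": concatenation creates a new tuple and rebinds the name t
t = t + (4, 5)

# Look up: index, slice, count occurrences, find a position
second = t[1]
tail = t[-2:]
n_twos = t.count(2)
where_four = t.index(4)

# Iterate
for item in t:
    print(item)
```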
Modify:
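Tuple elements cannot be reassigned. A minimal sketch of what happens when one tries, and of the usual workaround (convert to a list and back, which produces a new tuple); the values are illustrative:

```python
t = (1, 2, 3)

# Item assignment on a tuple raises TypeError
assignment_failed = False
try:
    t[0] = 10
except TypeError:
    assignment_failed = True

# Workaround: copy into a list, change it, then build a new tuple
tmp = list(t)
tmp[0] = 10
t = tuple(tmp)

# A mutable object stored inside a tuple can still change in place
nested = ([1], 2)
nested[0].append(5)
```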
Delete:
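Individual tuple elements cannot be deleted either; one can only build a new tuple without the unwanted items, or remove the whole variable with del. A sketch with illustrative values:

```python
t = (1, 2, 3, 4)

# Deleting a single element raises TypeError
item_delete_failed = False
try:
    del t[0]
except TypeError:
    item_delete_failed = True

# Build new tuples without the unwanted items instead
without_first = t[1:]
without_three = tuple(x for x in t if x != 3)

# del removes the whole variable: the name t no longer exists afterwards
del t
```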
Dictionaries:
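A minimal sketch of the dictionary operations; the scores dictionary is a made-up example:

```python
scores = {"alice": 90, "bob": 85}

# Add / modify: assigning to a key inserts it, or overwrites an existing value
scores["carol"] = 78
scores["bob"] = 88
scores.update({"dave": 70})

# Look up
alice = scores["alice"]          # raises KeyError if the key is missing
eve = scores.get("eve", 0)       # returns the default instead of raising

# Delete
removed = scores.pop("dave")     # remove the key and return its value
del scores["carol"]

# Iterate over key-value pairs
for name, score in scores.items():
    print(name, score)
```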
Sets:
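Sets have no index, so "looking up" means testing membership and "modifying" means removing one element and adding another. A minimal sketch with made-up values:

```python
s = {1, 2, 3}

# Add
s.add(4)
s.update([5, 6])

# Look up: membership test only
has_two = 2 in s

# Delete / "modify"
s.discard(6)        # no error if the element is missing
s.remove(5)         # raises KeyError if the element is missing
s.add(50)

# Set algebra
a, b = {1, 2, 3}, {2, 3, 4}
union, inter, diff = a | b, a & b, a - b

# Iterate; sorted() gives a predictable order for printing
for x in sorted(s):
    print(x)
```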
2. Summarize how lists, tuples, dictionaries, and sets are related and how they differ, considering the following aspects:
Brackets:
Lists use "[]", tuples use "()", and dictionaries and sets both use "{}";
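A short sketch of the literal syntax. Two edge cases are worth knowing: {} on its own creates an empty dictionary, not a set, and a one-element tuple needs a trailing comma:

```python
lst = [1, 2]
tup = (1, 2)
dct = {"a": 1}
st = {1, 2}

empty_dict = {}      # {} is an empty dict
empty_set = set()    # an empty set must be written set()
one_tuple = (1,)     # the comma, not the parentheses, makes the tuple
not_tuple = (1)      # just the int 1 in parentheses
```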
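Ordered or unordered:
Lists and tuples are ordered sequences that can be indexed by position; sets are unordered; dictionaries cannot be indexed by position, although since Python 3.7 they keep their keys in insertion order;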
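A sketch illustrating the ordering point, assuming Python 3.7 or later (where dictionary insertion order is guaranteed by the language); the values are illustrative:

```python
lst = ["b", "a", "c"]
first = lst[0]                  # position is meaningful in a list

d = {}
d["b"] = 1
d["a"] = 2
keys_in_order = list(d)         # keys come back in insertion order

s = {"b", "a", "c"}
try:
    s[0]
    indexable = True
except TypeError:
    indexable = False           # sets do not support indexing
```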
Mutable or immutable:
Lists, dictionaries, and sets are mutable; tuples are immutable;
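A sketch of what mutability means in practice: a list method changes the same object, while += on a tuple builds a new tuple and rebinds the name. Immutability (more precisely, hashability) is also why a tuple can be a dictionary key and a list cannot. Names and values are illustrative:

```python
lst = [1, 2]
old_list = lst
lst.append(3)
same_list = lst is old_list      # True: the list was changed in place

tup = (1, 2)
old_tuple = tup
tup += (3,)
same_tuple = tup is old_tuple    # False: += produced a new tuple

locations = {(0, 0): "origin"}   # a tuple works as a key
try:
    {[0, 0]: "origin"}
    list_key_ok = True
except TypeError:
    list_key_ok = False          # lists are unhashable
```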
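Duplicates allowed or not:
Lists and tuples allow duplicate elements; dictionary keys must be unique while values may repeat; sets do not allow duplicates;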
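A sketch of the duplicate rules with illustrative values. A repeated key in a dictionary literal does not raise an error; the later value simply replaces the earlier one:

```python
lst = ["a", "b", "a"]
tup = ("a", "b", "a")
same_values = {"x": 1, "y": 1}   # values may repeat
dup_key = {"x": 1, "x": 2}       # repeated key: only the last value is kept
st = {"a", "b", "a"}             # duplicates collapse into one element
```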
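Storage and lookup:
Lists store individual values, which are looked up by index;
Tuples store individual values, which are looked up by index;
Dictionaries store key-value pairs, which are usually looked up by key;
Sets store individual values with no index or key, so they are queried with membership tests (in); set() converts a sequence or a dictionary (its keys) into a set.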
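A sketch of the lookup styles and of set() conversions, with illustrative values; note that converting a dictionary keeps only its keys:

```python
lst = [10, 20, 30]
tup = (10, 20, 30)
d = {"x": 1, "y": 2}

by_index = (lst[1], tup[1])      # lists and tuples: by position
by_key = d["y"]                  # dictionaries: by key

# set() accepts any iterable
from_list = set([1, 1, 2])
from_tuple = set((3, 3, 4))
from_dict = set(d)               # only the keys
from_values = set(d.values())    # use .values() to collect the values
```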
3. Word frequency count
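1. Download a long novel and save it as a UTF-8 encoded text file (file)
2. Read the file into a string (str)
3. Preprocess the text
4. Split the text into a list of words (list)
5. Count the words into a dictionary (set, dict)
6. Sort by frequency with list.sort(key=lambda ...) on (word, count) tuples
7. Exclude grammatical words such as pronouns, articles, and conjunctions, which carry no meaning on their own
 - with a custom stop-word list
 - or with stops.txt
8. Output the TOP 20
9. Visualization: word cloud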
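Steps 3 to 8 can be condensed into one function. This is a sketch, not the assignment's required method: it swaps in re.findall for the punctuation replacement and collections.Counter.most_common for the manual dictionary and sort. The function name top_words, the regular expression (ASCII letters with inner apostrophes only, so curly quotes in a real novel would need extra handling), and the sample text and stop words are all assumptions for illustration:

```python
import re
from collections import Counter


def top_words(text, stop_words, n=20):
    # Step 3: preprocess by lower-casing the whole text
    text = text.lower()
    # Step 4: extract words (letters, optionally joined by apostrophes)
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text)
    # Step 7: drop stop words before counting
    words = [w for w in words if w not in stop_words]
    # Steps 5 and 6: count, then sort by frequency as (word, count) tuples
    counts = Counter(words)
    # Step 8: keep only the n most frequent words
    return counts.most_common(n)


# Hypothetical sample text and stop-word set
sample = "The cat sat. The cat ran! A dog ran, and the dog barked."
stops = {"the", "a", "and"}
print(top_words(sample, stops, n=3))
```

Counter.most_common returns the same (word, count) tuples that the manual list.sort(key=lambda x: x[1], reverse=True) produces, so its result can be saved to CSV in the same way.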
Save the sorted word list word as a CSV file:
import pandas as pd
pd.DataFrame(data=word).to_csv('big.csv',encoding='utf-8')
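By default DataFrame.to_csv also writes the row index as an extra first column, with the column labels 0 and 1 as the header row; passing index=False (and header=False if no header is wanted) avoids that. As a standard-library alternative, here is a sketch using the csv module, with a hypothetical helper save_word_counts and made-up data:

```python
import csv


def save_word_counts(pairs, path):
    # newline="" prevents the csv module from writing blank lines on Windows
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["word", "count"])  # explicit header row
        writer.writerows(pairs)             # one (word, count) tuple per row


# Hypothetical sorted (word, count) pairs
word = [("cat", 2), ("dog", 2), ("sat", 1)]
save_word_counts(word, "big.csv")
```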
Generate the word cloud with an online tool:
https://wordart.com/create
Code:
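import string

import pandas as pd


def start():
    # Read the whole novel (saved as UTF-8) into one string
    with open("There For You.txt", "r", encoding="utf-8") as f:
        novel = f.read()

    # Preprocess: lower-case the text and turn every punctuation mark into a space
    novel = novel.lower()
    for x in string.punctuation:
        novel = novel.replace(x, " ")
    # Split on whitespace to get the list of words
    novel = novel.split()

    # Load the stop words; newlines and apostrophes become separators,
    # matching how apostrophes were removed from the novel above
    with open("stops.txt", "r", encoding="utf-8") as txt:
        stopWords = txt.read()
    for c in {"\n", "'"}:
        stopWords = stopWords.replace(c, " ")
    stopWords = stopWords.split()

    # Distinct words that are not stop words
    wordsSet = set(novel) - set(stopWords)

    # Count the remaining words in a single pass over the text
    # (calling novel.count(i) for each word rescans the whole list every time)
    wordsCount = {}
    for w in novel:
        if w in wordsSet:
            wordsCount[w] = wordsCount.get(w, 0) + 1

    # Sort the (word, count) tuples by count, highest first
    top = list(wordsCount.items())
    top.sort(key=lambda x: x[1], reverse=True)

    # Output the TOP 20 and save them for the word cloud tool
    for word, count in top[:20]:
        print(word, count)
    pd.DataFrame(data=top[:20]).to_csv("There For You.csv", encoding="utf-8")


start()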