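Lab 6: Practice with Composite Data Types
I. Lab type: Design
II. Suggested hours: 4
III. Lab requirements:
1. Read Chapter 6 of the textbook: Composite Data Types
2. Verification exercises:
- 2.1. Verify the operations, methods, and functions of the sequence, set, list, and dictionary types (textbook pp. 156-167, Tables 6.1 to 6.5). (Note: this part does not need to be submitted.)
- 2.2. Verify the commonly used word segmentation functions of the jieba library in Section 6.5 (Table 6.6). (Note: this part does not need to be submitted.)
- 2.3. Verify example code 10.1-10.4 from Section 6.6, Example 10: Text Word Frequency Statistics
3. Design exercises:
Selected programming exercises from Chapter 6 of the textbook: Exercises 6.1, 6.2, 6.4, and 6.6.
- hamlet.txt and 三国演义.txt: see the resource link for downloads
- Verification exercises: verify example code 10.1-10.4 from Section 6.6, Example 10: Text Word Frequency Statistics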
10.1.
# e10.1CalHamlet.py
def getText():
    # Read the whole file, lowercase it, and replace punctuation with spaces
    with open("hamlet.txt", "r") as f:
        txt = f.read()
    txt = txt.lower()
    for ch in '!"#$%&()*+,-./;<=>?@[\\]^_`{|}~':
        txt = txt.replace(ch, " ")
    return txt

hamletTxt = getText()
words = hamletTxt.split()
counts = {}
for word in words:
    # Unseen words start from 0
    counts[word] = counts.get(word, 0) + 1
items = list(counts.items())
# Sort the (word, count) pairs by count, highest first
items.sort(key=lambda x: x[1], reverse=True)
for i in range(10):
    word, count = items[i]
    print("{0:<10}{1:>5}".format(word, count))
Verification result:
10.2.
# e10.2CalHamlet.py
excludes = {"the", "and", "of", "you", "a", "i", "my", "in"}

def getText():
    # Read the whole file, lowercase it, and replace punctuation with spaces
    with open("hamlet.txt", "r") as f:
        txt = f.read()
    txt = txt.lower()
    for ch in '!"#$%&()*+,-./;<=>?@[\\]^_`{|}~':
        txt = txt.replace(ch, " ")
    return txt

hamletTxt = getText()
words = hamletTxt.split()
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1
# Remove the excluded high-frequency words before ranking
for word in excludes:
    del counts[word]
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
for i in range(10):
    word, count = items[i]
    print("{0:<10}{1:>5}".format(word, count))
Verification result:
Note: hamlet.txt
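The deletion step in 10.2 assumes that every excluded word actually occurs in the text, because `del` on a missing dictionary key raises `KeyError`. One way around this, sketched below under assumptions, is to skip excluded words while counting. The function name `count_words` and the sample words are hypothetical and do not come from the textbook.

```python
def count_words(words, excludes):
    """Count words, skipping anything in excludes (hypothetical helper)."""
    counts = {}
    for word in words:
        if word in excludes:
            continue  # excluded words never enter the dictionary
        counts[word] = counts.get(word, 0) + 1
    return counts

# Assumed sample input; "of" is excluded but never appears in the text
words = "the play is the thing".split()
counts = count_words(words, {"the", "of"})
print(counts)
```

Filtering up front means no key is ever deleted, so an exclusion list can be reused across texts that do not contain all of its words.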
10.3.
# e10.3CalThreeKingdoms.py
import jieba

with open("三国演义.txt", "r", encoding="utf-8") as f:
    txt = f.read()
# Segment the text into a list of words
words = jieba.lcut(txt)
counts = {}
for word in words:
    # Skip single-character tokens (punctuation and one-character words)
    if len(word) == 1:
        continue
    else:
        counts[word] = counts.get(word, 0) + 1
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
for i in range(15):
    word, count = items[i]
    print("{0:<10}{1:>5}".format(word, count))
10.4.
# e10.4CalThreeKingdoms.py
import jieba

excludes = {"将军", "却说", "荆州", "二人", "不可", "不能", "如此"}
with open("三国演义.txt", "r", encoding="utf-8") as f:
    txt = f.read()
words = jieba.lcut(txt)
counts = {}
for word in words:
    # Skip single-character tokens, then merge the different names
    # of the same character under one key
    if len(word) == 1:
        continue
    elif word == "诸葛亮" or word == "孔明曰":
        rword = "孔明"
    elif word == "关公" or word == "云长":
        rword = "关羽"
    elif word == "玄德" or word == "玄德曰":
        rword = "刘备"
    elif word == "孟德" or word == "丞相":
        rword = "曹操"
    else:
        rword = word
    counts[rword] = counts.get(rword, 0) + 1
# Remove high-frequency words that are not character names
for word in excludes:
    del counts[word]
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
for i in range(5):
    word, count = items[i]
    print("{0:<10}{1:>5}".format(word, count))
Note: 三国演义.txt
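The `elif` chain in 10.4 maps several names to one character. The same mapping can be held in a dictionary, which makes it easier to extend. This is only a sketch: the names `ALIASES` and `count_names` are assumptions, and the token list is a hypothetical stand-in for the output of `jieba.lcut(txt)` so that the example runs without jieba or the novel's text.

```python
# Alias table copied from the elif chain in example 10.4
ALIASES = {
    "诸葛亮": "孔明", "孔明曰": "孔明",
    "关公": "关羽", "云长": "关羽",
    "玄德": "刘备", "玄德曰": "刘备",
    "孟德": "曹操", "丞相": "曹操",
}

def count_names(words):
    """Count multi-character words after merging aliases (hypothetical helper)."""
    counts = {}
    for word in words:
        if len(word) == 1:
            continue  # skip single-character tokens, as in 10.4
        # Fall back to the word itself when it has no alias
        rword = ALIASES.get(word, word)
        counts[rword] = counts.get(rword, 0) + 1
    return counts

# Hypothetical tokens standing in for jieba.lcut(txt)
tokens = ["玄德曰", "孔明", "诸葛亮", "曰", "曹操", "丞相", "玄德"]
print(count_names(tokens))
```

Adding another alias then only requires a new dictionary entry rather than another branch.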
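- Design exercises (Chapter 6 programming exercises):
6.1.
from random import randint

# Candidate characters: the ten digits plus upper- and lower-case letters
L = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
for i in range(65, 123):
    # Skip ASCII codes 91-96, the non-letter characters between 'Z' and 'a'
    if 91 <= i <= 96:
        continue
    L.append(chr(i))
# Generate 10 passwords of 8 random characters each
for i in range(10):
    password = ''  # renamed from str to avoid shadowing the built-in type
    for j in range(8):
        password += L[randint(0, len(L) - 1)]
    print("Password", i + 1, "is", password)
Output: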
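The character table in 6.1 can also be taken from the standard `string` module, with `random.choice` picking each character. This is a minimal sketch under assumptions: `ALPHABET` and `make_password` are illustrative names, not part of the exercise.

```python
import random
import string

# Digits plus upper- and lower-case ASCII letters: 62 characters in total
ALPHABET = string.digits + string.ascii_letters

def make_password(length=8):
    """Build a random password from ALPHABET (hypothetical helper)."""
    return "".join(random.choice(ALPHABET) for _ in range(length))

# Ten 8-character passwords, as in exercise 6.1
passwords = [make_password() for _ in range(10)]
for i, pw in enumerate(passwords, start=1):
    print("Password", i, "is", pw)
```

The `random` module is fine for an exercise, but it is not designed for security purposes; for real passwords the standard library provides `secrets.choice` with the same calling pattern.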
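6.2.
def repeat(List):
    # Return True as soon as any element occurs more than once
    for i in List:
        if List.count(i) > 1:
            return True
    return False

def main():
    s = input("Enter the list elements (separated by commas): ")
    L = s.split(",")
    if repeat(L):
        print("At least one element appears more than once")
    else:
        print("Every element appears exactly once")

main()
Test results: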
1.
2.
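The `repeat` function in 6.2 calls `count` once per element, which scans the list each time. A set-based check gives the same answer in a single pass. The sketch below assumes the elements are hashable (true for the strings produced by `split`); the name `has_duplicates` is illustrative.

```python
def has_duplicates(items):
    """Return True if any element of items occurs more than once.

    Hypothetical helper; a set drops duplicates, so it is shorter than
    the list exactly when at least one element is repeated.
    """
    return len(set(items)) != len(items)

# Same input format as in exercise 6.2: comma-separated elements
print(has_duplicates("a,b,c,a".split(",")))
print(has_duplicates("a,b,c".split(",")))
```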
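6.4.
string = input("Enter a string: ")
counts = {}
for x in string:
    # Count every character, including spaces and punctuation
    counts[x] = counts.get(x, 0) + 1
items = list(counts.items())
# Sort the (character, count) pairs by count, highest first
items.sort(key=lambda x: x[1], reverse=True)
for x, count in items:
    print(x)
Test results:
In English text, the space and the vowel e occur relatively often; in Chinese text, punctuation marks occur often.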
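The observation about spaces can be checked on a short sample. The sketch below wraps the logic of 6.4 in a function so it can be tried without keyboard input; the name `chars_by_frequency` and the sample sentence are assumptions for illustration, and a single sentence is of course not representative of English text in general.

```python
def chars_by_frequency(text):
    """Return the distinct characters of text, most frequent first.

    Hypothetical helper for illustration.
    """
    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    # sorted() is stable, so characters with equal counts keep first-seen order
    return sorted(counts, key=counts.get, reverse=True)

# Assumed sample sentence
order = chars_by_frequency("the quick brown fox jumps over the lazy dog")
print(order[:3])
```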
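6.6.
Code 1:
import jieba

with open("红楼梦.txt", "r", encoding="utf-8") as f:
    txt = f.read()
# Segment the text into a list of words
words = jieba.lcut(txt)
counts = {}
for word in words:
    # Skip single-character tokens
    if len(word) == 1:
        continue
    else:
        counts[word] = counts.get(word, 0) + 1
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
for i in range(20):
    word, count = items[i]
    print(word)
Some of the multi-character words are not personal names, so the high-frequency words that are not names are removed from the dictionary.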
Improved code 2:
import jieba

with open("红楼梦.txt", "r", encoding="utf-8") as f:
    txt = f.read()
excludes = {'什么', '一个', '我们', '你们', '如今', '说道', '知道', '起来', '这里', '姑娘', '出来', '众人', '那里', '自己'}
words = jieba.lcut(txt)
counts = {}
for word in words:
    # Skip single-character tokens
    if len(word) == 1:
        continue
    else:
        counts[word] = counts.get(word, 0) + 1
# Remove high-frequency words that are not personal names
for word in excludes:
    del counts[word]
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
for i in range(20):
    word, count = items[i]
    print(word)
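The final loop indexes `items[i]` for a fixed number of positions, which raises `IndexError` if the dictionary holds fewer entries than that. Slicing the sorted list avoids the problem. This is a sketch under assumptions: `top_n` is an illustrative name, and the small `counts` dictionary is made-up sample data, not a result from 红楼梦.txt.

```python
def top_n(counts, n=20):
    """Return up to n (word, count) pairs, highest count first.

    Hypothetical helper for illustration.
    """
    items = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    # A slice never fails, even when fewer than n items exist
    return items[:n]

# Made-up sample counts for illustration only
counts = {"宝玉": 3, "凤姐": 2, "贾母": 1}
for word, count in top_n(counts, 20):
    print(word, count)
```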