How to Extract All the Words from an English Article

License: CC 4.0 BY-SA

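This article describes a Python approach to text processing: it reads a file, cleans each word, converts it to lowercase, and removes duplicates in order to count the distinct words.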

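import string

filename = 'article.txt'  # placeholder path; point this at your own text file
words = []  # distinct words, in order of first appearance
count = 0
with open(filename) as fin:
    for line in fin:
        # Treat hyphens as word separators
        line = line.replace('-', ' ')
        for word in line.split():
            # Strip surrounding punctuation and whitespace, then normalize case
            word = word.strip(string.punctuation + string.whitespace)
            word = word.lower()
            # Skip tokens that were pure punctuation; keep only unseen words
            if word and word not in words:
                words.append(word)
                count = count + 1
                print(count, word)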
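
The same steps can also be sketched as a reusable function. This is only one possible variant, not the article's original code: the name `unique_words` and the sample lines are made up for illustration, and a `set` is used for the membership check so that lookups stay fast on long texts, while a list preserves the order in which words first appear.

```python
import string


def unique_words(lines):
    """Return the distinct words in `lines`, in order of first appearance.

    `lines` can be any iterable of strings, e.g. an open file object.
    (Hypothetical helper, introduced here for illustration only.)
    """
    seen = set()   # fast membership test
    ordered = []   # keeps first-appearance order
    for line in lines:
        # Treat hyphens as word separators, then split on whitespace
        for token in line.replace('-', ' ').split():
            word = token.strip(string.punctuation + string.whitespace).lower()
            # Ignore tokens that were nothing but punctuation
            if word and word not in seen:
                seen.add(word)
                ordered.append(word)
    return ordered


# Made-up sample input; a file opened with open(...) works the same way
sample = ["The quick-brown fox.", "the FOX -- jumps!"]
for number, word in enumerate(unique_words(sample), start=1):
    print(number, word)
```

Because the function returns a list, the total number of distinct words is simply `len(unique_words(...))`.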
