Counting the words in a text file with Python

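This post gives a short Python script that counts how many times each word occurs in a given text file and prints the words sorted by frequency. The script strips punctuation before counting and also reports the number of distinct words.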
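As a rough illustration of the steps just described, the sketch below runs them on an in-memory string rather than a file. It assumes Python 3; the sample sentence and the names `sample`, `words`, `counts`, and `ranked` are made up for this illustration and are not taken from the script.

```python
# Hypothetical sample text; the script itself reads file.txt instead.
sample = "The cat saw the dog. The dog ran!"

# Lowercase, then swap punctuation for spaces so "dog." counts as "dog".
text = sample.lower()
for ch in '.!':
    text = text.replace(ch, ' ')
words = text.split()

# Tally how many times each word occurs.
counts = {}
for w in words:
    counts[w] = counts.get(w, 0) + 1

# Most frequent first; ties are broken alphabetically.
ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))

print(len(words))   # total words: 8
print(len(counts))  # distinct words: 5
print(ranked[:2])   # [('the', 3), ('dog', 2)]
```

The total counts every occurrence, while the distinct count is simply the number of keys in the dictionary.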

Reposted; I can no longer find the original source.

This one is also worth a look: https://code.google.com/p/pyzh/

The script counts the words in a text file; it reads its input from file.txt in the current directory.

import sys


def sort_key(item):
    # Sort by count, highest first; break ties alphabetically by word.
    word, count = item
    return (-count, word)


def main():
    fname = "file.txt"

    try:
        with open(fname, 'r') as f:
            text = f.read().lower()
    except (OSError, UnicodeDecodeError):
        print("\nfile.txt does not exist, or there was a read error!")
        sys.exit(1)

    # Replace punctuation with spaces so it does not stick to words.
    for ch in '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~':
        text = text.replace(ch, ' ')
    words = text.split()

    # Tally how many times each word occurs.
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1

    try:
        n = int(input("\nNumber of top words to show: "))
    except ValueError:
        print("Please enter a whole number.")
        sys.exit(1)
    items = sorted(counts.items(), key=sort_key)

    total = len(items)
    print("\nTotal words: " + str(len(words)))
    print("Distinct words (duplicates removed): " + str(total))
    print("\n")
    # Clamp n to the number of distinct words available.
    n = max(0, min(n, total))
    for word, count in items[:n]:
        print("%-10s%5d" % (word, count))


if __name__ == '__main__':
    main()
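
For comparison, the same counting can be written more compactly with the standard library's `collections.Counter` and `str.translate`. This is only a sketch, assuming Python 3; the helper names `count_words` and `top_words` and the constants `PUNCTUATION` and `TO_SPACES` are hypothetical and not part of the script above. The explicit sort is kept because `Counter.most_common` orders tied words by first appearance rather than alphabetically.

```python
from collections import Counter

# Same punctuation set as the script above (the apostrophe is left alone).
PUNCTUATION = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~'
# Translation table mapping each punctuation character to a space.
TO_SPACES = str.maketrans(PUNCTUATION, ' ' * len(PUNCTUATION))


def count_words(text):
    """Return a Counter mapping each lowercased word to its frequency."""
    return Counter(text.lower().translate(TO_SPACES).split())


def top_words(counts, n):
    """Return the n most frequent (word, count) pairs, ties in alphabetical order."""
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:max(0, n)]


# Hypothetical sample text, used only to show the calls.
demo = count_words("one fish, two fish; red fish!")
print(sum(demo.values()))  # total words: 6
print(len(demo))           # distinct words: 4
print(top_words(demo, 2))  # [('fish', 3), ('one', 1)]
```

To use it on a file, read the contents first (for example with `open("file.txt").read()`) and pass the resulting string to `count_words`.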