Google Python Class: A WordCount Implementation with Command-Line Arguments

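This post walks through a small exercise from Google's Python Class that implements word counting. The script reads a file, counts how many times each word appears, and prints the words with their counts in one of two ways: every word in alphabetical order, or only the 20 most frequent words.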
#!/usr/bin/python -tt
# Copyright 2010 Google Inc.
# Licensed under the Apache License, Version 2.0
# http://www.apache.org/licenses/LICENSE-2.0

# Google's Python Class
# http://code.google.com/edu/languages/google-python-class/

"""Wordcount exercise
Google's Python class

The main() below is already defined and complete. It calls print_words()
and print_top() functions which you write.

1. For the --count flag, implement a print_words(filename) function that counts
how often each word appears in the text and prints:
word1 count1
word2 count2
...

Print the above list in order sorted by word (python will sort punctuation to
come before letters -- that's fine). Store all the words as lowercase,
so 'The' and 'the' count as the same word.

2. For the --topcount flag, implement a print_top(filename) which is similar
to print_words() but which prints just the top 20 most common words sorted
so the most common word is first, then the next most common, and so on.

Use str.split() (no arguments) to split on all whitespace.

Workflow: don't build the whole program at once. Get it to an intermediate
milestone and print your data structure and sys.exit(0).
When that's working, try for the next milestone.

Optional: define a helper function to avoid code duplication inside
print_words() and print_top().

"""

import sys
from operator import itemgetter

# +++your code here+++
# Define print_words(filename) and print_top(filename) functions.
# You could write a helper utility function that reads a file
# and builds and returns a word/count dict for it.
# Then print_words() and print_top() can just call the utility function.

###


def print_words(filename):
    # Count every whitespace-separated word, lowercased so that
    # 'The' and 'the' are tallied as the same word.
    dict_words = {}
    f = open(filename, 'rU')
    for line in f:
        for a in line.split():
            a = a.lower()
            if a in dict_words:
                dict_words[a] += 1
            else:
                dict_words[a] = 1
    f.close()

    # Sort the (word, count) pairs alphabetically by word and print them.
    sorted_dict_words = sorted(dict_words.items(), key=itemgetter(0))
    for word, count in sorted_dict_words:
        print word + " " + str(count)


def print_top(filename):
    # Build the same lowercase word -> count dict as print_words().
    dict_words = {}
    f = open(filename, 'rU')
    for line in f:
        for a in line.split():
            # Lowercase inside the loop, once 'a' is actually bound.
            a = a.lower()
            if a in dict_words:
                dict_words[a] += 1
            else:
                dict_words[a] = 1
    f.close()

    # Sort by count, largest first, and print only the top 20 words.
    sorted_dict_words = sorted(dict_words.items(), key=itemgetter(1), reverse=True)
    for word, count in sorted_dict_words[:20]:
        print word + " " + str(count)


# This basic command line argument parsing code is provided and
# calls the print_words() and print_top() functions which you must define.
def main():
    if len(sys.argv) != 3:
        print 'usage: ./wordcount.py {--count | --topcount} file'
        sys.exit(1)

    option = sys.argv[1]
    filename = sys.argv[2]
    if option == '--count':
        print_words(filename)
    elif option == '--topcount':
        print_top(filename)
    else:
        print 'unknown option: ' + option
        sys.exit(1)

if __name__ == '__main__':
  main()
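
The exercise notes suggest an optional helper function so that print_words() and print_top() do not duplicate the counting loop. The following is a minimal sketch of that idea, not part of the original solution. It swaps the hand-written dict for collections.Counter, and the names count_words, read_word_counts, format_words and format_top are hypothetical. It also assumes the input file is UTF-8 text and breaks ties between equally frequent words alphabetically, neither of which the exercise specifies. It is written to run on both Python 2.7 and Python 3.

```python
from __future__ import print_function

import io
from collections import Counter


def count_words(text):
    """Return a Counter mapping each lowercased word to its count."""
    # str.split() with no arguments splits on all whitespace.
    return Counter(word.lower() for word in text.split())


def read_word_counts(filename):
    """Read a file (assumed to be UTF-8 text) and count its words."""
    with io.open(filename, encoding='utf-8') as f:
        return count_words(f.read())


def format_words(counts):
    """Return 'word count' lines sorted alphabetically by word."""
    return ['%s %d' % (word, counts[word]) for word in sorted(counts)]


def format_top(counts, n=20):
    """Return 'word count' lines for the n most common words."""
    # Sort by descending count; ties are broken alphabetically
    # (an assumption, the exercise does not define tie order).
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ['%s %d' % (word, count) for word, count in ranked[:n]]
```

With these helpers, print_words(filename) could reduce to printing each line of format_words(read_word_counts(filename)), and print_top(filename) to printing each line of format_top(read_word_counts(filename)). Returning lists of lines instead of printing directly also makes the logic easy to check without a file on disk.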
