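Implement a wordcount function that counts how many times each word occurs in an English string and returns a dictionary whose keys are the words and whose values are their counts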

qq_42427964

Last modified 2024-08-23 16:40:57

Tags: python

First published 2024-08-23 16:37:18

Article link: https://blog.youkuaiyun.com/qq_42427964/article/details/141467587

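Implement a wordcount function that counts how many times each word occurs in an English string. Return a dictionary in which each key is a word and each value is the number of times that word occurs.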

Example:
Input:

"""Hello world!  
This is an example.  
Word count is fun.  
Is it fun to count words?  
Yes, it is fun!"""

Output:

{'hello': 1, 'world': 1, 'this': 1, 'is': 4, 'an': 1, 'example': 1, 'word': 1, 'count': 2,
'fun': 3, 'it': 2, 'to': 1, 'words': 1, 'yes': 1}

Concepts needed to write the code:

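.split(' ') returns a list; the argument in the parentheses is the separator
''.join(a) returns a str; the string in the quotes is the joiner placed between the elements
Hash table: dict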

 
# s is one word; hash_tables is the dict holding the counts so far
if s in hash_tables:
    hash_tables[s] += 1
else:
    hash_tables[s] = 1
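
The three notes above (split, join, and counting with a dict) can be tried together in a small sketch. The sample string and the variable names are made up for illustration and are not part of the original solution.

```python
# Hypothetical sample data, used only for illustration.
line = "is it fun it is"

# split(' ') cuts the string at every single space and returns a list.
parts = line.split(' ')
print(parts)  # ['is', 'it', 'fun', 'it', 'is']

# join does the opposite: the string it is called on goes between the elements.
print('-'.join(parts))  # is-it-fun-it-is

# Two spaces in a row make split(' ') return an empty string in the middle.
print("a  b".split(' '))  # ['a', '', 'b']

# Count each word with a plain dict used as a hash table.
counts = {}
for s in parts:
    if s in counts:
        counts[s] += 1
    else:
        counts[s] = 1
print(counts)  # {'is': 2, 'it': 2, 'fun': 1}
```

The empty string in the third result is the reason consecutive spaces have to be dealt with before splitting on ' '.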

dict.get

# counts is the dict of word counts; get() returns None when the key is absent
if counts.get(key) is None:
    counts[key] = 1
else:
    counts[key] += 1
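
As a possible shortening of the check above, dict.get also accepts a default value as its second argument, so the None test can be folded into one line. The function name count_with_get is hypothetical and only serves this sketch.

```python
def count_with_get(words):
    """Count words using dict.get with a default of 0 (illustrative helper)."""
    counts = {}
    for word in words:
        # get(word, 0) returns the current count, or 0 if the word is new.
        counts[word] = counts.get(word, 0) + 1
    return counts


print(count_with_get(['is', 'fun', 'is']))  # {'is': 2, 'fun': 1}
```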
	
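Use defaultdict(int) to simplify counting with a dictionary. A missing key starts at 0, so each word's count can be incremented directly while iterating over the words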

from collections import defaultdict

# Initialize
word_count = defaultdict(int)
# Assign and count occurrences
for word in words:
    word_count[word] += 1
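
A self-contained version of the defaultdict idea might look like the sketch below. The word list is made-up sample data, and the comparison with collections.Counter is an addition for illustration, not something the original solution uses.

```python
from collections import Counter, defaultdict

# Hypothetical sample data.
words = ['it', 'is', 'fun', 'is', 'it', 'fun', 'fun']

# int() returns 0, so a word seen for the first time starts from 0.
word_count = defaultdict(int)
for word in words:
    word_count[word] += 1

print(dict(word_count))  # {'it': 2, 'is': 2, 'fun': 3}

# Counter from the standard library produces the same counts in one call.
print(dict(Counter(words)) == dict(word_count))  # True
```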

Regular expressions

# Remove punctuation by replacing it with ''
re.sub(r'[^\w\s]', '', text)
# Convert to lowercase
.lower()
# Collapse consecutive whitespace into a single space
re.sub(r'\s+', ' ', text)

Removing '\n'

replace("\n", " ")
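
Put together, the cleanup steps above could be chained roughly as follows. The function name normalize and the sample strings are assumptions made for this sketch. Since \s also matches '\n', the whitespace-collapsing step already turns newlines into spaces here, so the separate replace call is left out.

```python
import re


def normalize(text):
    """Lowercase, drop punctuation, and collapse whitespace (illustrative)."""
    text = text.lower()
    # Remove every character that is neither a word character nor whitespace.
    text = re.sub(r'[^\w\s]', '', text)
    # \s matches spaces, tabs, and newlines, so each run becomes one space.
    text = re.sub(r'\s+', ' ', text)
    # Drop any space left at the very start or end.
    return text.strip()


print(normalize("Yes, it is fun!  \n  Is it?"))  # yes it is fun is it
print(normalize("Don't stop"))  # dont stop
```

The second call shows one side effect worth knowing about: the apostrophe counts as punctuation, so "don't" becomes "dont".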

Approach:

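How words are delimited:
a line may end with punctuation, possibly after a space, followed by '\n',
or with no punctuation and only spaces followed by '\n', or with '\n' alone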

If the text starts with '\n':

if input_str.startswith('\n'):

the '\n' becomes a space
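
To see why a leading '\n' deserves attention, the following sketch replaces it with a space and then splits. The sample text is hypothetical; strip() and the no-argument split() are shown as two possible remedies, not as the original author's choice.

```python
# Hypothetical input that begins with a newline.
text = "\nhello world"

if text.startswith('\n'):
    as_space = text.replace('\n', ' ')
    # The leading space produces an empty string as the first "word".
    print(as_space.split(' '))  # ['', 'hello', 'world']
    # Stripping first, or calling split() with no argument, avoids that.
    print(as_space.strip().split(' '))  # ['hello', 'world']
    print(as_space.split())  # ['hello', 'world']
```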

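If a line ends with word + punctuation + '\n' + word (on the next line): removing the punctuation leaves '\n', and replacing '\n' with a space gives word + space + word, provided the word after the newline starts at the left margin.
If a line ends with word + '\n' + word: the result is word + space + word.
If a line ends with word + punctuation + '\n' + space + word:
removing the punctuation leaves '\n', and replacing '\n' with a space gives word + space + space + word.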

So the last step is to collapse consecutive spaces into a single space
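
The three cases above can be checked with a short sketch. The helper name clean and the sample strings are invented for illustration; it returns both the intermediate string (after the newline has become a space) and the final string with spaces collapsed.

```python
import re


def clean(text):
    """Return (text after newline replacement, text after collapsing)."""
    no_punct = re.sub(r'[^\w\s]', '', text)
    spaced = no_punct.replace('\n', ' ')
    collapsed = re.sub(r'\s+', ' ', spaced)
    return spaced, collapsed


# word + punctuation + '\n' + word, word + '\n' + word,
# and word + punctuation + '\n' + space + word
cases = ["fun!\nIs", "fun\nIs", "fun!\n Is"]
for case in cases:
    spaced, collapsed = clean(case)
    print(repr(spaced), '->', repr(collapsed))
# 'fun Is' -> 'fun Is'
# 'fun Is' -> 'fun Is'
# 'fun  Is' -> 'fun Is'
```

Only the third case yields two spaces in a row, which is exactly what the final collapsing step removes.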

import re


def remove_punctuation(input_str: str) -> list:
    """Strip punctuation, normalize whitespace, and split into words."""
    # Delete every character that is neither a word character nor whitespace.
    strs = re.sub(r'[^\w\s]', '', input_str)
    # Turn newlines into spaces so words on adjacent lines stay separated.
    strs = strs.replace("\n", " ")
    # Collapse runs of whitespace into a single space.
    strs = re.sub(r'\s+', ' ', strs)
    # split() with no argument also ignores leading and trailing whitespace,
    # so no empty strings end up in the list.
    words = strs.split()
    print(words)
    return words


def lowercase(input_str: str) -> str:
    """Return the lowercased text, printing it for inspection."""
    lowered = input_str.lower()
    print(lowered)
    return lowered


def wordcount(S: list) -> dict:
    """Count how many times each word in the list S occurs."""
    hash_tables = {}
    for s in S:
        if s in hash_tables:
            hash_tables[s] += 1
        else:
            hash_tables[s] = 1
    return hash_tables


if __name__ == '__main__':
    text = """Hello world!  
              This is an example.  
              Word count is fun.  
              Is it fun to count words?  
              Yes, it is fun!"""

    text = lowercase(text)
    words = remove_punctuation(text)
    print(f'\nwordcount is:{wordcount(words)}')
Final output:
```python
hello world!  
              this is an example.  
              word count is fun.  
              is it fun to count words?  
              yes, it is fun!
['hello', 'world', 'this', 'is', 'an', 'example', 'word', 'count', 'is', 'fun', 'is', 'it', 'fun', 'to', 'count', 'words', 'yes', 'it', 'is', 'fun']

wordcount is:{'hello': 1, 'world': 1, 'this': 1, 'is': 4, 'an': 1, 'example': 1, 'word': 1, 'count': 2, 'fun': 3, 'it': 2, 'to': 1, 'words': 1, 'yes': 1}
```