Character frequency

Latest recommended article published on 2022-07-21 07:04:06

kaikaijia


License: CC 4.0 BY-SA

Category: codewars

Article link: https://blog.youkuaiyun.com/kaikaijia/article/details/39933981


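This post presents a Python function that counts how often each letter occurs in a given text, ignoring digits, spaces, and punctuation. The function lowercases the letters and sorts the result, which makes it suitable for frequency analysis of simple substitution ciphers.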


Kata URL: http://www.codewars.com/kata/53e895e28f9e66a56900011a/train/python

Write a function that takes a piece of text in the form of a string and returns the letter frequency count for the text. This count excludes numbers, spaces and all punctuation marks. Upper and lower case versions of a character are equivalent and the result should all be in lowercase.

The function should return a list of tuples sorted by the most frequent letters first. Letters with the same frequency are ordered alphabetically.
For example:
letter_frequency('aaAabb dddDD hhcc')
will return
[('d',5), ('a',4), ('b',2), ('c',2), ('h',2)]

Letter frequency analysis is often used to analyse simple substitution cipher texts like those created by the Caesar cipher.
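To make the cipher remark concrete, here is a minimal sketch of how a frequency count can suggest a Caesar shift. It is not part of the kata: the helper names `caesar_shift` and `guess_caesar_shift` are made up for illustration, and the guess rests on the assumption that the most frequent plaintext letter is 'e', which holds for typical English text but can easily fail on short messages.

```python
from collections import Counter


def caesar_shift(text, shift):
    """Shift ASCII letters by `shift` positions; leave everything else alone."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)


def guess_caesar_shift(ciphertext, assumed_top_letter="e"):
    """Guess the shift, assuming the most frequent cipher letter stands for 'e'."""
    counts = Counter(ch for ch in ciphertext.lower() if "a" <= ch <= "z")
    if not counts:
        return 0  # no letters, nothing to analyse
    # Most frequent letter; ties are broken alphabetically, as in the kata
    top_letter = min(counts, key=lambda ch: (-counts[ch], ch))
    return (ord(top_letter) - ord(assumed_top_letter)) % 26


secret = caesar_shift("meet me at the elm tree", 3)
shift = guess_caesar_shift(secret)
print(shift, caesar_shift(secret, -shift))
```

The sample sentence was chosen so that 'e' really is its most common letter; on real cipher text the guess is only a starting point to be checked by reading the decrypted result.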

The code, with fairly detailed comments:

def letter_frequency(text):
    ans = []
    dic = {}
    # Lowercase once up front; calling lower() on every iteration repeats the work
    text = text.lower()

    for alp in text:
        # Skip anything that is not a letter
        if not alp.isalpha():
            continue

        # Count the letters in a dictionary (has_key() no longer exists in Python 3)
        if alp in dic:
            dic[alp] += 1
        else:
            dic[alp] = 1

    # Store the dictionary items in a list as (count, letter) tuples
    for k, v in dic.items():
        ans.append((v, k))
    # Sort by frequency, highest first
    ans.sort(reverse=True)

    # For equal frequencies, order alphabetically
    lenOfAns = len(ans)
    for i in range(0, lenOfAns - 1):
        for j in range(i + 1, lenOfAns):
            if ans[i][0] == ans[j][0] and ans[i][1] > ans[j][1]:
                ans[i], ans[j] = ans[j], ans[i]

    # Swap each tuple back to (letter, count)
    nans = []
    for i in range(0, lenOfAns):
        nans.append((ans[i][1], ans[i][0]))

    return nans
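
The kata's ordering rule (most frequent first, ties broken alphabetically) can also be written as a single sort key. The following is a sketch of a more compact variant, not the solution from this post: the name `letter_frequency_compact` is invented here, and it relies on the standard library's `collections.Counter` for the counting.

```python
from collections import Counter


def letter_frequency_compact(text):
    # Count only letters, lowercased so 'A' and 'a' are treated as the same letter
    counts = Counter(ch for ch in text.lower() if ch.isalpha())
    # Negating the count sorts frequencies in descending order,
    # while the letter itself breaks ties in ascending alphabetical order
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


print(letter_frequency_compact('aaAabb dddDD hhcc'))
# [('d', 5), ('a', 4), ('b', 2), ('c', 2), ('h', 2)]
```

Sorting on the tuple `(-count, letter)` does both orderings in one pass, so the separate tie-breaking loop and the final tuple swap are not needed.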