地址:http://www.codewars.com/kata/53e895e28f9e66a56900011a/train/python
Write a function that takes a piece of text in the form of a string and returns the letter frequency count for the text. This count excludes numbers, spaces and all punctuation marks. Upper and lower case versions of a character are equivalent and the result
should all be in lowercase.
The function should return a list of tuples sorted by the most frequent letters first. Letters with the same frequency are ordered alphabetically.
For example:
letter_frequency('aaAabb dddDD hhcc')
will return
[('d',5), ('a',4), ('b',2), ('c',2), ('h',2)]
Letter frequency analysis is often used to analyse simple substitution cipher texts like those created by the Caesar cipher.
代码,注释比较详细:
def letter_frequency(text):
ans = []
dic = {}
#长度计算放在循环里效率低
lenOfText = len(text)
for i in range(0,lenOfText):
#提前处理成小写
alp = text.lower()[i]
#非字母不统计
if alp.isalpha() == False:
continue
#用字典统计字母个数
if dic.has_key(alp):
dic[alp] += 1
else:
dic[alp] = 1
#反转字典元素存入list
for k,v in dic.items():
ans.append((v,k))
#按出现频率由高到底排序
ans.sort(reverse=True)
#频次相同,按字母序
lenOfAns = len(ans)
for i in range(0,lenOfAns-1):
for j in range(i+1,lenOfAns):
if ans[i][:1] == ans[j][:1] and ans[i][-1:] > ans[j][-1:]:
tmp = ans[i]
ans[i] = ans[j]
ans[j] = tmp
#交换字母和频次位置
nans = []
for i in range(0,lenOfAns):
nans.append((ans[i][1],ans[i][0]))
return nans