24. Letter Frequency Analysis and Breaking the Vigenère Cipher

火锅底料102


License: CC 4.0 BY-SA

Column: Python Cryptography in Practice: Getting Started · Tags: Vigenère cipher, frequency analysis, letter frequency

Article link: https://blog.youkuaiyun.com/spark7igniter/article/details/154371853


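Frequency analysis is one of the most important techniques in codebreaking. This article explains in detail how to use frequency analysis to break the Vigenère cipher and walks through the corresponding Python code.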

Approach to Breaking the Vigenère Cipher

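Our approach is to take the letters that were encrypted with a single subkey, decrypt them with each possible subkey, and run frequency analysis on every result to see which decrypted text has a letter frequency closest to that of ordinary English. In other words, we look for the decryption with the highest frequency match score; the subkey that produced it is most likely the correct one.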
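Since every key-length-th letter is encrypted with the same subkey, the letters belonging to one subkey can be pulled out with a slice. The sketch below is a minimal illustration; the helper name get_subkey_letters is hypothetical, and it assumes that non-letter characters are dropped first so they do not advance the key position.

```python
LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'


def get_subkey_letters(ciphertext, n, key_length):
    # Hypothetical helper: keep only letters (assumption: non-letters do not
    # use up a key position), then take every key_length-th letter starting
    # at position n (0-based). All of these were encrypted with subkey n.
    letters = ''.join(ch for ch in ciphertext.upper() if ch in LETTERS)
    return letters[n::key_length]


print(get_subkey_letters('ABC DEF GHIJ', 0, 5))  # AF
print(get_subkey_letters('ABC DEF GHIJ', 1, 5))  # BG
```

Each such group is effectively a Caesar-shifted text, which is why only 26 candidates per subkey need to be tried.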

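Suppose the key is 5 letters long (how to determine the key length is covered later). Each Vigenère subkey has 26 possibilities, one per letter of the alphabet, so the computer only needs 26 + 26 + 26 + 26 + 26 = 130 decryptions. That is far less work than trying every combination of subkeys, which would take 26 × 26 × 26 × 26 × 26 = 11,881,376 decryptions.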
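The difference comes from adding the per-subkey work instead of multiplying it. A quick check of both numbers, assuming the 5-letter key length used in the text:

```python
key_length = 5                    # assumed key length from the example above

per_subkey = 26 * key_length      # try 26 candidates for each subkey separately
brute_force = 26 ** key_length    # try every combination of all subkeys at once

print(per_subkey, brute_force)    # 130 11881376
```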

Functions in the Frequency Analysis Module

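To perform frequency analysis, we write a module containing the following useful functions:
- getLetterCount(): takes a string argument and returns a dictionary recording how many times each letter appears in the string.
- getFrequencyOrder(): takes a string argument and returns a 26-letter string with the letters arranged from most to least frequent in that string.
- englishFreqMatchScore(): takes a string argument and returns an integer from 0 to 12 indicating how well the string's letter frequency matches English letter frequency.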

Code Implementation

The following Python code implements these functions:

```python
# Frequency Finder
# https://www.nostarch.com/crackingcodes/ (BSD Licensed)

ETAOIN = 'ETAOINSHRDLCUMWFGYPBVKJXQZ'
LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

def getLetterCount(message):
    # Returns a dictionary with keys of single letters and values of the
    # count of how many times they appear in the message parameter:
    letterCount = {'A': 0, 'B': 0, 'C': 0, 'D': 0, 'E': 0, 'F': 0,
                   'G': 0, 'H': 0, 'I': 0, 'J': 0, 'K': 0, 'L': 0, 'M': 0, 'N': 0,
                   'O': 0, 'P': 0, 'Q': 0, 'R': 0, 'S': 0, 'T': 0, 'U': 0, 'V': 0,
                   'W': 0, 'X': 0, 'Y': 0, 'Z': 0}

    for letter in message.upper():
        if letter in LETTERS:
            letterCount[letter] += 1

    return letterCount

def getItemAtIndexZero(items):
    return items[0]

def getFrequencyOrder(message):
    # Returns a string of the alphabet letters arranged in order of most
    # frequently occurring in the message parameter.

    # First, get a dictionary of each letter and its frequency count:
    letterToFreq = getLetterCount(message)

    # Second, make a dictionary of each frequency count to the letter(s)
    # with that frequency:
    freqToLetter = {}
    for letter in LETTERS:
        if letterToFreq[letter] not in freqToLetter:
            freqToLetter[letterToFreq[letter]] = [letter]
        else:
            freqToLetter[letterToFreq[letter]].append(letter)

    # Third, put each list of letters in reverse "ETAOIN" order, and then
    # convert it to a string:
    for freq in freqToLetter:
        freqToLetter[freq].sort(key=ETAOIN.find, reverse=True)
        freqToLetter[freq] = ''.join(freqToLetter[freq])

    # Fourth, convert the freqToLetter dictionary to a list of
    # tuple pairs (key, value), and then sort them:
    freqPairs = list(freqToLetter.items())
    freqPairs.sort(key=getItemAtIndexZero, reverse=True)

    # Fifth, now that the letters are ordered by frequency, extract all
    # the letters for the final string:
    freqOrder = []
    for freqPair in freqPairs:
        freqOrder.append(freqPair[1])

    return ''.join(freqOrder)

def englishFreqMatchScore(message):
    # Return the number of matches that the string in the message
    # parameter has when its letter frequency is compared to English
    # letter frequency. A "match" is how many of its six most frequent
    # and six least frequent letters are among the six most frequent and
    # six least frequent letters for English.
    freqOrder = getFrequencyOrder(message)

    matchScore = 0
    # Find how many matches for the six most common letters there are:
    for commonLetter in ETAOIN[:6]:
        if commonLetter in freqOrder[:6]:
            matchScore += 1
    # Find how many matches for the six least common letters there are:
    for uncommonLetter in ETAOIN[-6:]:
        if uncommonLetter in freqOrder[-6:]:
            matchScore += 1

    return matchScore
```

Code Explanation

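Storing the letter frequency order:
- The ETAOIN variable holds the 26 letters of the alphabet ordered from most to least frequent in English: ETAOINSHRDLCUMWFGYPBVKJXQZ. Not every English text follows this order exactly, but it is accurate enough in most cases.
- The LETTERS variable holds all uppercase letters, ABCDEFGHIJKLMNOPQRSTUVWXYZ. It serves as a mapping between letters and integer indexes, and the functions above also use it to check whether a character is a letter and to iterate over the alphabet.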
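As a small illustration of the letter/index mapping: find() turns a letter into its index, and indexing turns an index back into a letter. The last lines sketch how the same mapping can shift a letter by a subkey, assuming the usual Vigenère convention that subkey A shifts by 0, B by 1, and so on.

```python
LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

# Letter -> index and index -> letter
print(LETTERS.find('C'))  # 2
print(LETTERS[2])         # C

# Shifting 'Y' forward by subkey 'C' (index 2) wraps around the alphabet with % 26
shifted = LETTERS[(LETTERS.find('Y') + LETTERS.find('C')) % 26]
print(shifted)            # A
```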
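Counting the letters in a message:
- The getLetterCount() function takes a message string and returns a dictionary whose keys are single uppercase letters and whose values are how many times each letter appears in the message. The message is uppercased first, so 'a' and 'A' are both counted under 'A', and characters outside A-Z are ignored.
- Example: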

```python
message = """Alan Mathison Turing was a British mathematician, logician, cryptanalyst, and computer
scientist. He was highly influential in the development of computer science, providing a
formalisation of the concepts of "algorithm" and "computation" with the Turing machine. Turing
is widely considered to be the father of computer science and artificial intelligence. During
World War II, Turing worked for the Government Code and Cypher School (GCCS) at Bletchley Park,
Britain's codebreaking centre. For a time he was head of Hut 8, the section responsible for
German naval cryptanalysis. He devised a number of techniques for breaking German ciphers,
including the method of the bombe, an electromechanical machine that could find settings
for the Enigma machine. After the war he worked at the National Physical Laboratory, where
he created one of the first designs for a stored-program computer, the ACE. In 1948 Turing
joined Max Newman's Computing Laboratory at Manchester University, where he assisted in the
development of the Manchester computers and became interested in mathematical biology. He wrote
a paper on the chemical basis of morphogenesis, and predicted oscillating chemical reactions
such as the Belousov-Zhabotinsky reaction, which were first observed in the 1960s. Turing's
homosexuality resulted in a criminal prosecution in 1952, when homosexual acts were still
illegal in the United Kingdom. He accepted treatment with female hormones (chemical castration)
as an alternative to prison. Turing died in 1954, just over two weeks before his 42nd birthday,
from cyanide poisoning. An inquest determined that his death was suicide; his mother and some
others believed his death was accidental. On 10 September 2009, following an Internet campaign,
British Prime Minister Gordon Brown made an official public apology on behalf of the British
government for "the appalling way he was treated." As of May 2012 a private member's bill was
before the House of Lords which would grant Turing a statutory pardon if enacted."""

letter_count = getLetterCount(message)
print(letter_count)
```

- Output:

```
{'A': 135, 'B': 30, 'C': 74, 'D': 58, 'E': 196, 'F': 37, 'G': 39, 'H': 87,
'I': 139, 'J': 2, 'K': 8, 'L': 62, 'M': 58, 'N': 122, 'O': 113, 'P': 36,
'Q': 2, 'R': 106, 'S': 89, 'T': 140, 'U': 37, 'V': 14, 'W': 30, 'X': 3,
'Y': 21, 'Z': 1}
```
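The same counts can also be produced with collections.Counter from the standard library. This is an alternative sketch, not the original module's code; the function name get_letter_count_counter is hypothetical. Because a Counter simply omits letters that never appear, the dictionary is filled out to all 26 letters so the result has the same shape as getLetterCount().

```python
import string
from collections import Counter


def get_letter_count_counter(message):
    # Hypothetical alternative to getLetterCount(): count only A-Z after uppercasing.
    counts = Counter(ch for ch in message.upper() if ch in string.ascii_uppercase)
    # Counter returns 0 for missing keys, so every letter gets an entry.
    return {letter: counts[letter] for letter in string.ascii_uppercase}


counts = get_letter_count_counter('Hello, World!')
print(counts['L'], counts['O'], counts['Z'])  # 3 2 0
```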

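Getting the first element of a tuple:
- The getItemAtIndexZero() function takes a tuple and returns its first element. getFrequencyOrder() passes it as the key function when sorting its (frequency, letters) pairs.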
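For comparison, getItemAtIndexZero() behaves like operator.itemgetter(0) from the standard library. The sketch below sorts a few sample (count, letters) pairs taken from the Turing example both ways; the sample list itself is illustrative.

```python
from operator import itemgetter


def getItemAtIndexZero(items):
    return items[0]


# Illustrative (count, letters) pairs like those built inside getFrequencyOrder()
freq_pairs = [(58, 'MD'), (196, 'E'), (140, 'T')]

by_helper = sorted(freq_pairs, key=getItemAtIndexZero, reverse=True)
by_itemgetter = sorted(freq_pairs, key=itemgetter(0), reverse=True)
print(by_helper)  # [(196, 'E'), (140, 'T'), (58, 'MD')]
```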
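Sorting the letters in a message by frequency:
- The getFrequencyOrder() function takes a message string and returns a string of the 26 uppercase letters, ordered from the most to the least frequent in the message.
- It works in five steps:
  1. Count letter frequencies: call getLetterCount() to get the count for each letter.
  2. Build a frequency-to-letters dictionary: each frequency count is a key, and its value is the list of letters with that count.
  3. Sort each letter list in reverse ETAOIN order: letters that share a count are ordered from least to most common in English. This keeps ties from raising the match score by chance, because a tied common letter such as E is pushed later (away from the top six) and a tied rare letter such as Z is pulled earlier (away from the bottom six).
  4. Convert the dictionary into a list of (count, letters) tuples and sort it by count, highest first, using getItemAtIndexZero() as the sort key.
  5. Join the letters from the sorted tuples into the final string.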
- Example:

```python
frequency_order = getFrequencyOrder(message)
print(frequency_order)
```

- Output:

```
ETIANORSHCLMDGFUPBWYVKXQJZ
```
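The tie-breaking in step 3 is visible in this output: D and M both occur 58 times, yet M comes first. The snippet below reproduces that sort on its own; ETAOIN.find returns each letter's rank in English, and reverse=True puts the rarer letter first.

```python
ETAOIN = 'ETAOINSHRDLCUMWFGYPBVKJXQZ'

# D and M are tied at 58 in the Turing example; M is rarer in English, so it goes first.
print(sorted(['D', 'M'], key=ETAOIN.find, reverse=True))  # ['M', 'D']
# A tie between a very common and a very rare letter puts the rare one first.
print(sorted(['E', 'Z'], key=ETAOIN.find, reverse=True))  # ['Z', 'E']
```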

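Flowchart

```mermaid
graph TD;
    A[Start] --> B[Count letter frequencies];
    B --> C[Build the frequency-to-letters dictionary];
    C --> D[Sort each letter list in reverse ETAOIN order];
    D --> E[Convert the dictionary to a list of tuples and sort it];
    E --> F[Extract the final letter string];
    F --> G[End];
```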

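Summary

With the steps above, we have implemented letter frequency analysis for a message and sorted its letters by frequency. These functions play a key role in breaking the Vigenère cipher. The next part covers how to compute the match score between a message's letter frequency and English letter frequency, and how to use that score to break the Vigenère cipher.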


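Computing the Letter Frequency Match Score

The englishFreqMatchScore() function computes how well a message's letter frequency matches English letter frequency. Specifically, it checks how many of the message's six most frequent letters are among the six most frequent English letters, and how many of its six least frequent letters are among the six least frequent English letters.

The function (listed in the module code above) works as follows: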

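First, it calls getFrequencyOrder() to get freqOrder, the message's letters sorted by frequency.
Then it initializes the match score matchScore to 0.
Next, it loops over the six most common English letters, ETAOIN[:6], and adds 1 to the score for each one that also appears among the first six letters of freqOrder.
Finally, it loops over the six least common English letters, ETAOIN[-6:], adds 1 for each one that also appears among the last six letters of freqOrder, and returns the total.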
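To see the scoring rule in isolation, the sketch below applies it to frequency-order strings that are already computed. The helper score_frequency_order is hypothetical: it is just the comparison half of englishFreqMatchScore(), without the call to getFrequencyOrder(). The first input is the frequency order printed for the Turing text; the second is plain alphabetical order.

```python
ETAOIN = 'ETAOINSHRDLCUMWFGYPBVKJXQZ'


def score_frequency_order(freq_order):
    # Hypothetical helper: same comparison as englishFreqMatchScore(),
    # applied to a precomputed frequency-order string.
    score = 0
    for letter in ETAOIN[:6]:       # six most common English letters
        if letter in freq_order[:6]:
            score += 1
    for letter in ETAOIN[-6:]:      # six least common English letters
        if letter in freq_order[-6:]:
            score += 1
    return score


print(score_frequency_order('ETIANORSHCLMDGFUPBWYVKXQJZ'))  # 12
print(score_frequency_order('ABCDEFGHIJKLMNOPQRSTUVWXYZ'))  # 5
```

In the alphabetical case only A and E land in the top six, and V, X, and Z in the bottom six, giving 2 + 3 = 5.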

Example:

```python
match_score = englishFreqMatchScore(message)
print(match_score)
```

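The score ranges from 0 to 12 and helps us judge whether a decrypted text resembles ordinary English: the higher the score, the more likely the decryption is correct.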

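Summary of Function Calls

To make the relationships between the functions clearer, the table below summarizes them:

| Function | Purpose | Functions it uses |
| ---- | ---- | ---- |
| getLetterCount() | Counts how many times each letter appears in a message | None |
| getItemAtIndexZero() | Returns the first element of a tuple | None |
| getFrequencyOrder() | Sorts the letters of a message by frequency | getLetterCount(), getItemAtIndexZero() (as a sort key) |
| englishFreqMatchScore() | Scores how well a message's letter frequency matches English | getFrequencyOrder() |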

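Vigenère Cipher Cracking Process

Combining the frequency analysis described above, the process for breaking the Vigenère cipher is:
1. Guess the key length: for now, assume the key is 5 letters long.
2. Decrypt and analyze the letters for the first subkey:
   - Decrypt the letters encrypted with that subkey using each of the 26 possible subkeys.
   - Run frequency analysis on each result to find the decryption whose letter frequency best matches ordinary English, that is, the one with the highest frequency match score. The subkey that produced it is likely correct.
3. Repeat step 2 for the second, third, fourth, and fifth subkeys.
4. Determine the final key: combine the best-scoring subkey for each position into the final key.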
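The process above can be sketched end to end as follows. This is a minimal, self-contained illustration under stated assumptions, not the article's own cracker: the key length is assumed known, the key uses only A-Z, non-letters are dropped before splitting, and the compact freq_order, match_score, shift_text, vigenere_encrypt, and recover_key functions are hypothetical re-implementations (freq_order and match_score follow the same rules as getFrequencyOrder() and englishFreqMatchScore()).

```python
ETAOIN = 'ETAOINSHRDLCUMWFGYPBVKJXQZ'
LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'


def freq_order(text):
    # Count letters, then sort by count (high to low); ties go in reverse
    # ETAOIN order, mirroring getFrequencyOrder().
    counts = {letter: 0 for letter in LETTERS}
    for ch in text.upper():
        if ch in counts:
            counts[ch] += 1
    return ''.join(sorted(LETTERS,
                          key=lambda letter: (-counts[letter], -ETAOIN.find(letter))))


def match_score(text):
    # Same 0-12 scoring rule as englishFreqMatchScore().
    order = freq_order(text)
    score = sum(1 for letter in ETAOIN[:6] if letter in order[:6])
    score += sum(1 for letter in ETAOIN[-6:] if letter in order[-6:])
    return score


def shift_text(text, subkey, sign):
    # Shift each uppercase letter by the subkey's index; sign=+1 encrypts, -1 decrypts.
    offset = LETTERS.find(subkey)
    return ''.join(LETTERS[(LETTERS.find(ch) + sign * offset) % 26] for ch in text)


def vigenere_encrypt(plaintext, key):
    # Hypothetical demo helper for uppercase, letters-only plaintext.
    return ''.join(shift_text(ch, key[i % len(key)], +1)
                   for i, ch in enumerate(plaintext))


def recover_key(ciphertext, key_length):
    letters = ''.join(ch for ch in ciphertext.upper() if ch in LETTERS)
    key = ''
    for n in range(key_length):
        # Letters encrypted by subkey n: every key_length-th letter from position n.
        column = letters[n::key_length]
        # Try all 26 candidates; keep the first one with the highest score.
        key += max(LETTERS, key=lambda k: match_score(shift_text(column, k, -1)))
    return key


# Synthetic plaintext whose every column has exactly the ETAOIN frequency order
base = ''.join(ETAOIN[i] * (26 - i) for i in range(26))
plaintext = ''.join(ch * 3 for ch in base)
print(recover_key(vigenere_encrypt(plaintext, 'KEY'), 3))  # KEY
```

The synthetic plaintext is built so that the true subkey is the only candidate scoring 12 in each column. On real ciphertexts several candidates often tie for the top score, so a fuller cracker would keep every tied candidate and try their combinations instead of taking only the first maximum as this sketch does.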

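Flowchart

```mermaid
graph TD;
    A[Guess the key length] --> B[Take the letters encrypted by the next subkey];
    B --> C[Decrypt them with each of the 26 candidate subkeys];
    C --> D[Score each result with englishFreqMatchScore];
    D --> E[Record the highest-scoring subkey];
    E --> F{More subkeys?};
    F -- Yes --> B;
    F -- No --> G[Determine the final key];
    G --> H[End];
```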

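Summary

Letter frequency analysis lets us break the Vigenère cipher effectively. We first implemented the key functions: counting letter frequencies, sorting letters by frequency, and computing the frequency match score. We then described the cracking process, which determines the key one subkey at a time by decrypting and scoring each subkey's letters. For a 5-letter key this takes 130 decryptions instead of 11,881,376, a dramatic reduction in work. In practice, the way the key length is guessed can be adapted to the situation to further optimize the process.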