26. Vigenère Cipher Hacking Program Explained

火锅底料102

Published 2025-10-28 16:06:25

License: CC 4.0 BY-SA

Category: Hands-On Introduction to Cryptography with Python. Tags: Vigenère cipher, cipher cracking, Kasiski examination

Article link: https://blog.youkuaiyun.com/spark7igniter/article/details/154371857

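1. Introduction

The Vigenère cipher is a polyalphabetic substitution cipher. Because each letter of the key applies a different shift, it is harder to break than a monoalphabetic cipher such as the Caesar cipher. This article walks through a program that breaks the Vigenère cipher, covering how it works, its code, and a sample run.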
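
To make the cipher itself concrete before attacking it, here is a minimal sketch of Vigenère encryption and decryption. The function name `vigenere` and its `decrypt` flag are illustrative assumptions; this is not the `vigenereCipher` module the hacking program imports, although it is meant to follow the same idea of shifting each letter by the matching key letter and passing non-letters through unchanged.

```python
LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

def vigenere(text, key, decrypt=False):
    """Shift each letter of text by the matching key letter (hypothetical helper)."""
    out = []
    key_index = 0
    for ch in text:
        pos = LETTERS.find(ch.upper())
        if pos == -1:
            out.append(ch)  # Non-letters are copied as-is and use up no key letter.
            continue
        shift = LETTERS.find(key[key_index % len(key)].upper())
        new_pos = (pos - shift if decrypt else pos + shift) % len(LETTERS)
        new_ch = LETTERS[new_pos]
        out.append(new_ch if ch.isupper() else new_ch.lower())
        key_index += 1
    return ''.join(out)

ciphertext = vigenere('Attack at dawn', 'LEMON')
plaintext = vigenere(ciphertext, 'LEMON', decrypt=True)
print(ciphertext)  # Lxfopv ef rnhr
print(plaintext)   # Attack at dawn
```

Every fifth letter here is shifted by the same key letter, and that repetition is the weakness the hacking program exploits.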

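2. Brute-Forcing the Candidate Keys

Without first narrowing down the list of candidate subkeys (the individual key letters), breaking a Vigenère cipher is much harder: a key of length n has 26^n possibilities, so the gap widens quickly as the key gets longer. To brute-force the key, we try every combination of the candidate subkeys. Suppose this yields 50 candidate keys. The last step is to decrypt the full ciphertext with each of those 50 keys and see which one produces readable English plaintext. This is how the key for a ciphertext such as "PPQCA XQVEKG…" turns out to be "WICK".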
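
As a rough illustration of "every combination of the candidate subkeys", the sketch below builds all keys from a per-position list of candidate letters using `itertools.product`. The candidate letters are made up for the example; in the real program they come from frequency analysis.

```python
import itertools

# Hypothetical candidate letters for each of the four key positions.
candidates = ['WA', 'IE', 'CO', 'KT']

possible_keys = [''.join(combo) for combo in itertools.product(*candidates)]
print(len(possible_keys))  # 16, that is 2 * 2 * 2 * 2
print(possible_keys[:4])   # ['WICK', 'WICT', 'WIOK', 'WIOT']
```

The number of keys to test is the product of the list sizes, which is why keeping each candidate list short matters so much.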

3. Program Code and How to Run It

The code for the Vigenère hacking program is as follows:

# Vigenere Cipher Hacker
# https://www.nostarch.com/crackingcodes/ (BSD Licensed)

import itertools, re
import vigenereCipher, pyperclip, freqAnalysis, detectEnglish

LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
MAX_KEY_LENGTH = 16  # Will not attempt keys longer than this.
NUM_MOST_FREQ_LETTERS = 4  # Attempt this many letters per subkey.
SILENT_MODE = False  # If set to True, program doesn't print anything.
NONLETTERS_PATTERN = re.compile('[^A-Z]')


def main():
    # Instead of typing this ciphertext out, you can copy & paste it
    # from https://www.nostarch.com/crackingcodes/:
    ciphertext = """Adiz Avtzqeci Tmzubb wsa m Pmilqev halpqavtakuoi,
          lgouqdaf, kdmktsvmztsl, izr xoexghzr kkusitaaf. Vz wsa twbhdg
          ubalmmzhdad qz
          --snip--
          azmtmd'g widt ion bwnafz tzm Tcpsw wr Zjrva ivdcz eaigd yzmbo
          Tmzubb a kbmhptgzk dvrvwz wa efiohzd."""
    hackedMessage = hackVigenere(ciphertext)

    if hackedMessage != None:
        print('Copying hacked message to clipboard:')
        print(hackedMessage)
        pyperclip.copy(hackedMessage)
    else:
        print('Failed to hack encryption.')


def findRepeatSequencesSpacings(message):
    # Goes through the message and finds any 3- to 5-letter sequences
    # that are repeated. Returns a dict with the keys of the sequence and
    # values of a list of spacings (num of letters between the repeats).

    # Use a regular expression to remove non-letters from the message:
    message = NONLETTERS_PATTERN.sub('', message.upper())

    # Compile a list of seqLen-letter sequences found in the message:
    seqSpacings = {}  # Keys are sequences; values are lists of int spacings.
    for seqLen in range(3, 6):
        for seqStart in range(len(message) - seqLen):
            # Determine what the sequence is and store it in seq:
            seq = message[seqStart:seqStart + seqLen]

            # Look for this sequence in the rest of the message:
            for i in range(seqStart + seqLen, len(message) - seqLen):
                if message[i:i + seqLen] == seq:
                    # Found a repeated sequence:
                    if seq not in seqSpacings:
                        seqSpacings[seq] = []  # Initialize blank list.

                    # Append the spacing distance between the repeated
                    # sequence and the original sequence:
                    seqSpacings[seq].append(i - seqStart)
    return seqSpacings


def getUsefulFactors(num):
    # Returns a list of useful factors of num. By "useful" we mean factors
    # less than MAX_KEY_LENGTH + 1 and not 1. For example,
    # getUsefulFactors(144) returns [2, 3, 4, 6, 8, 9, 12, 16].

    if num < 2:
        return []  # Numbers less than 2 have no useful factors.

    factors = []  # The list of factors found.

    # When finding factors, you only need to check the integers up to
    # MAX_KEY_LENGTH:
    for i in range(2, MAX_KEY_LENGTH + 1):  # Don't test 1: it's not useful.
        if num % i == 0:
            factors.append(i)
            otherFactor = int(num / i)
            if otherFactor < MAX_KEY_LENGTH + 1 and otherFactor != 1:
                factors.append(otherFactor)
    return list(set(factors))  # Remove duplicate factors.


def getItemAtIndexOne(x):
    return x[1]


def getMostCommonFactors(seqFactors):
    # First, get a count of how many times a factor occurs in seqFactors:
    factorCounts = {}  # Key is a factor; value is how often it occurs.

    # seqFactors keys are sequences; values are lists of factors of the
    # spacings. seqFactors has a value like {'GFD': [2, 3, 4, 6, 9, 12,
    # 18, 23, 36, 46, 69, 92, 138, 207], 'ALW': [2, 3, 4, 6, ...], ...}.
    for seq in seqFactors:
        factorList = seqFactors[seq]
        for factor in factorList:
            if factor not in factorCounts:
                factorCounts[factor] = 0
            factorCounts[factor] += 1

    # Second, put the factor and its count into a tuple and make a list
    # of these tuples so we can sort them:
    factorsByCount = []
    for factor in factorCounts:
        # Exclude factors larger than MAX_KEY_LENGTH:
        if factor <= MAX_KEY_LENGTH:
            # factorsByCount is a list of tuples: (factor, factorCount).
            # factorsByCount has a value like [(3, 497), (2, 487), ...].
            factorsByCount.append((factor, factorCounts[factor]))

    # Sort the list by the factor count:
    factorsByCount.sort(key=getItemAtIndexOne, reverse=True)

    return factorsByCount


def kasiskiExamination(ciphertext):
    # Find out the sequences of 3 to 5 letters that occur multiple times
    # in the ciphertext. repeatedSeqSpacings has a value like
    # {'EXG': [192], 'NAF': [339, 972, 633], ... }:
    repeatedSeqSpacings = findRepeatSequencesSpacings(ciphertext)

    # (See getMostCommonFactors() for a description of seqFactors.)
    seqFactors = {}
    for seq in repeatedSeqSpacings:
        seqFactors[seq] = []
        for spacing in repeatedSeqSpacings[seq]:
            seqFactors[seq].extend(getUsefulFactors(spacing))

    # (See getMostCommonFactors() for a description of factorsByCount.)
    factorsByCount = getMostCommonFactors(seqFactors)

    # Now we extract the factor counts from factorsByCount and
    # put them in allLikelyKeyLengths so that they are easier to
    # use later:
    allLikelyKeyLengths = []
    for twoIntTuple in factorsByCount:
        allLikelyKeyLengths.append(twoIntTuple[0])

    return allLikelyKeyLengths


def getNthSubkeysLetters(nth, keyLength, message):
    # Returns every nth letter for each keyLength set of letters in text.
    # E.g. getNthSubkeysLetters(1, 3, 'ABCABCABC') returns 'AAA'
    #      getNthSubkeysLetters(2, 3, 'ABCABCABC') returns 'BBB'
    #      getNthSubkeysLetters(3, 3, 'ABCABCABC') returns 'CCC'
    #      getNthSubkeysLetters(1, 5, 'ABCDEFGHI') returns 'AF'

    # Use a regular expression to remove non-letters from the message:
    message = NONLETTERS_PATTERN.sub('', message)

    i = nth - 1
    letters = []
    while i < len(message):
        letters.append(message[i])
        i += keyLength
    return ''.join(letters)


def attemptHackWithKeyLength(ciphertext, mostLikelyKeyLength):
    # Determine the most likely letters for each letter in the key:
    ciphertextUp = ciphertext.upper()
    # allFreqScores is a list of mostLikelyKeyLength number of lists.
    # These inner lists are the freqScores lists:
    allFreqScores = []
    for nth in range(1, mostLikelyKeyLength + 1):
        nthLetters = getNthSubkeysLetters(nth, mostLikelyKeyLength,
                                          ciphertextUp)

        # freqScores is a list of tuples like
        # [(<letter>, <Eng. Freq. match score>), ... ]
        # List is sorted by match score. Higher score means better match.
        # See the englishFreqMatchScore() comments in freqAnalysis.py.
        freqScores = []
        for possibleKey in LETTERS:
            decryptedText = vigenereCipher.decryptMessage(possibleKey,
                                                          nthLetters)
            keyAndFreqMatchTuple = (possibleKey,
                                    freqAnalysis.englishFreqMatchScore(decryptedText))
            freqScores.append(keyAndFreqMatchTuple)
        # Sort by match score:
        freqScores.sort(key=getItemAtIndexOne, reverse=True)

        allFreqScores.append(freqScores[:NUM_MOST_FREQ_LETTERS])

    if not SILENT_MODE:
        for i in range(len(allFreqScores)):
            # Use i + 1 so the first letter is not called the "0th" letter:
            print('Possible letters for letter %s of the key: ' % (i + 1),
                  end='')
            for freqScore in allFreqScores[i]:
                print('%s ' % freqScore[0], end='')
            print()  # Print a newline.

    # Try every combination of the most likely letters for each position
    # in the key:
    for indexes in itertools.product(range(NUM_MOST_FREQ_LETTERS),
                                     repeat=mostLikelyKeyLength):
        # Create a possible key from the letters in allFreqScores:
        possibleKey = ''
        for i in range(mostLikelyKeyLength):
            possibleKey += allFreqScores[i][indexes[i]][0]

        if not SILENT_MODE:
            print('Attempting with key: %s' % (possibleKey))

        decryptedText = vigenereCipher.decryptMessage(possibleKey,
                                                      ciphertextUp)

        if detectEnglish.isEnglish(decryptedText):
            # Set the hacked ciphertext to the original casing:
            origCase = []
            for i in range(len(ciphertext)):
                if ciphertext[i].isupper():
                    origCase.append(decryptedText[i].upper())
                else:
                    origCase.append(decryptedText[i].lower())
            decryptedText = ''.join(origCase)

            # Check with user to see if the key has been found:
            print('Possible encryption hack with key %s:' % (possibleKey))
            print(decryptedText[:200])  # Only show first 200 characters.
            print()
            print('Enter D if done, anything else to continue hacking:')
            response = input('> ')

            if response.strip().upper().startswith('D'):
                return decryptedText

    # No English-looking decryption found, so return None:
    return None


def hackVigenere(ciphertext):
    # First, we need to do Kasiski examination to figure out what the
    # length of the ciphertext's encryption key is:
    allLikelyKeyLengths = kasiskiExamination(ciphertext)
    if not SILENT_MODE:
        keyLengthStr = ''
        for keyLength in allLikelyKeyLengths:
            keyLengthStr += '%s ' % (keyLength)
        print('Kasiski examination results say the most likely key lengths are: ' + keyLengthStr + '\n')
    hackedMessage = None
    for keyLength in allLikelyKeyLengths:
        if not SILENT_MODE:
            print('Attempting hack with key length %s (%s possible keys)...'
                  % (keyLength, NUM_MOST_FREQ_LETTERS ** keyLength))
        hackedMessage = attemptHackWithKeyLength(ciphertext, keyLength)
        if hackedMessage != None:
            break

    # If none of the key lengths found using Kasiski examination
    # worked, start brute-forcing through key lengths:
    if hackedMessage == None:
        if not SILENT_MODE:
            print('Unable to hack message with likely key length(s). Brute-forcing key length...')
        for keyLength in range(1, MAX_KEY_LENGTH + 1):
            # Don't recheck key lengths already tried from Kasiski:
            if keyLength not in allLikelyKeyLengths:
                if not SILENT_MODE:
                    print('Attempting hack with key length %s (%s possible keys)...'
                          % (keyLength, NUM_MOST_FREQ_LETTERS ** keyLength))
                hackedMessage = attemptHackWithKeyLength(ciphertext,
                                                         keyLength)
                if hackedMessage != None:
                    break
    return hackedMessage


# If vigenereHacker.py is run (instead of imported as a module), call
# the main() function:
if __name__ == '__main__':
    main()

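To run the program:
1. Open a new file editor window by selecting File -> New File.
2. Make sure detectEnglish.py, freqAnalysis.py, vigenereCipher.py, and pyperclip.py are in the same directory as vigenereHacker.py.
3. Enter the code above into the file editor and save it as vigenereHacker.py.
4. Press F5 to run the program.
5. The ciphertext on line 17 of the program is hard to copy from the book. To avoid typos, copy and paste it from https://www.nostarch.com/crackingcodes/. You can also use the online diff tool on that site to compare your text against the book's code.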
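
Before running the hacker, it can help to confirm that Python can actually find the helper modules. The sketch below is one way to do that with the standard library's `importlib.util.find_spec`. The helper name `missing_modules` is an assumption for illustration. Note that it also reports a module as found if it is installed elsewhere on the import path, for example pyperclip installed with pip.

```python
import importlib.util

def missing_modules(names):
    """Return the names in `names` that Python cannot locate (hypothetical helper)."""
    return [name for name in names if importlib.util.find_spec(name) is None]

required = ['detectEnglish', 'freqAnalysis', 'vigenereCipher', 'pyperclip']
missing = missing_modules(required)
if missing:
    print('Missing modules:', ', '.join(missing))
else:
    print('All helper modules found.')
```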

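4. Main Functions and Flow

findRepeatSequencesSpacings(message): This function finds every repeated sequence of 3 to 5 letters in the message and computes the spacings between the repeats. The steps are:
1. Use a regular expression to remove non-letter characters and convert the message to uppercase.
2. Use two nested for loops to walk through every candidate sequence of 3 to 5 letters (the outer loop over the sequence length, the inner loop over the start position).
3. For each sequence, use a third loop to search the rest of the message for repeats and compute the spacing to each one.
4. Store the repeated sequences and their spacings in the seqSpacings dictionary and return it.

The function's flowchart is shown below: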

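graph TD;
    A[Start] --> B[Remove non-letters and convert to uppercase];
    B --> C[Iterate over sequences of 3 to 5 letters];
    C --> D[Determine the current sequence seq];
    D --> E[Search the rest of the message for repeats];
    E --> F{Repeat found?};
    F -- Yes --> G[Compute spacing and add to seqSpacings];
    F -- No --> C;
    G --> C;
    C --> H[Return seqSpacings];
    H --> I[End];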
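
To see what the spacings look like on real input, here is a simplified sketch for a single sequence length. It is not the book's function: the name `repeat_spacings` is an assumption, and it indexes start positions in a dictionary instead of rescanning the message. Its handling of the very end of the message also differs slightly from the listing's loop bounds, so results can differ for a repeat that ends on the final letter.

```python
import re
from collections import defaultdict

def repeat_spacings(message, seq_len=3):
    """Map each repeated seq_len-letter sequence to the spacings between its occurrences."""
    letters = re.sub('[^A-Z]', '', message.upper())

    # Record every start position of every seq_len-letter sequence.
    starts = defaultdict(list)
    for start in range(len(letters) - seq_len + 1):
        starts[letters[start:start + seq_len]].append(start)

    spacings = {}
    for seq, positions in starts.items():
        gaps = [later - earlier
                for i, earlier in enumerate(positions)
                for later in positions[i + 1:]
                if later - earlier >= seq_len]  # Skip overlapping occurrences.
        if gaps:
            spacings[seq] = gaps
    return spacings

print(repeat_spacings('PPQCA XQVEKG YBNKMAZU YBNGBAL'))  # {'YBN': [8]}
```

The repeated "YBN" is 8 letters apart, which suggests the key length is a factor of 8.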

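getUsefulFactors(num): This function computes the useful factors of num, that is, the factors that are less than MAX_KEY_LENGTH + 1 and not equal to 1. The steps are:
1. Check whether num is less than 2; if so, return an empty list.
2. Create an empty list, factors, to hold the factors.
3. Iterate over the integers from 2 to MAX_KEY_LENGTH (inclusive) and check whether each one divides num.
4. If it does, add it to factors, and check whether num / i is also a useful factor.
5. Use set() to remove duplicate factors and return the result.

The table below summarizes the function:
| Step | Operation |
| ---- | ---- |
| 1 | Check num < 2; if true, return [] |
| 2 | Create the empty list factors |
| 3 | Iterate i from 2 to MAX_KEY_LENGTH (inclusive) |
| 4 | If num % i == 0, add i to factors |
| 5 | Compute otherFactor = int(num / i) |
| 6 | If otherFactor is less than MAX_KEY_LENGTH + 1 and not 1, add it to factors |
| 7 | Use set() to remove duplicate factors and return |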
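
The sketch below is a compact variant with the hypothetical name `useful_factors`. It drops the otherFactor step on the assumption that it is redundant for the result: any otherFactor that passes the size check is itself a divisor between 2 and MAX_KEY_LENGTH, so the main loop already finds it.

```python
MAX_KEY_LENGTH = 16

def useful_factors(num):
    """Return the factors of num between 2 and MAX_KEY_LENGTH, in ascending order."""
    return [i for i in range(2, MAX_KEY_LENGTH + 1) if num % i == 0]

print(useful_factors(144))  # [2, 3, 4, 6, 8, 9, 12, 16]
print(useful_factors(8))    # [2, 4, 8]
print(useful_factors(1))    # []
```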

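getMostCommonFactors(seqFactors): This function finds the most common factors, which are the most likely key lengths. The steps are:
1. Create a dictionary, factorCounts, to hold how many times each factor occurs.
2. Go through every sequence in seqFactors and count the occurrences of each factor.
3. Turn each factor and its count into a tuple and store the tuples in the factorsByCount list, skipping factors larger than MAX_KEY_LENGTH.
4. Sort factorsByCount by count, from most to least frequent.
5. Return the sorted list.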
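
The same counting and sorting can be sketched with `collections.Counter` from the standard library. This is an alternative formulation, not the book's code, and the `seq_factors` value is an illustrative input: the useful factors of spacings 8 and 48.

```python
from collections import Counter

# Illustrative input: useful factors of spacing 8 and of spacing 48.
seq_factors = {'YBN': [2, 4, 8], 'AZU': [2, 3, 4, 6, 8, 12, 16]}

factor_counts = Counter(f for factors in seq_factors.values() for f in factors)
factors_by_count = factor_counts.most_common()  # (factor, count) pairs, highest count first.
likely_key_lengths = [factor for factor, _ in factors_by_count]

print(factors_by_count[:3])  # [(2, 2), (4, 2), (8, 2)]
```

Factors with equal counts keep the order in which they were first seen, so ties are not broken in any meaningful way; the program simply tries them in turn.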

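5. Sample Run

When you run vigenereHacker.py, the output may look like the following. Note that the key counts shown (27, 9, 729) and the three letters listed per key position correspond to NUM_MOST_FREQ_LETTERS = 3. With the value 4 used in the listing above, the program would report 64, 16, and 4096 possible keys and print four letters per position.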

Kasiski examination results say the most likely key lengths are: 3 2 6 4 12
Attempting hack with key length 3 (27 possible keys)...
Possible letters for letter 1 of the key: A L M
Possible letters for letter 2 of the key: S N O
Possible letters for letter 3 of the key: V I Z
Attempting with key: ASV
Attempting with key: ASI
--snip--
Attempting with key: MOI
Attempting with key: MOZ
Attempting hack with key length 2 (9 possible keys)...
Possible letters for letter 1 of the key: O A E
Possible letters for letter 2 of the key: M S I
Attempting with key: OM
Attempting with key: OS
--snip--
Attempting with key: ES
Attempting with key: EI
Attempting hack with key length 6 (729 possible keys)...
Possible letters for letter 1 of the key: A E O
Possible letters for letter 2 of the key: S D G
Possible letters for letter 3 of the key: I V X
Possible letters for letter 4 of the key: M Z Q
Possible letters for letter 5 of the key: O B Z
Possible letters for letter 6 of the key: V I K
Attempting with key: ASIMOV
Possible encryption hack with key ASIMOV:
ALAN MATHISON TURING WAS A BRITISH MATHEMATICIAN, LOGICIAN, CRYPTANALYST, AND
COMPUTER SCIENTIST. HE WAS HIGHLY INFLUENTIAL IN THE DEVELOPMENT OF COMPUTER
SCIENCE, PROVIDING A FORMALISATION OF THE CON
Enter D if done, anything else to continue hacking:
> d
Copying hacked message to clipboard:
Alan Mathison Turing was a British mathematician, logician, cryptanalyst, and
computer scientist. He was highly influential in the development of computer
--snip--

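With the steps and code above, we can use the program to attempt to break a Vigenère cipher. It uses Kasiski examination to narrow down the likely key lengths, then brute-forces the candidate keys for each length until it finds one that decrypts the ciphertext into readable plaintext.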
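
The "possible keys" figures in the output follow from the expression NUM_MOST_FREQ_LETTERS ** keyLength in the listing. A small sketch of that arithmetic, using a hypothetical helper name, shows how quickly the count grows with the key length and with the number of letters kept per subkey:

```python
def candidate_key_counts(letters_per_subkey, key_lengths):
    """Number of keys tried for each key length: letters_per_subkey ** length."""
    return {length: letters_per_subkey ** length for length in key_lengths}

print(candidate_key_counts(3, [3, 2, 6]))  # {3: 27, 2: 9, 6: 729}
print(candidate_key_counts(4, [3, 2, 6]))  # {3: 64, 2: 16, 6: 4096}
```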

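6. A Closer Look at the Main Functions

6.1 `kasiskiExamination(ciphertext)`

This function combines the earlier functions to find the most likely key lengths through Kasiski examination. The steps are:
1. Call findRepeatSequencesSpacings(ciphertext) to find the repeated sequences of 3 to 5 letters in the ciphertext and their spacings, and store the result in repeatedSeqSpacings.
2. For every repeated sequence in repeatedSeqSpacings, call getUsefulFactors on each spacing and store the resulting factors in seqFactors.
3. Call getMostCommonFactors(seqFactors) to count and sort the factors, producing factorsByCount.
4. Extract the factors from factorsByCount into the allLikelyKeyLengths list and return it.

The same procedure as a list of operations:
1. Call findRepeatSequencesSpacings(ciphertext) to get repeatedSeqSpacings.
2. Initialize the seqFactors dictionary.
3. For each sequence in repeatedSeqSpacings:
- Initialize the factor list for that sequence.
- For each spacing of that sequence, call getUsefulFactors and add the useful factors to the list.
4. Call getMostCommonFactors(seqFactors) to get factorsByCount.
5. Extract the factors from factorsByCount into allLikelyKeyLengths and return it.

The function's flowchart is shown below: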

graph TD;
    A[Start] --> B[Call findRepeatSequencesSpacings];
    B --> C[Initialize seqFactors];
    C --> D[Iterate over repeatedSeqSpacings];
    D --> E[Initialize the factor list for the current sequence];
    E --> F[Iterate over the current sequence's spacings];
    F --> G[Call getUsefulFactors to compute factors];
    G --> F;
    F --> D;
    D --> H[Call getMostCommonFactors];
    H --> I[Extract factors into allLikelyKeyLengths];
    I --> J[Return allLikelyKeyLengths];
    J --> K[End];
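
As a rough end-to-end illustration, the sketch below runs the factor-and-count part of the pipeline on the example value given in the listing's comment for repeatedSeqSpacings. It is a condensed stand-in built on `collections.Counter`, not the book's implementation, and with only two sequences the ranking is far less reliable than on a full ciphertext.

```python
from collections import Counter

MAX_KEY_LENGTH = 16

# Example value taken from the comment inside kasiskiExamination().
repeated_seq_spacings = {'EXG': [192], 'NAF': [339, 972, 633]}

factor_counts = Counter()
for seq, spacings in repeated_seq_spacings.items():
    for spacing in spacings:
        # Count every factor of this spacing between 2 and MAX_KEY_LENGTH.
        factor_counts.update(f for f in range(2, MAX_KEY_LENGTH + 1)
                             if spacing % f == 0)

likely_key_lengths = [factor for factor, _ in factor_counts.most_common()]
print(likely_key_lengths[0])  # 3, because it divides all four spacings
```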

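6.2 `getNthSubkeysLetters(nth, keyLength, message)`

This function returns the string formed by taking the nth letter of every group of keyLength letters in the message. These are exactly the letters that were encrypted with the nth subkey. The steps are:
1. Use a regular expression to remove non-letter characters from the message.
2. Initialize the index i to nth - 1.
3. Loop through the message, each time appending the letter at index i to the letters list and increasing i by keyLength.
4. Join the letters list into a string and return it.

The table below summarizes the function:
| Step | Operation |
| ---- | ---- |
| 1 | Remove non-letter characters |
| 2 | Initialize i = nth - 1 |
| 3 | Loop: append the letter at i to letters, then i += keyLength |
| 4 | Return letters joined into a string |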
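
The loop above is equivalent to an extended slice with a step, which makes the idea easy to try out. The function name `nth_subkey_letters` is an assumption; like the original, it expects the message to be uppercase already.

```python
import re

def nth_subkey_letters(nth, key_length, message):
    """Return every key_length-th letter of message, starting at the nth letter (1-based)."""
    letters = re.sub('[^A-Z]', '', message)  # Remove non-letters.
    return letters[nth - 1::key_length]

print(nth_subkey_letters(1, 3, 'ABCABCABC'))  # AAA
print(nth_subkey_letters(2, 3, 'ABCABCABC'))  # BBB
print(nth_subkey_letters(1, 5, 'ABCDEFGHI'))  # AF
```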

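6.3 `attemptHackWithKeyLength(ciphertext, mostLikelyKeyLength)`

This function attempts to break the ciphertext using a given key length. The steps are:
1. Convert the ciphertext to uppercase.
2. For each position in the key, call getNthSubkeysLetters to extract the corresponding letters, decrypt them with each of the 26 possible key letters, compute the English frequency match score of each decryption, and keep the NUM_MOST_FREQ_LETTERS best-scoring letters in allFreqScores.
3. Print the candidate key letters for each position (unless SILENT_MODE is set).
4. Try every combination of the candidate letters as a key and decrypt the ciphertext with it.
5. If the decrypted text looks like English according to detectEnglish.isEnglish, restore the original casing and ask the user to confirm whether the key has been found. If the user enters D, return the decrypted text; otherwise keep trying.
6. If no suitable key is found, return None.

The same procedure as a list of operations:
1. Convert the ciphertext to uppercase.
2. For each key position:
- Extract the corresponding letters.
- Decrypt with each possible key letter and compute the score.
- Store the top scores in allFreqScores.
3. Print the candidate key letters.
4. For each possible key combination:
- Build the candidate key.
- Decrypt and check whether the result is English.
- If it is, ask the user to confirm.
- If the user enters D, return the decrypted text.
5. If nothing is found, return None.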
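
The per-position scoring in step 2 can be illustrated with a deliberately simplified stand-in. The freqAnalysis module is not shown in this article, so the sketch below does not reproduce englishFreqMatchScore. Instead it assumes a cruder score: the number of letters that fall among E, T, A, O, I, and N. The helper names `shift` and `simple_score` are hypothetical.

```python
LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
COMMON = set('ETAOIN')  # The six most frequent letters in English.

def shift(text, key_letter, decrypt=False):
    """Caesar-shift uppercase text by the alphabet position of key_letter."""
    offset = LETTERS.index(key_letter)
    if decrypt:
        offset = -offset
    return ''.join(LETTERS[(LETTERS.index(ch) + offset) % 26] for ch in text)

def simple_score(text):
    """Stand-in score (an assumption): how many letters of text are in ETAOIN."""
    return sum(ch in COMMON for ch in text)

# Pretend these are the letters that were encrypted with the subkey K.
column = shift('ATTENTION', 'K')

scores = [(key, simple_score(shift(column, key, decrypt=True))) for key in LETTERS]
scores.sort(key=lambda pair: pair[1], reverse=True)
top_candidates = scores[:4]  # Keep the four best letters, like NUM_MOST_FREQ_LETTERS.
print(top_candidates[0])     # ('K', 9)
```

On real ciphertext the correct subkey is not always ranked first, which is why the program keeps several candidates per position and tries their combinations.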

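7. `hackVigenere(ciphertext)` Function Summary

This function is the core of the hacking program. It combines Kasiski examination with brute force. The steps are:
1. Call kasiskiExamination(ciphertext) to get the list of most likely key lengths, allLikelyKeyLengths.
2. Print the Kasiski examination results.
3. Go through allLikelyKeyLengths and call attemptHackWithKeyLength for each candidate key length.
4. If a suitable key is found, stop and return the decrypted text.
5. If every key length from Kasiski examination fails, brute-force the key lengths from 1 to MAX_KEY_LENGTH, skipping the lengths already tried.
6. Return the decrypted text, or None if every key length fails.

The function's flowchart is shown below: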

graph TD;
    A[Start] --> B[Call kasiskiExamination];
    B --> C[Print the examination results];
    C --> D[Iterate over allLikelyKeyLengths];
    D --> E[Call attemptHackWithKeyLength];
    E --> F{Key found?};
    F -- Yes --> G[Return the decrypted text];
    F -- No --> D;
    D --> H{All likely lengths tried?};
    H -- Yes --> I[Brute-force the untried lengths];
    I --> E;
    G --> J[End];
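
The order in which key lengths end up being tried can be sketched in a few lines. The helper `key_length_order` is hypothetical and only models the ordering, not the hacking itself; the input list is the one from the sample run.

```python
MAX_KEY_LENGTH = 16

def key_length_order(likely_key_lengths):
    """Likely lengths first, then every other length from 1 to MAX_KEY_LENGTH."""
    remaining = [length for length in range(1, MAX_KEY_LENGTH + 1)
                 if length not in likely_key_lengths]
    return list(likely_key_lengths) + remaining

print(key_length_order([3, 2, 6, 4, 12]))
# [3, 2, 6, 4, 12, 1, 5, 7, 8, 9, 10, 11, 13, 14, 15, 16]
```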

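8. Summary

With the functions and steps described above, we can use the program to break a Vigenère cipher. It combines Kasiski examination with brute force:
- Kasiski examination finds the repeated sequences in the ciphertext and their spacings, computes the factors of those spacings, and from them derives the most likely key lengths.
- For each likely key length, brute force tries every combination of the best-scoring candidate letters for each key position. The English frequency match score selects the candidate letters, and the English detection check plus user confirmation identify the right key.

In practice, the workflow is:
1. Prepare the ciphertext and make sure the helper files (such as detectEnglish.py and freqAnalysis.py) are in the same directory as vigenereHacker.py.
2. Run vigenereHacker.py.
3. When the program shows a possible decryption, either confirm it or continue hacking.

In this way, the range of possible keys is narrowed step by step until the Vigenère cipher is broken.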