33. Encrypted Data Backup and Efficient Upload Explained

sql99

Published 2025-10-12 14:38:53

License: CC 4.0 BY-SA

Column: Decrypting Windows Azure Programming. Tags: data encryption, AES-256, RSA encryption

Article link: https://blog.youkuaiyun.com/sql99/article/details/153177041

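In data security and backup, data is usually encrypted so that it stays secure both in transit and in storage. To keep backups fast, an efficient upload strategy is needed as well. This article covers encryption, decryption, signature verification, and efficient upload, together with the code that implements them.

1. Data Encryption Process

azbackup encrypts the data (the compressed archive produced in the previous step) in three steps:
1. Generate a unique key, Ksym: a fresh 256-bit key is generated for each archive.
2. Encrypt the archive with AES-256 in CBC mode, using Ksym as the key.
3. Encrypt Ksym with the user's RSA encryption key (Kenc) and attach the result to the encrypted data from step 2.

The encryption code: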

```python
import cStringIO
import random
from binascii import unhexlify

from M2Crypto import EVP, RSA

def generate_rand_bits(bits=32*8):
    """Get `bits` bits of randomness from random.SystemRandom,
    a cryptographically strong source backed by the OS"""
    sys_random = random.SystemRandom()
    return long_as_bytes(sys_random.getrandbits(bits), bits / 8)

def long_as_bytes(lvalue, width):
    """Split a long into a byte string of exactly `width` bytes,
    most significant byte first"""
    fmt = '%%.%dx' % (2 * width)  # zero-padded hex, 2 digits per byte
    return unhexlify(fmt % (lvalue & ((1L << 8 * width) - 1)))

def block_encrypt(data, key):
    """High-level function: pads and encrypts `data` with `key` and
    returns IV + ciphertext. The signature is added later by the caller."""
    # 32 random bytes are stored as the IV. AES uses a 16-byte block, so
    # OpenSSL reads only the first 16; block_decrypt strips all 32.
    iv = generate_rand_bits(32 * 8)
    ciphertext = aes256_encrypt_data(data, key, iv)
    return iv + ciphertext

def aes256_encrypt_data(data, key, iv):
    """Encrypt `data` with AES-256 in CBC mode using a 256-bit key
    and an IV. OpenSSL applies the padding for us."""
    enc = 1  # 1 = encrypt, 0 = decrypt
    cipher = EVP.Cipher('aes_256_cbc', key, iv, enc, 0)
    pbuf = cStringIO.StringIO(data)
    cbuf = cStringIO.StringIO()
    ciphertext = aes256_stream_helper(cipher, pbuf, cbuf)
    pbuf.close()
    cbuf.close()
    return ciphertext

def aes256_stream_helper(cipher, input_stream, output_stream):
    """Feed input_stream through the cipher and return everything written
    to output_stream, including the final (padded) block."""
    while True:
        buf = input_stream.read()
        if not buf:
            break
        output_stream.write(cipher.update(buf))
    output_stream.write(cipher.final())
    return output_stream.getvalue()

def encrypt_rsa(rsa_key, data):
    """Encrypt with the public half of the RSA key pair (PKCS#1 v1.5 padding)"""
    return rsa_key.public_encrypt(data, RSA.pkcs1_padding)
```

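1.1 Generating a unique Ksym

The random key is generated by generate_rand_bits, which takes the number of bits to generate as its parameter. 256 bits are used here because the cipher is AES-256, which takes a 256-bit key. The randomness comes from Python's random.SystemRandom, a cryptographically strong source.

Using random.SystemRandom instead of Python's default random number generator is essential. A cryptographically strong generator is designed so that its output cannot be predicted. The default generator (the Mersenne Twister behind the module-level functions in random) is not: after observing enough of its output, an attacker can reconstruct its internal state, predict the key, and decrypt the data.

Python obtains these random numbers from the operating system: on Unix it reads /dev/urandom, and on Windows it calls CryptGenRandom.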
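On Python 3 the same key generation needs only the standard library. The sketch below is illustrative rather than azbackup's code: both function names are hypothetical, and `secrets` requires Python 3.6 or later. Both helpers draw from the operating system's CSPRNG; the second mirrors generate_rand_bits plus long_as_bytes by turning a random integer into fixed-width, big-endian bytes with `int.to_bytes`.

```python
import random
import secrets

def generate_key_secrets(bits=256):
    """Hypothetical helper: random key straight from the secrets module."""
    return secrets.token_bytes(bits // 8)

def generate_key_systemrandom(bits=256):
    """Hypothetical helper mirroring generate_rand_bits + long_as_bytes."""
    value = random.SystemRandom().getrandbits(bits)
    # Fixed width, most significant byte first, leading zero bytes preserved
    return value.to_bytes(bits // 8, "big")

ksym = generate_key_secrets()
print(len(ksym))                         # → 32 (bytes, i.e. 256 bits)
print(len(generate_key_systemrandom()))  # → 32
```

The explicit width passed to `to_bytes` matters: a random 256-bit integer can have leading zero bytes, and dropping them would yield a short key. long_as_bytes handles the same problem with its zero-padded hex format.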

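1.2 Encrypting with AES-256

AES is a block cipher: it combines a fixed-size block of plaintext (always 128 bits for AES) with a key (256 bits for AES-256) to produce a ciphertext block of the same 128-bit size. Because the data is almost always longer than one block, a mode of operation is needed to process it block by block. The mode chosen here is cipher block chaining (CBC).

CBC splits the input (plaintext) into block-sized pieces and XORs each plaintext block with the previous ciphertext block before encrypting it. In electronic codebook (ECB) mode, identical plaintext blocks encrypt to identical ciphertext blocks, so an attacker can learn about the data by looking for repeated patterns. CBC prevents this because the ciphertext of each block also depends on all the blocks before it.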
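The chaining rule can be seen with a toy example. The sketch below replaces AES with a trivial keyed XOR so that it runs with the standard library alone; that stand-in is NOT secure and only makes the effect of the mode visible. All names are hypothetical. Two identical 16-byte plaintext blocks produce identical ciphertext under ECB but different ciphertext under CBC.

```python
BLOCK = 16  # AES block size in bytes

def toy_encrypt_block(block, key):
    # Toy stand-in for AES: XOR with the key. NOT secure; illustration only.
    return bytes(b ^ k for b, k in zip(block, key))

toy_decrypt_block = toy_encrypt_block  # XOR is its own inverse

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def ecb_encrypt(plaintext, key):
    # Each block is encrypted independently of the others
    return b"".join(toy_encrypt_block(plaintext[i:i + BLOCK], key)
                    for i in range(0, len(plaintext), BLOCK))

def cbc_encrypt(plaintext, key, iv):
    out, prev = [], iv
    for i in range(0, len(plaintext), BLOCK):
        # XOR with the previous ciphertext block (the IV for the first), then encrypt
        prev = toy_encrypt_block(xor(plaintext[i:i + BLOCK], prev), key)
        out.append(prev)
    return b"".join(out)

def cbc_decrypt(ciphertext, key, iv):
    out, prev = [], iv
    for i in range(0, len(ciphertext), BLOCK):
        block = ciphertext[i:i + BLOCK]
        out.append(xor(toy_decrypt_block(block, key), prev))
        prev = block
    return b"".join(out)

key = bytes(range(16))
iv = bytes([0xA5] * 16)
pt = b"SAME BLOCK 16 B!" * 2  # two identical 16-byte blocks

ecb = ecb_encrypt(pt, key)
cbc = cbc_encrypt(pt, key, iv)
print(ecb[:16] == ecb[16:])             # → True: ECB leaks the repetition
print(cbc[:16] == cbc[16:])             # → False: CBC hides it
print(cbc_decrypt(cbc, key, iv) == pt)  # → True
```

Decryption walks the same chain: each ciphertext block is decrypted and then XORed with the previous ciphertext block, or with the IV for the first block.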

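To keep patterns at the very start of the data from being detected, the cipher also uses an initialization vector (IV): a block of random data that serves as the "previous ciphertext block" for the first plaintext block. In the sample code the IV is generated by calling generate_rand_bits.

The core of the encryption is aes256_encrypt_data, which creates an instance of the EVP.Cipher class configured for AES-256 in CBC mode. aes256_stream_helper writes the data into the cipher object and reads the ciphertext into the output stream.

Finally, block_encrypt ties this together: it generates the IV, encrypts the input data, and returns the IV concatenated with the ciphertext.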
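The docstring of aes256_encrypt_data notes that OpenSSL does the padding. OpenSSL's EVP block ciphers use PKCS#7 padding by default, and the padding also determines how much larger the stored output is than the input. A minimal sketch with hypothetical helper names; the 32-byte prefix reflects block_encrypt, which stores 32 random bytes in front of the ciphertext even though AES uses only 16 of them as the IV.

```python
BLOCK = 16      # AES block size in bytes
IV_PREFIX = 32  # bytes block_encrypt stores in front of the ciphertext

def pkcs7_pad(data, block=BLOCK):
    # Always add 1..block bytes; each padding byte holds the pad length
    n = block - len(data) % block
    return data + bytes([n]) * n

def pkcs7_unpad(padded, block=BLOCK):
    n = padded[-1]
    if not 1 <= n <= block or padded[-n:] != bytes([n]) * n:
        raise ValueError("bad padding")
    return padded[:-n]

def encrypted_size(plain_len):
    # IV prefix + padded length (PKCS#7 always adds between 1 and 16 bytes)
    return IV_PREFIX + (plain_len // BLOCK + 1) * BLOCK

print(pkcs7_pad(b"abc")[-1])             # → 13: thirteen bytes of value 13 added
print(len(pkcs7_pad(b"x" * 16)))         # → 32: a full extra block of padding
print(pkcs7_unpad(pkcs7_pad(b"hello")))  # → b'hello'
print(encrypted_size(1000))              # → 1040
```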

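1.3 Encrypting Ksym with Kenc

The last step encrypts Ksym with Kenc. Because Kenc is an RSA key pair, its public half is used for encryption. RSA is sensitive to the size and structure of its input, so the encryption uses a padding scheme supported by OpenSSL (PKCS#1 v1.5, RSA.pkcs1_padding in the code).

The actual encryption is done by encrypt_rsa, which takes an RSA key pair and calls that object's public_encrypt method on the input data.

When this process finishes, we have a byte array containing the encrypted data and the encrypted key, ready to be uploaded to the cloud.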
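It helps to check the sizes involved. Assuming 2048-bit RSA keys (which the fixed 256-byte offsets in extract_encrypted_archive imply), the arithmetic from the PKCS#1 specification (RFC 8017) is sketched below; the function name is hypothetical. An RSA ciphertext is always as long as the modulus, and PKCS#1 v1.5 encryption needs at least 11 bytes of padding, so a 32-byte Ksym fits with plenty of room.

```python
def rsa_encryption_limits(modulus_bits, hash_len=20):
    """Return (output size, max PKCS#1 v1.5 plaintext, max OAEP plaintext) in bytes."""
    k = modulus_bits // 8            # every RSA ciphertext is exactly k bytes
    max_v15 = k - 11                 # PKCS#1 v1.5: mLen <= k - 11
    max_oaep = k - 2 * hash_len - 2  # OAEP: mLen <= k - 2*hLen - 2 (SHA-1: hLen = 20)
    return k, max_v15, max_oaep

k, max_v15, max_oaep = rsa_encryption_limits(2048)
print(k, max_v15, max_oaep)  # → 256 245 214
print(32 <= max_v15)         # → True: a 256-bit Ksym fits in one RSA block
```

This is also why the code marks both the encrypted key and the signature as 256 bytes: an RSA signature is likewise as long as the modulus.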

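2. Data Decryption Process

Decryption reverses the encryption process in two steps:
1. Separate and decrypt Ksym: split the encrypted Ksym off the encrypted archive and decrypt it with the private half of Kenc.
2. Decrypt the data with AES-256 in CBC mode, using the decrypted Ksym as the key.

The decryption code: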

```python
import cStringIO

from M2Crypto import EVP, RSA

# aes256_stream_helper is the helper defined with the encryption code

def aes256_decrypt_data(ciphertext, key, iv):
    """ Decryption and unpadding using AES256-CBC and
    the specified key and IV."""
    dec = 0  # 0 = decrypt
    cipher = EVP.Cipher('aes_256_cbc', key, iv, dec, 0)
    pbuf = cStringIO.StringIO()
    cbuf = cStringIO.StringIO(ciphertext)
    plaintext = aes256_stream_helper(cipher, cbuf, pbuf)
    pbuf.close()
    cbuf.close()
    return plaintext

def block_decrypt(ciphertext, key):
    """ High level decryption function. Assumes the 32-byte IV written
    by block_encrypt precedes the actual ciphertext"""
    iv = ciphertext[:32]
    ciphertext = ciphertext[32:]
    return aes256_decrypt_data(ciphertext, key, iv)

def decrypt_rsa(rsa_key, ciphertext):
    """ Decrypts with the private half of the RSA key pair"""
    return rsa_key.private_decrypt(ciphertext, RSA.pkcs1_padding)
```

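3. Data Signing and Verification

To detect tampering, the archive is signed and the signature is verified before the archive is used. This is done with a signing RSA key pair (Ksign), which is separate from the encryption key pair (Kenc).

Signing is straightforward: hash the data with a cryptographic hash algorithm, then encrypt the hash with the private key to create the signature. The recipient computes the hash with the same algorithm, decrypts the signature with the sender's public key, and checks that the two hashes match.

The signing and verification code: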

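```python
import hashlib

from M2Crypto import RSA

def sign_rsa(rsa_key, data):
    """ Expects an RSA key pair. Signs with SHA-256.
    Would like to use RSASSA-PSS but only dev
    versions of M2Crypto support that"""
    # M2Crypto's RSA.sign takes the message digest, not the raw data
    digest = hashlib.sha256(data).digest()
    return rsa_key.sign(digest, 'sha256')

def verify_rsa(rsa_key, data, sig):
    """ Verifies a signature against the SHA-256 digest of data"""
    digest = hashlib.sha256(data).digest()
    try:
        return rsa_key.verify(digest, sig, 'sha256') == 1
    except RSA.RSAError:
        # Some M2Crypto versions raise on a bad signature instead of returning 0
        return False
```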
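The hash-then-sign idea described at the start of this section can be made concrete with textbook RSA. This is an illustration only, not how M2Crypto signs: the key is tiny (built from two Mersenne primes), there is no padding scheme, and all names are hypothetical. Real signatures, including those produced by sign_rsa, wrap the hash in a padding scheme such as PKCS#1 v1.5. `pow(e, -1, m)` requires Python 3.8 or later.

```python
import hashlib

# Tiny textbook-RSA key for illustration only: no padding, insecure size
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent

def digest_as_int(data):
    # Hash the data, then reduce the digest into the range [0, N)
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N

def textbook_sign(data):
    # "Encrypt" the hash with the private exponent
    return pow(digest_as_int(data), D, N)

def textbook_verify(data, sig):
    # "Decrypt" with the public exponent and compare against a fresh hash
    return pow(sig, E, N) == digest_as_int(data)

archive = b"encrypted archive bytes"
sig = textbook_sign(archive)
print(textbook_verify(archive, sig))         # → True
print(textbook_verify(archive + b"!", sig))  # → False (tampered data)
```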

4. Creating and Extracting the Encrypted Archive

The code that creates an encrypted archive and decrypts one:

```python
import os
import tempfile

import crypto  # azbackup's crypto module: the functions above plus key-name constants

def create_encrypted_archive(directory_or_file, archive_name, keys):
    # First, let's tar and gzip the file/folder we're given to
    # the temp directory. This is a roundabout way of getting the tar+gzipped
    # data into memory due to a bug in tarfile with dealing with StringIO
    # (generate_tar_gzip / extract_tar_gzip are azbackup helpers not shown here)
    temp_file = os.path.join(tempfile.gettempdir(), archive_name + ".tar.gz")
    generate_tar_gzip(directory_or_file, temp_file)
    gzip_file_handle = open(temp_file, "rb")
    gzip_data = gzip_file_handle.read()
    gzip_file_handle.close()
    os.remove(temp_file)  # We don't need the source tar.gz file any more
    # Generate a session AES-256 key and encrypt gzipped archive with it
    aes_key = crypto.generate_rand_bits(256)
    encrypted_archive = crypto.block_encrypt(gzip_data, aes_key)
    # Encrypt Ksym (session key) with RSA key (Kenc)
    rsa_enc_key = keys[crypto.ENCRYPTION_KEY]
    aes_key_enc = crypto.encrypt_rsa(rsa_enc_key, aes_key)  # 256 bytes with a 2048-bit key
    # Sign encrypted data
    # There's much debate regarding in which order you encrypt and sign/mac.
    # I prefer this way since this lets us not have to decrypt anything
    # when the signature is invalid
    # See http://www.daemonology.net/blog/2009-06-24-encrypt-then-mac.html
    rsa_sign_key = keys[crypto.SIGNING_KEY]
    rsa_sig = crypto.sign_rsa(rsa_sign_key, encrypted_archive)  # 256 bytes with a 2048-bit key
    # Concatenate encrypted AES key, signature and archive in that order
    return aes_key_enc + rsa_sig + encrypted_archive

def extract_encrypted_archive(archive_name, keys):
    # Load archive. Separate into encrypted AES key, plaintext sig of
    # encrypted data and encrypted archive itself.
    # The fixed 256-byte offsets assume 2048-bit RSA keys.
    enc_file = open(archive_name, "rb")
    enc_file_bytes = enc_file.read()
    enc_file.close()
    enc_aes_key = enc_file_bytes[0:256]
    rsa_sig = enc_file_bytes[256:512]
    enc_data = enc_file_bytes[512:]
    rsa_sign_key = keys[crypto.SIGNING_KEY]
    rsa_enc_key = keys[crypto.ENCRYPTION_KEY]
    # Check the signature in the file to see whether it matches the
    # encrypted data. We do Encrypt-Then-Mac here so that
    # we avoid decryption
    if not crypto.verify_rsa(rsa_sign_key, enc_data, rsa_sig):
        print "Signature verification failure. Corrupt or tampered archive!"
        return
    # Decrypt the AES key and then decrypt the
    # encrypted archive using the decrypted AES key
    aes_key = crypto.decrypt_rsa(rsa_enc_key, enc_aes_key)
    decrypted_archive_bytes = crypto.block_decrypt(enc_data, aes_key)
    # Write a temporary file and then extract the contents to the
    # current directory
    os_handle, temp_file = tempfile.mkstemp()
    temp_file_handle = os.fdopen(os_handle, 'wb')
    temp_file_handle.write(decrypted_archive_bytes)
    temp_file_handle.close()
    extract_tar_gzip(temp_file, ".")
    os.remove(temp_file)
```
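The layout that create_encrypted_archive writes and extract_encrypted_archive reads is a plain concatenation: encrypted Ksym, then the signature, then IV + ciphertext. Here is a Python 3 sketch of packing and unpacking it, with hypothetical names and the same 256-byte assumption (2048-bit RSA keys). Unlike the original, it rejects input too short to hold both RSA blocks instead of silently slicing it.

```python
RSA_BLOCK = 256  # bytes per RSA output, assuming 2048-bit keys

def pack_archive(enc_aes_key, rsa_sig, enc_data):
    # Same order as create_encrypted_archive: key, signature, data
    if len(enc_aes_key) != RSA_BLOCK or len(rsa_sig) != RSA_BLOCK:
        raise ValueError("expected 256-byte RSA outputs")
    return enc_aes_key + rsa_sig + enc_data

def unpack_archive(blob):
    # Mirror of the slicing in extract_encrypted_archive, plus a length check
    if len(blob) < 2 * RSA_BLOCK:
        raise ValueError("archive too short to contain key and signature")
    return blob[:RSA_BLOCK], blob[RSA_BLOCK:2 * RSA_BLOCK], blob[2 * RSA_BLOCK:]

blob = pack_archive(b"K" * 256, b"S" * 256, b"ciphertext...")
key, sig, data = unpack_archive(blob)
print(len(key), len(sig), data)  # → 256 256 b'ciphertext...'
```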

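5. Encrypt-Then-MAC vs. MAC-Then-Encrypt

When encrypting the archive, you can either compute the signature over the encrypted data (Encrypt-Then-MAC), or compute it over the plaintext and then encrypt the plaintext and the signature together (MAC-Then-Encrypt). Both techniques are used in practice; as the comments in create_encrypted_archive note, which order is better is much debated.

The sample code uses Encrypt-Then-MAC, which lets it detect an invalid archive immediately, without decrypting anything. As a result, the decryption code never processes data that has not been authenticated.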
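azbackup uses an RSA signature as its MAC. The ordering argument is the same with a symmetric MAC, so the sketch below swaps in HMAC-SHA256 from the standard library to show verify-before-decrypt. `seal`, `open_sealed`, and `fake_decrypt` are hypothetical; `fake_decrypt` stands in for block_decrypt and records each call, which makes it visible that tampered input never reaches decryption.

```python
import hashlib
import hmac

def seal(ciphertext, mac_key):
    # Encrypt-Then-MAC: the tag is computed over the ciphertext
    tag = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    return tag + ciphertext

def open_sealed(sealed, mac_key, decrypt):
    tag, ciphertext = sealed[:32], sealed[32:]
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    # Verify first, in constant time; decrypt is never called on bad input
    if not hmac.compare_digest(tag, expected):
        raise ValueError("corrupt or tampered archive")
    return decrypt(ciphertext)

calls = []
def fake_decrypt(ct):
    calls.append(ct)
    return b"plaintext"

mac_key = b"k" * 32
sealed = seal(b"ciphertext", mac_key)
print(open_sealed(sealed, mac_key, fake_decrypt))  # → b'plaintext'
tampered = sealed[:-1] + b"X"
try:
    open_sealed(tampered, mac_key, fake_decrypt)
except ValueError as err:
    print(err)  # → corrupt or tampered archive
print(len(calls))  # → 1: decrypt never saw the tampered input
```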

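6. Efficient Upload

Backing up the encrypted data to the cloud in a single request has two drawbacks. First, a single upload request is limited to 64 MB (in the version of the storage API this code targets), and backups of large directories often exceed that. Second, a long request does not make full use of the available bandwidth, and if it fails it has to start over from the beginning.

Block upload addresses both problems: split the encrypted archive into small blocks and upload them in parallel to speed up the upload. A failed block can then be retried on its own instead of restarting the whole upload.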
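The arithmetic behind the split is simple. A Python 3 sketch with hypothetical names, using the 4 MB block size from upload_archive_using_blocks: the block count is a ceiling division, because a partial last block still needs its own request.

```python
BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB, the block size used by azbackup

def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Return (offset, chunk) pairs covering data; the last chunk may be short."""
    return [(off, data[off:off + block_size])
            for off in range(0, len(data), block_size)]

def block_count(total_size, block_size=BLOCK_SIZE):
    # Ceiling division without floating point
    return -(-total_size // block_size)

print(block_count(100 * 1024 * 1024))  # → 25 blocks for a 100 MB archive
print(block_count(BLOCK_SIZE + 1))     # → 2
blocks = split_into_blocks(b"abcdefghij", block_size=4)
print([chunk for _, chunk in blocks])  # → [b'abcd', b'efgh', b'ij']
```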

The block support and block upload code:

```python
# Block support code (methods of azbackup's storage.Storage class)
import base64
import urllib

def put_block(self, container_name, blob_name, block_id, data):
    # Base64-encode the block id, then URL-encode it. b64encode (unlike
    # encodestring) never inserts line breaks; safe='' also escapes '/'
    base64_blockid = base64.b64encode(str(block_id))
    urlencoded_base64_blockid = urllib.quote(base64_blockid, safe='')
    # Make a PUT request with the block data to blob URI followed by
    # ?comp=block&blockid=<blockid>
    return self._do_store_request("/" + container_name + "/" + \
                                  blob_name + \
                                  "?comp=block&blockid=" + \
                                  urlencoded_base64_blockid, \
                                  'PUT', {}, data)

def put_block_list(self, container_name, blob_name, \
                   block_list, content_type):
    headers = {}
    if content_type is not None:
        headers["Content-Type"] = content_type
    # Begin XML content
    xml_request = "<?xml version=\"1.0\" encoding=\"utf-8\"?><BlockList>"
    # Concatenate block ids into block list, base64-encoded exactly
    # as in put_block so that the ids match
    for block_id in block_list:
        xml_request += "<Block>" + \
                       base64.b64encode(str(block_id)) + "</Block>"
    xml_request += "</BlockList>"
    # Make a PUT request to blob URI followed by ?comp=blocklist
    return self._do_store_request("/" + container_name + \
                                  "/" + blob_name + \
                                  "?comp=blocklist", 'PUT', \
                                  headers, xml_request)
```
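Two details of block IDs are easy to get wrong. Azure expects every base64 block ID within one blob to have the same length, and the encoded value should not contain line breaks. Python 2's base64.encodestring (encodebytes in Python 3) wraps its output every 76 characters, and a SHA-256 hex digest (64 characters) becomes 88 characters after encoding, so it gets wrapped. The ID must also be URL-escaped when it goes into the query string. A Python 3 sketch with hypothetical helper names, not azbackup's API:

```python
import base64
import hashlib
from urllib.parse import quote

def encode_block_id(block_id):
    # Single-line base64: b64encode never inserts line breaks
    return base64.b64encode(block_id.encode("ascii")).decode("ascii")

def block_id_query_value(block_id):
    # safe="" escapes "/" as well as "+" and "=" for use in ?blockid=...
    return quote(encode_block_id(block_id), safe="")

def block_list_xml(block_ids):
    # Same XML shape that put_block_list builds
    items = "".join("<Block>%s</Block>" % encode_block_id(b) for b in block_ids)
    return '<?xml version="1.0" encoding="utf-8"?><BlockList>%s</BlockList>' % items

ids = [hashlib.sha256(chunk).hexdigest() for chunk in (b"first", b"second")]
print({len(encode_block_id(i)) for i in ids})  # → {88}: every id has the same length
# The MIME-style encoder wraps long output, leaving a line break inside the id
print(b"\n" in base64.encodebytes(ids[0].encode("ascii")).strip())  # → True
print(block_id_query_value("???>>>"))  # → Pz8%2FPj4%2B
```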

```python
# Block upload code
import hashlib
import sys
import Queue

import storage  # azbackup's storage client (put_blob, put_block, ...)
import task     # azbackup's ThreadTask worker thread

def upload_archive(data, filename, account, key):
    conn = storage.Storage("blob.core.windows.net", account, key)
    # Try and create container. Will harmlessly fail if already exists
    conn.create_container("enc", False)
    # Heuristics for blocks
    # We're pretty hardcoded at the moment. We don't bother using blocks
    # for files less than 4MB.
    if len(data) < 4 * 1024 * 1024:
        resp = conn.put_blob("enc", filename, data, "application/octet-stream")
    else:
        resp = upload_archive_using_blocks(data, filename, conn)
    if not (resp.status >= 200 and resp.status < 400):
        # Error! No error handling at the moment
        print resp.status, resp.reason, resp.read()
        sys.exit(1)

def upload_archive_using_blocks(data, filename, conn):
    blocklist = []
    queue = Queue.Queue()
    # parallel_upload and num_threads are globals set from the command line
    if parallel_upload:
        for i in range(num_threads):
            t = task.ThreadTask(queue)
            t.setDaemon(True)  # daemon threads don't keep the process alive
            t.start()
    offset = 0
    # Block uploader function used in thread queue
    def block_uploader(connection, block_id_to_upload,
                       block_data_to_upload):
        resp = connection.put_block("enc", filename, block_id_to_upload,
                                    block_data_to_upload)
        if not (resp.status >= 200 and resp.status < 400):
            print resp.status, resp.reason, resp.read()
            # Note: inside a worker thread, sys.exit only ends that thread
            sys.exit(1)  # Need retry logic on error
    while offset < len(data):
        # Get size of next block. Process in 4MB chunks
        data_to_process = min(4 * 1024 * 1024, len(data) - offset)
        # Slice off next block. Generate an SHA-256 block id
        # In the future, we could use it to see whether a block
        # already exists to avoid re-uploading it
        block_data = data[offset: offset + data_to_process]
        block_id = hashlib.sha256(block_data).hexdigest()
        blocklist.append(block_id)
        if parallel_upload:
            # Add work item to the queue.
            queue.put([block_uploader, [conn, block_id, block_data]])
        else:
            block_uploader(conn, block_id, block_data)
        # Move offset forward
        offset += data_to_process
    # Wait for all block uploads to finish (assumes ThreadTask calls
    # queue.task_done() after each work item)
    if parallel_upload:
        queue.join()
    # Commit the block list so the uploaded blocks become the blob's content
    return conn.put_block_list("enc", filename, blocklist,
                               "application/octet-stream")
```
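With a queue-based worker pool like the one upload_archive_using_blocks uses, the standard way to wait for the uploads is `Queue.join()`, which returns only once `task_done()` has been called for every item put on the queue. A Python 3 sketch; `worker`, `upload_all`, and `fake_upload_block` are hypothetical and are not azbackup's ThreadTask. The block list keeps the order in which blocks were queued, which is the order the blob must be assembled in, regardless of which thread finishes first.

```python
import queue
import threading

def worker(work_queue):
    while True:
        func, args = work_queue.get()
        try:
            func(*args)
        finally:
            # join() returns only once every get() has a matching task_done()
            work_queue.task_done()

def upload_all(blocks, upload_block, num_threads=4):
    """Queue every block, wait for all uploads, return the ordered id list."""
    work_queue = queue.Queue()
    for _ in range(num_threads):
        threading.Thread(target=worker, args=(work_queue,), daemon=True).start()
    block_ids = []
    for block_id, data in blocks:
        block_ids.append(block_id)  # list order = order in the final blob
        work_queue.put((upload_block, (block_id, data)))
    work_queue.join()  # blocks until every queued upload has finished
    return block_ids

uploaded = {}
lock = threading.Lock()

def fake_upload_block(block_id, data):
    # Hypothetical stand-in for conn.put_block
    with lock:
        uploaded[block_id] = data

ids = upload_all([("b%d" % i, b"x" * i) for i in range(10)], fake_upload_block)
print(ids[:3], len(uploaded))  # → ['b0', 'b1', 'b2'] 10
```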

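Summary

With the steps above we have implemented data encryption, decryption, signature verification, and efficient upload. Encryption keeps the data confidential, signature verification guarantees its integrity and authenticity, and block upload makes backups faster. In practice, choose the encryption and upload approach that fits your requirements, so that the data stays secure and the backups stay efficient.

The following flowchart shows the encrypted backup and upload process: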

```mermaid
graph LR
    classDef process fill:#E5F6FF,stroke:#73A6FF,stroke-width:2px;

    A(Start):::process --> B(Create compressed archive):::process
    B --> C(Generate Ksym):::process
    C --> D(Encrypt archive with AES-256 in CBC mode):::process
    D --> E(Encrypt Ksym with Kenc):::process
    E --> F(Sign encrypted data):::process
    F --> G(Split encrypted archive into blocks):::process
    G --> H(Upload blocks in parallel):::process
    H --> J(Commit block list):::process
    J --> I(End):::process
```

The decryption process:

```mermaid
graph LR
    classDef process fill:#E5F6FF,stroke:#73A6FF,stroke-width:2px;

    A(Start):::process --> B(Download encrypted archive):::process
    B --> C(Split off encrypted Ksym and signature):::process
    C --> D(Verify signature):::process
    D -->|Signature valid| E(Decrypt Ksym with Kenc):::process
    E --> F(Decrypt archive with Ksym):::process
    F --> G(Extract archive contents):::process
    G --> H(End):::process
    D -->|Signature invalid| I(Report corrupt or tampered archive):::process
    I --> H
```

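With these techniques and this code, we can build a secure, efficient data backup system that keeps data confidential and intact both in transit and at rest.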

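7. Upload Process in Detail

To better understand efficient upload, this section walks through the upload process. It consists of the following key steps:

7.1 Establishing a connection

First, establish a connection to the storage service. upload_archive creates a connection object, conn, from the storage.Storage class, specifying the storage service address, the account name, and the key.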

```python
conn = storage.Storage("blob.core.windows.net", account, key)
```

7.2 Creating the container

Try to create a container named enc. If the container already exists, the call fails harmlessly.

```python
conn.create_container("enc", False)
```

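7.3 Deciding whether to use block upload

The data size determines whether block upload is used. If the data is smaller than 4 MB it is uploaded in a single request; otherwise it is uploaded in blocks. The test must be len(data) < 4 * 1024 * 1024; a condition such as len(data) < 0 can never be true and would send every archive, however small, through the block path.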

```python
if len(data) < 4 * 1024 * 1024:
    resp = conn.put_blob("enc", filename, data, "application/octet-stream")
else:
    resp = upload_archive_using_blocks(data, filename, conn)
```

7.4 Block upload process

With block upload, the steps are:
1. Initialize the block list and queue: blocklist stores the block IDs, and queue holds the upload tasks for parallel upload.

```python
blocklist = []
queue = Queue.Queue()
```

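2. Start the threads (parallel upload only): the parallel_upload flag decides whether blocks are uploaded in parallel. If it is set, num_threads worker threads are started.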

```python
if parallel_upload:
    for i in range(num_threads):
        t = task.ThreadTask(queue)
        t.setDaemon(True)
        t.start()
```

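3. Split the data and upload the blocks: loop over the data, processing 4 MB at a time. Generate a SHA-256 block ID for each block and append it to the block list. For parallel upload, put the upload task on the queue; otherwise call the upload function directly.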

```python
offset = 0
while True:
    if offset >= len(data):
        break
    data_to_process = min(4 * 1024 * 1024, len(data) - offset)
    block_data = data[offset: offset + data_to_process]
    block_id = hashlib.sha256(block_data).hexdigest()
    blocklist.append(block_id)
    if parallel_upload:
        queue.put([block_uploader, [conn, block_id, block_data]])
    else:
        block_uploader(conn, block_id, block_data)
    offset += data_to_process
```

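4. Wait for all block uploads to finish: only once every block has been uploaded can the block list be committed with put_block_list, which assembles the blocks into the final blob.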

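8. Summary of Key Functions

To give a clearer picture of the whole encrypted backup and upload process, the key functions in the code are summarized in the following table:

| Function | Purpose |
| ---- | ---- |
| generate_rand_bits | Generates a random key with the given number of bits |
| block_encrypt | Encrypts data and returns the IV concatenated with the ciphertext |
| aes256_encrypt_data | Encrypts data with AES-256 in CBC mode |
| encrypt_rsa | Encrypts data with the RSA public key |
| aes256_decrypt_data | Decrypts data with AES-256 in CBC mode |
| block_decrypt | High-level decryption: splits off the IV and decrypts the ciphertext |
| decrypt_rsa | Decrypts data with the RSA private key |
| sign_rsa | Signs data with the RSA private key |
| verify_rsa | Verifies an RSA signature |
| create_encrypted_archive | Creates an encrypted archive |
| extract_encrypted_archive | Verifies, decrypts, and extracts the contents of an encrypted archive |
| put_block | Uploads one data block |
| put_block_list | Commits the block list, assembling the blocks into the blob |
| upload_archive | Uploads the encrypted archive, choosing the upload method by data size |
| upload_archive_using_blocks | Uploads the encrypted archive in blocks |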

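9. Notes and Best Practices

Keep the following in mind when using these techniques and this code:
- Random number generation: generate keys with random.SystemRandom rather than Python's default random number generator, so that keys cannot be predicted.
- IV usage: the initialization vector (IV) must be unique and never reused. For CBC it must also be unpredictable, which is why a fresh random IV is generated for every encryption.
- Signature verification: always verify the signature before decrypting an archive, to ensure the data is intact and authentic.
- Block upload: for large files, block upload improves upload efficiency, but failed uploads must be handled. Add retry logic, since the sample code simply exits on the first error.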
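The retry logic called for in the last bullet (block_uploader carries a "Need retry logic on error" comment) could take the form of exponential backoff around each block upload. A minimal Python 3 sketch: `with_retries` and `flaky_put_block` are hypothetical, and exponential backoff is one common policy rather than something the original code prescribes. In azbackup's code, block_uploader would need to raise an exception on a bad status instead of calling sys.exit for a wrapper like this to see the failure. Because each block is retried alone, only that block's data is sent again.

```python
import time

def with_retries(operation, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call operation() until it succeeds, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return operation()
        except OSError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 0.5 s, 1 s, 2 s, ...

# Usage: a fake block upload that fails twice, then succeeds
failures = [OSError("timeout"), OSError("503 Server Busy")]
delays = []

def flaky_put_block():
    if failures:
        raise failures.pop(0)
    return 201  # Put Block returns 201 Created on success

print(with_retries(flaky_put_block, sleep=delays.append))  # → 201
print(delays)  # → [0.5, 1.0]
```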

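10. Future Outlook

As data volumes grow and security requirements rise, encrypted backup and efficient upload techniques will keep evolving. Possible improvements include:
- Stronger encryption algorithms: as computing power increases, stronger algorithms may be needed to keep data secure.
- Smarter upload strategies: adjust the block size and the degree of parallelism dynamically, based on network conditions and the characteristics of the data, to improve upload efficiency further.
- Automated management: automate backup tasks, including scheduled backups, error handling, and logging.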