20. Network Forensics and Multiprocessing Techniques Explained

cuda7parallel

Published 2025-10-13 16:36:00

License: CC 4.0 BY-SA

Article link: https://blog.youkuaiyun.com/cuda7parallel/article/details/153304365

1. Network Packet Capture and Analysis Application

1.1 PSNMT Application Overview

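PSNMT (the acronym is not expanded here; it appears to be a network monitoring tool) is a command-line application written in Python that captures and records TCP or UDP packets on a network. It is controlled entirely through command-line arguments, which makes it easy to run from a cron job or another scheduler. Its arguments are:

| Argument | Purpose |
| ---- | ---- |
| -v | Verbose mode; when given, intermediate results are written to standard output |
| -m | Duration of the packet collection, in minutes |
| --TCP or --UDP | Transport-layer protocol to capture |
| -p | Output directory for the forensic log and the CSV file |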

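For example, the following command captures TCP packets for 60 minutes, prints verbose output, and creates the log and CSV files on the user's desktop:

sudo python psnmt -v --TCP -m 60 -p /home/chet/Desktop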

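1.2 Code Module Analysis

1.2.1 Packet Extraction and Decoding Module (decoder.py)

The core job of this module is to extract fields from the IP, TCP, and UDP headers. The PacketExtractor function takes a packet and a display switch as input, extracts the fields appropriate to the protocol (TCP or UDP), and prints the details only when the display switch is set. It returns a five-element list: protocol name, source address, source port, destination address, and destination port. Part of the code is shown below: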

import socket
from struct import unpack

PROTOCOL_TCP = 6
PROTOCOL_UDP = 17

def PacketExtractor(packet, displaySwitch):
    # Extract the fixed 20-byte IP header
    stripPacket = packet[0:20]
    ipHeaderTuple = unpack('!BBHHHBBH4s4s', stripPacket)
    # Pull out the individual fields
    verLen = ipHeaderTuple[0]
    dscpECN = ipHeaderTuple[1]
    packetLength = ipHeaderTuple[2]
    protocol = ipHeaderTuple[6]  # needed by the protocol branches below
    # ... other field extraction ...
    # Compute and convert the extracted values
    version = verLen >> 4
    length = verLen & 0x0F
    ipHdrLength = length * 4
    sourceAddress = socket.inet_ntoa(ipHeaderTuple[8])
    destinationAddress = socket.inet_ntoa(ipHeaderTuple[9])

    if displaySwitch:
        print('========================')
        print('IP HEADER')
        print('_______________________')
        print('Version:', str(version))
        print('Packet Length:', str(packetLength), 'bytes')
        # ... print the remaining fields ...

    if protocol == PROTOCOL_TCP:
        # Extract the TCP header
        stripTCPHeader = packet[ipHdrLength:ipHdrLength + 20]
        tcpHeaderBuffer = unpack('!HHLLBBHHH', stripTCPHeader)
        sourcePort = tcpHeaderBuffer[0]
        destinationPort = tcpHeaderBuffer[1]
        # ... other TCP field extraction ...
        if displaySwitch:
            print('TCP Header')
            print('_______________________')
            print('Source Port:', str(sourcePort))
            print('Destination Port :', str(destinationPort))
            # ... print the remaining TCP fields ...
        return ['TCP', sourceAddress, sourcePort, destinationAddress, destinationPort]
    elif protocol == PROTOCOL_UDP:
        # Extract the UDP header
        stripUDPHeader = packet[ipHdrLength:ipHdrLength + 8]
        udpHeaderBuffer = unpack('!HHHH', stripUDPHeader)
        sourcePort = udpHeaderBuffer[0]
        destinationPort = udpHeaderBuffer[1]
        # ... other UDP field extraction ...
        if displaySwitch:
            print('UDP Header')
            print('_______________________')
            print('Source Port:', str(sourcePort))
            print('Destination Port :', str(destinationPort))
            # ... print the remaining UDP fields ...
        return ['UDP', sourceAddress, sourcePort, destinationAddress, destinationPort]
    else:
        if displaySwitch:
            print('Found Protocol :', str(protocol))
        return ['Unsupported', sourceAddress, 0, destinationAddress, 0]
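
To see how the header arithmetic above works without a live capture (which needs root privileges), the following minimal sketch builds a synthetic IPv4 plus TCP header with struct.pack and decodes it the same way. All addresses, ports, and field values are made up for the demonstration, and the variable names are illustrative rather than taken from decoder.py.

```python
import socket
from struct import pack, unpack

# Build a minimal 20-byte IPv4 header by hand (all values are made up for the demo)
ver_len = (4 << 4) | 5          # version 4, header length 5 words (20 bytes)
ip_header = pack(
    '!BBHHHBBH4s4s',
    ver_len, 0, 40,             # version/IHL, DSCP/ECN, total length
    0x1234, 0,                  # identification, flags/fragment offset
    64, 6, 0,                   # TTL, protocol (6 = TCP), checksum (left as 0)
    socket.inet_aton('192.168.0.10'),
    socket.inet_aton('10.0.0.5'),
)
# Minimal 20-byte TCP header: ports 50000 -> 443, remaining fields mostly zeroed
tcp_header = pack('!HHLLBBHHH', 50000, 443, 0, 0, 5 << 4, 0x02, 8192, 0, 0)
packet = ip_header + tcp_header

# Decode it again, mirroring the steps in PacketExtractor
fields = unpack('!BBHHHBBH4s4s', packet[:20])
version = fields[0] >> 4                    # high nibble
ip_hdr_length = (fields[0] & 0x0F) * 4      # low nibble counts 32-bit words
protocol = fields[6]
source = socket.inet_ntoa(fields[8])
destination = socket.inet_ntoa(fields[9])
src_port, dst_port = unpack('!HH', packet[ip_hdr_length:ip_hdr_length + 4])

print(version, ip_hdr_length, protocol, source, src_port, destination, dst_port)
```

The transport header is sliced at ip_hdr_length rather than at a fixed offset of 20 because an IP header may carry options, in which case the low nibble is greater than 5.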

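1.2.2 Command-Line Argument Parsing Module (commandParser.py)

This module uses argparse from the Python standard library to process and validate the command-line arguments. The ParseCommandLine function creates an argument parser, registers each argument, and declares the mutually exclusive group and the required arguments. The ValidateDirectory function checks that the output directory exists and is writable.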

import argparse
import os

def ParseCommandLine():
    parser = argparse.ArgumentParser('PS-NMT')
    parser.add_argument('-v', '--verbose', help="Display packet details", action='store_true')
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument('--TCP', help='TCP Packet Capture', action='store_true')
    group.add_argument('--UDP', help='UDP Packet Capture', action='store_true')
    parser.add_argument('-m', '--minutes', help='Capture Duration in minutes', type=int)
    parser.add_argument('-p', '--outPath', type=ValidateDirectory, required=True, help="Output Directory")
    theArgs = parser.parse_args()
    return theArgs

def ValidateDirectory(theDir):
    if not os.path.isdir(theDir):
        raise argparse.ArgumentTypeError('Directory does not exist')
    if os.access(theDir, os.W_OK):
        return theDir
    else:
        raise argparse.ArgumentTypeError('Directory is not writable')
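
The behaviour of the mutually exclusive group can be checked without touching the real command line by passing an explicit argument list to parse_args. The sketch below is a trimmed-down copy of the parser (the -p option is left out so that it does not depend on any directory existing); build_parser is a hypothetical helper name used only for this illustration.

```python
import argparse

def build_parser():
    """Trimmed-down version of the PS-NMT parser, without the -p option."""
    parser = argparse.ArgumentParser('PS-NMT')
    parser.add_argument('-v', '--verbose', action='store_true')
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument('--TCP', action='store_true')
    group.add_argument('--UDP', action='store_true')
    parser.add_argument('-m', '--minutes', type=int)
    return parser

parser = build_parser()

# A valid invocation: verbose TCP capture for 60 minutes
args = parser.parse_args(['-v', '--TCP', '-m', '60'])
print(args.verbose, args.TCP, args.UDP, args.minutes)

# Supplying both protocols violates the group; argparse reports the error and raises SystemExit
rejected = False
try:
    parser.parse_args(['--TCP', '--UDP'])
except SystemExit:
    rejected = True
print('both protocols rejected:', rejected)
```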

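1.2.3 Log Handling Module (classLogging.py)

The _ForensicLog class handles forensic log operations. On initialization it turns logging on and sets the log file's format and level. The writeLog method writes a record to the log file according to its type (INFO, ERROR, or WARNING; any other type is logged as an error). When the object is destroyed, it records a shutdown message and shuts logging down.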

import logging

class _ForensicLog:
    def __init__(self, logName):
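        try:
            logging.basicConfig(filename=logName, level=logging.DEBUG, format='%(asctime)s %(message)s')
        except Exception:
            print("Forensic Log Initialization Failure . . . Aborting")
            # Exit with a non-zero status so a scheduler such as cron can detect the failure
            raise SystemExit(1)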

    def writeLog(self, logType, logMessage):
        if logType == "INFO":
            logging.info(logMessage)
        elif logType == "ERROR":
            logging.error(logMessage)
        elif logType == "WARNING":
            logging.warning(logMessage)
        else:
            logging.error(logMessage)

    def __del__(self):
        logging.info("Logging Shutdown")
        logging.shutdown()
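
Python does not guarantee when, or even whether, __del__ runs, so relying on it to write the final log entry can be fragile. One possible alternative, sketched below under that assumption, is a dedicated logger whose file handler is closed explicitly. The names open_forensic_log and forensic_demo are hypothetical, and the temporary directory only stands in for the real output path.

```python
import logging
import os
import tempfile

def open_forensic_log(log_name):
    """Create a dedicated logger that writes to log_name; returns (logger, handler)."""
    logger = logging.getLogger('forensic_demo')
    logger.setLevel(logging.DEBUG)
    handler = logging.FileHandler(log_name)
    handler.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(message)s'))
    logger.addHandler(handler)
    return logger, handler

log_path = os.path.join(tempfile.mkdtemp(), 'forensic.log')
logger, handler = open_forensic_log(log_path)
logger.info('Capture started')
logger.warning('Unsupported protocol seen')

# Close explicitly instead of waiting for an object to be garbage collected
logger.removeHandler(handler)
handler.close()

with open(log_path) as fp:
    lines = fp.read().splitlines()
print(len(lines), 'log lines written')
```

Including %(levelname)s in the format is a small addition over the original format string; it makes the INFO, WARNING, and ERROR records distinguishable in the file.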

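1.2.4 CSV File Handling Module (csvHandler.py)

The _CSVWriter class handles operations on the comma-separated values (CSV) file. On initialization it creates the CSV file and writes the header row. The writeCSVRow method writes one row of data to the file. When the object is destroyed, the CSV file is closed.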

import csv
import logging

class _CSVWriter:
    def __init__(self, fileName):
        try:
            # Text mode with newline='' is what the csv module expects on Python 3
            self.csvFile = open(fileName, 'w', newline='')
            self.writer = csv.writer(self.csvFile, delimiter=',', quoting=csv.QUOTE_ALL)
            self.writer.writerow(('Protocol', 'Source IP', 'Source Port', 'Destination IP', 'Destination Port'))
        except Exception:
            logging.error('CSV File Failure')

    def writeCSVRow(self, row):
        self.writer.writerow((row[0], row[1], str(row[2]), row[3], str(row[4])))

    def __del__(self):
        self.csvFile.close()
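
As a possible variation, the same writer can be expressed as a context manager so that the file is closed at a well-defined point rather than whenever the object is destroyed. This is only a sketch: the CSVWriter class, the write_row method, and the temporary file path are hypothetical names chosen for the example, and the row values are invented.

```python
import csv
import os
import tempfile

class CSVWriter:
    """Hypothetical context-manager variant of the _CSVWriter class."""
    HEADER = ('Protocol', 'Source IP', 'Source Port', 'Destination IP', 'Destination Port')

    def __init__(self, file_name):
        self.csv_file = open(file_name, 'w', newline='')
        self.writer = csv.writer(self.csv_file, delimiter=',', quoting=csv.QUOTE_ALL)
        self.writer.writerow(self.HEADER)

    def write_row(self, row):
        self.writer.writerow(row)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Always close the file; returning False lets any exception propagate
        self.csv_file.close()
        return False

demo_path = os.path.join(tempfile.mkdtemp(), 'capture.csv')
with CSVWriter(demo_path) as out:
    out.write_row(['TCP', '192.168.0.10', 50000, '10.0.0.5', 443])

# Read the file back to confirm what was written
with open(demo_path, newline='') as fp:
    rows = list(csv.reader(fp))
print(rows)
```

Because the writer uses QUOTE_ALL, the port numbers are stored as quoted text and come back from csv.reader as strings.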

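1.3 Program Execution Flow

The main execution flow of the program is as follows:

graph TD;
    A[Start] --> B[Parse command-line arguments];
    B --> C{Promiscuous mode set?};
    C -- Yes --> D[Capture packets];
    C -- No --> E[Print notice];
    D --> F[Extract packet information];
    F --> G{Protocol type};
    G -- TCP --> H[Handle TCP packet];
    G -- UDP --> I[Handle UDP packet];
    G -- Other --> J[Handle unsupported protocol];
    H --> K[Write record to CSV file];
    I --> K;
    J --> K;
    K --> L[Write log entry];
    L --> M[Close log and CSV file];
    M --> N[End];
    E --> N;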

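2. Multiprocessing in Forensics

2.1 Multiprocessing Overview

Multiprocessing means executing a program simultaneously on two or more central processing units (CPUs) or cores. To obtain a significant performance gain, developers need to identify regions of code with the following characteristics:
1. The code is processor-intensive.
2. The code can be split into several independent threads of processing that can run in parallel.
3. The work can be load-balanced across those threads, so that each one finishes at roughly the same time.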

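2.2 Multiprocessing Support in Python

The multiprocessing module in the Python standard library provides support for multiprocessing. Importing it gives access to the functions and classes needed for multi-process programming. Some commonly used ones are:

| Function/Class | Description |
| ---- | ---- |
| Process | Represents an activity that runs in a separate process; analogous to threading.Thread |
| Array | Returns a synchronized shared array |
| Lock | Returns a non-recursive lock object |
| Pool | Returns a process pool object |
| Queue | Returns a queue object |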
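
A minimal sketch of how three of these pieces fit together is shown below: each Process computes a value, reports it through a Queue, and prints under a Lock so that output lines from different processes do not interleave. The function name square_worker and the squaring task are hypothetical stand-ins for real work.

```python
from multiprocessing import Process, Queue, Lock

def square_worker(number, lock, results):
    """Compute a square, report it through the queue, and print under the lock."""
    results.put((number, number * number))
    with lock:
        print('worker finished', number)

if __name__ == '__main__':
    lock = Lock()
    results = Queue()
    workers = [Process(target=square_worker, args=(n, lock, results)) for n in range(4)]
    for w in workers:
        w.start()
    # Collect exactly one result per worker before joining
    collected = dict(results.get() for _ in workers)
    for w in workers:
        w.join()
    print(sorted(collected.items()))
```

The results arrive in whatever order the workers finish, which is why they are gathered into a dictionary and sorted before printing.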

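2.3 A Simple Multiprocessing Example: File Search

2.3.1 Single-Core File Search Solution

The following is a simple single-core file search program. It calls the SearchFile function four times in sequence, searching for one string each time.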

import time

def SearchFile(theFile, theString):
    try:
        fp = open(theFile, 'r')
        buffer = fp.read()
        fp.close()
        if theString in buffer:
            print('File:', theFile, 'String:', theString, '\t', 'Found')
        else:
            print('File:', theFile, 'String:', theString, '\t', 'Not Found')
    except:
        print('File processing error')

startTime = time.time()
SearchFile('c:\\TESTDIR\\Dictionary.txt', 'thought')
SearchFile('c:\\TESTDIR\\Dictionary.txt', 'exile')
SearchFile('c:\\TESTDIR\\Dictionary.txt', 'xavier')
SearchFile('c:\\TESTDIR\\Dictionary.txt', '$Slllb!')
elapsedTime = time.time() - startTime
print('Duration:', elapsedTime)

The reported execution time for this program is 4.3140001297 seconds.
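
When repeating such measurements, a small helper can keep the timing logic in one place. The sketch below assumes time.perf_counter, which is intended for measuring short durations, in place of time.time; the helper name timed and the summing workload are hypothetical.

```python
import time

def timed(func, *args):
    """Call func(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start

# Stand-in workload: summing one million integers
result, seconds = timed(sum, range(1_000_000))
print(result, f'{seconds:.4f}s')
```

Running each variant several times and comparing the best or median time usually gives a fairer comparison than a single run.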

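2.3.2 Multiprocessing File Search Solution

The following file search program uses multiprocessing: it creates four processes, one per search string, so that the work can be spread evenly across four cores.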

from multiprocessing import Process
import time

def SearchFile(theFile, theString):
    try:
        fp = open(theFile, 'r')
        buffer = fp.read()
        fp.close()
        if theString in buffer:
            print('File:', theFile, 'String:', theString, '\t', 'Found')
        else:
            print('File:', theFile, 'String:', theString, '\t', 'Not Found')
    except:
        print('File processing error')

if __name__ == '__main__':
    startTime = time.time()
    p1 = Process(target=SearchFile, args=('c:\\TESTDIR\\Dictionary.txt', 'thought'))
    p1.start()
    p2 = Process(target=SearchFile, args=('c:\\TESTDIR\\Dictionary.txt', 'exile'))
    p2.start()
    p3 = Process(target=SearchFile, args=('c:\\TESTDIR\\Dictionary.txt', 'xavier'))
    p3.start()
    p4 = Process(target=SearchFile, args=('c:\\TESTDIR\\Dictionary.txt', '$Slllb'))
    p4.start()
    # Wait for all processes to finish
    p1.join()
    p2.join()
    p3.join()
    p4.join()
    elapsedTime = time.time() - startTime
    print('Duration:', elapsedTime)

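This version performs noticeably better than the single-core solution, even though each process still pays the I/O cost of opening, reading, and closing the file.

As the examples above show, multiprocessing can significantly improve a program's performance, especially for processor-intensive tasks. In network forensics, applying it judiciously makes it possible to process large volumes of network packets and files more efficiently.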
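
The four hand-written Process objects can also be generalized to any number of search strings. The sketch below is one possible way to do that with Pool.starmap; it is not the original author's code. The function search_file returns its result instead of printing it, and a small temporary word list stands in for Dictionary.txt.

```python
import os
import tempfile
from multiprocessing import Pool

def search_file(the_file, the_string):
    """Return (string, found) so the parent process can collect the results."""
    with open(the_file, 'r') as fp:
        return the_string, the_string in fp.read()

if __name__ == '__main__':
    # Throwaway word list standing in for Dictionary.txt
    demo_path = os.path.join(tempfile.mkdtemp(), 'words.txt')
    with open(demo_path, 'w') as fp:
        fp.write('thought\nexile\n')

    targets = ['thought', 'exile', 'xavier']
    with Pool(processes=3) as pool:
        # starmap unpacks each (file, string) tuple into the two arguments
        results = pool.starmap(search_file, [(demo_path, t) for t in targets])
    for word, found in results:
        print(word, 'Found' if found else 'Not Found')
```

Returning values rather than printing inside the workers keeps the output in a predictable order, since starmap returns results in the order of its inputs.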

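2.4 Multiprocessing File Hashing

2.4.1 Single-Core Solution

The single-core file hashing solution computes the hash of each file in turn. The code is as follows: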

import hashlib
import time

def hash_file(file_path):
    try:
        hash_object = hashlib.sha256()
        with open(file_path, 'rb') as f:
            for chunk in iter(lambda: f.read(4096), b""):
                hash_object.update(chunk)
        return hash_object.hexdigest()
    except Exception as e:
        print(f"Error hashing file: {e}")
        return None

file_paths = ['file1.txt', 'file2.txt', 'file3.txt', 'file4.txt']
start_time = time.time()
for file_path in file_paths:
    hash_result = hash_file(file_path)
    if hash_result:
        print(f"File: {file_path}, Hash: {hash_result}")
elapsed_time = time.time() - start_time
print(f"Single core duration: {elapsed_time}")

This solution hashes the files one after another, so it becomes slow when there are many files.
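
Reading in 4096-byte chunks keeps memory use flat regardless of file size, and it produces the same digest as hashing the whole content at once. The following sketch checks that on a temporary file; the file name evidence.bin and its contents are invented for the demonstration.

```python
import hashlib
import os
import tempfile

def hash_file(file_path, chunk_size=4096):
    """SHA-256 of a file, read chunk by chunk."""
    h = hashlib.sha256()
    with open(file_path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()

data = b'forensic evidence ' * 1000   # 18,000 bytes, spans several 4096-byte chunks
demo_path = os.path.join(tempfile.mkdtemp(), 'evidence.bin')
with open(demo_path, 'wb') as f:
    f.write(data)

chunked = hash_file(demo_path)
one_shot = hashlib.sha256(data).hexdigest()
print(chunked == one_shot)
```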

2.4.2 Multi-Core Solution A

The following multi-core file hashing solution is implemented with multiprocessing.Pool:

import hashlib
import time
from multiprocessing import Pool

def hash_file(file_path):
    try:
        hash_object = hashlib.sha256()
        with open(file_path, 'rb') as f:
            for chunk in iter(lambda: f.read(4096), b""):
                hash_object.update(chunk)
        return file_path, hash_object.hexdigest()
    except Exception as e:
        print(f"Error hashing file: {e}")
        return file_path, None

if __name__ == '__main__':
    file_paths = ['file1.txt', 'file2.txt', 'file3.txt', 'file4.txt']
    start_time = time.time()
    with Pool() as pool:
        results = pool.map(hash_file, file_paths)
    for file_path, hash_result in results:
        if hash_result:
            print(f"File: {file_path}, Hash: {hash_result}")
    elapsed_time = time.time() - start_time
    print(f"Multi-core solution A duration: {elapsed_time}")

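This solution uses the Pool class to distribute the hashing tasks across multiple cores for parallel processing, which can shorten the total run time considerably when there are many files or large ones.

2.4.3 Multi-Core Solution B

Another multi-core file hashing solution creates the Process objects by hand: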

import hashlib
import time
from multiprocessing import Process, Queue

def hash_file(file_path, result_queue):
    try:
        hash_object = hashlib.sha256()
        with open(file_path, 'rb') as f:
            for chunk in iter(lambda: f.read(4096), b""):
                hash_object.update(chunk)
        result_queue.put((file_path, hash_object.hexdigest()))
    except Exception as e:
        print(f"Error hashing file: {e}")
        result_queue.put((file_path, None))

if __name__ == '__main__':
    file_paths = ['file1.txt', 'file2.txt', 'file3.txt', 'file4.txt']
    result_queue = Queue()
    processes = []
    start_time = time.time()
    for file_path in file_paths:
        p = Process(target=hash_file, args=(file_path, result_queue))
        p.start()
        processes.append(p)
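    # Drain the queue before joining: each process puts exactly one result,
    # and a child may not exit until its queued data has been consumed
    for _ in processes:
        file_path, hash_result = result_queue.get()
        if hash_result:
            print(f"File: {file_path}, Hash: {hash_result}")
    for p in processes:
        p.join()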
    elapsed_time = time.time() - start_time
    print(f"Multi-core solution B duration: {elapsed_time}")

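This solution manages process creation and execution by hand and likewise hashes the files in parallel.

2.5 Multiprocessing Hash Table Generation

2.5.1 Single-Core Password Generator

The single-core password generator produces the passwords in sequence and hashes each one. The code is as follows: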

import hashlib
import time

charset = 'abcdefghijklmnopqrstuvwxyz'
max_length = 3

def generate_passwords():
    from itertools import product
    for length in range(1, max_length + 1):
        for combination in product(charset, repeat=length):
            password = ''.join(combination)
            hash_object = hashlib.sha256(password.encode())
            hash_result = hash_object.hexdigest()
            print(f"Password: {password}, Hash: {hash_result}")

start_time = time.time()
generate_passwords()
elapsed_time = time.time() - start_time
print(f"Single core password generation duration: {elapsed_time}")

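2.5.2 Multi-Core Password Generator

The multi-core password generator uses the multiprocessing module to distribute the password generation work across several cores: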

import hashlib
import time
from multiprocessing import Pool

charset = 'abcdefghijklmnopqrstuvwxyz'
max_length = 3

def generate_passwords_part(start, end):
    from itertools import chain, islice, product
    # Number the candidates globally across all lengths so that each
    # (start, end) slice covers a distinct, similarly sized share of the work
    candidates = chain.from_iterable(
        product(charset, repeat=length) for length in range(1, max_length + 1))
    results = []
    for combination in islice(candidates, start, end):
        password = ''.join(combination)
        hash_object = hashlib.sha256(password.encode())
        hash_result = hash_object.hexdigest()
        results.append((password, hash_result))
    return results

if __name__ == '__main__':
    num_processes = 4
    total_passwords = sum(len(charset) ** i for i in range(1, max_length + 1))
    chunk_size = total_passwords // num_processes
    ranges = [(i * chunk_size, (i + 1) * chunk_size) for i in range(num_processes)]
    ranges[-1] = (ranges[-1][0], total_passwords)

    start_time = time.time()
    with Pool(processes=num_processes) as pool:
        results = pool.starmap(generate_passwords_part, ranges)
    for sub_results in results:
        for password, hash_result in sub_results:
            print(f"Password: {password}, Hash: {hash_result}")
    elapsed_time = time.time() - start_time
    print(f"Multi-core password generation duration: {elapsed_time}")

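This multi-core password generator splits the password generation task into several parts, each handled by one process, which improves generation efficiency.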
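
The partitioning arithmetic from the __main__ block can be checked in isolation. The sketch below wraps it in a hypothetical helper, split_ranges, and confirms that for a 26-letter character set and a maximum length of 3 the four slices cover every candidate exactly once.

```python
def split_ranges(total, parts):
    """Split range(total) into contiguous (start, end) slices; the last absorbs the remainder."""
    chunk = total // parts
    ranges = [(i * chunk, (i + 1) * chunk) for i in range(parts)]
    ranges[-1] = (ranges[-1][0], total)
    return ranges

charset_size, max_length = 26, 3
# 26 + 26**2 + 26**3 candidate passwords in total
total = sum(charset_size ** n for n in range(1, max_length + 1))
ranges = split_ranges(total, 4)
covered = sum(end - start for start, end in ranges)

print(total)    # 18278
print(ranges)   # [(0, 4569), (4569, 9138), (9138, 13707), (13707, 18278)]
print(covered == total)
```

The last slice is two items longer than the others because 18278 is not divisible by 4; for workloads of this size the imbalance is negligible.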

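3. Summary and Outlook

3.1 Technical Summary

Network packet capture and analysis: The PSNMT application makes it straightforward to capture and analyze TCP and UDP packets on a network, and its modular design keeps the code easy to maintain and extend. Each module has a clear responsibility: packet extraction and decoding, command-line argument parsing, log handling, and CSV file handling.
Multiprocessing: Python's multiprocessing module provides strong support for multi-process programming. In scenarios such as file search, file hashing, and password generation, multiprocessing can significantly improve performance, especially with large data volumes and processor-intensive tasks.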

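3.2 Future Outlook

In network forensics, the volume of network data keeps growing, and with it the demand for processing efficiency. Multiprocessing could be explored further in the following areas:
1. Real-time packet processing: use multiprocessing to capture, analyze, and store network packets in real time, so that anomalous behavior on the network is detected promptly.
2. Distributed forensic systems: combine cloud computing with multiprocessing to build distributed forensic systems that process large-scale network data efficiently.
3. Intelligent forensic analysis: combine multiprocessing with machine learning and artificial intelligence to analyze and mine network data intelligently, improving the accuracy and efficiency of forensic work.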

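3.3 Practical Recommendations

To apply these techniques in a real project, the following steps can be used:
1. Requirements analysis: clarify the project's requirements and determine whether it needs to process large amounts of data or run processor-intensive tasks.
2. Solution design: choose a suitable approach based on the requirements, such as single-core or multiprocessing.
3. Implementation: write the code according to the design, paying attention to maintainability and extensibility.
4. Testing and optimization: test the code, evaluate its performance, and optimize based on the results.

The following simple flowchart shows the general process of applying multiprocessing in network forensics: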

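graph TD;
    A[Define the forensic task] --> B{Suitable for multiprocessing?};
    B -- Yes --> C[Design the multiprocessing solution];
    B -- No --> D[Use the single-core solution];
    C --> E[Write the multiprocessing code];
    D --> F[Write the single-core code];
    E --> G[Test and optimize];
    F --> G;
    G --> H[Deploy the application];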

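Applied sensibly, network packet capture and analysis together with multiprocessing can deliver better results in network forensic work and improve both efficiency and accuracy.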