19. Network Packet Capture and Analysis with Python

cuda7parallel

License: CC 4.0 BY-SA

Column: Python Forensics: The Art of the Digital Detective. Tags: Python, network packet capture, data analysis

Article link: https://blog.youkuaiyun.com/cuda7parallel/article/details/153304360

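1. Port Mirroring and Sniffing Basics

SPAN (Switched Port Analyzer) ports are usually associated with Cisco; the general technique is known as port mirroring. Modern switches can be configured to mirror specific network ports to a common interface used for network monitoring, where the traffic can be fed to a variety of security devices.

Packet sniffing in Python requires the following:
1. A network interface card (NIC) that can operate in promiscuous mode.
2. Administrator privileges, which are required on most modern operating systems (Windows, Linux, and Mac OS X).
3. Once both conditions are met, a raw socket can be created.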

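1.1 Promiscuous Mode

When a supporting NIC is in promiscuous mode, it intercepts and reads the full contents of every network packet that arrives at it. Outside promiscuous mode, the NIC accepts only packets addressed to it (plus broadcast and any subscribed multicast traffic). Promiscuous mode must be supported by the NIC, the operating system, and the associated driver. Not every NIC supports it, but checking whether your NIC and operating system do is straightforward.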

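1.2 Example: Setting Promiscuous Mode on Ubuntu 12.04 LTS

On Linux, the ifconfig command can place a NIC in promiscuous mode (administrator privileges required). On newer distributions that no longer ship ifconfig, ip link set eth0 promisc on is the iproute2 equivalent.
Command to enable promiscuous mode: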

chet@PythonForensics:$ sudo ifconfig eth0 promisc

Verify the result (note the PROMISC flag):

chet@PythonForensics:$ sudo ifconfig
eth0 Link encap:Ethernet HWaddr 00:1e:8c:b7:6d:64
inet addr:192.168.0.25 Bcast:192.168.0.255 Mask:255.255.255.0
inet6 addr: fe80::21e:8cff:feb7:6d64/64 Scope:Link
UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1
RX packets:43284 errors:0 dropped:0 overruns:0 frame:0
TX packets:11338 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:17659022 (17.6 MB) TX bytes:1824060 (1.8 MB)

Disable promiscuous mode and verify the result (the PROMISC flag is gone):

chet@PythonForensics:$ sudo ifconfig eth0 -promisc
chet@PythonForensics:$ sudo ifconfig eth0
eth0 Link encap:Ethernet HWaddr 00:1e:8c:b7:6d:64
inet addr:192.168.0.25 Bcast:192.168.0.255 Mask:255.255.255.0
inet6 addr: fe80::21e:8cff:feb7:6d64/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:43381 errors:0 dropped:0 overruns:0 frame:0
TX packets:11350 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:17668285 (17.6 MB) TX bytes:1827000 (1.8 MB)

Once you have confirmed that the NIC can enter promiscuous mode, you can use raw sockets in Python.
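The PROMISC flag shown by ifconfig can also be checked from Python. The sketch below assumes a Linux system that exposes the interface flag word as a hex string under /sys/class/net/<iface>/flags; the helper names are illustrative and are not part of the original tool.

```python
# Minimal sketch. Assumption: Linux sysfs exposes the interface flags as a
# hex string such as "0x1103" in /sys/class/net/<iface>/flags.
IFF_PROMISC = 0x100  # promiscuous-mode bit, as defined in <linux/if.h>


def is_promiscuous(flags):
    """Return True when the IFF_PROMISC bit is set in an interface flag word."""
    return bool(flags & IFF_PROMISC)


def read_interface_flags(iface):
    """Read the flag word of an interface such as 'eth0' (hypothetical helper)."""
    with open("/sys/class/net/%s/flags" % iface) as handle:
        # int(..., 16) accepts the leading "0x" prefix
        return int(handle.read().strip(), 16)


# The same bit test applied to two example flag words
print(is_promiscuous(0x1103))  # True  (0x100 bit set)
print(is_promiscuous(0x1003))  # False (0x100 bit clear)
```

On a real system, is_promiscuous(read_interface_flags('eth0')) would report the state of eth0 after running the ifconfig commands above.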

2. Raw Sockets in Python on Linux

2.1 Creating a Raw Socket

Creating a raw socket in Python is straightforward. The following script shows the steps:

# Note: script must be run with administrator privileges
# import the socket and os libraries
import socket
import os
# issue the command to place the adapter in promiscuous mode
ret = os.system("ifconfig eth0 promisc")
# if the command was successful continue
if ret == 0:
    # Create a raw socket in Linux
    # AF_INET specifies IPv4 packets
    # SOCK_RAW specifies a raw protocol at the network layer
    # IPPROTO_TCP specifies the protocol to capture
    mySocket = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_TCP)
    # Receive the next packet, up to 255 bytes
    # Note this is a synchronous call and will wait until
    # a packet is received
    recvBuffer, addr = mySocket.recvfrom(255)
    # Print out the contents of the buffer
    print(recvBuffer)
    # Take the adapter out of promiscuous mode, then close the socket
    ret = os.system("ifconfig eth0 -promisc")
    mySocket.close()
else:
    # if the system command fails print out a message
    print('Promiscuous mode not set')

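The script performs the following steps:
1. Enables promiscuous mode on the NIC.
2. Creates a raw socket.
3. Captures the next TCP packet received on the socket.
4. Prints the contents of the packet.
5. Disables promiscuous mode on the NIC.
6. Closes the raw socket.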

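2.2 Unpacking the Buffer

Extracting information from a buffer like this can be tedious, because the relevant fields have to be parsed out of raw bytes. For buffered data with a defined structure, Python provides the unpack() function in the standard struct module.

For example, to extract information from the IPv4 header: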

from struct import unpack
ipHeader = packet[0:20]
buffer = unpack('!BBHHHBBH4s4s', ipHeader)

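The unpack() function takes two arguments: a string that defines the format of the data in the buffer, and the buffer to parse. The buffer must be exactly as long as the format describes (20 bytes here). The function returns a tuple, which can be indexed like a list.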
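To make the behaviour concrete, here is a small self-contained sketch that builds a 20-byte header with pack() and reads it back with unpack(). All field values are made up for illustration.

```python
import socket
from struct import calcsize, pack, unpack

IPV4_FORMAT = '!BBHHHBBH4s4s'

# The format describes exactly the 20 bytes of an IPv4 header without options.
assert calcsize(IPV4_FORMAT) == 20

# Build a sample header by hand (illustrative values only).
sampleHeader = pack(IPV4_FORMAT,
                    0x45,                # version 4, IHL 5
                    0,                   # DSCP and ECN
                    40,                  # total length
                    0x1234,              # identification
                    0x4000,              # flags (DF set), fragment offset 0
                    64,                  # TTL
                    socket.IPPROTO_TCP,  # protocol number 6
                    0,                   # checksum left at zero in this sketch
                    socket.inet_aton('192.168.0.25'),
                    socket.inet_aton('192.168.0.5'))

fields = unpack(IPV4_FORMAT, sampleHeader)
print(len(fields))                  # 10 values, one per format code
print(fields[6])                    # 6
print(socket.inet_ntoa(fields[8]))  # 192.168.0.25
```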

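2.3 Decoding the IPv4 Header Format String

Each character in the format string "!BBHHHBBH4s4s" has a specific meaning that tells unpack() how to process the buffer. The table below lists each character and the IPv4 header field it maps to:
| Format | Python type | Bytes | IPv4 field | Definition |
| ---- | ---- | ---- | ---- | ---- |
| ! | Big-endian byte order | - | - | Network packets use big-endian (network) byte order |
| B | Integer | 1 | Version and IHL | 4-bit version field (4 for IPv4) and 4-bit Internet Header Length, the number of 32-bit words in the header |
| B | Integer | 1 | DSCP and ECN | 6-bit Differentiated Services Code Point and 2-bit Explicit Congestion Notification |
| H | Integer | 2 | Total length | 16-bit size of the entire packet in bytes |
| H | Integer | 2 | Identification | 16-bit identifier shared by a group of IP fragments |
| H | Integer | 2 | Flags and fragment offset | 3-bit fragment flags and 13-bit fragment offset |
| B | Integer | 1 | Time to Live (TTL) | 8-bit TTL value that prevents packets from looping forever |
| B | Integer | 1 | Protocol | 8-bit value identifying the protocol used in the data portion of the packet |
| H | Integer | 2 | Header checksum | 16-bit checksum used for error detection |
| 4s | Bytes (a string in Python 2) | 4 | Source IP address | 4-byte source IP address |
| 4s | Bytes (a string in Python 2) | 4 | Destination IP address | 4-byte destination IP address |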

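The following code unpacks the IPv4 header, extracts each field into its own variable, and converts the source and destination IP addresses into human-readable form: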

# unpack an IPv4 header
# note the packet variable is a buffer returned from
# a socket.recvfrom() method
import socket
from struct import unpack
# the format string describes exactly 20 bytes, so pass only the header
ipHeaderTuple = unpack('!BBHHHBBH4s4s', packet[0:20])
# Field Contents
verLen = ipHeaderTuple[0]        # Field 0: Version and Length
dscpECN = ipHeaderTuple[1]       # Field 1: DSCP and ECN
packetLength = ipHeaderTuple[2]  # Field 2: Packet Length
packetID = ipHeaderTuple[3]      # Field 3: Identification
flagFrag = ipHeaderTuple[4]      # Field 4: Flags/Frag Offset
timeToLive = ipHeaderTuple[5]    # Field 5: Time to Live (TTL)
protocol = ipHeaderTuple[6]      # Field 6: Protocol Number
checkSum = ipHeaderTuple[7]      # Field 7: Header Checksum
sourceIP = ipHeaderTuple[8]      # Field 8: Source IP
destIP = ipHeaderTuple[9]        # Field 9: Destination IP
# Convert the sourceIP and destIP into a
# standard dotted-quad string representation
# for example '192.168.0.5'
sourceAddress = socket.inet_ntoa(sourceIP)
destAddress = socket.inet_ntoa(destIP)
# Extract the version and header size, this will give
# us the offset to the data portion of the packet
version = verLen >> 4            # get upper nibble: version
length = verLen & 0x0F           # get lower nibble: header length in 32-bit words
ipHdrLength = length * 4         # calculate the header size in bytes
fragOffset = flagFrag & 0x1FFF   # get lower 13 bits: offset in 8-byte units
fragment = fragOffset * 8        # calculate start of fragment in bytes
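The header checksum field extracted above can also be verified. The sketch below is an addition, not part of the original script: it implements the standard ones' complement sum over the header's 16-bit words, and the sample header bytes are illustrative.

```python
from struct import unpack


def ipv4_checksum(header):
    """Ones' complement of the ones' complement sum of the 16-bit header words.

    Returns 0 for a header whose checksum field is already correct, and the
    value to store in the field when that field is currently zero.
    """
    if len(header) % 2:
        raise ValueError("header length must be a multiple of 2 bytes")
    words = unpack('!%dH' % (len(header) // 2), header)
    total = sum(words)
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Sample 20-byte header (illustrative values; its checksum field is 0xB861).
sample = bytes.fromhex('45000073000040004011b861c0a80001c0a800c7')
print(ipv4_checksum(sample))  # 0, so the header verifies

# Zero the checksum field (bytes 10-11) to recompute the value from scratch.
zeroed = sample[:10] + b'\x00\x00' + sample[12:]
print(hex(ipv4_checksum(zeroed)))  # 0xb861
```

Pass the full header (ipHdrLength bytes, not just the first 20) when the packet carries IP options.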

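2.4 Parsing the TCP Header

The same process can be used to extract fields from the data portion of the packet, which in this case is the TCP header. The type of the data portion is determined by checking the protocol field (6 for TCP). The format string "!HHLLBBHHH" describes the fixed 20-byte part of a TCP header, and unpack() extracts its individual fields.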

# By using the results of the IPv4 header unpacking, we can
# strip the TCP header from the original packet
# Note the ipHdrLength is the offset from the beginning of
# the buffer. The minimum length of a TCP header is 20
# bytes. For our purposes these 20 bytes contain
# the pertinent information we are looking for
stripTCPHeader = packet[ipHdrLength:ipHdrLength + 20]
# unpack returns a tuple; for illustration, each value
# is extracted into its own variable
tcpHeaderBuffer = unpack('!HHLLBBHHH', stripTCPHeader)
sourcePort = tcpHeaderBuffer[0]
destinationPort = tcpHeaderBuffer[1]
sequenceNumber = tcpHeaderBuffer[2]
acknowledgement = tcpHeaderBuffer[3]
dataOffsetandReserve = tcpHeaderBuffer[4]
# upper nibble is the data offset in 32-bit words
tcpHeaderLength = (dataOffsetandReserve >> 4) * 4
# one bit per flag, FIN in the lowest bit
flags = tcpHeaderBuffer[5]
FIN = flags & 0x01
SYN = (flags >> 1) & 0x01
RST = (flags >> 2) & 0x01
PSH = (flags >> 3) & 0x01
ACK = (flags >> 4) & 0x01
URG = (flags >> 5) & 0x01
ECE = (flags >> 6) & 0x01
CWR = (flags >> 7) & 0x01
windowSize = tcpHeaderBuffer[6]
tcpChecksum = tcpHeaderBuffer[7]
urgentPointer = tcpHeaderBuffer[8]
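For logging or display, the individual flag bits are often easier to read as names. The helper below is a small illustrative sketch, not part of the original code; it uses the same bit positions as the shifts above.

```python
# Flag names ordered by bit position, lowest bit first (matches the shifts above).
TCP_FLAG_NAMES = ('FIN', 'SYN', 'RST', 'PSH', 'ACK', 'URG', 'ECE', 'CWR')


def tcp_flag_names(flags):
    """Return the names of the bits set in the TCP flags byte, low bit first."""
    return [name for bit, name in enumerate(TCP_FLAG_NAMES)
            if (flags >> bit) & 0x01]


print(tcp_flag_names(0x02))  # ['SYN']
print(tcp_flag_names(0x12))  # ['SYN', 'ACK']
print(tcp_flag_names(0x11))  # ['FIN', 'ACK']
```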

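3. Python Silent Network Mapping Tool (PSNMT)

3.1 Goals of the Tool

With the basics of packet sniffing in hand, the next step is to parse the data and extract the information we need. The tool has the following goals:
1. Collect the IP addresses that are active on the monitored network (the monitor is meant to stay in place for long periods, so that devices powered on only occasionally or irregularly are also captured).
2. Collect the IP addresses of remote computers that interact with the local network (for example web, mail, or cloud services).
3. Collect the service ports used by local and/or remote computers, with particular attention to well-known ports (0-1023) and registered ports (1024-49151).
4. Report only unique entries: if local host 192.168.0.5 is seen using port 80, that entry should appear once, not every time it is observed.
5. To limit the scope of the program, collect only TCP or UDP packets in an IPv4 environment. The program can later be extended to handle other protocols and IPv6.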
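The port ranges in goal 3 can be expressed as a small helper. This is an illustrative sketch (the function name is an assumption, not part of PSNMT); the third range, 49152-65535, is the dynamic/private range that completes the IANA split.

```python
def classify_port(port):
    """Classify a TCP/UDP port number by IANA range."""
    if not 0 <= port <= 65535:
        raise ValueError("port must be between 0 and 65535")
    if port <= 1023:
        return 'well-known'
    if port <= 49151:
        return 'registered'
    return 'dynamic'


print(classify_port(80))     # well-known
print(classify_port(8080))   # registered
print(classify_port(50000))  # dynamic
```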

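3.2 Fields to Extract

To meet these requirements, only the following fields need to be extracted from the headers:
1. Protocol
2. Source IP address
3. Destination IP address
4. Source port
5. Destination port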
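A minimal sketch of how these five fields could be pulled out of a captured buffer follows. The function extract_observation is hypothetical and is not the decoder.PacketExtractor used by the tool, whose source is not shown here; it relies on the fact that TCP and UDP headers both begin with 16-bit source and destination ports.

```python
import socket
from struct import pack, unpack


def extract_observation(packet):
    """Return (protocol, srcIP, srcPort, dstIP, dstPort), or None if unusable.

    Hypothetical helper for illustration only.
    """
    if len(packet) < 20:
        return None
    fields = unpack('!BBHHHBBH4s4s', packet[0:20])
    ipHdrLength = (fields[0] & 0x0F) * 4
    protocolNames = {socket.IPPROTO_TCP: 'TCP', socket.IPPROTO_UDP: 'UDP'}
    protocol = protocolNames.get(fields[6])
    if protocol is None or ipHdrLength < 20 or len(packet) < ipHdrLength + 4:
        return None
    # TCP and UDP both start with 16-bit source and destination ports.
    srcPort, dstPort = unpack('!HH', packet[ipHdrLength:ipHdrLength + 4])
    return (protocol, socket.inet_ntoa(fields[8]), srcPort,
            socket.inet_ntoa(fields[9]), dstPort)


# Hand-built sample: an IPv4 header followed by the first 4 bytes of a TCP header.
samplePacket = pack('!BBHHHBBH4s4s', 0x45, 0, 44, 1, 0, 64,
                    socket.IPPROTO_TCP, 0,
                    socket.inet_aton('192.168.0.5'),
                    socket.inet_aton('93.184.216.34')) + pack('!HH', 49200, 80)
print(extract_observation(samplePacket))
# ('TCP', '192.168.0.5', 49200, '93.184.216.34', 80)
```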

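3.3 Technical Issues and Solutions

3.3.1 Data Storage

A simple list holds the data collected from the packets; the data for each received packet is appended to the list.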

ipObservations = []

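3.3.2 Stopping Collection and the Time Limit

The signal module from the Python standard library is integrated into the collection loop. First define a myTimeout exception class, which the signal handler raises when the specified time expires. Then catch myTimeout in the try/except that wraps the packet-receiving loop. Note that SIGALRM is available only on Unix-like systems.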

import signal

class myTimeout(Exception):
    pass

def handler(signum, frame):
    print('timeout received %d' % signum)
    raise myTimeout()

# Set the signal handler
signal.signal(signal.SIGALRM, handler)
# set the alarm to expire in n seconds (n is the capture duration)
signal.alarm(n)

try:
    while True:
        # blocks until a packet arrives or the alarm fires
        recvBuffer, addr = mySocket.recvfrom(65535)
        src, dst = decoder.PacketExtractor(recvBuffer, False)
        sourceIPObservations.append(src)
        destinationIPObservations.append(dst)
except myTimeout:
    pass
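The fragment above depends on a live socket, so here is a self-contained sketch of the same pattern that can be run without one. It is an illustration under stated assumptions: it uses signal.setitimer() instead of signal.alarm() so the demo can use a sub-second timeout, it replaces the socket read with a stand-in function, and like the original it works only on Unix-like systems and only in the main thread.

```python
import signal
import time


class CaptureTimeout(Exception):
    """Raised by the signal handler when the capture window ends."""
    pass


def _onAlarm(signum, frame):
    raise CaptureTimeout()


def collect_for(seconds, pollOnce):
    """Call pollOnce() repeatedly until `seconds` have elapsed (Unix only)."""
    results = []
    previous = signal.signal(signal.SIGALRM, _onAlarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        while True:
            results.append(pollOnce())
    except CaptureTimeout:
        pass
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)   # cancel any pending timer
        signal.signal(signal.SIGALRM, previous)   # restore the old handler
    return results


def fakeReceive():
    """Stand-in for mySocket.recvfrom(); sleeps briefly and returns a token."""
    time.sleep(0.01)
    return 'packet'


observations = collect_for(0.2, fakeReceive)
print(len(observations) > 0)  # True
```

Restoring the previous handler in the finally block keeps the timeout from leaking into code that runs after the capture.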

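3.3.3 Creating Unique Entries

The code above records every source IP/port and destination IP/port pair, so the result is an unsorted list that may contain duplicates. To fix this, once collection is complete, convert the list to a set, which removes any duplicates immediately (a basic property of sets), then convert the set back to a list and sort it.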

uniqueSrc = set(map(tuple, ipObservations))
finalList = list(uniqueSrc)
finalList.sort()
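A short worked example with made-up observations shows the effect. The map(tuple, ...) step matters because lists are not hashable and cannot be placed in a set directly; sorted() combines the last two steps into one call.

```python
# Illustrative observations: protocol, source IP, source port, dest IP, dest port
ipObservations = [
    ['TCP', '192.168.0.5', 49200, '93.184.216.34', 80],
    ['TCP', '192.168.0.5', 49200, '93.184.216.34', 80],   # duplicate
    ['TCP', '192.168.0.25', 51000, '192.168.0.5', 22],
]

finalList = sorted(set(map(tuple, ipObservations)))

for entry in finalList:
    print(entry)
# ('TCP', '192.168.0.25', 51000, '192.168.0.5', 22)
# ('TCP', '192.168.0.5', 49200, '93.184.216.34', 80)
```

The IP addresses are compared as strings, so the order is lexicographic rather than numeric ('192.168.0.25' sorts before '192.168.0.5').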

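3.3.4 Output of Results

To provide a usable list, the program generates a comma-separated values (CSV) file that can be processed or inspected further in a spreadsheet.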
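As a rough sketch of this step, the standard csv module can write the unique observations. The function name and column headings below are assumptions for illustration; the tool's own _csvHandler module is not shown in this article.

```python
import csv
import io


def write_observations(stream, observations):
    """Write a header row plus one row per observation (hypothetical layout)."""
    writer = csv.writer(stream)
    writer.writerow(['Protocol', 'Source IP', 'Source Port',
                     'Destination IP', 'Destination Port'])
    writer.writerows(observations)


# An in-memory stream keeps the example self-contained.
buffer = io.StringIO()
write_observations(buffer, [('TCP', '192.168.0.5', 49200, '93.184.216.34', 80)])
print(buffer.getvalue())
```

When writing to a real file in Python 3, open it with newline='' so the csv module controls the line endings.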

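3.4 PSNMT Source Code Structure

The source code is split into the following five source files. Each file contains detailed comments describing its part of the program:
| Source file | Purpose |
| ---- | ---- |
| psnmt.py | Main program setup and loop |
| decoder.py | Raw packet decoder |
| _commandParser.py | User command-line parser |
| _csvHandler.py | Handler that creates and writes the CSV output file |
| _classLogging.py | Class that handles forensic logging |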
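The main program reads the attributes outPath, TCP, UDP, verbose and minutes from the parsed arguments. The sketch below shows one way a parser could produce them with argparse; the option letters and help texts are assumptions, and the real _commandParser.py may differ.

```python
import argparse


def parse_command_line(argv=None):
    """Illustrative parser; the short option letters are assumptions."""
    parser = argparse.ArgumentParser(
        description='PS-NMT command line (illustrative sketch)')
    parser.add_argument('-v', '--verbose', action='store_true',
                        help='print progress messages')
    parser.add_argument('-m', '--minutes', type=int, required=True,
                        help='capture duration in minutes')
    parser.add_argument('-p', '--outPath', required=True,
                        help='directory for the log and CSV output')
    # Exactly one capture protocol must be chosen.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument('-t', '--TCP', action='store_true',
                       help='capture TCP packets')
    group.add_argument('-u', '--UDP', action='store_true',
                       help='capture UDP packets')
    return parser.parse_args(argv)


userArgs = parse_command_line(['-t', '-m', '5', '-p', '/tmp'])
print(userArgs.TCP, userArgs.UDP, userArgs.minutes, userArgs.outPath)
# True False 5 /tmp
```

Accepting an optional argv list keeps the function testable; with argv=None, argparse falls back to sys.argv.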

3.5 psnmt.py Source Code

#
# Python Passive Network Monitor and Mapping Tool
#
# Import Standard Library Modules
import socket    # network interface library used for raw sockets
import signal    # generation of interrupt signals i.e. timeout
import os        # operating system functions i.e. file I/O
import sys       # system level functions i.e. exit()
# Import application specific Modules
import decoder           # module to decode tcp and udp packets
import _commandParser    # parse out command line args
import _csvHandler       # output generation
from _classLogging import _ForensicLog    # Logging operations
# Process the Command Line Arguments
userArgs = _commandParser.ParseCommandLine()
# create a log object
logPath = os.path.join(userArgs.outPath, "ForensicLog.txt")
oLog = _ForensicLog(logPath)
oLog.writeLog("INFO", "PS-NMT Started")
csvPath = os.path.join(userArgs.outPath, "ps-nmtResults.csv")
oCSV = _csvHandler._CSVWriter(csvPath)
# Setup the protocol to capture
if userArgs.TCP:
    PROTOCOL = socket.IPPROTO_TCP
elif userArgs.UDP:
    PROTOCOL = socket.IPPROTO_UDP
else:
    print('Capture protocol not selected')
    sys.exit()
# Setup whether output should be verbose
if userArgs.verbose:
    VERBOSE = True
else:
    VERBOSE = False
# Calculate capture duration in seconds
captureDuration = userArgs.minutes * 60
# Create timeout class to handle capture duration
class myTimeout(Exception):
    pass
# Create a signal handler that raises a timeout event
# when the capture duration is reached
def handler(signum, frame):
    print('timeout received %d' % signum)
    raise myTimeout()
# Enable Promiscuous Mode on the NIC
ret = os.system("ifconfig eth0 promisc")
if ret == 0:
    oLog.writeLog("INFO", 'Promiscuous Mode Enabled')
    # create an INET, raw socket
    # AF_INET specifies ipv4
    # SOCK_RAW specifies a raw protocol at the network layer
    # IPPROTO_TCP or UDP Specifies the protocol to capture
    try:
        mySocket = socket.socket(socket.AF_INET, socket.SOCK_RAW, PROTOCOL)
        oLog.writeLog("INFO", 'Raw Socket Open')
    except socket.error:
        # if socket open fails
        oLog.writeLog("ERROR", 'Raw Socket Open Failed')
        del oLog
        if VERBOSE:
            print('Error Opening Raw Socket')
        sys.exit()
else:
    # without promiscuous mode no socket is created, so stop here
    oLog.writeLog("ERROR", 'Promiscuous Mode not Set')
    del oLog
    sys.exit()
# Set the signal handler and arm the alarm for the duration specified by the user
signal.signal(signal.SIGALRM, handler)
signal.alarm(captureDuration)
# create a list to hold the results from the packet capture
# I'm only interested in Protocol, Source IP, Source Port, Destination IP, Destination Port
ipObservations = []
# Begin receiving packets until the duration expires
# the while loop will execute until the timeout
try:
    while True:
        # attempt receive (this call will wait)
        recvBuffer, addr = mySocket.recvfrom(255)
        # decode the received packet
        content = decoder.PacketExtractor(recvBuffer, VERBOSE)
        # append the results to our list
        ipObservations.append(content)
        # write details to the forensic log file
        oLog.writeLog('INFO',
                      'RECV: ' + content[0] +
                      ' SRC: ' + content[1] +
                      ' DST: ' + content[3])
except myTimeout:
    pass
# Once time has expired disable Promiscuous Mode
ret = os.system("ifconfig eth0 -promisc")
oLog.writeLog("INFO", 'Promiscuous Mode Disabled')
# Close the Raw Socket
mySocket.close()
oLog.writeLog("INFO", 'Raw Socket Closed')
# Create unique sorted list
uniqueSrc = set(map(tuple, ipObservations))
finalList = list(uniqueSrc)
finalList.sort()

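With these steps and code, we can build an application that captures network packets and extracts information from them in order to monitor traffic.

4. PSNMT Workflow Analysis

4.1 Overall Flow

The overall workflow of PSNMT can be represented by the following mermaid flowchart: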

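graph TD;
    A[Start] --> B[Parse command-line arguments];
    B --> C[Create log object and CSV file object];
    C --> D[Set capture protocol];
    D --> E[Set output verbosity];
    E --> F[Calculate capture duration];
    F --> G[Enable promiscuous mode];
    G --> H{Promiscuous mode enabled?};
    H -- Yes --> I[Create raw socket];
    H -- No --> J[Report error and exit];
    I --> K[Set signal handler and timeout];
    K --> L[Receive packet];
    L --> M[Decode packet];
    M --> N[Append result to list];
    N --> O[Write to log];
    O --> P{Timed out?};
    P -- No --> L;
    P -- Yes --> Q[Disable promiscuous mode];
    Q --> R[Close raw socket];
    R --> S[Create unique sorted list];
    S --> T[End];
    J --> T;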

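4.2 Step-by-Step Explanation

Parse command-line arguments: _commandParser.ParseCommandLine() parses the arguments supplied by the user.
Create log object and CSV file object:
- Create the log object oLog, which records information while the program runs.
- Create the CSV file object oCSV, which is used to output the final results.
Set capture protocol: select the protocol to capture (TCP or UDP) based on the user's arguments.
Set output verbosity: decide whether program output is verbose based on the user's arguments.
Calculate capture duration: convert the number of minutes supplied by the user into seconds.
Enable promiscuous mode: run os.system("ifconfig eth0 promisc") to put the NIC in promiscuous mode.
Create raw socket: if promiscuous mode was enabled successfully, create the raw socket with socket.socket(socket.AF_INET, socket.SOCK_RAW, PROTOCOL).
Set signal handler and timeout: use the signal module to register the handler and arm the alarm.
Receive packets: call mySocket.recvfrom(255) inside the loop to receive packets.
Decode packet: decode each received packet with decoder.PacketExtractor(recvBuffer, VERBOSE).
Append result to list: add the decoded result to the ipObservations list.
Write to log: write information about the received packet to the log file.
Handle timeout: when the timeout expires, catch the myTimeout exception and stop receiving packets.
Disable promiscuous mode: run os.system("ifconfig eth0 -promisc") to take the NIC out of promiscuous mode.
Close raw socket: call mySocket.close() to close the raw socket.
Create unique sorted list: convert the ipObservations list to a set to remove duplicates, then convert it back to a list and sort it.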

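5. Summary and Use Cases

5.1 Summary

This article covered the concepts and techniques behind Python-based network packet capture and analysis, including:
1. The basics of port mirroring and sniffing, and the concept and configuration of promiscuous mode.
2. Creating and using raw sockets in Python on Linux, unpacking buffers, and parsing IPv4 and TCP headers.
3. The goals of the Python Silent Network Mapping Tool (PSNMT), the fields it extracts, the technical issues and their solutions, and its source code structure and implementation.

5.2 Use Cases

PSNMT can be applied in the following scenarios:
1. Network monitoring: monitor network activity over long periods and collect active IP addresses and the service ports in use, helping administrators understand how the network is used.
2. Security auditing: identify the IP addresses of remote computers that interact with the local network and spot potential security threats.
3. Traffic analysis: analyze network traffic to understand how different services are used and to optimize the allocation of network resources.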
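As a small illustration of the traffic-analysis scenario, the unique observation list can be summarized by destination port. The data below is made up, and the tuple layout (protocol, source IP, source port, destination IP, destination port) is the one assumed throughout this article.

```python
from collections import Counter

# Illustrative unique observations in the assumed five-field layout
finalList = [
    ('TCP', '192.168.0.5', 49200, '93.184.216.34', 80),
    ('TCP', '192.168.0.25', 51000, '93.184.216.34', 80),
    ('TCP', '192.168.0.25', 51001, '192.168.0.5', 22),
]

# Count how many unique conversations target each destination port
portUsage = Counter(row[4] for row in finalList)
print(portUsage.most_common())  # [(80, 2), (22, 1)]
```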

With these techniques, we can build capable network monitoring and analysis tools to better manage and protect the network.