Python code structure analysis: parsing C structures with Python

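The author tried to read and parse, in Python, a binary file containing multiple WIN32_FIND_DATAW structures, and ran into a character-encoding problem and wrong results when accessing array elements. Moving from ctypes to the struct module with an explicit, platform-independent layout resolved both problems, and the cFileName field of the structure was parsed successfully.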


I'm sure this is terribly wrong, and I'm having a couple of problems. I've written out an array of WIN32_FIND_DATAW structures to disk, one after another, and I'd like to consume and parse them in my Python script.

The code I'm currently using is:

>>> fp = open('findData', 'r').read()
>>> data = ctypes.cast(fp, ctypes.POINTER(wintypes.WIN32_FIND_DATAW))
>>> print str(data[0].cFileName)

The first problem is that the third line doesn't print a nice string like I would expect. Instead of printing $Recycle.Bin it prints UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-5: ordinal not in range(128)

This is the result of just printing the data stored there:

>>> data[0].cFileName

u'\U00520024\U00630065\U00630079\U0065006c\U0042002e\U006e0069'

This looks relatively reasonable. $ is ASCII 0x24, R is ASCII 0x52 and so on.

So why can't I print it like a string?

My second question is that doing:

>>> data[1].cFileName

Gives me ridiculous data. I'm fairly sure I'm not using that ctypes.cast correctly. How should I be doing it to access these? To clarify, in C, I'd just point a PWIN32_FIND_DATAW pointer to the beginning of the buffer and access the individual structs in the array using similar code, and I'm trying to do the same in Python.

Update

Doing:

>>> data[0].cFileName.encode('windows-1252')

Yields this error:

UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-5: character maps to <undefined>

Update

The beginning of the first entry (data[0] up to the first part of cFileName) looks like the following:

user@ubuntu:~/data$ hexdump -C findData | head -n 6
00000000  16 00 00 00 dc 5a 9f d2  31 04 ca 01 ba 81 89 1a  |.....Z..1.......|
00000010  81 e2 cd 01 ba 81 89 1a  81 e2 cd 01 00 00 00 00  |................|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 24 00 52 00  |............$.R.|
00000030  65 00 63 00 79 00 63 00  6c 00 65 00 2e 00 42 00  |e.c.y.c.l.e...B.|
00000040  69 00 6e 00 00 00 00 00  00 00 00 00 00 00 00 00  |i.n.............|
00000050  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

I can post more data if needed.

Solution

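As already mentioned in the comments, this is due to differences between Windows and Linux. The ctypes module adapts to the local platform: on Linux, wchar_t is typically 4 bytes wide, whereas the file was written on Windows with 2-byte UTF-16 characters. That is why pairs of characters are fused into single code points such as \U00520024, and why data[1] starts at the wrong offset. The best solution is to use the struct module to handle the data in a platform-independent manner. The following code shows how this can be done for a single record.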
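To see the mismatch concretely, here is a small sketch, assuming the usual 4-byte wchar_t on Linux and 2-byte wchar_t on Windows (the variable names are only illustrative). It reinterprets the UTF-16-LE bytes of the file name as 4-byte units, which reproduces the values shown in the question:

```python
import struct

# UTF-16-LE bytes for "$Recycle.Bin", matching the name bytes in the hexdump.
raw = "$Recycle.Bin".encode("utf-16-le")

# Read as 2-byte units (Windows wchar_t): one code unit per character.
as_utf16 = raw.decode("utf-16-le")

# Read as 4-byte units (assumed Linux wchar_t): each pair of characters
# is fused into a single, meaningless code point.
as_4byte_units = struct.unpack("<%dI" % (len(raw) // 4), raw)

print(as_utf16)
print([hex(unit) for unit in as_4byte_units])
```

The first 4-byte unit is 0x520024, the same \U00520024 seen in the question. The struct-based code that follows avoids the problem by fixing the layout explicitly.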

import struct
import codecs

# Set up test data based on the incomplete sample above
record = b"\x16\x00\x00\x00\xdc\x5a\x9f\xd2\x31\x04\xca\x01\xba\x81\x89\x1a\x81\xe2\xcd\x01\xba\x81\x89\x1a\x81\xe2\xcd\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x24\x00\x52\x00\x65\x00\x63\x00\x79\x00\x63\x00\x6c\x00\x65\x00\x2e\x00\x42\x00\x69\x00\x6e\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
record = record + b"\x00" * (592 - len(record))  # pad to one full 592-byte record

# typedef struct _WIN32_FIND_DATA {
#   DWORD    dwFileAttributes;
#   FILETIME ftCreationTime;
#   FILETIME ftLastAccessTime;
#   FILETIME ftLastWriteTime;
#   DWORD    nFileSizeHigh;
#   DWORD    nFileSizeLow;
#   DWORD    dwReserved0;
#   DWORD    dwReserved1;
#   TCHAR    cFileName[MAX_PATH];
#   TCHAR    cAlternateFileName[14];
# } WIN32_FIND_DATA;

# "<" = little-endian without padding; L = DWORD, Q = 64-bit FILETIME,
# 520s and 28s = raw UTF-16 buffers (260 and 14 two-byte characters)
fmt = "<L3Q4L520s28s"

attrs, creation, access, write, sizeHigh, sizeLow, reserved0, reserved1, name, alternateName = struct.unpack(fmt, record)

# Decode the fixed-size buffers and drop the NUL padding
name = codecs.utf_16_le_decode(name)[0].strip('\x00')
alternateName = codecs.utf_16_le_decode(alternateName)[0].strip('\x00')

print(name)

NOTE: This assumes that the size of MAX_PATH is 260 (which should be true, but you never know).
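The MAX_PATH assumption can be checked against the record size. The sketch below builds a format string from the structure layout and asks struct for its size; the constant names are illustrative, and the 2-byte character width is an assumption that holds for data written by Windows:

```python
import struct

MAX_PATH = 260        # assumption, as stated in the note
ALT_NAME_CHARS = 14   # cAlternateFileName[14]
WCHAR_SIZE = 2        # one UTF-16 code unit, as written by Windows

# DWORD, 3 x FILETIME, 4 x DWORD, then the two fixed-size name buffers.
fmt = "<L3Q4L%ds%ds" % (MAX_PATH * WCHAR_SIZE, ALT_NAME_CHARS * WCHAR_SIZE)
record_size = struct.calcsize(fmt)

print(fmt, record_size)
```

If the file length is not a multiple of this record size, one of the assumptions (most likely MAX_PATH or the character width) does not match how the file was written.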

To read all values from the file, read it in blocks of 592 bytes and decode each block as above.
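A minimal sketch of that loop follows, written for Python 3. The helper names (read_find_data, decode_wide, filetime_to_datetime) are hypothetical, and the demo feeds it an in-memory stream built from the hexdump bytes rather than the real findData file. It also converts the creation time, assuming the standard FILETIME meaning of 100-nanosecond ticks since 1601-01-01 UTC:

```python
import io
import struct
from datetime import datetime, timedelta

# Layout of one WIN32_FIND_DATAW record as written on Windows (592 bytes).
RECORD = struct.Struct("<L3Q4L520s28s")

def filetime_to_datetime(filetime):
    """Convert a FILETIME (100 ns ticks since 1601-01-01 UTC) to a datetime."""
    return datetime(1601, 1, 1) + timedelta(microseconds=filetime // 10)

def decode_wide(raw):
    """Decode a fixed-size UTF-16-LE buffer up to its first NUL terminator."""
    return raw.decode("utf-16-le", errors="replace").split("\x00", 1)[0]

def read_find_data(stream):
    """Yield one dict per complete 592-byte record in a binary stream."""
    while True:
        chunk = stream.read(RECORD.size)
        if len(chunk) < RECORD.size:
            break  # end of data; a trailing partial record is ignored
        (attrs, creation, access, write, size_high, size_low,
         reserved0, reserved1, name, alt_name) = RECORD.unpack(chunk)
        yield {
            "attributes": attrs,
            "created": filetime_to_datetime(creation),
            "size": (size_high << 32) | size_low,
            "name": decode_wide(name),
            "alternate_name": decode_wide(alt_name),
        }

# Demo data: the 44 header bytes from the hexdump plus the file name,
# zero-padded to a full record and repeated twice.
header = bytes.fromhex(
    "16000000" "dc5a9fd23104ca01" "ba81891a81e2cd01" "ba81891a81e2cd01"
) + b"\x00" * 16
sample = (header + "$Recycle.Bin".encode("utf-16-le")).ljust(RECORD.size, b"\x00")

entries = list(read_find_data(io.BytesIO(sample * 2)))
for entry in entries:
    print(entry["name"], entry["created"], entry["size"])
```

For the real file, open it in binary mode and pass the file object instead, for example `with open('findData', 'rb') as f: entries = list(read_find_data(f))`. Cutting each name at its first NUL, rather than stripping NULs from both ends, is a deliberate choice here: it still works if the buffer holds leftover bytes after the terminator.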
