I've got two binary files. They look something like this, but the data is more random:
File A:
FF FF FF FF 00 00 00 00 FF FF 44 43 42 41 FF FF ...
File B:
41 42 43 44 00 00 00 00 44 43 42 41 40 39 38 37 ...
What I'd like is to call something like:
>>> someDiffLib.diff(file_a_data, file_b_data)
And receive something like:
[Match(pos=4, length=4)]
Indicating that in both files the bytes at position 4 are the same for 4 bytes. The sequence 44 43 42 41 would not match because they're not in the same positions in each file.
Is there a library that will do the diff for me? Or should I just write the loops to do the comparison?
解决方案
You can use itertools.groupby() for this, here is an example:
from itertools import groupby
# this just sets up some byte strings to use, Python 2.x version is below
# instead of this you would use f1 = open('some_file', 'rb').read()
f1 = bytes(int(b, 16) for b in 'FF FF FF FF 00 00 00 00 FF FF 44 43 42 41 FF FF'.split())
f2 = bytes(int(b, 16) for b in '41 42 43 44 00 00 00 00 44 43 42 41 40 39 38 37'.split())
matches = []
for k, g in groupby(range(min(len(f1), len(f2))), key=lambda i: f1[i] == f2[i]):
if k:
pos = next(g)
length = len(list(g)) + 1
matches.append((pos, length))
Or the same thing as above using a list comprehension:
matches = [(next(g), len(list(g))+1)
for k, g in groupby(range(min(len(f1), len(f2))), key=lambda i: f1[i] == f2[i])
if k]
Here is the setup for the example if you are using Python 2.x:
f1 = ''.join(chr(int(b, 16)) for b in 'FF FF FF FF 00 00 00 00 FF FF 44 43 42 41 FF FF'.split())
f2 = ''.join(chr(int(b, 16)) for b in '41 42 43 44 00 00 00 00 44 43 42 41 40 39 38 37'.split())
博客围绕比较两个二进制文件差异展开。给出了两个二进制文件示例,询问是否有库可实现差异比较,还是需手动写循环。并给出解决方案,使用itertools.groupby()函数,还分别给出Python 3.x和2.x版本的代码示例。
487

被折叠的 条评论
为什么被折叠?



