I'm receiving STX/ETX packet data. Here's a sample:

The data has been URL encoded. Before it is encoded and sent it is like this:

The relationship between the byte data before it is encoded and sent and the URL-encoded data is as follows:
0x41 -> A
0xd9 -> %D9
0x33 -> 3
0x48 -> H
0x58 -> X
0x01 -> %01
0x00 -> %00
After some research I have found that this is Unicode code points being converted into hexadecimal numbers and Unicode character names, with the exception of the first byte, which is an ASCII character.
After the first character A, the following four bytes make up a 4 byte integer which is a UTC timestamp.
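As a rough illustration of that layout, the four bytes after the leading `A` can be unpacked with the standard `struct` module. This is only a sketch: the byte order is not stated in the question, so little-endian is an assumption here, and the variable names are made up for the example.

```python
import struct
from datetime import datetime, timezone

# Sample packet bytes taken from the mapping listed above.
packet = bytes([0x41, 0xD9, 0x33, 0x48, 0x58, 0x01, 0x00])

# The first byte is an ASCII character.
kind = chr(packet[0])

# Assumption: the 4-byte integer is unsigned and little-endian ("<I").
(timestamp,) = struct.unpack_from("<I", packet, 1)
when = datetime.fromtimestamp(timestamp, tz=timezone.utc)

print(kind, timestamp, when.isoformat())
# A 1481126873 2016-12-07T16:07:53+00:00
```

Read as big-endian (`">I"`), the same four bytes would give a date in the 2080s, which is why little-endian looks like the more plausible guess for this sample.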
Question
How do I convert the URL back into hexadecimal and Unicode code points using Python? I've looked at the unicodedata module but can't seem to find a conversion from Unicode character names to Unicode code points.
Any help or suggestions would be much appreciated.
Solution
You can use the urlparse module to decode that string.
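# Python 2: urlparse is a standard-library module (urllib.parse in Python 3).
import urlparse
data = "/type=stxetx&packet=A%d93HX%01%00&serial=1234&foo=bar"
# parse_qsl splits the string on '&' and '=' and percent-decodes each value.
new_data = dict(urlparse.parse_qsl(data))
assert len(new_data['packet']) == 7  # 'A' plus six more bytes
assert new_data['packet'][0] == 'A'
assert ord(new_data['packet'][1]) == 0xd9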
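The snippet above targets Python 2, where the decoded value is a byte string. On Python 3 the same idea might look like the sketch below. It assumes the packet should be recovered as raw `bytes`, so it asks `parse_qsl` to decode with `latin-1` (which maps every byte value to the code point of the same number) instead of the default UTF-8, under which `%d9` would turn into a replacement character.

```python
from urllib.parse import parse_qsl

data = "/type=stxetx&packet=A%d93HX%01%00&serial=1234&foo=bar"

# latin-1 maps each decoded byte 0x00-0xFF to the code point of the same
# value, so no byte is lost to a UTF-8 replacement character.
fields = dict(parse_qsl(data, encoding="latin-1"))

# Turn the one-character-per-byte string back into real bytes.
packet = fields["packet"].encode("latin-1")

print(packet.hex())
# 41d93348580100
print([hex(b) for b in packet])
# ['0x41', '0xd9', '0x33', '0x48', '0x58', '0x1', '0x0']
```

Note that `parse_qsl` also turns `+` into a space, so this only round-trips cleanly if the sender percent-encodes a literal `+` as `%2B`.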