Python URL decoding: decoding URL-encoded byte-stream data in Python

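This article shows how to parse a URL-encoded string in Python. It focuses on decoding a URL-encoded packet that contains a UTC timestamp back into its original byte values, expressed as hexadecimal numbers and code points. The answer uses the urlparse module to percent-decode the query string and recover those raw bytes.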

I'm receiving STX/ETX packet data. Here's a sample:

The data has been URL encoded. Before it is encoded and sent, it looks like this:

Each byte of the original data maps to the URL-encoded data as follows:

0x41 -> A
0xd9 -> %D9
0x33 -> 3
0x48 -> H
0x58 -> X
0x01 -> %01
0x00 -> %00
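This mapping is what standard percent-encoding produces. A byte that is an unreserved ASCII character (a letter, a digit, or one of _ . - ~) is written literally. Every other byte becomes % followed by two hex digits. The following is a minimal sketch, assuming Python 3's urllib.parse. It feeds the seven bytes from the table to quote_from_bytes and reproduces the encoded form.

```python
from urllib.parse import quote_from_bytes

# The seven raw bytes from the table, before URL encoding
raw = bytes([0x41, 0xD9, 0x33, 0x48, 0x58, 0x01, 0x00])

# quote_from_bytes keeps unreserved ASCII bytes as-is and
# percent-encodes every other byte using uppercase hex digits
encoded = quote_from_bytes(raw)
print(encoded)  # A%D93HX%01%00
```

No Unicode character names are involved. Each %XX escape is simply one raw byte value.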

After some research, I think these are Unicode code points converted into hexadecimal numbers and Unicode character names, except for the first byte, which is an ASCII character.

After the first character, A, the next four bytes form a 4-byte integer that holds a UTC timestamp.
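The payload is just percent-encoded bytes, so the timestamp can be read in two steps:

1. Percent-decode the packet straight to bytes with urllib.parse.unquote_to_bytes.
2. Unpack bytes 1 to 4 as an unsigned 32-bit integer with struct.

The sketch uses the packet value from the sample query string in the answer. The question does not state the byte order, so the code assumes little-endian ("<I"). For this sample, little-endian gives a date in December 2016, while big-endian would give one in 2085. Treat the byte order as an assumption and confirm it against the sender's documentation.

```python
import struct
from datetime import datetime, timezone
from urllib.parse import unquote_to_bytes

packet = "A%d93HX%01%00"  # packet field taken from the sample query string

raw = unquote_to_bytes(packet)  # b'A\xd93HX\x01\x00'

# Bytes 1..4 hold the timestamp.
# ASSUMPTION: little-endian unsigned 32-bit integer ("<I").
(timestamp,) = struct.unpack_from("<I", raw, 1)
when = datetime.fromtimestamp(timestamp, tz=timezone.utc)

print(timestamp)         # 1481126873
print(when.isoformat())  # 2016-12-07T16:07:53+00:00
```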

Question

How do I convert the URL-encoded data back into hexadecimal values and Unicode code points using Python? I've looked at the unicodedata module, but I can't find a conversion from Unicode character names to Unicode code points.

Any help or suggestions would be much appreciated.
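On the narrower unicodedata question: the standard library already converts in both directions. unicodedata.lookup() goes from a character name to a character, and ord() then gives its code point. unicodedata.name() goes the other way.

The sketch below is illustrative only. It treats byte 0xd9 as the Latin-1 code point U+00D9. The packet itself carries raw bytes, not character names, so this step is not needed to decode it.

```python
import unicodedata

# name -> character -> code point
ch = unicodedata.lookup("LATIN CAPITAL LETTER U WITH GRAVE")
print(hex(ord(ch)))  # 0xd9

# code point -> name; control characters such as 0x00 have no name,
# so pass a default instead of letting name() raise ValueError
print(unicodedata.name(chr(0xD9)))               # LATIN CAPITAL LETTER U WITH GRAVE
print(unicodedata.name(chr(0x00), "<no name>"))  # <no name>
```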

Solution

You can use the urlparse module (urllib.parse in Python 3) to decode that string.

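from urllib.parse import parse_qsl  # Python 3 location of the old urlparse module

data = "/type=stxetx&packet=A%d93HX%01%00&serial=1234&foo=bar"

# Decode percent-escapes as Latin-1 so each byte maps to the code point of the
# same value (0xd9 -> U+00D9); the default UTF-8 would turn it into U+FFFD
new_data = dict(parse_qsl(data, encoding="latin-1"))

assert len(new_data['packet']) == 7
assert new_data['packet'][0] == 'A'
assert ord(new_data['packet'][1]) == 0xd9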
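The question asks for hexadecimal values and code points. Decoding with Latin-1 makes this easy, because Latin-1 maps code points 0 to 255 one-to-one onto byte values. Calling .encode("latin-1") on the decoded packet therefore returns the original bytes. Below is a minimal sketch that reuses the sample query string from the answer. The variable names are illustrative.

```python
from urllib.parse import parse_qsl

data = "/type=stxetx&packet=A%d93HX%01%00&serial=1234&foo=bar"
fields = dict(parse_qsl(data, encoding="latin-1"))

# Latin-1 maps U+0000..U+00FF one-to-one onto bytes 0x00..0xFF,
# so encoding the decoded string recovers the original packet bytes
raw = fields["packet"].encode("latin-1")

code_points = [ord(c) for c in fields["packet"]]  # same values as the bytes
hex_dump = " ".join(f"{b:02x}" for b in raw)

print(code_points)  # [65, 217, 51, 72, 88, 1, 0]
print(hex_dump)     # 41 d9 33 48 58 01 00
```

The leading "/" in the sample string ends up in the first key ("/type"). Strip it before parsing if that matters.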
