Decoding query terms in uigs

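This article presents decodeQuery, a Python function for handling special characters in URL query strings. It lowercases the query, converts %u-encoded Unicode characters (%uXXXX) into their GBK-encoded equivalents (substituting ? for any character that cannot be converted), and, when the query contains no %u sequences, rewrites \x escape sequences as percent escapes. The result is then percent-decoded with urllib.unquote from the standard library. The code relies on a regular expression plus standard library functions and is written for Python 2.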
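
To make the two input forms concrete, here is a small Python 3 sketch that builds both encodings for a sample string. The sample text and the variable names are illustrative assumptions, not taken from real uigs data, and the %u form shown only covers characters in the Basic Multilingual Plane.

```python
# Illustrative only: build the two escaped forms that decodeQuery expects.
text = "中文"  # sample text, an assumption for demonstration

# %uXXXX form: one escape per UTF-16 code unit (BMP characters only here).
percent_u = "".join("%u{:04X}".format(ord(c)) for c in text)

# \xNN form: one escape per byte of the GBK encoding.
backslash_x = "".join("\\x{:02x}".format(b) for b in text.encode("gbk"))

print(percent_u)
print(backslash_x)
```

The first form carries Unicode code units, while the second carries raw GBK bytes, which is why the function treats them differently.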


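# coding=gbk
# Python 2 only: relies on str.decode("hex") and urllib.unquote.
import urllib
import re

# Matches %uXXXX escapes; the input is lowercased before matching.
_PERCENT_U = re.compile(r"%u([0-9a-f]{2})([0-9a-f]{2})", re.IGNORECASE)

def decodeQuery(query):
    # Lowercase the whole query, so ASCII letters in the result are lowercase too.
    lowStr = query.lower()

    if lowStr.find("%u") != -1:
        for match in _PERCENT_U.findall(lowStr):
            hexStr = match[0] + match[1]
            # Four hex digits -> two raw bytes (one UTF-16BE code unit).
            rawBytes = hexStr.decode("hex")
            try:
                # Re-encode the code unit as GBK bytes.
                gbkBytes = rawBytes.decode("utf-16be").encode("gbk")
                lowStr = lowStr.replace("%u" + hexStr, gbkBytes)
            except UnicodeError:
                # Lone surrogates and characters outside GBK become "?".
                lowStr = lowStr.replace("%u" + hexStr, "?")
    else:
        # No %u escapes: treat \xNN as %NN so unquote can decode it.
        lowStr = lowStr.replace("\\x", "%")

    try:
        deQuery = urllib.unquote(lowStr)
    except Exception:
        # Fall back to the partially decoded string.
        deQuery = lowStr

    return deQuery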
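
The function above depends on Python 2 APIs (str.decode("hex"), urllib.unquote). As a rough Python 3 sketch of the same idea, the version below returns text rather than GBK bytes. The name decode_query, the encoding parameter, and the use of re.sub in place of repeated str.replace are assumptions made for illustration, not part of the original code, and edge cases may behave slightly differently.

```python
import re
from urllib.parse import unquote

# Matches %uXXXX escapes in an already lowercased string.
_PERCENT_U = re.compile(r"%u([0-9a-f]{4})")

def decode_query(query: str, encoding: str = "gbk") -> str:
    """Python 3 sketch of decodeQuery; returns text instead of GBK bytes."""
    low = query.lower()

    def replace_percent_u(match):
        ch = chr(int(match.group(1), 16))
        try:
            # Mirror the original: only keep characters GBK can represent.
            ch.encode(encoding)
        except UnicodeEncodeError:
            # Lone surrogates and characters outside GBK become "?".
            return "?"
        return ch

    if "%u" in low:
        low = _PERCENT_U.sub(replace_percent_u, low)
    else:
        # No %u escapes: treat \xNN as %NN so unquote can decode it.
        low = low.replace("\\x", "%")

    # Interpret the remaining %NN escapes as bytes in the given encoding.
    return unquote(low, encoding=encoding, errors="replace")

print(decode_query("%u4E2D%u6587"))
print(decode_query("\\xd6\\xd0"))
```

Returning str instead of bytes is a deliberate design choice here: in Python 3 the caller can encode the result to GBK explicitly if bytes are needed.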