Traditional and Simplified Chinese Encoding
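A problem encountered with Chinese encoding. The first dump (UTF-16LE) holds the folder name in Traditional Chinese, `通道測試`: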
Memory (hex) | ASCII |
---|---|
43 00 3a 00 5c 00 55 00 73 00 65 00 72 00 73 00 5c 00 | `C.:.\.U.s.e.r.s.\.` |
6c 00 69 00 75 00 5c 00 44 00 65 00 73 00 6b 00 74 00 | `l.i.u.\.D.e.s.k.t.` |
6f 00 70 00 5c 00 1a 90 53 90 2c 6e 66 8a 31 00 32 00 | `o.p.\..?S?,nf?1.2.` |
33 00 5c 00 72 00 65 00 73 00 2e 00 6a 00 70 00 67 00 | `3.\.r.e.s...j.p.g.` |
// What the program actually obtains is Simplified Chinese (`通道测试`):
Memory (hex) | ASCII |
---|---|
43 00 3a 00 5c 00 55 00 73 00 65 00 72 00 73 00 5c 00 | `C.:.\.U.s.e.r.s.\.` |
6c 00 69 00 75 00 5c 00 44 00 65 00 73 00 6b 00 74 00 | `l.i.u.\.D.e.s.k.t.` |
6f 00 70 00 5c 00 1a 90 53 90 4b 6d d5 8b 31 00 32 00 | `o.p.\..?S?Km??1.2.` |
33 00 5c 00 72 00 65 00 73 00 2e 00 6a 00 70 00 67 00 | `3.\.r.e.s...j.p.g.` |
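
The two dumps can be decoded to see exactly where they differ. This is a minimal sketch, assuming the bytes are UTF-16LE (the `xx 00` pattern for the ASCII characters suggests so); the names `decode_dump`, `first_path`, and `second_path` are illustrative only.

```python
# Hex bytes copied from the two memory dumps.
first_dump = (
    "43 00 3a 00 5c 00 55 00 73 00 65 00 72 00 73 00 5c 00 "
    "6c 00 69 00 75 00 5c 00 44 00 65 00 73 00 6b 00 74 00 "
    "6f 00 70 00 5c 00 1a 90 53 90 2c 6e 66 8a 31 00 32 00 "
    "33 00 5c 00 72 00 65 00 73 00 2e 00 6a 00 70 00 67 00"
)
second_dump = (
    "43 00 3a 00 5c 00 55 00 73 00 65 00 72 00 73 00 5c 00 "
    "6c 00 69 00 75 00 5c 00 44 00 65 00 73 00 6b 00 74 00 "
    "6f 00 70 00 5c 00 1a 90 53 90 4b 6d d5 8b 31 00 32 00 "
    "33 00 5c 00 72 00 65 00 73 00 2e 00 6a 00 70 00 67 00"
)


def decode_dump(hex_text):
    """Decode a space-separated hex dump as UTF-16LE (assumed encoding)."""
    return bytes.fromhex(hex_text).decode("utf-16-le")


first_path = decode_dump(first_dump)
second_path = decode_dump(second_dump)

# Collect the character pairs that differ between the two paths.
differences = [(a, b) for a, b in zip(first_path, second_path) if a != b]
# differences == [('測', '测'), ('試', '试')]
```

Only two code units differ: U+6E2C and U+8A66 in the first dump become U+6D4B and U+8BD5 in the second, so the two paths name different folders as far as the file system is concerned.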
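PS: It is best to keep the input consistent: use either Simplified Chinese throughout or Traditional Chinese throughout. If half of the input is Simplified and half is Traditional, converting it with the Windows system functions will cause problems.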
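
One way to spot mixed input before it reaches a conversion function is a rough encodability heuristic. This sketch is not part of the original notes: it assumes that a character GB2312 can encode but Big5 cannot is Simplified-only, and the reverse for Traditional-only. `script_hint` is a hypothetical helper name, and the heuristic may misjudge rare characters.

```python
def _encodable(ch, codec):
    """Return True if the single character can be encoded with the codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def script_hint(text):
    """Return 'simplified', 'traditional', 'mixed', or 'unknown'.

    Heuristic (an assumption, not an exact rule): GB2312-only characters
    count as Simplified, Big5-only characters count as Traditional.
    """
    simplified_seen = False
    traditional_seen = False
    for ch in text:
        if not '\u4e00' <= ch <= '\u9fa5':
            continue  # ignore ASCII, digits, punctuation, other scripts
        in_gb2312 = _encodable(ch, "gb2312")
        in_big5 = _encodable(ch, "big5")
        if in_gb2312 and not in_big5:
            simplified_seen = True
        elif in_big5 and not in_gb2312:
            traditional_seen = True
    if simplified_seen and traditional_seen:
        return "mixed"
    if simplified_seen:
        return "simplified"
    if traditional_seen:
        return "traditional"
    return "unknown"
```

Characters shared by both scripts (such as `通` and `道`) do not influence the result, so a string made only of shared characters is reported as `unknown`.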
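Encoding check
The recognition results returned by the Baidu API can contain characters from other languages (for example `\uc601`, a Korean Hangul syllable). On Windows with a Chinese locale, Python encodes text as GBK by default when writing a txt file, and Korean text cannot be encoded as GBK, so the encode step fails. Such characters need to be filtered out first.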
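
The failure and two possible workarounds can be sketched as follows. The sample string and the helper names `gbk_safe` and `write_text` are made up for illustration; writing the file as UTF-8 is an alternative to filtering that the original notes do not mention.

```python
import os
import tempfile


def gbk_safe(text):
    """Drop every character that GBK cannot represent (e.g. Hangul)."""
    return text.encode("gbk", errors="ignore").decode("gbk")


def write_text(path, text):
    """Write text as UTF-8 instead of relying on the platform default."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


# Hypothetical recognition result containing one Hangul syllable.
sample = "通道测试\uc601"

# Reproduce the failure: GBK has no mapping for U+C601.
try:
    sample.encode("gbk")
    gbk_ok = True
except UnicodeEncodeError:
    gbk_ok = False

# Workaround 1: filter out what GBK cannot encode.
filtered = gbk_safe(sample)

# Workaround 2: keep everything and write the file as UTF-8.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "result.txt")
    write_text(path, sample)
    with open(path, encoding="utf-8") as f:
        round_trip = f.read()
```

Filtering loses the non-GBK characters, while the UTF-8 route keeps them but requires every reader of the file to open it as UTF-8 as well.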
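
    # Check for Chinese characters by Unicode code point, range \u4e00 to \u9fa5.
    # Returns True only if every character in text falls inside that range.
    def containCHN(text):
        for ch in text:
            if not '\u4e00' <= ch <= '\u9fa5':
                return False
        return True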
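
The function above returns True only when every character is in the range, so a result such as `通道测试123` is rejected as a whole. Depending on what the filter should do, variants like the following may be closer to the intent. This is a sketch with hypothetical helper names, using the same `\u4e00` to `\u9fa5` range.

```python
def is_chinese(ch):
    """True if the character is in the range \\u4e00..\\u9fa5."""
    return '\u4e00' <= ch <= '\u9fa5'


def all_chinese(text):
    """Same behavior as the loop version: every character must be Chinese."""
    return all(is_chinese(ch) for ch in text)


def contains_chinese(text):
    """True if at least one character is Chinese."""
    return any(is_chinese(ch) for ch in text)


def keep_chinese(text):
    """Drop every character outside the range instead of rejecting the string."""
    return ''.join(ch for ch in text if is_chinese(ch))
```

For example, `keep_chinese("通道\uc601测试")` keeps the four Chinese characters and drops the Hangul syllable, which avoids discarding an otherwise usable recognition result.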