Problems Encountered in a Small Script: Checking a Website's Encoding with Python

Solving Python Encoding Problems


License: CC 4.0 BY-SA

Tags:

#python #web-page-encoding


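This article discusses Python encoding problems, with particular attention to the differences between version 2.7.10 and versions 3.0 and later, and presents a way to check a website's encoding that uses the chardet module to analyze the page content.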

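[size=11]
[url=http://lovekaiyuan.iteye.com/admin/blogs/2286546]What that script looks like[/url]
[color=red]The first problem[/color] is an encoding problem. The example here uses Python 2.7.10; encoding handling changed substantially from 3.0 onward.
Since I happened to sidestep the problem this time, I did not look into it further. For more detail, you can refer to [url="http://www.pythontab.com/html/2013/pythonhexinbiancheng_0114/129.html"][color=red]this article[/color][/url], which describes how to check a website's encoding. I have not verified it myself; I will write up detailed notes when I actually need it. The article describes two methods, and the more reliable one is excerpted below:
[/size]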


# Source: http://www.pythontab.com/html/2013/pythonhexinbiancheng_0114/129.html
# Python 2.7 code. chardet is a third-party encoding-detection module;
# install it first if it is missing (for example, pip install chardet).

import chardet
import urllib

# Fetch the raw page content first
data = urllib.urlopen('http://www.pythontab.com').read()
# Let chardet analyze the content
chardit = chardet.detect(data)

data1 = urllib.urlopen('http://www.baidu.com').read()
chardit1 = chardet.detect(data1)

print chardit['encoding']   # pythontab
print chardit1['encoding']  # baidu
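
The snippet above only runs on Python 2. As a rough sketch of how the same check might look on Python 3, where `urlopen` lives in `urllib.request`, `print` is a function, and `read()` returns `bytes`, something like the following could work. The helper names `declared_charset`, `detect_charset`, and `report` are my own and do not come from the original article. The declared-charset lookup is an extra step I am assuming is useful alongside chardet's guess; it is not necessarily the other method the linked article describes.

```python
import re
from urllib.request import urlopen  # Python 3 location of urlopen

# Matches charset=utf-8, charset="gbk", charset='gb2312', ...
_CHARSET = r'charset=["\']?([\w.:-]+)'


def declared_charset(content_type, body):
    """Return the charset the page itself declares, or None.

    Hypothetical helper: checks the Content-Type header value first,
    then looks for a <meta ... charset=...> tag near the top of the body.
    """
    match = re.search(_CHARSET, content_type or '', re.I)
    if match:
        return match.group(1).lower()
    # The body is bytes on Python 3; decode only enough to find the tag.
    head = body[:2048].decode('ascii', errors='ignore')
    match = re.search(r'<meta[^>]+' + _CHARSET, head, re.I)
    return match.group(1).lower() if match else None


def detect_charset(body):
    """Let chardet guess the encoding from the raw bytes."""
    import chardet  # third-party, imported lazily: pip install chardet
    return chardet.detect(body)['encoding']


def report(url):
    """Fetch a page and print its declared and detected encodings."""
    with urlopen(url, timeout=10) as resp:
        content_type = resp.headers.get('Content-Type')
        body = resp.read()
    print(url, 'declared:', declared_charset(content_type, body),
          'detected:', detect_charset(body))
```

Calling `report('http://www.pythontab.com')` and `report('http://www.baidu.com')` would mirror the two lookups in the original snippet. The declared value is only what the server or page claims, and chardet's result is a statistical guess, so the two can disagree.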