While recently debugging a script that processes saved blog pages, I ran into the following error:
flying-bird@flyingbird:~/Downloads/export_blog$ ./images_parser.py 2015-07-29-2/Windows平台下面的MD5算法.htm
Traceback (most recent call last):
  File "./images_parser.py", line 154, in <module>
    _test(sys.argv[1])
  File "./images_parser.py", line 146, in _test
    get_image_items(content)
  File "./images_parser.py", line 133, in get_image_items
    parser.feed(content)
  File "/usr/lib/python2.7/HTMLParser.py", line 117, in feed
    self.goahead(0)
  File "/usr/lib/python2.7/HTMLParser.py", line 161, in goahead
    k = self.parse_starttag(i)
  File "/usr/lib/python2.7/HTMLParser.py", line 308, in parse_starttag
    attrvalue = self.unescape(attrvalue)
  File "/usr/lib/python2.7/HTMLParser.py", line 475, in unescape
    return re.sub(r"&(#?[xX]?(?:[0-9a-fA-F]+|\w{1,8}));", replaceEntities, s)
  File "/usr/lib/python2.7/re.py", line 151, in sub
    return _compile(pattern, flags).sub(repl, string, count)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position 7: ordinal not in range(128)
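The fix comes from http://blog.sina.com.cn/s/blog_6c39196501013s5b.html
In short:
Add the following three lines to the file where the error occurs (Python 2 only; sys.setdefaultencoding does not exist in Python 3):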
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
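
A possible alternative, which I have not tested against the original images_parser.py, is to avoid changing the interpreter-wide default and decode the page to unicode before handing it to the parser. The traceback suggests that a byte string containing non-ASCII bytes (0xe7) was implicitly decoded as ASCII inside unescape, so feeding unicode text should sidestep that step. The sketch below assumes the saved page is UTF-8 encoded; ImageCollector and parse_images are hypothetical names for illustration, not the ones used in the original script.

```python
try:
    from HTMLParser import HTMLParser   # Python 2
except ImportError:
    from html.parser import HTMLParser  # Python 3


class ImageCollector(HTMLParser):
    """Hypothetical stand-in for the parser class in images_parser.py."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.images = []

    def handle_starttag(self, tag, attrs):
        # Record the attributes of every <img> tag as a dict.
        if tag == "img":
            self.images.append(dict(attrs))


def parse_images(raw, encoding="utf-8"):
    # Assumption: the page is UTF-8 unless the caller says otherwise.
    # Decode byte strings explicitly so the parser only ever sees unicode.
    text = raw.decode(encoding) if isinstance(raw, bytes) else raw
    parser = ImageCollector()
    parser.feed(text)
    return parser.images
```

With this approach the file would be read in binary mode, for example open(path, "rb").read(), and the result passed to parse_images. If the page uses another encoding such as GBK, pass encoding="gbk" instead.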