Scraping the Zhihu page source
import requests
# Send a browser-like User-Agent so the request looks like it comes from Firefox
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:58.0) Gecko/20100101 Firefox/58.0'}
url = 'https://www.zhihu.com/explore'  # Zhihu Explore page
html = requests.get(url, headers=headers).text
print(html)
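Error: UnicodeEncodeError: 'gbk' codec can't encode character '\u2022' in position 12440: illegal multibyte sequence
Cause: print() writes through sys.stdout, which on a Chinese-language Windows console typically encodes text as GBK. GBK cannot represent every Unicode character (here U+2022, the bullet), so print() fails as soon as the page contains one.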
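The cause can be checked without a console or a network request. The snippet below is a minimal sketch; `can_encode` is a hypothetical helper introduced only for this illustration. It tries to encode a single character with a given codec and reports whether that succeeds.

```python
def can_encode(text, encoding):
    """Return True if `text` can be encoded with `encoding` (hypothetical helper)."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

bullet = '\u2022'  # the character named in the error message

# GBK has no mapping for U+2022, which is what makes print() fail on a GBK console.
gbk_ok = can_encode(bullet, 'gbk')
# GB18030 covers every Unicode code point, so the same character encodes fine.
gb18030_ok = can_encode(bullet, 'gb18030')

print('gbk:', gbk_ok)
print('gb18030:', gb18030_ok)
```

Ordinary Chinese text encodes in both codecs; only characters outside GBK's repertoire trigger the exception.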
Fix: re-wrap sys.stdout so that it encodes output as gb18030, which covers every Unicode character:
import io
import sys
# Replace stdout with a text wrapper around the same byte stream, encoded as gb18030
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='gb18030')
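To see what the wrapper does without touching the real console, the same idea can be tried on an in-memory byte buffer. This is only a sketch: `wrap_output` and `fake_console` are hypothetical names, and the `io.BytesIO` object stands in for `sys.stdout.buffer`.

```python
import io

def wrap_output(binary_stream, encoding='gb18030'):
    """Wrap a binary stream in a text layer using `encoding` (hypothetical helper)."""
    return io.TextIOWrapper(binary_stream, encoding=encoding)

# Stand-in for sys.stdout.buffer, so the written bytes can be inspected.
fake_console = io.BytesIO()
out = wrap_output(fake_console)

# The bullet that GBK rejected is written without raising UnicodeEncodeError.
print('bullet: \u2022', file=out)
out.flush()

written = fake_console.getvalue()
decoded = written.decode('gb18030')
```

On Python 3.7 and later, `sys.stdout.reconfigure(encoding='gb18030')` should achieve the same result without replacing the `sys.stdout` object. Either way, whether the character is displayed correctly still depends on what the terminal itself can render.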
Reposted from: https://blog.youkuaiyun.com/qq_28359387/article/details/54974578