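(1) Encoding problem with scraped page source

    import requests

    def get_html(url):
        try:
            response = requests.get(url)
            if response.status_code == 200:
                return response.text
            # ......
            return get_html(url)  # retries by calling itself
        # ......

    def get_index(keyword, page):
        # ......
        html = get_html(url)
        print(html.decode('utf_8'))

Here get_html returns response.text, which in Python 3 is already a decoded str. A str has no decode() method, so print(html.decode('utf_8')) raises AttributeError.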
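Fix: in get_html, return the response body encoded as UTF-8 bytes instead:

    return response.text.encode('utf-8')

This converts the text, which requests decoded with its default encoding, into UTF-8 bytes, so the html.decode('utf_8') call in get_index now works.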
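To see why the .encode('utf-8') step is needed, and what it does not fix, here is a minimal sketch in plain Python with no network access. fetch_text_like_requests is a hypothetical stand-in for response.text (it decodes raw bytes with a given charset), and the sample text is made up. Case A reproduces the AttributeError and the round trip from (1). Case B assumes the charset was guessed wrongly, for example ISO-8859-1, which requests may fall back to when a text/* response has no charset in its Content-Type header; encoding to UTF-8 and decoding again then just reproduces the garbled text.

```python
# Minimal sketch: simulates the str/bytes handling in get_html without any
# network access. The sample text and the charsets are illustrative assumptions.


def fetch_text_like_requests(raw: bytes, charset: str) -> str:
    """Hypothetical stand-in for response.text: decode the body with a charset."""
    return raw.decode(charset, errors="replace")


def demo() -> dict:
    raw = "网页代码".encode("utf-8")  # hypothetical UTF-8 page body

    # Case A: charset guessed correctly, so the text is a proper str.
    text = fetch_text_like_requests(raw, "utf-8")
    try:
        text.decode("utf_8")  # str has no .decode() in Python 3
        str_decode_failed = False
    except AttributeError:
        str_decode_failed = True

    # The fix in (1): encode back to UTF-8 bytes, then decode them.
    round_trip = text.encode("utf-8").decode("utf_8")

    # Case B: charset guessed wrongly (ISO-8859-1), giving garbled text.
    garbled = fetch_text_like_requests(raw, "iso-8859-1")
    # Re-encoding as UTF-8 and decoding again returns the same garbled str.
    still_garbled = garbled.encode("utf-8").decode("utf_8")
    # Decoding the original raw bytes with the right charset recovers the text.
    repaired = raw.decode("utf-8")

    return {
        "str_decode_failed": str_decode_failed,
        "round_trip": round_trip,
        "still_garbled": still_garbled == garbled,
        "repaired": repaired,
    }


if __name__ == "__main__":
    for key, value in demo().items():
        print(key, value)
```

If the printed page itself is garbled, the usual remedy with requests is to decode the raw bytes in response.content with the correct charset, or to set response.encoding (for example to 'utf-8') before reading response.text.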
(2) If the page source has been saved to a file, add encoding='utf-8' when opening the file:
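    def get_html():
        # Set up the file object; the with block closes it, so f.close() is not needed.
        with open('principle_test.txt', 'r', encoding='utf-8') as f:
            html = f.read()
        return html

    def get_index():
        # ......
        html = get_html()
        print(html)

Because the file is opened in text mode with encoding='utf-8', f.read() already returns a decoded str, so html is printed directly; calling html.decode('utf_8') on it would raise AttributeError.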
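A runnable sketch of approach (2), assuming the saved file is UTF-8. Here get_html takes the path as a parameter (a hypothetical change so the demo can use a temporary file instead of principle_test.txt), and get_html_bytes is an illustrative alternative that opens the file in binary mode and decodes explicitly. Both return the same str.

```python
import os
import tempfile


def get_html(path: str) -> str:
    """Read a saved page as text; encoding='utf-8' decodes it while reading."""
    with open(path, "r", encoding="utf-8") as f:  # closed automatically by `with`
        return f.read()


def get_html_bytes(path: str) -> str:
    """Hypothetical alternative: read raw bytes, then decode them explicitly."""
    with open(path, "rb") as f:
        return f.read().decode("utf_8")


if __name__ == "__main__":
    # Hypothetical sample content standing in for principle_test.txt.
    sample = "<html><body>网页代码</body></html>"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "principle_test.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write(sample)
        html = get_html(path)
        print(type(html).__name__, html)  # already a str, so print it directly
        print(get_html_bytes(path) == html)
```

Reading in text mode is the simpler choice when the file's encoding is known; binary mode plus an explicit decode is useful when the charset has to be determined after reading.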