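After scraping a website with Python, how to save the data is something many people care about, and it largely determines how easily the data can be processed afterwards. This post briefly introduces three ways to save scraped web data: txt format, CSV format, and a database.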
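To make the three options concrete before going through them, here is a minimal sketch that writes the same records to a txt file, a CSV file, and a database. Everything in it is an assumption for illustration only: the sample `records`, the field names `name` and `rate`, the helper names `save_txt`, `save_csv` and `save_sqlite`, and the choice of SQLite as the database are not from the original post. It targets Python 3 and uses only the standard library.

```python
import csv
import os
import sqlite3
import tempfile

# Hypothetical sample records standing in for scraped rows.
records = [
    {"name": "product-a", "rate": "5.2%"},
    {"name": "product-b", "rate": "6.0%"},
]


def save_txt(rows, path):
    # Plain text: one tab-separated line per record.
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write("{}\t{}\n".format(row["name"], row["rate"]))


def save_csv(rows, path):
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "rate"])
        writer.writeheader()
        writer.writerows(rows)


def save_sqlite(rows, path):
    # SQLite is assumed here only because it needs no server.
    conn = sqlite3.connect(path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS products (name TEXT, rate TEXT)"
        )
        conn.executemany(
            "INSERT INTO products (name, rate) VALUES (:name, :rate)", rows
        )
        conn.commit()
    finally:
        conn.close()


# Write all three outputs into a temporary directory.
out_dir = tempfile.mkdtemp()
txt_path = os.path.join(out_dir, "products.txt")
csv_path = os.path.join(out_dir, "products.csv")
db_path = os.path.join(out_dir, "products.db")

save_txt(records, txt_path)
save_csv(records, csv_path)
save_sqlite(records, db_path)
```

Roughly speaking, txt is the quickest to write, CSV opens directly in spreadsheet tools, and a database makes later querying easier.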
First, saving as txt. Without further ado, here is the code!
# -*- coding: utf-8 -*-
import requests
import json
import html
import urllib
import sys
import re
import random
import time
from threading import Timer
from bs4 import BeautifulSoup
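# Python 2 only: force the default string encoding to UTF-8.
# Remove these two lines on Python 3, where reload() is not a builtin
# and sys.setdefaultencoding() does not exist.
reload(sys)
sys.setdefaultencoding('utf-8')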
headers ={'user-agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 BIDUBrowser/8.7 Safari/537.36'}
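def get_html1(i):
    # Fetch page i of the listing and return the raw response body.
    url = 'https://www.ppmoney.com/StepUp/List/-1/{}/fixedterm/true/false?_={}'
    # The second placeholder fills the "_" query parameter with a random integer.
    response = requests.get(url.format(i, random.randint(1501050773102, 1501051774102)), headers=headers)
    return response.content

def get_data1(html):
    # The response body is JSON; the records are under PackageList -> Data.
    data1 = json.loads(html)
    data = data1['PackageList']['Data']
    for i in d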