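Target site: http://datachart.500.com/dlt/history/history.shtml (500彩票网). Inspecting the page shows that it does not load different data by navigating to new pages, so the request that actually returns the full draw history can be found in the Network tab of the browser developer tools (F12).
As shown in the figure: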
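The data request found in the Network tab appears to take `start` and `end` query parameters, judging from the URL used in the crawler code in this post. A minimal sketch of building that URL for an arbitrary range is shown here; the helper name `build_history_url` is hypothetical, and reading the two parameters as the first and last draw numbers is an assumption.

```python
from urllib.parse import urlencode

# Endpoint taken from the url used in this post's crawler code.
BASE_URL = "http://datachart.500.com/dlt/history/newinc/history.php"


def build_history_url(start: str, end: str) -> str:
    """Return the history URL for the given start and end values.

    Assumption: start and end are draw numbers such as "07001" and "21018".
    """
    return f"{BASE_URL}?{urlencode({'start': start, 'end': end})}"


print(build_history_url("07001", "21018"))
```

Passing the values as strings keeps the leading zero in numbers such as `07001`.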

Crawler code:
from bs4 import BeautifulSoup  # HTML parsing
import requests  # HTTP requests
import os
import pandas as pd
import csv
import codecs

lst = []
url = 'http://datachart.500.com/dlt/history/newinc/history.php?start=07001&end=21018'
r = requests.get(url)
r.encoding = 'utf-8'
text = r.text
soup = BeautifulSoup(text, "html.parser")
# The draw history lives in the table body with id "tdata".
tbody = soup.find('tbody', id="tdata")
tr = tbody.find_all('tr')
# Loop over every row in the table body.
for row in tr:
    td = row.find_all('td')
    # Keep the first eight cells of each row.
    lst.append([cell.text for cell in td[:8]])
# newline='' prevents blank lines between rows on Windows.
with open("Lottery_data.csv", 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
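
The listing stops right after the `csv.writer` is created, and the rest of the original code is not shown. As a rough sketch only, the collected rows could be written out as follows, assuming `lst` holds eight-element rows like those built in the loop. The function name `save_rows`, the column names in `HEADER`, and the sample row are all hypothetical placeholders, not part of the original post.

```python
import csv

# Hypothetical column names; the original post does not show a header row.
HEADER = ["issue", "n1", "n2", "n3", "n4", "n5", "n6", "n7"]


def save_rows(rows, path="Lottery_data.csv"):
    """Write a header line plus all rows to a CSV file; return the row count."""
    # newline="" stops the csv module from emitting blank lines on Windows.
    with open(path, "w", newline="", encoding="utf-8") as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(HEADER)
        writer.writerows(rows)
    return len(rows)


# Placeholder row for illustration, not real draw data.
sample = [["00000", "01", "02", "03", "04", "05", "06", "07"]]
save_rows(sample, "Lottery_data_sample.csv")
```

In the crawler itself, the call would be `save_rows(lst)` once the loop has finished.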




