《May 31, 2018》【Day 233 in a row】
Title: Focused crawler for stock data
Content:
1. East Money (东方财富网): http://quote.eastmoney.com/stocklist.html
2. Baidu Stocks (百度股票): https://gupiao.baidu.com/stock/
Step 1: get the stock list from East Money.
Step 2: for each stock in the list, fetch its details from Baidu Stocks.
Step 3: save the results to a file.
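Steps 1 and 2 are linked by a single idea: each stock link on the East Money list page contains a code such as `sh600000` (Shanghai) or `sz000001` (Shenzhen), and the same code names that stock's page on Baidu Stocks. The minimal, standard-library-only sketch below shows that mapping, assuming the hrefs look like the sample strings used here. The helper names `extract_codes` and `build_info_urls` are hypothetical, and the de-duplication step is an addition of this sketch rather than something the original crawler does.

```python
import re

# Shanghai (sh) or Shenzhen (sz) prefix followed by six digits,
# the same pattern the crawler applies to East Money hrefs.
CODE_PATTERN = re.compile(r"s[hz]\d{6}")


def extract_codes(hrefs):
    """Return the stock codes found in hrefs, in order, without duplicates."""
    seen = set()
    codes = []
    for href in hrefs:
        match = CODE_PATTERN.search(href)
        if match and match.group(0) not in seen:
            seen.add(match.group(0))
            codes.append(match.group(0))
    return codes


def build_info_urls(codes, base="https://gupiao.baidu.com/stock/"):
    """Turn each code into the per-stock page URL visited in step 2."""
    return [base + code + ".html" for code in codes]


if __name__ == "__main__":
    # Hypothetical hrefs shaped like the links on the list page.
    sample = [
        "http://quote.eastmoney.com/sh600000.html",
        "http://quote.eastmoney.com/sz000001.html",
        "http://quote.eastmoney.com/sh600000.html",
        "http://www.eastmoney.com/",
    ]
    for url in build_info_urls(extract_codes(sample)):
        print(url)
```

Links without a code (like the site's home page) are simply skipped, which is the same effect the original gets from catching the `IndexError` raised by `re.findall(...)[0]`.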
Code:
import re

import requests
from bs4 import BeautifulSoup


def getHTMLText(url, code='utf-8'):
    """Download a page and return its text, or "" if the request fails."""
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = code
        return r.text
    except requests.RequestException:
        return ""


def getStockList(lst, stockURL):
    """Append every stock code (e.g. sh600000, sz000001) linked from stockURL to lst."""
    html = getHTMLText(stockURL, 'GB2312')  # decode the East Money list page as GB2312
    soup = BeautifulSoup(html, 'html.parser')
    for a in soup.find_all('a'):
        href = a.attrs.get('href', '')
        match = re.search(r"[s][hz]\d{6}", href)
        if match:
            lst.append(match.group(0))


def getStockInfo(lst, stockURL, fpath):
    """Fetch each stock's Baidu page and append its fields to fpath, one dict per line."""
    count = 0
    for stock in lst:
        url = stockURL + stock + ".html"
        html = getHTMLText(url)
        try:
            if html == "":
                continue
            infoDict = {}
            soup = BeautifulSoup(html, 'html.parser')
            stockInfo = soup.find('div', attrs={'class': 'stock-bets'})
            name = stockInfo.find_all(attrs={'class': 'bets-name'})[0]
            infoDict['股票名称'] = name.text.split()[0]  # key means "stock name"
            # Each <dt> label is followed by its <dd> value; pair them up
            keyList = stockInfo.find_all('dt')
            valueList = stockInfo.find_all('dd')
            for key, val in zip(keyList, valueList):
                infoDict[key.text] = val.text
            with open(fpath, 'a', encoding='utf-8') as f:
                f.write(str(infoDict) + '\n')
        except Exception:
            # Page layout did not match (e.g. no stock-bets div); skip this stock
            continue
        finally:
            # Runs for successes, skipped pages and failures alike, so progress reaches 100%
            count += 1
            print('\rProgress: {:.2f}%'.format(count * 100 / len(lst)), end="")


def main():
    stock_list_url = 'http://quote.eastmoney.com/stocklist.html'
    stock_info_url = 'https://gupiao.baidu.com/stock/'
    output_file = 'D:/BaiduStockInfo.txt'
    slist = []
    getStockList(slist, stock_list_url)
    getStockInfo(slist, stock_info_url, output_file)


if __name__ == '__main__':
    main()
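getStockInfo relies on the Baidu page listing each field as a `<dt>` label followed by its `<dd>` value inside `div.stock-bets`. The markup below is an assumed, simplified sample, not a captured page, and the class name `DtDdCollector` is hypothetical. The sketch uses only the standard-library `html.parser` to show the same pairing idea as zipping `find_all('dt')` with `find_all('dd')`. For brevity it collects every `<dt>`/`<dd>` in the input instead of first narrowing to `div.stock-bets`.

```python
from html.parser import HTMLParser


class DtDdCollector(HTMLParser):
    """Collect the text of every <dt> and <dd> element, in document order."""

    def __init__(self):
        super().__init__()
        self.keys = []
        self.values = []
        self._current = None  # 'dt', 'dd', or None when outside those tags

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            self._current = tag
            (self.keys if tag == "dt" else self.values).append("")

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        # Add text to the most recently opened dt/dd entry; ignore text elsewhere.
        if self._current == "dt":
            self.keys[-1] += data.strip()
        elif self._current == "dd":
            self.values[-1] += data.strip()


def parse_pairs(html):
    """Pair each <dt> label with the <dd> value that follows it."""
    collector = DtDdCollector()
    collector.feed(html)
    collector.close()
    return dict(zip(collector.keys, collector.values))


# Assumed, simplified markup (今开 = today's open, 成交量 = volume).
SAMPLE = """
<div class="stock-bets">
  <dl><dt>今开</dt><dd>10.50</dd></dl>
  <dl><dt>成交量</dt><dd>12.3万手</dd></dl>
</div>
"""

if __name__ == "__main__":
    print(parse_pairs(SAMPLE))
```

Like `zip` in the crawler, `parse_pairs` stops at the shorter of the two lists, so an unmatched trailing `<dt>` or `<dd>` is dropped instead of raising an error.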
Result: still running.
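This post shows how to use Python to get the stock list from East Money and then scrape each stock's details from its Baidu Stocks page. The crawler has three steps: fetch the stock list, scrape per-stock information, and store the data.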
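The storage step writes each record with `f.write(str(infoDict) + '\n')`, so every line of the output file is a Python dict literal. A minimal sketch for loading the file back is shown below. It assumes the file contains only lines written that way, and the function name `load_results` is hypothetical. `ast.literal_eval` is used because it accepts plain literals such as these dicts without executing arbitrary code, unlike `eval`.

```python
import ast


def load_results(fpath):
    """Read back a file written by getStockInfo: one dict literal per line."""
    records = []
    with open(fpath, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                # str(dict) output is a valid Python literal, so literal_eval
                # rebuilds the dict without evaluating arbitrary expressions.
                records.append(ast.literal_eval(line))
    return records
```

For example, `load_results('D:/BaiduStockInfo.txt')` would return a list of dicts keyed by the field labels scraped from each page. If the data will be consumed outside Python, writing `json.dumps(infoDict, ensure_ascii=False)` instead of `str(infoDict)` would be a more portable format.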