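In my spare time I have been playing 《率土之滨》 and quite like it, so I wanted to build a battle simulator. I checked the official site for usable data and found only basic hero data, with no hero growth data. Oh well, I'll scrape whatever can be scraped...
Here is the code. I'm new to Python, so any pointers are welcome. The Python version is 2.7.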
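# -*- coding: utf-8 -*-
# Python 2.7 script: scrapes basic hero data from stzb.163.com.
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3, for parsing HTML
import urllib2
import sys

# Python 2 workaround: make implicit str/unicode conversions use UTF-8,
# so the Chinese byte-string labels below can be compared with unicode text.
reload(sys)
sys.setdefaultencoding('utf-8')


class heroInfo:
    """Plain record holding the fields scraped for one hero."""
    def __init__(self):
        self.heroName = ''
        self.heroCost = ''
        self.herobingzhong = ''   # troop type (兵种)
        self.herojuli = ''        # attack range (攻击距离)
        self.heromoulue = ''      # strategy (谋略)
        self.herogongji = ''      # initial attack (初始攻击)
        self.herogongcheng = ''   # initial siege (初始攻城)
        self.herofangyu = ''      # defense (防御)
        self.herosudu = ''        # speed (速度)
        self.herojineng = ''      # skill text (first dl.group block)
        self.heroother = ''       # text of the last remaining dl.group block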
herolist = []  # created once, outside the loop, so it keeps every hero

for n in range(1, 647):
    # Page ids are zero-padded to three digits: 100001.html ... 100646.html
    url = "http://stzb.163.com/herolist/100%03d.html" % n
    try:
        page = urllib2.urlopen(url)
        r = page.read().decode('gbk')
    except urllib2.URLError, err:
        print err
        continue

    soup = BeautifulSoup(r)
    content = soup.find(name='div', attrs={'class': 'role-content'})
    if content is None or content.h1 is None:
        # A page without the hero block (e.g. a removed hero) is skipped.
        continue

    hinfo = heroInfo()
    hinfo.heroName = content.h1.text
    herolist.append(hinfo)

    # The first dl.group block is the skill text; later blocks are other info.
    grouplist = content.findAll(name='dl', attrs={'class': 'group'})
    for idx, item in enumerate(grouplist):
        if idx == 0:
            hinfo.herojineng = item.dd.text
        else:
            hinfo.heroother = item.dd.text

    # Each stat sits in a <span> whose text contains its label.
    for item in content.findAll('span'):
        if 'cost' in item.text:
            hinfo.heroCost = item.text
        if '兵种' in item.text:
            hinfo.herobingzhong = item.text
        if '攻击距离' in item.text:
            hinfo.herojuli = item.text
        if '谋略' in item.text:
            hinfo.heromoulue = item.text
        if '初始攻击' in item.text:
            hinfo.herogongji = item.text
        if '初始攻城' in item.text:
            hinfo.herogongcheng = item.text
        if '防御' in item.text:
            hinfo.herofangyu = item.text
        if '速度' in item.text:
            hinfo.herosudu = item.text

    # One comma-separated line per hero. The original line printed the skill
    # text twice and left out attack range and defense; both are included here.
    print ','.join([hinfo.heroName, hinfo.herobingzhong, hinfo.heroCost,
                    hinfo.herojuli, hinfo.herogongji, hinfo.heromoulue,
                    hinfo.herofangyu, hinfo.herosudu, hinfo.herogongcheng,
                    hinfo.herojineng, hinfo.heroother])
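One thing to watch with the comma-joined output: skill descriptions are free text, so they may themselves contain commas, which would make a row ambiguous when read back. A minimal sketch of one way around that is to let the standard `csv` module do the quoting. This is written for Python 3 rather than 2.7, and `FIELDS`, `hero_rows_to_csv` and the sample row are hypothetical names and placeholder values, not real game data.

```python
# Python 3 sketch. FIELDS, hero_rows_to_csv and the sample row are
# hypothetical; they only illustrate quoting with the csv module.
import csv
import io

FIELDS = ["heroName", "herobingzhong", "heroCost", "herojineng"]


def hero_rows_to_csv(rows):
    """Serialize a list of dicts to CSV text; fields with commas get quoted."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buf.getvalue()


# Placeholder row: the skill text deliberately contains a comma.
sample = [{
    "heroName": "hero A",
    "herobingzhong": "cavalry",
    "heroCost": "cost 3",
    "herojineng": "deals damage, then retreats",
}]

csv_text = hero_rows_to_csv(sample)

# Reading the text back yields the original field, comma included.
parsed = list(csv.DictReader(io.StringIO(csv_text)))
```

For the simulator, the same rows could later be loaded with `csv.DictReader` instead of splitting each line on commas by hand.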
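The scraped data is slightly flawed: NetEase appears to have removed some hero pages, so the site actually has only about 430 heroes. In theory the script runs as-is once BeautifulSoup is installed, so feel free to give it a try.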
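Since a fair number of the 646 page ids no longer resolve, the fetch step is worth isolating so that a missing page is simply skipped. The following is a rough sketch of that step using only the Python 3 standard library (where `urllib2` has become `urllib.request`). `hero_url` and `fetch_hero_page` are hypothetical helper names, and the URL pattern is just the one used in the script above; whether the site still serves it is an assumption.

```python
# Python 3 sketch. hero_url and fetch_hero_page are hypothetical helpers;
# the URL pattern is copied from the script above.
import urllib.request

BASE = "http://stzb.163.com/herolist/100%03d.html"


def hero_url(n):
    """Build the page URL for hero n, zero-padding the id to three digits."""
    return BASE % n


def fetch_hero_page(n, timeout=10):
    """Return the decoded HTML for hero n, or None if it cannot be fetched."""
    try:
        with urllib.request.urlopen(hero_url(n), timeout=timeout) as page:
            # The script above decodes the pages as GBK; undecodable bytes
            # are replaced here instead of raising.
            return page.read().decode("gbk", errors="replace")
    except OSError as err:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        print("skipping %d: %s" % (n, err))
        return None
```

A driver loop would then be along the lines of `for n in range(1, 647): html = fetch_hero_page(n)`, continuing to the next id whenever `html` is `None`.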