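One day, after finishing some code on Windows, I was about to take a break and browse the web, so I clicked the "Show desktop" button in the bottom-right corner. It suddenly struck me that I hadn't changed my desktop background in ages and that it looked awful, so I went hunting through wallpaper sites. I skipped the domestic ones outright, since they just aren't up to standard, and eventually found a really good site, which I'd like to recommend here:
Not only are the images on this site beautiful, it also sorts them by category and by image size, so users can download whichever kinds they like, and it lets you browse earlier images by publication date. A truly rare find.
The site's one shortcoming is that images have to be previewed and downloaded one at a time, which is really inefficient, and with a slow connection there is no way to enjoy them properly. Remembering a small program I had written earlier for collecting download links, I decided to write a small link scraper for this site: it collects the images published on the page for a given date, and a background program then downloads them in bulk. By the time I finish my next round of coding, the pictures will be ready to view.
The analysis is omitted here, all ten thousand words of it (too sleepy today; to be continued).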
#!/usr/bin/python3.2
from html.parser import HTMLParser
import urllib.request

# Full-size image: http://www.goodwp.com/pic/201201/1920x1080/goodwp.com-20912.jpg
# Thumbnail:       http://www.goodwp.com/mini/201201/20946.jpg
sizebase = '1920x1080'


def parse_size(size):
    # Convert "WIDTHxHEIGHT" to a tuple of ints so that resolutions compare
    # numerically instead of as strings; unparsable text counts as (0, 0).
    try:
        width, height = size.strip().lower().split('x')
        return (int(width), int(height))
    except ValueError:
        return (0, 0)


class MyHTMLParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []        # wget arguments built by compose_addr()
        self.imglinks = []     # thumbnail URLs
        self.imgcatagory = []  # category of each wallpaper, used as its download directory
        self.catagory = 0      # 1 while the next text node is a category name
        self.imgsizes = []     # maximum resolution of each wallpaper
        self.size = 0          # 1 while the next text node is a resolution
        # Pager state: 2 = looking for <span class="nav-center">, 1 = inside it,
        # 3 = passed an attribute-less <span>, 0 = next-page link captured
        self.nav = 2
        self.nextpage = ""

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # The first link after the attribute-less <span> is taken as the next page
            for (variable, value) in attrs:
                if variable == "href" and self.nav == 3:
                    self.nextpage = value
                    self.nav = 0
        elif tag == "img":
            # Thumbnails are the images with width="240"
            if ("width", "240") in attrs:
                for (t, v) in attrs:
                    if t == "src":
                        self.imglinks.append(v)
        elif tag == "div":
            for (variable, value) in attrs:
                if variable == "class" and value == "razmer":
                    self.size = 1
                elif variable == "class" and value == "catcol":
                    self.catagory = 1
        elif tag == "span":
            if len(attrs) == 0:
                if self.nav == 1:
                    self.nav = 3
            else:
                for (t, v) in attrs:
                    if t == "class" and v == "nav-center" and self.nav == 2:
                        self.nav = 1

    def handle_data(self, data):
        if self.size == 1:
            self.imgsizes.append(data)
            self.size = 0
        elif self.catagory == 1:
            self.imgcatagory.append(data.replace(' ', '_'))
            self.catagory = 0

    def compose_addr(self):
        # Relies on the module-level `dir` (e.g. "2014/01") set by the main loop
        for i in range(len(self.imglinks)):
            thumb = self.imglinks[i]
            slashindex = thumb.rfind('/')
            prefix = thumb[0:slashindex + 1]
            filename = thumb[slashindex + 1:]
            target = " -P " + dir + '/' + self.imgcatagory[i] + " "
            # Link at the maximum resolution
            tmp = prefix + self.imgsizes[i] + '/' + 'goodwp.com-' + filename
            self.links.append(target + tmp.replace("mini", "pic"))
            # Link at 1920x1080 as well, when the maximum resolution is larger
            if parse_size(self.imgsizes[i]) > parse_size(sizebase):
                tmp2 = prefix + sizebase + '/' + 'goodwp.com-' + filename
                self.links.append(target + tmp2.replace("mini", "pic"))

def get_all_picaddr(startaddr = 'http://www.goodwp.com/2014/08/'):
    # Fetch one listing page, print a wget command for every wallpaper on it,
    # and return the address of the next page ("" when there is none).
    response = urllib.request.urlopen(startaddr)
    htmlbytes = response.read()
    response.close()
    htmlstr = htmlbytes.decode(encoding='windows-1251')
    hp = MyHTMLParser()
    hp.feed(htmlstr)
    hp.close()  # flush any buffered input before building the links
    hp.compose_addr()
    for val in hp.links:
        print("wget " + val)
    return hp.nextpage

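year = 2014
month = 1
dir = ""  # current "YYYY/MM" directory, read by compose_addr(); the name shadows the built-in dir()

if __name__ == "__main__":
    baseAddr = "http://www.goodwp.com/"
    while month < 9:
        dir = ("%4d" % year) + "/" + ('%02d' % month)
        urlAddr = baseAddr + dir + "/"
        # Pass the month's address explicitly; with no argument,
        # get_all_picaddr() falls back to its default 2014/08 page.
        nextpage = get_all_picaddr(urlAddr)
        while nextpage != "":
            nextpage = get_all_picaddr(nextpage)
        month += 1
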
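The program walks through all the images from January to August of this year and, for each image, generates and prints the corresponding download addresses (up to two: one at the highest resolution and, when that is larger than 1920x1080, one at 1920x1080).
Output: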
wget -P 2014/01/Aircrafts_and_Planes http://www.goodwp.com/pic/201408/2880x1800/goodwp.com-31765.jpg
wget -P 2014/01/Aircrafts_and_Planes http://www.goodwp.com/pic/201408/1920x1080/goodwp.com-31765.jpg
wget -P 2014/01/Animals http://www.goodwp.com/pic/201408/1920x1200/goodwp.com-31750.jpg
wget -P 2014/01/Animals http://www.goodwp.com/pic/201408/1920x1080/goodwp.com-31750.jpg
wget -P 2014/01/Animals http://www.goodwp.com/pic/201408/2880x1800/goodwp.com-31752.jpg
wget -P 2014/01/Animals http://www.goodwp.com/pic/201408/1920x1080/goodwp.com-31752.jpg
...
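To make the address rule concrete, here is a minimal sketch of how one thumbnail address could be turned into its download addresses. It is the same idea as in the script, shown in isolation rather than as the script itself. The helper names `parse_size` and `full_size_urls` are made up for illustration, the thumbnail address used in the note below is assumed to follow the `mini/` pattern from the script's comments, and resolutions are compared as numbers.

```python
def parse_size(text):
    """Turn 'WIDTHxHEIGHT' into an (int, int) tuple for numeric comparison."""
    width, height = text.strip().lower().split("x")
    return int(width), int(height)


def full_size_urls(thumb_url, max_size, base_size="1920x1080"):
    """Return the download URLs for one thumbnail.

    The maximum-resolution URL comes first; the base_size URL is added
    only when the maximum resolution is larger than base_size.
    """
    # Split ".../mini/201408/31765.jpg" into the directory part and the file name
    prefix, filename = thumb_url.rsplit("/", 1)
    # Thumbnails live under /mini/, full-size images under /pic/ (assumed pattern)
    prefix = prefix.replace("/mini/", "/pic/", 1)
    urls = ["%s/%s/goodwp.com-%s" % (prefix, max_size, filename)]
    if parse_size(max_size) > parse_size(base_size):
        urls.append("%s/%s/goodwp.com-%s" % (prefix, base_size, filename))
    return urls
```

Calling `full_size_urls("http://www.goodwp.com/mini/201408/31765.jpg", "2880x1800")` gives the 2880x1800 address followed by the 1920x1080 one, the same pair of addresses as in the first two lines of the output.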
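Redirect the output to a file, and the images can then be downloaded as a batch.
PS. I'm not very familiar with Python, so treat this as nothing more than a practice exercise; criticism and corrections are welcome so we can learn together.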
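Running the saved file through a shell is the obvious way to do this. For anyone who would rather stay in Python, here is a rough sketch of the batch step, assuming the output was saved to a file and that `wget` is installed. The function names `read_commands` and `run_commands` and the file name `links.txt` are hypothetical.

```python
import shlex
import subprocess


def read_commands(path):
    """Parse each 'wget -P <dir> <url>' line of the file into an argument list."""
    commands = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            args = shlex.split(line)
            # Skip blank lines and anything that is not a wget command
            if args and args[0] == "wget":
                commands.append(args)
    return commands


def run_commands(commands, dry_run=True):
    """Run the commands one by one; with dry_run=True only print them."""
    for args in commands:
        if dry_run:
            print(" ".join(args))
        else:
            # subprocess.call returns the exit code instead of raising,
            # so one failed download does not stop the rest
            subprocess.call(args)
```

A typical use would be `run_commands(read_commands("links.txt"), dry_run=False)`; leaving `dry_run` at its default only lists what would be downloaded.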