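I haven't been learning Python for long, and I mainly want to use it for web scraping. For writing APIs I'd recommend Node.js, and PHP handles full-site page rendering, but for scraping Python is still the way to go.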
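No Python scraping framework is used here. Instead, Python's built-in urllib module fetches the page directly, then the response is parsed and the images are downloaded.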
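To make the parse step concrete, here is a minimal sketch that runs offline, with the network call left out. It assumes the API response is JSON with a 'subjects' list whose items carry a 'cover' URL, which is what the script in this post reads. The SAMPLE payload, the example.com URLs, and the helper names cover_extension and plan_downloads are made up for illustration.

```python
import json
import re

# Hypothetical payload for illustration; only the 'subjects' list and the
# 'cover' key are taken from the script in this post.
SAMPLE = (
    '{"subjects": ['
    '{"title": "A", "cover": "https://example.com/a.png"}, '
    '{"title": "B", "cover": "https://example.com/b"}'
    ']}'
)


def cover_extension(cover_url, default="jpg"):
    """Return the trailing file extension of cover_url, or default if none."""
    match = re.search(r"\.(\w+)$", cover_url)
    return match.group(1) if match else default


def plan_downloads(source, start):
    """Pair each cover URL in the JSON text with a local target path."""
    subjects = json.loads(source)["subjects"]
    return [
        (
            item["cover"],
            "./img/douban_%s_%s.%s"
            % (start, index, cover_extension(item["cover"])),
        )
        for index, item in enumerate(subjects)
    ]


if __name__ == "__main__":
    for cover, path in plan_downloads(SAMPLE, 0):
        print("%s -> %s" % (cover, path))
```

Keeping the path planning separate from the download makes the naming logic easy to check before any request is sent.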
The script fetches the Douban image API directly, parses the response, and downloads the images:
# -*- coding: utf-8 -*-
import json
import urllib
import re


def getHtml(url):
    # Python 2 API; in Python 3 this function is urllib.request.urlopen
    request = url
    response = urllib.urlopen(request)
    return response.read()


def downloadPic(url, start):
    source = getHtml(url)
    s = json.loads(source)
    imgArr = s['subjects']
    index = 0
    for i in imgArr:
        #print i['title'],i['url']
        # Take the file extension from the cover URL, defaulting to jpg
        ext = re.findall(r'.*\.(\w+)$', i['cover'])
        if len(ext) > 0:
            ext = ext[0]
        else:
            ext = 'jpg'
        path = './img/douban_%s_%s.%s' % (start, index, ext)
        print path
        # Open in binary mode because the file holds image bytes
        f = open(path, 'wb')
        f.write