How to download images from Baidu?

Latest recommended article published 2024-10-13 18:19:53

lcbwlx



Copyright: CC 4.0 BY-SA


Article link: https://blog.youkuaiyun.com/lcbwlx/article/details/48880133


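I recently needed to download images from Baidu for testing. I didn't want to do it by hand, so I looked into writing a script to do it automatically.

Here is what I came up with: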

# -*- coding: utf-8 -*-
# Python 2: collect image URLs from Baidu's JSON image-search endpoint
# and save them to urls.txt (the images themselves are not fetched here).

import os
import json
import urllib
import urllib2

tags = ['运动服']  # search keywords ('运动服' means "sportswear")

urls = []

savePath = './'

for tag2 in tags:
    print 'start download theme:', tag2

    startNum = 0    # index of the first image to request
    resultNum = 60  # images per JSON request; 60 is the upper bound
    endNum = 3000   # stop after this many images
    totalNum = -1   # total number of images for this keyword (-1 = not known yet)

    # one folder per keyword, for saving the images later
    path = unicode(savePath + '/' + tag2 + '/', 'utf8')
    if not os.path.exists(path):
        os.makedirs(path)

    # page through the results; endNum also bounds the loop if requests keep failing
    while startNum < endNum and (totalNum == -1 or startNum < totalNum):

        oneRequestNum = 0
        # ie/oe stay before the keyword; quote() percent-encodes the UTF-8 bytes
        url = ('http://image.baidu.com/i?tn=baiduimagejson&width=&height=&ie=utf8&oe=utf-8'
               '&word=' + urllib.quote(tag2) +
               '&pn=' + str(startNum) + '&rn=' + str(resultNum))

        try:
            user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
            headers = {'User-Agent': user_agent}
            req = urllib2.Request(url, headers=headers)
            html = urllib2.urlopen(req, timeout=100)

            jsonData = json.loads(html.read())

            if totalNum == -1:
                totalNum = int(jsonData['displayNum'])
                print 'total number:', totalNum

            for item in jsonData['data']:
                oneRequestNum += 1
                # entries may be empty, so check for the key first
                if 'objURL' in item:
                    urls.append(item['objURL'])

        except Exception, e:
            print 'Exception:', str(e)
            print url
            oneRequestNum += 100  # skip past the failing page

        finally:
            startNum += oneRequestNum
            print 'Requested images so far:', startNum

        if oneRequestNum == 0:
            break  # empty page: no more results

    print 'Finish download theme:', tag2

# write all collected URLs once, after every keyword is done
with open('urls.txt', 'w') as ff:
    for url in urls:
        ff.write(url.encode('utf-8') + '\n')  # json.loads returns unicode strings
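The script above only collects image URLs into urls.txt; it creates a folder per keyword but never fetches the files. A minimal Python 3 sketch of that remaining step follows, assuming urls.txt holds one URL per line. The names download_all, fetch_bytes and filename_for, and the MD5-based file naming, are hypothetical choices for illustration, not part of the original code.

```python
import hashlib
import os
import urllib.request

# Same User-Agent string as the script above (an assumption that servers accept it).
USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
IMAGE_EXTS = ('.jpg', '.jpeg', '.png', '.gif', '.bmp', '.webp')


def fetch_bytes(url, timeout=30):
    """Download one URL and return its raw bytes."""
    req = urllib.request.Request(url, headers={'User-Agent': USER_AGENT})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


def filename_for(url):
    """Hypothetical naming scheme: MD5 of the URL plus its extension (default .jpg)."""
    path = url.split('?', 1)[0]
    ext = os.path.splitext(path)[1].lower()
    if ext not in IMAGE_EXTS:
        ext = '.jpg'
    return hashlib.md5(url.encode('utf-8')).hexdigest() + ext


def download_all(url_file, dest_dir, fetch=fetch_bytes):
    """Save every URL listed in url_file into dest_dir; return (saved, failed)."""
    os.makedirs(dest_dir, exist_ok=True)
    with open(url_file, encoding='utf-8') as f:
        urls = [line.strip() for line in f if line.strip()]
    saved = failed = 0
    for url in urls:
        target = os.path.join(dest_dir, filename_for(url))
        if os.path.exists(target):
            continue  # already downloaded on an earlier run
        try:
            data = fetch(url)
        except Exception as e:  # dead links are common; skip and count them
            print('failed:', url, e)
            failed += 1
            continue
        with open(target, 'wb') as out:
            out.write(data)
        saved += 1
    return saved, failed
```

For example, download_all('urls.txt', './运动服') saves the files into the keyword folder and returns a (saved, failed) pair. Passing a different fetch function is a simple place to add retries or a delay between requests.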

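One thing to watch out for in the script: the encoding parameters in the URL (ie=utf8&oe=utf-8) have to come before the concatenated string parts (the keyword and the str(...) values). When I put them after, my program raised an error.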
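One way to keep that parameter order fixed is to build the query from an ordered list of pairs. The Python 3 sketch below does this with urllib.parse.urlencode, which also percent-encodes the keyword as UTF-8, and reimplements the paging loop (pn = start index, rn = page size, displayNum = total) with the same stop conditions as the script. The helper names are hypothetical, the endpoint is copied from the script and may no longer respond this way, and unlike the script this sketch lets network errors propagate instead of skipping ahead.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE = 'http://image.baidu.com/i'  # endpoint from the script above; may have changed
USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'


def build_search_url(word, pn, rn=60):
    """Hypothetical helper: same query as the script, with ie/oe before word."""
    params = [  # a list of pairs (not a dict) keeps exactly this order
        ('tn', 'baiduimagejson'), ('width', ''), ('height', ''),
        ('ie', 'utf8'), ('oe', 'utf-8'),
        ('word', word),  # urlencode percent-encodes this as UTF-8
        ('pn', pn), ('rn', rn),
    ]
    return BASE + '?' + urlencode(params)


def fetch_json(url, timeout=100):
    """Request one page of results and parse the JSON body."""
    req = urllib.request.Request(url, headers={'User-Agent': USER_AGENT})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())


def collect_urls(word, end_num=3000, rn=60, fetch=fetch_json):
    """Page through results until displayNum or end_num is reached."""
    urls, start, total = [], 0, None
    while start < end_num and (total is None or start < total):
        page = fetch(build_search_url(word, start, rn))
        if total is None:
            total = int(page.get('displayNum', 0))
        items = page.get('data', [])
        if not items:
            break  # an empty page would otherwise loop forever
        urls.extend(item['objURL'] for item in items if 'objURL' in item)
        start += len(items)
    return urls
```

For the keyword in the script, build_search_url('运动服', 0) places ie=utf8&oe=utf-8 before word=%E8%BF%90%E5%8A%A8%E6%9C%8D, followed by &pn=0&rn=60.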

References:

http://blog.youkuaiyun.com/yuanwofei/article/details/16343743

http://www.devba.com/index.php/archives/3321.html

http://blog.youkuaiyun.com/viomag/article/details/38340993

The original code comes from https://github.com/busz/BaiduImageDownloader