Scraping meme images with Python: Doutula (斗图啦)

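This article shows one way to scrape images from a web page with Python, requests and lxml: parse the HTML to get the image links, then download each image and save it locally. The example scrapes meme images from the Doutula website and covers setting request headers, parsing the pages, and downloading the images.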


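import time

import requests
from lxml import etree

url = 'http://www.doutula.com/'
# Browser-like request headers: a desktop User-Agent, plus a Referer that
# points back at the site itself (some image hosts check it to block hotlinking).
headers = {'Referer': 'http://www.doutula.com/',
           'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.98 Safari/537.36 LBBROWSER'}

# Fetch the home page once and print the HTML to see where the image URLs live.
resp = requests.get(url, headers=headers)
print(resp.text)
# Excerpt of the output: images are lazy-loaded, so src holds a placeholder
# image and the real URL is in data-original (with a copy in data-backup).
'''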
<img class="gif" style="min-height: inherit;left: 5px;top:5px" src="//static.doutula.com/img/gif.png" />
<img src="//static.doutula.com/img/loader_170_160.png" 
style="margin: 0 auto; min-height: inherit;"
 data-original="https://ws2.sinaimg.cn/bmiddle/6af89bc8gw1f8smgrjzkug20af0afmyl.gif"
 alt="总爱在我的生活里指手画脚,俗称经验婊和过来人婊" class="img-responsive lazy image_dta"
 data-backup="http://img.doutula.com/production/uploads/image//2016/06/10/20160610526577_IvENsd.gif!dta">
 '''

# Start parsing

# Single-page version, superseded by download_img() and get_page() below:
#html = etree.HTML(resp.text)
#srcs = html.xpath('.//img/@data-original')
#for src in srcs:
#    filename = src.split('/')[-1]
#    img = requests.get(src, headers=headers)
#    with open('D:/Anaconda3/imgs/' + filename, 'wb') as file:
#        file.write(img.content)
#    print(src, filename)
#
#print(len(srcs))



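def download_img(src):
    """Download one image URL into D:/Anaconda3/imgs/ (the folder must already exist)."""
    filename = src.split('/')[-1]
    img = requests.get(src, headers=headers)
    # Forward slashes work on Windows and avoid backslash escapes such as '\A' and '\i'.
    with open('D:/Anaconda3/imgs/' + filename, 'wb') as file:
        file.write(img.content)
    print(src, filename)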



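def get_page(url):
    """Download every image on one list page and return its rel="next" links."""
    resp = requests.get(url, headers=headers)
    print(resp, url)
    html = etree.HTML(resp.text)
    # Images are lazy-loaded: the real URL is in data-original, not src.
    srcs = html.xpath('.//img/@data-original')
    for src in srcs:
        download_img(src)

    next_link = html.xpath('.//a[@rel="next"]/@href')
    # Return the links themselves; an empty list (last page) ends the crawl loop.
    return next_link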


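next_link_base = 'http://www.doutula.com/article/list/?page='
current_num = 1
# Crawl page 1 first: `html` only exists inside get_page(), so the first
# next link has to come from its return value.
next_link = get_page(next_link_base + str(current_num))
while next_link:
    time.sleep(0.2)  # short pause between requests
    current_num += 1
    next_link = get_page(next_link_base + str(current_num))
    if current_num >= 4:  # cap the demo at 4 pages
        break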
        
        

'''
http://www.doutula.com/article/list/?page=581
'''

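The sample markup near the top of the script shows why the XPath targets `@data-original` rather than `@src`: the page lazy-loads images, so `src` points at a placeholder (`loader_170_160.png`) while the real GIF URL sits in `data-original`, with a second copy in `data-backup`. As a dependency-free sketch, swapping the standard-library `html.parser` in for lxml and assuming those attribute names still hold, the extraction step could look like this (`LazyImgParser` and `extract_image_urls` are hypothetical names):

```python
from html.parser import HTMLParser


class LazyImgParser(HTMLParser):
    """Collect real image URLs from lazy-loaded <img> tags.

    Assumes the true URL is in data-original with a fallback copy in
    data-backup, as in the sample markup; plain src values are ignored
    because they point at placeholder images.
    """

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        # Prefer data-original, fall back to data-backup, skip everything else.
        url = attr.get("data-original") or attr.get("data-backup")
        if url:
            self.urls.append(url)


def extract_image_urls(html_text):
    parser = LazyImgParser()
    parser.feed(html_text)
    parser.close()
    return parser.urls


sample = '''
<img class="gif" src="//static.doutula.com/img/gif.png" />
<img src="//static.doutula.com/img/loader_170_160.png"
 data-original="https://ws2.sinaimg.cn/bmiddle/6af89bc8gw1f8smgrjzkug20af0afmyl.gif"
 data-backup="http://img.doutula.com/production/uploads/image//2016/06/10/20160610526577_IvENsd.gif!dta">
'''
print(extract_image_urls(sample))
```

The first `<img>` (the GIF badge) has neither data attribute and is skipped, which matches what `.//img/@data-original` selects in the lxml version.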
 
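`download_img()` names each file with `src.split('/')[-1]`. That works for the `sinaimg.cn` URLs in `data-original`, but it would keep any query string, and on the `data-backup` URLs in the sample it yields a name ending in `.gif!dta`. A hypothetical helper `filename_from_url()`, using only the standard library and assuming a `!suffix` is never part of the real file name, might look like this:

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default="image"):
    """Derive a local filename from an image URL (hypothetical helper).

    Drops the query string and fragment, keeps only the last path segment,
    and strips a trailing '!suffix' such as the '!dta' on data-backup URLs.
    """
    # urlsplit separates the path from ?query and #fragment.
    path = unquote(urlsplit(url).path)
    # URL paths always use '/', so posixpath is used regardless of the OS.
    name = posixpath.basename(path)
    # Assumption based on the sample URL: '!dta' is not part of the extension.
    name = name.split("!", 1)[0]
    return name or default


print(filename_from_url(
    "http://img.doutula.com/production/uploads/image//2016/06/10/20160610526577_IvENsd.gif!dta"))
```

Taking only the basename also keeps a crafted URL from writing outside the download folder.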

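The main loop combines two stopping rules: stop when a page has no `rel="next"` link, or after a fixed number of pages. A sketch that keeps this policy separate from the network code, using a hypothetical `crawl_pages()` that takes the page-fetching function as a parameter so the logic can be tried without hitting the site:

```python
import time


def crawl_pages(fetch_page, base_url, first_page=1, max_pages=4, delay=0.2):
    """Visit numbered list pages until one has no next link or max_pages is hit.

    fetch_page(url) is any callable that processes one page and returns a
    truthy value (e.g. a non-empty list of rel="next" hrefs) when another
    page follows. Returns the URLs visited, in order.
    """
    visited = []
    for i in range(max_pages):
        if i:
            time.sleep(delay)  # pause between requests, not before the first
        url = base_url + str(first_page + i)
        visited.append(url)
        if not fetch_page(url):
            break  # no "next" link: this was the last page
    return visited
```

With a `get_page()` that returns the extracted `next_link` list, `crawl_pages(get_page, 'http://www.doutula.com/article/list/?page=')` would visit at most pages 1 to 4, pausing 0.2 s between requests like the loop in the script.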