A small crawler for downloading images from the web (with code) (requests, bs4)

Latest recommended article published 2025-03-21 10:15:26

@little杰


Tags: python, web crawler, data mining

Article link: https://blog.youkuaiyun.com/m0_50553633/article/details/126548039


Scraping high-resolution desktop wallpapers (requests, bs4)

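This crawler downloads wallpapers from the 优美图库 (umei.cc) site. I split it into three main steps:
1. Get the source code of the main page and extract the subpage links (href).
2. Use each href to fetch the subpage, and find the image's download address in it (img -> src).
3. Download the images.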
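The three steps map naturally onto small, separate functions. The sketch below shows only the control flow; every callable (`fetch_text`, `fetch_bytes`, `extract_links`, `extract_img_src`, `save`) is a hypothetical hook, not something from the original code, so the loop can be exercised offline with fake functions and wired to requests/BeautifulSoup for real use.

```python
from typing import Callable, Iterable, List


def crawl(
    list_url: str,
    fetch_text: Callable[[str], str],
    fetch_bytes: Callable[[str], bytes],
    extract_links: Callable[[str, str], Iterable[str]],
    extract_img_src: Callable[[str, str], str],
    save: Callable[[str, bytes], None],
) -> List[str]:
    """Run the three steps and return the image URLs that were saved.

    All callables are hypothetical hooks supplied by the caller, so this
    function does no network I/O by itself.
    """
    saved: List[str] = []
    # Step 1: main page source -> subpage links (href)
    main_html = fetch_text(list_url)
    for href in extract_links(main_html, list_url):
        # Step 2: subpage source -> image download address (img -> src)
        child_html = fetch_text(href)
        src = extract_img_src(child_html, href)
        # Step 3: download the image bytes and hand them to the saver
        save(src, fetch_bytes(src))
        saved.append(src)
    return saved
```

For real use, `fetch_text` could wrap `requests.get(...)` with `encoding` set to `"utf-8"`, and `fetch_bytes` could return `requests.get(...).content`. Keeping the steps apart like this makes each one testable on its own.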
Part 1: view the page source and extract the subpage links (href)
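In the page source, locate where the subpage links are: they are the <a> tags inside <ul class="pic-list after">.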
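To make this lookup concrete without installing bs4, here is a dependency-free sketch of roughly what `main_page.find("ul", attrs={"class": "pic-list after"}).find_all("a")` does, written with the standard library's `html.parser` instead of BeautifulSoup. It assumes the list markup matches the selector used in this article. Two differences to be aware of: the class check here is token-based (both `pic-list` and `after` present), which is looser than BeautifulSoup's exact-string match on `"pic-list after"`, and it collects links from every matching list rather than only the first one.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class ListLinkParser(HTMLParser):
    """Collect href values of <a> tags nested inside <ul> elements whose
    class list contains both "pic-list" and "after"."""

    def __init__(self) -> None:
        super().__init__()
        self.ul_stack: List[bool] = []  # one entry per open <ul>: is it a target list?
        self.hrefs: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        attr_map = dict(attrs)
        if tag == "ul":
            classes = set((attr_map.get("class") or "").split())
            self.ul_stack.append({"pic-list", "after"} <= classes)
        elif tag == "a" and any(self.ul_stack) and attr_map.get("href"):
            # Only links somewhere inside a target <ul> are kept
            self.hrefs.append(attr_map["href"])

    def handle_endtag(self, tag: str) -> None:
        if tag == "ul" and self.ul_stack:
            self.ul_stack.pop()


def extract_links(html: str) -> List[str]:
    """Return the (possibly relative) hrefs found in the target lists."""
    parser = ListLinkParser()
    parser.feed(html)
    parser.close()
    return parser.hrefs
```

Links outside the target list (navigation bars, footers) are ignored, which is exactly why the original code narrows the search to the `<ul>` first instead of calling `find_all("a")` on the whole page.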
The main code for Part 1:

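import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

url = "https://www.umei.cc/bizhitupian/weimeibizhi/"
resp = requests.get(url, timeout=10)
resp.encoding = "utf-8"  # fix garbled Chinese text (mojibake)

# Hand the page source to BeautifulSoup
main_page = BeautifulSoup(resp.text, "html.parser")
# The subpage links are the <a> tags inside <ul class="pic-list after">
alist = main_page.find("ul", attrs={"class": "pic-list after"}).find_all("a")
# print(alist)
for a in alist:
    # .get() returns the attribute value directly; urljoin turns a relative href into an absolute URL
    href = urljoin(url, a.get("href"))
    print(href)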
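Why `resp.encoding = "utf-8"` fixes the garbled text: requests takes the encoding from the `Content-Type` header, and if the server sends a `text/*` type without a `charset`, it falls back to ISO-8859-1. Decoding UTF-8 bytes that way never raises an error; it silently produces mojibake. The sketch below reproduces the effect with plain `bytes.decode` and no network access (the sample string is illustrative). Setting `resp.encoding` before reading `resp.text` is the requests-level equivalent of choosing the right codec here.

```python
# Illustrative text; a UTF-8 page carries it over the wire as raw bytes.
title = "优美图库"
raw = title.encode("utf-8")          # 4 CJK characters -> 12 bytes

# Roughly what resp.text gives when requests falls back to ISO-8859-1:
# every byte becomes one Latin-1 character, so decoding never fails.
garbled = raw.decode("iso-8859-1")

# What resp.text gives after resp.encoding = "utf-8":
fixed = raw.decode("utf-8")

print(len(raw), len(garbled))  # 12 12
print(fixed == title)          # True

# Mojibake produced this way is reversible: undo the wrong decode, redo it correctly.
assert garbled.encode("iso-8859-1").decode("utf-8") == title
```

When a site's encoding is not known in advance, `resp.apparent_encoding` (requests' guess based on the body bytes) is an alternative to hard-coding `"utf-8"`.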

Part 2: use each href to fetch the subpage, and find the image's download address in it (img -> src)

The main code for Part 2:

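# Continues from Part 1: reuses url and alist
from urllib.parse import urljoin

for a in alist:
    href = urljoin(url, a.get("href"))  # .get() returns the attribute value directly
    # print(href)
    child_resp = requests.get(href, timeout=10)
    child_resp.encoding = "utf-8"
    child_page = BeautifulSoup(child_resp.text, "html.parser")
    # The wallpaper is the <img> inside <section class="img-content">
    img = child_page.find("section", class_="img-content").find("img")
    # print(img.get("src"))
    src = img.get("src")  # the image's download address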
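The chained `find("section", class_="img-content").find("img")` raises `AttributeError` as soon as one subpage lacks that section (for example after a site redesign), and `src` may be relative. Below is a standard-library sketch of the same lookup that returns `None` instead and resolves the address against the subpage URL with `urllib.parse.urljoin`. The class name comes from the code above; the helper names are hypothetical. Like `find()`, it only looks inside the first matching section.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin


class FirstImgInSection(HTMLParser):
    """Record the src of the first <img> (with a src) inside the first
    <section> whose classes include "img-content"."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0        # > 0 while inside the target <section>
        self.done = False     # True once the first target section has closed
        self.src: Optional[str] = None

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        attr_map = dict(attrs)
        if tag == "section":
            if self.depth:
                self.depth += 1  # a section nested inside the target
            elif not self.done and "img-content" in (attr_map.get("class") or "").split():
                self.depth = 1
        elif tag == "img" and self.depth and self.src is None:
            self.src = attr_map.get("src")

    def handle_endtag(self, tag: str) -> None:
        if tag == "section" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.done = True  # only the first matching section counts, like find()


def extract_img_src(html: str, page_url: str) -> Optional[str]:
    """Absolute image URL, or None if the page has no such image."""
    parser = FirstImgInSection()
    parser.feed(html)
    parser.close()
    if not parser.src:
        return None  # layout changed or no image: the caller should skip this page
    return urljoin(page_url, parser.src)  # absolute URLs pass through unchanged
```

In the crawler loop, a `None` result would be a signal to `continue` to the next link rather than crash the whole run.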

Part 3 simply uses each image link to download the image and save it to disk.
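The full code below names each file after the last `/`-separated part of the image URL and writes it into an `img/` folder. A slightly more defensive sketch of this step: `filename_from_url` drops any query string or fragment (which `src.split("/")[-1]` would keep in the file name), and `save_image` creates the folder before writing. Both function names and the `image.jpg` fallback are hypothetical, not from the original code.

```python
import os
from urllib.parse import urlsplit


def filename_from_url(src: str, default: str = "image.jpg") -> str:
    """Last path segment of the URL, ignoring any ?query or #fragment."""
    path = urlsplit(src).path        # query and fragment are not part of .path
    name = path.rsplit("/", 1)[-1]   # last path segment
    return name or default           # a URL ending in "/" has no file name


def save_image(data: bytes, src: str, folder: str = "img") -> str:
    """Write image bytes into folder (created if missing) and return the file path."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, filename_from_url(src))
    with open(path, "wb") as f:      # binary mode: image data is bytes, not text
        f.write(data)
    return path
```

With requests, the call would look like `save_image(img_resp.content, src)`, since `Response.content` holds the raw bytes of the response body.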
The complete code:

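# 1. Get the main page source and extract the subpage links (href)
# 2. Use each href to fetch the subpage and find the image download address (img -> src)
# 3. Download the image


import os
import time
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

url = "https://www.umei.cc/bizhitupian/weimeibizhi/"
resp = requests.get(url, timeout=10)
resp.encoding = "utf-8"  # fix garbled Chinese text (mojibake)

# Hand the page source to BeautifulSoup
main_page = BeautifulSoup(resp.text, "html.parser")
alist = main_page.find("ul", attrs={"class": "pic-list after"}).find_all("a")
# print(alist)

os.makedirs("img", exist_ok=True)  # open() below fails if the img/ folder is missing

for a in alist:
    href = urljoin(url, a.get("href"))  # .get() returns the attribute value directly
    # print(href)
    child_resp = requests.get(href, timeout=10)
    child_resp.encoding = "utf-8"
    child_page = BeautifulSoup(child_resp.text, "html.parser")
    section = child_page.find("section", class_="img-content")
    img = section.find("img") if section else None
    if img is None or not img.get("src"):
        print("no image found, skipping:", href)
        continue
    src = urljoin(href, img.get("src"))  # the image's download address
    # Download the image
    img_resp = requests.get(src, timeout=10)
    if not img_resp.ok:  # skip HTTP errors instead of saving an error page as an image
        print("download failed:", img_resp.status_code, src)
        continue
    img_name = src.split("/")[-1]  # name the file after the last part of the URL
    with open(os.path.join("img", img_name), mode="wb") as f:
        f.write(img_resp.content)  # .content is the raw bytes of the image

    print("over!", img_name)
    time.sleep(1)  # pause between requests so as not to hammer the server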
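The loop above waits a fixed `time.sleep(1)` between images and stops the whole run on the first network error. A small retry helper with exponential backoff is one common way to make it more robust. This is a generic sketch; the name `with_retries` and its parameters are hypothetical and not part of requests.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call()`, retrying on the given exceptions with exponential backoff.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts and
    re-raises the last error once all attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the real error
            sleep(base_delay * (2 ** attempt))
    raise ValueError("attempts must be >= 1")
```

In the crawler it could wrap a download, e.g. `with_retries(lambda: requests.get(src, timeout=10), retry_on=(requests.RequestException,))`. Note that `requests.get` does not raise on HTTP error statuses, so call `raise_for_status()` inside the wrapped function if 404/500 responses should be retried as well. The `sleep` parameter exists so the backoff can be tested without actually waiting.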

Results:
[Screenshot of the results]