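If you installed the Anaconda distribution, you can skip straight to the source code.
Otherwise, a few packages need to be installed before scraping images.
First: the bs4 package. We need its BeautifulSoup class, a powerful tool for parsing web pages.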
pip3 install bs4
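Second: the requests package, installed the same way (pip3 install requests). It is used to fetch the page's source code.
The full script is as follows: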
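To make the role of BeautifulSoup concrete before the full script, here is a minimal sketch that parses a small HTML string and collects the src attribute of every img tag. The HTML snippet, the example.com URLs, and the helper name extract_img_srcs are made up for illustration and are not part of the original script.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet, used only for illustration.
SAMPLE_HTML = """
<html><body>
  <img src="https://example.com/a.jpg">
  <img src="https://example.com/b.png">
  <img alt="no source here">
  <a href="https://example.com/">home</a>
</body></html>
"""


def extract_img_srcs(html):
    """Return the src value of every <img> tag that has one."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all("img") returns every <img> tag; .get("src") is None when the
    # attribute is missing, so those tags are filtered out.
    return [img.get("src") for img in soup.find_all("img") if img.get("src")]


srcs = extract_img_srcs(SAMPLE_HTML)
print(srcs)
```

In the real script the HTML string would come from requests.get(url).text rather than a hard-coded sample.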
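import os

import requests
from bs4 import BeautifulSoup
from urllib.request import urlopen

# Directory where the downloaded images are saved
images_dir = "images/"
if not os.path.exists(images_dir):
    os.mkdir(images_dir)

# Fetch the page and parse it
url = "http://www.baidu.com/"
html = requests.get(url)
html.encoding = 'utf-8'
sp = BeautifulSoup(html.text, 'html.parser')

# Collect <img> and <a> tags; tags without a src attribute are skipped below
links = sp.find_all(["img", 'a'])
index = 0
for link in links:
    src = link.get('src')
    # Only URLs that contain "https" are downloaded
    if src is not None and 'https' in src:
        if 'jpg' in src:
            img_name = str(index + 1) + ".jpg"
        elif 'png' in src:
            img_name = str(index + 1) + ".png"
        else:
            # Neither jpg nor png: skip, otherwise img_name would be undefined or stale
            continue
        image = urlopen(src)
        # The with block closes the file even if the write fails
        with open(os.path.join(images_dir, img_name), "wb") as f:
            f.write(image.read())
        print('%d finish\n' % (index + 1))
        index = index + 1
print("OK")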
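The script above only downloads src values that contain "https" and guesses the file type by substring, so relative or protocol-relative image paths are silently skipped. As one possible refinement (a sketch, not part of the original script), the standard library can resolve such paths against the page URL and read the extension from the URL path. The helper names resolve_image_url and image_file_name, the ALLOWED_EXTENSIONS set, and the example.com URLs are hypothetical.

```python
import os
from urllib.parse import urljoin, urlparse

# Assumption: only these extensions count as downloadable images.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def resolve_image_url(page_url, src):
    """Turn a relative or protocol-relative src into an absolute URL."""
    return urljoin(page_url, src)


def image_file_name(image_url, number):
    """Build a local file name such as '3.jpg', or None for other file types."""
    # Use only the path part so query strings like ?size=large are ignored.
    path = urlparse(image_url).path
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        return None
    return f"{number}{ext}"


full_url = resolve_image_url("http://example.com/news/", "//example.com/img/logo.png")
print(full_url)
print(image_file_name(full_url, 1))
```

Inside the loop, src would first go through resolve_image_url(url, src), and the download would be skipped whenever image_file_name returns None.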