Data source: urls.txt
The file has one entry per line: a file name and a file URL, separated by a single space.
30035.png https://test.cn/uploads/ae8b2718b067a03c2b6e.png
30037.png https://test.cn/uploads/941c916a4a2489b946db.png
30050.png https://test.cn/uploads/56a01998b98d44cc68b5.png
30052.png https://test.cn/uploads/285145b917451557dc44.png
30053.png https://test.cn/uploads/0781117da06d1381428d.png
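To make the format concrete, here is a minimal sketch of reading such lines into (file name, URL) pairs. The helper name parse_url_list is hypothetical and not part of the original scripts; it assumes neither field contains whitespace.

```python
def parse_url_list(lines):
    """Yield (file_name, file_url) pairs from lines of the form 'name url'."""
    for line in lines:
        # split() with no argument splits on any whitespace
        # and drops the trailing newline
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        yield parts[0], parts[1]


sample = [
    "30035.png https://test.cn/uploads/ae8b2718b067a03c2b6e.png\n",
    "\n",
    "30037.png https://test.cn/uploads/941c916a4a2489b946db.png\n",
]
for file_name, file_url in parse_url_list(sample):
    print(file_name, file_url)
```

Splitting without an argument matters here: splitting on a single space would leave the newline attached to the URL.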
Shell script: down.sh
#!/bin/bash
# Make sure the output directory exists before downloading
mkdir -p ./img
while read -r file_name file_url
do
    if [ -f "./img/${file_name}" ]
    then
        echo "File ${file_name} already exists"
    else
        wget -O "./img/${file_name}" -c "${file_url}"
    fi
done < urls.txt
Make the script executable (chmod +x down.sh), then run ./down.sh from the command line.
Python script: down.py
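# -*- coding: utf-8 -*-
from urllib import request
import ssl
import os
import time

# Skip HTTPS certificate verification (only do this for hosts you trust)
ssl._create_default_https_context = ssl._create_unverified_context
start_time = time.time()


def download():
    try:
        # Make sure the output directory exists
        os.makedirs('./img', exist_ok=True)
        with open("urls.txt", 'r') as file:
            for line in file:
                # split() also strips the trailing newline from the URL
                arr = line.split()
                if len(arr) < 2:
                    continue  # skip blank or malformed lines
                file_name, file_url = arr[0], arr[1]
                if os.path.exists('./img/' + file_name):
                    print('File ' + file_name + ' already exists')
                else:
                    request.urlretrieve(file_url, './img/' + file_name)
    except Exception as e:
        print(e)
        print('An error occurred, retrying')
        # Start over from the top; files already saved are skipped
        download()
    finally:
        end_time = time.time()
        duration = end_time - start_time
        print('Done, total time:', duration, 'seconds')


download()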
Run python down.py from the command line.
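The retry in down.py restarts the whole list and has no upper limit, so a URL that always fails keeps it retrying. One possible variation, sketched below, retries each file a bounded number of times and writes to a temporary name first, so an interrupted download is never mistaken for a finished file. The function name download_with_retry, the ".part" suffix, and the fetch, max_attempts, and delay parameters are all assumptions made for this sketch, not part of the original script.

```python
import os
import time
from urllib import request


def download_with_retry(file_url, dest, fetch=request.urlretrieve,
                        max_attempts=3, delay=1.0):
    """Download file_url to dest, retrying a limited number of times.

    Returns True on success and False once every attempt has failed.
    """
    tmp = dest + ".part"  # hypothetical suffix for the in-progress file
    for attempt in range(1, max_attempts + 1):
        try:
            fetch(file_url, tmp)
            # Rename only after the download has completed
            os.replace(tmp, dest)
            return True
        except Exception as e:
            print("attempt", attempt, "failed for", file_url, ":", e)
            # Remove any partial file left behind by the failed attempt
            if os.path.exists(tmp):
                os.remove(tmp)
            if attempt < max_attempts:
                time.sleep(delay)
    return False
```

Inside the loop of down.py, the urlretrieve call could then be swapped for download_with_retry(file_url, './img/' + file_name), with failed entries logged instead of restarting the whole run.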
Alternatively, you can use the third-party requests module, which must be installed separately.
Related: Python request vs. requests
pip3 install requests
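If there are many files, you can split the list into several smaller text files and run multiple Python processes to download them in parallel.
Related: Splitting a large text file into smaller files by line count in Python
Related: Passing arguments from shell to Python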
Create bd.py
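# -*- coding: utf-8 -*-
from urllib import request
import ssl
import os
import time
import argparse

# Create the argument parser
parser = argparse.ArgumentParser()
# Add the argument that names the URL list to read
parser.add_argument('--sourceFile', required=True)
# Add an argument and specify its type
#parser.add_argument('--count',type=int)
# Parse the arguments (values are supplied on the command line)
args = parser.parse_args()
# Skip HTTPS certificate verification (only do this for hosts you trust)
ssl._create_default_https_context = ssl._create_unverified_context
start_time = time.time()


def download(sourceFile):
    try:
        # Make sure the output directory exists
        os.makedirs('./img', exist_ok=True)
        with open(sourceFile, 'r') as file:
            for line in file:
                # split() also strips the trailing newline from the URL
                arr = line.split()
                if len(arr) < 2:
                    continue  # skip blank or malformed lines
                file_name, file_url = arr[0], arr[1]
                if os.path.exists('./img/' + file_name):
                    print('File ' + file_name + ' already exists')
                else:
                    request.urlretrieve(file_url, './img/' + file_name)
    except Exception as e:
        print(e)
        print('An error occurred, retrying')
        # Retry with the same source file; files already saved are skipped
        download(sourceFile)
    finally:
        end_time = time.time()
        duration = end_time - start_time
        print('Done, total time:', duration, 'seconds')


download(args.sourceFile)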
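To see how the --sourceFile value reaches the script, here is a small sketch that feeds argparse an explicit argument list in place of the real command line. The build_parser helper is hypothetical, and required=True is a choice made for this sketch so that a missing flag stops with a usage message.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    # required=True: argparse exits with a usage message if the flag is missing
    parser.add_argument('--sourceFile', required=True)
    return parser


# The explicit list stands in for what the shell would pass via sys.argv,
# i.e. the equivalent of: python bd.py --sourceFile urls_part1.txt
args = build_parser().parse_args(['--sourceFile', 'urls_part1.txt'])
print(args.sourceFile)  # prints urls_part1.txt
```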
Split urls.txt into 10 smaller text files, then run 10 Python processes in parallel to speed up the download.
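One way to do the split is sketched below. It distributes the lines round-robin rather than by a fixed line count, so the parts end up roughly equal in size. The function names split_lines and split_file are hypothetical; the output names follow the urls_part1.txt to urls_part10.txt pattern used by the launcher script.

```python
def split_lines(lines, parts):
    """Distribute lines across `parts` lists in round-robin order."""
    chunks = [[] for _ in range(parts)]
    for i, line in enumerate(lines):
        chunks[i % parts].append(line)
    return chunks


def split_file(source="urls.txt", parts=10, pattern="urls_part{}.txt"):
    """Write the non-blank lines of source into numbered part files."""
    with open(source, "r") as f:
        lines = [line for line in f if line.strip()]
    # Part files are numbered from 1: urls_part1.txt, urls_part2.txt, ...
    for n, chunk in enumerate(split_lines(lines, parts), start=1):
        with open(pattern.format(n), "w") as out:
            out.writelines(chunk)
```

Calling split_file() in the directory that holds urls.txt would produce the ten part files; a part file is created even when it ends up empty.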
Create bitch_down.sh with the following content:
#!/bin/sh
python bd.py --sourceFile "urls_part1.txt" &
python bd.py --sourceFile "urls_part2.txt" &
python bd.py --sourceFile "urls_part3.txt" &
python bd.py --sourceFile "urls_part4.txt" &
python bd.py --sourceFile "urls_part5.txt" &
python bd.py --sourceFile "urls_part6.txt" &
python bd.py --sourceFile "urls_part7.txt" &
python bd.py --sourceFile "urls_part8.txt" &
python bd.py --sourceFile "urls_part9.txt" &
python bd.py --sourceFile "urls_part10.txt" &
# Block until every background download process has finished
wait
Run it:
./bitch_down.sh
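Since downloading is mostly waiting on the network, a thread pool inside a single Python process is a possible alternative to launching ten separate processes. This is a different technique from the shell launcher above, shown only as a sketch: the names download_one and download_all, the status strings, and the fetch, workers, and out_dir parameters are all assumptions.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from urllib import request


def download_one(item, fetch=request.urlretrieve, out_dir="./img"):
    """Download one (file_name, file_url) pair and return a status string."""
    file_name, file_url = item
    dest = os.path.join(out_dir, file_name)
    if os.path.exists(dest):
        return file_name + " skipped"
    try:
        fetch(file_url, dest)
        return file_name + " ok"
    except Exception as e:
        return file_name + " failed: " + str(e)


def download_all(items, workers=10, fetch=request.urlretrieve, out_dir="./img"):
    """Download all items with a pool of worker threads."""
    os.makedirs(out_dir, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input items
        return list(pool.map(lambda item: download_one(item, fetch, out_dir), items))
```

With this approach there is no need to split urls.txt at all: the (file name, URL) pairs from the full list can be passed straight to download_all, and the returned status list shows which entries failed.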