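I've recently been learning Python web scraping and picked up the basics of multithreading and coroutines. Following examples from more experienced people online, I wrote a few small crawlers for fun and more or less figured out how to scrape text, images, and simple videos. Then I got curious: when downloading videos, which is faster, multithreading or coroutines? So I ran a simple test: scrape one page of videos from Qiushibaike (糗事百科), about 25 videos, using a single thread, multiple threads, and coroutines in turn, to see how they compare.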
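Before the real crawler code, here is a rough sketch of how such a three-way comparison can be timed. It is not the code used in this test: `fake_download`, `fake_download_async`, `timed`, and the `run_*` helpers are hypothetical names, and a short sleep stands in for the network request so the sketch runs without touching any website.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(name, delay=0.05):
    """Hypothetical stand-in for one video download: sleeps instead of using the network."""
    time.sleep(delay)
    return name


async def fake_download_async(name, delay=0.05):
    """Coroutine version of the stand-in: yields to the event loop while 'waiting'."""
    await asyncio.sleep(delay)
    return name


def timed(label, func, *args):
    """Run func(*args), print the elapsed wall-clock time, return (result, seconds)."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.2f}s")
    return result, elapsed


def run_sequential(names):
    # One download after another, as in the single-threaded version
    return [fake_download(n) for n in names]


def run_threaded(names, workers=5):
    # A small thread pool; pool.map keeps results in input order
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fake_download, names))


def run_coroutines(names):
    # Schedule all downloads on one event loop and wait for them together
    async def main():
        return await asyncio.gather(*(fake_download_async(n) for n in names))
    return asyncio.run(main())


names = [f"video_{i}.mp4" for i in range(10)]
seq_result, seq_time = timed("single thread", run_sequential, names)
thr_result, thr_time = timed("thread pool", run_threaded, names)
co_result, co_time = timed("coroutines", run_coroutines, names)
```

With real downloads the numbers depend on the network and the server, so a sketch like this only shows the shape of the measurement, not what the actual result will be.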
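The code is posted below. The core logic is much the same in all three versions. I'm a beginner, so the code is a bit rough; please go easy on me.
Single-threaded: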
import requests
from lxml import etree
import time


def getVideo(url):
    # Send a browser-like User-Agent with every request
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36"
    }
    html = requests.get(url=url, headers=headers).text
    tree = etree.HTML(html)
    # Locate the div(s) that contain the video elements
    div_list = tree.xpath('//*[@id="content"]/div/div[2]')
    for div in div_list:
        video_src_list = div.xpath('./div/video/source/@src')
        # print(video_src)
        for video_src in video_src_list:
            # Use the last path segment of the source URL as the file name
            name = video_src.rsplit("/", 1)[1]
            # print(name)
            with open(f"video/{name}", mode='wb') as f:
                f.write(requests.get("http:"+video_src,headers