Web Scraping, Hands-On Session 10: Thread Pools (Pear Video Downloader)


The last few lines of the code had a bug I couldn't resolve, right at the final step or two. The cause is a key mismatch: each dict appended to url_list is built with the string keys 'name' and 'video_player', but Get_page_data indexes it with the bare variable names video_name and video_url, so the lookup fails at runtime (and since re.findall returns a list, the URL also needs a [0]). A minimal reproduction is sketched below, and the corrected script follows it.
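A minimal sketch of the mismatch, using made-up demo values for illustration:

# The dict is built with string keys...
dic = {'name': 'demo.mp4', 'video_player': 'https://example.com/demo.mp4'}

# ...but the worker indexes it with a leftover loop variable instead:
video_name = 'demo.mp4'
# dic[video_name]   -> KeyError: 'demo.mp4' (that's a value, not a key)
# dic['name']       -> correct: index with the literal string key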

import requests
import re
from lxml import etree
from multiprocessing.dummy import Pool  # a thread pool, despite the module name

# 'User-Agen' was missing its trailing 't', so the header was never sent correctly.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 UBrowser/6.2.4098.3 Safari/537.36'}
url = 'https://www.pearvideo.com/category_5'
page_text = requests.get(url=url, headers=headers).text

# Parse the category page and grab the <li> node for each video.
tree = etree.HTML(page_text)
page_list = tree.xpath('//*[@id="listvideoListUl"]/li')

url_list = []
for li in page_list:
    # Detail-page URL and video title for this entry.
    lis = 'https://www.pearvideo.com/' + li.xpath('./div/a/@href')[0]
    video_name = li.xpath('./div/a/div[2]/text()')[0] + '.mp4'
    # Pull the real .mp4 address out of the detail page's inline JavaScript.
    lis_page_text = requests.get(url=lis, headers=headers).text
    ex = 'srcUrl="(.*?)",vdoUrl'
    # re.findall returns a list; take the first match (an IndexError here means
    # the page layout has changed and the regex found nothing).
    video_url = re.findall(ex, lis_page_text, re.S)[0]
    dic = {
        'name': video_name,
        'video_player': video_url
    }
    url_list.append(dic)

def Get_page_data(dic):
    # Index with the same string keys the dict was built with,
    # not the bare variable names video_url / video_name.
    urls = dic['video_player']
    page_content = requests.get(url=urls, headers=headers).content
    with open(dic['name'], 'wb') as fp:
        fp.write(page_content)

pool = Pool(4)
pool.map(Get_page_data, url_list)  # blocks until every download returns
pool.close()
pool.join()
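As an aside, multiprocessing.dummy.Pool is a pool of threads despite the module name, which suits this I/O-bound download job. The same fan-out can also be written with the standard library's concurrent.futures; a minimal sketch, assuming the Get_page_data function and url_list built above:

from concurrent.futures import ThreadPoolExecutor

# Four worker threads, mirroring Pool(4); the with-block waits for all
# submitted downloads to finish before exiting.
with ThreadPoolExecutor(max_workers=4) as executor:
    executor.map(Get_page_data, url_list)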

Reposted from: https://www.cnblogs.com/sucanji/p/10880129.html
