
Web Scraping
shiyu_mj
tvzhishi download
import requests
from lxml import etree
import re
import threading

def spider(link,sub):
    html1=requests.get(link,headers=headers)
    k=r'<div class="content" id="content">(?P<contend>.*?)</div>'
    contend=re.findall(k,html1.tex…
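The preview cuts off inside spider(). A minimal completion sketch follows, assuming the site really does wrap chapter text in the <div class="content" id="content"> block the regex targets; the User-Agent, the output filename, and the meaning of sub (taken here as the chapter name) are assumptions, not values from the original post.

import re
import requests

headers = {'User-Agent': 'Mozilla/5.0'}   # placeholder UA

def spider(link, sub):
    # fetch one chapter and cut the body out with the same regex as above
    html1 = requests.get(link, headers=headers)
    html1.encoding = html1.apparent_encoding     # avoid mojibake on GBK pages
    k = r'<div class="content" id="content">(?P<contend>.*?)</div>'
    contend = re.findall(k, html1.text, re.S)    # re.S lets '.' cross newlines
    if not contend:
        return
    # turn <br> into newlines, then drop the remaining tags and entities
    text = re.sub(r'<br\s*/?>', '\n', contend[0])
    text = re.sub(r'<[^>]*>|&nbsp;', '', text)
    with open(sub + '.txt', 'w', encoding='utf-8') as f:
        f.write(text)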
Scraping Douban Movie Top 250
1. Plain requests failed — I tried several different headers and kept getting HTTP 418, so I used selenium instead.

from lxml import etree
from selenium import webdriver
import time
import threading

def dis(itable):
    for it in itable:
        print(it)

url='https://movie.douban.com/top250'

def cal(url):
    # fetch the page with selenium …
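A runnable sketch of that selenium route, assuming a local Chrome install (Selenium 4 fetches a matching chromedriver by itself) and Douban's current top250 markup, where each film's Chinese title is the first <span> under div.hd:

from lxml import etree
from selenium import webdriver

def dis(itable):
    for it in itable:
        print(it)

def cal(url):
    # render the page in a real browser so the 418 anti-bot check never fires
    driver = webdriver.Chrome()
    try:
        driver.get(url)
        html_ = etree.HTML(driver.page_source)
        # assumed markup: title is the first <span> inside div.hd > a
        return html_.xpath('//div[@class="hd"]/a/span[1]/text()')
    finally:
        driver.quit()

if __name__ == '__main__':
    for start in range(0, 250, 25):    # ten pages, 25 films per page
        dis(cal(f'https://movie.douban.com/top250?start={start}'))

Opening a fresh browser per page keeps the sketch short; reusing one driver across all ten pages would be noticeably faster.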
81zw download
# -*- coding: utf-8 -*-
"""
Created on Thu Jan 6 12:38:42 2022

@author: shiyu
"""
import time
import requests
from lxml import etree
import re
import threading
import csv
import pandas as pd

def spider(con,dic):
    url=con[0]
    print(con[1]) …
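The preview ends just after spider() opens. From the signature, con looks like a (chapter_url, chapter_title) pair and dic a shared dict that lets threads store chapters out of order; a sketch under those assumptions — the XPath, headers, and example URL are guesses, not the post's code:

import requests
from lxml import etree
import threading

headers = {'User-Agent': 'Mozilla/5.0'}   # placeholder

def spider(con, dic):
    url = con[0]
    print(con[1])                          # log which chapter this thread is on
    html = requests.get(url, headers=headers)
    html.encoding = html.apparent_encoding
    html_ = etree.HTML(html.text)
    # assumed markup: chapter body sits in a div with id="content"
    dic[con[1]] = '\n'.join(html_.xpath('//div[@id="content"]/text()'))

# hypothetical usage: one thread per chapter, join, then write dic in chapter order
chapters = [('https://example.com/c1.html', 'Chapter 1')]
dic = {}
threads = [threading.Thread(target=spider, args=(c, dic)) for c in chapters]
for t in threads: t.start()
for t in threads: t.join()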
Python: downloading images
Download webp images and convert them to jpg.

import requests
import re
import threading
import time
from PIL import Image
from io import BytesIO

def dis(iter):
    for it in iter:
        print(it)

def get(pho_url,i):
    html=requests.get(pho_url)
    byte_stream = BytesIO(html.content …
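The preview stops inside get(). Completing the webp-to-jpg idea is short: recent Pillow wheels decode webp natively, and BytesIO lets Image.open read straight from the response body without a temp file. The output filename is illustrative.

import requests
from io import BytesIO
from PIL import Image

def get(pho_url, i):
    html = requests.get(pho_url)
    byte_stream = BytesIO(html.content)   # wrap the raw bytes in a file-like object
    img = Image.open(byte_stream)         # Pillow detects the webp format itself
    # JPEG cannot store an alpha channel, so convert to RGB before saving
    img.convert('RGB').save(f'{i}.jpg', 'JPEG')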
Python web scraper
1. Supports either a single chapter link or a table-of-contents link.

# -*- coding: utf-8 -*-
"""
Created on Thu Jan 6 12:38:42 2022

@author: shiyu
"""
import time
import requests
from lxml import etree
import re

def spider(url):
    html=requests.get(url,headers=headers)
    html_=etree.H…
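One way to support both kinds of link, sketched below: fetch the page, and if it contains a chapter list, recurse into every chapter; otherwise treat it as a single chapter. Both XPaths and the headers are assumptions about the target site's markup, not the post's actual selectors.

import requests
from urllib.parse import urljoin
from lxml import etree

headers = {'User-Agent': 'Mozilla/5.0'}   # placeholder

def spider(url):
    html = requests.get(url, headers=headers)
    html.encoding = html.apparent_encoding
    html_ = etree.HTML(html.text)
    # assumed markup: a TOC page lists chapters under div#list, a chapter page does not
    links = html_.xpath('//div[@id="list"]//a/@href')
    if links:
        for href in links:
            spider(urljoin(url, href))    # the same function handles each chapter link
    else:
        title = ''.join(html_.xpath('//h1/text()'))
        body = '\n'.join(html_.xpath('//div[@id="content"]/text()'))
        with open('book.txt', 'a', encoding='utf-8') as f:
            f.write(title + '\n' + body + '\n\n')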
Python: read the needed fields from an Excel question bank into JSON, then scrape Wenjuanxing and match up the answers
import pandas as pd
import re
import json

df=pd.read_excel('文化题库.xlsx',sheet_name='Sheet1')
k='[A-Z]'
dic={}
# clear out base.txt
with open('base.txt','w') as f:
    pass
# the first spreadsheet row was read in as the columns, so start from 1
for i in range(1,161):
    line=df.iloc[i]
    # line[8] contains NaN …
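The preview breaks off at the NaN check. A sketch of the Excel-to-JSON half, assuming (as the comment above suggests) that the answer letters live in column 8; taking the question text from column 1 is a guess.

import pandas as pd
import re
import json

df = pd.read_excel('文化题库.xlsx', sheet_name='Sheet1')
k = '[A-Z]'                    # answers are stored as capital letters
dic = {}

for i in range(1, len(df)):    # mirror the post's loop: data rows start at 1
    line = df.iloc[i]
    if pd.isna(line.iloc[8]):  # column 8 sometimes holds NaN, skip those rows
        continue
    # map question text (column 1, assumed) to its list of answer letters
    dic[str(line.iloc[1])] = re.findall(k, str(line.iloc[8]))

# one JSON file that the Wenjuanxing-matching step can load later
with open('base.txt', 'w', encoding='utf-8') as f:
    json.dump(dic, f, ensure_ascii=False)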
Python: downloading Bilibili videos
# -*- coding: utf-8 -*-
"""
Created on Sat Jan 8 18:40:55 2022

@author: shiyu
"""
import requests
import re
import json

url='https://www.bilibili.com/video/BV18D4y1c7BM?spm_id_from=333.999.0.0'
headers={'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64 …
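The preview ends inside the headers dict. The imports (requests, re, json) point at the common approach for Bilibili: pull the window.__playinfo__ JSON out of the page source and download the DASH stream URLs it lists. That this is what the post did is an assumption, the page format changes over time, and the Referer header is required or the CDN returns 403.

import requests
import re
import json

url = 'https://www.bilibili.com/video/BV18D4y1c7BM?spm_id_from=333.999.0.0'
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',  # rest of the UA assumed
    'referer': 'https://www.bilibili.com/',                     # CDN 403s without a Referer
}

html = requests.get(url, headers=headers).text
# the player config is embedded in the page as a JSON blob inside a <script> tag
playinfo = re.findall(r'window\.__playinfo__=(.*?)</script>', html)[0]
data = json.loads(playinfo)
# DASH serves video and audio as separate streams; take the first of each
video_url = data['data']['dash']['video'][0]['baseUrl']
audio_url = data['data']['dash']['audio'][0]['baseUrl']

for name, stream in (('video.m4s', video_url), ('audio.m4s', audio_url)):
    with open(name, 'wb') as f:
        f.write(requests.get(stream, headers=headers).content)

The two .m4s files then need muxing, e.g. ffmpeg -i video.m4s -i audio.m4s -c copy out.mp4.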
Python: scraping Maoyan movies
Scrapes the hundred top-ranked movies on Maoyan. I haven't figured out how to get past the verification check; every so often I have to re-enter the link by hand.

# -*- coding: utf-8 -*-
"""
Created on Wed Dec 29 21:07:41 2021

@author: shiyu
"""
import requests
import re

# scrape the movie titles
def get(url):
    try:
        html=requests.get(url,headers=headers)
    except:
        print( …
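A sketch of the full loop over the ten board pages, handling the verification problem the post mentions the only way it really can be without extra tooling: detect the check page and back off. The title regex and headers are assumptions about Maoyan's list markup, not the post's exact code.

import requests
import re
import time

headers = {'User-Agent': 'Mozilla/5.0'}   # placeholder

def get(url):
    try:
        html = requests.get(url, headers=headers)
    except requests.RequestException:
        print('request failed:', url)
        return []
    # Maoyan sometimes answers with a verification page instead of the board
    if '验证' in html.text:
        print('hit the verification page; open the URL in a browser first')
        return []
    # assumed markup: titles sit in <p class="name"><a ... title="...">
    return re.findall(r'<p class="name"><a[^>]*title="(.*?)"', html.text)

for offset in range(0, 100, 10):   # ten pages, ten films each
    for name in get(f'https://www.maoyan.com/board/4?offset={offset}'):
        print(name)
    time.sleep(1)                  # be gentle; hammering triggers the captcha faster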