Scraping video information from Bilibili
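Approach: Bilibili's pages are dynamic (much of the content is loaded by JavaScript), so the video information cannot simply be read from the page HTML. Instead, we first extract the exact video URLs, then fetch the information for each video, and store the results in a CSV file.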
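A minimal sketch of the CSV storage step, using Python's standard csv module, might look like this. The column names in FIELDNAMES, the helper write_rows, and the sample row are hypothetical and only for illustration.

```python
import csv
import io

# Hypothetical column names; adjust them to the fields you actually scrape.
FIELDNAMES = ["bvid", "title", "like", "coin", "author_id"]


def write_rows(rows, fileobj):
    """Write a list of dicts to fileobj as CSV, header row first."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # An in-memory buffer is used here; for a real file, something like
    # open("videos.csv", "w", newline="", encoding="utf-8-sig") works, and the
    # BOM written by utf-8-sig helps Excel display Chinese titles correctly.
    buf = io.StringIO()
    write_rows([{"bvid": "BV1xx", "title": "demo", "like": 10, "coin": 2, "author_id": 123}], buf)
    print(buf.getvalue())
```

DictWriter maps each dict to a row by key, so the order of fields in each scraped dict does not matter as long as the keys match FIELDNAMES.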
1. Extract the precise video URLs
import re

import requests


def professional_link(url, type):
    # Send a browser User-Agent so the site treats the request like a normal browser visit
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.25 Safari/537.36 Core/1.70.3754.400 QQBrowser/10.5.4020.400"
    }
    response = requests.get(url, headers=headers).content.decode('utf-8')
    # Capture the video id between "/video/" and "?from"; [^?"]+ stops at the "?",
    # so the id needs no further cleanup
    links = re.findall(r'<li class="video-item matrix".*?>.*?<a href="//www\.bilibili\.com/video/([^?"]+)\?from.*?" title.*?>.*?</a>.*?</li>', response, re.DOTALL)
    for link in links:
        # Full page URL, if needed: 'https://www.bilibili.com/video/' + link
        # `type` is passed through unchanged to professional()
        professional(link, headers, type)
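The extraction pattern can be checked offline by running it against a hand-written HTML fragment. The fragment below is invented and only mimics the markup the regular expression expects; Bilibili's real search page may differ and may change over time, so the pattern may need adjusting.

```python
import re

# Same pattern as above: the group [^?"]+ captures the id up to the "?" that
# starts the "?from=..." query string.
PATTERN = re.compile(
    r'<li class="video-item matrix".*?>.*?'
    r'<a href="//www\.bilibili\.com/video/([^?"]+)\?from.*?" title.*?>.*?</a>.*?</li>',
    re.DOTALL,
)

# Hypothetical HTML fragment, written by hand for illustration only.
SAMPLE = """
<li class="video-item matrix"><a href="//www.bilibili.com/video/BV1aa411a7aa?from=search" title="first">x</a></li>
<li class="video-item matrix"><a href="//www.bilibili.com/video/BV1bb411b7bb?from=search" title="second">y</a></li>
"""

ids = PATTERN.findall(SAMPLE)
urls = ["https://www.bilibili.com/video/" + bvid for bvid in ids]
print(ids)
print(urls)
```

Excluding "?" and the quote character inside the group, rather than using a lazy `.*?`, keeps the capture from swallowing the query string, which is why no `re.sub` cleanup is needed afterwards.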
2. Use each video's id to fetch its information, including the title, like count, coin count, author ID, and so on
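These fields come from the JSON returned by the view API. The sketch below pulls them out of a decoded response. The sample payload is hand-written, the helper name extract_fields is hypothetical, and both the field paths (data.title, data.stat.like, data.stat.coin, data.owner.mid) and the convention that code is 0 on success are assumptions to verify against a real response.

```python
import json

# Hand-written sample shaped like an assumed response from
# https://api.bilibili.com/x/web-interface/view (not real data).
SAMPLE_RESPONSE = """
{"code": 0, "data": {"bvid": "BV1aa411a7aa", "title": "demo video",
 "owner": {"mid": 123456, "name": "someone"},
 "stat": {"view": 1000, "like": 88, "coin": 12}}}
"""


def extract_fields(content):
    """Return the fields of interest from the decoded JSON dict, or None on error."""
    # Assumption: a non-zero "code" signals an API error with no usable "data".
    if content.get("code") != 0:
        return None
    data = content["data"]
    return {
        "bvid": data["bvid"],
        "title": data["title"],
        "like": data["stat"]["like"],
        "coin": data["stat"]["coin"],
        "author_id": data["owner"]["mid"],
    }


row = extract_fields(json.loads(SAMPLE_RESPONSE))
print(row)
```

Checking the status code first avoids a KeyError when a video has been removed or the request is rejected.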
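import json


def professional(link, headers, type):
    # Query the video-info API with the video id extracted in step 1
    url = "https://api.bilibili.com/x/web-interface/view?bvid=" + link
    # headers must be passed by keyword: the second positional argument of requests.get is params
    response = requests.get(url, headers=headers).content.decode('utf-8')
    content = json.loads(response)
    title = content['data'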