Scraping Bilibili Video Information with Python

Latest recommended article published on 2025-05-04 23:39:14

一只有理想的二哈

Tags: python

Article link: https://blog.youkuaiyun.com/weixin_45841852/article/details/105457748

Scraping Bilibili video information

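Steps: Because Bilibili pages are dynamic, first scrape the exact URL of each video, then scrape the video information from it, and finally store that information in a CSV file.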
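The last step mentioned above, storing the results in a CSV file, might look like the following minimal sketch. The function name `save_rows`, the column names, and the sample record are assumptions made for illustration; the original post does not show its CSV layout.

```python
import csv
import io

# Hypothetical column names (assumption; not taken from the original code).
FIELDNAMES = ["title", "like", "coin", "author_id"]


def save_rows(rows, fileobj):
    """Write a list of dicts to an already-open text file object as CSV."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


# Made-up sample record, written to an in-memory buffer instead of a real file.
buffer = io.StringIO()
save_rows([{"title": "demo", "like": 10, "coin": 2, "author_id": 123}], buffer)
csv_text = buffer.getvalue()
```

When writing to disk, the file would typically be opened with `open(path, "w", newline="", encoding="utf-8-sig")`; the `newline=""` argument is what the `csv` module expects, and the `utf-8-sig` encoding tends to help spreadsheet programs display Chinese titles correctly.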

1. Scrape the exact URLs

import json  # used by professional() in step 2
import re

import requests


# Extract the exact video links from a listing page
def professional_link(url, type):
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.25 Safari/537.36 Core/1.70.3754.400 QQBrowser/10.5.4020.400"
    }
    response = requests.get(url, headers=headers).content.decode('utf-8')
    # Use a regex to pull the video ID out of each link in the page HTML.
    # `\?from` matches the literal "?" that starts the query string, so the captured ID needs no further cleanup.
    links = re.findall(r'<li class="video-item matrix".*?>.*?<a href="//www\.bilibili\.com/video/(.*?)\?from.*?" title.*?>.*?</a>.*?</li>', response, re.DOTALL)
    for link in links:
        # `link` is the bare video ID; the full page URL would be 'https://www.bilibili.com/video/' + link
        professional(link, headers, type)
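To see what the regular expression extracts, the pattern can be tried offline on a small HTML fragment. The sketch below uses a variant of the pattern that matches the literal `?` before `from`. The sample HTML and the video ID in it are made up, and the real page markup may differ or may have changed since the post was written.

```python
import re

# Hypothetical HTML fragment imitating one list item (assumption, not real page output).
SAMPLE_HTML = '''
<li class="video-item matrix">
  <a href="//www.bilibili.com/video/BV1xx411c7mD?from=search" title="demo">demo</a>
</li>
'''

# Capture everything between "/video/" and the literal "?from" of the query string.
PATTERN = re.compile(
    r'<li class="video-item matrix".*?>.*?'
    r'<a href="//www\.bilibili\.com/video/(.*?)\?from.*?" title.*?>.*?</a>.*?</li>',
    re.DOTALL,  # let "." span the line breaks inside each <li>
)

video_ids = PATTERN.findall(SAMPLE_HTML)
video_urls = ["https://www.bilibili.com/video/" + vid for vid in video_ids]
```

Testing the pattern on a saved fragment like this makes it easier to notice when a site redesign stops the regex from matching, since `findall` then simply returns an empty list.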

2. Use the URL to extract the video information, including the title, like count, coin count, author ID, and so on
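As a rough illustration of this step, the sketch below reads the fields named above out of a decoded JSON response. The sample payload, the helper name `extract_info`, and the nested key names (`stat`, `owner`, `mid`) are assumptions for illustration only; the response returned by the real API should be inspected before relying on them.

```python
import json

# Made-up response body; the nested key names are assumptions, not confirmed by the post.
SAMPLE_RESPONSE = '''
{"code": 0,
 "data": {"title": "demo video",
          "owner": {"mid": 123, "name": "someone"},
          "stat": {"like": 10, "coin": 2}}}
'''


def extract_info(response_text):
    """Parse the JSON text and collect the fields of interest into a flat dict."""
    content = json.loads(response_text)
    data = content["data"]
    return {
        "title": data["title"],
        "like": data["stat"]["like"],
        "coin": data["stat"]["coin"],
        "author_id": data["owner"]["mid"],
    }


info = extract_info(SAMPLE_RESPONSE)
```

Returning a flat dict keeps each video as one record, which maps directly onto one row of the CSV file.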

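# Extract the video information, including the title, like count, and so on
def professional(link, headers, type):
    url = "https://api.bilibili.com/x/web-interface/view?bvid=" + link
    # Pass headers by keyword: the second positional argument of requests.get is params, not headers
    response = requests.get(url, headers=headers).content.decode('utf-8')
    content = json.loads(response)
    # Read the video information from the parsed JSON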
    title = content['data'