Scraping the Gequbao (歌曲宝) weekly chart with Python


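Recently I watched a lot of web-scraping videos on Bilibili and wanted a site to practice batch scraping on. While searching for free songs, I found that Gequbao lets you download songs for free, so I tried batch-scraping it. Below is a record of how the batch scraper was implemented.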

Getting the song list

Analyzing the web requests

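The URL of the weekly download chart takes a page parameter, so several pages can be fetched. Each song's relative URL is stored in the page's HTML, so it is extracted with a regular expression.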
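To see what that regular expression returns, here is a minimal offline sketch run against a made-up HTML fragment (the markup and the /music/... paths are assumptions for illustration, not copied from the real chart page). With two capture groups, re.findall yields (href, title) tuples, and urllib.parse.urljoin turns each relative path into a full URL. The helper name extract_links is hypothetical.

```python
import re
from urllib.parse import urljoin

# Same pattern as in the article: capture the href and the link text
PATTERN = r'<a\s+href="([^"]+)">\s*(.*?)\s*</a>'
BASE_URL = 'https://www.gequbao.com'

# Hypothetical HTML fragment standing in for one chart page
SAMPLE_HTML = '''
<ul>
  <li><a href="/music/1001">
      Song A
  </a></li>
  <li><a href="/music/1002">Song B</a></li>
</ul>
'''


def extract_links(html):
    """Return a list of (absolute_url, title) pairs found in the HTML."""
    return [(urljoin(BASE_URL, href), title) for href, title in re.findall(PATTERN, html)]


for url, title in extract_links(SAMPLE_HTML):
    print(url, title)
```

The surrounding \s* groups strip the whitespace around the title, so a title split across lines still comes out clean. Note that this pattern only matches <a> tags whose first attribute is href, and it matches every such link on the page, not only song entries.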
## Coding the requests

import re
import time

import requests

totalPage = 13  # number of chart pages to fetch
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36',
}
# Each song entry is a link of the form <a href="relative-url">title</a>
pattern = r'<a\s+href="([^"]+)">\s*(.*?)\s*</a>'
for i in range(totalPage):
    response = requests.get('https://www.gequbao.com/top/week-download', headers=headers, params={
        'page': f'{i}',
    })
    if response.status_code != 200:
        print(f'getSongs error: {response.text}')
        continue
    matches = re.findall(pattern, response.text)
    for match in matches:
        url, text = match
        print(f"URL: {url}")
        print(f"Text: {text}")
    time.sleep(1.03)  # pause between pages to avoid hammering the site
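If the chart markup ever puts another attribute before href (for example a class), the regex above silently misses that link. As an alternative technique, not the one the article uses, here is a sketch based on the standard library's html.parser.HTMLParser, which reads attributes in any order and ignores tags nested inside the link. The class and function names are hypothetical, and the sample markup is made up.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> tag, whatever its other attributes."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text chunks seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self._href = dict(attrs).get('href')
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            self.links.append((self._href, ''.join(self._text).strip()))
            self._href = None


def parse_links(html):
    """Return all (href, text) pairs found in the given HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical markup: the class attribute and nested <span> would defeat the regex
sample = '<a class="song" href="/music/1001"><span>Song A</span></a><a href="/music/1002">Song B</a>'
print(parse_links(sample))
```

Because HTMLParser decodes character references by default, titles such as "Tom &amp; Jerry" also come back as "Tom & Jerry" instead of the raw entity.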

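Getting each song's download URL

Analyzing the web requests

Joining a song's relative URL onto the site's domain gives the song's page, which contains the song's play_id. Sending that play_id to the play-url API then returns a valid download URL.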
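The two steps can be sketched offline as plain functions: pull play_id out of the page with the same regex the full script uses, then build the JSON body for the POST to /api/play-url. The reply shape {'data': {'url': ...}} is inferred from how the full script reads jsonObj['data']['url']; the function names, the page fragment, and the reply values below are hypothetical, and no request is actually sent.

```python
import json
import re

# Same pattern the full script uses to pull play_id out of a song page
PLAY_ID_PATTERN = r"window\.play_id\s*=\s*'([^']+)';"
PLAY_URL_API = 'https://www.gequbao.com/api/play-url'


def extract_play_id(html):
    """Return the play_id embedded in a song page, or None if it is missing."""
    match = re.search(PLAY_ID_PATTERN, html)
    return match.group(1) if match else None


def build_play_url_request(play_id):
    """Return (url, body, headers) for the POST that exchanges play_id for a download URL."""
    body = json.dumps({'id': play_id})
    return PLAY_URL_API, body, {'Content-Type': 'application/json'}


def read_download_url(payload):
    """Pick data.url out of the API's JSON reply; None if the shape differs."""
    data = payload.get('data') or {}
    return data.get('url')


# Hypothetical page fragment and API reply, for illustration only
page = "<script>window.play_id = 'abc123';</script>"
play_id = extract_play_id(page)
url, body, headers = build_play_url_request(play_id)
print(play_id, url, body)
print(read_download_url({'data': {'url': 'https://example.com/a.mp3'}}))
```

Returning None instead of raising lets the caller skip a song whose page has no play_id rather than crashing the whole batch.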

Complete batch-download code

import json
import os.path
import time

import requests
import re

totalPage = 13
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36',
}
saveDir = 'I:/songs/'

# Get the songs on the weekly chart, skipping ones already downloaded
def getSongs():
    songs = {}
    pattern = r'<a\s+href="([^"]+)">\s*(.*?)\s*</a>'
    for i in range(totalPage):
        response = requests.get('https://www.gequbao.com/top/week-download', headers=headers, params={
            'page': f'{i}',
        })
        if response.status_code != 200:
            print(f'getSongs error: {response.text}')
            continue
        matches = re.findall(pattern, response.text)
        for match in matches:
            url, text = match
            if os.path.isfile(f'{saveDir}{text}.mp3'):
                continue
            print(f"URL: {url}")
            print(f"Text: {text}")
            songs[url] = text
        time.sleep(1.03)
    return songs

# Get each song's playable (download) URL
def getSongsUrls(songs):
    songUrls = {}
    pattern = r"window\.play_id\s*=\s*'([^']+)';"
    for key, value in songs.items():
        # The song page embeds its id as window.play_id = '...'
        response = requests.get(f'https://www.gequbao.com{key}')
        matches = re.search(pattern, response.text)
        if matches is None:
            print(f'getSongsUrls error: play_id not found, song: {value} key: {key}')
            continue
        # Exchange the play_id for a download URL
        response = requests.post('https://www.gequbao.com/api/play-url', data=json.dumps({'id': matches.group(1)}),
                                 headers={'Content-Type': 'application/json'})
        if response.status_code != 200:
            print(f'getSongsUrls error: {response.text}')
            continue
        try:
            jsonObj = response.json()
            playUrl = jsonObj['data']['url']
            print(f'song: {value} play_url: {playUrl}')
            songUrls[playUrl] = value
        except Exception as e:
            print(f'getSongsUrls error song: {value} key: {key} ({e})')
        time.sleep(1.03)  # pause between songs to avoid hammering the site
    return songUrls

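# Download the songs
def downloadSongs(songs):
    os.makedirs(saveDir, exist_ok=True)  # make sure the target folder exists
    for key, value in songs.items():
        # Request first, so a failed download does not leave an empty .mp3
        # that getSongs() would later treat as already downloaded
        response = requests.get(key)
        if response.status_code != 200:
            print(f'downloadSongs error: {response.text}')
            continue
        with open(f'{saveDir}{value}.mp3', "wb") as file:
            file.write(response.content)


if __name__ == '__main__':
    downloadSongs(getSongsUrls(getSongs()))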
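The script builds file paths directly from chart titles (f'{saveDir}{text}.mp3'), so a title containing a character such as "/", ":" or "?" would make open() fail on Windows, where saveDir points. A minimal sketch of a hypothetical safe_filename helper follows, assuming Windows naming rules; it does not handle reserved device names such as CON or NUL.

```python
import os
import re

# Characters Windows forbids in file names, plus ASCII control characters
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_filename(title, ext='.mp3', replacement='_'):
    """Turn a song title into a file name Windows will accept."""
    name = _INVALID_CHARS.sub(replacement, title)
    # Windows also rejects names ending in a dot or a space
    name = name.rstrip('. ')
    return (name or 'untitled') + ext


def target_path(save_dir, title):
    """Full path for a title; use it for both the 'already downloaded' check and the write."""
    return os.path.join(save_dir, safe_filename(title))


print(safe_filename('AC/DC: Back In Black?'))
print(safe_filename('...'))
```

For the skip logic to keep working, the same helper has to be used in both places: the os.path.isfile check in getSongs and the open() call in downloadSongs.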
