X Video Download: Scraping Approach (Trial)

Reverse-engineering approach:

1. Open the target link: https://x.com/kirawontmiss/status/1930784893128016280

2. Open the browser DevTools (F12) and, in the page HTML, find the URL of the linked main.xxx.js file.

3. Open main.xxx.js and find the entry for the TweetDetail operation. Note its queryid; it is needed in the next step.

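4. Request https://x.com/api/graphql/{queryid}/TweetDetail? followed by a set of query parameters. The query string looks long, but only a few parameters need to change; most can stay at fixed values. If you are writing a crawler, this is the only request that needs JS reverse engineering; the earlier ones do not.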

5. Search the whole response for values such as mp4, find the video links, and download the video from whichever link you prefer.
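Steps 2 to 4 can be sketched in Python roughly as follows. This is a minimal sketch and not a definitive implementation: the two regular expressions assume that the page HTML quotes the script URL and that the bundle stores each operation as queryId:"...",operationName:"...", and the "variables" / "focalTweetId" parameter names in the demo are assumptions as well. Check all of them against what DevTools actually shows. All function names are made up for this illustration, and the sample strings are invented, not real responses.

```python
import json
import re
from urllib.parse import urlencode, urlparse

# Assumption: the script URL appears quoted in the HTML and ends in main.<hash>.js
MAIN_JS_PATTERN = r'''["'](https?://[^"']*/main\.[^"'/]+\.js)["']'''

# Assumption: the bundle stores operations as queryId:"...",operationName:"..."
QUERY_ID_PATTERN = r'queryId:"([^"]+)",operationName:"{operation}"'


def find_main_js_url(html):
    """Return the first main.<hash>.js URL found in the HTML, or None."""
    match = re.search(MAIN_JS_PATTERN, html)
    return match.group(1) if match else None


def extract_query_id(bundle_text, operation="TweetDetail"):
    """Return the queryId paired with the given operation name, or None."""
    pattern = QUERY_ID_PATTERN.format(operation=re.escape(operation))
    match = re.search(pattern, bundle_text)
    return match.group(1) if match else None


def tweet_id_from_url(status_url):
    """Return the path segment after 'status' (ValueError if it is absent)."""
    parts = urlparse(status_url).path.strip("/").split("/")
    return parts[parts.index("status") + 1]


def build_tweet_detail_url(query_id, params):
    """JSON-encode each parameter value and append it to the endpoint URL."""
    encoded = {
        key: json.dumps(value, separators=(",", ":"))
        for key, value in params.items()
    }
    return f"https://x.com/api/graphql/{query_id}/TweetDetail?{urlencode(encoded)}"


# Invented sample data, used only to exercise the helpers offline.
sample_html = '<script src="https://example.invalid/assets/main.0a1b2c.js"></script>'
sample_bundle = 'e.exports={queryId:"abc123",operationName:"TweetDetail"}'

tweet_id = tweet_id_from_url("https://x.com/kirawontmiss/status/1930784893128016280")
query_id = extract_query_id(sample_bundle)
print(find_main_js_url(sample_html))
# "variables" and "focalTweetId" are assumed names; copy the real ones from DevTools.
print(build_tweet_detail_url(query_id, {"variables": {"focalTweetId": tweet_id}}))
```

Only the URL construction is covered here. The request headers that the real call needs (the part that requires JS reverse engineering) are not reproduced, so the printed URL will not return data on its own.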

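The "search everything for mp4" idea in step 5 can also be done in code instead of by hand. The sketch below assumes the TweetDetail response has already been saved as JSON text and simply walks every nested value, collecting strings that contain ".mp4". It makes no assumption about where the links sit in the response. The function name and the sample JSON are invented for this illustration.

```python
import json


def find_mp4_links(node):
    """Recursively collect every string value that contains '.mp4'."""
    links = []
    if isinstance(node, str):
        if ".mp4" in node:
            links.append(node)
    elif isinstance(node, dict):
        for value in node.values():
            links.extend(find_mp4_links(value))
    elif isinstance(node, list):
        for item in node:
            links.extend(find_mp4_links(item))
    return links


# Invented sample; a real response is much larger and shaped differently.
sample_response = json.loads("""
{
  "data": {
    "items": [
      {"kind": "text", "value": "hello"},
      {"media": [
        {"link": "https://example.invalid/vid/320x180/a.mp4?tag=1"},
        {"link": "https://example.invalid/vid/1280x720/b.mp4?tag=1"},
        {"link": "https://example.invalid/pl/playlist.m3u8"}
      ]}
    ]
  }
}
""")

# dict.fromkeys removes duplicates while keeping first-seen order.
unique_links = list(dict.fromkeys(find_mp4_links(sample_response)))
for link in unique_links:
    print(link)
```

When several links turn up, they usually differ in resolution, so pick the one you want and download it with any HTTP client.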