python3: a small crawler (scraping American TV series subtitles)

Latest recommended article published on 2024-02-02 10:25:10

Original

License: CC 4.0 BY-SA

Tags:

#python3 #crawler #American-TV-subtitles

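This post presents a small crawler written in Python 3. It uses the BeautifulSoup library to parse HTML and collects the links related to American TV series subtitles from a specific website. It first gathers the page URLs, then iterates over each page, extracts the page title and the div elements that hold the English and Chinese subtitles, and writes the content to the file M_S6.txt.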
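The link-collection step described above can be tried offline before pointing the crawler at a live site. The following is a minimal sketch, not the author's code: the sample markup, the `example.com` URLs, and the helper name `extract_links` are hypothetical. Only the `<ul id="menu-list">` container comes from the script below, and the sketch assumes that each list item wraps an `<a>` tag whose `href` is the page URL.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page; only the
# <ul id="menu-list"> container is taken from the original script.
SAMPLE_HTML = """
<html><head><title>Sample episode list</title></head>
<body>
  <ul id="menu-list">
    <li><a href="http://example.com/ep1.shtml">Episode 1</a></li>
    <li><a href="http://example.com/ep2.shtml">Episode 2</a></li>
  </ul>
  <ul id="other"><li><a href="http://example.com/skip.shtml">Skip</a></li></ul>
</body></html>
"""


def extract_links(html):
    """Return the href of every <a> found inside <ul id="menu-list">."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # find_all returns a ResultSet; filtering by id skips unrelated lists.
    for ul in soup.find_all("ul", id="menu-list"):
        # href=True keeps only anchors that actually carry an href attribute.
        for a in ul.find_all("a", href=True):
            links.append(a["href"])
    return links


if __name__ == "__main__":
    for link in extract_links(SAMPLE_HTML):
        print(link)
```

Working on an inline string keeps the parsing logic separate from the network call, so the selectors can be checked without depending on the site being reachable.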

#!/usr/bin/env python3
# coding=utf-8
import re
import urllib.request
from bs4 import BeautifulSoup

def get_url(url):
    '''Collect the page URLs.'''
    Url = []
    #url = 'http://www.kekenet.com/video/16692/'
    f = urllib.request.urlopen(url)
    html = f.read()
    soup = BeautifulSoup(html,'html.parser')
    content = soup.find_all('ul',id='menu-list')
    for tag in content:
        li = tag.find_all('li')        # a bs4.element.ResultSet