Python Mini Program 0007

Originally published 2017-07-07 11:31:36

License: CC 4.0 BY-SA

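This article describes how to extract plain text from an HTML file using Python and the BeautifulSoup library. The script reads an HTML file from a user-supplied path, parses it with BeautifulSoup to strip the HTML tags, and prints the remaining text.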

Problem 7: Given an HTML file, extract its main text.

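#!/usr/bin/env python3
# -*- coding: utf-8 -*-

from bs4 import BeautifulSoup


if __name__ == "__main__":
    htmlpath = input('Enter the path to the HTML file: ')
    # Read the whole file as UTF-8 text.
    with open(htmlpath, 'r', encoding='utf-8') as html:
        htmlall = html.read()
    # Name the parser explicitly; without it bs4 picks one and emits a warning.
    soup = BeautifulSoup(htmlall, 'html.parser')
    # get_text() concatenates the document's text nodes, dropping the tags.
    print(soup.get_text())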
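
Calling `get_text()` on the whole document returns every text node, which can include the page title and, depending on the bs4 version and parser, the contents of `<script>` and `<style>` elements. The following is a minimal sketch of one way to get closer to the "main text" the problem asks for. It assumes that the visible text inside `<body>` is what counts as main text. The helper name `extract_visible_text` and the `sample` string are hypothetical and are not part of the original script.

```python
from bs4 import BeautifulSoup


def extract_visible_text(html: str) -> str:
    """Illustrative helper: return the readable text of an HTML document."""
    soup = BeautifulSoup(html, "html.parser")
    # Remove elements whose contents are not rendered as page text.
    for tag in soup(["script", "style", "noscript"]):
        tag.decompose()
    # Assumption: the main text lives in <body>; fall back to the whole document.
    root = soup.body or soup
    # Join the remaining text nodes with newlines, skipping whitespace-only ones.
    return root.get_text(separator="\n", strip=True)


# Hypothetical sample input, used only to exercise the helper.
sample = """<html><head><title>Demo</title>
<style>p { color: red; }</style></head>
<body><h1>Hello</h1><script>alert('x');</script>
<p>First <b>paragraph</b>.</p></body></html>"""

print(extract_visible_text(sample))
```

Because `separator="\n"` is applied between every text node, inline tags such as `<b>` also cause line breaks in the output. Passing `separator=" "` instead keeps each sentence on one line, at the cost of losing paragraph boundaries.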