Beautiful Soup

拉倒就拉倒

Published 2019-11-14 20:31:31

Category: Introduction to Web Crawlers

Article link: https://blog.youkuaiyun.com/weixin_44466903/article/details/103049765

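(1) Basic elements of the bs4 library
Beautiful Soup is a library for parsing, traversing, and maintaining the "tag tree" of a document.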

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup("<html>data</html>", "html.parser")
>>> soup2 = BeautifulSoup(open("D://demo.html"), "html.parser")

A BeautifulSoup object corresponds to the entire content of one HTML/XML document.
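
As a minimal sketch of the basic elements such an object exposes (a tag, its name, its attributes, and the string inside it), the snippet below parses a small piece of markup. The sample HTML and the variable names are made up for illustration and are not taken from the original post.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, invented for this illustration
html = (
    '<html><body><p class="title"><b>Hello</b></p>'
    '<a href="http://example.com/1" id="link1">First</a></body></html>'
)
soup = BeautifulSoup(html, "html.parser")

tag = soup.a             # Tag: the first <a> tag found in the document
tag_name = tag.name      # Name: the tag's name as a string
tag_attrs = tag.attrs    # Attributes: a dict of the tag's attributes
tag_text = tag.string    # NavigableString: the text inside the tag

print(tag_name)
print(tag_attrs)
print(tag_text)
print(type(tag), type(tag_text))
```

Accessing a tag as an attribute of the soup (`soup.a`, `soup.p`) returns only the first matching tag in the document.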

(2) HTML traversal methods based on the bs4 library

Downward traversal
Upward traversal

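from bs4 import BeautifulSoup

# demo is assumed to hold the HTML source text of the page being parsed
soup = BeautifulSoup(demo, "html.parser")
# Upward traversal: walk from the first <a> tag through all of its ancestors
for parent in soup.a.parents:
    if parent is None:
        print(parent)
    else:
        print(parent.name)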
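
For comparison, downward traversal can be sketched with the `.contents`, `.children`, and `.descendants` attributes of a tag. The sample markup below is hypothetical and was chosen only to keep the example self-contained.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical sample markup, invented for this illustration
html = "<html><body><p>One</p><p>Two <b>bold</b></p></body></html>"
soup = BeautifulSoup(html, "html.parser")

body = soup.body

# .contents: a list of the direct children of the tag
direct = body.contents

# .children: an iterator over the same direct children
child_names = [child.name for child in body.children]

# .descendants: an iterator over every node below the tag, at any depth;
# text nodes are skipped here by keeping only Tag instances
descendant_names = [node.name for node in body.descendants if isinstance(node, Tag)]

print(len(direct))
print(child_names)
print(descendant_names)
```

`.contents` and `.children` stop at the first level, while `.descendants` also reaches the nested `<b>` tag.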