Python Study Notes (2): BeautifulSoup


License: CC 4.0 BY-SA


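This post shows how to scrape a web page with Python 3 and the BeautifulSoup library. The sample code walks through the whole process: reading a URL from the user, fetching and parsing the HTML, and printing every link on the page. It is aimed at beginners who want to get started quickly.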

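Using BeautifulSoup in Python 3.x differs slightly from Python 2 (for example, urlopen now lives in urllib.request). See the code below: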

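import urllib.request
from bs4 import BeautifulSoup

# Ask for the address of the page to fetch
url = input("Enter-")
# Download the page; urlopen returns a file-like response object
html = urllib.request.urlopen(url)
# Parse the response with Python's built-in HTML parser
soup = BeautifulSoup(html, 'html.parser')

# soup('a') is shorthand for soup.find_all('a'): it returns a list of
# every <a> tag. The soup itself is not a dict, but each tag's
# attributes can be read like one.
tags = soup('a')

for tag in tags:
    # Print the href attribute, or None if the tag has none
    print(tag.get('href', None))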
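
The script above prints each href exactly as it appears in the page, so relative links stay relative and tags without an href print None. The following is a minimal sketch of one way to handle both cases, not part of the original notes. It parses an inline HTML string instead of a live page so it runs without network access. The names SAMPLE_HTML, BASE_URL and extract_links, as well as the example.com addresses, are made up for illustration.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical sample page and base URL, invented for this example.
SAMPLE_HTML = """
<html><body>
  <a href="https://example.com/a">absolute link</a>
  <a href="/docs/b.html">relative link</a>
  <a name="top">anchor without href</a>
</body></html>
"""
BASE_URL = "https://example.com/index.html"


def extract_links(html, base_url):
    """Return the absolute URL of every <a> tag that has an href."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # find_all('a') is the explicit form of soup('a')
    for tag in soup.find_all("a"):
        href = tag.get("href")  # None when the attribute is missing
        if href is None:
            continue
        # Resolve relative paths against the page's own address
        links.append(urljoin(base_url, href))
    return links


if __name__ == "__main__":
    for link in extract_links(SAMPLE_HTML, BASE_URL):
        print(link)
```

When working with a real page, the URL passed to urlopen would take the place of BASE_URL.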


Author: BabyBirdToFly
