Select HTML tags that have one or more specific attributes with bs4

Essenceback

Published 2021-06-03 08:00:42

I want to use soup.find_all to find all HTML tags that have an id or a name attribute.

The following code works for the id attribute:

for tag in soup.find_all(attrs={"id": True}):

However, the following code with two attributes doesn't:

for tag in soup.find_all(attrs={"id": True, "name": True}):

Is it possible to do a Boolean search with bs4 that will find all tags that have one of two specific attributes (or both attributes) or will I have to search for each attribute separately?

Original: https://stackoverflow.com/questions/37851225

Updated: 2019-07-27 13:37

Best answer

soup.find_all(lambda element: 'name' in element.attrs or 'id' in element.attrs)

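find_all accepts a function as its filter: the function is called once for each tag and should return True for the tags to keep. Here the lambda uses the in operator to check whether element.attrs (a dictionary of the tag's attributes) contains the key name or the key id, so a tag matches if it has either attribute or both.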
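To make the difference concrete, here is a minimal sketch that runs the function filter against a small document. The sample markup and the helper name has_id_or_name are invented for illustration and are not part of the original question. The sketch also shows that a multi-key attrs dictionary keeps only tags that carry every listed attribute, which would explain why the two-attribute call in the question misses tags. The CSS selector at the end is an alternative that assumes a bs4 version with select() backed by soupsieve.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, made up for this illustration.
html = """
<form id="login" name="login-form">
  <input name="user">
  <input id="pw" type="password">
  <p class="hint">no id or name here</p>
</form>
"""

soup = BeautifulSoup(html, "html.parser")


def has_id_or_name(tag):
    """Return True for tags that have an id attribute, a name attribute, or both."""
    return tag.has_attr("id") or tag.has_attr("name")


# OR search: a function filter is called once per tag.
either = soup.find_all(has_id_or_name)

# AND search: every key in the attrs dictionary must be present on the tag.
both = soup.find_all(attrs={"id": True, "name": True})

# Alternative OR search with a CSS selector group.
either_css = soup.select("[id], [name]")

print([t.name for t in either])      # ['form', 'input', 'input']
print([t.name for t in both])        # ['form']
print([t.name for t in either_css])  # ['form', 'input', 'input']
```

A named function behaves the same as the lambda in the answer; it is only easier to reuse and to document. tag.has_attr("id") is equivalent to "id" in tag.attrs.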
