pyquery: A Handy Python Tool for Parsing Web Pages

License: CC 4.0 BY-SA

Tags: #python #jquery #pyquery

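This article introduces pyquery, a Python library modeled on jQuery. Sample code shows how to use pyquery to parse a web page and extract information such as links. A list of jQuery traversal functions is also included to help readers understand and apply pyquery.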

About pyquery

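Anyone who does web development has surely heard of jQuery. It is convenient and powerful, and one of its defining features is its selector engine.
pyquery is a Python library for parsing web pages that is modeled on jQuery, and its API closely follows jQuery's.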

pyquery documentation

See the official pyquery documentation.

A quick test

from urllib.request import urlopen
from pyquery import PyQuery as pq


# Fetch the page
print('fetch page...')
url = 'http://www.7dsw.com/toplastupdate/1.html'
resp = urlopen(url)
page = resp.read()
# The response body is raw bytes; this site serves GBK-encoded HTML
page = page.decode('gbk')

fetch page...

doc = pq(page)

doc

[<html>]

wanted = doc('a')

wanted

[<a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a.first>, <a.pgroup>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a>, <a.next>, <a.ngroup>, <a.last>]

d = [i.attr('href') for i in wanted.items()]
d

['#',
 "javascript:window.external.addFavorite('http://www.7dsw.com','7度书屋_书友最值得收藏的网络小说阅读网')",
 'http://www.7dsw.com',
 '/newmessage.php?tosys=1',
 '/jifen.html',
 'http://www.7dsw.com/',
 '/modules/article/bookcase.php',
 'http://www.7dsw.com/sort1/1.html',
 'http://www.7dsw.com/sort2/1.html',
 'http://www.7dsw.com/sort3/1.html',
 'http://www.7dsw.com/sort4/1.html',
 'http://www.7dsw.com/sort5/1.html',
 'http://www.7dsw.com/sort6/1.html',
 '/quanben/',
 '/toplastupdate/1.html',
 'http://www.7dsw.com/book/17/17870/',
 'http://www.7dsw.com/book/17/17870/11409157.html',
 'http://www.7dsw.com/book/2/2827/',
 'http://www.7dsw.com/book/2/2827/11409156.html',
 'http://www.7dsw.com/book/18/18732/',
 'http://www.7dsw.com/book/18/18732/11409155.html',
 'http://www.7dsw.com/book/33/33268/',
 'http://www.7dsw.com/book/33/33268/11409154.html',
 'http://www.7dsw.com/book/27/27876/',
 'http://www.7dsw.com/book/27/27876/11409150.html',
 'http://www.7dsw.com/book/4/4876/',
 'http://www.7dsw.com/book/4/4876/11409145.html',
 'http://www.7dsw.com/book/33/33261/',
 'http://www.7dsw.com/book/33/33261/11409144.html',
 'http://www.7dsw.com/book/29/29849/',
 'http://www.7dsw.com/book/29/29849/11409133.html',
 'http://www.7dsw.com/book/32/32541/',
 'http://www.7dsw.com/book/32/32541/11409132.html',
 'http://www.7dsw.com/book/30/30083/',
 'http://www.7dsw.com/book/30/30083/11409130.html',
 'http://www.7dsw.com/book/15/15156/',
 'http://www.7dsw.com/book/15/15156/11409124.html',
 'http://www.7dsw.com/book/33/33518/',
 'http://www.7dsw.com/book/33/33518/11409123.html',
 'http://www.7dsw.com/book/31/31904/',
 'http://www.7dsw.com/book/31/31904/11409115.html',
 'http://www.7dsw.com/book/6/6807/',
 'http://www.7dsw.com/book/6/6807/11409112.html',
 'http://www.7dsw.com/book/30/30605/',
 'http://www.7dsw.com/book/30/30605/11409109.html',
 'http://www.7dsw.com/book/33/33169/',
 'http://www.7dsw.com/book/33/33169/11409107.html',
 'http://www.7dsw.com/book/6/6415/',
 'http://www.7dsw.com/book/6/6415/11409101.html',
 'http://www.7dsw.com/book/30/30440/',
 'http://www.7dsw.com/book/30/30440/11409099.html',
 'http://www.7dsw.com/book/28/28703/',
 'http://www.7dsw.com/book/28/28703/11409096.html',
 'http://www.7dsw.com/book/28/28849/',
 'http://www.7dsw.com/book/28/28849/11409095.html',
 'http://www.7dsw.com/book/29/29668/',
 'http://www.7dsw.com/book/29/29668/11409093.html',
 'http://www.7dsw.com/book/33/33460/',
 'http://www.7dsw.com/book/33/33460/11409091.html',
 'http://www.7dsw.com/book/33/33683/',
 'http://www.7dsw.com/book/33/33683/11409090.html',
 'http://www.7dsw.com/book/28/28865/',
 'http://www.7dsw.com/book/28/28865/11409086.html',
 'http://www.7dsw.com/book/22/22913/',
 'http://www.7dsw.com/book/22/22913/11409085.html',
 'http://www.7dsw.com/book/32/32568/',
 'http://www.7dsw.com/book/32/32568/11409084.html',
 'http://www.7dsw.com/book/26/26175/',
 'http://www.7dsw.com/book/26/26175/11409082.html',
 'http://www.7dsw.com/book/12/12455/',
 'http://www.7dsw.com/book/12/12455/11409081.html',
 'http://www.7dsw.com/book/28/28760/',
 'http://www.7dsw.com/book/28/28760/11409079.html',
 'http://www.7dsw.com/book/29/29305/',
 'http://www.7dsw.com/book/29/29305/11409078.html',
 'http://www.7dsw.com/toplastupdate/1.html',
 'http://www.7dsw.com/toplastupdate/1.html',
 'http://www.7dsw.com/toplastupdate/2.html',
 'http://www.7dsw.com/toplastupdate/3.html',
 'http://www.7dsw.com/toplastupdate/4.html',
 'http://www.7dsw.com/toplastupdate/5.html',
 'http://www.7dsw.com/toplastupdate/6.html',
 'http://www.7dsw.com/toplastupdate/7.html',
 'http://www.7dsw.com/toplastupdate/8.html',
 'http://www.7dsw.com/toplastupdate/9.html',
 'http://www.7dsw.com/toplastupdate/10.html',
 'http://www.7dsw.com/toplastupdate/2.html',
 'http://www.7dsw.com/toplastupdate/16.html',
 'http://www.7dsw.com/toplastupdate/1056.html']
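
The extracted list mixes placeholder anchors (`#`), `javascript:` pseudo-links, relative paths, and duplicates. One possible clean-up step, sketched here with only the standard library's `urljoin`, is shown below. The helper name `clean_links` and the short sample list are illustrative assumptions, not part of pyquery or of the original session.

```python
from urllib.parse import urljoin

# The page the links were extracted from; relative hrefs resolve against it.
BASE_URL = 'http://www.7dsw.com/toplastupdate/1.html'


def clean_links(hrefs, base_url=BASE_URL):
    """Return absolute URLs, dropping anchors, script links, and duplicates."""
    seen = set()
    result = []
    for href in hrefs:
        # .attr('href') gives None for <a> tags that have no href attribute.
        if not href or href.startswith(('#', 'javascript:')):
            continue
        absolute = urljoin(base_url, href)
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


# A hypothetical subset of the list above.
sample = ['#', '/quanben/', 'http://www.7dsw.com/book/17/17870/', '/quanben/', None]
print(clean_links(sample))
# ['http://www.7dsw.com/quanben/', 'http://www.7dsw.com/book/17/17870/']
```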

jQuery documentation

Because pyquery mirrors jQuery, the jQuery documentation is a useful guide to how pyquery is used.

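jQuery traversal functions
jQuery's traversal functions include methods for filtering, finding, and chaining elements.
Function  Description
.add()  Adds elements to the set of matched elements.
.andSelf()  Adds the previous set of elements on the stack to the current set (deprecated since jQuery 1.8 in favor of .addBack()).
.children()  Gets all children of each element in the set of matched elements.
.closest()  Starting from the element itself, tests each element going up through its ancestors and returns the first one that matches.
.contents()  Gets the children of each element in the set of matched elements, including text and comment nodes.
.each()  Iterates over a jQuery object, executing a function for each matched element.
.end()  Ends the most recent filtering operation in the current chain and returns the set of matched elements to its previous state.
.eq()  Reduces the set of matched elements to the one at the specified index.
.filter()  Reduces the set of matched elements to those that match the selector or pass the function's test.
.find()  Gets the descendants of each element in the current set of matched elements, filtered by a selector.
.first()  Reduces the set of matched elements to the first element in the set.
.has()  Reduces the set of matched elements to those that have a descendant matching the given selector or element.
.is()  Checks the current set of matched elements against a selector and returns true if at least one element matches.
.last()  Reduces the set of matched elements to the last element in the set.
.map()  Passes each element in the current set through a function, producing a new jQuery object that contains the return values.
.next()  Gets the immediately following sibling of each element in the set of matched elements.
.nextAll()  Gets all following siblings of each element in the set of matched elements, optionally filtered by a selector.
.nextUntil()  Gets all following siblings of each element, up to but not including the element matched by the selector.
.not()  Removes elements from the set of matched elements.
.offsetParent()  Gets the closest ancestor element that is positioned.
.parent()  Gets the parent of each element in the current set of matched elements, optionally filtered by a selector.
.parents()  Gets the ancestors of each element in the current set of matched elements, optionally filtered by a selector.
.parentsUntil()  Gets the ancestors of each element in the current set of matched elements, up to but not including the element matched by the selector.
.prev()  Gets the immediately preceding sibling of each element in the set of matched elements, optionally filtered by a selector.
.prevAll()  Gets all preceding siblings of each element in the set of matched elements, optionally filtered by a selector.
.prevUntil()  Gets all preceding siblings of each element, up to but not including the element matched by the selector.
.siblings()  Gets the siblings of each element in the set of matched elements, optionally filtered by a selector.
.slice()  Reduces the set of matched elements to a subset specified by a range of indices.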