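Introduction to Python HTTP Components

This article introduces mechanize, a Python library used for web crawling that provides browser history, form state, and cookie handling. It also introduces pysqlite, the interface between Python and SQLite 3 databases, and outlines how the cookielib module handles cookies for HTTP clients.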

1. mechanize 

https://pypi.python.org/pypi/mechanize/

Summary: built on urllib2 and largely compatible with it; provides browsing history, form state, cookies, and related features.

mechanize 0.2.5

Stateful programmatic web browsing.

Stateful programmatic web browsing, after Andy Lester's Perl module WWW::Mechanize.

mechanize.Browser implements the urllib2.OpenerDirector interface. Browser objects have state, including navigation history, HTML form state, cookies, etc. The set of features and URL schemes handled by Browser objects is configurable. The library also provides an API that is mostly compatible with urllib2: your urllib2 program will likely still work if you replace "urllib2" with "mechanize" everywhere.
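The stateful behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the library's canonical usage: the function name `browse` and its `start_url` argument are hypothetical, and the import is guarded only so the snippet still loads when mechanize is not installed.

```python
try:
    import mechanize  # third-party package: pip install mechanize
except ImportError:   # keep the sketch loadable without the package
    mechanize = None


def browse(start_url):
    """Open a page, follow its first link, then step back in history.

    `browse` and `start_url` are hypothetical names used for illustration.
    """
    if mechanize is None:
        raise RuntimeError("mechanize is not installed")

    br = mechanize.Browser()
    # Assumption: skipping robots.txt is acceptable for the target site.
    br.set_handle_robots(False)

    br.open(start_url)            # the Browser now holds the current page
    first_title = br.title()      # title of the HTML page just loaded

    links = list(br.links())      # hyperlinks found on the current page
    if links:
        br.follow_link(links[0])  # navigate; history grows by one entry
        br.back()                 # return to the previous page

    return first_title, br.geturl()
```

Because mechanize also exposes urllib2-style names, a call such as `mechanize.urlopen(url)` can stand in for `urllib2.urlopen(url)` when no browser state is needed.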

Features include: ftp:, http: and file: URL schemes, browser history, hyperlink and HTML form support, HTTP cookies, HTTP-EQUIV and Refresh, Referer [sic] header, robots.txt, redirections, proxies, and Basic and Digest HTTP authentication.
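Several of the features listed above are toggled per Browser object. The sketch below shows one possible configuration; the helper name `make_browser` and its `proxy` parameter are assumptions made for this example, and the chosen settings are illustrative rather than recommended defaults.

```python
try:
    import mechanize  # third-party package: pip install mechanize
except ImportError:   # keep the sketch loadable without the package
    mechanize = None


def make_browser(proxy=None):
    """Return a Browser with a few of the listed features switched on.

    `make_browser` and `proxy` are hypothetical names for illustration.
    """
    if mechanize is None:
        raise RuntimeError("mechanize is not installed")

    br = mechanize.Browser()
    br.set_handle_robots(True)     # obey robots.txt
    br.set_handle_redirect(True)   # follow HTTP redirections
    br.set_handle_referer(True)    # send the Referer header
    br.set_handle_equiv(True)      # treat HTTP-EQUIV meta tags as headers
    # Follow Refresh headers, but only those with a delay of at most 1 second.
    br.set_handle_refresh(True, max_time=1)

    if proxy:
        # Example value: "proxy.example.com:3128" (made-up host).
        br.set_proxies({"http": proxy})
    return br
```

Credentials for Basic or Digest authentication can be registered on the returned object with `br.add_password(url, user, password)`.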

Much of the code originally derived from Perl code by Gisle Aas (libwww-perl), Johnny Lee (MSIE Cookie support) and last but not least Andy Lester (WWW::Mechanize). urllib2 was written by Jeremy Hylton.

2. pysqlite

Summary: a Python interface to SQLite 3.

pysqlite 2.6.3

DB-API 2.0 interface for SQLite 3.x

Python interface to SQLite 3

pysqlite is an interface to the SQLite 3.x embedded relational database engine. It is almost fully compliant with the Python database API version 2.0 and also exposes the unique features of SQLite.

Google Code documentation: https://code.google.com/p/pysqlite/wiki/Documentation

Tutorial
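As a rough illustration of the DB-API 2.0 style, the sketch below stores a few crawl results in an in-memory database. The table name `pages`, its columns, and the `is_ok` SQL function are made up for this example. The import falls back to the standard-library `sqlite3` module, which is pysqlite as bundled with Python since version 2.5.

```python
try:
    from pysqlite2 import dbapi2 as sqlite3  # standalone pysqlite package
except ImportError:
    import sqlite3                           # same interface in the stdlib

conn = sqlite3.connect(":memory:")           # throwaway in-memory database
conn.row_factory = sqlite3.Row               # rows become addressable by name

cur = conn.cursor()
cur.execute("CREATE TABLE pages (url TEXT PRIMARY KEY, status INTEGER)")

# DB-API parameter binding with "?" placeholders (no string formatting).
cur.executemany(
    "INSERT INTO pages (url, status) VALUES (?, ?)",
    [("http://example.com/", 200), ("http://example.com/missing", 404)],
)
conn.commit()

# SQLite-specific extra: register a Python function callable from SQL.
conn.create_function("is_ok", 1, lambda status: 1 if status == 200 else 0)

cur.execute("SELECT url, is_ok(status) AS ok FROM pages ORDER BY url")
results = [(row["url"], row["ok"]) for row in cur.fetchall()]
conn.close()
```

The standard DB-API calls (`connect`, `cursor`, `execute`, `executemany`, `commit`) would look the same with other compliant drivers, while `row_factory` and `create_function` are extensions specific to this SQLite interface.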

sqlite3 — DB-API 2.0 interface for SQLite databases

https://pysqlite.readthedocs.org/en/latest/sqlite3.html

3. cookielib

Purpose: HTTP client-side cookie handling.

http://docs.python.org/2/library/cookielib.html

The cookielib module defines classes for automatic handling of HTTP cookies. It is useful for accessing web sites that require small pieces of data – cookies – to be set on the client machine by an HTTP response from a web server, and then returned to the server in later HTTP requests.
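The "returned to the server in later HTTP requests" step can be sketched without any network access. In the example below the cookie is constructed by hand, which is purely an assumption for illustration: normally the jar fills itself from server responses. The cookie name, value, and URL are made up. The import also accepts `http.cookiejar`, the Python 3 name of the same module.

```python
try:
    import cookielib                      # Python 2 name used in this article
    from urllib2 import Request
except ImportError:
    import http.cookiejar as cookielib    # same module under its Python 3 name
    from urllib.request import Request

jar = cookielib.CookieJar()

# Hypothetical cookie, built manually so the example runs offline.
cookie = cookielib.Cookie(
    version=0, name="session", value="abc123",
    port=None, port_specified=False,
    domain="example.com", domain_specified=False, domain_initial_dot=False,
    path="/", path_specified=True,
    secure=False, expires=None, discard=True,
    comment=None, comment_url=None, rest={},
)
jar.set_cookie(cookie)

# A later request to the same site: the jar attaches the matching cookie.
request = Request("http://example.com/profile")
jar.add_cookie_header(request)
cookie_header = request.get_header("Cookie")
```

In real use the jar is usually wired into an opener through `HTTPCookieProcessor`, so that cookies are extracted from responses and attached to requests automatically.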

Both the regular Netscape cookie protocol and the protocol defined by RFC 2965 are handled. RFC 2965 handling is switched off by default. RFC 2109 cookies are parsed as Netscape cookies and subsequently treated either as Netscape or RFC 2965 cookies according to the ‘policy’ in effect. Note that the great majority of cookies on the Internet are Netscape cookies. cookielib attempts to follow the de-facto Netscape cookie protocol (which differs substantially from that set out in the original Netscape specification), including taking note of the max-age and port cookie-attributes introduced with RFC 2965.
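The policy mentioned above is an object passed to the cookie jar. A minimal sketch of checking the default and opting in to RFC 2965 handling could look like this; the variable names are illustrative only.

```python
try:
    import cookielib                      # Python 2 name used in this article
except ImportError:
    import http.cookiejar as cookielib    # same module under its Python 3 name

# The default policy leaves RFC 2965 handling switched off.
default_policy = cookielib.DefaultCookiePolicy()

# Opt in explicitly by passing rfc2965=True, then hand the policy to a jar.
rfc2965_policy = cookielib.DefaultCookiePolicy(rfc2965=True)
jar = cookielib.CookieJar(policy=rfc2965_policy)

default_setting = default_policy.rfc2965
opted_in_setting = rfc2965_policy.rfc2965
```

Since most cookies in practice are Netscape cookies, leaving the default policy in place is often sufficient.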

Reposted from: https://my.oschina.net/blueprint/blog/136140