First, a quick test to see which form on the Renren homepage submits the login data, and which fields that form contains:
<span style="font-size:18px;">import mechanize
br = mechanize.Browser()
br.open('http://www.renren.com')
# Print every form on the page together with its controls
for form in br.forms():
    print form</span>
The output is:
<span style="font-size:18px;"><POST http://www.renren.com/PLogin.do application/x-www-form-urlencoded
<TextControl(email=)>
<PasswordControl(password=)>
<CheckboxControl(autoLogin=[true])>
<TextControl(icode=)>
<HiddenControl(origURL=http://www.renren.com/home) (readonly)>
<HiddenControl(domain=renren.com) (readonly)>
<HiddenControl(key_id=1) (readonly)>
<HiddenControl(captcha_type=web_login) (readonly)>
<SubmitControl(<None>=登录) (readonly)>></span>
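The same kind of listing can be reproduced offline, which helps when the live page is unavailable. The following is only a minimal sketch, assuming Python 3 and the standard-library `html.parser` rather than mechanize. `SAMPLE_HTML`, `FormLister`, and `list_forms` are hypothetical names, and the markup is modelled on the printed output above, not copied from the real page source.

```python
from html.parser import HTMLParser

# Hypothetical markup modelled on the printed form; NOT the real page source.
SAMPLE_HTML = """
<form method="post" action="http://www.renren.com/PLogin.do">
  <input type="text" name="email">
  <input type="password" name="password">
  <input type="checkbox" name="autoLogin" value="true">
  <input type="hidden" name="domain" value="renren.com">
  <input type="submit" value="login">
</form>
"""


class FormLister(HTMLParser):
    """Collect each form's action and the (type, name) of its named inputs."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._in_form = True
            self.forms.append({"action": attrs.get("action"), "fields": []})
        elif tag == "input" and self._in_form and attrs.get("name"):
            # Inputs without a name (such as the submit button here) are skipped.
            field = (attrs.get("type", "text"), attrs["name"])
            self.forms[-1]["fields"].append(field)

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


def list_forms(html):
    """Return a list of dicts, one per form, in document order."""
    parser = FormLister()
    parser.feed(html)
    return parser.forms


forms = list_forms(SAMPLE_HTML)
for index, form in enumerate(forms):
    print(index, form["action"], form["fields"])
```

The index printed for each form corresponds to the `nr` argument that mechanize's `select_form` accepts, so the first form is `nr=0`.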
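Judging from the printed output, the first form (nr=0) is the one that submits the login data, and the fields to fill in are email and password: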
<span style="font-size:18px;">import mechanize
br = mechanize.Browser()
br.open('http://www.renren.com')
br.select_form(nr=0)  # select the first form on the page
br['email'] = 'email'  # login account, usually an email address
br['password'] = 'password'  # login password
r = br.submit()
print br.title()</span>
The printed title is "人人网-XXX", where XXX is the name you registered with on Renren, which shows that the login succeeded.
Of course, if you are comfortable with the standard library, the following approach also logs in to Renren:
<span style="font-size:18px;">#encoding:utf-8
import cookielib
import urllib2
import urllib
# Keep cookies across requests so the session survives the login
cj = cookielib.LWPCookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
url = 'http://www.renren.com/PLogin.do'
parms = {'email': 'xxx', 'password': 'xxx'}
parm = urllib.urlencode(parms)
# Passing a body makes this a POST request
req = urllib2.Request(url, parm)
response = opener.open(req)
print response.geturl()  # check the final URL to verify that the login succeeded
print response.info()  # response headers</span>
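The listing above uses Python 2 modules (`cookielib`, `urllib2`, `urllib`). As a rough sketch only, assuming Python 3, the same request can be assembled with `http.cookiejar`, `urllib.request`, and `urllib.parse`. The helper name `build_login_request` and the sample credentials are hypothetical, and the request is only built here, not sent.

```python
from http.cookiejar import LWPCookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, Request, build_opener

LOGIN_URL = "http://www.renren.com/PLogin.do"  # form action taken from the printed form


def build_login_request(email, password):
    """Return (opener, request) for the login POST without sending it."""
    cookie_jar = LWPCookieJar()
    opener = build_opener(HTTPCookieProcessor(cookie_jar))
    # In Python 3 the request body must be bytes, so the encoded form is encoded again.
    body = urlencode({"email": email, "password": password}).encode("utf-8")
    request = Request(LOGIN_URL, data=body)
    return opener, request


# Placeholder credentials, for illustration only.
opener, request = build_login_request("user@example.com", "secret")
print(request.get_method())
print(request.data)
```

Calling `opener.open(request)` would actually send the POST. Because the opener carries the cookie jar, later requests made through the same opener would reuse whatever session cookies the server sets.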
Now for the real work: use mechanize for automated login and page download, and re or BeautifulSoup for data extraction. Source code:
<span style="font-size:18px;">#encoding:utf-8
'''
Created on January 21, 2015
@author:
'''
import mechanize
import re
import time
from bs4 import BeautifulSoup
# Initialize a browser object
def initBrowser():
    # Browser
    br = mechanize.Browser()
    # options
    br.set_handle_equiv(True)
    br.set_handle_gzip(True)
    br.set_handle_redirect(True)
    br.set_handle_referer(True)
    br.set_handle_robots(False)
#Follows refresh 0 b