Logging in to oschina automatically with Python

Using Python to log in automatically and scrape web page data


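This article presents an automated workflow in Python that uses the mechanize and cookielib libraries to log in to the OsChina website and fetch the user's code page. It shows how to set the browser options, handle the login step, and read the content of the target page.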

Reposted from: http://www.oschina.net/code/snippet_212240_57962

Example 2. Log in to oschina automatically and fetch the data from my code page
#!/usr/bin/env python
# coding=utf-8
import mechanize

try:
    import cookielib  # Python 2
except ImportError:
    import http.cookiejar as cookielib  # Python 3

# Browser
br = mechanize.Browser()

# Store cookies in a cookie jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)

# Options
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)

# Follow refresh 0, but do not hang on refresh > 0
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)

# Debugging
br.set_debug_http(True)
br.set_debug_redirects(True)
br.set_debug_responses(True)

# User-agent
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]

# Open the login page
r = br.open('https://www.oschina.net/home/login?goto_page=http%3A%2F%2Fwww.oschina.net%2F')
html = r.read()
# Uncomment to list the forms on the page and find the index of the login form
# for f in br.forms():
#     print(f)

# Log in. oschina hashes the password with SHA-1 in JavaScript before submitting it,
# so the form must be filled with the SHA-1 hash of the real password.
br.select_form(nr=1)
br.form['email'] = 'your email address'
br.form['pwd'] = 'SHA-1 hash of your password'
response = br.submit()
print(response.read())

# Fetch the data
r = br.open('http://www.oschina.net/code/list_by_user?id=212240')
html = r.read()
print(html)
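
The login step expects the `pwd` field to hold the SHA-1 hash of the password rather than the password itself. A minimal sketch of producing that value with the standard `hashlib` module might look like the following. It assumes the site uses a plain, unsalted SHA-1 hex digest of the UTF-8 encoded password, which the original snippet does not confirm, and `sha1_hex` is a hypothetical helper name introduced here for illustration.

```python
import hashlib


def sha1_hex(password):
    """Return the lowercase SHA-1 hex digest of a password.

    Assumption: plain unsalted SHA-1 over the UTF-8 bytes of the password.
    """
    # Text is encoded to UTF-8 first; byte strings are hashed as they are.
    if not isinstance(password, bytes):
        password = password.encode('utf-8')
    return hashlib.sha1(password).hexdigest()


if __name__ == '__main__':
    # Replace the placeholder with the real password before running.
    print(sha1_hex('your real password'))
```

The printed value can then be assigned to `br.form['pwd']` in place of the placeholder string. If the login still fails, the site may hash the password differently from what is assumed here.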