Simulating a Baidu Login in Java, with a "Professional Second-Floor Grabbing" Feature

Latest recommended article published on 2021-03-11 09:51:29


License: CC 4.0 BY-SA

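Recently I saw that the moderator of the java bar (java吧) on Baidu Tieba had written a script that automatically grabs the second floor (the first reply) of new threads. I found it fascinating, but he would not open-source it, so I had no choice but to work it out myself. (There are similar tutorials online, but many of them are badly outdated.)

Baidu's login box is generated dynamically by JavaScript, so simply fetching the page source will not give you the form you need.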

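This time I used IE9. Open IE9, go to "www.baidu.com", and press F12 to bring up the developer tools. First clear the cache:

Then uncheck the two options shown in the screenshot, so that the captured requests are not cleared when the page redirects:

Once that is done, click the "Start capturing" button and refresh the current page first (this avoids being asked for a CAPTCHA). Then click Log in as usual and enter your username and password in the dialog that pops up.

Click Log in. Once the browser has redirected to the logged-in page, click the "Stop capturing" button.

You will see that many requests were captured. If you do not know where to start, type the password you just entered into the search box shown below (the masked parts are my username and password):

This reveals the parameters of the login POST request, which we can copy into Notepad for analysis:

After setting aside the parameters whose meaning we already know, the following undetermined parameters remain: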

&token=84a7862f4bf8c2aa8a1d22cf9fbade51
&tpl=mn
&tt=1389080269653
&codestring=
&u=http%3A%2F%2Fwww.baidu.com%2Findex.php%3Ftn%3D10018802_hao
&quick_user=0
&loginmerge=true
&splogin=rate
&ppui_logintime=8801

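For these parameters, when logging in we either omit them or send back the captured values unchanged. I had previously worked with the Sina Weibo SDK, which uses OAuth 2.0 for authorization and returns a token once authorization succeeds. So the first thing to work out is where token comes from.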
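
Two of the captured values become easier to read once decoded. The sketch below URL-decodes the u parameter and interprets tt as a millisecond Unix timestamp. That reading of tt is an assumption based only on its magnitude, and the class and method names are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.time.Instant;

class CapturedParams {
    /** URL-decodes a captured form value using UTF-8. */
    static String decode(String value) {
        try {
            return URLDecoder.decode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /** Interprets a tt value as milliseconds since the Unix epoch (an assumption). */
    static Instant ttToInstant(String tt) {
        return Instant.ofEpochMilli(Long.parseLong(tt));
    }

    public static void main(String[] args) {
        // Values copied from the captured parameter list above
        System.out.println(decode("http%3A%2F%2Fwww.baidu.com%2Findex.php%3Ftn%3D10018802_hao"));
        System.out.println(ttToInstant("1389080269653"));
    }
}
```

If tt really is a timestamp, sending String.valueOf(System.currentTimeMillis()) instead of the captured value would be a natural variation, but that is untested here.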

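Paste the token value into the search box and you will see the following:

From this we learn that the token first appears in the response to a GET request for the URL marked by the arrow.

We can try visiting this URL directly to see whether it returns a token:

As you can see, it does not.

So with the same GET request, why did we get a token during login but not now?

The only explanation I could think of was cookies, so I checked them: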

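Looking at that same request again (request no. 17 in the capture), we can see that the browser actually sent cookies such as BAIDUID. Our GET must therefore send those cookies along as well. So the first step is to visit "www.baidu.com" to obtain the cookies, and then request "https://passport.baidu.com/v2/api/?getapi&tpl=mn&apiver=v3&tt=1389080260852&class=login&logintype=dialogLogin&callback=bd__cbs__jndgh2"

with the cookies attached, to obtain the token needed for the login request.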
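
The idea of carrying cookies from one request to the next can be tried out without any network access. The sketch below uses the JDK's own java.net.CookieManager as a stand-in for the Apache HttpClient CookieStore used in this article, so it illustrates the principle rather than the article's actual code. The cookie value ABC123 and its Domain and Path attributes are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieManager;
import java.net.URI;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SharedCookieDemo {
    /** Feeds a Set-Cookie header into the jar as if it came from a response for url. */
    static void receive(CookieManager jar, String url, String setCookie) {
        Map<String, List<String>> responseHeaders = new HashMap<>();
        responseHeaders.put("Set-Cookie", Arrays.asList(setCookie));
        try {
            jar.put(URI.create(url), responseHeaders);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the cookie strings the jar would attach to a request for url. */
    static List<String> cookiesFor(CookieManager jar, String url) {
        try {
            Map<String, List<String>> headers =
                    jar.get(URI.create(url), new HashMap<String, List<String>>());
            List<String> cookies = headers.get("Cookie");
            return cookies == null ? Collections.<String>emptyList() : cookies;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        CookieManager jar = new CookieManager();
        // Pretend the first request to www.baidu.com set a BAIDUID cookie
        receive(jar, "http://www.baidu.com/", "BAIDUID=ABC123; Domain=.baidu.com; Path=/");
        // The same jar attaches it to the later request for the token
        System.out.println(cookiesFor(jar, "https://passport.baidu.com/v2/api/"));
        // A fresh jar has nothing to send, which matches the failed direct visit
        System.out.println(cookiesFor(new CookieManager(), "https://passport.baidu.com/v2/api/"));
    }
}
```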

The core code is as follows:

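// Imports used by this snippet (Apache HttpClient 4.x)
import org.apache.http.HttpResponse;
import org.apache.http.client.CookieStore;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.protocol.ClientContext;
import org.apache.http.impl.client.BasicCookieStore;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.protocol.BasicHttpContext;
import org.apache.http.protocol.HttpContext;
import org.apache.http.util.EntityUtils;

// A single cookie store shared by every request, like a browser session
CookieStore cookieStore = new BasicCookieStore();

// Step 1: visit Baidu to obtain the required cookies
HttpContext httpContext = new BasicHttpContext();
httpContext.setAttribute(ClientContext.COOKIE_STORE, cookieStore);

HttpClient client = new DefaultHttpClient();
HttpGet httpGet1 = new HttpGet("http://www.baidu.com/");
client.execute(httpGet1, httpContext);

// Step 2: use the cookies to fetch the token, which is a POST parameter of the login request
HttpClient client2 = new DefaultHttpClient();
HttpGet httpGet2 = new HttpGet(
"https://passport.baidu.com/v2/api/?getapi&tpl=mn&apiver=v3"
+ "&tt=1388488343671&class=login&logintype=dialogLogin&callback=bd__cbs__4aeorp");
HttpContext context2 = new BasicHttpContext();
context2.setAttribute(ClientContext.COOKIE_STORE, cookieStore);
HttpResponse response2 = client2.execute(httpGet2, context2);
String temp = EntityUtils.toString(response2.getEntity(), "UTF-8");
System.out.println(temp);
String temp1 = temp.substring(temp.indexOf("token") + 10);
// temp1 now starts at the token value; cut it off at the closing quote
String token = temp1.substring(0, temp1.indexOf("\""));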

This prints the response body:
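
A regular expression is less brittle than a fixed character offset for cutting the token out of that response. This is a minimal sketch, assuming the response contains a "token" : "<value>" pair somewhere inside the callback wrapper. The sample string is a made-up shape, only the token value is taken from the capture earlier in this article, and the class name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenExtractor {
    // Matches "token" : "<value>" with optional whitespace around the colon
    private static final Pattern TOKEN =
            Pattern.compile("\"token\"\\s*:\\s*\"([^\"]*)\"");

    /** Returns the token value, or null if the response has no token field. */
    static String extractToken(String response) {
        Matcher m = TOKEN.matcher(response);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Assumed response shape, for illustration only
        String sample = "bd__cbs__4aeorp({\"data\" : "
                + "{\"token\" : \"84a7862f4bf8c2aa8a1d22cf9fbade51\"}})";
        System.out.println(extractToken(sample));
    }
}
```

Returning null when nothing matches lets the caller stop before sending a login request that cannot succeed.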

The token can be extracted by parsing this string. Next, POST the parameters to "https://passport.baidu.com/v2/api/?login". The core code is as follows:

// Additional imports: java.util.ArrayList, java.util.List, org.apache.http.HttpEntity,
// org.apache.http.NameValuePair, org.apache.http.client.entity.UrlEncodedFormEntity,
// org.apache.http.client.methods.HttpPost, org.apache.http.message.BasicNameValuePair
HttpClient client3 = new DefaultHttpClient();
HttpPost post = new HttpPost("https://passport.baidu.com/v2/api/?login");
List<NameValuePair> parameters = new ArrayList<NameValuePair>();
// Pass plain values here; UrlEncodedFormEntity performs the URL encoding
parameters.add(new BasicNameValuePair("staticpage",
"http://www.baidu.com/cache/user/html/v3Jump.html"));
parameters.add(new BasicNameValuePair("charset", "utf-8"));
parameters.add(new BasicNameValuePair("token", token)); // the token obtained in step 2
parameters.add(new BasicNameValuePair("tpl", "mn"));
parameters.add(new BasicNameValuePair("apiver", "v3"));
parameters.add(new BasicNameValuePair("tt", "1388552675432"));
parameters.add(new BasicNameValuePair("safeflg", "0"));
parameters.add(new BasicNameValuePair("u", "http://www.baidu.com/"));
parameters.add(new BasicNameValuePair("isPhone", "false"));
parameters.add(new BasicNameValuePair("quick_user", "0"));
parameters.add(new BasicNameValuePair("loginmerge", "true"));
parameters.add(new BasicNameValuePair("logintype", "dialogLogin"));
parameters.add(new BasicNameValuePair("splogin", "rate"));
parameters.add(new BasicNameValuePair("username", "your username"));
parameters.add(new BasicNameValuePair("password", "your password"));
parameters.add(new BasicNameValuePair("mem_pass", "on"));
parameters.add(new BasicNameValuePair("callback",
"parent.bd__pcbs__5i3pfd"));
HttpEntity postBodyEnt = new UrlEncodedFormEntity(parameters, "utf-8");
post.setEntity(postBodyEnt);

// Reuse the same cookie store so the login cookies are kept for later requests
HttpContext context3 = new BasicHttpContext();
context3.setAttribute(ClientContext.COOKIE_STORE, cookieStore);
HttpResponse re = client3.execute(post, context3);

So how do we know whether the login succeeded?

If it did, the returned HTML should contain err_no=0, as IE9 shows here:
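
The check can also be done in code. The sketch below pulls the err_no value out of the response body with a regular expression. The sample strings are modelled on the err_no=257 response that a reader quotes in the comments at the end of this article, and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LoginResult {
    // Tolerates optional spaces around '=', e.g. "err_no = 0" or "err_no=257"
    private static final Pattern ERR_NO = Pattern.compile("err_no\\s*=\\s*(\\d+)");

    /** Returns the err_no value found in the response body, or -1 if there is none. */
    static int errNo(String html) {
        Matcher m = ERR_NO.matcher(html);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    /** A login counts as successful only when err_no is present and equals 0. */
    static boolean loginSucceeded(String html) {
        return errNo(html) == 0;
    }

    public static void main(String[] args) {
        String failed = "href += \"err_no=257&callback=parent.bd__pcbs__d2bmuv\"";
        String ok = "href += \"err_no=0&callback=parent.bd__pcbs__5i3pfd\"";
        System.out.println(errNo(failed));
        System.out.println(loginSucceeded(ok));
    }
}
```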

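That is all for the simulated login.

Now let's look at grabbing the second floor, which needs the jsoup library to parse HTML.

First visit any bar, for example the linux bar, open a thread, and post a reply. To keep the explanation simple, I replied with plain letters only.

Then search the captured traffic for the reply text you just posted:

This shows the parameters the browser submitted with the reply:

Copy them into Notepad for analysis: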

ie=utf-8 encoding
&kw=linux bar name
&fid=3171
&tid=2801628284
&vcode_md5= CAPTCHA
&floor_num=22
&rich_text=1
&tbs=2caa0071fe0f49621389084151
&content=lu+guo+hun+yan+shu reply content
&files=%5B%5D attachments; this is simply "[]", which becomes %5B%5D after URL encoding
&mouse_pwd=23%2C18%2C19%2C9%2C16%2C28%2C18%2C21%2C44%2C20%2C9%2C21%2C9%2C20%2C9%2C21%2C9%2C20%2C9%2C21%2C9%2C20%2C9%2C21%2C9%2C20%2C9%2C21%2C44%2C20%2C23%2C16%2C23%2C28%2C44%2C20%2C22%2C19%2C19%2C9%2C18%2C19%2C29%2C13890841559370 after URL decoding, this appears to record mouse positions; ignore it for now
&mouse_pwd_t=1389084155937
&mouse_pwd_isclick=0
&__type__=reply type

So what remains is to find where fid, tid, floor_num, rich_text, and tbs come from.
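
Instead of decoding the captured body by eye, it can be split into name/value pairs programmatically. The sketch below is one way to do that with the JDK alone. The body string is a shortened copy of the captured parameters (mouse_pwd and tbs left out for brevity), and the class name is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

class FormBodyParser {
    /** Splits a URL-encoded form body into decoded name/value pairs, keeping their order. */
    static Map<String, String> parse(String body) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : body.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decode(name), decode(value));
        }
        return params;
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        String body = "ie=utf-8&kw=linux&fid=3171&tid=2801628284&vcode_md5=&floor_num=22"
                + "&rich_text=1&content=lu+guo+hun+yan+shu&files=%5B%5D&__type__=reply";
        for (Map.Entry<String, String> e : parse(body).entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

Decoded this way, files shows up as [] and the plus signs in content turn back into spaces.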

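Go back to the bar page we visited earlier and use Inspect Element (nearly every browser supports it; Firefox is used here):

Each thread is rendered as an li tag. A thread with zero replies carries reply_num = 0, and the id next to it looks like the tid parameter we are after. Comparing the two:

It matches the tid of the thread we just replied to (see the tid parameter listed above). Open that thread, right-click, choose View Page Source, and search the source for the parameters we need.

Using this approach we find floor_num, rich_text, and tbs in the page source. Apart from the mouse parameters, which we are ignoring, that completes the set.

So the process breaks down into the following steps. On the bar's front page, find the li tags marked reply_num = 0 and take their tid. Then request the thread page at "http://tieba.baidu.com/p/(replace with the tid)".

In the returned HTML source, find the values of fid, tbs, floor_num, rich_text, and so on, then submit them in a POST.

First use jsoup to parse the front page of a bar, for example the 贝爷 bar. The code is as follows: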

// Additional imports: org.jsoup.Jsoup, org.jsoup.nodes.Document,
// org.jsoup.nodes.Element, org.jsoup.select.Elements
//----------- start of the posting part -----------------------
HttpClient client4 = new DefaultHttpClient();
HttpGet get = new HttpGet("http://tieba.baidu.com/f?kw=贝爷&fr=index");
HttpContext context4 = new BasicHttpContext();
context4.setAttribute(ClientContext.COOKIE_STORE, cookieStore);
HttpResponse finalResponse = client4.execute(get, context4);

String result = EntityUtils
        .toString(finalResponse.getEntity(), "utf-8");
Document doc = Jsoup.parse(result);
Elements elements = doc.getElementsByTag("li"); // find all li tags
for (Element e : elements) {
    String lingherf = e.attr("data-field");
    if (lingherf.contains("\"reply_num\":0")) { // zero replies so far: take the tid

        // The tid sits between "id": and ,"reply_num in the data-field attribute
        final String tid = lingherf.substring(
                lingherf.indexOf("\"id\":") + 5,
                lingherf.indexOf(",\"reply_num"));
        new GetHtmlThread(new MyCallBack() { // start a thread to grab the second floor

            @Override
            public void nextStep(String html) {
                System.out.println(html);
            }

        }, "http://tieba.baidu.com/p/" + tid, cookieStore, tid).start();

    }
}
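
The substring arithmetic above only works when "reply_num" directly follows "id" in the data-field attribute. A regular expression removes that ordering assumption. The sketch below assumes the attribute is a JSON-like string containing "id":<digits> and "reply_num":<digits>. The sample strings are made up apart from the tid taken from the capture, and the class name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ThreadListParser {
    private static final Pattern ID = Pattern.compile("\"id\"\\s*:\\s*(\\d+)");
    // \b keeps "reply_num":0 from matching values such as 05
    private static final Pattern NO_REPLIES = Pattern.compile("\"reply_num\"\\s*:\\s*0\\b");

    /** Returns the thread id if the thread has no replies yet, otherwise null. */
    static String unansweredTid(String dataField) {
        if (!NO_REPLIES.matcher(dataField).find()) {
            return null;
        }
        Matcher m = ID.matcher(dataField);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Assumed attribute shapes, for illustration only
        String fresh = "{\"id\":2801628284,\"reply_num\":0}";
        String answered = "{\"id\":2801628285,\"reply_num\":12}";
        System.out.println(unansweredTid(fresh));
        System.out.println(unansweredTid(answered));
    }
}
```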

Then request the thread page:

DefaultHttpClient client = new DefaultHttpClient();
HttpGet get = new HttpGet(URL); // this is the URL passed in when the thread was started above
HttpContext context = new BasicHttpContext();
context.setAttribute(ClientContext.COOKIE_STORE, store); // keep using the same cookie store throughout
try {
    HttpResponse response = client.execute(get, context);
    html = EntityUtils.toString(response.getEntity(), "utf-8");

    // Pull out the parameters we need. I am not good with regular
    // expressions, so pardon the crude substring approach.
    String fid = html.substring(html.indexOf("fid:'"),
            html.indexOf("fid:'") + 20);
    fid = fid.substring(fid.indexOf("'") + 1, fid.lastIndexOf("'"));
    String floor_num = html.substring(html.indexOf("floor_num:\""),
            html.indexOf("floor_num:\"") + 20);
    floor_num = floor_num.substring(floor_num.indexOf("\"") + 1,
            floor_num.lastIndexOf("\""));
    String rich_text = html.substring(html.indexOf("rich_text:'"),
            html.indexOf("rich_text:'") + 20);
    rich_text = rich_text.substring(rich_text.indexOf("'") + 1,
            rich_text.lastIndexOf("'"));
    String tbs = html.substring(html.indexOf("'tbs' : \""),
            html.indexOf("'tbs' : \"") + 40);
    tbs = tbs.substring(tbs.indexOf("\"") + 1, tbs.lastIndexOf("\""));
    ArrayList<NameValuePair> para = new ArrayList<>();
    para.add(new BasicNameValuePair("ie", "utf-8"));
    para.add(new BasicNameValuePair("kw", "贝爷"));
    para.add(new BasicNameValuePair("fid", fid));
    para.add(new BasicNameValuePair("tid", tid));
    para.add(new BasicNameValuePair("floor_num", floor_num));
    para.add(new BasicNameValuePair("rich_text", rich_text));
    para.add(new BasicNameValuePair("tbs", tbs));
    para.add(new BasicNameValuePair("content", "我就路过抢个二楼，混个脸熟。"));

    // The mouse parameters can be URL-decoded from the capture and sent back unchanged
    para.add(new BasicNameValuePair(
            "mouse_pwd",
            "125,118,125,99,125,120,125,122,70,126,99,127,99,126,99,127,99,126,99,127,99,126,99,127,99,126,99,127,70,124,126,123,120,124,70,126,124,121,121,99,120,121,119,13885747519790"));
    para.add(new BasicNameValuePair("mouse_pwd_t", "1388574751979"));
    para.add(new BasicNameValuePair("vcode_md5", ""));
    para.add(new BasicNameValuePair("files", "[]")); // "files", as in the captured request
    para.add(new BasicNameValuePair("mouse_pwd_isclick", "0"));
    para.add(new BasicNameValuePair("__type__", "reply"));

    HttpPost post = new HttpPost(
            "http://tieba.baidu.com/f/commit/post/add");
    DefaultHttpClient client1 = new DefaultHttpClient();
    HttpEntity postBodyEnt;

    postBodyEnt = new UrlEncodedFormEntity(para, "utf-8");
    post.setEntity(postBodyEnt);
    System.out.println(EntityUtils.toString(postBodyEnt));
    HttpContext context1 = new BasicHttpContext();
    context1.setAttribute(ClientContext.COOKIE_STORE, store);
    HttpResponse response1 = client1.execute(post, context1);
} catch (IOException e) { // requires java.io.IOException
    e.printStackTrace();
}
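
The fixed-width substring calls above can throw or return the wrong slice when a marker sits near the end of the page or a value is longer than expected. A regex-based alternative is sketched below. The patterns are derived from the literal markers the code above searches for (fid:', floor_num:", rich_text:', 'tbs' : "), the sample HTML fragment is made up around the captured values, and the class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ThreadPageParser {
    /** Returns the first capture group of regex in html, or null if it does not match. */
    static String field(String html, String regex) {
        Matcher m = Pattern.compile(regex).matcher(html);
        return m.find() ? m.group(1) : null;
    }

    /** Collects the reply parameters that are embedded in the thread page source. */
    static Map<String, String> extract(String html) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("fid", field(html, "fid:'([^']*)'"));
        params.put("floor_num", field(html, "floor_num:\"([^\"]*)\""));
        params.put("rich_text", field(html, "rich_text:'([^']*)'"));
        params.put("tbs", field(html, "'tbs'\\s*:\\s*\"([^\"]*)\""));
        return params;
    }

    public static void main(String[] args) {
        // Assumed page fragment, for illustration only
        String html = "var PageData = {'tbs' : \"2caa0071fe0f49621389084151\"};"
                + " reply({fid:'3171', floor_num:\"22\", rich_text:'1'});";
        System.out.println(extract(html));
    }
}
```

A null entry in the returned map signals that the page layout has changed, which is easier to diagnose than a StringIndexOutOfBoundsException.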

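The 贝爷 bar had no second floors left to grab, so I switched to another bar. Here is the result of the automated post:

That's it. To finish, let me shout out my favorite saying!

Technology grows stronger through sharing!

7 comments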

qq_25340399 2015.01.15
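When debugging with the source code I get java.net.URISyntaxException: Illegal character in path at index 36. How can I fix this? I did not modify the code.
- esuvf replying to qq_25340399 2015.04.13
  Sorry, I have not logged in to the blog for a long time. The thread URL is being assembled incorrectly because Baidu has redesigned its pages. The old code can still log in, but the second-floor grabbing no longer works.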

orrin 2014.04.06
First of all, thanks to the author for sharing. The steps are very detailed and I followed them through, but there is one problem I cannot figure out and I hope you can help. My err_no=257 and the login never succeeds. Is it because Baidu was upgraded, or something else? Here is the response: <!DOCTYPE html><html><head><meta http-equiv=Content-Type content="text/html; charset=UTF-8"></head><body><script> var href = decodeURIComponent("http:\/\/tieba.baidu.com\/tb\/static-common\/html\/pass\/v3Jump.html")+"?" var accounts = '&accounts=' href += "err_no=257&callback=parent.bd__pcbs__d2bmuv&codeString=captchaservice6339316532487a346e433935654e6555563141426f306e304f504a7a65395673365252733972393930314d79756d4b722b4c43584c624e416f356a386455446b725479664a7a4d624b43637a756c4864674b6878487167454a4a746c762b59656d35372f64524875624f4a4768394c50696a6450416c584636693775333055556a33592f54617478415863636b704d314c37465a447a394e58535a56444c7445497a542f62314f537a7874614838776763306247446142613353732f7a72386465467674344169793767674a3669696e717231456f70536d6a683869365756496f704b594f614253593465596b417558545148
- esuvf replying to orrin 2014.04.06
  The code has been uploaded to a network drive; you can try it yourself: http://pan.baidu.com/s/1o6uLGgU

troyyu22 2014.02.05
I am moving from Python to Java and there is a lot I do not understand. Could I use your Java source code as a reference? troyyu22@gmail.com
- esuvf replying to troyyu22 2014.02.24
  http://pan.baidu.com/s/1dDrFaff

老萨 2014.02.04
A very well done post, especially the overall line of thinking!!! Thanks again to the author.

那个叫啥好 2014.01.23
Could you share the source code with me too? dynamiclly@163.com Thanks.
- esuvf replying to 那个叫啥好 2014.02.24
  http://pan.baidu.com/s/1dDrFaff

swuxd 2014.01.16
POST /f/commit/post/add HTTP/1.1 Host: tieba.baidu.com User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:20.0) Gecko/20100101 Firefox/20.0 Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 Accept-Language: zh-cn,zh;q=0.8,en-us;q=0.5,en;q=0.3 Accept-Encoding: gzip, deflate Connection: keep-alive Content-Type: application/json; charset=utf-8 X-Requested-With: XMLHttpRequest Cookie: BAIDUID=3FDC3A46A825B7121BFF121632C14D13:FG=1; H_PS_PSSID=4845_1440_4264_4989_4489_4759_4677; BDUSS=3c4eG83LVVQTmI3T2dURGh-cGJhazJ。。。dSS; TIEBA_USERTYPE=737e8e03a15e9514cc177320; TIEBAUID=52dc7439a4c0f44dbb48a919; bdshare_firstime=1389853703755; showCardBeforeSign=1; head_skin_guide=1; fuwu_center_bubble=1; rpln_guide=1; wise_device=0 Content-Length: 659 ie=utf-8&kw=。。。。%E5%AD%A6&fid=817&tid=2817484141&vcode_md5=&floor_num=1&rich_text=1&tbs=a1d8244d927534bd1389857732&content=。。。&files=%5B%5D&mouse_pwd=。。。。&mouse_pwd_t=1389854299486&mouse_pwd_isclick=0&__type__=reply Submitting this just returns an error and it is driving me crazy. It used to work.

X345885864 2014.01.14
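There is too much code to paste easily; just copy from the bottom upward. Could you take a look for me? Thanks.
- esuvf replying to 老萨 2014.03.01
  When posting, you first go to that bar's front page, then find the thread, open it, and post the reply. Did you carry the cookies returned at login through that whole process? It is best to use one CookieStore instance from start to finish (just like a browser). I ran into the same thing when testing, and using the same CookieStore fixed it. Give it a try.
- 老萨 replying to esuvf 2014.03.01
  There is also the {"no":274 error.
- 老萨 replying to esuvf 2014.03.01
  Author, {"no":265,"err_code":230265,"error":"","data":{"autoMsg":"","fid":513194,"fname":"java","tid":2894504953,"is_login":0 is the response I get after logging in. What actually causes "is_login":0, meaning the login did not take effect? The second-floor grabber I wrote later keeps hitting this too. Sometimes the login works and sometimes it does not.
- esuvf replying to 老萨 2014.02.28
  That is just a callback interface: public interface MyCallBack{ public abstract void nextStep(String s); }
- 老萨 replying to esuvf 2014.02.28
  Author, the MyCallBack class is missing. I think a file is missing from the shared code.
- esuvf replying to 老萨 2014.02.24
  http://pan.baidu.com/s/1dDrFaff
- 老萨 replying to esuvf 2014.02.14
  Could you send me the source code? Thanks! sumrise@qq.com
- esuvf replying to X345885864 2014.01.16
  Sent, please check your inbox.
- X345885864 replying to esuvf 2014.01.14
  345885864@qq.com, thanks
- esuvf replying to X345885864 2014.01.14
  I do not use Java's HttpURLConnection (I do not know how to handle cookies with it); I obtain both the cookies and the token with Apache's HttpClient package. Your getCookie method does not seem to obtain any cookie (I tried printing it). Also, the cookieStore in context3.setAttribute(ClientContext.COOKIE_STORE, cookieStore); is not a String; it is a CookieStore (from the Apache package). I use one global static CookieStore for the whole simulated sequence. If you need it, leave an email address and I will send you my Java files; you only need to replace the username and password.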

X345885864 2014.01.11
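Could you share the source code? 345885864@qq.com Thanks, I have been studying this for a long time.
- esuvf replying to X345885864 2014.01.13
  That does not matter. Just make every HTTP request from start to finish use the same CookieStore instance. There is no need to convert anything to a String; it works just like a browser.
- X345885864 replying to esuvf 2014.01.12
  I tried for a long time but cannot figure out how to turn the cookie I obtained as a String into an org.apache.http.CookieStore.
- esuvf replying to X345885864 2014.01.11
  All my code is posted above. Replace the username and password with your own Baidu credentials and it should work. If it still does not, paste your full code.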