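I finally finished all my end-of-term projects. I saw a lot of people on Tieba asking how to use HttpClient, so here is a small demo that fetches the replies and threads posted by a given Tieba user.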
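Using HttpClient takes the following 6 steps:
1. Create an HttpClient instance.
2. Create an instance of a request method, here HttpGet, and pass the target URL to its constructor.
3. Call the execute method of the instance created in step 1 to run the request instance created in step 2.
4. Read the response.
5. Release the connection. The connection must be released whether or not the request succeeded.
6. Process the content you got back.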
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class Demo {
    public static void main(String[] args) throws IOException {
        // Tieba user search: "un" is the user name, "pn" is the page number
        String url = "http://tieba.baidu.com/f/search/ures?ie=utf-8&kw=&qw=&rn=10&un=" + "D8吧务组" + "&sm=1&pn=" + "1";
        HttpGet httpget = new HttpGet(url);
        System.out.println(httpget);

        String html;
        // Steps 1, 3, 4 and 5: try-with-resources closes the response and the
        // client (releasing the connection) whether or not the request succeeds
        try (CloseableHttpClient httpclient = HttpClients.createDefault();
             CloseableHttpResponse response = httpclient.execute(httpget)) {
            HttpEntity entity = response.getEntity();
            html = EntityUtils.toString(entity);
        }

        // Step 6: regular expressions for title, content, forum name and date
        String reg = "<span class=\"p_title\">(.*?)</span>";
        String content = "<div class=\"p_content\">(.*?)</div>";
        String tieba = "贴吧:(.*?)作者";
        String date = "<font class=\"p_green p_date\">(.*?)</font>";
        Matcher mr = Pattern.compile(reg, Pattern.DOTALL).matcher(html);
        Matcher mc = Pattern.compile(content, Pattern.DOTALL).matcher(html);
        Matcher mt = Pattern.compile(tieba, Pattern.DOTALL).matcher(html);
        Matcher md = Pattern.compile(date, Pattern.DOTALL).matcher(html);

        // For every title, advance the other matchers by one match each.
        // The original nested while/break loops behaved like nested ifs.
        while (mr.find()) {
            System.out.println(mr.group(1));
            if (mc.find()) {
                if (mc.group(1).isEmpty()) {
                    // An empty content block means the reply was only an emoticon
                    System.out.println("emoticon");
                } else {
                    System.out.println(mc.group(1));
                }
                if (mt.find()) {
                    System.out.println(mt.group(1));
                    if (md.find()) {
                        System.out.println(md.group(1));
                    }
                }
            }
        }
    }
}
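One detail worth a second look in the demo: the user name is concatenated into the URL as raw Chinese text. It worked for me, but percent-encoding the value is the safer habit. Below is a minimal sketch using only the JDK. The class `TiebaSearchUrl` and its `build` method are hypothetical names of mine, and it assumes the `ie=utf-8` parameter means the server expects UTF-8 encoded input, which I have not verified.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of the original demo.
class TiebaSearchUrl {
    // Percent-encodes a query value as UTF-8 (assumption: the server
    // interprets the "un" parameter as UTF-8 because of ie=utf-8).
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Builds the same user-search URL as the demo for a user name and page number.
    static String build(String userName, int page) {
        return "http://tieba.baidu.com/f/search/ures?ie=utf-8&kw=&qw=&rn=10&un="
                + encode(userName) + "&sm=1&pn=" + page;
    }

    public static void main(String[] args) {
        // The user name from the demo, written with Unicode escapes
        System.out.println(build("D8\u5427\u52a1\u7ec4", 1));
    }
}
```

The resulting string can be passed to `new HttpGet(...)` in place of the hand-built one. Note that the request line printed by the demo would then show the encoded form of the user name rather than the Chinese characters.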
Output of the demo:
GET http://tieba.baidu.com/f/search/ures?ie=utf-8&kw=&qw=&rn=10&un=D8吧务组&sm=1&pn=1 HTTP/1.1
<a class="bluelink" href="/p/3857935662?pid=70599730554&cid=#70599730554" class="bluelink" target="_blank" >回复:求推荐一些跑步的歌曲 吧务求别删啊</a>
电音十三
<a class="p_forum" href="/f?kw=%C0%EE%D2%E3" target="_blank"><font class="p_violet">李毅</font></a>
2015-06-28 21:40
<a class="bluelink" href="/p/3811553417?pid=69766105491&cid=#69766105491" class="bluelink" target="_blank" >回复:【一朝做流氓,十年挂南墙】2015第193期南墙公示贴</a>
7级以下(包括7级)
<a class="p_forum" href="/f?kw=%C0%EE%D2%E3" target="_blank"><font class="p_violet">李毅</font></a>
2015-06-12 12:39
<a class="bluelink" href="/p/3810857469?pid=69550565456&cid=#69550565456" class="bluelink" target="_blank" >回复:【投票】关于帝吧是否应该开神兽,大家的态度</a>
回复 76780492 :你好,神兽以前24小时一直在开着,请管好你的嘴。
<a class="p_forum" href="/f?kw=%C0%EE%D2%E3" target="_blank"><font class="p_violet">李毅</font></a>
2015-06-08 08:24
<a class="bluelink" href="/p/3810477485?pid=69473091247&cid=#69473091247" class="bluelink" target="_blank" >回复:帝吧多年潜水 。小女子明高考啦</a>
加油
<a class="p_forum" href="/f?kw=%C0%EE%D2%E3" target="_blank"><font class="p_violet">李毅</font></a>
2015-06-06 20:13
<a class="bluelink" href="/p/3806712966?pid=69416964825&cid=#69416964825" class="bluelink" target="_blank" >回复:女屌除了考研就没出路了吗</a>
考研加油,看清形势
<a class="p_forum" href="/f?kw=%C0%EE%D2%E3" target="_blank"><font class="p_violet">李毅</font></a>
2015-06-05 18:03
<a class="bluelink" href="/p/3796557297?pid=69150944200&cid=#69150944200" class="bluelink" target="_blank" >回复:【求安慰】男朋友竟然不要我了,原因是。。。</a>
回复 不经意被人敬仰 :骂吧务过瘾吗
<a class="p_forum" href="/f?kw=%C0%EE%D2%E3" target="_blank"><font class="p_violet">李毅</font></a>
2015-05-30 22:46
<a class="bluelink" href="/p/3767546498?pid=68784662527&cid=#68784662527" class="bluelink" target="_blank" >回复:〔听听你的声音〕多年以后愿我提着老酒愿你还是老友</a>
唱吧
<a class="p_forum" href="/f?kw=%C0%EE%D2%E3" target="_blank"><font class="p_violet">李毅</font></a>
2015-05-22 20:25
<a class="bluelink" href="/p/3748403153?pid=68132641929&cid=#68132641929" class="bluelink" target="_blank" >回复:【招募】帝吧游戏玩家最大组群招募成员</a>
回复 神波多一花丶 :重新加一下
<a class="p_forum" href="/f?kw=%C0%EE%D2%E3" target="_blank"><font class="p_violet">李毅</font></a>
2015-05-08 21:31
<a class="bluelink" href="/p/3748403153?pid=68074596130&cid=#68074596130" class="bluelink" target="_blank" >回复:【招募】帝吧游戏玩家最大组群招募成员</a>
欢迎各位游戏爱好者踊跃加入!
<a class="p_forum" href="/f?kw=%C0%EE%D2%E3" target="_blank"><font class="p_violet">李毅</font></a>
2015-05-07 16:31
<a class="bluelink" href="/p/3733492772?pid=67986945185&cid=#67986945185" class="bluelink" target="_blank" >回复:【一朝做流氓,十年挂南墙】2015第180期南墙公示贴</a>
深藏着功与名- 11级 广告翰侨牙兹酝 ...
<a class="p_forum" href="/f?kw=%C0%EE%D2%E3" target="_blank"><font class="p_violet">李毅</font></a>
2015-05-05 19:51
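As the output shows, the title and forum captures still contain the surrounding `<a>` and `<font>` tags, because the regular expressions grab everything inside the outer element. If you only want the text and the link, a small post-processing step might look like the sketch below. `FragmentCleaner`, `stripTags` and `firstHref` are hypothetical names, and the tag pattern is a rough approximation that is good enough for fragments like the ones above, not a general HTML parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original demo.
class FragmentCleaner {
    // Anything between '<' and the next '>' is treated as a tag
    private static final Pattern TAG = Pattern.compile("<[^>]+>");
    private static final Pattern HREF = Pattern.compile("href=\"([^\"]*)\"");

    // Removes everything that looks like an HTML tag and trims the rest.
    static String stripTags(String fragment) {
        return TAG.matcher(fragment).replaceAll("").trim();
    }

    // Returns the value of the first href attribute, or null if there is none.
    static String firstHref(String fragment) {
        Matcher m = HREF.matcher(fragment);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Same shape as the forum lines in the output, with placeholder text
        String forum = "<a class=\"p_forum\" href=\"/f?kw=%C0%EE%D2%E3\" target=\"_blank\">"
                + "<font class=\"p_violet\">Example</font></a>";
        System.out.println(stripTags(forum));
        System.out.println(firstHref(forum));
    }
}
```

In the demo you would call `stripTags(mr.group(1))` instead of printing `mr.group(1)` directly.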
The demo uses version 4.4.1 of the HttpClient jar. Try to use 4.3 or later, otherwise the API may differ considerably.
This post showed how to use HttpClient to fetch the replies and threads of a given Tieba user and how to parse the returned HTML with regular expressions.




