Notes on a few ways to fetch web page data on Android: given a page's URL, retrieve its content either as a String or as a Document object.
Method 1: urlConnection. Call openConnection() on a URL object to get an HttpURLConnection, then read the whole page through an InputStreamReader and turn it into a String.
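import android.util.Log;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// Run this off the main thread: Android throws NetworkOnMainThreadException otherwise.
URL url = new URL(Url);
HttpURLConnection httpConn = (HttpURLConnection) url.openConnection();
try
{
    if (httpConn.getResponseCode() == HttpURLConnection.HTTP_OK)
    {
        Log.d("TAG", "---into-----urlConnection---success--");
        // Decode the body as UTF-8 (this assumes the page really is UTF-8 encoded).
        InputStreamReader isr = new InputStreamReader(httpConn.getInputStream(), "utf-8");
        // StringBuilder avoids copying the whole string on every appended character.
        StringBuilder sb = new StringBuilder();
        int i;
        while ((i = isr.read()) != -1)
        {
            sb.append((char) i);
        }
        isr.close();
        String content = sb.toString();
    } else
    {
        Log.d("TAG", "---into-----urlConnection---fail--");
    }
} finally
{
    // Release the connection on both the success and the failure path.
    httpConn.disconnect();
}

Method 2: httpClient. Wrap the URL in an HttpGet object, then call the client's execute method to get either an HttpResponse object or, with a ResponseHandler, the page content directly as a String. Note that the Apache HttpClient classes were deprecated in API level 22 and removed from the Android SDK in API level 23.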
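import java.io.IOException;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.ResponseHandler;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.BasicResponseHandler;
import org.apache.http.impl.client.DefaultHttpClient;

DefaultHttpClient httpClient = new DefaultHttpClient();
HttpGet httpGet = new HttpGet(Url);
// BasicResponseHandler returns the body as a String and throws on status codes >= 300.
ResponseHandler<String> responseHandler = new BasicResponseHandler();
try {
    String content = httpClient.execute(httpGet, responseHandler);
    // Alternative: HttpResponse resp = httpClient.execute(httpGet);
} catch (ClientProtocolException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
} finally {
    // Free the connections held by the client.
    httpClient.getConnectionManager().shutdown();
}

Method 3: Jsoup. A single call fetches the page and parses it into a Document; the second argument is the timeout in milliseconds:
Document doc = Jsoup.parse(new URL("http://www.baidu.com"), 5000);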
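The stream-to-String step of Method 1 does not depend on the network, so it can be pulled into a small helper and tried on an in-memory stream. The following is only a sketch under assumptions: the class name `StreamText` and the method `readAll` are hypothetical names introduced here, not part of the original post, and reading in 4096-character chunks is one reasonable choice rather than a requirement.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class (not from the original post).
class StreamText {

    // Reads the whole stream, decodes it with the given charset,
    // and closes the stream when done.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        try (Reader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            int n;
            // read(char[]) returns the number of chars read, or -1 at end of stream.
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for httpConn.getInputStream(): a page held in memory.
        byte[] page = "<html><title>demo</title></html>".getBytes(StandardCharsets.UTF_8);
        String content = readAll(new ByteArrayInputStream(page), StandardCharsets.UTF_8);
        System.out.println(content);
    }
}
```

In Method 1 the call would presumably look like `StreamText.readAll(httpConn.getInputStream(), StandardCharsets.UTF_8)`, still assuming the server sends UTF-8.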
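This post describes how to fetch web page content on Android with urlConnection: an HttpURLConnection object combined with an InputStreamReader converts the page content into a String for further processing.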



