BufferedReader blocks when reading from HttpURLConnection.getInputStream()

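While fetching pages from Baidu Baike through HttpURLConnection and processing the response, BufferedReader.readLine() blocked because of network problems. This post covers why the data length cannot be known in advance, why network stability cannot be assumed, the timeout-based fixes that were tried without success, and how switching to HttpClient solved the problem. The next step is to use HtmlUnit to parse the page after dynamic rendering.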

Workflow:

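I have a list of words and want to check whether each one has an entry on Baidu Baike, which means requesting specific URLs that contain Chinese characters. (As an aside, the Chinese part of the URL has to be percent-encoded; requesting the raw URL produces garbled text.) Baike also redirects entries heavily (301, 302, and so on), so redirects must be handled too; that is not the focus of this post, so I skip the details. I wrapped the returned input stream in a BufferedReader, but readLine() is a blocking method: on a bad network it may never receive a line terminator or end of stream, and then it blocks indefinitely.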
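
As a small illustration of the encoding aside above, the sketch below percent-encodes the word before appending it to the search URL. The class name BaikeUrls and the method searchUrl are hypothetical names chosen for this example; only URLEncoder.encode comes from the JDK, and the search path is the one used later in this post.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class BaikeUrls {
    // Hypothetical helper: builds the Baike search URL for one word,
    // percent-encoding the word as UTF-8 so non-ASCII characters are safe.
    static String searchUrl(String word) {
        try {
            return "https://baike.baidu.com/search/word?word=" + URLEncoder.encode(word, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // Two Chinese characters written as Unicode escapes, so this file stays ASCII-only.
        System.out.println(searchUrl("\u94A2\u7B14"));
    }
}
```

Note that URLEncoder produces form encoding, so a space becomes "+"; that is fine for a query parameter like this one but not for a URL path segment.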

For example, the following loop can hang in readLine():

		URL url = new URL("https://baike.baidu.com/search/word?word=Lamy");
		HttpURLConnection httpUrlConn = (HttpURLConnection) url.openConnection();
		httpUrlConn.connect();		
		InputStream inputStream = httpUrlConn.getInputStream();
		InputStreamReader inputStreamReader = new InputStreamReader(inputStream, "utf-8");
		BufferedReader bufferedReader = new BufferedReader(inputStreamReader);
		StringBuilder sb = new StringBuilder();
		String str = "";
		while ((str = bufferedReader.readLine()) != null) {// readLine() blocks: if no line terminator or end of stream arrives, the program stays stuck on this line
			sb.append(str);
		}
		bufferedReader.close();
		inputStreamReader.close();
		inputStream.close();
		httpUrlConn.disconnect();
		String res = sb.toString();
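Many posts online suggest removing the while loop and reading only once, or having the server close the socket to signal the client's BufferedReader. Neither fits my current requirement.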

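First, I cannot know the length of the data returned by Baidu Baike, so a single readLine() call cannot do the job. (Most of the advice online concerns the poster's own server, where client and server have agreed on the data length beforehand.)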

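Second, regarding closing the socket: because the network is unstable, I cannot tell whether the connection has dropped or the transfer is just very slow.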

Third, there is the commonly suggested fix of setting timeouts, for example:

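httpUrlConn.setConnectTimeout(300);// connect timeout, in milliseconds
httpUrlConn.setReadTimeout(100);// read timeout: how long to wait for data once connected, in milliseconds
This did not help in my case: the blocking happened after part of the data had already been received, so ReadTimeout had no effect. The Javadoc reads: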

void java.net.URLConnection.setReadTimeout(int timeout)


Sets the read timeout to a specified timeout, in milliseconds. A non-zero value specifies the timeout when reading from Input stream when a connection is established to a resource. If the timeout expires before there is data available for read, a java.net.SocketTimeoutException is raised. A timeout of zero is interpreted as an infinite timeout. 

Some non-standard implementation of this method ignores the specified timeout. To see the read timeout set, please call getReadTimeout().
Parameters:
timeout - an int that specifies the timeout value to be used in milliseconds
Throws:
IllegalArgumentException - if the timeout parameter is negative
Since:
1.5
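
One way to read the Javadoc above is that the timeout bounds each individual wait for data rather than the whole transfer, so a response that trickles in slowly can keep a read loop alive for a long time. The sketch below adds an overall time budget and a line limit on top of a plain read loop. DeadlineReader, readLines, maxLines and budgetMillis are hypothetical names, and the budget is only checked between lines, so with a real connection it would still rely on setReadTimeout to make each blocked read return.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

final class DeadlineReader {
    // Reads at most maxLines lines as UTF-8 and fails once the overall budget is used up.
    // The budget is checked after each line; it cannot interrupt a read that is already blocked.
    static String readLines(InputStream in, int maxLines, long budgetMillis) throws IOException {
        long deadline = System.nanoTime() + budgetMillis * 1_000_000L;
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            int count = 0;
            while (count < maxLines && (line = reader.readLine()) != null) {
                sb.append(line).append('\n');
                count++;
                if (System.nanoTime() - deadline > 0) {
                    throw new IOException("read budget of " + budgetMillis + " ms exceeded");
                }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for a network stream, so the demo needs no connection.
        InputStream in = new ByteArrayInputStream("first\nsecond\nthird\n".getBytes(StandardCharsets.UTF_8));
        System.out.print(readLines(in, 2, 2000));
    }
}
```

With an HttpURLConnection, the stream passed in would be the one returned by getInputStream(), and a SocketTimeoutException from a single stalled read would propagate to the caller as an IOException.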

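The failed approach follows Alper Akture's answer on Stack Overflow: a second thread uses a CountDownLatch to watch the reading thread. When latch.await(2000, TimeUnit.MILLISECONDS) returns false because of a timeout, the reading thread's connection is assumed to be stuck (or reading too slowly because of the network), so the watcher closes the HttpURLConnection's stream to make the BufferedReader leave its blocked state, and then requests the data again. (Note that blocking I/O and waiting on a synchronized block are not interruptible, so calling Thread.interrupt() directly cannot get the BufferedReader out of its blocked state; the example on page 696, chapter 21 of Thinking in Java, 4th edition, illustrates this. In my program, interrupt() only tells the slow reading thread that its data is no longer valid.)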

(There is no need to read the code closely, since it did not work.)

class Read implements Runnable {
	public String url;
	BlockingQueue<String> queue;
	private CountDownLatch latch;
	String word;
	InputStream inputStream = null;
	InputStreamReader inputStreamReader = null;
	BufferedReader bufferedReader = null;
	HttpURLConnection httpUrlConn = null;
		
	public Read(String s, BlockingQueue<String> queue, CountDownLatch latch,String word) {
		url = s;
		this.queue = queue;
		this.latch = latch;
		this.word = word;
	}

	public void run() {
		while (true) {
			StringBuffer buffer = new StringBuffer();			
			try {
				URL url = new URL(this.url);
				httpUrlConn = (HttpURLConnection) url.openConnection();
				httpUrlConn.setRequestProperty("User-agent",
						"Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.215 Safari/535.1");
				// httpUrlConn.setRequestProperty("accept-language", "zh-CN");
				// httpUrlConn.setRequestMethod();
				httpUrlConn.setConnectTimeout(300);
				httpUrlConn.setReadTimeout(100);

				System.out.println("Request URL ... " + URLDecoder.decode(this.url, "utf-8"));// 41

				boolean redirect = false;

				// normally, 3xx is redirect
				int status = httpUrlConn.getResponseCode();
				if (status != HttpURLConnection.HTTP_OK) {
					if (status == HttpURLConnection.HTTP_MOVED_TEMP || status == HttpURLConnection.HTTP_MOVED_PERM
							|| status == HttpURLConnection.HTTP_SEE_OTHER)
						redirect = true;
				}

				System.out.println("Response Code ... " + status);

				while (redirect) {// handle redirects
					// get redirect url from "location" header field
					String newUrl = httpUrlConn.getHeaderField("Location");
					// get the cookie if need, for login
					// String cookies =
					// httpUrlConn.getHeaderField("Set-Cookie");
					// open the new connnection again
					httpUrlConn = (HttpURLConnection) new URL(newUrl).openConnection();
					// httpUrlConn.setRequestProperty("Cookie", cookies);
					httpUrlConn.setRequestProperty("User-agent",
							"Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.215 Safari/535.1");
					System.out.println("Redirect to URL : " + URLDecoder.decode(newUrl, "utf-8"));
					status = httpUrlConn.getResponseCode();
					// keep following only while the server still answers with a redirect status
					redirect = status == HttpURLConnection.HTTP_MOVED_TEMP || status == HttpURLConnection.HTTP_MOVED_PERM
							|| status == HttpURLConnection.HTTP_SEE_OTHER;
					System.out.println("Response Code ... " + status);
				}
				httpUrlConn.connect();
				inputStream = httpUrlConn.getInputStream();
				inputStreamReader = new InputStreamReader(inputStream, "utf-8");
				bufferedReader = new BufferedReader(inputStreamReader);
				int countline = 0;
				String str = null;
				while ((str = bufferedReader.readLine()) != null) {
					countline++;
					buffer.append(str);// append the line that was just read
					if (countline > 100)// I only parse part of the page, so keeping about the first 100 lines is enough
						break;
				}				
				String res = buffer.toString();
				if(!Thread.interrupted()){
					queue.add(res);
					System.out.println("queue has add : "+word);
				}				
				latch.countDown();
				break;
			} catch (SocketTimeoutException e) {
				System.err.println("Socket time out!");
				continue;				
			} catch (IOException e) {
				// TODO Auto-generated catch block
				e.printStackTrace();
			} finally{
				try {
					if(bufferedReader!=null)
					bufferedReader.close();
				} catch (IOException e) {
					// TODO Auto-generated catch block
					e.printStackTrace();
				}
				try {
					if(inputStreamReader!=null)
					inputStreamReader.close();
				} catch (IOException e) {
					// TODO Auto-generated catch block
					e.printStackTrace();
				}

				try {
					if(inputStream!=null)
					inputStream.close();
				} catch (IOException e) {
					// TODO Auto-generated catch block
					e.printStackTrace();
				}
				if (httpUrlConn != null)// openConnection() may have failed before assigning the field
					httpUrlConn.disconnect();
			}
		}
	}
}
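
The redirect checks in the class above could be factored into two small helpers. This is a minimal sketch, not the code this post used: Redirects, isRedirect and resolve are hypothetical names, and the "/item/Lamy" path in main is made up for illustration. It also resolves a relative Location header against the current URL, which the loop above does not do.

```java
import java.net.HttpURLConnection;
import java.net.URI;

final class Redirects {
    // True for the three redirect statuses the loop above checks (301, 302, 303).
    static boolean isRedirect(int status) {
        return status == HttpURLConnection.HTTP_MOVED_PERM
                || status == HttpURLConnection.HTTP_MOVED_TEMP
                || status == HttpURLConnection.HTTP_SEE_OTHER;
    }

    // Resolves a Location header (absolute or relative) against the URL that produced it.
    // Both arguments are assumed to be syntactically valid URIs (for example, no raw spaces).
    static String resolve(String currentUrl, String location) {
        if (location == null) {
            throw new IllegalArgumentException("missing Location header");
        }
        return URI.create(currentUrl).resolve(location).toString();
    }

    public static void main(String[] args) {
        // "/item/Lamy" is a made-up Location value used only to show the resolution step.
        System.out.println(resolve("https://baike.baidu.com/search/word?word=Lamy", "/item/Lamy"));
    }
}
```

A caller would loop while isRedirect(status) holds, open resolve(currentUrl, location) each time, and stop after a fixed number of hops so that a redirect cycle cannot run forever.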

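The code that supervises the read runs in another thread. The queue is a blocking queue defined earlier for downstream processing and is specific to my application: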

			boolean success = false;
			int temp0 = 0;
			while (!success) {
				System.out.println("**************in new location************");
				CountDownLatch latch = new CountDownLatch(1);
				Read r = new Read(str, queue, latch,word);
				Thread t =new Thread(r);
				t.setName(word+"-"+temp0++);
				t.start();
				System.out.println("Thread:"+t.getName()+" start");
				try {
					success = latch.await(2000, TimeUnit.MILLISECONDS);
					if(!success){
						t.interrupt();
						r.httpUrlConn.getInputStream().close();				
						System.out.println(t.getName()+" call interrupt");
						continue;
					}
				} catch (InterruptedException e) {
					// TODO Auto-generated catch block
					//e.printStackTrace();
					System.err.println("latch.await:InterruptedException");
				} catch (IOException e) {
					// TODO Auto-generated catch block
					//e.printStackTrace();
					System.err.println("httpUrlConn.getInputStream().close() Exception");
				}
			}

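Why it failed: the reading thread holds the lock on the InputStream, so the monitoring thread blocks as well when it tries to close the stream, and both threads end up blocked. You can observe this with the JConsole tool in the bin folder of the JDK installation. (It is easy to use: on Windows, for example, double-click to launch it and connect to the Java process you want to inspect.)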
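
A variation that at least keeps the supervising side from blocking is to wait on a Future with a timeout and never touch the stream from the outside. This is a sketch under stated assumptions, not a fix for the stuck read itself: BoundedCall and callWithTimeout are hypothetical names, and the worker thread may stay blocked in socket I/O after the timeout, which is why it is created as a daemon thread.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class BoundedCall {
    // Runs the task on a daemon thread and waits at most timeoutMillis for its result.
    // On timeout the caller gets a TimeoutException; the worker is only asked to stop.
    static <T> T callWithTimeout(Callable<T> task, long timeoutMillis) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-call");
            t.setDaemon(true); // a stuck read must not keep the JVM alive
            return t;
        });
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // sends an interrupt; blocked socket I/O may ignore it
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in task; in this post's setting it would perform the HTTP read.
        System.out.println(callWithTimeout(() -> "done", 2000));
    }
}
```

The trade-off is that an abandoned worker can still hold its connection until the socket's own timeout fires, so this bounds the caller's wait rather than the resources used.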


What worked: use HttpClient instead of URLConnection.

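Many thanks to struggleee_luo, whose blog post (linked at the end) describes a task similar to mine and a very similar problem. The code below includes redirect handling; remove it if you do not need it.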

Code:

	public static String staticDownloadByHttpClient(String urlstr, String encoding, String param) {
		String bufferStr = null;

		// Build an HTTP client with redirect support, using the library's existing helper classes
		HttpClientBuilder builder = HttpClients.custom().disableAutomaticRetries() // turn off automatic request retries
				.setRedirectStrategy(new LaxRedirectStrategy());// LaxRedirectStrategy also follows redirects for POST requests

		CloseableHttpClient httpclient = builder.build();

		// Alternative: create a default HttpClient instance.
		// CloseableHttpClient httpclient = HttpClients.createDefault();

		// Create the POST request
		HttpPost httppost = new HttpPost(urlstr);

		// Set the socket timeout: this is the important part
		RequestConfig requestConfig = RequestConfig.custom().setSocketTimeout(6000).setConnectTimeout(6000).build();// socket (read) and connect timeouts, in milliseconds
		httppost.setConfig(requestConfig);

		// Build the form parameter list
		List<NameValuePair> formparams = new ArrayList<NameValuePair>();
		String name = param.split("=")[0];
		String value = param.split("=")[1];
		formparams.add(new BasicNameValuePair(name, value));
		UrlEncodedFormEntity uefEntity;
		try {
			uefEntity = new UrlEncodedFormEntity(formparams, "UTF-8");
			httppost.setEntity(uefEntity);

			CloseableHttpResponse response = httpclient.execute(httppost);

			if (response == null) {
				httpclient.close();
				return bufferStr;
			}

			try {
				HttpEntity entity = response.getEntity();
				if (entity != null) {
					// Without a socket (read) timeout, this statement can block
					bufferStr = EntityUtils.toString(entity, encoding);
				}
				try {
					EntityUtils.consume(entity);
				} catch (final IOException ignore) {
				}
			} finally {
				response.close();
			}
		} catch (ClientProtocolException e) {
			e.printStackTrace();
		} catch (UnsupportedEncodingException e) {
			e.printStackTrace();
		} catch (IOException e) {
			e.printStackTrace();
		} finally {
			// Close the client and release its resources
			try {
				httpclient.close();
			} catch (IOException e) {
				e.printStackTrace();
			}
		}
		return bufferStr;
	}

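Because Baidu's pages are rendered dynamically, parsing the HTML fetched this way loses some information. As a next step I plan to use HtmlUnit to parse the page after it has been rendered.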

Suggestions for a better approach are welcome.

Thanks again to:

struggleee_luo

http://blog.youkuaiyun.com/u010695420/article/details/53898526

