HtmlUnit (1): Fixing a File Handle Leak

Main page
http://htmlunit.sourceforge.net/

Document
http://www.w3.org/TR/html401/interact/forms.html#adef-tabindex

We use HtmlUnit 2.6. After adopting this open-source project, we ran into a file handle leak.
Our system throws a 'too many open files' exception, and we are sure that it is caused by TCP connections stuck in the CLOSE_WAIT state. CLOSE_WAIT means the remote side has closed the connection but our process has not yet closed its own socket, so each such connection still holds a file handle. The number of CLOSE_WAIT connections starts growing as soon as JBoss starts,
and it never levels off.

The log from the server showed many connections like these:
TCP yo-in-f190.ie100.net:http (CLOSE_WAIT)
TCP iad04s01-in-f99.ie100.net:http (CLOSE_WAIT)
TCP a72-246-208-9.deploy.akamaitechnologies.com:http (CLOSE_WAIT)
TCP a72-246-113-163.deploy.akamaitechnologies.com:http (CLOSE_WAIT)
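
To see which remote hosts account for the leak, lines like these can be tallied per host. The following is only a minimal sketch: the class name CloseWaitCounter is hypothetical, and it assumes every line has the three-column shape shown above (protocol, host:port, state).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of HtmlUnit: tallies CLOSE_WAIT sockets per remote host.
class CloseWaitCounter {

    // Assumes lines shaped like "TCP host:port (CLOSE_WAIT)", as in the log above.
    static Map<String, Integer> countByHost(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.endsWith("(CLOSE_WAIT)")) {
                continue; // ignore sockets in any other state
            }
            String[] fields = trimmed.split("\\s+");
            if (fields.length < 3) {
                continue; // not the expected three-column shape
            }
            String endpoint = fields[fields.length - 2]; // e.g. "yo-in-f190.ie100.net:http"
            int colon = endpoint.lastIndexOf(':');
            String host = colon > 0 ? endpoint.substring(0, colon) : endpoint;
            Integer old = counts.get(host);
            counts.put(host, old == null ? 1 : old + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "TCP yo-in-f190.ie100.net:http (CLOSE_WAIT)",
            "TCP a72-246-208-9.deploy.akamaitechnologies.com:http (CLOSE_WAIT)",
            "TCP yo-in-f190.ie100.net:http (CLOSE_WAIT)");
        System.out.println(countByHost(sample));
    }
}
```

A count that keeps rising for the same few hosts suggests that the application, not the network, is holding the sockets open.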

We first changed some server settings.
1. We shortened TCP_KEEPALIVE_TIME to 1800 seconds.
On my test Linux server, I added the following lines to /etc/sysctl.conf and restarted the network:
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_probes = 2
net.ipv4.tcp_keepalive_intvl = 2
2. We raised the open-file limit (ulimit) to 8000 so that the process can hold more open files.
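
As a rough worked example of how the three keepalive values combine: the kernel waits tcp_keepalive_time seconds of idleness before the first probe, then sends a probe every tcp_keepalive_intvl seconds and drops the connection after tcp_keepalive_probes unanswered probes. The sketch below only does that arithmetic; the class and method names are hypothetical.

```java
// Hypothetical helper: estimates how long a silent connection survives under given keepalive settings.
class KeepAliveMath {

    // Idle seconds before the first probe, plus one interval per unanswered probe.
    static int secondsUntilDropped(int keepaliveTime, int probes, int intervalSeconds) {
        return keepaliveTime + probes * intervalSeconds;
    }

    public static void main(String[] args) {
        System.out.println(secondsUntilDropped(600, 2, 2));   // values from /etc/sysctl.conf above: 604
        System.out.println(secondsUntilDropped(7200, 9, 75)); // usual Linux defaults: 7875
    }
}
```

These kernel settings only apply to sockets on which SO_KEEPALIVE has been enabled.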

But that did not work well; it could not fix the problem. After that, we concluded that the problem was in HtmlUnit itself.

We changed some of the HtmlUnit 2.6 source code. Our changes follow:
1. Configuration file
src/main/java/com/gargoylesoftware/htmlunit/http_connection_pool.properties, with these settings:
DEFAULT_MAX_CONNECTIONS_PER_HOST=50
#Timeout in milliseconds
CONNECTION_TIMEOUT=300000
SO_TIMEOUT=300000
MAX_TOTAl_CONNECTIONS=500
#RECEIVE_BUFFER_SIZE=65535
#SEND_BUFFER_SIZE=65535
DEFAULT_MAX_CONNECTIONS_PER_HOST=60000
IDLE_TIMEOUT=30000
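
Two details of how java.util.Properties reads such a file are worth checking: when a key appears twice (as DEFAULT_MAX_CONNECTIONS_PER_HOST does above), the later value replaces the earlier one, and keys are case-sensitive (note the lowercase "l" in MAX_TOTAl_CONNECTIONS). The sketch below demonstrates both on an abbreviated copy of the file; PoolConfigDemo and its methods are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical demo class: shows how java.util.Properties reads a file like the one above.
class PoolConfigDemo {

    // Abbreviated copy of the settings above, kept inline so the demo is self-contained.
    static final String SAMPLE =
          "DEFAULT_MAX_CONNECTIONS_PER_HOST=50\n"
        + "#Timeout in milliseconds\n"
        + "CONNECTION_TIMEOUT=300000\n"
        + "MAX_TOTAl_CONNECTIONS=500\n"
        + "DEFAULT_MAX_CONNECTIONS_PER_HOST=60000\n";

    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException("could not parse properties", e);
        }
        return props;
    }

    // Reads an integer setting, falling back to a default when the key is absent.
    static int intSetting(Properties props, String key, int fallback) {
        String raw = props.getProperty(key);
        return raw == null ? fallback : Integer.parseInt(raw.trim());
    }

    public static void main(String[] args) {
        Properties props = load(SAMPLE);
        // The second definition of the key wins.
        System.out.println(intSetting(props, "DEFAULT_MAX_CONNECTIONS_PER_HOST", 50)); // 60000
        // Keys are case-sensitive: the all-uppercase spelling is not found, so the fallback is used.
        System.out.println(intSetting(props, "MAX_TOTAL_CONNECTIONS", -1));            // -1
        System.out.println(intSetting(props, "MAX_TOTAl_CONNECTIONS", -1));            // 500
    }
}
```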

2. Web connection class
src/main/java/com/gargoylesoftware/htmlunit/HttpWebConnection.java
Modify the method that creates the HttpClient so that MultiThreadedHttpConnectionManager is a singleton in our system.
protected HttpClient createHttpClient(){
    // The original code created a new connection manager for every HttpClient:
    // final MultiThreadedHttpConnectionManager connectionManager = new MultiThreadedHttpConnectionManager();
    // Now every HttpClient shares the single connection manager from our factory.
    final MultiThreadedHttpConnectionManager connectionManager =
        com.gargoylesoftware.htmlunit.MultiThreadedHttpConnectionManagerFactory.getInstance();
    HttpClient client = new HttpClient(connectionManager);
    // Send "Connection: close" by default so the connection is not kept alive after the response.
    HostConfiguration hostConf = client.getHostConfiguration();
    List<Header> headers = new ArrayList<Header>();
    headers.add(new Header("Connection", "close"));
    hostConf.getParams().setParameter("http.default-headers", headers);
    return client;
}

3. Factory class that creates the MultiThreadedHttpConnectionManager
src/main/java/com/gargoylesoftware/htmlunit/MultiThreadedHttpConnectionManagerFactory.java:

package com.gargoylesoftware.htmlunit;

import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
import org.apache.commons.httpclient.params.HttpConnectionManagerParams;
import org.apache.commons.httpclient.util.IdleConnectionTimeoutThread;

public class MultiThreadedHttpConnectionManagerFactory
{
    // volatile is needed for the double-checked locking below to be safe
    private static volatile MultiThreadedHttpConnectionManager instance;

    public static MultiThreadedHttpConnectionManager getInstance()
    {
        InputStream is = null;
        HttpConnectionManagerParams param = null;
        Properties prop = null;
        if (null == instance)
        {
            synchronized (MultiThreadedHttpConnectionManagerFactory.class)
            {
                if (null == instance)
                {
                    param = new HttpConnectionManagerParams();
                    is = MultiThreadedHttpConnectionManagerFactory.class.getResourceAsStream("http_connection_pool.properties");
                    prop = new Properties();
                    try
                    {
                        // if the file is missing, fall back to the defaults below
                        if (is != null)
                        {
                            prop.load(is);
                        }
                    }
                    catch (IOException e)
                    {
                        e.printStackTrace();
                    }
                    finally
                    {
                        // close the stream so that reading the configuration does not leak a file handle
                        if (is != null)
                        {
                            try
                            {
                                is.close();
                            }
                            catch (IOException ignored)
                            {
                                // nothing more to do
                            }
                        }
                    }

                    param.setDefaultMaxConnectionsPerHost(Integer.parseInt(prop.getProperty("DEFAULT_MAX_CONNECTIONS_PER_HOST", "50")));
                    param.setSoTimeout(Integer.parseInt(prop.getProperty("SO_TIMEOUT", "30000")));
                    param.setConnectionTimeout(Integer.parseInt(prop.getProperty("CONNECTION_TIMEOUT", "30000")));
                    param.setMaxTotalConnections(Integer.parseInt(prop.getProperty("MAX_TOTAl_CONNECTIONS", "500")));

                    MultiThreadedHttpConnectionManager newM = new MultiThreadedHttpConnectionManager();
                    newM.setParams(param);
                    instance = newM;
                    // register an idle-connection timeout thread that checks the pool every 30 seconds
                    IdleConnectionTimeoutThread idleThread = new IdleConnectionTimeoutThread();
                    idleThread.setTimeoutInterval(1000 * 30);
                    idleThread.setConnectionTimeout(Integer.parseInt(prop.getProperty("CONNECTION_TIMEOUT", "30000")));
                    idleThread.addConnectionManager(instance);
                    idleThread.start();
                }
            }
        }

        return instance;
    }
}
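
For comparison, the same lazy, thread-safe, create-once behavior can also be obtained without explicit locking, using the initialization-on-demand holder idiom. This is an alternative technique, not what the factory above does, and the sketch uses a hypothetical SharedPool class as a stand-in for the connection manager.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an expensive shared resource such as a connection manager.
class SharedPool {
    // Counts constructions so the demo can verify that only one instance is ever built.
    static final AtomicInteger CREATED = new AtomicInteger();

    private SharedPool() {
        CREATED.incrementAndGet();
    }

    // The JVM initializes Holder lazily and exactly once, on the first call to getInstance().
    private static final class Holder {
        static final SharedPool INSTANCE = new SharedPool();
    }

    static SharedPool getInstance() {
        return Holder.INSTANCE;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(new Runnable() {
                public void run() {
                    getInstance();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println("instances created: " + CREATED.get()); // instances created: 1
    }
}
```

The trade-off is that the holder idiom gives no natural place to handle a checked exception during construction, which is one reason to keep an explicit factory method when configuration has to be loaded from a file.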

4. The right way to use htmlunit2.6-cusomer.jar
After we finish loading the web pages, we do this cleanup:
if (this.wc != null){
    List<TopLevelWindow> windows = this.wc.getTopLevelWindows();
    if (Log.isDebugEnabled(this)){
        if (windows != null && !windows.isEmpty()){
            for (int i = 0; i < windows.size(); i++){
                TopLevelWindow window = windows.get(i);
                History histories = window.getHistory();
                for (int j = 0; j < histories.getLength(); j++){
                    URL url = histories.getUrl(j);
                    Log.info(this, "Window=" + window.getName() + " : url=" + url.toString());
                }
            }
        }
    }
    this.wc.closeAllWindows();
    this.wc = null;
}
Here this.wc is the WebClient. We have to call the closeAllWindows() method, as the official website suggests.
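
One way to make sure the cleanup call is never skipped, even when loading a page throws, is to put it in a finally block. The sketch below shows only that pattern; StubClient is a hypothetical stand-in for WebClient (it is not HtmlUnit code), so that the example runs on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for WebClient; only the cleanup call matters here.
class StubClient {
    final List<String> events = new ArrayList<String>();

    void load(String url) {
        events.add("load " + url);
        if (url.isEmpty()) {
            throw new IllegalArgumentException("empty URL");
        }
    }

    void closeAllWindows() {
        events.add("closeAllWindows");
    }
}

class CleanupDemo {
    // Runs the page work and always releases the client, even when the work throws.
    static List<String> fetch(String url) {
        StubClient client = new StubClient();
        try {
            client.load(url);
        } catch (IllegalArgumentException e) {
            client.events.add("failed: " + e.getMessage());
        } finally {
            client.closeAllWindows();
        }
        return client.events;
    }

    public static void main(String[] args) {
        System.out.println(fetch("http://htmlunit.sourceforge.net/"));
        System.out.println(fetch(""));
    }
}
```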

The most important change is:
IdleConnectionTimeoutThread idleThread = new IdleConnectionTimeoutThread();
idleThread.setTimeoutInterval(1000 * 30);
idleThread.setConnectionTimeout(Integer.parseInt(prop.getProperty("CONNECTION_TIMEOUT", "30000")));
idleThread.addConnectionManager(instance);
idleThread.start();

We use a background thread to watch for idle connections and release them. This solved the problem. I hope this is useful if you use HtmlUnit too.
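
To illustrate the idea behind such a watcher thread, here is a simplified sketch written against the standard library only. It is not the implementation of IdleConnectionTimeoutThread; IdleReaper and its methods are hypothetical names, and connections are represented by plain string ids rather than real sockets.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified pool: remembers when each idle connection was last used.
class IdleReaper {
    private final Map<String, Long> lastUsedMillis = new LinkedHashMap<String, Long>();
    private final long idleTimeoutMillis;

    IdleReaper(long idleTimeoutMillis) {
        this.idleTimeoutMillis = idleTimeoutMillis;
    }

    // Called when a connection is returned to the pool.
    synchronized void release(String connectionId, long nowMillis) {
        lastUsedMillis.put(connectionId, nowMillis);
    }

    // Removes every connection idle for at least the timeout; returns how many were removed.
    synchronized int closeIdle(long nowMillis) {
        int closed = 0;
        Iterator<Map.Entry<String, Long>> it = lastUsedMillis.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMillis - it.next().getValue() >= idleTimeoutMillis) {
                it.remove(); // a real pool would also close the underlying socket here
                closed++;
            }
        }
        return closed;
    }

    synchronized int size() {
        return lastUsedMillis.size();
    }

    // Background loop: wake up every intervalMillis and sweep the pool.
    Thread startSweeper(final long intervalMillis) {
        Thread t = new Thread(new Runnable() {
            public void run() {
                try {
                    while (true) {
                        Thread.sleep(intervalMillis);
                        closeIdle(System.currentTimeMillis());
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // stop sweeping when interrupted
                }
            }
        }, "idle-reaper");
        t.setDaemon(true); // do not keep the JVM alive just for the sweeper
        t.start();
        return t;
    }
}
```

Passing the current time into closeIdle keeps the sweep logic easy to test without waiting for real time to pass.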