[Everyday Issue] Error when setting a proxy for jsoup

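Problem: a jsoup crawler throws an error after a proxy is configured.
Fix: the port was originally set as an int; setting it as a String solves the problem. The reason is that Java system properties are read back as strings: `Properties.getProperty` (which `System.getProperty` uses) returns `null` for a value that is not a `String`, so a port stored as an `Integer` is never seen and the default port is used instead.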
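A minimal sketch of why the int port gets lost, using only the JDK. It uses a hypothetical key `demo.proxyPort` so the real `https.proxyPort` is left alone, and `Integer.getInteger` stands in for whatever the JDK uses internally to read an int-valued property:

```java
public class ProxyPortDemo {

    // Reads an int from a system property; Integer.getInteger calls
    // System.getProperty(key) and falls back to defaultPort if that is null.
    public static int readPort(String key, int defaultPort) {
        return Integer.getInteger(key, defaultPort);
    }

    public static void main(String[] args) {
        // Hypothetical key, so real proxy settings are not touched
        String key = "demo.proxyPort";

        // Wrong: put() stores an Integer, which getProperty() does not return
        System.getProperties().put(key, 3128);
        System.out.println(System.getProperty(key)); // prints null
        System.out.println(readPort(key, 443));      // prints 443, not 3128

        // Right: setProperty() only accepts Strings
        System.setProperty(key, "3128");
        System.out.println(System.getProperty(key)); // prints 3128
        System.out.println(readPort(key, 443));      // prints 3128
    }
}
```

Using `System.setProperty` instead of `System.getProperties().put` makes the compiler reject a non-String value, so this mistake cannot happen in the first place.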

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.junit.Test;

@Test
public void internetTest() throws Exception {
	System.setProperty("http.maxRedirects", "50");
	System.setProperty("proxySet", "true");
	// HTTPS proxy; System.setProperty only accepts Strings, so the port cannot be stored as an int by mistake
	System.setProperty("https.proxyHost", "proxy.piccnet.com.cn");
	System.setProperty("https.proxyPort", "3128"); // note: the port is a String

	String agent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36";
	Document doc = Jsoup.connect("https://www.zhihu.com/question/21832486")
	        .userAgent(agent)
	        .ignoreHttpErrors(true) // important: otherwise jsoup throws "HTTP error fetching URL. Status=404"
	        .timeout(3000)          // timeout in milliseconds
	        .get();
	if (doc != null) {
		System.err.println(doc.body().html());
	}
}
Exception thrown:
java.net.SocketException: Unexpected end of file from server
    at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:782)
    at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:641)
    at sun.net.www.protocol.http.HttpURLConnection.doTunneling(HttpURLConnection.java:1618)
    at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:164)
    at sun.net.www.protocol.https.HttpsURLConnectionImpl.connect(HttpsURLConnectionImpl.java:133)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:425)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:410)
    at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:164)
    at org.jsoup.helper.HttpConnection.get(HttpConnection.java:153)
    at jsoup.JavaInterviewQuestions.JavaInterviewTi.internetTest(JavaInterviewTi.java:47)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
    at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
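The proxy properties set in the test above are JVM-wide, so they stay in effect for every later connection in the same JVM. A sketch of a hypothetical helper (`ProxyProperties.withHttpsProxy` is not from the original post or from jsoup) that sets them only while one action runs and then restores the old values. It takes the port as an int but always stores it as a String, and it assumes the JDK reads the proxy properties when each connection is opened:

```java
import java.util.concurrent.Callable;

public class ProxyProperties {

    private static final String HOST_KEY = "https.proxyHost";
    private static final String PORT_KEY = "https.proxyPort";

    // Sets the HTTPS proxy system properties while `action` runs, then restores
    // the previous values. Not thread-safe: system properties are shared JVM-wide.
    public static <T> T withHttpsProxy(String host, int port, Callable<T> action) throws Exception {
        String oldHost = System.getProperty(HOST_KEY);
        String oldPort = System.getProperty(PORT_KEY);
        System.setProperty(HOST_KEY, host);
        System.setProperty(PORT_KEY, String.valueOf(port)); // always a String
        try {
            return action.call();
        } finally {
            restore(HOST_KEY, oldHost);
            restore(PORT_KEY, oldPort);
        }
    }

    // Puts back the old value, or removes the key if it was not set before
    private static void restore(String key, String oldValue) {
        if (oldValue == null) {
            System.clearProperty(key);
        } else {
            System.setProperty(key, oldValue);
        }
    }
}
```

Inside a test method declared `throws Exception`, usage could look like `Document doc = ProxyProperties.withHttpsProxy("proxy.piccnet.com.cn", 3128, () -> Jsoup.connect(url).userAgent(agent).get());`.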
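Jsoup is a popular Java library for fetching and parsing HTML and XML documents. Besides the JVM-wide system properties used above, newer jsoup versions also accept a per-request proxy through `Connection.proxy(host, port)`. Another option, when you need to fetch pages through a proxy server, is to let a separate HTTP client handle the proxy and then pass the downloaded HTML to jsoup.

In that case, first create an HTTP client that supports proxies, such as OkHttp or Apache HttpClient. The following simple example shows how to set a proxy with OkHttp:

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.Response;

// Create an OkHttpClient instance with a proxy (java.net.Proxy takes a socket address, not a "host:port" string)
OkHttpClient client = new OkHttpClient.Builder()
        .proxy(new Proxy(Proxy.Type.HTTP, new InetSocketAddress("your_proxy_host", your_proxy_port))) // your proxy host and port
        .build();

// Send a request with the client; the response is closed automatically
Request request = new Request.Builder().url("http://example.com").build();
try (Response response = client.newCall(request).execute()) {
    // handle the response...
}
```

Here `your_proxy_host` and `your_proxy_port` must be replaced with your actual proxy server address and port number.

With Apache HttpClient, the proxy is set in a similar way:

```java
import org.apache.http.HttpHost;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

// Create an HttpClient instance with a proxy
HttpHost proxy = new HttpHost("your_proxy_host", your_proxy_port, "http");
CloseableHttpClient httpclient = HttpClients.custom()
        .setProxy(proxy)
        .build();

HttpGet httpget = new HttpGet("http://example.com");
// The proxy can also be set per request through RequestConfig
RequestConfig config = RequestConfig.custom()
        .setProxy(proxy)
        .build();
httpget.setConfig(config);

try (CloseableHttpResponse response = httpclient.execute(httpget)) {
    // handle the response...
}
```

Remember to replace `your_proxy_host` and `your_proxy_port` in the code above with your actual proxy settings. Also make sure your environment allows outbound network connections, and respect applicable laws and the target site's robots.txt.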
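For comparison, a sketch that swaps in the JDK's built-in `java.net.http.HttpClient` (Java 11+) instead of OkHttp or Apache HttpClient, so no third-party dependency is needed. This is not from the original post; the class `JdkProxyFetch` and its methods are hypothetical names. `ProxySelector.of` sends both http and https requests through the given HTTP proxy:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ProxySelector;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class JdkProxyFetch {

    // Builds a client that routes http/https requests through an HTTP proxy
    public static HttpClient proxiedClient(String proxyHost, int proxyPort) {
        return HttpClient.newBuilder()
                .proxy(ProxySelector.of(new InetSocketAddress(proxyHost, proxyPort)))
                .connectTimeout(Duration.ofSeconds(3))
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
    }

    // Downloads a page body as a String; this is where the network call happens
    public static String fetch(HttpClient client, String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", "Mozilla/5.0")
                .GET()
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

The returned HTML can then be parsed with `Jsoup.parse(html, url)`, where the second argument is the base URI used to resolve relative links.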