Fetching Web Page Data with HttpClient and by Simulating Browser Requests

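This article describes two ways to fetch JSON data from a web page: simulating a GET request with HttpClient, and simulating a browser request with HttpsURLConnection. It shows how to set the request headers (accepted types, connection mode, content type, and so on) and how to read the response and save it to a local file.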


I recently needed to save JSON data served by a web page, and found two ways to do it:

Method 1: simulate a GET request to fetch the data. Use POST or GET depending on what the server actually requires.

            // Imports (Apache Commons HttpClient 3.x and the JDK):
            //   org.apache.commons.httpclient.HttpClient
            //   org.apache.commons.httpclient.methods.GetMethod
            //   java.io.BufferedOutputStream, java.io.FileOutputStream
            //   java.nio.charset.StandardCharsets
            HttpClient httpClient = new HttpClient();
            GetMethod getMethod = new GetMethod(dataUrlLocation);

//            getMethod.addRequestHeader("Accept", "*/*");
//            getMethod.addRequestHeader("Connection", "keep-alive");
            // Declare the format as JSON
            getMethod.addRequestHeader("Content-Type", "application/json; charset=UTF-8");
//            getMethod.addRequestHeader("Cookie", cookies[cookies.length-1].toString());
//            getMethod.addRequestHeader("Referer", "https://data.okc.gov/portal/page/bingmap?datasetName=Work%20Zones&mapScale=11");
//            getMethod.addRequestHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3573.0 Safari/537.36");
//            getMethod.addRequestHeader("X-Requested-With", "XMLHttpRequest");

            String text;
            try {
                httpClient.executeMethod(getMethod);            // send the simulated GET request
                text = getMethod.getResponseBodyAsString();
            } finally {
                getMethod.releaseConnection();                  // return the connection to the client
            }

            // Write the response to the file
            int BUF_SIZE = 1024;
            String unzipped_file = DATA_FILE;
            out_stream = new BufferedOutputStream(
                    new FileOutputStream(unzipped_file), BUF_SIZE);
            // Encode explicitly instead of relying on the platform default charset
            byte[] input_buffer = text.getBytes(StandardCharsets.UTF_8);
            out_stream.write(input_buffer, 0, input_buffer.length);
            out_stream.close();                                 // flushes the buffered bytes to disk
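
The file-writing step of Method 1 can also be written with the JDK's `java.nio.file` API alone. The following is a minimal sketch, not the original author's code: the class name `ResponseSaver` and its two methods are hypothetical names chosen for illustration, and it assumes the response should be stored as UTF-8.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the original article.
class ResponseSaver {

    /** Writes the response text to target as UTF-8, replacing any existing content. */
    static void saveText(String text, Path target) throws IOException {
        // Files.write opens, writes, and closes the file in one call,
        // so there is no stream left open if the write fails.
        Files.write(target, text.getBytes(StandardCharsets.UTF_8));
    }

    /** Reads the file back as UTF-8 text, useful for checking what was saved. */
    static String readText(Path source) throws IOException {
        return new String(Files.readAllBytes(source), StandardCharsets.UTF_8);
    }
}
```

With this helper, the tail of Method 1 would become something like `ResponseSaver.saveText(text, Paths.get(DATA_FILE));`. Fixing the charset keeps the saved file identical across machines whose default encodings differ.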

Method 2: simulate a browser sending the request

			// Imports: javax.net.ssl.HttpsURLConnection, java.net.URL, java.io.InputStream,
			//          java.io.BufferedOutputStream, java.io.FileOutputStream
			HttpsURLConnection conn = (HttpsURLConnection) new URL(dataUrlLocation).openConnection();
//			conn.setSSLSocketFactory(new TCITLSSocketConnectionFactory());
			conn.setConnectTimeout((int) retryWaitTime);
			conn.setReadTimeout((int) retryWaitTime);
			conn.setDoInput(true);
			// setDoOutput(true) is omitted: this GET request sends no body
			conn.setRequestProperty("Accept", "*/*");
//			conn.setRequestProperty("Accept-Encoding", "gzip, deflate, br");
//			conn.setRequestProperty("Accept-Language", "zh-CN,zh;q=0.9,en-US;q=0.8,en;q=0.7");
			conn.setRequestProperty("Connection", "keep-alive");
//			conn.setRequestProperty("Content-Length", param.getBytes().length + "");
			conn.setRequestProperty("Content-Type", "application/json; charset=UTF-8");
			// Pass the cookie value itself, not the expression as a string literal
			conn.setRequestProperty("Cookie", cookies[cookies.length - 1].toString());
//			conn.setRequestProperty("Host", "transview.org");
//			conn.setRequestProperty("Origin", "https://transview.org");
			conn.setRequestProperty("Referer", "https://data.okc.gov/portal/page/bingmap?datasetName=Work%20Zones&mapScale=11");
			conn.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3573.0 Safari/537.36");
			conn.setRequestProperty("X-Requested-With", "XMLHttpRequest");
			InputStream content = conn.getInputStream();
			int BUF_SIZE = 1024;

			// Write the response to the file
			String unzipped_file = DATA_FILE;
			out_stream = new BufferedOutputStream(
					new FileOutputStream(unzipped_file), BUF_SIZE);
			byte[] input_buffer = new byte[BUF_SIZE];
			int byteLength = 0;
			// read() returns -1 at end of stream
			while ((byteLength = content.read(input_buffer, 0, BUF_SIZE)) != -1) {
				out_stream.write(input_buffer, 0, byteLength);
			}
			out_stream.close();	// flushes the buffered bytes to disk
			content.close();
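
Method 2 can be split into two reusable steps: configuring the connection with browser-like headers, and streaming the response to a file. The sketch below shows one possible arrangement using only the JDK. The class name `BrowserLikeDownloader`, its method names, and the shortened `User-Agent` string are assumptions made for illustration and do not come from the original code; it also rejects any status other than 200, which the original snippet does not check.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the original article.
class BrowserLikeDownloader {
    private static final int BUF_SIZE = 1024;

    /** Prepares (but does not yet connect) a GET request carrying browser-like headers. */
    static HttpURLConnection openBrowserLike(URL url, int timeoutMillis) throws IOException {
        // HttpsURLConnection extends HttpURLConnection, so this cast covers http and https URLs.
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(timeoutMillis);
        conn.setReadTimeout(timeoutMillis);
        conn.setRequestProperty("Accept", "*/*");
        conn.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)");
        conn.setRequestProperty("X-Requested-With", "XMLHttpRequest");
        return conn;
    }

    /** Copies the stream to target until end of stream and returns the number of bytes written. */
    static long copyToFile(InputStream in, Path target) throws IOException {
        long total = 0;
        byte[] buffer = new byte[BUF_SIZE];
        // try-with-resources closes both streams even if a read or write fails.
        try (InputStream source = in;
             OutputStream out = new BufferedOutputStream(Files.newOutputStream(target), BUF_SIZE)) {
            int n;
            while ((n = source.read(buffer)) != -1) {   // -1 marks end of stream
                out.write(buffer, 0, n);
                total += n;
            }
        }
        return total;
    }

    /** Downloads url into target; throws if the server answers with anything but 200 OK. */
    static long download(URL url, Path target, int timeoutMillis) throws IOException {
        HttpURLConnection conn = openBrowserLike(url, timeoutMillis);
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status);
            }
            return copyToFile(conn.getInputStream(), target);
        } finally {
            conn.disconnect();
        }
    }
}
```

A call such as `BrowserLikeDownloader.download(new URL(dataUrlLocation), Paths.get(DATA_FILE), (int) retryWaitTime)` would then stand in for the whole snippet. Headers that a particular server insists on, such as `Cookie` or `Referer`, can be added in `openBrowserLike` in the same way.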

 
