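Background: A while ago, at a friend's request, I set out to collect past years' admission cut-off scores of major universities for arts and science applicants in each region, ahead of the gaokao (the college entrance exam), and to build a WeChat official account for looking them up. While scraping a website for the data I sent requests too frequently, and after a little over 2,000 requests the site banned my IP. That gave me the idea of building my own IP proxy tool.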
Preparation
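I have been using IDEA a lot lately, so I started a new Spring Boot project and added Redis to store the IP addresses. The next step was to find a site that offers proxy IPs and take the IP list it publishes; I picked a site called 快代理 (Kuaidaili), at https://www.kuaidaili.com/free/inha/. With the preparation done, let's get to the point.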
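For orientation, every entry on such a free-proxy list boils down to a host and a port, and the JDK can already route a request through one. The sketch below is a minimal illustration rather than code from this project: the class name `ProxyEndpoint` and the `host:port` string form are assumptions made for the example.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;

// Hypothetical helper (not part of the original project): turns a "host:port"
// string into a java.net.Proxy that URL.openConnection(Proxy) accepts.
final class ProxyEndpoint {
    final String host;
    final int port;

    ProxyEndpoint(String host, int port) {
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        this.host = host;
        this.port = port;
    }

    // Parses text such as "1.2.3.4:8080"; this input format is an assumption.
    static ProxyEndpoint parse(String text) {
        int idx = text.lastIndexOf(':');
        if (idx <= 0 || idx == text.length() - 1) {
            throw new IllegalArgumentException("expected host:port, got: " + text);
        }
        String host = text.substring(0, idx).trim();
        int port = Integer.parseInt(text.substring(idx + 1).trim());
        return new ProxyEndpoint(host, port);
    }

    // createUnresolved avoids a DNS lookup at construction time.
    Proxy toProxy() {
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port));
    }

    public static void main(String[] args) {
        ProxyEndpoint endpoint = ProxyEndpoint.parse("1.2.3.4:8080");
        System.out.println(endpoint.host + ":" + endpoint.port + " via " + endpoint.toProxy().type());
    }
}
```

Keeping the entry as a plain host/port pair also makes it easy to store as a string in Redis and rebuild the `Proxy` object only when a request is about to be sent.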
Add the required dependencies and configure the Spring Boot project's application.yml
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-redis</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-test</artifactId>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.jsoup</groupId>
        <artifactId>jsoup</artifactId>
        <version>1.11.3</version>
    </dependency>
    <dependency>
        <groupId>com.alibaba</groupId>
        <artifactId>fastjson</artifactId>
        <version>1.2.47</version>
    </dependency>
    <dependency>
        <groupId>io.springfox</groupId>
        <artifactId>springfox-swagger2</artifactId>
        <version>2.7.0</version>
    </dependency>
    <dependency>
        <groupId>io.springfox</groupId>
        <artifactId>springfox-swagger-ui</artifactId>
        <version>2.7.0</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
    </dependency>
</dependencies>
spring:
  redis:
    host: 127.0.0.1
    password:
    port: 6379
    timeout: 500
    database: 3
    pool:
      max-active: 200
      max-wait: -1
      max-idle: 10
      min-idle: 0
server:
  port: 1024
logging:
  file: ProxyRoot.log
system:
  url:
    kdl: https://www.kuaidaili.com/free/inha/
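The `system.url.kdl` entry is only the base address of the proxy list. As a rough sketch of how a crawler might turn it into per-page addresses, the snippet below appends a page number to that base. The `/free/inha/<page>/` pagination pattern and the class name `PageUrls` are assumptions for illustration; the configuration itself does not state them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds list-page URLs from the configured base URL.
// The "<base>/<page>/" layout is an assumed pagination scheme.
final class PageUrls {

    // Returns the URL of the given 1-based page.
    static String pageUrl(String baseUrl, int page) {
        if (page < 1) {
            throw new IllegalArgumentException("page must be >= 1, got: " + page);
        }
        // Tolerate a base URL configured with or without a trailing slash.
        String base = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
        return base + page + "/";
    }

    // Returns the URLs of pages 1..count, in order.
    static List<String> firstPages(String baseUrl, int count) {
        List<String> urls = new ArrayList<>();
        for (int page = 1; page <= count; page++) {
            urls.add(pageUrl(baseUrl, page));
        }
        return urls;
    }

    public static void main(String[] args) {
        for (String url : firstPages("https://www.kuaidaili.com/free/inha/", 3)) {
            System.out.println(url);
        }
    }
}
```

Given the ban described at the start, a real crawler would also want a pause between page requests; that is left out here to keep the sketch short.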
Main modules
When scraping website data in Java, I personally prefer Jsoup. Below is a small utility class I use often.
package com.cwwt.root.utls;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.io.IOException;
import java.util.Map;

/**
 * @author Mr.C
 * @Description
 * @create 2018/10/24 16:54
 * Copyright: Copyright (c) 2018
 * Company: CWWT
 */
public class JSoupUtils {
    public static Connection.Response sendJsonPost(String url, String jsonData) {
        Connection.Response response = null;
        try {
            Connection connection = Jsoup.connect(url)
                    .userAgent("Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36")
                    .header("Content-Type", "application/json;charset=UTF-8")
                    .header("Accept", "text/plain, */*; q=0.01")
                    .header("Accept-Encoding", "gzip,deflate,sdch")
                    .header("Accept-Language", "es-ES,es;q=0.8")
                    .header("Connection", "keep-alive")
                    .header("X-Requested-With", "XMLHttpRequest")
                    .ignoreContentType(true)   // accept non-HTML responses such as JSON
                    .requestBody(jsonData)
                    .maxBodySize(0)            // 0 = no limit on the response body size
                    .timeout(1000 * 15)        // 15-second timeout, in milliseconds
                    .method(Connection.Method.POST);
            response = connection.execute();
        } catch (IOException e) {