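I had been using a certain "fish" crawler tool (某鱼) to collect and organize the site's URLs, but exporting more than 10,000 entries requires a paid membership, which was a bit frustrating. What to do? I'm not yet comfortable writing a crawler in Java, so the only option left was to assemble the URLs myself: query the database, then convert the result to XML. A bit more work, but I can do it by hand.
The trailing part of each URL on the site is usually the primary-key id. So the plan is to fetch the ids first, concatenate each one onto the URL prefix, and finally write the result out as XML. The stack used here is Spring Boot + MyBatis + XStream.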
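Before wiring up the real stack, the core idea (id, then string concatenation, then XML) can be tried with plain Java. The following is only a minimal sketch under stated assumptions: the base URL `https://www.example.com/article/`, the class name `SitemapSketch`, and the sitemap-style element names `urlset`/`url`/`loc` are all hypothetical, the ids are hard-coded instead of coming from MyBatis, and a `StringBuilder` stands in for XStream so that it runs without any dependencies.

```java
import java.util.Arrays;
import java.util.List;

public class SitemapSketch {

    // Hypothetical URL prefix; replace with the real site's prefix.
    static final String BASE_URL = "https://www.example.com/article/";

    // Concatenate the primary-key id onto the URL prefix.
    static String buildUrl(String baseUrl, long id) {
        return baseUrl + id;
    }

    // Escape the five characters that are special in XML text.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    // Build the XML document by hand (assumed sitemap-like layout).
    static String toXml(String baseUrl, List<Long> ids) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<urlset>\n");
        for (Long id : ids) {
            sb.append("  <url><loc>")
              .append(escapeXml(buildUrl(baseUrl, id)))
              .append("</loc></url>\n");
        }
        sb.append("</urlset>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // In the real project these ids would come from the database query.
        List<Long> ids = Arrays.asList(1L, 2L, 3L);
        System.out.print(toXml(BASE_URL, ids));
    }
}
```

In the actual project the id list would be returned by a MyBatis mapper and the serialization handed to XStream; the concatenation step stays the same.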
Add the dependencies:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.booy</groupId>
<artifactId>url</artifactId>
<version>0.0.1-SNAPSHOT</version>
<packaging>war</packaging>
<!-- Spring Boot parent POM -->
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>2.1.2.RELEASE</version>
<relativePath/>
</parent>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
<java.version>1.8</java.version>
</properties>
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
<!-- MyBatis starter -->
<dependency>
<groupId>org.mybatis.spring.boot</groupId>
<artifactId>mybatis-spring-boot-starter</artifactId>
<version>1.3.1</version>
</dependency>
<!-- MySQL driver -->