Preface
Reference:
The DataX Plugin Development Guide (DataX插件开发宝典) on the official DataX site
I. Environment Preparation
- JDK 1.8 or later
- IntelliJ IDEA
- Git
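As a quick way to confirm the JDK requirement before importing the project, the following minimal sketch checks the version of the running JVM. The class JdkCheck and its methods are hypothetical helpers written for this illustration and are not part of DataX.

```java
// Hypothetical helper, not part of DataX: checks that the running JVM is 1.8 or newer.
final class JdkCheck {

    /** Parses a java.specification.version value such as "1.8", "11" or "17" into a major version. */
    static int majorVersion(String specVersion) {
        String[] parts = specVersion.trim().split("\\.");
        int first = Integer.parseInt(parts[0]);
        // Before Java 9 the scheme was "1.x", so the major version is the second component.
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    /** Returns true when the given specification version is 1.8 or later. */
    static boolean isSupported(String specVersion) {
        return majorVersion(specVersion) >= 8;
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        System.out.println("java.specification.version = " + spec
                + ", supported = " + isSupported(spec));
    }
}
```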
II. Project Setup
1. Download the source code with IDEA
Open IDEA and choose File -> New -> Project from Version Control…

GitHub URL: https://github.com/alibaba/DataX.git
Enter the GitHub URL in the URL field and click Clone.

2. Wait for the project files to load…
3. Engine.java, the startup class
Modify the main method by adding two lines of code:
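// Set the system property datax.home to the directory produced by the DataX build
System.setProperty("datax.home", "D:\\work\\DataX\\target\\datax\\datax");
// Set the startup arguments: the path to the job JSON file, plus the other options
String[] datxArgs = {"-job", "D:\\work\\DataX\\core\\src\\main\\job\\mongo-file.json", "-mode", "standalone", "-jobid", "-1"};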
The complete main method then looks as follows; Engine.entry is called with datxArgs:
public static void main(String[] args) throws Exception {
    // Directory produced by the DataX build
    System.setProperty("datax.home", "D:\\work\\DataX\\target\\datax\\datax");
    // Path to your own job JSON file, plus the other options
    String[] datxArgs = {
            "-job", "D:\\work\\DataX\\core\\src\\main\\job\\mongo-file.json",
            "-mode", "standalone",
            "-jobid", "-1"};
    int exitCode = 0;
    try {
        Engine.entry(datxArgs);
    } catch (Throwable e) {
        exitCode = 1;
        LOG.error("\n\n经DataX智能分析,该任务最可能的错误原因是:\n" + ExceptionTracker.trace(e));
        if (e instanceof DataXException) {
            DataXException tempException = (DataXException) e;
            ErrorCode errorCode = tempException.getErrorCode();
            if (errorCode instanceof FrameworkErrorCode) {
                FrameworkErrorCode tempErrorCode = (FrameworkErrorCode) errorCode;
                exitCode = tempErrorCode.toExitValue();
            }
        }
        System.exit(exitCode);
    }
    System.exit(exitCode);
}
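The absolute Windows paths above only work on the author's machine. One possible way to avoid editing them for every checkout is sketched below. It assumes the program is started from the DataX root directory and that the build output is in target/datax/datax, as in the paths above. DebugLaunchArgs, resolveHome and buildArgs are hypothetical names introduced for this illustration and are not part of DataX.

```java
import java.io.File;

// Hypothetical helper for local debugging; the class and method names are not part of DataX.
final class DebugLaunchArgs {

    /** Resolves datax.home: an already configured system property wins, otherwise the fallback is used. */
    static String resolveHome(String fallback) {
        String configured = System.getProperty("datax.home");
        return (configured == null || configured.isEmpty()) ? fallback : configured;
    }

    /** Builds the same argument array as in the modified main method, for a given job file. */
    static String[] buildArgs(String jobPath) {
        return new String[] {"-job", jobPath, "-mode", "standalone", "-jobid", "-1"};
    }

    public static void main(String[] args) {
        // Assumption: the working directory is the DataX root, so relative paths resolve against it.
        String home = resolveHome(new File("target/datax/datax").getAbsolutePath());
        String job = args.length > 0
                ? args[0]
                : new File("core/src/main/job/mongo-file.json").getAbsolutePath();
        System.setProperty("datax.home", home);
        System.out.println("datax.home = " + home);
        System.out.println(String.join(" ", buildArgs(job)));
        // Inside Engine.main one would then call Engine.entry(buildArgs(job)).
    }
}
```

With this approach the directory can also be supplied from the IDEA run configuration as the VM option -Ddatax.home=..., which leaves the source file unchanged.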
4. Check the pom.xml in the DataX root directory and comment out the plugin modules you do not need
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.alibaba.datax</groupId>
<artifactId>datax-all</artifactId>
<version>0.0.1-SNAPSHOT</version>
<dependencies>
<dependency>
<groupId>org.hamcrest</groupId>
<artifactId>hamcrest-core</artifactId>
<version>1.3</version>
</dependency>
</dependencies>
<name>datax-all</name>
<packaging>pom</packaging>
<properties>
<jdk-version>1.8</jdk-version>
<datax-project-version>0.0.1-SNAPSHOT</datax-project-version>
<commons-lang3-version>3.3.2</commons-lang3-version>
<commons-configuration-version>1.10</commons-configuration-version>
<commons-cli-version>1.2</commons-cli-version>
<fastjson-version>1.1.46.sec01</fastjson-version>
<guava-version>16.0.1</guava-version>
<diamond.version>3.7.2.1-SNAPSHOT</diamond.version>
<!-- slf4j 1.7.10 and logback-classic 1.0.13 are a compatible pair -->
<slf4j-api-version>1.7.10</slf4j-api-version>
<logback-classic-version>1.0.13</logback-classic-version>
<commons-io-version>2.4</commons-io-version>
<junit-version>4.11</junit-version>
<tddl.version>5.1.22-1</tddl.version>
<swift-version>1.0.0</swift-version>
<project-sourceEncoding>UTF-8</project-sourceEncoding>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
<maven.compiler.encoding>UTF-8</maven.compiler.encoding>
</properties>
<modules>
<module>common</module>
<module>core</module>
<module>transformer</module>
<!-- reader -->
<!-- <module>mysqlreader</module>-->
<!-- <module>drdsreader</module>-->
<!-- <module>sqlserverreader</module>-->
<!-- <module>postgresqlreader</module>-->
<!-- <module>oraclereader</module>-->
<!-- <module>odpsreader</module>-->
<!-- <module>otsreader</module>-->
<!-- <module>otsstreamreader</module>-->
<!-- <module>txtfilereader</module>-->
<!-- <module>hdfsreader</module>-->
<!-- <module>streamreader</module>-->
<!-- <module>ossreader</module>-->
<!-- <module>ftpreader</module>-->
<module>mongodbreader</module>
<!-- <module>rdbmsreader</module>-->
<!-- <module>hbase11xreader</module>-->
<!-- <module>hbase094xreader</module>-->
<!-- <module>tsdbreader</module>-->
<!-- <module>opentsdbreader</module>-->
<!-- <module>cassandrareader</module>-->
<!-- <module>gdbreader</module>-->
<!-- writer -->
<!-- <module>mysqlwriter</module>-->
<!-- <module>drdswriter</module>-->
<!-- <module>odpswriter</module>-->
<module>txtfilewriter</module>
<!-- <module>ftpwriter</module>-->
<module>hdfswriter</module>
<module>streamwriter</module>
<!-- <module>otswriter</module>-->
<!-- <module>oraclewriter</module>-->
<!-- <module>sqlserverwriter</module>-->
<!-- <module>postgresqlwriter</module>-->
<!-- <module>osswriter</module>-->
<!-- <module>mongodbwriter</module>-->
<!-- <module>adswriter</module>-->
<!-- <module>ocswriter</module>-->
<!-- <module>rdbmswriter</module>-->
<!-- <module>hbase11xwriter</module>-->
<!-- <module>hbase094xwriter</module>-->
<!-- <module>hbase11xsqlwriter</module>-->
<!-- <module>hbase11xsqlreader</module>-->
<!-- <module>elasticsearchwriter</module>-->
<!-- <module>tsdbwriter</module>-->
<!-- <module>adbpgwriter</module>-->
<!-- <module>gdbwriter</module>-->
<!-- <module>cassandrawriter</module>-->
<!-- <module>clickhousewriter</module>-->
<!-- common support module -->
<!-- <module>plugin-rdbms-util</module>-->
<!-- <module>plugin-unstructured-storage-util</module>-->
<!-- <module>hbase20xsqlreader</module>-->
<!-- <module>hbase20xsqlwriter</module>-->
</modules>
<dependencyManagement>
<dependencies>
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-lang3</artifactId>
<version>${commons-lang3-version}</version>
</dependency>
<dependency>
<groupId>com.alibaba</groupId>
<artifactId>fastjson</artifactId>
<version>${fastjson-version}</version>
</dependency>
<!--<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>${guava-version}</version>
</dependency>-->
<dependency>
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
<version>${commons-io-version}</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<version>${slf4j-api-version}</version>
</dependency>
<dependency>
<groupId>ch.qos.logback</groupId>
<artifactId>logback-classic</artifactId>

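This article describes in detail how to customize the MongoDB Reader plugin in the DataX framework so that it can read document and array fields. The source code is modified so that Document and List values, which the reader does not otherwise recognize, are converted to JSON strings and can be processed correctly.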
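The idea of turning nested values into JSON strings can be illustrated with a dependency-free sketch. It is not the author's actual change, which is not shown in this excerpt. In the MongoDB Java driver, org.bson.Document implements Map<String, Object>, so plain Map and List values stand in for it here. JsonFallback is a hypothetical name. A real plugin change would more likely call a JSON library already on the classpath (the pom above declares fastjson, whose JSON.toJSONString accepts an arbitrary object), but which call the author used is an assumption.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Dependency-free sketch; JsonFallback is a hypothetical name, not a DataX or MongoDB driver class.
final class JsonFallback {

    /** Serializes maps, iterables, strings, numbers, booleans and null to a JSON string. */
    static String toJson(Object value) {
        StringBuilder sb = new StringBuilder();
        write(value, sb);
        return sb.toString();
    }

    private static void write(Object value, StringBuilder sb) {
        if (value == null) {
            sb.append("null");
        } else if (value instanceof Map) {
            // Stand-in for a nested document: keys become JSON object member names.
            sb.append('{');
            boolean first = true;
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                if (!first) sb.append(',');
                first = false;
                quote(String.valueOf(e.getKey()), sb);
                sb.append(':');
                write(e.getValue(), sb);
            }
            sb.append('}');
        } else if (value instanceof Iterable) {
            // Stand-in for an array field.
            sb.append('[');
            boolean first = true;
            for (Object item : (Iterable<?>) value) {
                if (!first) sb.append(',');
                first = false;
                write(item, sb);
            }
            sb.append(']');
        } else if (value instanceof Number || value instanceof Boolean) {
            // Note: NaN and infinite values are not valid JSON and are not handled here.
            sb.append(value);
        } else {
            // Any other type falls back to its quoted toString() form.
            quote(String.valueOf(value), sb);
        }
    }

    private static void quote(String s, StringBuilder sb) {
        sb.append('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        sb.append('"');
    }

    public static void main(String[] args) {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("city", "Hangzhou");
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("name", "demo");
        doc.put("tags", Arrays.asList("a", "b"));
        doc.put("address", address);
        System.out.println(toJson(doc));
    }
}
```

In a reader, a string produced this way could be written into a string column where the value would otherwise be passed through as an unrecognized type.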