DataX: From Getting Started to Everyday Use

1. Download

http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz

Supported data sources

DataX already has a fairly complete plugin ecosystem: mainstream relational databases (RDBMS), NoSQL stores, and big-data compute systems are all supported. The currently supported data sources are listed in the table below.

| Type                                 | Data Source                             | Reader | Writer |
|--------------------------------------|-----------------------------------------|--------|--------|
| RDBMS (relational databases)         | MySQL                                   | ✓      | ✓      |
|                                      | Oracle                                  | ✓      | ✓      |
|                                      | SQLServer                               | ✓      | ✓      |
|                                      | PostgreSQL                              | ✓      | ✓      |
|                                      | DRDS                                    | ✓      | ✓      |
|                                      | Generic RDBMS (any relational database) | ✓      | ✓      |
| Alibaba Cloud data warehouse storage | ODPS                                    | ✓      | ✓      |
|                                      | ADS                                     |        |        |
|                                      | OSS                                     | ✓      | ✓      |
|                                      | OCS                                     | ✓      | ✓      |
| NoSQL data storage                   | OTS                                     | ✓      | ✓      |
|                                      | HBase 0.94                              | ✓      | ✓      |
|                                      | HBase 1.1                               | ✓      | ✓      |
|                                      | MongoDB                                 | ✓      | ✓      |
|                                      | Hive                                    | ✓      | ✓      |
| Unstructured data storage            | TxtFile                                 | ✓      | ✓      |
|                                      | FTP                                     | ✓      | ✓      |
|                                      | HDFS                                    | ✓      | ✓      |
|                                      | Elasticsearch                           |        |        |
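Each entry in the table ships as a plugin inside the DataX install directory. To see exactly which readers and writers your download bundles, a small helper can enumerate them; this is a sketch assuming the standard `datax/plugin/reader` and `datax/plugin/writer` layout of the tarball:

```python
# List the reader/writer plugins bundled in a DataX install.
# A sketch assuming the standard datax/plugin/{reader,writer} directory layout.
import os

def list_plugins(datax_home):
    """Return {'reader': [...], 'writer': [...]} from the plugin directory."""
    result = {}
    for kind in ("reader", "writer"):
        path = os.path.join(datax_home, "plugin", kind)
        result[kind] = sorted(os.listdir(path)) if os.path.isdir(path) else []
    return result

if __name__ == "__main__":
    print(list_plugins("./datax"))
```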

2. Extract (installation is just unpacking the tarball)

$ tar -xzvf datax.tar.gz

3. System requirements

  1. Linux
  2. JDK 1.6 or above (1.6 recommended)
  3. Python (2.6.x recommended)
  4. Apache Maven 3.x (only needed to compile DataX from source)
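Before running anything, it can help to confirm that the required commands are actually on the PATH. A minimal check written in Python (the command names checked here are assumptions; adjust the list for your environment):

```python
# Quick sanity check for DataX prerequisites (a minimal sketch; the command
# names below are assumptions, not something DataX itself enforces).
import shutil

def check_prereqs(commands=("java", "python")):
    """Return a dict mapping each required command to whether it is on PATH."""
    return {cmd: shutil.which(cmd) is not None for cmd in commands}

if __name__ == "__main__":
    for cmd, ok in check_prereqs().items():
        print(f"{cmd}: {'found' if ok else 'MISSING'}")
```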

4. A simple test

$ vim test.json

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "streamreader",
                    "parameter": {
                        "sliceRecordCount": 10,
                        "column": [
                            {
                                "type": "long",
                                "value": "10"
                            },
                            {
                                "type": "string",
                                "value": "hello,dawson -DataX"
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "encoding": "UTF-8",
                        "print": true
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 5
            }
        }
    }
}
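The same job file can also be generated programmatically, which is handy once jobs are templated. The following sketch rebuilds the exact config above using only the standard library; note that `sliceRecordCount` is per channel, so 5 channels x 10 records yields 50 output rows:

```python
# Build the same stream-to-stream test job programmatically.
# A sketch using only the streamreader/streamwriter parameters shown above.
import json

def make_test_job(record_count=10, channels=5):
    """Return a DataX job dict echoing `record_count` rows per channel."""
    return {
        "job": {
            "content": [{
                "reader": {
                    "name": "streamreader",
                    "parameter": {
                        "sliceRecordCount": record_count,
                        "column": [
                            {"type": "long", "value": "10"},
                            {"type": "string", "value": "hello,dawson -DataX"},
                        ],
                    },
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {"encoding": "UTF-8", "print": True},
                },
            }],
            "setting": {"speed": {"channel": channels}},
        }
    }

if __name__ == "__main__":
    with open("test.json", "w") as f:
        json.dump(make_test_job(), f, indent=4)
```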

5. Run

$ cd  {YOUR_DATAX_HOME}/bin
$ python datax.py {YOUR_JOB.json}
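When DataX is driven from another script or a scheduler, the two commands above collapse into a single subprocess call. A minimal sketch (`DATAX_HOME` here is an assumed environment variable pointing at the unpacked tarball, not something DataX itself requires):

```python
# Launch a DataX job from Python instead of the shell (a minimal sketch;
# DATAX_HOME is an assumed env var pointing at the unpacked DataX directory).
import os
import subprocess

def build_datax_cmd(job_json, datax_home=None):
    """Assemble the same command line as `python {DATAX_HOME}/bin/datax.py job.json`."""
    home = datax_home or os.environ.get("DATAX_HOME", ".")
    return ["python", os.path.join(home, "bin", "datax.py"), job_json]

def run_datax(job_json, datax_home=None):
    """Run the job and return its exit code (0 means the job succeeded)."""
    return subprocess.run(build_datax_cmd(job_json, datax_home)).returncode
```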

Result:
2018-07-23 11:17:45.556 [main] WARN  Engine - prioriy set to 0, because NumberFormatException, the value is: null
2018-07-23 11:17:45.558 [main] INFO  PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2018-07-23 11:17:45.558 [main] INFO  JobContainer - DataX jobContainer starts job.
2018-07-23 11:17:45.559 [main] INFO  JobContainer - Set jobId = 0
2018-07-23 11:17:45.572 [job-0] INFO  JobContainer - jobContainer starts to do prepare ...
2018-07-23 11:17:45.573 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] do prepare work .
2018-07-23 11:17:45.573 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] do prepare work .
2018-07-23 11:17:45.573 [job-0] INFO  JobContainer - jobContainer starts to do split ...
2018-07-23 11:17:45.573 [job-0] INFO  JobContainer - Job set Channel-Number to 5 channels.
2018-07-23 11:17:45.574 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] splits to [5] tasks.
2018-07-23 11:17:45.575 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] splits to [5] tasks.
2018-07-23 11:17:45.593 [job-0] INFO  JobContainer - jobContainer starts to do schedule ...
2018-07-23 11:17:45.600 [job-0] INFO  JobContainer - Scheduler starts [1] taskGroups.
2018-07-23 11:17:45.601 [job-0] INFO  JobContainer - Running by standalone Mode.
2018-07-23 11:17:45.613 [taskGroup-0] INFO  TaskGroupContainer - taskGroupId=[0] start [5] channels for [5] tasks.
2018-07-23 11:17:45.621 [taskGroup-0] INFO  Channel - Channel set byte_speed_limit to -1, No bps activated.
2018-07-23 11:17:45.621 [taskGroup-0] INFO  Channel - Channel set record_speed_limit to -1, No tps activated.
2018-07-23 11:17:45.632 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[3] attemptCount[1] is started
2018-07-23 11:17:45.635 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2018-07-23 11:17:45.638 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[4] attemptCount[1] is started
2018-07-23 11:17:45.641 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[1] attemptCount[1] is started
2018-07-23 11:17:45.643 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[2] attemptCount[1] is started
10  hello,dawson -DataX
10  hello,dawson -DataX
10  hello,dawson -DataX
... (50 identical lines in total: 10 records from each of the 5 channels)
2018-07-23 11:17:45.745 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[110]ms
2018-07-23 11:17:45.746 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[1] is successed, used[105]ms
2018-07-23 11:17:45.747 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[2] is successed, used[104]ms
2018-07-23 11:17:45.747 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[3] is successed, used[116]ms
2018-07-23 11:17:45.747 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[4] is successed, used[109]ms
2018-07-23 11:17:45.748 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
2018-07-23 11:17:55.622 [job-0] INFO  StandAloneJobContainerCommunicator - Total 50 records, 1050 bytes | Speed 105B/s, 5 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.001s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
2018-07-23 11:17:55.623 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
2018-07-23 11:17:55.624 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] do post work.
2018-07-23 11:17:55.625 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] do post work.
2018-07-23 11:17:55.625 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
2018-07-23 11:17:55.626 [job-0] INFO  HookInvoker - No hook invoked, because base dir not exists or is a file: /home/asiainfo/datax/hook
2018-07-23 11:17:55.628 [job-0] INFO  JobContainer - 
     [total cpu info] => 
        averageCpu                     | maxDeltaCpu                    | minDeltaCpu                    
        -1.00%                         | -1.00%                         | -1.00%


     [total gc info] => 
         NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime     
         PS MarkSweep         | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s             
         PS Scavenge          | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s             

2018-07-23 11:17:55.629 [job-0] INFO  JobContainer - PerfTrace not enable!
2018-07-23 11:17:55.636 [job-0] INFO  StandAloneJobContainerCommunicator - Total 50 records, 1050 bytes | Speed 105B/s, 5 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.001s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
2018-07-23 11:17:55.637 [job-0] INFO  JobContainer - 
Job start time                  : 2018-07-23 11:17:45
Job end time                    : 2018-07-23 11:17:55
Total elapsed time              :                 10s
Average throughput              :              105B/s
Record write speed              :              5rec/s
Total records read              :                  50
Total read/write failures       :                   0
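The final StandAloneJobContainerCommunicator line is the easiest thing to scrape when monitoring runs. A small parser matched against the log format shown above (the dict keys in the result are my own naming, not DataX's):

```python
# Scrape the final statistics line that DataX prints. A sketch matched against
# the StandAloneJobContainerCommunicator log format shown above.
import re

TOTAL_RE = re.compile(
    r"Total (\d+) records, (\d+) bytes \| Speed (\S+), (\d+) records/s"
    r" \| Error (\d+) records, (\d+) bytes"
)

def parse_summary(line):
    """Return run statistics as a dict, or None if the line does not match."""
    m = TOTAL_RE.search(line)
    if m is None:
        return None
    records, nbytes, speed, rps, err_records, err_bytes = m.groups()
    return {
        "records": int(records),
        "bytes": int(nbytes),
        "speed": speed,            # e.g. "105B/s", kept as a string
        "records_per_s": int(rps),
        "error_records": int(err_records),
        "error_bytes": int(err_bytes),
    }
```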

6. Task list

Reader/writer configuration reference: https://github.com/alibaba/DataX

1. oracle => mongo
2. oracle => DRDS
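For the first planned task (oracle => mongo), a job skeleton might look like the following. This is only a sketch: the plugin names (`oraclereader`, `mongodbwriter`) are real DataX plugins, but every parameter value here is a placeholder, and the parameter names should be verified against the plugin docs in the repository linked above before use:

```json
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "oraclereader",
                    "parameter": {
                        "username": "db_user",
                        "password": "db_password",
                        "column": ["id", "name"],
                        "connection": [
                            {
                                "table": ["SOURCE_TABLE"],
                                "jdbcUrl": ["jdbc:oracle:thin:@127.0.0.1:1521:orcl"]
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "mongodbwriter",
                    "parameter": {
                        "address": ["127.0.0.1:27017"],
                        "dbName": "target_db",
                        "collectionName": "target_collection",
                        "column": [
                            {"name": "id", "type": "long"},
                            {"name": "name", "type": "string"}
                        ]
                    }
                }
            }
        ],
        "setting": {"speed": {"channel": 1}}
    }
}
```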
