Notes on Using DataX on Windows


Introduction
https://github.com/alibaba/DataX
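DataX is the open-source version of Alibaba Cloud DataWorks Data Integration, an offline data synchronization tool/platform widely used inside Alibaba Group. DataX provides efficient data synchronization between heterogeneous data sources, including MySQL, Oracle, OceanBase, SqlServer, PostgreSQL, HDFS, Hive, ADS, HBase, TableStore (OTS), MaxCompute (ODPS), Hologres and DRDS.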

1. Download DataX

http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz

2. Download Python 2.7

https://www.python.org/ftp/python/2.7/python-2.7.msi
After installing, set the environment variable (add the Python installation directory to Path).
Open a cmd window and check that the installation succeeded.

3. Verify that DataX is installed correctly

Go to the bin directory of the installation and run the following command:

D:\softwares\datax\bin>python datax.py ../job/myTest.json

It reported an error. A Baidu search turned up a solution: https://blog.51cto.com/u_13372349/5319772

Deleting the hidden files under
D:\softwares\datax\plugin\reader
D:\softwares\datax\plugin\writer
solved the problem.
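The cleanup above can also be scripted. Below is a minimal sketch, assuming (the original does not say so) that the problematic hidden files are macOS "._*" metadata entries packed into the tarball, and that DataX lives at D:\softwares\datax; the function names are hypothetical. By default it only lists what it would delete.

```python
import os

# Assumption: the offending hidden files are names starting with "._"
DATAX_HOME = r"D:\softwares\datax"  # adjust to your installation directory


def find_hidden_plugin_files(datax_home):
    """Return paths of '._*' entries directly under plugin/reader and plugin/writer."""
    found = []
    for sub in ("reader", "writer"):
        plugin_dir = os.path.join(datax_home, "plugin", sub)
        if not os.path.isdir(plugin_dir):
            continue
        for name in sorted(os.listdir(plugin_dir)):
            if name.startswith("._"):
                found.append(os.path.join(plugin_dir, name))
    return found


def remove_hidden_plugin_files(datax_home, dry_run=True):
    """List the '._*' files, or delete them when dry_run is False."""
    targets = find_hidden_plugin_files(datax_home)
    for path in targets:
        if dry_run:
            print("would delete: " + path)
        elif os.path.isfile(path):
            os.remove(path)
    return targets


if __name__ == "__main__":
    remove_hidden_plugin_files(DATAX_HOME, dry_run=True)
```

Run it once as-is to review the list, then call remove_hidden_plugin_files(DATAX_HOME, dry_run=False) to actually delete the files.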

4. Test DataX

(1) stream -> stream

Create a JSON file named stream2stream.json in the D:\softwares\datax\job folder with the following content:

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "streamreader",
                    "parameter": {
                        "sliceRecordCount": 10,
                        "column": [
                            {
                                "type": "long",
                                "value": "10"
                            },
                            {
                                "type": "string",
                                "value": "hello,DataX"
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "encoding": "UTF-8",
                        "print": true
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 1
            }
        }
    }
}
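A job file like the one above has a fixed skeleton: job.content[0].reader, job.content[0].writer and job.setting.speed.channel. Hand-edited JSON is easy to break, so a quick pre-check before calling datax.py can save a run. This is a hypothetical helper, not part of DataX, and it only checks the keys used in this post.

```python
import io
import json
import sys


def summarize_job(path):
    """Load a DataX job file and return (reader name, writer name, channel)."""
    with io.open(path, encoding="utf-8") as f:
        conf = json.load(f)  # raises ValueError on malformed JSON
    job = conf["job"]
    content = job["content"][0]  # KeyError/IndexError if the skeleton is wrong
    reader = content["reader"]["name"]
    writer = content["writer"]["name"]
    channel = job["setting"]["speed"]["channel"]
    return reader, writer, channel


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python check_job.py ../job/stream2stream.json
    print(summarize_job(sys.argv[1]))
```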

Run the command in the console:

D:\softwares\datax\bin>python datax.py ../job/stream2stream.json

Execution log:

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.


2022-12-22 10:35:44.537 [main] INFO  VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2022-12-22 10:35:44.545 [main] INFO  Engine - the machine info  =>

        osInfo: Oracle Corporation 1.8 25.221-b11
        jvmInfo:        Windows 10 amd64 10.0
        cpu num:        6

        totalPhysicalMemory:    -0.00G
        freePhysicalMemory:     -0.00G
        maxFileDescriptorCount: -1
        currentOpenFileDescriptorCount: -1

        GC Names        [PS MarkSweep, PS Scavenge]

        MEMORY_NAME                    | allocation_size                | init_size
        PS Eden Space                  | 256.00MB                       | 256.00MB
        Code Cache                     | 240.00MB                       | 2.44MB
        Compressed Class Space         | 1,024.00MB                     | 0.00MB
        PS Survivor Space              | 42.50MB                        | 42.50MB
        PS Old Gen                     | 683.00MB                       | 683.00MB
        Metaspace                      | -0.00MB                        | 0.00MB


2022-12-22 10:35:44.562 [main] INFO  Engine -
{
        "content":[
                {
                        "reader":{
                                "name":"streamreader",
                                "parameter":{
                                        "column":[
                                                {
                                                        "type":"long",
                                                        "value":"10"
                                                },
                                                {
                                                        "type":"string",
                                                        "value":"hello,DataX"
                                                }
                                        ],
                                        "sliceRecordCount":10
                                }
                        },
                        "writer":{
                                "name":"streamwriter",
                                "parameter":{
                                        "encoding":"UTF-8",
                                        "print":true
                                }
                        }
                }
        ],
        "setting":{
                "speed":{
                        "channel":1
                }
        }
}

2022-12-22 10:35:44.581 [main] WARN  Engine - prioriy set to 0, because NumberFormatException, the value is: null
2022-12-22 10:35:44.584 [main] INFO  PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2022-12-22 10:35:44.585 [main] INFO  JobContainer - DataX jobContainer starts job.
2022-12-22 10:35:44.586 [main] INFO  JobContainer - Set jobId = 0
2022-12-22 10:35:44.604 [job-0] INFO  JobContainer - jobContainer starts to do prepare ...
2022-12-22 10:35:44.605 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] do prepare work .
2022-12-22 10:35:44.605 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] do prepare work .
2022-12-22 10:35:44.605 [job-0] INFO  JobContainer - jobContainer starts to do split ...
2022-12-22 10:35:44.606 [job-0] INFO  JobContainer - Job set Channel-Number to 1 channels.
2022-12-22 10:35:44.608 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] splits to [1] tasks.
2022-12-22 10:35:44.609 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] splits to [1] tasks.
2022-12-22 10:35:44.626 [job-0] INFO  JobContainer - jobContainer starts to do schedule ...
2022-12-22 10:35:44.629 [job-0] INFO  JobContainer - Scheduler starts [1] taskGroups.
2022-12-22 10:35:44.631 [job-0] INFO  JobContainer - Running by standalone Mode.
2022-12-22 10:35:44.641 [taskGroup-0] INFO  TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2022-12-22 10:35:44.645 [taskGroup-0] INFO  Channel - Channel set byte_speed_limit to -1, No bps activated.
2022-12-22 10:35:44.646 [taskGroup-0] INFO  Channel - Channel set record_speed_limit to -1, No tps activated.
2022-12-22 10:35:44.659 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
10      hello,DataX
10      hello,DataX
10      hello,DataX
10      hello,DataX
10      hello,DataX
10      hello,DataX
10      hello,DataX
10      hello,DataX
10      hello,DataX
10      hello,DataX
2022-12-22 10:35:44.762 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[104]ms
2022-12-22 10:35:44.763 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
2022-12-22 10:35:54.658 [job-0] INFO  StandAloneJobContainerCommunicator - Total 10 records, 130 bytes | Speed 13B/s, 1 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
2022-12-22 10:35:54.658 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
2022-12-22 10:35:54.659 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] do post work.
2022-12-22 10:35:54.659 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] do post work.
2022-12-22 10:35:54.660 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
2022-12-22 10:35:54.661 [job-0] INFO  HookInvoker - No hook invoked, because base dir not exists or is a file: D:\softwares\datax\hook
2022-12-22 10:35:54.662 [job-0] INFO  JobContainer -
         [total cpu info] =>
                averageCpu                     | maxDeltaCpu                    | minDeltaCpu
                -1.00%                         | -1.00%                         | -1.00%


         [total gc info] =>
                 NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime
                 PS MarkSweep         | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s
                 PS Scavenge          | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s

2022-12-22 10:35:54.662 [job-0] INFO  JobContainer - PerfTrace not enable!
2022-12-22 10:35:54.662 [job-0] INFO  StandAloneJobContainerCommunicator - Total 10 records, 130 bytes | Speed 13B/s, 1 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
2022-12-22 10:35:54.663 [job-0] INFO  JobContainer -
任务启动时刻                    : 2022-12-22 10:35:44
任务结束时刻                    : 2022-12-22 10:35:54
任务总计耗时                    :                 10s
任务平均流量                    :               13B/s
记录写入速度                    :              1rec/s
读出记录总数                    :                  10
读写失败总数                    :                   0
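The summary reports a total time of 10s for only 10 records (130 bytes). The timestamps in the log explain this: the task finished at 10:35:44.762 (used[104]ms), yet the job-level summary line appears at 10:35:54.658, about 9.9 s later. So for tiny jobs the reported 13B/s and 1rec/s mostly reflect that wait rather than transfer speed. My reading, which is an assumption, is that this gap is the job container's periodic progress-check interval. A small sketch for extracting these numbers from a log (helper names are hypothetical):

```python
import re
from datetime import datetime

TS_FMT = "%Y-%m-%d %H:%M:%S.%f"  # matches "2022-12-22 10:35:44.762"
SUMMARY_RE = re.compile(r"Total (\d+) records, (\d+) bytes .*?Error (\d+) records")


def parse_summary(line):
    """Extract (records, bytes, error_records) from a 'Total ... records' log line."""
    m = SUMMARY_RE.search(line)
    if m is None:
        raise ValueError("not a summary line")
    return tuple(int(g) for g in m.groups())


def seconds_between(ts1, ts2):
    """Seconds elapsed between two DataX log timestamps."""
    delta = datetime.strptime(ts2, TS_FMT) - datetime.strptime(ts1, TS_FMT)
    return delta.total_seconds()


summary = ("Total 10 records, 130 bytes | Speed 13B/s, 1 records/s | "
           "Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s")
records, nbytes, errors = parse_summary(summary)
print(nbytes // records)  # → 13 (bytes per record)
# time from "taskId[0] is successed" to the job summary line
print(seconds_between("2022-12-22 10:35:44.762", "2022-12-22 10:35:54.658"))
```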

(2) oracle->mysql

Create a JSON file named orcl2mysql.json in the D:\softwares\datax\job folder with the following content:


{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "oraclereader",
                    "parameter": {
                        "column": [
                            "*"
                        ],
                        "connection": [
                            {
                                "jdbcUrl": [
                                    "jdbc:oracle:thin:@localhost:1521:orcl"
                                ],
                                "table": [
                                    "zx_mobile_temp"
                                ]
                            }
                        ],
                        "password": "xxx",
                        "username": "GL"
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "column": [
                            "*"
                        ],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://localhost:3306/test",
                                "table": [
                                    "zx_mobile_temp"
                                ]
                            }
                        ],
                        "password": "xxx",
                        "username": "root",
                        "writeMode": "insert"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": "1"
            }
        }
    }
}

Run the following command in the console:

D:\softwares\datax\bin>python datax.py ../job/orcl2mysql.json

Execution log:

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.


2022-12-22 10:50:59.340 [main] INFO  VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2022-12-22 10:50:59.348 [main] INFO  Engine - the machine info  =>

        osInfo: Oracle Corporation 1.8 25.221-b11
        jvmInfo:        Windows 10 amd64 10.0
        cpu num:        6

        totalPhysicalMemory:    -0.00G
        freePhysicalMemory:     -0.00G
        maxFileDescriptorCount: -1
        currentOpenFileDescriptorCount: -1

        GC Names        [PS MarkSweep, PS Scavenge]

        MEMORY_NAME                    | allocation_size                | init_size              
        PS Eden Space                  | 256.00MB                       | 256.00MB               
        Code Cache                     | 240.00MB                       | 2.44MB                 
        Compressed Class Space         | 1,024.00MB                     | 0.00MB                 
        PS Survivor Space              | 42.50MB                        | 42.50MB                
        PS Old Gen                     | 683.00MB                       | 683.00MB               
        Metaspace                      | -0.00MB                        | 0.00MB                 


2022-12-22 10:50:59.365 [main] INFO  Engine -
{
        "content":[
                {
                        "reader":{
                                "name":"oraclereader",
                                "parameter":{
                                        "column":[
                                                "*"
                                        ],
                                        "connection":[
                                                {
                                                        "jdbcUrl":[
                                                                "jdbc:oracle:thin:@localhost:1521:orcl"
                                                        ],
                                                        "table":[
                                                                "zx_mobile_temp"
                                                        ]
                                                }
                                        ],
                                        "password":"******",
                                        "username":"GL"
                                }
                        },
                        "writer":{
                                "name":"mysqlwriter",
                                "parameter":{
                                        "column":[
                                                "*"
                                        ],
                                        "connection":[
                                                {
                                                        "jdbcUrl":"jdbc:mysql://localhost:3306/test",
                                                        "table":[
                                                                "zx_mobile_temp"
                                                        ]
                                                }
                                        ],
                                        "password":"******",
                                        "username":"root",
                                        "writeMode":"insert"
                                }
                        }
                }
        ],
        "setting":{
                "speed":{
                        "channel":"1"
                }
        }
}

2022-12-22 10:50:59.383 [main] WARN  Engine - prioriy set to 0, because NumberFormatException, the value is: null
2022-12-22 10:50:59.385 [main] INFO  PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2022-12-22 10:50:59.385 [main] INFO  JobContainer - DataX jobContainer starts job.
2022-12-22 10:50:59.386 [main] INFO  JobContainer - Set jobId = 0
2022-12-22 10:51:04.642 [job-0] INFO  OriginalConfPretreatmentUtil - Available jdbcUrl:jdbc:oracle:thin:@localhost:1521:orcl.
2022-12-22 10:51:04.644 [job-0] WARN  OriginalConfPretreatmentUtil - 您的配置文件中的列配置存在一定的风险. 因为您未配置读取数据库表的列,当您的表字段个数、类型有变动时,可能影响任务正确性甚至会 运行出错。请检查您的配置并作出修改.
2022-12-22 10:51:04.856 [job-0] INFO  OriginalConfPretreatmentUtil - table:[zx_mobile_temp] all columns:[
mobile
].
2022-12-22 10:51:04.856 [job-0] WARN  OriginalConfPretreatmentUtil - 您的配置文件中的列配置信息存在风险. 因为您配置的写入数据库表的列为*,当您的表字段个数、类型有变动时,可能影响任务正确性甚至会运行出错。请检查您的配置并作出修改.
2022-12-22 10:51:04.860 [job-0] INFO  OriginalConfPretreatmentUtil - Write data [
insert INTO %s (mobile) VALUES(?)
], which jdbcUrl like:[jdbc:mysql://localhost:3306/test?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true]
2022-12-22 10:51:04.861 [job-0] INFO  JobContainer - jobContainer starts to do prepare ...
2022-12-22 10:51:04.861 [job-0] INFO  JobContainer - DataX Reader.Job [oraclereader] do prepare work .
2022-12-22 10:51:04.861 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] do prepare work .
2022-12-22 10:51:04.862 [job-0] INFO  JobContainer - jobContainer starts to do split ...
2022-12-22 10:51:04.862 [job-0] INFO  JobContainer - Job set Channel-Number to 1 channels.
2022-12-22 10:51:04.866 [job-0] INFO  JobContainer - DataX Reader.Job [oraclereader] splits to [1] tasks.
2022-12-22 10:51:04.866 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] splits to [1] tasks.
2022-12-22 10:51:04.879 [job-0] INFO  JobContainer - jobContainer starts to do schedule ...
2022-12-22 10:51:04.882 [job-0] INFO  JobContainer - Scheduler starts [1] taskGroups.
2022-12-22 10:51:04.884 [job-0] INFO  JobContainer - Running by standalone Mode.
2022-12-22 10:51:04.891 [taskGroup-0] INFO  TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2022-12-22 10:51:04.895 [taskGroup-0] INFO  Channel - Channel set byte_speed_limit to -1, No bps activated.
2022-12-22 10:51:04.895 [taskGroup-0] INFO  Channel - Channel set record_speed_limit to -1, No tps activated.
2022-12-22 10:51:04.906 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2022-12-22 10:51:04.911 [0-0-0-reader] INFO  CommonRdbmsReader$Task - Begin to read record by Sql: [select * from zx_mobile_temp
] jdbcUrl:[jdbc:oracle:thin:@localhost:1521:orcl].
2022-12-22 10:51:04.971 [0-0-0-reader] INFO  CommonRdbmsReader$Task - Finished read record by Sql: [select * from zx_mobile_temp
] jdbcUrl:[jdbc:oracle:thin:@localhost:1521:orcl].
2022-12-22 10:51:05.227 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[322]ms
2022-12-22 10:51:05.228 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
2022-12-22 10:51:14.904 [job-0] INFO  StandAloneJobContainerCommunicator - Total 6 records, 66 bytes | Speed 6B/s, 0 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.044s | Percentage 100.00%
2022-12-22 10:51:14.905 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
2022-12-22 10:51:14.905 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] do post work.
2022-12-22 10:51:14.905 [job-0] INFO  JobContainer - DataX Reader.Job [oraclereader] do post work.
2022-12-22 10:51:14.905 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
2022-12-22 10:51:14.906 [job-0] INFO  HookInvoker - No hook invoked, because base dir not exists or is a file: D:\softwares\datax\hook
2022-12-22 10:51:14.907 [job-0] INFO  JobContainer -
         [total cpu info] =>
                averageCpu                     | maxDeltaCpu                    | minDeltaCpu    
                -1.00%                         | -1.00%                         | -1.00%


         [total gc info] =>
                 NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime
                 PS MarkSweep         | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s
                 PS Scavenge          | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s

2022-12-22 10:51:14.907 [job-0] INFO  JobContainer - PerfTrace not enable!
2022-12-22 10:51:14.908 [job-0] INFO  StandAloneJobContainerCommunicator - Total 6 records, 66 bytes | Speed 6B/s, 0 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.044s | Percentage 100.00%
2022-12-22 10:51:14.909 [job-0] INFO  JobContainer -
任务启动时刻                    : 2022-12-22 10:50:59
任务结束时刻                    : 2022-12-22 10:51:14
任务总计耗时                    :                 15s
任务平均流量                    :                6B/s
记录写入速度                    :              0rec/s
读出记录总数                    :                   6
读写失败总数                    :                   0
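The job succeeded, but the log carries two WARN lines (in Chinese) saying that configuring "column" as "*" on the reader and writer is risky, because changes to the table's column count or types can break the job. The log also shows the real column list of zx_mobile_temp: just mobile. A sketch that generates the same Oracle-to-MySQL config with explicit columns; the builder function and its dict layout are hypothetical, while the parameter names are the ones used in orcl2mysql.json above (the reader's jdbcUrl is a list, the writer's is a string).

```python
import json


def rdbms_job(reader, writer, columns, channel=1):
    """Build a DataX job dict that lists columns explicitly instead of "*".

    reader/writer: dicts with name, jdbcUrl, table, username, password
    (and optionally writeMode for the writer).
    """
    writer_param = {
        "column": list(columns),
        "connection": [{"jdbcUrl": writer["jdbcUrl"],      # writer: single URL string
                        "table": [writer["table"]]}],
        "username": writer["username"],
        "password": writer["password"],
    }
    if "writeMode" in writer:
        writer_param["writeMode"] = writer["writeMode"]
    reader_param = {
        "column": list(columns),
        "connection": [{"jdbcUrl": [reader["jdbcUrl"]],    # reader: list of URLs
                        "table": [reader["table"]]}],
        "username": reader["username"],
        "password": reader["password"],
    }
    return {"job": {
        "content": [{"reader": {"name": reader["name"], "parameter": reader_param},
                     "writer": {"name": writer["name"], "parameter": writer_param}}],
        "setting": {"speed": {"channel": channel}},
    }}


job = rdbms_job(
    {"name": "oraclereader", "jdbcUrl": "jdbc:oracle:thin:@localhost:1521:orcl",
     "table": "zx_mobile_temp", "username": "GL", "password": "xxx"},
    {"name": "mysqlwriter", "jdbcUrl": "jdbc:mysql://localhost:3306/test",
     "table": "zx_mobile_temp", "username": "root", "password": "xxx",
     "writeMode": "insert"},
    columns=["mobile"],
)
print(json.dumps(job, indent=4))  # save this output as ../job/orcl2mysql.json
```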

(3) Test whether different date/time types in Oracle and MySQL are imported correctly

Create the table in Oracle:

create table date_test(
    time1 date,
    time2 timestamp
);

Then insert some rows.

Create the tables in MySQL:

create table date_test1(
  time1 datetime,
  time2 timestamp
);

create table date_test2(
    time1 timestamp,
    time2 datetime
);

First, sync Oracle date_test to MySQL date_test1. After the job finished, the data in MySQL date_test1 was correct, so:
1. date -> datetime √
2. timestamp -> timestamp √

Next, sync Oracle date_test to MySQL date_test2. After the job finished, the data in MySQL date_test2 was also correct, so:
1. date -> timestamp √
2. timestamp -> datetime √
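These conversions worked for the sample rows, but whether a value lands intact also depends on the value itself. MySQL TIMESTAMP only covers '1970-01-01 00:00:01' to '2038-01-19 03:14:07' UTC (and is converted through the session time zone), while DATETIME covers '1000-01-01 00:00:00' to '9999-12-31 23:59:59'. Both MySQL tables above are declared without fractional-second precision, so sub-second parts of Oracle TIMESTAMP values are not stored. A small pre-check sketch, assuming the values are already expressed in UTC and ignoring fractional seconds; the function name is hypothetical.

```python
from datetime import datetime

# Documented MySQL ranges at whole-second precision.
# TIMESTAMP bounds are UTC; inputs are assumed to already be UTC.
MYSQL_RANGES = {
    "timestamp": (datetime(1970, 1, 1, 0, 0, 1), datetime(2038, 1, 19, 3, 14, 7)),
    "datetime": (datetime(1000, 1, 1, 0, 0, 0), datetime(9999, 12, 31, 23, 59, 59)),
}


def fits_mysql(value, mysql_type):
    """Return True if value (fraction dropped) lies within the MySQL type's range."""
    lo, hi = MYSQL_RANGES[mysql_type]
    return lo <= value.replace(microsecond=0) <= hi


print(fits_mysql(datetime(2022, 12, 22, 10, 35, 44), "timestamp"))
print(fits_mysql(datetime(1969, 12, 31), "timestamp"))
print(fits_mysql(datetime(1969, 12, 31), "datetime"))
```

Running such a check over the Oracle source data before choosing TIMESTAMP vs DATETIME on the MySQL side shows early whether any rows fall outside the target range.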

(4) Test import speed: MySQL to Oracle

Data preparation

create table student
(
    s_id    int  not null
        primary key auto_increment,
    s_name  varchar(16)  null,
    s_birth varchar(16)  null,
    s_age   varchar(2)   null,
    s_sex   char         null,
    s_img   varchar(128) null,
    create_time timestamp not null default now()
);

create function createName()
    RETURNS varchar(3)
BEGIN
    DECLARE LN VARCHAR(4000);
    DECLARE MN VARCHAR(4000);
    DECLARE FN VARCHAR(4000);
    DECLARE LN_N INT;
    DECLARE MN_N INT;
    DECLARE FN_N INT;
    SET LN = '赵钱孙李周吴郑王冯陈褚卫蒋沈韩杨朱秦尤许吕施张孔曹严华金魏陶姜戚谢邹喻柏水窦章云苏潘葛奚范彭郎鲁韦昌马苗凤花方俞任袁柳酆鲍史唐费廉岑薛雷贺倪汤滕殷罗毕郝邬安常乐于时傅皮卞齐康伍余元卜顾孟平黄和穆萧尹段';
    SET MN = '的一是在不了有和人这中大为上个国我以要他时来用们生到作地于出就分对成会可主发年动同工也能下过子说产种面而方后多定行学法所民得经十三之进着等部度家电力里如水化高自二理起小物现实加量都两体制机当使点从业本去把性好应开它合还因由其些然前外天政四日那社义事平形相全表间样与关各重新线内数正心反你明看原又么利比或但质气第向道命此变条只没结解问意建月公无系军很情者最立代想已通并提直题党程展五果料象员革位入常文总次品式活设及管特件长求老头基资边流路级少图山统接知较长将组见计别她手角期根论运农指几九区强放决西被干做必战先回则任取据处队南给色光门即保治北造百规热领七海地口东导器压志世金增争济阶油思术极交受联什认六共权收证改清已美再采转更单风切打白教速花带安场身车例真务具万每目至达走积示议声报斗完类八离华名确才科张信马节话米整空元况今集温传土许步群广石记需段研界拉林律叫且究观越织装影算低持音众书布复容儿须际商非验连断深难近矿千周委素技备半办青省列习响约支般史感劳便团往酸历市克何除消构府称太准精值号率族维划选标写存候毛亲快效斯院查江型眼王按格养易置派层片始却专状育厂京识适属圆包火住调满县局照参红细引听该铁价严首底液官德调随病苏失尔死讲配女黄推显谈罪神艺呢席含企望密批营项防举球英氧势告李台落木帮轮破亚师围注远字材排供河态封另施减树溶怎止案言士均武固叶鱼波视仅费紧爱左章早朝害续轻服试食充兵源判护司足某练差致板田降黑犯负击范继兴似余坚曲输修的故城夫够送笔船占右财吃富春职觉汉画功巴跟虽杂飞检吸助升阳互初创抗考投坏策古径换未跑留钢曾端责站简述钱副尽帝射草冲承独令限阿宣环双请超微让控州良轴找否纪益依优顶础载倒房突坐粉敌略客袁冷胜绝析块剂测丝协重诉念陈仍罗盐友洋错苦夜刑移频逐靠混母短皮终聚汽村云哪既距卫停烈央察烧行迅境若印洲刻括激孔搞甚室待核校散侵吧甲游久菜味旧模湖货损预阻毫普稳乙妈植息扩银语挥酒守拿序纸医缺雨吗针刘啊急唱误训愿审附获茶鲜粮斤孩脱硫肥善龙演父渐血欢械掌歌沙著刚攻谓盾讨晚粒乱燃矛乎杀药宁鲁贵钟煤读班伯香介迫句丰培握兰担弦蛋沉假穿执答乐谁顺烟缩征脸喜松脚困异免背星福买染井概慢怕磁倍祖皇促静补评翻肉践尼衣宽扬棉希伤操垂秋宜氢套笔督振架亮末宪庆编牛触映雷销诗座居抓裂胞呼娘景威绿晶厚盟衡鸡孙延危胶还屋乡临陆顾掉呀灯岁措束耐剧玉赵跳哥季课凯胡额款绍卷齐伟蒸殖永宗苗川炉岩弱零杨奏沿露杆探滑镇饭浓航怀赶库夺伊灵税了途灭赛归召鼓播盘裁险康唯录菌纯借糖盖横符私努堂域枪润幅哈竟熟虫泽脑壤碳欧遍侧寨敢彻虑斜薄庭都纳弹饲伸折麦湿暗荷瓦塞床筑恶户访塔奇透梁刀旋迹卡氯遇份毒泥退洗摆灰彩卖耗夏择忙铜献硬予繁圈雪函亦抽篇阵阴丁尺追堆雄迎泛爸楼避谋吨野猪旗累偏典馆索秦脂潮爷豆忽托惊塑遗愈朱替纤粗倾尚痛楚谢奋购磨君池旁碎骨监捕弟暴割贯殊释词亡壁顿宝午尘闻揭炮残冬桥妇警综招吴付浮遭徐您摇谷赞箱隔订男吹乐园纷唐败宋玻巨耕坦荣闭湾键凡驻锅救恩剥凝碱齿截炼麻纺禁废盛版缓净睛昌婚涉筒嘴插岸朗庄街藏姑贸腐奴啦惯乘伙恢匀纱扎辩耳彪臣亿璃抵脉秀萨俄网舞店喷纵寸汗挂洪着贺闪柬爆烯津稻墙软勇像滚厘蒙芳肯坡柱荡腿仪旅尾轧冰贡登黎削钻勒逃障氨郭峰币港伏轨亩毕擦莫刺浪秘援株健售股岛甘泡睡童铸汤阀休汇舍牧绕炸哲磷绩朋淡尖启陷柴呈徒颜泪稍忘泵蓝拖洞授镜辛壮锋贫虚弯摩泰幼廷尊窗纲弄隶疑氏宫姐震瑞怪尤琴循描膜违夹腰缘珠穷森枝竹沟催绳忆邦剩幸浆栏拥牙贮礼滤钠纹弹罢拍咱喊袖埃勤罚焦潜伍墨欲缝姓刊饱仿奖铝鬼丽跨默挖链扫喝袋炭污幕诸弧励梅奶洁灾舟鉴苯讼抱毁率懂寒智埔寄届跃渡挑丹艰贝碰拔爹戴码梦芽熔赤渔哭敬颗奔藏铅熟仲虎稀妹乏珍申桌遵允隆螺仓魏锐晓氮兼隐碍赫拨忠肃缸牵抢博巧壳兄杜讯诚碧祥柯页巡矩悲灌龄伦票寻桂铺圣恐恰郑趣抬荒腾贴柔滴猛阔辆妻填撤储签闹扰紫砂递戏吊陶伐喂疗瓶婆抚臂摸忍虾蜡邻胸巩挤偶弃槽劲乳邓吉仁烂砖租乌舰伴瓜浅丙暂燥橡柳迷暖牌纤秧胆详簧踏瓷谱呆宾糊洛辉愤竞隙怒粘乃绪肩籍敏涂熙皆侦悬掘享纠醒狂锁淀恨牲霸爬赏逆玩陵祝秒浙貌役彼悉鸭着趋凤晨畜辈秩卵署梯炎滩棋驱筛峡冒啥寿译浸泉帽迟硅疆贷漏稿冠嫩胁芯牢叛蚀奥鸣岭羊凭串塘绘酵融盆锡庙筹冻辅摄袭筋拒僚旱钾鸟漆沈眉疏添棒穗硝韩逼扭侨凉挺碗栽炒杯患馏劝豪辽勃鸿旦吏拜狗埋辊掩饮搬骂辞勾扣估蒋绒雾丈朵姆拟宇辑陕雕偿蓄崇剪倡厅咬驶薯刷斥番赋奉佛浇漫曼扇钙桃扶仔返俗亏腔鞋棱覆框悄叔撞骗勘旺沸孤粘吐孟渠屈疾妙惜仰狠胀谐抛霉桑岗嘛衰盗渗脏赖涌甜曹阅肌哩厉烃纬毅昨伪症煮叹钉搭茎笼酷偷弓锥恒杰坑鼻翼纶叙狱逮罐络棚抑膨蔬寺骤穆冶枯册尸凸绅坯牺焰轰欣晋瘦御锭锦丧旬锻垄搜佛扑邀亭酯迈舒脆酶闲忧酚顽羽涨卸仗陪薄辟惩杭姚肚捉飘漂昆欺吾郎烷汁呵饰萧雅邮迁燕撒姻赴宴烦削债帐斑铃旨醇董饼雏姿拌傅腹妥揉贤拆歪葡胺丢浩徽昂垫挡览贪慰缴汪慌冯诺姜谊凶劣诬耀昏躺盈骑乔溪丛卢抹易闷咨刮驾缆悟摘铒
掷颇幻柄惠惨佳仇腊窝涤剑瞧堡泼葱罩霍捞胎苍滨俩捅湘砍霞邵萄疯淮遂熊粪烘宿档戈驳嫂裕徙箭捐肠撑晒辨殿莲摊搅酱屏疫哀蔡堵沫皱畅叠阁莱敲辖钩痕坝巷饿祸丘玄溜曰逻彭尝卿妨艇吞韦怨矮歇郊禄捻漠粹颠宏冤肪饥呵仙押挨醛娃拾没佩勿吓讹侯恋夕锌篡戚淋蓬岂釉兆泊魂拘亡杠摧氟颂浑凌铀诱犁谴颁舶扯嘉萌犹滋焊舌匹媳肺掠酿烹疲驰鸦窄辱狭朴遣菲奸韧辣拳秆卧醉竭茅墓矣哎艳敦舆缔雇尿葬履契禽渣衬躲赔咸溉贼醋堤抖妃裤廉晴挽掀茫丑亥拦悠阐慧佐奇竖孝柜麟绣遥逝愁肖昭芬逢窑捷圜盲闸宙辐披账狼幽绸蜂慎餐酬誓惟叉弥址帜芝砌唉仆涛臭翠盒劫慨炳阖寂椒倘拓畏喉巾颈垦拚兽蔽芦乾爽窃谭挣崩模褐传翅儒伞晃谬胚剖凑眠浊霜礁蔑抄闯洒碑蓉耶猜蹲壶唤澳锯郡玲绵纽梳掏吁锤鼠穴椅殷遮吵萍厌畜俱夸吕囊捧雌闽饶瞬郁哨凿朝俺浒茂肝勋盯籽耻菊滥稼戒奈帅鞭蚕镁询跌烤坛宅笛鄂蛮颤棍睁鼎岌降侍藩嚷匪岳糟缠迪泄卑氛堪萝盛碘缚悦澄甫攀屠溢拱晰携朽吟菱谦凹俊芒盼婶艘酰趁唇挫羞浴疼萎肴愚肿刨绞枢嫁慕舱铲苹豫谕迭潘顷翁榜匠欠茬畴胃沾踪弊哼鹏歧桐沃悼惑溃蔗荐潭孢露诊庸聪嫌厨庞祁钳肆梭赠崖篮颖甸藻捣且撕诏贞赐慈炕胖兹差琼锈汛卓棵馈挠灶婴蒂肤衫沥仑勉沪逸蜜浦嗓晕膏祭赢艾扮鹅怜蒲兔孕呖蘖挪淑谣惧廊缅俘骄膀陡宰诞峻恼腺猎涡夷愉魔铵葛贾似荫哟脊钞苛锰椭镶杏溴倚滞会氓捏斩傲匆僵卤烫衍榨拢裸屑咽坊舅渴翔邪拄窖猫砌钦媒脾勺柏栅噪昼耿扁辰秤得贩糕梁昙衷宦扔哇诈嘱藤卜冈悔廓皂拐氰杉玛矢寓瓣罕垮笋淘衔称恭喇帕桉秉帘铭蛇摔斋叭帆裸俭瘤篷砸肢辟脖瞪暑卜竿歼笙酮蕴哗瞎喀刃楔喘枚嵌挝厢粤甩拴膝恳腕娓熄锚忌愧哦荆圃骚丸蒜毯弗俯鹿梢屯衙轿贱垒谅踢哑滔渥饷泳棕熬搁腈梨吻樱奠捆姨柏聘惕郓绑冀裹酥寡彦稠啡钝汝擅汰鳙埔敞嘿逊栋谨咖鲤雀佣庵葫贿鳞拼搏谎塌忉腻戊怖坟禾刹嘻桔坎拇煽狮痒曾梗寇鹰烛哄莽雯胳龟亟糠泌坪傻什喻渊蚌跪巷涅钊譬蕊膛侮奕枕辫况扼郝寥凄厦腥钧耦蹄戥屁诵匈桩钓涵倦袍抒屿蹈忿敷虹聊嗣尉灿糙蹬嗯姬狡笨辜僧茨讽翰枉岐枣崭焚咕猴揽涝耍趟汹咋傍镀给爵虏劈璋踩瞅迄昔汞呱诡魄祺嘲惶赃癌咐歉扳鄙庐聂便芡躯贬煌拧隋襄淤宠炊滇謇懒栓佑憾骆裙猖兜孵痼盥曝泣絮韵眷旷噢参栖盏鳌溅煎校榴暮琪淆陛巢哒吼槐唧其沛乞蜀蜇赚捍铰幂尧咒耽叮褂焕煞雹搓釜铬拣募淹瑰鲢茄灼邹躬觉娇焉彰鹤琳沦畔惹庶毙皖邢禹渍绷窜翘淫箪陌膊鞑咳玫巫拂蕉澜赎绥锄囱赌颊缕寅躁稚庚苟氦魁珊蜕蛭酌逗闺蔓撇豌朕缉襟镍桅荧侄卒佃瞿娶饪耸乍靶痴靖扛筐韶嚣崔蓿岔氘娥剿霖喃搪雍裳撰豹骏慷';
    SET FN = '的一是在不了有和人这中大为上个国我以要他时来用们生到作地于出就分对成会可主发年动同工也能下过子说产种面而方后多定行学法所民得经十三之进着等部度家电力里如水化高自二理起小物现实加量都两体制机当使点从业本去把性好应开它合还因由其些然前外天政四日那社义事平形相全表间样与关各重新线内数正心反你明看原又么利比或但质气第向道命此变条只没结解问意建月公无系军很情者最立代想已通并提直题党程展五果料象员革位入常文总次品式活设及管特件长求老头基资边流路级少图山统接知较长将组见计别她手角期根论运农指几九区强放决西被干做必战先回则任取据处队南给色光门即保治北造百规热领七海地口东导器压志世金增争济阶油思术极交受联什认六共权收证改清已美再采转更单风切打白教速花带安场身车例真务具万每目至达走积示议声报斗完类八离华名确才科张信马节话米整空元况今集温传土许步群广石记需段研界拉林律叫且究观越织装影算低持音众书布复容儿须际商非验连断深难近矿千周委素技备半办青省列习响约支般史感劳便团往酸历市克何除消构府称太准精值号率族维划选标写存候毛亲快效斯院查江型眼王按格养易置派层片始却专状育厂京识适属圆包火住调满县局照参红细引听该铁价严首底液官德调随病苏失尔死讲配女黄推显谈罪神艺呢席含企望密批营项防举球英氧势告李台落木帮轮破亚师围注远字材排供河态封另施减树溶怎止案言士均武固叶鱼波视仅费紧爱左章早朝害续轻服试食充兵源判护司足某练差致板田降黑犯负击范继兴似余坚曲输修的故城夫够送笔船占右财吃富春职觉汉画功巴跟虽杂飞检吸助升阳互初创抗考投坏策古径换未跑留钢曾端责站简述钱副尽帝射草冲承独令限阿宣环双请超微让控州良轴找否纪益依优顶础载倒房突坐粉敌略客袁冷胜绝析块剂测丝协重诉念陈仍罗盐友洋错苦夜刑移频逐靠混母短皮终聚汽村云哪既距卫停烈央察烧行迅境若印洲刻括激孔搞甚室待核校散侵吧甲游久菜味旧模湖货损预阻毫普稳乙妈植息扩银语挥酒守拿序纸医缺雨吗针刘啊急唱误训愿审附获茶鲜粮斤孩脱硫肥善龙演父渐血欢械掌歌沙著刚攻谓盾讨晚粒乱燃矛乎杀药宁鲁贵钟煤读班伯香介迫句丰培握兰担弦蛋沉假穿执答乐谁顺烟缩征脸喜松脚困异免背星福买染井概慢怕磁倍祖皇促静补评翻肉践尼衣宽扬棉希伤操垂秋宜氢套笔督振架亮末宪庆编牛触映雷销诗座居抓裂胞呼娘景威绿晶厚盟衡鸡孙延危胶还屋乡临陆顾掉呀灯岁措束耐剧玉赵跳哥季课凯胡额款绍卷齐伟蒸殖永宗苗川炉岩弱零杨奏沿露杆探滑镇饭浓航怀赶库夺伊灵税了途灭赛归召鼓播盘裁险康唯录菌纯借糖盖横符私努堂域枪润幅哈竟熟虫泽脑壤碳欧遍侧寨敢彻虑斜薄庭都纳弹饲伸折麦湿暗荷瓦塞床筑恶户访塔奇透梁刀旋迹卡氯遇份毒泥退洗摆灰彩卖耗夏择忙铜献硬予繁圈雪函亦抽篇阵阴丁尺追堆雄迎泛爸楼避谋吨野猪旗累偏典馆索秦脂潮爷豆忽托惊塑遗愈朱替纤粗倾尚痛楚谢奋购磨君池旁碎骨监捕弟暴割贯殊释词亡壁顿宝午尘闻揭炮残冬桥妇警综招吴付浮遭徐您摇谷赞箱隔订男吹乐园纷唐败宋玻巨耕坦荣闭湾键凡驻锅救恩剥凝碱齿截炼麻纺禁废盛版缓净睛昌婚涉筒嘴插岸朗庄街藏姑贸腐奴啦惯乘伙恢匀纱扎辩耳彪臣亿璃抵脉秀萨俄网舞店喷纵寸汗挂洪着贺闪柬爆烯津稻墙软勇像滚厘蒙芳肯坡柱荡腿仪旅尾轧冰贡登黎削钻勒逃障氨郭峰币港伏轨亩毕擦莫刺浪秘援株健售股岛甘泡睡童铸汤阀休汇舍牧绕炸哲磷绩朋淡尖启陷柴呈徒颜泪稍忘泵蓝拖洞授镜辛壮锋贫虚弯摩泰幼廷尊窗纲弄隶疑氏宫姐震瑞怪尤琴循描膜违夹腰缘珠穷森枝竹沟催绳忆邦剩幸浆栏拥牙贮礼滤钠纹弹罢拍咱喊袖埃勤罚焦潜伍墨欲缝姓刊饱仿奖铝鬼丽跨默挖链扫喝袋炭污幕诸弧励梅奶洁灾舟鉴苯讼抱毁率懂寒智埔寄届跃渡挑丹艰贝碰拔爹戴码梦芽熔赤渔哭敬颗奔藏铅熟仲虎稀妹乏珍申桌遵允隆螺仓魏锐晓氮兼隐碍赫拨忠肃缸牵抢博巧壳兄杜讯诚碧祥柯页巡矩悲灌龄伦票寻桂铺圣恐恰郑趣抬荒腾贴柔滴猛阔辆妻填撤储签闹扰紫砂递戏吊陶伐喂疗瓶婆抚臂摸忍虾蜡邻胸巩挤偶弃槽劲乳邓吉仁烂砖租乌舰伴瓜浅丙暂燥橡柳迷暖牌纤秧胆详簧踏瓷谱呆宾糊洛辉愤竞隙怒粘乃绪肩籍敏涂熙皆侦悬掘享纠醒狂锁淀恨牲霸爬赏逆玩陵祝秒浙貌役彼悉鸭着趋凤晨畜辈秩卵署梯炎滩棋驱筛峡冒啥寿译浸泉帽迟硅疆贷漏稿冠嫩胁芯牢叛蚀奥鸣岭羊凭串塘绘酵融盆锡庙筹冻辅摄袭筋拒僚旱钾鸟漆沈眉疏添棒穗硝韩逼扭侨凉挺碗栽炒杯患馏劝豪辽勃鸿旦吏拜狗埋辊掩饮搬骂辞勾扣估蒋绒雾丈朵姆拟宇辑陕雕偿蓄崇剪倡厅咬驶薯刷斥番赋奉佛浇漫曼扇钙桃扶仔返俗亏腔鞋棱覆框悄叔撞骗勘旺沸孤粘吐孟渠屈疾妙惜仰狠胀谐抛霉桑岗嘛衰盗渗脏赖涌甜曹阅肌哩厉烃纬毅昨伪症煮叹钉搭茎笼酷偷弓锥恒杰坑鼻翼纶叙狱逮罐络棚抑膨蔬寺骤穆冶枯册尸凸绅坯牺焰轰欣晋瘦御锭锦丧旬锻垄搜佛扑邀亭酯迈舒脆酶闲忧酚顽羽涨卸仗陪薄辟惩杭姚肚捉飘漂昆欺吾郎烷汁呵饰萧雅邮迁燕撒姻赴宴烦削债帐斑铃旨醇董饼雏姿拌傅腹妥揉贤拆歪葡胺丢浩徽昂垫挡览贪慰缴汪慌冯诺姜谊凶劣诬耀昏躺盈骑乔溪丛卢抹易闷咨刮驾缆悟摘铒
掷颇幻柄惠惨佳仇腊窝涤剑瞧堡泼葱罩霍捞胎苍滨俩捅湘砍霞邵萄疯淮遂熊粪烘宿档戈驳嫂裕徙箭捐肠撑晒辨殿莲摊搅酱屏疫哀蔡堵沫皱畅叠阁莱敲辖钩痕坝巷饿祸丘玄溜曰逻彭尝卿妨艇吞韦怨矮歇郊禄捻漠粹颠宏冤肪饥呵仙押挨醛娃拾没佩勿吓讹侯恋夕锌篡戚淋蓬岂釉兆泊魂拘亡杠摧氟颂浑凌铀诱犁谴颁舶扯嘉萌犹滋焊舌匹媳肺掠酿烹疲驰鸦窄辱狭朴遣菲奸韧辣拳秆卧醉竭茅墓矣哎艳敦舆缔雇尿葬履契禽渣衬躲赔咸溉贼醋堤抖妃裤廉晴挽掀茫丑亥拦悠阐慧佐奇竖孝柜麟绣遥逝愁肖昭芬逢窑捷圜盲闸宙辐披账狼幽绸蜂慎餐酬誓惟叉弥址帜芝砌唉仆涛臭翠盒劫慨炳阖寂椒倘拓畏喉巾颈垦拚兽蔽芦乾爽窃谭挣崩模褐传翅儒伞晃谬胚剖凑眠浊霜礁蔑抄闯洒碑蓉耶猜蹲壶唤澳锯郡玲绵纽梳掏吁锤鼠穴椅殷遮吵萍厌畜俱夸吕囊捧雌闽饶瞬郁哨凿朝俺浒茂肝勋盯籽耻菊滥稼戒奈帅鞭蚕镁询跌烤坛宅笛鄂蛮颤棍睁鼎岌降侍藩嚷匪岳糟缠迪泄卑氛堪萝盛碘缚悦澄甫攀屠溢拱晰携朽吟菱谦凹俊芒盼婶艘酰趁唇挫羞浴疼萎肴愚肿刨绞枢嫁慕舱铲苹豫谕迭潘顷翁榜匠欠茬畴胃沾踪弊哼鹏歧桐沃悼惑溃蔗荐潭孢露诊庸聪嫌厨庞祁钳肆梭赠崖篮颖甸藻捣且撕诏贞赐慈炕胖兹差琼锈汛卓棵馈挠灶婴蒂肤衫沥仑勉沪逸蜜浦嗓晕膏祭赢艾扮鹅怜蒲兔孕呖蘖挪淑谣惧廊缅俘骄膀陡宰诞峻恼腺猎涡夷愉魔铵葛贾似荫哟脊钞苛锰椭镶杏溴倚滞会氓捏斩傲匆僵卤烫衍榨拢裸屑咽坊舅渴翔邪拄窖猫砌钦媒脾勺柏栅噪昼耿扁辰秤得贩糕梁昙衷宦扔哇诈嘱藤卜冈悔廓皂拐氰杉玛矢寓瓣罕垮笋淘衔称恭喇帕桉秉帘铭蛇摔斋叭帆裸俭瘤篷砸肢辟脖瞪暑卜竿歼笙酮蕴哗瞎喀刃楔喘枚嵌挝厢粤甩拴膝恳腕娓熄锚忌愧哦荆圃骚丸蒜毯弗俯鹿梢屯衙轿贱垒谅踢哑滔渥饷泳棕熬搁腈梨吻樱奠捆姨柏聘惕郓绑冀裹酥寡彦稠啡钝汝擅汰鳙埔敞嘿逊栋谨咖鲤雀佣庵葫贿鳞拼搏谎塌忉腻戊怖坟禾刹嘻桔坎拇煽狮痒曾梗寇鹰烛哄莽雯胳龟亟糠泌坪傻什喻渊蚌跪巷涅钊譬蕊膛侮奕枕辫况扼郝寥凄厦腥钧耦蹄戥屁诵匈桩钓涵倦袍抒屿蹈忿敷虹聊嗣尉灿糙蹬嗯姬狡笨辜僧茨讽翰枉岐枣崭焚咕猴揽涝耍趟汹咋傍镀给爵虏劈璋踩瞅迄昔汞呱诡魄祺嘲惶赃癌咐歉扳鄙庐聂便芡躯贬煌拧隋襄淤宠炊滇謇懒栓佑憾骆裙猖兜孵痼盥曝泣絮韵眷旷噢参栖盏鳌溅煎校榴暮琪淆陛巢哒吼槐唧其沛乞蜀蜇赚捍铰幂尧咒耽叮褂焕煞雹搓釜铬拣募淹瑰鲢茄灼邹躬觉娇焉彰鹤琳沦畔惹庶毙皖邢禹渍绷窜翘淫箪陌膊鞑咳玫巫拂蕉澜赎绥锄囱赌颊缕寅躁稚庚苟氦魁珊蜕蛭酌逗闺蔓撇豌朕缉襟镍桅荧侄卒佃瞿娶饪耸乍靶痴靖扛筐韶嚣崔蓿岔氘娥剿霖喃搪雍裳撰豹骏慷';
    SET LN_N = CHAR_LENGTH(LN);
    SET MN_N = CHAR_LENGTH(MN);
    SET FN_N = CHAR_LENGTH(FN);
    -- pick one random character from each pool; each index is bounded by that pool's own length
    RETURN CONCAT(substring(LN,CEIL(RAND()*LN_N),1),substring(MN,CEIL(RAND()*MN_N),1),substring(FN,CEIL(RAND()*FN_N),1));

END;
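In createName, the index for the middle character has to be drawn from MN's own length (MN_N). If LN_N is used instead, only the first LN_N characters of the much longer MN string can ever be picked, which quietly shrinks the variety of generated names. A small Python 3 sketch (helper names and the short sample pools are hypothetical) that mimics SUBSTRING(pool, CEIL(RAND()*n), 1) and shows the effect:

```python
# -*- coding: utf-8 -*-
import random


def pick(pool, n, rng):
    """Mimic SUBSTRING(pool, CEIL(RAND()*n), 1): a 1-based position in 1..n
    (ignoring the rare RAND() = 0 case)."""
    pos = rng.randint(1, n)
    return pool[pos - 1] if pos <= len(pool) else ""


def create_name(ln, mn, fn, rng, middle_len=None):
    """Build a 3-character name; middle_len can reproduce a wrong bound for MN."""
    if middle_len is None:
        middle_len = len(mn)
    return pick(ln, len(ln), rng) + pick(mn, middle_len, rng) + pick(fn, len(fn), rng)


LN, MN, FN = "赵钱孙", "的一是在不了有和人这", "中大为"
rng = random.Random(42)
print(create_name(LN, MN, FN, rng))
# Bounding the middle pick by len(LN) restricts it to MN[:3]
middles = set(create_name(LN, MN, FN, rng, middle_len=len(LN))[1] for _ in range(200))
print(sorted(middles))
```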

create procedure insertStudent(in sn int)
begin
    declare i int;
    set i = 1;
    while i <= sn DO
            insert into
                student
            (s_name, s_birth, s_sex,s_age,s_img)
            values(
                      createName(),
                      concat('199',convert(ceil(rand()*9),char(1)),'-',
                             LPAD(convert(ceil(rand()*12),char(2)),2,'0'),
                             '-',
                             LPAD(convert(ceil(rand()*28),char(2)),2,'0')),
                      if(rand()*2>1,'男','女'),
                      ceil(rand()*28)+10,
                    'https://plc.jj20.com/up/allimg/1114/040221103339/210402103339-7.jpg'
                  );
            set i=i+1;
        end while;
end;

call insertStudent(10000000);

Inserting 10 million rows took 2 h 3 m 48 s 446 ms.
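As a baseline for the DataX run that follows, the insertion rate works out as below. This figure applies to the row-by-row stored procedure in this particular environment, not to MySQL in general.

```python
from datetime import timedelta

rows = 10000000
elapsed = timedelta(hours=2, minutes=3, seconds=48, milliseconds=446)
rate = rows / elapsed.total_seconds()  # 10,000,000 rows / 7428.446 s
print(int(round(rate)))  # → 1346 rows per second
```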

Create the table in Oracle:

create table student
(
    s_id        number
        primary key,
    s_name      varchar2(16)                         null,
    s_birth     varchar2(16)                         null,
    s_age       varchar2(2)                          null,
    s_sex       char(2)                              null,
    s_img       varchar2(128)                        null,
    create_time timestamp default sysdate not null
);

Modify the job script, then sync the data:

D:\softwares\datax\bin>python datax.py ../job/mysql2orcl.json
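The mysql2orcl.json job, as echoed in the log, reads "column": ["*"] through a single channel. A sketch of an alternative config for a speed test, assuming the mysqlreader splitPk parameter (described in the DataX mysqlreader plugin documentation as a key used to shard the read into concurrent tasks; an integer primary key such as s_id suits it) together with a higher channel count. The channel value 4 is illustrative, not measured, and the builder function is hypothetical.

```python
import json

COLUMNS = ["s_id", "s_name", "s_birth", "s_age", "s_sex", "s_img", "create_time"]


def mysql2oracle_job(channel=4, split_pk="s_id"):
    """Build a mysqlreader -> oraclewriter job for the student table."""
    reader_param = {
        "column": COLUMNS,
        "connection": [{"jdbcUrl": ["jdbc:mysql://localhost:3306/test"],
                        "table": ["student"]}],
        "username": "root",
        "password": "xxx",
    }
    if split_pk:
        reader_param["splitPk"] = split_pk  # integer key used to split the read
    writer_param = {
        "column": COLUMNS,
        "connection": [{"jdbcUrl": "jdbc:oracle:thin:@localhost:1521:orcl",
                        "table": ["student"]}],
        "username": "GL",
        "password": "xxx",
    }
    return {"job": {
        "content": [{"reader": {"name": "mysqlreader", "parameter": reader_param},
                     "writer": {"name": "oraclewriter", "parameter": writer_param}}],
        "setting": {"speed": {"channel": channel}},
    }}


print(json.dumps(mysql2oracle_job(), indent=4))  # save as ../job/mysql2orcl.json
```

Comparing the final summary of a channel=1 run against a multi-channel run on the same 10 million rows would show whether splitting helps in a given environment.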

Sync log:

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.


2022-12-22 14:21:23.243 [main] INFO  VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2022-12-22 14:21:23.259 [main] INFO  Engine - the machine info  =>

        osInfo: Oracle Corporation 1.8 25.221-b11
        jvmInfo:        Windows 10 amd64 10.0
        cpu num:        6

        totalPhysicalMemory:    -0.00G
        freePhysicalMemory:     -0.00G
        maxFileDescriptorCount: -1
        currentOpenFileDescriptorCount: -1

        GC Names        [PS MarkSweep, PS Scavenge]

        MEMORY_NAME                    | allocation_size                | init_size              
        PS Eden Space                  | 256.00MB                       | 256.00MB               
        Code Cache                     | 240.00MB                       | 2.44MB                 
        Compressed Class Space         | 1,024.00MB                     | 0.00MB                 
        PS Survivor Space              | 42.50MB                        | 42.50MB                
        PS Old Gen                     | 683.00MB                       | 683.00MB               
        Metaspace                      | -0.00MB                        | 0.00MB                 


2022-12-22 14:21:23.274 [main] INFO  Engine -
{
        "content":[
                {
                        "reader":{
                                "name":"mysqlreader",
                                "parameter":{
                                        "column":[
                                                "*"
                                        ],
                                        "connection":[
                                                {
                                                        "jdbcUrl":[
                                                                "jdbc:mysql://localhost:3306/test"
                                                        ],
                                                        "table":[
                                                                "student"
                                                        ]
                                                }
                                        ],
                                        "password":"******",
                                        "username":"root"
                                }
                        },
                        "writer":{
                                "name":"oraclewriter",
                                "parameter":{
                                        "column":[
                                                "*"
                                        ],
                                        "connection":[
                                                {
                                                        "jdbcUrl":"jdbc:oracle:thin:@localhost:1521:orcl",
                                                        "table":[
                                                                "student"
                                                        ]
                                                }
                                        ],
                                        "password":"******",
                                        "username":"GL"
                                }
                        }
                }
        ],
        "setting":{
                "speed":{
                        "channel":"1"
                }
        }
}

2022-12-22 14:21:23.274 [main] WARN  Engine - prioriy set to 0, because NumberFormatException, the value is: null
2022-12-22 14:21:23.274 [main] INFO  PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2022-12-22 14:21:23.290 [main] INFO  JobContainer - DataX jobContainer starts job.
2022-12-22 14:21:23.290 [main] INFO  JobContainer - Set jobId = 0
2022-12-22 14:21:23.524 [job-0] INFO  OriginalConfPretreatmentUtil - Available jdbcUrl:jdbc:mysql://localhost:3306/test?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true.
2022-12-22 14:21:23.524 [job-0] WARN  OriginalConfPretreatmentUtil - 您的配置文件中的列配置存在一定的风险. 因为您未配置读取数据库表的列,当您的表字段个数、类型有变动时,可能影响任务正确性甚至会 运行出错。请检查您的配置并作出修改.
2022-12-22 14:21:28.742 [job-0] INFO  OriginalConfPretreatmentUtil - table:[student] all columns:[
S_ID,S_NAME,S_BIRTH,S_AGE,S_SEX,S_IMG,CREATE_TIME
].
2022-12-22 14:21:28.742 [job-0] WARN  OriginalConfPretreatmentUtil - 您的配置文件中的列配置信息存在风险. 因为您配置的写入数据库表的列为*,当您的表字段个数、类型有变动时,可能影响任务正确性甚至会运行出错。请检查您的配置并作出修改.
2022-12-22 14:21:28.742 [job-0] INFO  OriginalConfPretreatmentUtil - Write data [
INSERT INTO %s (S_ID,S_NAME,S_BIRTH,S_AGE,S_SEX,S_IMG,CREATE_TIME) VALUES(?,?,?,?,?,?,?)
], which jdbcUrl like:[jdbc:oracle:thin:@localhost:1521:orcl]
2022-12-22 14:21:28.742 [job-0] INFO  JobContainer - jobContainer starts to do prepare ...
2022-12-22 14:21:28.742 [job-0] INFO  JobContainer - DataX Reader.Job [mysqlreader] do prepare work .
2022-12-22 14:21:28.742 [job-0] INFO  JobContainer - DataX Writer.Job [oraclewriter] do prepare work .
2022-12-22 14:21:28.742 [job-0] INFO  JobContainer - jobContainer starts to do split ...
2022-12-22 14:21:28.742 [job-0] INFO  JobContainer - Job set Channel-Number to 1 channels.
2022-12-22 14:21:28.742 [job-0] INFO  JobContainer - DataX Reader.Job [mysqlreader] splits to [1] tasks.
2022-12-22 14:21:28.742 [job-0] INFO  JobContainer - DataX Writer.Job [oraclewriter] splits to [1] tasks.
2022-12-22 14:21:28.757 [job-0] INFO  JobContainer - jobContainer starts to do schedule ...
2022-12-22 14:21:28.757 [job-0] INFO  JobContainer - Scheduler starts [1] taskGroups.
2022-12-22 14:21:28.757 [job-0] INFO  JobContainer - Running by standalone Mode.
2022-12-22 14:21:28.773 [taskGroup-0] INFO  TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2022-12-22 14:21:28.773 [taskGroup-0] INFO  Channel - Channel set byte_speed_limit to -1, No bps activated.
2022-12-22 14:21:28.773 [taskGroup-0] INFO  Channel - Channel set record_speed_limit to -1, No tps activated.
2022-12-22 14:21:28.773 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2022-12-22 14:21:28.773 [0-0-0-reader] INFO  CommonRdbmsReader$Task - Begin to read record by Sql: [select * from student
] jdbcUrl:[jdbc:mysql://localhost:3306/test?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
2022-12-22 14:21:38.786 [job-0] INFO  StandAloneJobContainerCommunicator - Total 0 records, 0 bytes | Speed 0B/s, 0 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 0.00%
2022-12-22 14:21:58.800 [job-0] INFO  StandAloneJobContainerCommunicator - Total 1665728 records, 162130240 bytes | Speed 7.73MB/s, 83286 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 17.603s |  All Task WaitReaderTime 1.831s | Percentage 0.00%
2022-12-22 14:22:08.813 [job-0] INFO  StandAloneJobContainerCommunicator - Total 2433536 records, 237375424 bytes | Speed 7.18MB/s, 76780 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 26.292s |  All Task WaitReaderTime 2.600s | Percentage 0.00%
2022-12-22 14:22:18.826 [job-0] INFO  StandAloneJobContainerCommunicator - Total 3090944 records, 301801408 bytes | Speed 6.14MB/s, 65740 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 34.475s |  All Task WaitReaderTime 3.254s | Percentage 0.00%
2022-12-22 14:22:28.840 [job-0] INFO  StandAloneJobContainerCommunicator - Total 3840512 records, 375259072 bytes | Speed 7.01MB/s, 74956 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 44.850s |  All Task WaitReaderTime 3.988s | Percentage 0.00%
2022-12-22 14:22:38.852 [job-0] INFO  StandAloneJobContainerCommunicator - Total 4403712 records, 430452672 bytes | Speed 5.26MB/s, 56320 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 50.816s |  All Task WaitReaderTime 4.573s | Percentage 0.00%
2022-12-22 14:22:48.857 [job-0] INFO  StandAloneJobContainerCommunicator - Total 4708864 records, 460357568 bytes | Speed 2.85MB/s, 30515 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 63.728s |  All Task WaitReaderTime 4.890s | Percentage 0.00%
2022-12-22 14:22:58.859 [job-0] INFO  StandAloneJobContainerCommunicator - Total 5302784 records, 518561728 bytes | Speed 5.55MB/s, 59392 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 72.980s |  All Task WaitReaderTime 5.471s | Percentage 0.00%
2022-12-22 14:23:08.874 [job-0] INFO  StandAloneJobContainerCommunicator - Total 6167040 records, 603258816 bytes | Speed 8.08MB/s, 86425 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 81.839s |  All Task WaitReaderTime 6.311s | Percentage 0.00%
2022-12-22 14:23:18.878 [job-0] INFO  StandAloneJobContainerCommunicator - Total 6824448 records, 667684800 bytes | Speed 6.14MB/s, 65740 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 90.040s |  All Task WaitReaderTime 6.980s | Percentage 0.00%
2022-12-22 14:23:28.883 [job-0] INFO  StandAloneJobContainerCommunicator - Total 7387648 records, 722878400 bytes | Speed 5.26MB/s, 56320 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 100.412s |  All Task WaitReaderTime 7.514s | Percentage 0.00%
2022-12-22 14:23:38.887 [job-0] INFO  StandAloneJobContainerCommunicator - Total 8137216 records, 796336064 bytes | Speed 7.01MB/s, 74956 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 109.425s |  All Task WaitReaderTime 8.224s | Percentage 0.00%
2022-12-22 14:23:48.892 [job-0] INFO  StandAloneJobContainerCommunicator - Total 8794624 records, 860762048 bytes | Speed 6.14MB/s, 65740 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 117.580s |  All Task WaitReaderTime 8.848s | Percentage 0.00%
2022-12-22 14:23:55.985 [0-0-0-reader] INFO  CommonRdbmsReader$Task - Finished read record by Sql: [select * from student
] jdbcUrl:[jdbc:mysql://localhost:3306/test?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
2022-12-22 14:23:56.251 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[147478]ms
2022-12-22 14:23:56.251 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
2022-12-22 14:23:58.906 [job-0] INFO  StandAloneJobContainerCommunicator - Total 10000000 records, 978888897 bytes | Speed 11.27MB/s, 120537 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 133.558s |  All Task WaitReaderTime 10.032s | Percentage 100.00%
2022-12-22 14:23:58.906 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
2022-12-22 14:23:58.906 [job-0] INFO  JobContainer - DataX Writer.Job [oraclewriter] do post work.
2022-12-22 14:23:58.906 [job-0] INFO  JobContainer - DataX Reader.Job [mysqlreader] do post work.
2022-12-22 14:23:58.906 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
2022-12-22 14:23:58.906 [job-0] INFO  HookInvoker - No hook invoked, because base dir not exists or is a file: D:\softwares\datax\hook
2022-12-22 14:23:58.906 [job-0] INFO  JobContainer -
         [total cpu info] =>
                averageCpu                     | maxDeltaCpu                    | minDeltaCpu    
                -1.00%                         | -1.00%                         | -1.00%


         [total gc info] =>
                 NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime
                 PS MarkSweep         | 1                  | 1                  | 1                  | 0.022s             | 0.022s             | 0.022s
                 PS Scavenge          | 80                 | 80                 | 80                 | 0.153s             | 0.153s             | 0.153s

2022-12-22 14:23:58.906 [job-0] INFO  JobContainer - PerfTrace not enable!
2022-12-22 14:23:58.906 [job-0] INFO  StandAloneJobContainerCommunicator - Total 10000000 records, 978888897 bytes | Speed 6.22MB/s, 66666 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 133.558s |  All Task WaitReaderTime 10.032s | Percentage 100.00%
2022-12-22 14:23:58.906 [job-0] INFO  JobContainer -
任务启动时刻                    : 2022-12-22 14:21:23
任务结束时刻                    : 2022-12-22 14:23:58
任务总计耗时                    :                155s
任务平均流量                    :            6.22MB/s
记录写入速度                    :          66666rec/s
读出记录总数                    :            10000000
读写失败总数                    :                   0
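The log contains two WARN lines from OriginalConfPretreatmentUtil that make the same point. The reader does not name its columns, and the writer uses `*`. If either table's columns change, the job could break or write into the wrong fields. Below is a minimal Python sketch that prints the same job with explicit columns. It runs under the Python 2.7 installed earlier, or under Python 3. The column names are the seven DataX listed for the Oracle `student` table. Several parts are assumptions: that the MySQL `student` table uses the same column names, the MySQL base URL (the log only shows it after DataX appended its own query options), and the password placeholders.

```python
import json

# The seven columns DataX listed for the Oracle table "student" in the log
# ("table:[student] all columns:[...]"). Assumption: the MySQL table uses the
# same names. Order matters: fields are handed to the writer by position.
COLUMNS = ["S_ID", "S_NAME", "S_BIRTH", "S_AGE", "S_SEX", "S_IMG", "CREATE_TIME"]


def build_job(mysql_password, oracle_password):
    """Build a mysqlreader -> oraclewriter job with explicit column lists."""
    reader = {
        "name": "mysqlreader",
        "parameter": {
            "username": "root",
            "password": mysql_password,
            "column": list(COLUMNS),
            "connection": [{
                # Assumed base URL; mysqlreader takes a list of URLs here.
                "jdbcUrl": ["jdbc:mysql://localhost:3306/test"],
                "table": ["student"],
            }],
        },
    }
    writer = {
        "name": "oraclewriter",
        "parameter": {
            "username": "GL",
            "password": oracle_password,
            "column": list(COLUMNS),
            "connection": [{
                # oraclewriter takes a single URL string, as in the config above.
                "jdbcUrl": "jdbc:oracle:thin:@localhost:1521:orcl",
                "table": ["student"],
            }],
        },
    }
    return {"job": {"content": [{"reader": reader, "writer": writer}],
                    "setting": {"speed": {"channel": 1}}}}


if __name__ == "__main__":
    # Placeholder passwords; the log masks the real ones as ******.
    print(json.dumps(build_job("<mysql password>", "<oracle password>"), indent=2))
```

Redirect the output into the job folder and run it as before from `D:\softwares\datax\bin`. For example, `python gen_mysql2oracle.py > ..\job\mysql2oracle.json` followed by `python datax.py ../job/mysql2oracle.json`. Both file names are hypothetical.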

Syncing 10 million records from MySQL to Oracle took 155 s in total. That is really fast.
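The 155 s figure is the wall-clock time from job start (14:21:23) to job end (14:23:58). However, the averages in the summary (6.22MB/s, 66666rec/s) only come out if the totals are divided by about 150 s. That interval roughly matches the span from the start of scheduling (14:21:28.757) to the final progress report (14:23:58.906). The first ~5 s appear to have gone to pre-processing, mostly the Oracle column lookup. The numbers also only match if "MB" means 1024 × 1024 bytes. This reading of how DataX computes its averages is inferred from the log, not taken from DataX documentation. The small check below reproduces both sets of figures.

```python
# Totals copied from the final summary of the MySQL -> Oracle run above.
TOTAL_RECORDS = 10000000
TOTAL_BYTES = 978888897
TOTAL_SECONDS = 155     # whole job, 14:21:23 -> 14:23:58
TRANSFER_SECONDS = 150  # assumption: 14:21:28.757 (schedule) -> 14:23:58.906 (last report)


def records_per_second(records, seconds):
    """Integer records per second, truncated like the summary's rec/s value."""
    return records // seconds


def mb_per_second(num_bytes, seconds):
    """Throughput in MB/s, with 1 MB taken as 1024 * 1024 bytes."""
    return round(num_bytes / float(seconds) / (1024 * 1024), 2)


if __name__ == "__main__":
    for label, secs in (("whole job", TOTAL_SECONDS), ("transfer window", TRANSFER_SECONDS)):
        print("%s: %d rec/s, %.2f MB/s" % (
            label,
            records_per_second(TOTAL_RECORDS, secs),
            mb_per_second(TOTAL_BYTES, secs),
        ))
```

The "transfer window" line reproduces the summary's 66666 rec/s and 6.22 MB/s. Dividing by the full 155 s gives about 64516 rec/s and 6.02 MB/s. The log also shows All Task WaitWriterTime (133.558s) far above WaitReaderTime (10.032s). That suggests most of the run was spent waiting on the Oracle writer rather than on the MySQL reader.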

Reference: https://blog.youkuaiyun.com/prefect_start/article/details/126335662
