1. Installation
1.1 Environment
Environment | Version |
---|---|
Operating system | Windows Server 2019 |
Python | 2.7.18 |
JDK | 1.8.0_261 |
DataX | 3.0 |
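1.2 Installing DataX
Installing Python and the JDK is not covered here; this section focuses on DataX.
DataX download: https://github.com/alibaba/DataX
After downloading, simply extract the archive.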
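A quick sanity check after extraction can confirm that the directory looks as expected. The sketch below is only an illustration: the `missing_entries` helper and the `C:\datax` path are assumptions, and the entries it checks (`bin/datax.py` and `job`) are simply the ones this walkthrough relies on. It avoids Python 3 only syntax so it also runs on Python 2.7.

```python
import os

# Entries this walkthrough relies on under the DataX root directory.
# Adjust the list if your package is laid out differently.
EXPECTED = [os.path.join("bin", "datax.py"), "job"]


def missing_entries(datax_home, expected=EXPECTED):
    """Return the expected entries that are absent under datax_home."""
    return [rel for rel in expected
            if not os.path.exists(os.path.join(datax_home, rel))]


if __name__ == "__main__":
    home = r"C:\datax"  # assumed extraction directory
    missing = missing_entries(home)
    if missing:
        print("Missing under %s: %s" % (home, ", ".join(missing)))
    else:
        print("DataX layout looks complete.")
```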
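2. Configuration and Running
2.1 JSON job configuration
The DataX root directory contains a job folder that holds the JSON files describing sync jobs. For the plugin each database type requires, see the usage notes in the GitHub documentation. This example uses SQL Server (note that DataX does not appear to support SQL Server 2000).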
{
  "job": {
    "setting": {
      "speed": {
        "byte": 10485760
      },
      "errorLimit": {
        "record": 0,
        "percentage": 0.02
      }
    },
    "content": [{
      "reader": {
        "name": "sqlserverreader",
        "parameter": {
          "username": "sa",
          "password": "Zshu123456",
          "connection": [{
            "querySql": [
              "SELECT * FROM students"
            ],
            "jdbcUrl": [
              "jdbc:sqlserver://192.168.1.117:1433;DatabaseName=MyTest2000"
            ]
          }],
          "maxRetries": 3
        }
      },
      "writer": {
        "name": "sqlserverwriter",
        "parameter": {
          "username": "sa",
          "password": "Zshu123456",
          "dateFormat": "YYYY-MM-dd hh:mm:ss",
          "column": [
            "student_id", "student_name"
          ],
          "preSql": [
            "DELETE FROM students2"
          ],
          "connection": [{
            "jdbcUrl": "jdbc:sqlserver://192.168.1.111:1433;DatabaseName=MyTest",
            "table": [
              "students2"
            ]
          }]
        }
      }
    }]
  }
}
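Because a job file like the one above is edited by hand, it can help to check its basic shape before starting DataX. The following is a minimal sketch, not DataX's own validation: `check_job` is a hypothetical helper, and it only tests what the example itself shows, namely that the file parses as JSON and that every entry in `job.content` has a `reader` and a `writer` with a `name` and a `parameter`.

```python
import json


def check_job(text):
    """Return a list of problems found in a DataX job JSON string."""
    try:
        data = json.loads(text)
    except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
        return ["invalid JSON: %s" % exc]
    if not isinstance(data, dict) or not isinstance(data.get("job"), dict):
        return ["top-level 'job' object is missing"]

    problems = []
    content = data["job"].get("content") or []
    if not content:
        problems.append("job.content is missing or empty")
    for i, item in enumerate(content):
        for side in ("reader", "writer"):
            plugin = item.get(side) or {}
            if not plugin.get("name"):
                problems.append("content[%d].%s.name is missing" % (i, side))
            if "parameter" not in plugin:
                problems.append(
                    "content[%d].%s.parameter is missing" % (i, side))
    return problems
```

Reading the job file and passing its text to `check_job` returns an empty list when none of these checks fail; it says nothing about whether the plugin parameters themselves are accepted by DataX.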
2.2 Running
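From the bin directory of the DataX installation, run the job file. The command and its full output look like this:
python datax.py ../job/sqlserver.json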
C:\datax\bin>python datax.py ../job/sqlserver.json
DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.
2020-08-26 16:02:38.748 [main] INFO VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2020-08-26 16:02:38.765 [main] INFO Engine - the machine info =>
osInfo: Oracle Corporation 1.8 25.261-b12
jvmInfo: Windows Server 2019 amd64 10.0
cpu num: 1
totalPhysicalMemory: -0.00G
freePhysicalMemory: -0.00G
maxFileDescriptorCount: -1
currentOpenFileDescriptorCount: -1
GC Names [Copy, MarkSweepCompact]
MEMORY_NAME | allocation_size | init_size
Eden Space | 273.06MB | 273.06MB
Code Cache | 240.00MB | 2.44MB
Survivor Space | 34.13MB | 34.13MB
Compressed Class Space | 1,024.00MB | 0.00MB
Metaspace | -0.00MB | 0.00MB
Tenured Gen | 682.69MB | 682.69MB
2020-08-26 16:02:38.815 [main] INFO Engine -
{
"content":[
{
"reader":{
"name":"sqlserverreader",
"parameter":{
"connection":[
{
"jdbcUrl":[
"jdbc:sqlserver://192.168.1.117:1433;DatabaseName=MyTest2000"
],
"querySql":[
"SELECT * FROM students"
]
}
],
"maxRetries":3,
"password":"**********",
"username":"sa"
}
},
"writer":{
"name":"sqlserverwriter",
"parameter":{
"column":[
"student_id",
"student_name"
],
"connection":[
{
"jdbcUrl":"jdbc:sqlserver://192.168.1.111:1433;DatabaseName=MyTest",
"table":[
"students2"
]
}
],
"dateFormat":"YYYY-MM-dd hh:mm:ss",
"password":"**********",
"preSql":[
"DELETE FROM students2"
],
"username":"sa"
}
}
}
],
"setting":{
"errorLimit":{
"percentage":0.02,
"record":0
},
"speed":{
"byte":10485760
}
}
}
2020-08-26 16:02:38.859 [main] WARN Engine - prioriy set to 0, because NumberFormatException, the value is: null
2020-08-26 16:02:38.863 [main] INFO PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2020-08-26 16:02:38.864 [main] INFO JobContainer - DataX jobContainer starts job.
2020-08-26 16:02:38.874 [main] INFO JobContainer - Set jobId = 0
2020-08-26 16:02:45.505 [job-0] INFO OriginalConfPretreatmentUtil - Available jdbcUrl:jdbc:sqlserver://192.168.1.117:1433;DatabaseName=MyTest2000.
2020-08-26 16:02:46.565 [job-0] INFO OriginalConfPretreatmentUtil - table:[students2] all columns:[
student_id,student_name
].
2020-08-26 16:02:46.617 [job-0] INFO OriginalConfPretreatmentUtil - Write data [
INSERT INTO %s (student_id,student_name) VALUES(?,?)
], which jdbcUrl like:[jdbc:sqlserver://192.168.1.111:1433;DatabaseName=MyTest]
2020-08-26 16:02:46.620 [job-0] INFO JobContainer - jobContainer starts to do prepare ...
2020-08-26 16:02:46.621 [job-0] INFO JobContainer - DataX Reader.Job [sqlserverreader] do prepare work .
2020-08-26 16:02:46.625 [job-0] INFO JobContainer - DataX Writer.Job [sqlserverwriter] do prepare work .
2020-08-26 16:02:46.727 [job-0] INFO CommonRdbmsWriter$Job - Begin to execute preSqls:[DELETE FROM students2]. context info:jdbc:sqlserver://192.168.1.111:1433;DatabaseName=MyTest.
2020-08-26 16:02:46.734 [job-0] INFO JobContainer - jobContainer starts to do split ...
2020-08-26 16:02:46.737 [job-0] INFO JobContainer - Job set Max-Byte-Speed to 10485760 bytes.
2020-08-26 16:02:46.743 [job-0] INFO JobContainer - DataX Reader.Job [sqlserverreader] splits to [1] tasks.
2020-08-26 16:02:46.745 [job-0] INFO JobContainer - DataX Writer.Job [sqlserverwriter] splits to [1] tasks.
2020-08-26 16:02:46.777 [job-0] INFO JobContainer - jobContainer starts to do schedule ...
2020-08-26 16:02:46.792 [job-0] INFO JobContainer - Scheduler starts [1] taskGroups.
2020-08-26 16:02:46.796 [job-0] INFO JobContainer - Running by standalone Mode.
2020-08-26 16:02:46.842 [taskGroup-0] INFO TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2020-08-26 16:02:46.849 [taskGroup-0] INFO Channel - Channel set byte_speed_limit to -1, No bps activated.
2020-08-26 16:02:46.850 [taskGroup-0] INFO Channel - Channel set record_speed_limit to -1, No tps activated.
2020-08-26 16:02:46.896 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2020-08-26 16:02:47.561 [0-0-0-reader] INFO CommonRdbmsReader$Task - Begin to read record by Sql: [SELECT * FROM students
] jdbcUrl:[jdbc:sqlserver://192.168.1.117:1433;DatabaseName=MyTest2000].
2020-08-26 16:02:47.783 [0-0-0-reader] INFO CommonRdbmsReader$Task - Finished read record by Sql: [SELECT * FROM students
] jdbcUrl:[jdbc:sqlserver://192.168.1.117:1433;DatabaseName=MyTest2000].
2020-08-26 16:02:48.564 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[1669]ms
2020-08-26 16:02:48.566 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] completed it's tasks.
2020-08-26 16:02:56.823 [job-0] INFO StandAloneJobContainerCommunicator - Total 999 records, 32859 bytes | Speed 3.21KB/s, 99 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.008s | All Task WaitReaderTime 0.603s | Percentage 100.00%
2020-08-26 16:02:57.609 [job-0] INFO AbstractScheduler - Scheduler accomplished all tasks.
2020-08-26 16:02:57.614 [job-0] INFO JobContainer - DataX Writer.Job [sqlserverwriter] do post work.
2020-08-26 16:02:57.618 [job-0] INFO JobContainer - DataX Reader.Job [sqlserverreader] do post work.
2020-08-26 16:02:57.621 [job-0] INFO JobContainer - DataX jobId [0] completed successfully.
2020-08-26 16:02:57.662 [job-0] INFO JobContainer -
[total cpu info] =>
averageCpu | maxDeltaCpu | minDeltaCpu
-1.00% | -1.00% | -1.00%
[total gc info] =>
NAME | totalGCCount | maxDeltaGCCount | minDeltaGCCount | totalGCTime | maxDeltaGCTime | minDeltaGCTime
Copy | 0 | 0 | 0 | 0.000s | 0.000s | 0.000s
MarkSweepCompact | 0 | 0 | 0 | 0.000s | 0.000s | 0.000s
2020-08-26 16:02:57.681 [job-0] INFO JobContainer - PerfTrace not enable!
2020-08-26 16:02:57.698 [job-0] INFO StandAloneJobContainerCommunicator - Total 999 records, 32859 bytes | Speed 3.21KB/s, 99 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.008s | All Task WaitReaderTime 0.603s | Percentage 100.00%
2020-08-26 16:02:57.753 [job-0] INFO JobContainer -
任务启动时刻 : 2020-08-26 16:02:38
任务结束时刻 : 2020-08-26 16:02:57
任务总计耗时 : 18s
任务平均流量 : 3.21KB/s
记录写入速度 : 99rec/s
读出记录总数 : 999
读写失败总数 : 0
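When the job is run from a script, the record and error counts can be pulled out of the `StandAloneJobContainerCommunicator` line shown in the log above. This is a sketch under the assumption that the line keeps the format seen in this run; `parse_summary` is a hypothetical helper, not part of DataX.

```python
import re

# Matches the "Total ... | ... | Error ..." statistics line from the log.
SUMMARY = re.compile(
    r"Total (\d+) records, (\d+) bytes.*?Error (\d+) records, (\d+) bytes")


def parse_summary(line):
    """Extract record and byte counts from a DataX statistics line.

    Returns None when the line does not match the expected format.
    """
    match = SUMMARY.search(line)
    if match is None:
        return None
    records, size, err_records, err_size = (int(g) for g in match.groups())
    return {
        "records": records,
        "bytes": size,
        "error_records": err_records,
        "error_bytes": err_size,
    }
```

For the run above, the last statistics line yields 999 records, 32859 bytes, and zero errors.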
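If the output contains garbled characters, run the following in the CMD window first to switch the console code page to UTF-8: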
CHCP 65001
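For repeated runs, the command from section 2.2 can be wrapped in a small script. The sketch below assumes that `python` on the PATH is the Python 2.7 interpreter used in this setup; `build_command` and `run_job` are hypothetical helper names, and the only DataX invocation used is the `python datax.py <job file>` form shown earlier.

```python
import os
import subprocess


def build_command(datax_home, job_file, python="python"):
    """Build the argument list for running one DataX job file."""
    return [python, os.path.join(datax_home, "bin", "datax.py"), job_file]


def run_job(datax_home, job_file):
    """Run the job and return the exit code of the datax.py process."""
    return subprocess.call(build_command(datax_home, job_file))
```

Calling `run_job(r"C:\datax", r"C:\datax\job\sqlserver.json")` would start the same job as above; both paths are placeholders for your own installation.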