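Installing DataX on Windows

This article walks through configuring and using DataX for data synchronization on Windows: preparing the environment, installing DataX, writing the JSON job file, and a sample run, with a focus on migrating data between SQL Server databases.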

1. Installation

1.1 Environment

Component           Version
Operating system    Windows Server 2019
Python              2.7.18
JDK                 1.8.0_261
DataX               3.0
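
Before installing DataX it can help to confirm which JDK is on the PATH. The following is a minimal sketch, not part of DataX itself: the helper names `parse_java_version` and `find_java_version` are made up for illustration, and the script is written for Python 3.7+, so it would run under a separate interpreter from the Python 2.7 listed above.

```python
import re
import shutil
import subprocess


def parse_java_version(version_output):
    """Extract the quoted version string from `java -version` output, or None."""
    match = re.search(r'version "([^"]+)"', version_output)
    return match.group(1) if match else None


def find_java_version():
    """Return the version of the `java` found on PATH, or None if it is missing."""
    java = shutil.which("java")
    if java is None:
        return None
    # `java -version` normally prints its banner to stderr, so check both streams.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    return parse_java_version(result.stderr + result.stdout)


if __name__ == "__main__":
    print("java:", find_java_version() or "not found on PATH")
    print("python:", shutil.which("python") or "not found on PATH")
```

On a machine matching the table above, the reported Java version would be expected to be 1.8.0_261.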

1.2 Installing DataX

Installing Python and the JDK is not covered here; this section focuses on installing DataX.

DataX download: https://github.com/alibaba/DataX
After downloading, simply extract the archive.
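
A quick way to confirm the archive was extracted as expected is to check for a few entries under the DataX root. This is only a sketch: `bin/datax.py` and the `job` folder are the paths used later in this article, while the `plugin` folder and the function name `missing_entries` are assumptions made for illustration.

```python
import os

# Entries expected under the extracted DataX root (assumed layout, see above).
EXPECTED_ENTRIES = ["bin/datax.py", "job", "plugin"]


def missing_entries(datax_home, expected=EXPECTED_ENTRIES):
    """Return the expected relative paths that do not exist under datax_home."""
    return [
        rel for rel in expected
        if not os.path.exists(os.path.join(datax_home, *rel.split("/")))
    ]


if __name__ == "__main__":
    # C:\datax is the install location that appears in the console output below.
    print(missing_entries(r"C:\datax") or "layout looks complete")
```

An empty list means all of the checked entries were found.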

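2. Configuration and Running

2.1 JSON Job Configuration

The DataX root directory contains a job folder that holds the JSON files describing sync tasks. Which plugin each database type needs, and how to use it, is covered in the documentation on GitHub. SQL Server is used as the example here. (Note that DataX does not appear to support SQL Server 2000.)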

{
    "job": {
        "setting": {
            "speed": {
                "byte": 10485760
            },
            "errorLimit": {
                "record": 0,
                "percentage": 0.02
            }
        },
        "content": [{
            "reader": {
                "name": "sqlserverreader",
                "parameter": {
                    "username": "sa",
                    "password": "Zshu123456",
                    "connection": [{
                        "querySql": [
                            "SELECT * FROM students"
                        ],
                        "jdbcUrl": [
                            "jdbc:sqlserver://192.168.1.117:1433;DatabaseName=MyTest2000"
                        ]
                    }],
                    "maxRetries": 3
                }
            },
            "writer": {
                "name": "sqlserverwriter",
                "parameter": {
                    "username": "sa",
                    "password": "Zshu123456",
                    "dateFormat": "YYYY-MM-dd hh:mm:ss",
                    "column": [
                        "student_id", "student_name"
                    ],
                    "preSql": [
                        "DELETE FROM students2"
                    ],
                    "connection": [{
                        "jdbcUrl": "jdbc:sqlserver://192.168.1.111:1433;DatabaseName=MyTest",
                        "table": [
                            "students2"
                        ]
                    }]
                }
            }
        }]
    }
}
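
Because a malformed job file only fails once DataX starts, a small pre-flight check can catch structural mistakes earlier. The sketch below assumes nothing beyond the structure visible in the job file above (a `job` object with `setting` and a `content` list of reader/writer pairs); the function name `summarize_job` is hypothetical, and the check does not validate plugin-specific parameters.

```python
import json


def summarize_job(job_text):
    """Parse a DataX job JSON string and return (reader, writer, byte_limit)."""
    job = json.loads(job_text)["job"]
    content = job["content"]
    if not content:
        raise ValueError("job.content must contain at least one reader/writer pair")
    first = content[0]
    for side in ("reader", "writer"):
        if "name" not in first[side] or "parameter" not in first[side]:
            raise ValueError(f"{side} needs both 'name' and 'parameter'")
    # speed.byte is optional here; None means no byte limit was configured.
    byte_limit = job.get("setting", {}).get("speed", {}).get("byte")
    return first["reader"]["name"], first["writer"]["name"], byte_limit


if __name__ == "__main__":
    # Path relative to the DataX bin directory, as used in this article.
    with open("../job/sqlserver.json", encoding="utf-8") as handle:
        print(summarize_job(handle.read()))
```

For the job file above, the byte limit is 10485760, which is 10 × 1024 × 1024, that is, 10 MiB.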

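2.2 Running the Job

From the bin directory under the DataX root, start the job with:

python datax.py ../job/sqlserver.json

The console output of a complete run follows.

C:\datax\bin>python datax.py ../job/sqlserver.json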

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.


2020-08-26 16:02:38.748 [main] INFO  VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2020-08-26 16:02:38.765 [main] INFO  Engine - the machine info  =>

        osInfo: Oracle Corporation 1.8 25.261-b12
        jvmInfo:        Windows Server 2019 amd64 10.0
        cpu num:        1

        totalPhysicalMemory:    -0.00G
        freePhysicalMemory:     -0.00G
        maxFileDescriptorCount: -1
        currentOpenFileDescriptorCount: -1

        GC Names        [Copy, MarkSweepCompact]

        MEMORY_NAME                    | allocation_size                | init_size
        Eden Space                     | 273.06MB                       | 273.06MB
        Code Cache                     | 240.00MB                       | 2.44MB
        Survivor Space                 | 34.13MB                        | 34.13MB
        Compressed Class Space         | 1,024.00MB                     | 0.00MB
        Metaspace                      | -0.00MB                        | 0.00MB
        Tenured Gen                    | 682.69MB                       | 682.69MB


2020-08-26 16:02:38.815 [main] INFO  Engine -
{
        "content":[
                {
                        "reader":{
                                "name":"sqlserverreader",
                                "parameter":{
                                        "connection":[
                                                {
                                                        "jdbcUrl":[
                                                                "jdbc:sqlserver://192.168.1.117:1433;DatabaseName=MyTest2000"
                                                        ],
                                                        "querySql":[
                                                                "SELECT * FROM students"
                                                        ]
                                                }
                                        ],
                                        "maxRetries":3,
                                        "password":"**********",
                                        "username":"sa"
                                }
                        },
                        "writer":{
                                "name":"sqlserverwriter",
                                "parameter":{
                                        "column":[
                                                "student_id",
                                                "student_name"
                                        ],
                                        "connection":[
                                                {
                                                        "jdbcUrl":"jdbc:sqlserver://192.168.1.111:1433;DatabaseName=MyTest",
                                                        "table":[
                                                                "students2"
                                                        ]
                                                }
                                        ],
                                        "dateFormat":"YYYY-MM-dd hh:mm:ss",
                                        "password":"**********",
                                        "preSql":[
                                                "DELETE FROM students2"
                                        ],
                                        "username":"sa"
                                }
                        }
                }
        ],
        "setting":{
                "errorLimit":{
                        "percentage":0.02,
                        "record":0
                },
                "speed":{
                        "byte":10485760
                }
        }
}

2020-08-26 16:02:38.859 [main] WARN  Engine - prioriy set to 0, because NumberFormatException, the value is: null
2020-08-26 16:02:38.863 [main] INFO  PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2020-08-26 16:02:38.864 [main] INFO  JobContainer - DataX jobContainer starts job.
2020-08-26 16:02:38.874 [main] INFO  JobContainer - Set jobId = 0
2020-08-26 16:02:45.505 [job-0] INFO  OriginalConfPretreatmentUtil - Available jdbcUrl:jdbc:sqlserver://192.168.1.117:1433;DatabaseName=MyTest2000.
2020-08-26 16:02:46.565 [job-0] INFO  OriginalConfPretreatmentUtil - table:[students2] all columns:[
student_id,student_name
].
2020-08-26 16:02:46.617 [job-0] INFO  OriginalConfPretreatmentUtil - Write data [
INSERT INTO %s (student_id,student_name) VALUES(?,?)
], which jdbcUrl like:[jdbc:sqlserver://192.168.1.111:1433;DatabaseName=MyTest]
2020-08-26 16:02:46.620 [job-0] INFO  JobContainer - jobContainer starts to do prepare ...
2020-08-26 16:02:46.621 [job-0] INFO  JobContainer - DataX Reader.Job [sqlserverreader] do prepare work .
2020-08-26 16:02:46.625 [job-0] INFO  JobContainer - DataX Writer.Job [sqlserverwriter] do prepare work .
2020-08-26 16:02:46.727 [job-0] INFO  CommonRdbmsWriter$Job - Begin to execute preSqls:[DELETE FROM students2]. context info:jdbc:sqlserver://192.168.1.111:1433;DatabaseName=MyTest.
2020-08-26 16:02:46.734 [job-0] INFO  JobContainer - jobContainer starts to do split ...
2020-08-26 16:02:46.737 [job-0] INFO  JobContainer - Job set Max-Byte-Speed to 10485760 bytes.
2020-08-26 16:02:46.743 [job-0] INFO  JobContainer - DataX Reader.Job [sqlserverreader] splits to [1] tasks.
2020-08-26 16:02:46.745 [job-0] INFO  JobContainer - DataX Writer.Job [sqlserverwriter] splits to [1] tasks.
2020-08-26 16:02:46.777 [job-0] INFO  JobContainer - jobContainer starts to do schedule ...
2020-08-26 16:02:46.792 [job-0] INFO  JobContainer - Scheduler starts [1] taskGroups.
2020-08-26 16:02:46.796 [job-0] INFO  JobContainer - Running by standalone Mode.
2020-08-26 16:02:46.842 [taskGroup-0] INFO  TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2020-08-26 16:02:46.849 [taskGroup-0] INFO  Channel - Channel set byte_speed_limit to -1, No bps activated.
2020-08-26 16:02:46.850 [taskGroup-0] INFO  Channel - Channel set record_speed_limit to -1, No tps activated.
2020-08-26 16:02:46.896 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2020-08-26 16:02:47.561 [0-0-0-reader] INFO  CommonRdbmsReader$Task - Begin to read record by Sql: [SELECT * FROM students
] jdbcUrl:[jdbc:sqlserver://192.168.1.117:1433;DatabaseName=MyTest2000].
2020-08-26 16:02:47.783 [0-0-0-reader] INFO  CommonRdbmsReader$Task - Finished read record by Sql: [SELECT * FROM students
] jdbcUrl:[jdbc:sqlserver://192.168.1.117:1433;DatabaseName=MyTest2000].
2020-08-26 16:02:48.564 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[1669]ms
2020-08-26 16:02:48.566 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
2020-08-26 16:02:56.823 [job-0] INFO  StandAloneJobContainerCommunicator - Total 999 records, 32859 bytes | Speed 3.21KB/s, 99 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.008s |  All Task WaitReaderTime 0.603s | Percentage 100.00%
2020-08-26 16:02:57.609 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
2020-08-26 16:02:57.614 [job-0] INFO  JobContainer - DataX Writer.Job [sqlserverwriter] do post work.
2020-08-26 16:02:57.618 [job-0] INFO  JobContainer - DataX Reader.Job [sqlserverreader] do post work.
2020-08-26 16:02:57.621 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
2020-08-26 16:02:57.662 [job-0] INFO  JobContainer -
         [total cpu info] =>
                averageCpu                     | maxDeltaCpu                    | minDeltaCpu
                -1.00%                         | -1.00%                         | -1.00%


         [total gc info] =>
                 NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime
                 Copy                 | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s
                 MarkSweepCompact     | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s

2020-08-26 16:02:57.681 [job-0] INFO  JobContainer - PerfTrace not enable!
2020-08-26 16:02:57.698 [job-0] INFO  StandAloneJobContainerCommunicator - Total 999 records, 32859 bytes | Speed 3.21KB/s, 99 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.008s |  All Task WaitReaderTime 0.603s | Percentage 100.00%
2020-08-26 16:02:57.753 [job-0] INFO  JobContainer -
任务启动时刻                    : 2020-08-26 16:02:38
任务结束时刻                    : 2020-08-26 16:02:57
任务总计耗时                    :                 18s
任务平均流量                    :            3.21KB/s
记录写入速度                    :             99rec/s
读出记录总数                    :                 999
读写失败总数                    :                   0
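
When runs are scheduled rather than watched, the record counts can be pulled out of the log instead of read by eye. The sketch below parses the `Total ... | Error ...` line shown in the output above with a regular expression; the pattern is derived only from that sample line, so other DataX versions or much larger counts may be formatted differently, and the name `parse_stats` is made up for illustration.

```python
import re

# Matches the counters in the StandAloneJobContainerCommunicator summary line.
STATS_PATTERN = re.compile(
    r"Total (?P<records>\d+) records, (?P<bytes>\d+) bytes"
    r".*?Error (?P<error_records>\d+) records, (?P<error_bytes>\d+) bytes"
)


def parse_stats(line):
    """Return the counters from a 'Total ... | Error ...' log line, or None."""
    match = STATS_PATTERN.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


sample = (
    "2020-08-26 16:02:57.698 [job-0] INFO  StandAloneJobContainerCommunicator - "
    "Total 999 records, 32859 bytes | Speed 3.21KB/s, 99 records/s | "
    "Error 0 records, 0 bytes |  All Task WaitWriterTime 0.008s |  "
    "All Task WaitReaderTime 0.603s | Percentage 100.00%"
)
print(parse_stats(sample))
```

A caller could, for example, treat any non-zero `error_records` value as a failed sync.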

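If the output shows garbled characters, run the following in the CMD command line to switch the console to the UTF-8 code page: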

CHCP 65001
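
The launch step from section 2.2 could also be scripted, for instance to run it from a scheduled task. This is a rough sketch under stated assumptions: the function names are hypothetical, the `python` executable on the PATH is assumed to be the interpreter that DataX expects, and `datax.py` is assumed to return a non-zero exit code when the job fails. Passing an absolute path for the job file avoids depending on the current directory.

```python
import os
import subprocess


def build_datax_command(datax_home, job_file, python_exe="python"):
    """Build the argument list for launching datax.py with one job file."""
    launcher = os.path.join(datax_home, "bin", "datax.py")
    return [python_exe, launcher, job_file]


def run_job(datax_home, job_file, python_exe="python"):
    """Run the job and return the exit code; output goes straight to the console."""
    return subprocess.call(build_datax_command(datax_home, job_file, python_exe))


if __name__ == "__main__":
    # Paths taken from the console session in section 2.2.
    code = run_job(r"C:\datax", r"C:\datax\job\sqlserver.json")
    print("exit code:", code)
```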