Common DataX Jobs

License: CC 4.0 BY-SA

Tags: #mysql #database

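This article presents two DataX data-processing jobs: (1) synchronizing data from one MySQL database to another in update mode; (2) loading MySQL data into Elasticsearch for full-text search and analysis. The configuration files show the reader and writer parameters in detail, including column mapping, error limits, and throughput settings. A third section shows a plain scheduled shell task.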

1. From MySQL to MySQL

{
  "job": {
    "setting": {
      "speed": {
        "channel": 3,
        "byte": 1048576
      },
      "errorLimit": {
        "record": 0,
        "percentage": 0.02
      }
    },
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "username": "root",
            "password": "123456",
            "column": [
              "`pid`",
              "`name`",
              "`sex`",
              "`education`"
            ],
            "splitPk": "",
            "connection": [
              {
                "table": [
                  "icr_name"
                ],

                "jdbcUrl": [
                  "jdbc:mysql://xxxx:3306/portal"
                ]
              }
            ]
          }
        },
        "writer": {
          "name": "mysqlwriter",
          "parameter": {
            "username": "xxxx",
            "password": "xxxx",
            "writeMode": "update",
            "column": [
              "`id`",
              "`res_name`",
              "`gender`",
              "`education`"
            ],
            "connection": [
              {
                "table": [
                  "t_base_expert_info"
                ],
                "jdbcUrl": "jdbc:mysql://xxxx/etcc_natural_disaster"
              }
            ]
          }
        }
      }
    ]
  }
}
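The reader and writer column lists in this job are paired by position, not by name, so `pid` lands in `id`, `name` in `res_name`, and so on. The following is a minimal sketch, not part of DataX, that loads a job description with Python's strict JSON parser and returns the implied mapping. The helper name `column_mapping` and the inline `JOB` string (a trimmed copy of the job above) are illustrative assumptions.

```python
import json

# Trimmed copy of the MySQL-to-MySQL job above; only the column lists matter here.
JOB = """
{
  "job": {
    "content": [
      {
        "reader": {"name": "mysqlreader",
                   "parameter": {"column": ["`pid`", "`name`", "`sex`", "`education`"]}},
        "writer": {"name": "mysqlwriter",
                   "parameter": {"column": ["`id`", "`res_name`", "`gender`", "`education`"]}}
      }
    ]
  }
}
"""


def column_mapping(job_text):
    """Return (reader_column, writer_column) pairs in positional order."""
    job = json.loads(job_text)  # strict JSON: a // comment would raise here
    pairs = []
    for item in job["job"]["content"]:
        src = item["reader"]["parameter"]["column"]
        dst = item["writer"]["parameter"]["column"]
        if len(src) != len(dst):
            raise ValueError(
                f"column count mismatch: {len(src)} read, {len(dst)} written"
            )
        pairs.extend(zip(src, dst))
    return pairs


if __name__ == "__main__":
    for src, dst in column_mapping(JOB):
        print(f"{src} -> {dst}")
```

Running a check like this before submitting a job is one way to catch a reordered or missing column early, since a positional mismatch would otherwise write values into the wrong target fields without any error.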

2. From MySQL to Elasticsearch

{
  "job": {
    "setting": {
      "speed": {
        "channel": 1
      },
      "errorLimit": {
        "percentage": 0
      }
    },
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "username": "xxxx",
            "password": "xxxx",
            "connection": [
              {
                "querySql": [
                  "SELECT id,'n_base_hospital' AS baseResourceType,res_type,status,res_name AS resName,longitude,latitude,CONCAT_WS(',',latitude,longitude) AS location,created_at AS createdAt,updated_at AS updatedAt,deleted FROM t_base_hospital WHERE deleted=0 AND !ISNULL(longitude) AND LENGTH(longitude)>0"
                ],
                "jdbcUrl": [
                  "jdbc:mysql://xxxx:3306/etcc_natural_disaster"
                ]
              }
            ]
          }
        },
        "writer": {
          "name": "elasticsearchwriter",
          "parameter": {
            "endpoint": "http://xxxx:9200",
            "index": "n_base_hospital",
            "type": "_doc",
            "accessId": "",
            "accessKey": "",
            "cleanup": true,
            "dynamic": true,
            "settings": {
              "index": {
                "number_of_shards": 1,
                "number_of_replicas": 1
              }
            },
            "discovery": false,
            "batchSize": 1000,
            "splitter": ",",
            "column": [
              {
                "name": "id",
                "type": "keyword"
              },
              {
                "name": "baseResourceType",
                "type": "keyword"
              },
              {
                "name": "resType",
                "type": "keyword"
              },
              {
                "name": "status",
                "type": "integer"
              },
              {
                "name": "resName",
                "type": "text"
              },
              {
                "name": "longitude",
                "type": "keyword"
              },
              {
                "name": "latitude",
                "type": "keyword"
              },
              {
                "name": "location",
                "type": "geo_point"
              }
            ]
          }
        }
      }
    ]
  }
}
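Assuming elasticsearchwriter consumes record fields in order, as the MySQL writer does, the number of expressions in `querySql` should match the number of entries in the writer's `column` list. The query above selects eleven expressions while the writer lists eight, so this is worth checking before a run. A rough sketch of such a check follows. `split_select_list` is a hypothetical helper, `QUERY` is a copy of the query with a shortened `WHERE` clause, and the parser handles only a simple single-level `SELECT ... FROM` with quotes and parentheses, not arbitrary SQL.

```python
import re

# Copy of the querySql above with the WHERE clause shortened.
QUERY = (
    "SELECT id,'n_base_hospital' AS baseResourceType,res_type,status,"
    "res_name AS resName,longitude,latitude,"
    "CONCAT_WS(',',latitude,longitude) AS location,"
    "created_at AS createdAt,updated_at AS updatedAt,deleted "
    "FROM t_base_hospital WHERE deleted=0"
)

# Field names from the writer's "column" list, in order.
WRITER_COLUMNS = ["id", "baseResourceType", "resType", "status",
                  "resName", "longitude", "latitude", "location"]


def split_select_list(sql):
    """Split the SELECT list on commas outside quotes and parentheses."""
    match = re.search(r"select\s+(.*?)\s+from\s", sql, re.IGNORECASE | re.DOTALL)
    if match is None:
        raise ValueError("no SELECT ... FROM found")
    parts, current, depth, in_quote = [], [], 0, False
    for ch in match.group(1):
        if ch == "'":
            in_quote = not in_quote
        elif not in_quote and ch == "(":
            depth += 1
        elif not in_quote and ch == ")":
            depth -= 1
        if ch == "," and depth == 0 and not in_quote:
            # Top-level comma: the current expression is complete.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


if __name__ == "__main__":
    fields = split_select_list(QUERY)
    print(f"{len(fields)} selected expressions, {len(WRITER_COLUMNS)} writer columns")
    for i, expr in enumerate(fields):
        target = WRITER_COLUMNS[i] if i < len(WRITER_COLUMNS) else "(no writer column)"
        print(f"{expr} -> {target}")
```

If the counts differ, either trimming the `SELECT` list or adding the missing entries to the writer's `column` list would bring the two back in line; which one is right depends on whether `createdAt`, `updatedAt`, and `deleted` are wanted in the index.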

3. A plain scheduled task (shell task)
curl -i -X GET -H 'Content-Type: application/json' http://xxxx
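Where the scheduler runs Python rather than shell, the same GET request can be sketched with the standard library. The URL below is a placeholder standing in for the elided `http://xxxx`, and `build_request` and `trigger` are hypothetical names. The timeout is an added assumption so that a scheduled run cannot hang indefinitely.

```python
import urllib.request

# Placeholder for the elided endpoint in the curl command above.
URL = "http://example.invalid/task"


def build_request(url):
    """Build the GET request with the same header the curl command sends."""
    return urllib.request.Request(
        url,
        headers={"Content-Type": "application/json"},
        method="GET",
    )


def trigger(url, timeout=10.0):
    """Send the request and return (status, body).

    Raises urllib.error.HTTPError on 4xx/5xx responses and
    urllib.error.URLError on connection failures.
    """
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")
```

Calling `trigger(URL)` with the real endpoint returns the status code and response body. Letting the exceptions propagate makes the process exit non-zero, which most schedulers record as a failed run.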