go-mysql-elasticsearch: A Handy Tool for Syncing a MySQL Database to Elasticsearch


License: CC 4.0 BY-SA


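This article introduces go-mysql-elasticsearch, a tool written by a developer in China that propagates MySQL inserts, updates, and deletes to Elasticsearch, where the data can then be queried. It walks through the installation steps (Go, godep, and the tool itself) and explains how to edit the river.toml configuration file, covering the MySQL and Elasticsearch connection settings and the data sync rules.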

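go-mysql-elasticsearch is a tool developed by an author in China. Testing shows the following. Advantage: it syncs insert, delete, and update operations to Elasticsearch, where the data can be queried. Shortcomings (areas for improvement):
1. The logs are not very detailed, but they meet basic needs;
2. At initialization, existing MySQL data is not synced automatically (note that the mysqldump option is commented out in the configuration used here), so the initial import has to be handled separately, for example by rebuilding the index and bulk-importing the data.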
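
For the initial import mentioned in point 2, one option is to read the existing rows yourself and send them to Elasticsearch's bulk API. The following is a minimal sketch and not part of go-mysql-elasticsearch: it only builds the newline-delimited request body for the `_bulk` endpoint. The function name `build_bulk_body`, the sample rows, and the use of an `id` column as the document ID are assumptions made for illustration; fetching rows from MySQL and sending the HTTP request are left out.

```python
import json


def build_bulk_body(index, rows, id_field="id"):
    """Build an Elasticsearch bulk-API body (NDJSON) that indexes each row.

    `rows` is an iterable of dicts, e.g. rows fetched from MySQL.
    `id_field` names the column used as the document _id (an assumption
    here: the table's primary key column is called "id").
    """
    lines = []
    for row in rows:
        # Action line: index this document under the given index and _id.
        action = {"index": {"_index": index, "_id": str(row[id_field])}}
        lines.append(json.dumps(action, ensure_ascii=False))
        # Source line: the document itself.
        lines.append(json.dumps(row, ensure_ascii=False))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    # Hypothetical sample rows standing in for the sys_user table.
    sample = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
    print(build_bulk_body("user", sample), end="")
```

The resulting string would be sent with the `Content-Type: application/x-ndjson` header. On Elasticsearch versions that still use mapping types, the type would additionally have to be given in the URL path or in a `_type` field of the action line; whether that applies depends on your Elasticsearch version.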

Installing go-mysql-elasticsearch
Step 1: Install Go
yum install go
Step 2: Install godep
go get github.com/tools/godep
Step 3: Fetch the go-mysql-elasticsearch source
go get github.com/siddontang/go-mysql-elasticsearch
Step 4: Build go-mysql-elasticsearch
cd $GOPATH/src/github.com/siddontang/go-mysql-elasticsearch
make
Using go-mysql-elasticsearch
1. Edit the configuration file: vi ./etc/river.toml

# MySQL address, user and password
# user must have replication privilege in MySQL.
# MySQL settings for the database to sync
my_addr = "127.0.0.1:3306"
my_user = "root"
my_pass = "123456"
my_charset = "utf8"

# Set true when elasticsearch use https
#es_https = false

# Elasticsearch address
es_addr = "192.168.100.90:9200"

# Elasticsearch user and password, maybe set by shield, nginx, or x-pack
es_user = ""
es_pass = ""

# Path to store data, like master.info, if not set or empty,
# we must use this to support breakpoint resume syncing.
# TODO: support other storage, like etcd.
data_dir = "./var"

# Inner Http status address
stat_addr = "127.0.0.1:12800"

# pseudo server id like a slave
server_id = 1001

# mysql or mariadb
flavor = "mysql"

# mysqldump execution path
# if not set or empty, ignore mysqldump.
#mysqldump = "mysqldump"

# if we have no privilege to use mysqldump with --master-data,
# we must skip it.
#skip_master_data = false

# minimal items to be inserted in one bulk
bulk_size = 128

# force flush the pending requests if we don't have enough items >= bulk_size
flush_bulk_time = "200ms"

# Ignore table without primary key
skip_no_pk_table = false

# MySQL data source
[[source]]
schema = "zkbh_nbjd"

# Only below tables will be synced into Elasticsearch.
# "t_[0-9]{4}" is a wildcard table format, you can use it if you have many sub tables, like table_0000 - table_1023
# I don't think it is necessary to sync all tables in a database.
# Tables to sync; separate multiple tables with commas
tables = ["sys_user", "sys_log"]

# Below is for special rule mapping

# Very simple example
#
# desc t;
# +-------+--------------+------+-----+---------+-------+
# | Field | Type         | Null | Key | Default | Extra |
# +-------+--------------+------+-----+---------+-------+
# | id    | int(11)      | NO   | PRI | NULL    |       |
# | name  | varchar(256) | YES  |     | NULL    |       |
# +-------+--------------+------+-----+---------+-------+
#
# The table `t` will be synced to ES index `test` and type `t`
# Sync the sys_user table in the zkbh_nbjd database to the index user
[[rule]]
schema = "zkbh_nbjd"
table = "sys_user"
index = "user"
type = "novel"

# Wildcard table rule, the wildcard table must be in source tables
# All tables which match the wildcard format will be synced to ES index `test` and type `t`.
# In this example, all tables must have same schema with above table `t`;
# Sync the sys_log table in the zkbh_nbjd database to the index log
[[rule]]
schema = "zkbh_nbjd"
table = "sys_log"
index = "log"
type = "log"
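
The config comments state that only tables listed under `[[source]]` are synced and that a wildcard rule must also appear in the source tables. As a rough illustration of those two constraints, here is a minimal sketch that checks rules against sources and shows which table names a wildcard such as `t_[0-9]{4}` would cover. The helper names and the plain dict/list layout mirroring the TOML are assumptions for illustration, and the wildcard is treated here as a Python regular expression that must match the whole table name; how the tool itself resolves wildcards may differ in edge cases.

```python
import re


def rules_not_in_source(sources, rules):
    """Return the (schema, table) pairs of rules that no [[source]] entry lists.

    `sources` maps schema name -> list of table names (or wildcard patterns),
    mirroring the [[source]] blocks; `rules` is a list of dicts mirroring the
    [[rule]] blocks.
    """
    missing = []
    for rule in rules:
        listed = sources.get(rule["schema"], [])
        if rule["table"] not in listed:
            missing.append((rule["schema"], rule["table"]))
    return missing


def expand_wildcard(pattern, existing_tables):
    """Return the existing table names that match a wildcard pattern.

    Assumption: the pattern is a regular expression that must match the
    whole table name.
    """
    regex = re.compile(pattern)
    return [t for t in existing_tables if regex.fullmatch(t)]


if __name__ == "__main__":
    sources = {"zkbh_nbjd": ["sys_user", "sys_log"]}
    rules = [
        {"schema": "zkbh_nbjd", "table": "sys_user", "index": "user", "type": "novel"},
        {"schema": "zkbh_nbjd", "table": "sys_log", "index": "log", "type": "log"},
    ]
    print(rules_not_in_source(sources, rules))  # prints []
    # Hypothetical table names used only to show the wildcard match.
    print(expand_wildcard("t_[0-9]{4}", ["t_0000", "t_1023", "t_abc", "sys_user"]))
    # prints ['t_0000', 't_1023']
```

An empty list from `rules_not_in_source` means every rule refers to a table that is also listed as a source, which is the situation in the configuration above.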

2. Start go-mysql-elasticsearch

cd /root/go/src/github.com/siddontang/go-mysql-elasticsearch
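nohup ./bin/go-mysql-elasticsearch -config=./etc/river.toml &
Running it with nohup and a trailing & starts the service in the background; otherwise the service stops when the logged-in Linux user exits. The Screen window manager can also be used to keep the go-mysql-elasticsearch service from being shut down; see its documentation for details.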
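
After starting the service, one quick sanity check is whether the status address from the configuration (`stat_addr`) accepts TCP connections. The sketch below only tests that the port is open; it does not query any status page, and the helper name `is_listening` is an assumption for illustration.

```python
import socket


def is_listening(addr, timeout=1.0):
    """Return True if a TCP connection to "host:port" succeeds."""
    host, _, port = addr.rpartition(":")
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    # stat_addr value taken from the river.toml shown earlier.
    print(is_listening("127.0.0.1:12800"))
```

If this prints False shortly after startup, the nohup.out file written by nohup is the first place to look for errors.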