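A while ago I migrated Elasticsearch (ES) data from a self-managed cluster to Alibaba Cloud Elasticsearch. This post records the process: choosing the approach, carrying it out, and verifying the data at the end.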
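- Testing Alibaba Cloud ES: index creation, deletion, search, inserts, tokenization, hot updates of the extension dictionary, plus performance tests to confirm it meets business requirements
- Choosing the approach: the original ES service cannot be stopped, because data is written and queried continuously. The final plan is full sync + incremental sync + domain switch, which gives a seamless migration without downtime
- Middleware: use Logstash (self-managed, or purchased directly from Alibaba Cloud) and edit the input and output sections of its configuration. For an introduction to Alibaba Cloud Logstash,
see the official documentation: https://help.aliyun.com/document_detail/139467.html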
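1. Add a timestamp field to the index, to support the incremental sync later
The full + incremental approach needs a way to select the incremental data, so the index must have a timestamp field. If it has none, add one as follows:
(1) On the source ES, create an ingest pipeline that adds a timestamp field
This step only needs to be run once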
PUT _ingest/pipeline/my_timestamp_pipeline
{
  "description": "Adds a field to a document with the time of ingestion",
  "processors": [
    {
      "set": {
        "field": "ingest_timestamp",
        "value": "{{_ingest.timestamp}}"
      }
    }
  ]
}
(2) Set the pipeline as the index's default pipeline, so that newly written documents get the timestamp field
PUT /index1/_settings
{
  "settings": {
    "default_pipeline": "my_timestamp_pipeline"
  }
}
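(3) Backfill the field so that historical documents also get a timestamp value, by running the documents that still lack it through the pipeline
POST index1/_update_by_query?pipeline=my_timestamp_pipeline
{
  "query": {
    "bool": {
      "must_not": {
        "exists": {
          "field": "ingest_timestamp"
        }
      }
    }
  }
}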
(4) Query the index to check that the documents contain the timestamp field ingest_timestamp
GET /index1/_search
{
  "query": {
    "range": {
      "ingest_timestamp": {
        "gte": "2023-11-28 14:40:00",
        "time_zone": "+08:00",
        "format": "yyyy-MM-dd HH:mm:ss"
      }
    }
  }
}
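Besides spot-checking with the range query, it can help to confirm that the backfill in (3) left no document without a timestamp. The following is a minimal sketch, not part of the original procedure: it uses only the Python standard library and the `_count` API, and the function names (`missing_timestamp_body`, `build_count_request`, `count_missing`) as well as the host and credentials passed to them are hypothetical.

```python
import base64
import json
import urllib.request


def missing_timestamp_body(field="ingest_timestamp"):
    """Query body matching documents that do not have `field` yet."""
    return {"query": {"bool": {"must_not": {"exists": {"field": field}}}}}


def build_count_request(base_url, index, user, password, body):
    """Build (but do not send) a POST <base_url>/<index>/_count request."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_count",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


def count_missing(base_url, index, user, password):
    """Send the request and return the number of documents lacking the field."""
    req = build_count_request(
        base_url, index, user, password, missing_timestamp_body()
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["count"]
```

If `count_missing(...)` returns 0 for an index, every document in it should be reachable by the timestamp-based incremental query; a non-zero result suggests the backfill needs to be run again.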
2. Full migration. A reference Logstash configuration:
input {
  elasticsearch {
    hosts => ["http://xxxxx:9200"]
    user => "elastic"
    index => "index1,index2,index3"
    password => "******"
    docinfo => true  # copy each document's _index, _type and _id into [@metadata]
  }
}
filter {
}
output {
  elasticsearch {
    hosts => ["http://es-cn-xxxx.elasticsearch.aliyuncs.com:9200"]
    user => "elastic"
    password => "******"
    # write each document to the same index, type and id it had on the source
    index => "%{[@metadata][_index]}"
    document_type => "%{[@metadata][_type]}"
    document_id => "%{[@metadata][_id]}"
  }
  # file_extend: debug-log output plugin provided by Alibaba Cloud Logstash
  file_extend {
    path => "/ssd/1/ls/logstash/logs/debug/prod"
  }
}
3. Once the full migration has finished, run the incremental sync. The incremental sync configuration:
input {
  elasticsearch {
    hosts => ["http://******:9200"]
    user => "elastic"
    password => "xxxxxx"
    index => "index1,index2,index3..."
    # only documents whose ingest_timestamp falls within the last 3 hours
    query => '{"query":{"range":{"ingest_timestamp":{"gte":"now-3h","lte":"now/m"}}}}'
    schedule => "*/10 * * * *"  # cron syntax: run every 10 minutes
    scroll => "10m"             # keep each scroll context alive for 10 minutes
    docinfo => true
    size => 1000                # documents per scroll page
  }
}
filter {
}
output {
  elasticsearch {
    hosts => ["http://es-cn-******.elasticsearch.aliyuncs.com:9200"]
    user => "elastic"
    password => "******"
    index => "%{[@metadata][_index]}"
    document_type => "%{[@metadata][_type]}"
    document_id => "%{[@metadata][_id]}"
    # do not let Logstash manage ILM or install its index template on the target
    ilm_enabled => false
    manage_template => false
  }
}
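The schedule runs every 10 minutes but each run re-reads the last 3 hours, so consecutive runs overlap heavily. That is safe because the output reuses the source `_id` as `document_id`: a document sent twice overwrites itself instead of being duplicated. The toy model below illustrates this under simplifying assumptions: plain dictionaries stand in for the two clusters, `run_window` is a hypothetical name, and the `now/m` rounding is ignored.

```python
from datetime import datetime, timedelta


def run_window(source, target, now, lookback=timedelta(hours=3)):
    """Copy documents with ingest_timestamp in [now - lookback, now] to target.

    Documents are keyed by id, so copying one again overwrites the earlier
    copy. Returns the number of documents copied in this run.
    """
    start = now - lookback
    copied = 0
    for doc_id, doc in source.items():
        if start <= doc["ingest_timestamp"] <= now:
            target[doc_id] = dict(doc)  # same id: overwrite, never duplicate
            copied += 1
    return copied


if __name__ == "__main__":
    t0 = datetime(2023, 11, 28, 12, 0)
    source = {"a": {"ingest_timestamp": t0 - timedelta(minutes=30), "v": 1}}
    target = {}
    run_window(source, target, t0)
    # A document written between two runs is picked up by the next run,
    # while "a" is simply copied again under the same id.
    source["b"] = {"ingest_timestamp": t0 + timedelta(minutes=4), "v": 1}
    run_window(source, target, t0 + timedelta(minutes=10))
    print(sorted(target))
```

The wide look-back also means a failed or delayed run is covered by later runs, as long as the gap stays under 3 hours. Note that a query-based sync like this only carries inserts and rewrites; documents deleted on the source are not removed from the target.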
4. Verify data integrity by comparing the document counts on both sides after the sync
POST /index1/_search
{
  "query": {
    "range": {
      "modifyTime": {
        "lte": "2023-11-28 00:00:00",
        "time_zone": "+08:00",
        "format": "yyyy-MM-dd HH:mm:ss"
      }
    }
  },
  "from": 0,
  "size": 0,
  "sort": [],
  "track_total_hits": true,
  "aggs": {}
}
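When several indices are involved, comparing the totals by eye gets tedious. The sketch below assumes the search request above has been run against both clusters and the JSON responses have been collected; `total_hits` and `diff_counts` are hypothetical helper names, not part of any ES client.

```python
def total_hits(search_response):
    """Return hits.total as an int for both response shapes.

    ES 7.x returns {"value": N, "relation": "eq"}; ES 6.x returns a bare N.
    """
    total = search_response["hits"]["total"]
    if isinstance(total, dict):
        return int(total["value"])
    return int(total)


def diff_counts(source_counts, target_counts):
    """Return {index: (source_count, target_count)} for every mismatch.

    An index present on only one side is reported with None for the other.
    """
    mismatches = {}
    for index in sorted(set(source_counts) | set(target_counts)):
        src = source_counts.get(index)
        dst = target_counts.get(index)
        if src != dst:
            mismatches[index] = (src, dst)
    return mismatches
```

An empty result from `diff_counts` means the counts agree for the chosen cut-off time. Using a cut-off in the past, as the query above does with `lte`, keeps documents that are still being written from skewing the comparison.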
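5. Run forcemerge to free disk space. This takes a long time, so it is best to start the Logstash sync only after it has finished
Run the following command in the ES console to free the disk space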
POST index_name/_forcemerge?only_expunge_deletes=true
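Updated and deleted documents stay in the segments as deleted documents until a merge removes them, and the backfill in step 1 rewrites historical documents, so the deleted share can be large. To judge whether a force merge is worthwhile, one option is to look at `docs.count` and `docs.deleted` from `GET _cat/indices?format=json&h=index,docs.count,docs.deleted`. The sketch below only parses that output; `deleted_ratio` is a hypothetical helper name.

```python
def deleted_ratio(rows):
    """Map each index to deleted / (live + deleted) documents.

    `rows` is the parsed JSON list from _cat/indices, where the counts
    arrive as strings and may be missing or null (e.g. closed indices).
    """
    ratios = {}
    for row in rows:
        live = int(row.get("docs.count") or 0)
        deleted = int(row.get("docs.deleted") or 0)
        total = live + deleted
        ratios[row["index"]] = deleted / total if total else 0.0
    return ratios
```

Running it before and after the force merge shows how much was expunged; a ratio close to 0 afterwards indicates the operation has done its job.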
6. Switch the domain for a smooth cutover
The ops team handles the domain switch, so it is not covered in detail here