Elasticdump Explained

License: CC 4.0 BY-SA

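Elasticdump is a command-line tool for moving data and index definitions between Elasticsearch clusters. Originally developed and open-sourced by TaskRabbit, it exports documents, mappings, settings and more from an Elasticsearch instance to a file or to another Elasticsearch instance.

Elasticdump is built on Node.js and is simple to use. It supports several transports (files, standard input/output, remote ES nodes, and others), which makes it a good fit for backup, migration, and synchronization tasks.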

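I. Core features of Elasticdump

Export:
- Write an index's documents (data), mappings, and settings to JSON files.
Import:
- Load JSON files, or data read from a remote ES instance, into a target Elasticsearch instance.
Multiple transports:
- Files
- Standard input/output (stdin/stdout)
- Remote Elasticsearch nodes (http/https)
Multiple indices:
- Operate on several indices at once with the wildcard *.
Batched processing:
- --limit sets how many documents are read or written per batch, which keeps memory use bounded.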

II. Installing Elasticdump

npm install -g elasticdump

Note: Node.js and npm must be installed first.
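
Before running the install command, it can help to confirm that both prerequisites are on the PATH. The following is a minimal sketch; the helper name `have_cmd` is made up for illustration and is not part of Elasticdump or npm.

```shell
# Return success if the named command is available on PATH.
have_cmd() { command -v "$1" >/dev/null 2>&1; }

# Collect any missing prerequisites before attempting the install.
missing=""
for cmd in node npm; do
  have_cmd "$cmd" || missing="$missing $cmd"
done

if [ -n "$missing" ]; then
  echo "Missing prerequisites:$missing" >&2
else
  echo "Prerequisites found; run: npm install -g elasticdump"
fi
```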

III. Basic syntax

elasticdump \
  --input=<source> \
  --output=<destination> \
  [options]

--input: the source (a file path, an Elasticsearch URL, or stdin)
--output: the destination (a file path, an Elasticsearch URL, or stdout)
[options]: optional parameters such as the data type and batch size

IV. Common options

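Option	Description
`--input`	Source location, e.g. `http://localhost:9200/my_index` or `/data/my_index.json`
`--output`	Destination location, in the same forms as `--input`
`--type`	What to export or import: `data` (documents), `mapping` (index mappings), `analyzer` (analyzers), `settings` (index settings), `alias` (aliases)
`--limit`	Number of documents moved per batch (a batch size, not a cap on the total); default 100
`--offset`	Skip the first N rows of the input, mainly to resume an interrupted run; no sort order is applied, so the skipped rows are not guaranteed to be the ones already written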
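`--timeout`	Time in milliseconds to wait for an HTTP response before aborting the request; set it explicitly rather than relying on a default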
`--scrollTime`	How long Elasticsearch keeps the scroll search context alive; default `10m`
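`--bulk`	Write through the Elasticsearch Bulk API. This flag dates from 0.x releases; later releases write in bulk without it, so check `elasticdump --help` for your version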
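`--searchBody`	Custom query body, e.g. to export only documents that match a condition
`--headers`	Extra HTTP request headers, e.g. for authentication
`--concurrency`	Maximum number of concurrent requests; default 1
`--ignore-errors`	Continue after errors instead of stopping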
`--fsCompress`	Gzip data written to an output file; on import, inflate a gzipped input file

V. Usage examples

1. Exporting index data to a file

elasticdump \
  --input=http://localhost:9200/my_index \
  --output=/data/my_index_data.json \
  --type=data

2. Exporting the index mapping (structure)

elasticdump \
  --input=http://localhost:9200/my_index \
  --output=/data/my_index_mapping.json \
  --type=mapping

3. Exporting index settings

elasticdump \
  --input=http://localhost:9200/my_index \
  --output=/data/my_index_settings.json \
  --type=settings
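
The three export commands above differ only in `--type` and the output file name, so they can be wrapped in a small loop. This is a sketch under assumptions: the function name `backup_index` and the `<index>_<type>.json` naming are illustrative choices that mirror the file names used in this article, not an Elasticdump convention.

```shell
# Export settings, mapping and data of one index into separate files.
# Arguments: cluster base URL, index name, output directory.
backup_index() {
  es_url="$1"; index="$2"; out_dir="$3"
  for kind in settings mapping data; do
    elasticdump \
      --input="${es_url}/${index}" \
      --output="${out_dir}/${index}_${kind}.json" \
      --type="${kind}" || return 1   # stop at the first failed export
  done
}

# Usage: backup_index http://localhost:9200 my_index /data
```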

4. Importing data from a file into Elasticsearch

elasticdump \
  --input=/data/my_index_data.json \
  --output=http://target-host:9200/my_index \
  --type=data

5. Migrating directly from one ES instance to another (no intermediate file)

elasticdump \
  --input=http://source-host:9200/my_index \
  --output=http://target-host:9200/my_index \
  --type=data
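
Copying only `data` leaves the target to infer field types on its own. A common pattern is to copy the analyzer and mapping first, then the documents. The sketch below assumes that order suits your index; `migrate_index` is a hypothetical helper name, and the hosts in the usage line are the placeholders used above.

```shell
# Copy one index between clusters: analyzer, then mapping, then data.
# Arguments: source base URL, target base URL, index name.
migrate_index() {
  src_url="$1"; dst_url="$2"; index="$3"
  for kind in analyzer mapping data; do
    elasticdump \
      --input="${src_url}/${index}" \
      --output="${dst_url}/${index}" \
      --type="${kind}" || return 1   # do not copy data if an earlier step failed
  done
}

# Usage: migrate_index http://source-host:9200 http://target-host:9200 my_index
```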

6. Chaining operations with a pipe

# Export the mapping and import it directly
elasticdump \
  --input=http://localhost:9200/my_index \
  --output=$ \
  --type=mapping | \
elasticdump \
  --input=$ \
  --output=http://target:9200/my_index \
  --type=mapping

$ denotes stdin when used with --input and stdout when used with --output.
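
Because the dump can go to stdout, it can also be piped into ordinary Unix tools. The sketch below compresses a data export with gzip; `dump_gzip` is a hypothetical helper name, and the `$` is quoted so the shell passes it through literally.

```shell
# Stream the documents of an index to stdout and gzip them into a file.
# Arguments: index URL, path of the .gz file to write.
dump_gzip() {
  elasticdump \
    --input="$1" \
    --output='$' \
    --type=data | gzip > "$2"
}

# Usage: dump_gzip http://localhost:9200/my_index /data/my_index.json.gz
```

Note that the exit status of the pipeline is that of gzip, so a failed export is not detected by this sketch.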

7. Exporting from a cluster that requires authentication (e.g. Basic Auth)

elasticdump \
  --input=http://user:pass@source:9200/my_index \
  --output=/data/my_index.json \
  --type=data

Or use --headers:

elasticdump \
  --input=http://source:9200/my_index \
  --output=/data/my_index.json \
  --type=data \
  --headers='{"Authorization": "Basic base64encoded"}'
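
The `base64encoded` placeholder stands for the Base64 encoding of `user:pass`. A minimal sketch for building that header value follows; `basic_auth_header`, `ES_USER` and `ES_PASS` are names chosen for illustration.

```shell
# Print a JSON headers object carrying HTTP Basic credentials.
# Arguments: username, password.
basic_auth_header() {
  # printf avoids the trailing newline that echo would add before encoding;
  # tr removes the line wrapping some base64 tools apply to long values.
  token=$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')
  printf '{"Authorization": "Basic %s"}' "$token"
}

# Usage:
#   elasticdump ... --headers="$(basic_auth_header "$ES_USER" "$ES_PASS")"
```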

8. Exporting only documents that match a condition (searchBody)

elasticdump \
  --input=http://localhost:9200/my_index \
  --output=/data/my_index_filtered.json \
  --type=data \
  --searchBody='{"query": {"match": {"status": "active"}}}'
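
Longer queries are awkward to quote on the command line. Recent Elasticdump releases document that a `--searchBody` value starting with `@` is read from a file, which must contain valid JSON. The sketch below assumes such a release; the file path, the `created_at` field and the `dump_filtered` name are hypothetical.

```shell
# Write the query to a file (hypothetical path and field names).
QUERY_FILE="${TMPDIR:-/tmp}/searchbody.json"
cat > "$QUERY_FILE" <<'EOF'
{"query": {"bool": {"filter": [
  {"term": {"status": "active"}},
  {"range": {"created_at": {"gte": "now-7d/d"}}}
]}}}
EOF

# Export only the documents matched by the query stored in QUERY_FILE.
dump_filtered() {
  elasticdump \
    --input=http://localhost:9200/my_index \
    --output=/data/my_index_filtered.json \
    --type=data \
    --searchBody="@${QUERY_FILE}"
}
```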

9. Migrating several indices at once (wildcard)

elasticdump \
  --input='http://localhost:9200/log-*' \
  --output=http://backup:9200/ \
  --type=data

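Note: the target creates indices with the matching names automatically. Unless the mappings are migrated first, those indices get dynamically inferred mappings.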

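VI. Advanced tips

1. Compressing backup files

elasticdump \
  --input=http://localhost:9200/my_index \
  --output=/data/my_index.json.gz \
  --type=data \
  --fsCompress

The output is gzip-compressed. Elasticdump writes to the path given in --output, so name the file with a .gz extension yourself.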

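2. Exporting a large index in batches

# --limit is the batch size per request; --size caps the total documents exported.
# --offset skips rows without a sort order, so slices may overlap or miss documents.

# Batch 1
elasticdump \
  --input=http://localhost:9200/huge_index \
  --output=/data/batch1.json \
  --limit=1000 \
  --size=1000 \
  --offset=0

# Batch 2
elasticdump \
  --input=http://localhost:9200/huge_index \
  --output=/data/batch2.json \
  --limit=1000 \
  --size=1000 \
  --offset=1000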

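3. Improving performance: bulk writes and concurrency

# Note: --bulk dates from 0.x releases; later releases already write through the Bulk API.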
elasticdump \
  --input=http://source:9200/my_index \
  --output=http://target:9200/my_index \
  --type=data \
  --bulk=true \
  --concurrency=5 \
  --limit=1000

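VII. Caveats

Version compatibility:
- Mappings and settings may be incompatible across Elasticsearch versions; keeping source and target on the same version is recommended.
Large indices:
- Keep batches bounded with --limit to avoid memory exhaustion or timeouts; --bulk applies only where your release still supports it.
Network stability:
- For migrations across networks, make sure the link is stable and adjust --timeout as needed.
Permissions:
- If security is enabled on the cluster (e.g. X-Pack), supply a username and password or a token.
Aliases:
- --type=alias exports aliases; check that the indices they point to exist on the target.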
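
The version caveat above can be checked before a migration by comparing what each cluster reports on its root endpoint. This is a rough sketch: it assumes `curl` is installed and that both clusters answer without authentication, and the helper names `es_version` and `same_major` are made up for illustration.

```shell
# Print the version number reported by a cluster's root endpoint.
es_version() {
  curl -s "$1" | sed -n 's/.*"number" *: *"\([^"]*\)".*/\1/p'
}

# Succeed only when both clusters report the same major version.
same_major() {
  src_ver=$(es_version "$1")
  dst_ver=$(es_version "$2")
  [ -n "$src_ver" ] && [ "${src_ver%%.*}" = "${dst_ver%%.*}" ]
}

# Usage:
#   same_major http://source-host:9200 http://target-host:9200 \
#     || echo "major versions differ; review mappings and settings first" >&2
```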

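VIII. Comparison with alternatives

Tool	Strengths	Weaknesses
Elasticdump	Simple to use, supports several formats, suits small to medium migrations	Not suited to very large (TB-scale) data sets; no reliable resume after an interruption
Elasticsearch Snapshot/Restore	Officially recommended; supports incremental backups and snapshot repositories (S3, HDFS, etc.)	Requires a configured shared repository; more involved to operate
Logstash	Supports complex transformation, filtering, and pipelines	Complex configuration; high resource usage
reindex API	Efficient for migrations inside Elasticsearch	Works only between Elasticsearch indices; cannot export to files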