[Elasticsearch Notes] Theory

License: CC 4.0 BY-SA

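This article covers key Elasticsearch concepts such as cluster health, documents, fields, and shards, and shows how to load test data, run basic create, index, query, and delete operations, and use batch processing.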


Contents

keyword
Key points
Loading test data
Cluster Health
create, index, query, delete
Batch Processing

Note: this document is no longer maintained. See the maintained version at https://blog.youkuaiyun.com/weixin_43834662/article/details/102781208

keyword

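analysis: the process of splitting text into terms; it relies on an analyzer
analyzer: the component that turns text into terms during analysis
cluster: a group of one or more nodes that work together
cross-cluster replication (CCR): replicating data within the same network is very fast, but it does not protect the data when a disaster hits. CCR addresses this problem. CCR is active-passive: the leader cluster handles reads and writes, while the follower cluster only serves reads
cross-cluster search (CCS): searching across multiple clusters
document: a JSON document stored in an index
field: a field of a document
filter: a query clause whose result is only yes or no, with no score
follower index: the target index of CCR. It lives in the local cluster and replicates a leader index
leader index: the source index of CCR. It lives in the remote cluster and is replicated to follower indices
mapping: defines the field types of a document and their related settings
node: a running instance that belongs to a cluster. Several nodes can run on one machine, but normally there is one node per server
id: the document ID, either user-supplied or generated automatically by ES
index: (noun) an index, comparable to a database in a relational system
shard: a shard is a single Lucene instance. It is a low-level unit of work that ES manages automatically. ES distributes all shards across the nodes of the cluster and moves shards when nodes join or leave. A single ES shard (a Lucene index) has a maximum document count, which is set by Lucene: about 2.1 billion (= Integer.MAX_VALUE - 128 = 2,147,483,519)
primary shard: every document is stored in exactly one primary shard. When a document is indexed, it is indexed on the primary shard first and then on all of its replica shards. All modifications go through the primary shard, which is responsible for syncing the data to its replicas
replica shard: each primary shard has zero or more replica shards. A replica is a copy of its primary shard and serves two main purposes: fault tolerance, and better get and search performance
query: there are two kinds of query: scoring queries and filters
recovery: the process of syncing a shard copy from a source shard
routing: a document is stored in one primary shard. Which one is decided by routing. By default the routing value is the document ID
source field: by default, the JSON document we index is stored in the _source field
term: an exact value indexed in ES. foo, Foo, and FOO are different terms
text: ordinary unstructured text
type: comparable to a table in a relational database. Deprecated in 6.x; in 7.x custom types are effectively gone and _doc is used in their place
inverted index: the core data structure of search and the reason lookups are fast
ILM: index lifecycle management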
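The routing entry above can be made concrete with a small sketch. It assumes the simplified rule "hash the routing value, then take it modulo the number of primary shards". The function name pick_shard and the use of CRC32 are illustrative assumptions only: Elasticsearch itself uses a Murmur3 hash, so real shard numbers will differ.

```python
import zlib


def pick_shard(routing: str, num_primary_shards: int) -> int:
    """Map a routing value (the document ID by default) to a primary shard number."""
    # Stand-in hash for illustration: Elasticsearch uses Murmur3, not CRC32.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


# The same ID always lands on the same shard as long as the shard count is fixed.
for doc_id in ["1", "6", "13"]:
    print(doc_id, "->", pick_shard(doc_id, 5))
```

Because the shard number depends on the number of primary shards, changing that number would send existing IDs to different shards, which is one reason the primary shard count is chosen when the index is created.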
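To illustrate the term and inverted index entries, here is a minimal sketch of an inverted index built with a plain whitespace split. The function name and the sample documents are made up for this example, and no lowercasing is applied, so foo, Foo, and FOO stay distinct terms. An analyzer that lowercases its input would merge them.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each term to the set of document IDs that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        # Whitespace tokenization only; terms keep their original case.
        for term in text.split():
            index[term].add(doc_id)
    return dict(index)


docs = {1: "foo bar", 2: "Foo baz", 3: "FOO bar"}
index = build_inverted_index(docs)
print(sorted(index["bar"]))  # [1, 3]
print(sorted(index["foo"]))  # [1]
```

A lookup is a single dictionary access per term instead of a scan over every document, which is where the speed comes from.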

Loading test data

accounts.json has the following format:

{"index":{"_id":"1"}}
{"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
{"index":{"_id":"6"}}
{"account_number":6,"balance":5686,"firstname":"Hattie","lastname":"Bond","age":36,"gender":"M","address":"671 Bristol Street","employer":"Netagy","email":"hattiebond@netagy.com","city":"Dante","state":"TN"}
{"index":{"_id":"13"}}
{"account_number":13,"balance":32838,"firstname":"Nanette","lastname":"Bates","age":28,"gender":"F","address":"789 Madison Street","employer":"Quility","email":"nanettebates@quility.com","city":"Nogal","state":"VA"}

Import the data

curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_bulk?pretty&refresh" --data-binary "@accounts.json"
curl "localhost:9200/_cat/indices?v"

Cluster Health

GET _cat/health
GET _cluster/health
GET _cat/nodes
GET _cat/shards
GET _cat/indices
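The health endpoints report a status of green, yellow, or red. As a rough sketch of how those colors relate to the primary and replica shards described earlier: green means every shard is assigned, yellow means all primaries are assigned but at least one replica is not, and red means at least one primary is unassigned. The function and the tuple layout below are hypothetical and only model that rule; they are not how ES computes the status internally.

```python
def cluster_status(shards):
    """shards: list of (is_primary, is_assigned) tuples, one per shard copy."""
    if any(is_primary and not assigned for is_primary, assigned in shards):
        return "red"      # some primary shard is unassigned
    if any(not assigned for _, assigned in shards):
        return "yellow"   # all primaries are fine, some replica is unassigned
    return "green"        # every shard copy is assigned


# A single-node cluster with one replica configured: the replica cannot be placed.
print(cluster_status([(True, True), (False, False)]))  # yellow
```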

create, index, query, delete

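# Create an index named customer
PUT /customer?pretty

# Index a document with an explicit ID (1)
PUT /customer/_doc/1?pretty
{
  "name": "John Doe"
}

# Index a document and let ES generate the ID
POST /customer/_doc?pretty
{
  "name": "Jane Doe"
}

# Partial update: merge the given fields into document 1
POST /customer/_update/1?pretty
{
  "doc": { "name": "Jane Doe", "age": 20 }
}

# Scripted update: add 5 to the age field of document 1
POST /customer/_update/1?pretty
{
  "script" : "ctx._source.age += 5"
}

# Fetch document 1 by ID
GET /customer/_doc/1?pretty

# Search the index, matching every document
GET /customer/_search
{
  "query": {
    "match_all": {}
  }
}

# Delete document 1
DELETE /customer/_doc/1?pretty

# Delete the whole index
DELETE /customer?pretty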

Batch Processing

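If a single operation fails, the remaining operations are still executed. The request as a whole returns normally, and the response contains one result per operation, in the same order as the request.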

POST /customer/_bulk?pretty
{"index":{"_id":"1"}}
{"name": "John Doe" }
{"index":{"_id":"2"}}
{"name": "Jane Doe" }
{"update":{"_id":"1"}}
{"doc": { "name": "John Doe becomes Jane Doe" } }
{"delete":{"_id":"2"}}