Elasticsearch: Indices, Documents, REST APIs, Nodes, Shards, and Document CRUD


BinyGo


Article link: https://blog.youkuaiyun.com/w15802080081/article/details/122129490


Index

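Index: an index is a container of documents, a collection of documents of the same kind.
Each index has its own Mapping, which defines the documents' field names and field types, and its own Settings, which define how the data is distributed (for example, the number of shards and replicas).
Shard: the data in an index is spread across shards.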

GET movies # get information about the movies index

{
  "movies" : {
    "aliases" : { },
    "mappings" : {...},  # mappings define the document field types
    "settings" : {       # settings define how the data is distributed
      "index" : {
        "routing" : {
          "allocation" : {
            "include" : {
              "_tier_preference" : "data_content"
            }
          }
        },
        "number_of_shards" : "1",  # number of primary shards
        "provided_name" : "movies",
        "creation_date" : "1640329971323",
        "number_of_replicas" : "1",  # number of replicas per primary shard
        "uuid" : "7nM3ZbtkQhSOZziMDlNfuQ",
        "version" : {
          "created" : "7160299"
        }
      }
    }
  }
}
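The values under `settings.index` come back as strings. The minimal Python sketch below assumes the `GET movies` response has already been parsed into a dict and leaves out the HTTP call. It converts those strings and computes how many shard copies the index occupies in total. `total_shard_copies` is a hypothetical helper name.

```python
import json


def total_shard_copies(index_info: dict, index_name: str) -> int:
    """Return primaries * (1 + replicas) for one index in a GET movies-style response."""
    index_settings = index_info[index_name]["settings"]["index"]
    # The settings come back as strings, e.g. "1", so convert them first.
    primaries = int(index_settings["number_of_shards"])
    replicas = int(index_settings["number_of_replicas"])
    # Every primary shard has `replicas` extra copies in addition to itself.
    return primaries * (1 + replicas)


# Trimmed-down version of the GET movies response shown above.
response = json.loads("""
{
  "movies": {
    "settings": {
      "index": {"number_of_shards": "1", "number_of_replicas": "1"}
    }
  }
}
""")
print(total_shard_copies(response, "movies"))  # 2
```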

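Analogy with an RDBMS

Elasticsearch: relevance scoring, high-performance full-text search, data aggregation
RDBMS: transactions, data consistency, idempotency

RDBMS	Elasticsearch
Table	Index
Row	Document
Column	Field
Schema	Mapping
SQL	DSL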

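Document

Elasticsearch is document-oriented: a document is the smallest searchable unit. Examples include an entry in a log file, the details of a single product, a song in an MP3 player, or the contents of a PDF. A document is comparable to a single record in a database.
A document is serialized as JSON and stored in Elasticsearch. A JSON object consists of fields, and each field has a type (string, numeric, boolean, date, binary, range, and so on).
Every document has a unique ID, which you can either specify yourself or let Elasticsearch generate automatically.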
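Here is a minimal sketch, assuming a Python client that builds the request body itself. A document is just a dict serialized to JSON, and its ID is either supplied by the caller or generated. `prepare_document` is a hypothetical helper. Its uuid4-based ID only stands in for Elasticsearch's own auto-generated IDs, which have a different format.

```python
import json
import uuid
from typing import Optional, Tuple


def prepare_document(source: dict, doc_id: Optional[str] = None) -> Tuple[str, str]:
    """Serialize a document to a JSON string and pick its ID.

    Hypothetical helper: without an explicit ID, a random uuid4 hex string
    stands in for an ID that Elasticsearch would generate on its own.
    """
    if doc_id is None:
        doc_id = uuid.uuid4().hex
    # ensure_ascii=False keeps non-ASCII text (e.g. Chinese titles) readable.
    body = json.dumps(source, ensure_ascii=False)
    return doc_id, body


doc_id, body = prepare_document({"title": "Toy Story", "year": 1995}, doc_id="1")
print(doc_id, body)  # 1 {"title": "Toy Story", "year": 1995}
```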

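Document metadata

GET movies/_doc/1

#Document metadata
{
  "_index" : "movies",  # name of the index the document belongs to
  "_type" : "_doc",     # type name; types are deprecated in 7.x and this is always _doc
  "_id" : "1",          # unique ID of the document
  "_version" : 4,       # document version, incremented on each write to the document
  "_seq_no" : 30286,    # sequence number assigned to each index, update, and delete operation on the shard; records the order in which write operations were applied
  "_primary_term" : 3,  # incremented whenever a different shard copy becomes the primary; used together with _seq_no for optimistic concurrency control
  "found" : true,
  "_source" : {         # the original JSON of the document
    "id" : "1",
    "year" : 1995,
    "@version" : "1",   # a field in the source data itself (typically added by Logstash), not the Elasticsearch document version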
    "title" : "Toy Story",
    "genre" : [
      "Adventure",
      "Animation",
      "Children",
      "Comedy",
      "Fantasy"
    ]
  }
}
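The response mixes two kinds of data: underscore-prefixed metadata managed by Elasticsearch, and `_source`, the JSON that was originally indexed. The minimal Python sketch below assumes the response above has already been parsed into a dict and separates the two. It also shows that `_version` and `@version` live in different places. `split_get_response` is a hypothetical name.

```python
from typing import Any, Dict, Tuple


def split_get_response(resp: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split a GET movies/_doc/1-style response into (metadata, source)."""
    # Metadata fields start with an underscore; `found` is a plain response flag.
    metadata = {k: v for k, v in resp.items() if k.startswith("_") and k != "_source"}
    source = resp.get("_source", {})
    return metadata, source


resp = {
    "_index": "movies", "_type": "_doc", "_id": "1", "_version": 4,
    "_seq_no": 30286, "_primary_term": 3, "found": True,
    "_source": {"id": "1", "year": 1995, "@version": "1", "title": "Toy Story"},
}
meta, source = split_get_response(resp)
print(meta["_version"], source["@version"])  # 4 1
```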

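REST APIs for indices and documents

#Show information about the index
GET movies

#Show the total number of documents in the index
GET movies/_count

#Show the first 10 documents to get a feel for the document format
POST movies/_search
{
}

#_cat indices API
#List indices
GET /_cat/indices/kibana*?v&s=index

#List indices whose health is green
GET /_cat/indices?v&health=green

#Sort by document count, descending
GET /_cat/indices?v&s=docs.count:desc

#Show specific columns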
GET /_cat/indices/kibana*?pri&v&h=health,index,pri,rep,docs.count,mt

#How much memory is used per index?
GET /_cat/indices?v&h=i,tm&s=tm:desc
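The `_cat` APIs are meant for humans. Adding `format=json` to the query string returns the same columns as JSON, which is easier to process in a program. The minimal Python sketch below assumes such a JSON array has already been fetched. It repeats the `health` filter and the `docs.count` sort on the client side. The helper name `sort_indices` and the sample rows are made up.

```python
import json
from typing import Dict, List, Optional


def sort_indices(rows: List[Dict[str, Optional[str]]], health: Optional[str] = None) -> List[str]:
    """Client-side equivalent of ?health=...&s=docs.count:desc,
    applied to rows from GET /_cat/indices?format=json."""
    if health is not None:
        rows = [r for r in rows if r.get("health") == health]
    # _cat values are strings, so convert before sorting; a plain string sort
    # would put "900" ahead of "10000". Missing counts are treated as 0.
    ordered = sorted(rows, key=lambda r: int(r.get("docs.count") or 0), reverse=True)
    return [r["index"] for r in ordered]


# Made-up rows for illustration only, not output from a real cluster.
rows = json.loads("""[
  {"health": "green",  "index": "idx_a", "docs.count": "900"},
  {"health": "yellow", "index": "idx_b", "docs.count": "10000"},
  {"health": "green",  "index": "idx_c", "docs.count": "85"}
]""")
print(sort_indices(rows))                  # ['idx_b', 'idx_a', 'idx_c']
print(sort_indices(rows, health="green"))  # ['idx_a', 'idx_c']
```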

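Distributed architecture and high availability

Service availability: the cluster keeps serving requests even if some nodes stop.
Data availability: losing some nodes does not lose data (as long as replica shards exist).
Scalability: as request volume and data keep growing, data is distributed across all nodes, so the cluster can scale out horizontally.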

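Node

A node is an Elasticsearch instance, which is essentially a Java process. Several nodes can run on one machine, but in production it is generally recommended to run only one Elasticsearch instance per machine.

Node type	Description
Master-eligible node	Every node is master-eligible by default; it can take part in the master election and become the master node
Master node	Elected from the master-eligible nodes; a node started on its own elects itself as master. Only the master node can modify the cluster state, which contains information about all nodes, all indices with their mappings and settings, and shard routing
Data node	Holds data by storing shard data; crucial for scaling out data
Coordinating node	Receives client requests, forwards them to the appropriate nodes, and finally gathers the results together; every node acts as a coordinating node by default
Hot & Warm node	Hot nodes usually have stronger hardware and hold hot data; warm nodes hold ordinary or historical data and have lower hardware requirements
Machine Learning node	Dedicated to running machine learning jobs, e.g. anomaly detection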
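The coordinating node's job can be pictured as scatter/gather. The Python sketch below is a simplified model, not Elasticsearch's implementation. It assumes each shard has already scored its documents, takes each shard's local top `size` hits, and merges them into a global top `size`. A real search also has a fetch phase that loads the documents for the winning IDs, which is omitted here. `coordinate_search` and the sample scores are hypothetical.

```python
import heapq
from typing import List, Tuple

Hit = Tuple[float, str]  # (score, doc_id)


def coordinate_search(shard_hits: List[List[Hit]], size: int) -> List[Hit]:
    """Merge per-shard results into a global top `size`, highest score first."""
    candidates: List[Hit] = []
    for hits in shard_hits:
        # Scatter: each shard only sends back its local top `size` hits.
        candidates.extend(heapq.nlargest(size, hits))
    # Gather: the coordinating node keeps the best `size` of all candidates.
    return heapq.nlargest(size, candidates)


shard_0 = [(1.2, "d1"), (0.4, "d2"), (2.5, "d3")]
shard_1 = [(1.9, "d4"), (0.1, "d5")]
print(coordinate_search([shard_0, shard_1], size=2))  # [(2.5, 'd3'), (1.9, 'd4')]
```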

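Configuring node types
In a development environment, one node can take on several roles.
In production, each node should be given a single role (dedicated node).

Node type	Setting	Default
Master eligible	node.master	true
Data	node.data	true
Ingest	node.ingest	true
Coordinating only	none (set node.master, node.data, node.ingest, and node.ml to false)	every node is a coordinating node by default
Machine learning	node.ml	true (requires X-Pack to be enabled)

Note: since Elasticsearch 7.9, these per-role settings are deprecated in favor of a single node.roles list.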

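Shard

Primary shard: solves horizontal scaling. Primary shards let the data be distributed across all nodes in the cluster, and each shard is a running Lucene instance. The number of primary shards is set when the index is created and cannot be changed afterwards, except by reindexing (or with the split/shrink APIs).
Replica shard: solves data high availability. A replica shard is a copy of a primary shard. The number of replicas can be adjusted dynamically, and adding replicas can also improve service availability (read throughput) to some extent.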
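Why can't the number of primary shards change? By default a document is routed to a shard with, in simplified form, `shard = hash(_routing) % number_of_primary_shards`, where `_routing` defaults to the document `_id`. The Python sketch below uses `zlib.crc32` as a stand-in hash; Elasticsearch uses its own Murmur3-based hash. It counts how many documents would land on a different shard if the primary count went from 3 to 4 without reindexing. The helper names are hypothetical.

```python
import zlib
from typing import Iterable


def shard_for(routing: str, number_of_primary_shards: int) -> int:
    """Simplified routing: hash(_routing) % number_of_primary_shards.

    zlib.crc32 is only a stand-in for Elasticsearch's own hash function.
    """
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


def moved_documents(doc_ids: Iterable[str], old_shards: int, new_shards: int) -> int:
    """Count documents whose target shard changes when the primary count changes."""
    return sum(1 for d in doc_ids if shard_for(d, old_shards) != shard_for(d, new_shards))


ids = [str(i) for i in range(1000)]
# Most documents would now be looked up on the wrong shard, hence the reindex.
print(moved_documents(ids, 3, 4))
```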

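In production, plan the number of shards in advance.
Too few shards: you cannot add nodes later to scale out horizontally, and if a single shard holds too much data, reallocating it takes a long time.
Too many shards: the relevance scoring of search results and the accuracy of aggregation results suffer, and too many shards on a single node waste resources and hurt performance. Starting with version 7.0, the default number of primary shards is 1, which addresses the over-sharding problem.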
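One way sharding hurts aggregation accuracy shows up in terms aggregations. Each shard reports only its own top terms, and the coordinating node merges those partial lists. The Python sketch below models that with made-up term counts on two shards; the function names are hypothetical. With only one term requested per shard, the globally most frequent term is lost. Elasticsearch's `terms` aggregation has a `shard_size` parameter that asks each shard for more candidates, which the `shard_size` argument mimics here.

```python
from collections import Counter
from typing import Dict, List, Tuple


def exact_top_terms(shards: List[Dict[str, int]], size: int) -> List[Tuple[str, int]]:
    """Top terms computed over all data at once (the accurate answer)."""
    total: Counter = Counter()
    for counts in shards:
        total.update(counts)
    return total.most_common(size)


def per_shard_top_terms(shards: List[Dict[str, int]], size: int, shard_size: int) -> List[Tuple[str, int]]:
    """Each shard reports only its local top `shard_size` terms; the
    coordinating node sums those partial counts and keeps the top `size`."""
    merged: Counter = Counter()
    for counts in shards:
        merged.update(dict(Counter(counts).most_common(shard_size)))
    return merged.most_common(size)


shard_1 = {"a": 10, "b": 9, "c": 1}
shard_2 = {"c": 10, "b": 9, "a": 1}
print(exact_top_terms([shard_1, shard_2], size=1))                    # [('b', 18)]
print(per_shard_top_terms([shard_1, shard_2], size=1, shard_size=1))  # [('a', 10)], 'b' is lost
print(per_shard_top_terms([shard_1, shard_2], size=1, shard_size=2))  # [('b', 18)]
```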

REST APIs for nodes, clusters, and shards

#Nodes
GET /_nodes/es7_01,es7_02
GET /_cat/nodes?v
GET /_cat/nodes?v&h=id,ip,port,v,m
#Cluster
GET _cluster/health
GET _cluster/health?level=shards
GET /_cluster/health/kibana_sample_data_ecommerce,kibana_sample_data_flights
GET /_cluster/health/kibana_sample_data_flights?level=shards
#Cluster state
#The cluster state API returns metadata describing the state of the whole cluster, such as node information, index mappings and settings, and shard routing
GET /_cluster/state
#Cluster settings
GET /_cluster/settings
GET /_cluster/settings?include_defaults=true
#Shards
GET _cat/shards
GET _cat/shards?h=index,shard,prirep,state,unassigned.reason
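The health color reported by `GET _cluster/health` follows from shard allocation. It is red if at least one primary shard is unassigned, yellow if all primaries are assigned but at least one replica is not, and green otherwise. A replica is never allocated on the same node as its primary, so a single-node cluster with `number_of_replicas` set to 1 (like the movies index shown earlier) stays yellow. The Python sketch below derives the color from rows shaped like `GET _cat/shards?h=index,shard,prirep,state&format=json`. It is a simplified model that only looks at `UNASSIGNED` and does not treat initializing shards specially. `health_color` is a hypothetical name.

```python
from typing import Dict, List


def health_color(shards: List[Dict[str, str]]) -> str:
    """Simplified cluster health (red / yellow / green) from shard states."""
    unassigned = [s for s in shards if s["state"] == "UNASSIGNED"]
    if any(s["prirep"] == "p" for s in unassigned):
        return "red"     # at least one primary shard is not allocated to any node
    if unassigned:
        return "yellow"  # all primaries allocated, some replicas are not
    return "green"       # every shard copy is allocated


# Single-node cluster, movies index with 1 primary and 1 replica.
single_node = [
    {"index": "movies", "shard": "0", "prirep": "p", "state": "STARTED"},
    {"index": "movies", "shard": "0", "prirep": "r", "state": "UNASSIGNED"},
]
print(health_color(single_node))  # yellow
```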

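Basic document CRUD and bulk operations

Operation	Example request
Index	PUT index_name/_doc/1 {"user":"biny","message":"hello"}
Create	PUT index_name/_create/1 {"user":"biny","message":"hello"}
Create	POST index_name/_doc (no ID given, one is generated automatically) {"user":"biny","message":"hello"}
Read	GET index_name/_doc/1 (returns 404 if the document is not found)
Update	POST index_name/_update/1 {"doc":{"user":"biny","message":"hello"}}
Delete	DELETE index_name/_doc/1

Index: if the ID does not exist, a new document is created; otherwise the existing document is deleted and a new one is written, and the version is incremented by 1.
Create: if the ID already exists, the operation fails; note the difference from Index.
Update: the document must already exist, and only the specified fields are changed (a partial, incremental update).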
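The differences between Index, Create, and Update can be captured in a small model. The Python sketch below is a hypothetical in-memory stand-in, not how Elasticsearch stores documents. `ToyIndex`, `ConflictError`, and `NotFoundError` are made-up names, and the two exceptions stand in for the 409 and 404 responses Elasticsearch returns. The usage at the end mirrors the `users` requests that follow.

```python
from typing import Any, Dict


class ConflictError(Exception):
    """Stands in for HTTP 409: the document already exists."""


class NotFoundError(Exception):
    """Stands in for HTTP 404: the document does not exist."""


class ToyIndex:
    """Hypothetical in-memory model of Index / Create / Update / Delete semantics."""

    def __init__(self) -> None:
        self.docs: Dict[str, Dict[str, Any]] = {}
        self.versions: Dict[str, int] = {}

    def index(self, doc_id: str, source: Dict[str, Any]) -> int:
        # Creates the document or replaces it completely; the version goes up by 1.
        self.docs[doc_id] = dict(source)
        self.versions[doc_id] = self.versions.get(doc_id, 0) + 1
        return self.versions[doc_id]

    def create(self, doc_id: str, source: Dict[str, Any]) -> int:
        # Like index, but refuses to overwrite an existing document.
        if doc_id in self.docs:
            raise ConflictError(doc_id)
        return self.index(doc_id, source)

    def update(self, doc_id: str, partial: Dict[str, Any]) -> int:
        # Merges the given fields into an existing document (a shallow merge here;
        # Elasticsearch also merges nested objects). An update that changes
        # nothing is treated as a no-op and keeps the version.
        if doc_id not in self.docs:
            raise NotFoundError(doc_id)
        merged = {**self.docs[doc_id], **partial}
        if merged != self.docs[doc_id]:
            self.docs[doc_id] = merged
            self.versions[doc_id] += 1
        return self.versions[doc_id]

    def delete(self, doc_id: str) -> None:
        # Simplification: the version history of a deleted document is discarded.
        if doc_id not in self.docs:
            raise NotFoundError(doc_id)
        del self.docs[doc_id]
        del self.versions[doc_id]


users = ToyIndex()
print(users.index("1", {"user": "Mike", "message": "hello"}))      # 1
print(users.index("1", {"user": "Mike"}))                           # 2 ("message" is gone)
print(users.update("1", {"message": "trying out Elasticsearch"}))  # 3
try:
    users.create("1", {"user": "Jack"})
except ConflictError:
    print("create failed: ID 1 already exists")
```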

############Create Document############
#Create a document; _id is generated automatically
POST users/_doc
{
    "user" : "Mike",
    "post_date" : "2019-04-15T14:12:12",
    "message" : "trying out Kibana"
}

#Create a document with a given ID; fails if the ID already exists
PUT users/_doc/1?op_type=create
{
    "user" : "Jack",
    "post_date" : "2019-05-15T14:12:12",
    "message" : "trying out Elasticsearch"
}

#Create a document with a given ID; fails if the ID already exists
PUT users/_create/1
{
    "user" : "Jack",
    "post_date" : "2019-05-15T14:12:12",
    "message" : "trying out Elasticsearch"
}

### Get Document by ID
#Get the document by ID
GET users/_doc/1

###  Index & Update
#Index with a given ID: replaces the document (the old one is deleted, then the new one is written)
GET users/_doc/1

PUT users/_doc/1
{
    "user" : "Mike"
}

#GET users/_doc/1
#Add fields to the existing document
POST users/_update/1
{
    "doc":{
        "post_date" : "2019-05-15T14:12:12",
        "message" : "trying out Elasticsearch"
    }
}

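### Delete by Id
#Delete the document
DELETE users/_doc/1

### Bulk operations
#Run the request twice and compare the results of each run

#First run: every action gets its own result; a failing action does not affect the others, and there is no transaction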
POST _bulk
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_id" : "2" } }
{ "create" : { "_index" : "test2", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }


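#Second run: the same actions again; the create of test2/3 now fails with a version conflict (409) because the document already exists, while the other actions still run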
POST _bulk
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_id" : "2" } }
{ "create" : { "_index" : "test2", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }
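A `_bulk` body is NDJSON. Each operation has an action line, followed by a source line for `index` and `create` or a `{"doc": ...}` line for `update`; `delete` has no second line. The whole body must end with a newline. The Python sketch below rebuilds the request above from Python data, which is convenient when a program generates the actions rather than someone typing them into Kibana. `build_bulk_body` is a hypothetical helper name.

```python
import json
from typing import Any, Dict, List, Optional, Tuple

# (action, metadata, payload); payload is None for delete.
Action = Tuple[str, Dict[str, Any], Optional[Dict[str, Any]]]


def build_bulk_body(actions: List[Action]) -> str:
    """Serialize actions into a newline-delimited _bulk request body."""
    lines: List[str] = []
    for action, meta, payload in actions:
        lines.append(json.dumps({action: meta}))
        if action != "delete":
            if payload is None:
                raise ValueError(f"{action} needs a payload line")
            lines.append(json.dumps(payload))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body([
    ("index", {"_index": "test", "_id": "1"}, {"field1": "value1"}),
    ("delete", {"_index": "test", "_id": "2"}, None),
    ("create", {"_index": "test2", "_id": "3"}, {"field1": "value3"}),
    ("update", {"_index": "test", "_id": "1"}, {"doc": {"field2": "value2"}}),
])
print(body)
```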

### mget
GET /_mget
{
    "docs" : [
        {
            "_index" : "test",
            "_id" : "1"
        },
        {
            "_index" : "test",
            "_id" : "2"
        }
    ]
}

#Specify the index in the URI
GET /test/_mget
{
    "docs" : [
        {
            "_id" : "1"
        },
        {
            "_id" : "2"
        }
    ]
}

GET /_mget
{
    "docs" : [
        {
            "_index" : "test",
            "_id" : "1",
            "_source" : false
        },
        {
            "_index" : "test",
            "_id" : "2",
            "_source" : ["field3", "field4"]
        },
        {
            "_index" : "test",
            "_id" : "3",
            "_source" : {
                "include": ["user"],
                "exclude": ["user.location"]
            }
        }
    ]
}

### msearch
POST kibana_sample_data_ecommerce/_msearch
{}
{"query" : {"match_all" : {}},"size":1}
{"index" : "kibana_sample_data_flights"}
{"query" : {"match_all" : {}},"size":2}
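`_msearch` uses the same NDJSON layout, but in pairs: a header line followed by a search body line. An empty header `{}` runs the search against the index in the URL (`kibana_sample_data_ecommerce` above), while `{"index": ...}` overrides it. Here is a minimal Python sketch; `build_msearch_body` is a hypothetical helper name.

```python
import json
from typing import Any, Dict, List, Optional, Tuple


def build_msearch_body(searches: List[Tuple[Optional[str], Dict[str, Any]]]) -> str:
    """Serialize (index or None, search body) pairs into an _msearch request body."""
    lines: List[str] = []
    for index, query in searches:
        # None -> empty header, i.e. fall back to the index given in the URL.
        header = {} if index is None else {"index": index}
        lines.append(json.dumps(header))
        lines.append(json.dumps(query))
    return "\n".join(lines) + "\n"


body = build_msearch_body([
    (None, {"query": {"match_all": {}}, "size": 1}),
    ("kibana_sample_data_flights", {"query": {"match_all": {}}, "size": 2}),
])
print(body)
```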

### Clean up the test data
#Delete the test indices
DELETE users
DELETE test
DELETE test2

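Common error responses

Problem	Cause
Cannot connect	Network failure or the cluster is down
Connection cannot be closed	Network failure or a node error
429	The cluster is too busy
4xx	Malformed request (for example, a badly formatted request body)
500	Internal cluster error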
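Of these, 429 is usually the one a client can simply retry after waiting, whereas other 4xx errors mean the request itself has to be fixed. The Python sketch below shows one possible policy: it retries only on 429, with exponential backoff. The policy, the delays, and the function names are assumptions, not anything Elasticsearch prescribes. `send` is a placeholder for whatever function actually issues the request and returns its HTTP status. Connection failures raise exceptions instead of returning a status, and are not handled here.

```python
import time
from typing import Callable, List


def classify_status(status: int) -> str:
    """Map an HTTP status to a rough category following the table above."""
    if status == 429:
        return "busy"
    if 400 <= status < 500:
        return "bad_request"
    if status >= 500:
        return "server_error"
    return "ok"


def backoff_delays(retries: int, base: float = 0.5, cap: float = 30.0) -> List[float]:
    """Exponential backoff: base, 2*base, 4*base, ... capped at `cap` seconds."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


def send_with_retry(send: Callable[[], int], retries: int = 3,
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Call `send`, retrying only while the cluster answers 429."""
    for delay in backoff_delays(retries):
        status = send()
        if classify_status(status) != "busy":
            return status
        sleep(delay)
    return send()  # last attempt; its status is returned as-is


# Simulated responses: busy twice, then success.
responses = iter([429, 429, 200])
print(send_with_retry(lambda: next(responses), sleep=lambda s: None))  # 200
```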