005 Document API

License: CC 4.0 BY-SA

Original link: http://www.cnblogs.com/juncaoit/p/11252225.html

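This article walks through eight aspects of the Elasticsearch document index API: the index API itself, automatic index creation, versioning, operation type, automatic ID generation, optimistic concurrency control, routing, and timeouts. Each section shows a request, the response it produced, and a short note on what the feature does.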

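1. Index API

　　The request below adds a JSON document to the school index under the _doc type.

　　When the request names an explicit ID, the document is replaced if one with that ID already exists and created otherwise.

POST school/_doc/1
{
  "name":"tom1",
  "sex":"M"
}
GET school/_doc/1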

　　Result:

{
  "_index" : "school",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "tom1",
    "sex" : "M"
  }
}

2. Automatic index creation

　　Automatic index creation is controlled by the action.auto_create_index setting. This setting defaults to true, meaning that indices are always automatically created. Automatic index creation can be permitted only for indices matching certain patterns by changing the value of this setting to a comma-separated list of these patterns. It can also be explicitly permitted and forbidden by prefixing patterns in the list with a + or -. Finally it can be completely disabled by changing this setting to false.

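　　In other words:

　　　　The default is true, so indices are created automatically.

　　　　Wildcard patterns can be used to decide which indices may be created automatically and which may not.

　　　　Setting it to false disables automatic index creation entirely.

　　Test:

PUT _cluster/settings
{
    "persistent": {
        "action.auto_create_index": "twitter,index10,-index1*,+ind*"
    }
}

POST ind1/dov/1
{
  "score":"10"
}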

　　Explanation:

　　　　Permit only the auto-creation of indices called twitter, index10, no other index matching index1*, and any other index matching ind*. The patterns are matched in the order in which they are given.

　　Result:

#! Deprecation: [types removal] Specifying types in document index requests is deprecated, use the typeless endpoints instead (/{index}/_doc/{id}, /{index}/_doc, or /{index}/_create/{id}).
{
  "_index" : "ind1",
  "_type" : "dov",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

　　Next, run:

POST /index11/doc/1
{
  "score":"10"
}

　　Result:

#! Deprecation: [types removal] Specifying types in document index requests is deprecated, use the typeless endpoints instead (/{index}/_doc/{id}, /{index}/_doc, or /{index}/_create/{id}).
{
  "error" : {
    "root_cause" : [
      {
        "type" : "index_not_found_exception",
        "reason" : "no such index [index11] and [action.auto_create_index] contains [-index1*] which forbids automatic creation of the index",
        "index_uuid" : "_na_",
        "index" : "index11"
      }
    ],
    "type" : "index_not_found_exception",
    "reason" : "no such index [index11] and [action.auto_create_index] contains [-index1*] which forbids automatic creation of the index",
    "index_uuid" : "_na_",
    "index" : "index11"
  },
  "status" : 404
}
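
　　The two outcomes above (ind1 created, index11 rejected) follow from the "first matching pattern wins" rule. Below is a minimal Python sketch of that rule as the quoted documentation describes it. It is not Elasticsearch's own code: the function name auto_create_allowed is made up for illustration, and fnmatchcase is only a stand-in for Elasticsearch's wildcard matching (it also understands ? and [...], which the real setting may treat differently).

```python
from fnmatch import fnmatchcase


def auto_create_allowed(setting: str, index_name: str) -> bool:
    """Approximate action.auto_create_index: the first matching pattern decides."""
    value = setting.strip()
    if value.lower() == "true":
        return True
    if value.lower() == "false":
        return False
    for raw in value.split(","):
        pattern = raw.strip()
        forbid = pattern.startswith("-")
        # A leading + or - is a marker, not part of the pattern itself.
        if pattern.startswith(("+", "-")):
            pattern = pattern[1:]
        if fnmatchcase(index_name, pattern):
            return not forbid
    # No pattern matched: automatic creation is not permitted.
    return False


setting = "twitter,index10,-index1*,+ind*"
for name in ("ind1", "index11", "index10", "twitter", "school"):
    print(name, auto_create_allowed(setting, name))
```

　　Note that index10 is allowed even though it also matches -index1*, because the exact name index10 appears earlier in the list.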

　　Restore the default afterwards:

PUT _cluster/settings
{
    "persistent": {
        "action.auto_create_index": "true"
    }
}

{
  "acknowledged" : true,
  "persistent" : {
    "action" : {
      "auto_create_index" : "true"
    }
  },
  "transient" : { }
}

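3. Versioning

　　Elasticsearch versions every document, and the version query parameter lets a request name the specific version it expects.

　　Internal versioning is the default: the version starts at 1 and increments on every update, deletes included. The version number can also be supplied by an external system; to enable this, set version_type to external.

　　Versioning is fully real time and is not affected by the near-real-time behavior of search operations.

　　The following document has already been modified once:

PUT index1/_doc/1
{
  "name":"tom1",
  "sex":"M"
}

　　Check it:

GET index1/_doc/1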

{
  "_index" : "index1",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 2,
  "_seq_no" : 1,
  "_primary_term" : 3,
  "found" : true,
  "_source" : {
    "name" : "tom1",
    "sex" : "M"
  }
}

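　　The current version is 2, so the request can be made conditional on that version:

GET index1/_doc/1?version=2

　　The result is the same as above. Note that version acts as a check against the current version, not as a lookup of an older revision: Elasticsearch keeps only the latest version of a document, and a value that does not match the current version should produce a version conflict error instead of the document.

　　The version_type option will be covered in a later update, once I understand it better.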
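
　　As a starting point for version_type, here is a small Python sketch of the acceptance rule for external versioning as I understand it from the Elasticsearch documentation: with external, a write is accepted only if its version is strictly greater than the stored one, and with external_gte an equal version is accepted as well. The function accepts_write is a hypothetical helper written for this note, not an Elasticsearch API.

```python
def accepts_write(stored_version, incoming_version, version_type="external"):
    """Return True if a write carrying incoming_version would be accepted.

    stored_version is None when the document does not exist yet.
    """
    if stored_version is None:
        return True  # nothing stored, so nothing to conflict with
    if version_type == "external":
        return incoming_version > stored_version
    if version_type == "external_gte":
        return incoming_version >= stored_version
    raise ValueError(f"unsupported version_type in this sketch: {version_type}")


print(accepts_write(2, 3))                  # newer external version
print(accepts_write(2, 2))                  # same version, external
print(accepts_write(2, 2, "external_gte"))  # same version, external_gte
```

　　On the wire this would correspond to a request such as PUT index1/_doc/1?version=3&version_type=external; a rejected write comes back as a 409 version conflict.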

4. Operation type

　　The index operation also accepts an op_type that can be used to force a create operation, allowing for "put-if-absent" behavior. When create is used, the index operation will fail if a document by that id already exists in the index.

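　　In other words: op_type=create forces a create operation. If the document already exists, the operation fails, which prevents an existing document from being overwritten.

　　My understanding: with op_type=create, a second write to the same ID is rejected with an error and the version does not increase. Once the parameter is removed, the same request succeeds as an update and the version increases again.

GET /_cat/indices
DELETE /twitter/
PUT twitter/_doc/1?op_type=create
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}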

　　First create:

{
  "_index" : "twitter",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

　　Create again:

{
  "error": {
    "root_cause": [
      {
        "type": "version_conflict_engine_exception",
        "reason": "[1]: version conflict, document already exists (current version [1])",
        "index_uuid": "Cp1z9uTRRhG0wIV2XiJPpQ",
        "shard": "0",
        "index": "twitter"
      }
    ],
    "type": "version_conflict_engine_exception",
    "reason": "[1]: version conflict, document already exists (current version [1])",
    "index_uuid": "Cp1z9uTRRhG0wIV2XiJPpQ",
    "shard": "0",
    "index": "twitter"
  },
  "status": 409
}
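
　　The put-if-absent behavior shown above can be modeled in Python with a dictionary. This is only an illustrative sketch: TinyIndex and VersionConflict are hypothetical names, and the class imitates the observable behavior (version numbers, the 409 conflict), not how Elasticsearch is implemented.

```python
class VersionConflict(Exception):
    """Stands in for version_conflict_engine_exception (HTTP 409)."""


class TinyIndex:
    """Hypothetical in-memory stand-in for one index, keyed by document id."""

    def __init__(self):
        self._docs = {}  # doc id -> (version, source)

    def index(self, doc_id, source, op_type="index"):
        current = self._docs.get(doc_id)
        if op_type == "create" and current is not None:
            # put-if-absent: refuse to overwrite and leave the version untouched
            raise VersionConflict(
                f"[{doc_id}]: version conflict, document already exists "
                f"(current version [{current[0]}])"
            )
        version = 1 if current is None else current[0] + 1
        self._docs[doc_id] = (version, source)
        result = "created" if current is None else "updated"
        return {"_id": doc_id, "_version": version, "result": result}


twitter = TinyIndex()
doc = {"user": "kimchy", "message": "trying out Elasticsearch"}
print(twitter.index("1", doc, op_type="create"))  # first create succeeds
try:
    twitter.index("1", doc, op_type="create")     # second create is rejected
except VersionConflict as exc:
    print("409:", exc)
print(twitter.index("1", doc))                    # plain index updates the document
```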

5. Automatic ID generation

　　The index operation can be executed without specifying the id. In such a case, an id will be generated automatically. In addition, the op_type will automatically be set to create. Here is an example (note the POST used instead of PUT)

# Automatic ID generation
POST twitter/_doc/
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}

　　Result:

{
  "_index" : "twitter",
  "_type" : "_doc",
  "_id" : "cI7jS2wBE-J5sxKYhB25",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 50,
  "_primary_term" : 1
}

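　　Note: submitting this POST repeatedly produces a new id each time, while the version stays at 1, because every request creates a new document.

　　With PUT and a fixed id, the id stays the same and the version increases on every request.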
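
　　The difference can be illustrated with a short Python sketch. The helpers make_auto_id and index_without_id are hypothetical, and the ID here is simply 15 random bytes encoded as URL-safe base64. That happens to give a 20-character string like the one in the response above, but it is an assumption for illustration, not Elasticsearch's actual ID generator.

```python
import base64
import os


def make_auto_id() -> str:
    """Random 20-character URL-safe id (illustrative stand-in only)."""
    return base64.urlsafe_b64encode(os.urandom(15)).decode("ascii")


def index_without_id(docs: dict, source: dict) -> str:
    """Model a POST without an id: always a brand-new document at version 1."""
    doc_id = make_auto_id()
    docs[doc_id] = {"_version": 1, "_source": source}
    return doc_id


docs = {}
first = index_without_id(docs, {"user": "kimchy"})
second = index_without_id(docs, {"user": "kimchy"})
print(first, second, len(docs))
```

　　Each call stores a separate document, which is why repeated POSTs never raise the version of an existing one.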

6. Optimistic concurrency control

　　Index operations can be made conditional and only be performed if the last modification to the document was assigned the sequence number and primary term specified by the if_seq_no and if_primary_term parameters. If a mismatch is detected, the operation will result in a VersionConflictException and a status code of 409. See Optimistic concurrency control for more details.

　　To be studied later.
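
　　Until then, the quoted paragraph can be read as a compare-and-set on the pair (_seq_no, _primary_term). The Python sketch below models that check on a single shard. TinyShard and VersionConflict are hypothetical names, and the sequence-number bookkeeping is simplified (one counter per shard, a fixed primary term).

```python
class VersionConflict(Exception):
    """Stands in for the 409 version conflict response."""


class TinyShard:
    """Hypothetical single-shard model with per-write sequence numbers."""

    def __init__(self, primary_term=1):
        self.primary_term = primary_term
        self._next_seq_no = 0
        self._docs = {}  # doc id -> {"_seq_no", "_primary_term", "_source"}

    def index(self, doc_id, source, if_seq_no=None, if_primary_term=None):
        current = self._docs.get(doc_id)
        if if_seq_no is not None or if_primary_term is not None:
            last = None
            if current is not None:
                last = (current["_seq_no"], current["_primary_term"])
            # The write goes through only if the last modification matches.
            if last != (if_seq_no, if_primary_term):
                raise VersionConflict(f"[{doc_id}]: stale seq_no/primary_term")
        seq_no = self._next_seq_no
        self._next_seq_no += 1
        self._docs[doc_id] = {
            "_seq_no": seq_no,
            "_primary_term": self.primary_term,
            "_source": source,
        }
        return seq_no, self.primary_term


shard = TinyShard()
seq_no, term = shard.index("1", {"counter": 1})
# Two clients read the document, then both try to write based on what they read.
shard.index("1", {"counter": 2}, if_seq_no=seq_no, if_primary_term=term)
try:
    shard.index("1", {"counter": 2}, if_seq_no=seq_no, if_primary_term=term)
except VersionConflict as exc:
    print("409:", exc)
```

　　The second client loses because the document changed after it was read; it would need to re-read the document and retry with the new values.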

7. Routing

　　By default, shard placement (or routing) is controlled by using a hash of the document’s id value. For more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the routing parameter. For example:

　　In short, routing controls which shard a document is sent to.

POST twitter/_doc?routing=kimchy
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}

　　In the example above, the "_doc" document is routed to a shard based on the routing parameter provided: "kimchy".

　　When setting up explicit mapping, the _routing field can be optionally used to direct the index operation to extract the routing value from the document itself. This does come at the (very minimal) cost of an additional document parsing pass. If the _routing mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.

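　　In other words, if _routing is marked as required in the mapping, every index request must supply a routing value, otherwise the request fails.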
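
　　The idea of "hash the routing value, then pick a shard" can be sketched in Python as follows. This is a simplification under stated assumptions: pick_shard is a hypothetical helper, zlib.crc32 stands in for the hash function Elasticsearch really uses, and the real formula has additional factors, so the shard numbers printed here will not match a real cluster.

```python
import zlib


def pick_shard(doc_id: str, num_primary_shards: int, routing: str = None) -> int:
    """Choose a shard from the routing value, falling back to the document id."""
    # Without an explicit routing value, the document id is what gets hashed.
    key = routing if routing is not None else doc_id
    digest = zlib.crc32(key.encode("utf-8"))  # stand-in hash, not the real one
    return digest % num_primary_shards


# Different documents with the same routing value land on the same shard.
print(pick_shard("doc-a", 5, routing="kimchy"))
print(pick_shard("doc-b", 5, routing="kimchy"))
# Without routing, placement depends on the id alone.
print(pick_shard("doc-a", 5))
```

　　Because placement depends on the routing value, a document indexed with routing=kimchy generally has to be fetched with the same routing value so that the request reaches the right shard.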

8. Timeout

　　The primary shard assigned to perform the index operation might not be available when the index operation is executed. Some reasons for this might be that the primary shard is currently recovering from a gateway or undergoing relocation. By default, the index operation will wait on the primary shard to become available for up to 1 minute before failing and responding with an error. The timeout parameter can be used to explicitly specify how long it waits. Here is an example of setting it to 5 minutes:

　　In other words: an index operation waits up to one minute by default for the primary shard to become available and then fails with an error; the timeout parameter sets this wait explicitly.

PUT twitter/_doc/1?timeout=5m
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
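
　　Parameters such as timeout, routing and op_type are all plain query-string parameters on the same endpoint. The Python sketch below only assembles the request path and sends nothing to a cluster; index_request_path is a hypothetical helper written for this note.

```python
from urllib.parse import quote, urlencode


def index_request_path(index, doc_id=None, **params):
    """Build the path and query string for an index request (no network I/O)."""
    path = f"/{quote(index)}/_doc"
    if doc_id is not None:
        path += f"/{quote(str(doc_id), safe='')}"
    # Drop parameters that were not set so they do not appear in the URL.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{path}?{query}" if query else path


print(index_request_path("twitter", "1", timeout="5m"))
print(index_request_path("twitter", routing="kimchy"))
print(index_request_path("twitter", "1", op_type="create", timeout="5m"))
```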

Reposted from: https://www.cnblogs.com/juncaoit/p/11252225.html