Elasticsearch: The Search API in Detail

Author: 码炫课堂-码哥

Published 2024-07-30 07:35:42

License: CC 4.0 BY-SA

Article link: https://blog.youkuaiyun.com/smart_an/article/details/140786323

Search API

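The search API (_search) executes a search query and returns the matching results. The query can be supplied as a simple query string in a request parameter (URI form) or in the request body (body form). Most search APIs accept multiple indices; the Explain API is the exception, since it explains how the score of one specific document is computed.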

    // Syntax
    GET /<index>/_search
    GET /_search
    POST /<index>/_search
    POST /_search

Search requests can span indices using the multi-index syntax. For example, to search the twitter index for documents whose user field is kimchy:

    GET /twitter/_search?q=user:kimchy

You can also search several indices at once for all documents carrying a given tag, for example when each user has their own index:

    GET /kimchy,elasticsearch/_search?q=tag:now

Or use _all to search every available index:

    GET /_all/_search?q=tag:now

To keep responses fast, the search API returns partial results if one or more shards fail.
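Because a response may be partial, a client will usually want to inspect the _shards section before trusting the hit counts. The following is a minimal Python sketch of that check. The helper name is_partial is hypothetical, and the sample dict is hand-written to mirror the _shards section that search responses carry; it is not real cluster output.

```python
def is_partial(response):
    """Return True when at least one shard failed, so the results may be incomplete."""
    return response["_shards"]["failed"] > 0


# Hand-written sample shaped like a search response (assumption, not real output).
sample = {
    "timed_out": False,
    "_shards": {"total": 5, "successful": 4, "skipped": 0, "failed": 1},
    "hits": {"total": {"value": 3, "relation": "eq"}, "hits": []},
}

if is_partial(sample):
    print("warning: some shards failed, the hits may be incomplete")
```

Whether to retry, warn, or fail hard when this returns True is an application decision that the sketch deliberately leaves open.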

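1. URI Mode

A search request can be executed purely through the URI by supplying request parameters. Not every search option is available in this mode, but it is very convenient for quick tests.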

    GET /twitter/_search?q=user:kimchy
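To make the URI form concrete, here is a small Python sketch that assembles such a URL from a list of indices and a set of request parameters. It only builds the string and sends nothing. The function name build_search_url and the base address http://localhost:9200 are assumptions chosen for illustration.

```python
from urllib.parse import quote, urlencode


def build_search_url(base, indices=None, **params):
    """Build a URI-mode search URL; with no indices the request targets /_search."""
    path = "/_search"
    if indices:
        # Multi-index syntax: comma-separated index names in the path.
        path = "/" + ",".join(quote(name, safe="*") for name in indices) + "/_search"
    # Keep ':' readable so that q=user:kimchy looks like the console examples.
    query = urlencode(params, safe=":")
    return base + path + ("?" + query if query else "")


print(build_search_url("http://localhost:9200", ["twitter"], q="user:kimchy"))
print(build_search_url("http://localhost:9200", ["kimchy", "elasticsearch"], q="tag:now"))
```

Leaving the colon unescaped is purely cosmetic; the percent-encoded form %3A is equally valid in a query string.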

The parameters supported by URI search mode are listed in the table below:

2. Body Mode

A search request can carry a Query DSL query in the request body:

    GET /twitter/_search
    {
        "query" : {
            "term" : { "user" : "kimchy" }
        }
    }

The parameters supported by body search mode are listed in the table below:

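Note: search_type, request_cache, and allow_partial_search_results must be passed as query string parameters (in the URL, not in the body). The rest of the search request should be passed in the body. The body content can also be passed as a REST parameter named source. Both HTTP GET and HTTP POST can be used to execute a search with a body.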
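The rule in this note can be sketched as a small routing step on the client side. The Python below is illustrative only: the helper split_search_options and the constant URL_ONLY are hypothetical names, and the set contains just the three parameters named in the note.

```python
# The three options the note says must travel in the query string.
URL_ONLY = {"search_type", "request_cache", "allow_partial_search_results"}


def split_search_options(options):
    """Split one options dict into (query-string params, request body)."""
    url_params = {k: v for k, v in options.items() if k in URL_ONLY}
    body = {k: v for k, v in options.items() if k not in URL_ONLY}
    return url_params, body


url_params, body = split_search_options({
    "request_cache": "true",
    "size": 0,
    "query": {"term": {"user": "kimchy"}},
})
print(url_params)  # goes into the URL
print(body)        # goes into the request body
```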

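terminate_after is always applied after post_filter, and it stops both the query and the aggregations once enough hits have been collected on the shard. The document counts in aggregations may not match hits.total in the response, because aggregations are applied before post_filter.

If you only need to know whether any document matches a particular query, set size to 0 to indicate that you are not interested in the hits themselves. You can additionally set terminate_after to 1 so that query execution stops as soon as the first matching document is found (per shard). For example: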

    GET /_search?q=message:number&size=0&terminate_after=1

    {
      "took": 3,
      "timed_out": false,
      "terminated_early": true,
      "_shards": {
        "total": 1,
        "successful": 1,
        "skipped" : 0,
        "failed": 0
      },
      "hits": {
        "total" : {
            "value": 1,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
      }
    }

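The response contains no documents because size was set to 0. hits.total is either 0, meaning no document matched, or greater than 0, meaning at least that many documents matched the query by the time it was terminated early. In addition, when the query terminates early, the terminated_early flag in the response is set to true.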
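Reading such a response in client code comes down to two fields. The following Python sketch uses a trimmed copy of the response above; the helper name exists_check is hypothetical.

```python
def exists_check(response):
    """Return (matched, terminated_early) for a size=0, terminate_after=1 search."""
    matched = response["hits"]["total"]["value"] > 0
    # The flag is treated as False when it is missing from the response.
    return matched, response.get("terminated_early", False)


# Trimmed copy of the response shown above.
response = {
    "took": 3,
    "timed_out": False,
    "terminated_early": True,
    "hits": {"total": {"value": 1, "relation": "eq"}, "max_score": None, "hits": []},
}

matched, early = exists_check(response)
print(matched, early)
```

Note that the count is a lower bound when the query ended early, so only the comparison with 0 is meaningful here.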

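The took value in the response is the number of milliseconds spent processing the request, measured from shortly after the node receives the query until all search-related work is done, just before the JSON above is returned to the client. It therefore includes time spent waiting in thread pools, executing the distributed search across the cluster, and gathering all the results.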

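2.1. The explain Parameter

The explain parameter is a diagnostic aid built into Elasticsearch that is often overlooked. It shows how the relevance score of each matching document was computed.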

    GET /test/_search
    {
      "explain": true,
      "query": {
        "match": {
          "text": "hello"
        }
      }
    }

    {
      "took" : 0,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 1,
          "relation" : "eq"
        },
        "max_score" : 0.2876821,
        "hits" : [
          {
            "_shard" : "[test][0]",
            "_node" : "Cc6ARDA6TY-poOdtxvsA6g",
            "_index" : "test",
            "_type" : "_doc",
            "_id" : "1",
            "_score" : 0.2876821,
            "_source" : {
              "text" : "hello world",
              "flag" : "foo"
            },
            "_explanation" : {
              "value" : 0.2876821,
              "description" : "weight(text:hello in 0) [PerFieldSimilarity], result of:",
              "details" : [
                {
                  "value" : 0.2876821,
                  "description" : "score(freq=1.0), computed as boost * idf * tf from:",
                  "details" : [
                    {
                      "value" : 2.2,
                      "description" : "boost",
                      "details" : [ ]
                    },
                    {
                      "value" : 0.2876821,
                      "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                      "details" : [
                        {
                          "value" : 1,
                          "description" : "n, number of documents containing term",
                          "details" : [ ]
                        },
                        {
                          "value" : 1,
                          "description" : "N, total number of documents with field",
                          "details" : [ ]
                        }
                      ]
                    },
                    {
                      "value" : 0.45454544,
                      "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                      "details" : [
                        {
                          "value" : 1.0,
                          "description" : "freq, occurrences of term within document",
                          "details" : [ ]
                        },
                        {
                          "value" : 1.2,
                          "description" : "k1, term saturation parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 0.75,
                          "description" : "b, length normalization parameter",
                          "details" : [ ]
                        },
                        {
                          "value" : 2.0,
                          "description" : "dl, length of field",
                          "details" : [ ]
                        },
                        {
                          "value" : 2.0,
                          "description" : "avgdl, average length of field",
                          "details" : [ ]
                        }
                      ]
                    }
                  ]
                }
              ]
            }
          }
        ]
      }
    }

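The output looks complicated, but its most important content is the total score computed for the document and the steps that produced it. A total score of 0 usually means the document does not match the given query, although queries that run only in filter context can also yield 0 for matching documents. The other important content is the description attached to each scoring component; depending on the query type, these components contribute to the final score in different ways.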
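The arithmetic in the _explanation tree above can be reproduced by hand. The Python sketch below plugs the values from the response into the two formulas quoted in its description fields (idf and tf) and multiplies them by the boost. The function names bm25_idf and bm25_tf are chosen here for illustration.

```python
import math


def bm25_idf(n, N):
    """idf, computed as log(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (N - n + 0.5) / (n + 0.5))


def bm25_tf(freq, k1, b, dl, avgdl):
    """tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl))."""
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))


# Values copied from the _explanation section of the response above.
boost = 2.2
idf = bm25_idf(n=1, N=1)
tf = bm25_tf(freq=1.0, k1=1.2, b=0.75, dl=2.0, avgdl=2.0)

score = boost * idf * tf
print(idf, tf, score)
```

The product agrees with the reported _score of 0.2876821 to within floating-point rounding, since Elasticsearch reports single-precision values.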

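2.2. Collapsing Results (collapse)

The collapse option collapses search results based on the values of a field. Collapsing keeps only the top-sorted document for each collapse key. In other words, results are grouped by a field and only one result is returned per group. For example, the query below retrieves the best tweet for each user and sorts the results by the number of likes (the likes field).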

    GET /twitter/_search
    {
        "query": {
            "match": {
                "message": "elasticsearch"
            }
        },
        "collapse" : {
            "field" : "user" 
        },
        "sort": ["likes"], 
        "from": 10 
    }

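Note: the total in the response hits is the number of matching documents before collapsing. The total number of distinct groups (the count after collapsing) is unknown.

The field used for collapsing must be a single-valued keyword or numeric field with doc_values enabled.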
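The collapsing semantics described above (sort, then keep the first document seen for each key) can be modeled locally in a few lines of Python. This is only a conceptual model over an in-memory list, not a description of how Elasticsearch implements collapsing across shards; the function name collapse and the sample documents are made up for illustration.

```python
def collapse(hits, field, sort_field, descending=False):
    """Keep only the top-sorted hit for each distinct value of `field`."""
    ordered = sorted(hits, key=lambda h: h[sort_field], reverse=descending)
    seen = set()
    result = []
    for hit in ordered:
        key = hit[field]
        if key not in seen:
            # First hit for this key in sort order is the group's top hit.
            seen.add(key)
            result.append(hit)
    return result


# Made-up documents, not taken from any real index.
docs = [
    {"user": "kimchy", "likes": 5},
    {"user": "kimchy", "likes": 1},
    {"user": "alice", "likes": 3},
]

top = collapse(docs, "user", "likes", descending=True)
print(top)
```

The model also shows why the pre-collapse total and the number of groups differ: three documents match here, but only two groups come back.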

Example: collapse the result set on the age field and sort by balance in descending order.

    GET /bank/_search
    {
      "query": {
        "match": {
          "address": "street"
        }
      },
      "collapse": {
        "field": "age"
      },
      "sort": [
        {
          "balance": {
            "order": "desc"
          }
        }
      ],
      "from": 0,
      "size": 2
    }

    {
      "took" : 0,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 385,
          "relation" : "eq"
        },
        "max_score" : null,
        "hits" : [
          {
            "_index" : "bank",
            "_type" : "_doc",
            "_id" : "854",
            "_score" : null,
            "_source" : {
              "account_number" : 854,
              "balance" : 49795,
              "firstname" : "Jimenez",
              "lastname" : "Barry",
              "age" : 25,
              "gender" : "F",
              "address" : "603 Cooper Street",
              "employer" : "Verton",
              "email" : "jimenezbarry@verton.com",
              "city" : "Moscow",
              "state" : "AL"
            },
            "fields" : {
              "age" : [
                25
              ]
            },
            "sort" : [
              49795
            ]
          },
          {
            "_index" : "bank",
            "_type" : "_doc",
            "_id" : "926",
            "_score" : null,
            "_source" : {
              "account_number" : 926,
              "balance" : 49433,
              "firstname" : "Welch",
              "lastname" : "Mcgowan",
              "age" : 21,
              "gender" : "M",
              "address" : "833 Quincy Street",
              "employer" : "Atomica",
              "email" : "welchmcgowan@atomica.com",
              "city" : "Hampstead",
              "state" : "VT"
            },
            "fields" : {
              "age" : [
                21
              ]
            },
            "sort" : [
              49433
            ]
          }
        ]
      }
    }

2.2.1. Expanding Collapsed Results

The inner_hits option expands each collapsed top hit:

    GET /bank/_search
    {
      "query": {
        "match": {
          "address": "street"
        }
      },
      "collapse": {
        "field": "age",
        "inner_hits": {
          "name":"inner_list",
          "size":2,
          "sort":[{"balance":{"order":"desc"}}]
        },
        "max_concurrent_group_searches": 4
      },
      "sort": [
        {
          "balance": {
            "order": "desc"
          }
        }
      ],
      "from": 0,
      "size": 1
    }

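name: the name used for this group of expanded results within each hit in the response.
size: the number of results to retrieve per collapse key, that is, how many results each group returns.
sort: how to sort the documents within each group.
max_concurrent_group_searches: the number of concurrent requests allowed for retrieving the inner results of each group.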

    {
      "took" : 1,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 385,
          "relation" : "eq"
        },
        "max_score" : null,
        "hits" : [
          {
            "_index" : "bank",
            "_type" : "_doc",
            "_id" : "854",
            "_score" : null,
            "_source" : {
              "account_number" : 854,
              "balance" : 49795,
              "firstname" : "Jimenez",
              "lastname" : "Barry",
              "age" : 25,
              "gender" : "F",
              "address" : "603 Cooper Street",
              "employer" : "Verton",
              "email" : "jimenezbarry@verton.com",
              "city" : "Moscow",
              "state" : "AL"
            },
            "fields" : {
              "age" : [
                25
              ]
            },
            "sort" : [
              49795
            ],
            "inner_hits" : {
              "inner_list" : {
                "hits" : {
                  "total" : {
                    "value" : 16,
                    "relation" : "eq"
                  },
                  "max_score" : null,
                  "hits" : [
                    {
                      "_index" : "bank",
                      "_type" : "_doc",
                      "_id" : "854",
                      "_score" : null,
                      "_source" : {
                        "account_number" : 854,
                        "balance" : 49795,
                        "firstname" : "Jimenez",
                        "lastname" : "Barry",
                        "age" : 25,
                        "gender" : "F",
                        "address" : "603 Cooper Street",
                        "employer" : "Verton",
                        "email" : "jimenezbarry@verton.com",
                        "city" : "Moscow",
                        "state" : "AL"
                      },
                      "sort" : [
                        49795
                      ]
                    },
                    {
                      "_index" : "bank",
                      "_type" : "_doc",
                      "_id" : "835",
                      "_score" : null,
                      "_source" : {
                        "account_number" : 835,
                        "balance" : 46558,
                        "firstname" : "Glover",
                        "lastname" : "Rutledge",
                        "age" : 25,
                        "gender" : "F",
                        "address" : "641 Royce Street",
                        "employer" : "Ginkogene",
                        "email" : "gloverrutledge@ginkogene.com",
                        "city" : "Dixonville",
                        "state" : "VA"
                      },
                      "sort" : [
                        46558
                      ]
                    }
                  ]
                }
              }
            }
          }
        ]
      }
    }
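Client code typically has to walk this nested structure to reach the expanded documents. The Python sketch below does so for a heavily trimmed copy of the response above, keeping only the _id fields; the helper name inner_hit_ids is hypothetical.

```python
def inner_hit_ids(response, name):
    """Map each collapsed hit's _id to the _ids of its inner hits stored under `name`."""
    result = {}
    for hit in response["hits"]["hits"]:
        inner = hit.get("inner_hits", {}).get(name, {})
        result[hit["_id"]] = [h["_id"] for h in inner.get("hits", {}).get("hits", [])]
    return result


# Trimmed copy of the response above: only the fields the helper reads are kept.
response = {
    "hits": {
        "hits": [
            {
                "_id": "854",
                "fields": {"age": [25]},
                "inner_hits": {
                    "inner_list": {
                        "hits": {
                            "total": {"value": 16, "relation": "eq"},
                            "hits": [{"_id": "854"}, {"_id": "835"}],
                        }
                    }
                },
            }
        ]
    }
}

print(inner_hit_ids(response, "inner_list"))
```

The key "inner_list" is the name given in the request's inner_hits section; a different name there changes the key to look up here.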

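You can also define several inner_hits requests for each collapsed hit, each with its own parameters. This is useful when you want multiple representations of the collapsed hits. For example: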

    GET /twitter/_search
    {
        "query": {
            "match": {
                "message": "elasticsearch"
            }
        },
        "collapse" : {
            "field" : "user",