Elasticsearch | bool Query | Query Context vs Filter Context

Latest recommended article published on 2024-06-10 18:21:50

License: CC 4.0 BY-SA

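This article walks through the bool query in Elasticsearch: how to use the must, should, must_not, and filter clauses, and how to adjust field weights with boost to get more precise search results.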

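bool Query | Compound Query

A bool query is a combination of one or more query clauses;
A bool query supports 4 kinds of clauses: 2 of them affect scoring and 2 do not;

Query Context | Affects Scoring

must
should

Filter Context | Does Not Affect Scoring

must_not
filter - must match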

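Relevance scoring is not exclusive to full-text search. It also applies to yes/no clauses: the more clauses a document matches, the higher its relevance score. When several query clauses are merged into one compound query, such as a bool query, the score computed by each clause is combined into the overall relevance score;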
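As a rough illustration of that merging, here is a toy model in Python. It is a simplification, not Lucene's actual implementation: the function name `bool_score` and the per-clause scores are made up for the example. The scores of the matching must and should clauses are summed, while filter and must_not only decide whether the document matches at all.

```python
def bool_score(must=(), should=(), filter_=(), must_not=()):
    """Toy model of bool scoring; each clause is a (matched, score) pair.

    Returns None when the document does not match, otherwise the sum of
    the scores of the matching must and should clauses.
    """
    # Filter-context clauses only decide whether the document matches.
    if not all(matched for matched, _ in filter_):
        return None
    if any(matched for matched, _ in must_not):
        return None
    # Every must clause has to match.
    if not all(matched for matched, _ in must):
        return None
    # Without must or filter clauses, at least one should clause must match.
    if should and not must and not filter_:
        if not any(matched for matched, _ in should):
            return None
    # Only query-context clauses contribute to the score.
    total = sum(score for _, score in must)
    total += sum(score for matched, score in should if matched)
    return total


one_should = bool_score(must=[(True, 1.0)], should=[(True, 0.5), (False, 0.7)])
two_should = bool_score(must=[(True, 1.0)], should=[(True, 0.5), (True, 0.7)])
print(one_should, two_should)
```

Matching the second should clause raises the total, while adding a matching filter clause leaves it unchanged.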

bool Query | A Few Examples

0 | bool Query | Basic Syntax

POST /products/_search
{
  "query": {
    "bool" : {
      "must" : {
        "term" : { "price" : "30" }
      },
      "filter": {
        "term" : { "avaliable" : "true" }
      },
      "must_not" : {
        "range" : {
          "price" : { "lte" : 10 }
        }
      },
      "should" : [
        { "term" : { "productID.keyword" : "JODL-X-1937-#pV7" } },
        { "term" : { "productID.keyword" : "XHDK-A-1293-#fJ3" } }
      ],
      "minimum_should_match" :1
    }
  }
}
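To see which documents this query selects, the sketch below re-implements its matching logic in plain Python against the sample products listed in section 5 of this article (the date field is omitted because the query does not use it). This is only an emulation under simplifying assumptions: the helper name `matches` is hypothetical, the string values "30" and "true" from the query are treated as the number 30 and the boolean true, and scoring is ignored.

```python
# Sample documents copied from the products bulk request in this article.
products = {
    1: {"price": 10, "avaliable": True, "productID": "XHDK-A-1293-#fJ3"},
    2: {"price": 20, "avaliable": True, "productID": "KDKE-B-9947-#kL5"},
    3: {"price": 30, "avaliable": True, "productID": "JODL-X-1937-#pV7"},
    4: {"price": 30, "avaliable": False, "productID": "QQPX-R-3956-#aD8"},
}

SHOULD_IDS = ("JODL-X-1937-#pV7", "XHDK-A-1293-#fJ3")


def matches(doc, minimum_should_match=1):
    """Emulate the match/no-match decision of the bool query above."""
    must_ok = doc["price"] == 30             # must: term price = 30
    filter_ok = doc["avaliable"] is True     # filter: term avaliable = true
    excluded = doc["price"] <= 10            # must_not: range price <= 10
    should_hits = sum(doc["productID"] == pid for pid in SHOULD_IDS)
    return (must_ok and filter_ok and not excluded
            and should_hits >= minimum_should_match)


hits = [doc_id for doc_id, doc in products.items() if matches(doc)]
print(hits)  # [3]
```

Document 4 also has a price of 30, but it fails the filter clause, so only document 3 is returned.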

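1 | Exact Match on a Multi-Valued (Array) Field | must Clause | Results Are Scored

A term query on an array field matches when the array contains the value, not when it equals the value exactly. Changing the data model by adding a count field works around this;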

POST /newmovies/_bulk
{ "index": { "_id": 1 }}
{ "title" : "Father of the Bridge Part II","year":1995, "genre":"Comedy","genre_count":1 }
{ "index": { "_id": 2 }}
{ "title" : "Dave","year":1993,"genre":["Comedy","Romance"],"genre_count":2 }

Query syntax for an exact match on an array field, made exact by adding a constraint on the genre_count field;

POST /newmovies/_search
{
  "query": {
    "bool": {
      "must": [
        {"term": {"genre.keyword": {"value": "Comedy"}}},
        {"term": {"genre_count": {"value": 1}}}
      ]
    }
  }
}
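The effect of the extra genre_count constraint can be sketched in Python. This is a simplified emulation of term matching on the two sample movies, with hypothetical helper names `values` and `term`; it is not how Elasticsearch evaluates the query internally.

```python
# Sample documents copied from the newmovies bulk request above (year omitted).
movies = {
    1: {"title": "Father of the Bridge Part II", "genre": "Comedy", "genre_count": 1},
    2: {"title": "Dave", "genre": ["Comedy", "Romance"], "genre_count": 2},
}


def values(field_value):
    """Treat a single value and an array of values uniformly."""
    return field_value if isinstance(field_value, list) else [field_value]


def term(doc, field, value):
    """A term query matches if ANY value of the field equals the term."""
    return value in values(doc[field])


# term on genre alone: "contains Comedy", so both movies match.
contains = [i for i, d in movies.items() if term(d, "genre", "Comedy")]

# Adding genre_count = 1 narrows this down to "is exactly Comedy".
exact = [i for i, d in movies.items()
         if term(d, "genre", "Comedy") and term(d, "genre_count", 1)]

print(contains, exact)  # [1, 2] [1]
```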

2 | Exact Match on a Multi-Valued (Array) Field | filter Clause | Results Are Not Scored

POST /newmovies/_search
{
  "query": {
    "bool": {
      "filter": [
        {"term": {"genre.keyword": {"value": "Comedy"}}},
        {"term": {"genre_count": {"value": 1}}}
      ]
    }
  }
}

3 | Prepare Data | blogs

DELETE blogs
POST /blogs/_bulk
{ "index": { "_id": 1 }}
{"title":"Apple iPad", "content":"Apple iPad,Apple iPad" }
{ "index": { "_id": 2 }}
{"title":"Apple iPad,Apple iPad", "content":"Apple iPad" }

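4 | Raising the Weight of a Field with boost

For the keywords "apple ipad", both fields of both documents match completely. The difference lies in where the occurrences sit: in the document with _id 1, the title field holds 1 of the 3 occurrences of "Apple iPad", while in the document with _id 2, the title field holds 2 of the 3;
If no boost is specified for either field, the two documents receive the same score;
If title is given the larger boost, the document with _id 2, whose title contributes more, scores higher; if content is given the larger boost, the document with _id 1 scores higher;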

POST blogs/_search
{
  "query": {
    "bool": {
      "should": [
        {"match": {
            "title": {
              "query": "apple,ipad",
              "boost": 1.1
            }
          }
        },
        {"match": {
            "content": {
              "query": "apple,ipad",
              "boost":2
            }
          }
        }
      ]
    }
  }
}
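The reasoning above can be checked with a small worked example. The sketch below is a simplified BM25 calculation in Python under several assumptions that are not stated in the original: a single shard, the standard analyzer (so "Apple iPad" is 2 tokens and "Apple iPad,Apple iPad" is 4), and Lucene's default parameters k1 = 1.2 and b = 0.75. IDF and constant factors are left out because they are identical for both terms and both fields in this data set. The names `tf_part` and `score` are made up for the example.

```python
K1, B = 1.2, 0.75  # default BM25 parameters (assumed)


def tf_part(tf, field_len, avg_len):
    """Term-frequency part of BM25 with length normalization."""
    norm = K1 * (1 - B + B * field_len / avg_len)
    return tf / (tf + norm)


# (occurrences of each query term, field length in tokens)
docs = {
    1: {"title": (1, 2), "content": (2, 4)},
    2: {"title": (2, 4), "content": (1, 2)},
}
AVG_LEN = 3.0  # average field length, (2 + 4) / 2, for both fields


def score(doc, title_boost=1.0, content_boost=1.0):
    """Sum of the boosted per-field scores, as in the should clauses above."""
    title = tf_part(*doc["title"], AVG_LEN)
    content = tf_part(*doc["content"], AVG_LEN)
    return title_boost * title + content_boost * content


for doc_id, doc in docs.items():
    print(doc_id,
          round(score(doc), 4),           # no boost
          round(score(doc, 1.1, 2), 4))   # boosts used in the query above
```

Without boosts the two totals are equal. With the boosts used in the query (title 1.1, content 2), content carries more weight, so in this model the document with _id 1 comes out ahead.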

5 | Prepare Data | products

POST /products/_bulk
{ "index": { "_id": 1 }}
{ "price" : 10,"avaliable":true,"date":"2018-01-01", "productID" : "XHDK-A-1293-#fJ3" }
{ "index": { "_id": 2 }}
{ "price" : 20,"avaliable":true,"date":"2019-01-01", "productID" : "KDKE-B-9947-#kL5" }
{ "index": { "_id": 3 }}
{ "price" : 30,"avaliable":true, "productID" : "JODL-X-1937-#pV7" }
{ "index": { "_id": 4 }}
{ "price" : 30,"avaliable":false, "productID" : "QQPX-R-3956-#aD8" }

6 | bool Query

The content field of the document must contain "apple";

POST news/_search
{
  "query": {
    "bool": {
      "must": {
        "match":{"content":"apple"}
      }
    }
  }
}

The content field of the document must contain "apple" and must not contain "pie";

POST news/_search
{
  "query": {
    "bool": {
      "must": {
        "match":{"content":"apple"}
      },
      "must_not": {
        "match":{"content":"pie"}
      }
    }
  }
}

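The content field must match "apple". Documents that also contain "pie" are still returned, but their score is lowered: the boosting query multiplies it by negative_boost (0.5 here);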

POST news/_search
{
  "query": {
    "boosting": {
      "positive": {
        "match": {
          "content": "apple"
        }
      },
      "negative": {
        "match": {
          "content": "pie"
        }
      },
      "negative_boost": 0.5
    }
  }
}
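How the boosting query adjusts scores can be sketched as follows. This is a minimal model in Python, with a hypothetical function name `boosting_score` and made-up input scores; it only captures the rule that a document matching the negative query keeps its place in the results but has its score multiplied by negative_boost.

```python
def boosting_score(positive_score, negative_matches, negative_boost=0.5):
    """Toy model of the boosting query.

    positive_score is the score from the positive query, or None if the
    document does not match it (and is therefore not returned at all).
    """
    if positive_score is None:
        return None
    if negative_matches:
        # The document is demoted, not excluded.
        return positive_score * negative_boost
    return positive_score


apple_only = boosting_score(2.0, negative_matches=False)
apple_pie = boosting_score(2.0, negative_matches=True)
print(apple_only, apple_pie)  # 2.0 1.0
```

Compared with must_not, which drops the "pie" documents entirely, this keeps them in the result set at a lower rank.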