Elasticsearch Search and Query (Query DSL) Explained


License: CC 4.0 BY-SA


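Search and querying through the Query DSL is one of the most powerful and flexible core capabilities of Elasticsearch. The Query DSL is a JSON-based domain-specific language that covers everything from simple keyword searches to complex boolean logic, aggregations, and script-based scoring.

This article walks through the syntax, query types, use cases, and best practices of the Query DSL.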

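I. What Is the Query DSL?

The Query DSL (Domain Specific Language) is the JSON-based query language that Elasticsearch provides for defining search conditions. It is more powerful than simple URL-parameter queries and supports:

Full-text search
Structured filtering
Boolean combinations
Relevance scoring
Highlighting, sorting, and pagination

✅ The Query DSL is used mainly with the /_search API.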

II. Basic Structure of a Query DSL Request

GET /index/_search
{
  "query": { ... },           // the query itself
  "from": 0,                  // pagination offset
  "size": 10,                 // hits per page
  "sort": [ ... ],            // sort order
  "_source": [ ... ],         // fields to return
  "highlight": { ... },       // highlighting
  "aggs": { ... }             // aggregations
}

The core is the "query" section, which determines which documents match.
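
As a rough illustration of how these top-level keys fit together, here is a minimal Python sketch that assembles a request body from a page number and page size. The helper name `build_search_body` and its parameters are hypothetical and not part of any Elasticsearch client; the function only builds a dictionary that could be serialized to JSON.

```python
from typing import Any, Dict, List, Optional


def build_search_body(
    query: Dict[str, Any],
    page: int = 1,
    page_size: int = 10,
    source_fields: Optional[List[str]] = None,
    sort: Optional[List[Dict[str, Any]]] = None,
) -> Dict[str, Any]:
    """Assemble a /_search request body (hypothetical helper)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    body: Dict[str, Any] = {
        "query": query,
        # "from" is a zero-based offset, so page 1 starts at 0.
        "from": (page - 1) * page_size,
        "size": page_size,
    }
    # Optional keys are only added when the caller asks for them.
    if source_fields is not None:
        body["_source"] = list(source_fields)
    if sort is not None:
        body["sort"] = list(sort)
    return body


body = build_search_body({"match_all": {}}, page=3, page_size=20)
```

Keeping the offset arithmetic in one place makes it harder to mix up one-based page numbers with the zero-based "from" value.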

III. Query and Filter: Two Contexts

1. Query Context

Asks "how well does this document match the query?"
Computes a relevance score (_score)
Used for full-text search

{
  "match": {
    "title": "Elasticsearch tutorial"
  }
}

2. Filter Context

Asks only "does this document match?"
Does not compute _score, so it is faster
Results of frequently used filters can be cached (as bitsets)

{
  "term": {
    "status": "published"
  }
}

✅ Best practices:

Put yes/no conditions in filter whenever possible;
In a bool query, separate must (query context) from filter (filter context).
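
To make the split between the two contexts concrete, the following Python sketch builds a bool query in which scored clauses go under must and yes/no clauses go under filter. The function name `scored_and_filtered` is an assumption made for illustration only.

```python
from typing import Any, Dict, List


def scored_and_filtered(
    scored: List[Dict[str, Any]], filters: List[Dict[str, Any]]
) -> Dict[str, Any]:
    """Build a bool query: `scored` clauses affect _score, `filters` do not."""
    return {
        "bool": {
            "must": list(scored),     # query context: contributes to _score
            "filter": list(filters),  # filter context: match / no match only
        }
    }


query = scored_and_filtered(
    scored=[{"match": {"title": "Elasticsearch tutorial"}}],
    filters=[{"term": {"status": "published"}}],
)
```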

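IV. Common Query Types in Detail

1. `match`: full-text matching (the most commonly used)

Analyzes the query text and matches the resulting terms against a text field.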

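{
  "match": {
    "title": "search engine"
  }
}

With the standard analyzer, the query text is tokenized into ["search", "engine"]
By default, documents containing either term match
Set operator: "and" to require all terms

"match": {
  "title": {
    "query": "search engine",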
    "operator": "and"
  }
}
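
The difference between the default "or" behaviour and operator "and" can be sketched in a few lines of Python. This is a deliberately simplified model: `analyze` just lowercases and splits on whitespace, which is an assumption standing in for a real analyzer, and no scoring is computed.

```python
from typing import List


def analyze(text: str) -> List[str]:
    """Toy analyzer: lowercase and split on whitespace (an assumption)."""
    return text.lower().split()


def match(field_text: str, query: str, operator: str = "or") -> bool:
    """Return True if the analyzed query matches the analyzed field text."""
    doc_terms = set(analyze(field_text))
    query_terms = analyze(query)
    if not query_terms:
        return False
    hits = [term in doc_terms for term in query_terms]
    if operator == "or":
        return any(hits)   # at least one query term is present
    if operator == "and":
        return all(hits)   # every query term is present
    raise ValueError("operator must be 'or' or 'and'")
```

With this model, a title such as "open source search server" matches the query "search engine" under "or" but not under "and".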

2. `term` / `terms`: exact matching

Used for keyword, numeric, boolean, and similar fields.

{
  "term": {
    "status": "published"
  }
}

{
  "terms": {
    "tags": ["tech", "elasticsearch", "tutorial"]
  }
}

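⚠️ term does not analyze the search value, whereas match does. A term query against an analyzed text field therefore often fails to match.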
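
The warning above is easier to see with a small simulation. The sketch below assumes a toy analyzer (lowercase plus whitespace split) for text fields and treats a keyword field as a single unmodified term; all function names are hypothetical.

```python
from typing import List, Set


def analyze(text: str) -> List[str]:
    """Toy analyzer: lowercase and split on whitespace (an assumption)."""
    return text.lower().split()


def index_text_field(value: str) -> Set[str]:
    """A text field is stored as its analyzed terms."""
    return set(analyze(value))


def index_keyword_field(value: str) -> Set[str]:
    """A keyword field is stored as one exact, unanalyzed term."""
    return {value}


def term_query(indexed_terms: Set[str], value: str) -> bool:
    """term: the search value is compared as-is, with no analysis."""
    return value in indexed_terms


def match_query(indexed_terms: Set[str], text: str) -> bool:
    """match: the search text is analyzed first, then compared."""
    return any(term in indexed_terms for term in analyze(text))


text_terms = index_text_field("Quick Brown Fox")
keyword_terms = index_keyword_field("Quick Brown Fox")
```

Here `term_query(text_terms, "Quick")` fails because the indexed term is the lowercased "quick", while `match_query(text_terms, "Quick")` succeeds because the query text goes through the same analysis.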

3. `range`: range query

Supports numeric, date, and IP fields.

{
  "range": {
    "price": {
      "gte": 100,
      "lte": 500
    }
  }
}

{
  "range": {
    "created_at": {
      "gte": "2024-01-01",
      "lt": "2024-07-01"
    }
  }
}

Supported operators: gt, gte, lt, lte
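
When range bounds come from optional user input, it can help to build the clause programmatically. The following Python sketch uses a hypothetical helper, `range_clause`, that keeps only the bounds that were supplied and rejects contradictory combinations.

```python
from typing import Any, Dict, Optional


def range_clause(
    field: str,
    gt: Optional[Any] = None,
    gte: Optional[Any] = None,
    lt: Optional[Any] = None,
    lte: Optional[Any] = None,
) -> Dict[str, Any]:
    """Build a range clause from whichever bounds are provided."""
    if gt is not None and gte is not None:
        raise ValueError("use either gt or gte, not both")
    if lt is not None and lte is not None:
        raise ValueError("use either lt or lte, not both")
    candidates = (("gt", gt), ("gte", gte), ("lt", lt), ("lte", lte))
    bounds = {name: value for name, value in candidates if value is not None}
    if not bounds:
        raise ValueError("at least one bound is required")
    return {"range": {field: bounds}}


price_filter = range_clause("price", gte=100, lte=500)
date_filter = range_clause("created_at", gte="2024-01-01", lt="2024-07-01")
```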

4. `bool`: boolean combination (the most important)

Combines multiple query clauses using must, filter, should, and must_not.

{
  "bool": {
    "must": [
      { "match": { "title": "Elasticsearch" } }
    ],
    "filter": [
      { "term": { "status": "published" } },
      { "range": { "price": { "gte": 100 } } }
    ],
    "must_not": [
      { "term": { "tags": "draft" } }
    ],
    "should": [
      { "term": { "tags": "featured" } }
    ],
    "minimum_should_match": 1
  }
}

✅ Almost every complex query is built on bool.
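
The matching rules of the four clause types can be sketched as a small Python evaluator. This models only whether a document matches, not how it is scored; clauses are plain predicates, and the function name `bool_matches` is hypothetical. The default for minimum_should_match follows the documented behaviour: 1 when there are should clauses but no must or filter clauses, otherwise 0.

```python
from typing import Any, Callable, Dict, Optional, Sequence

Clause = Callable[[Dict[str, Any]], bool]


def bool_matches(
    doc: Dict[str, Any],
    must: Sequence[Clause] = (),
    filters: Sequence[Clause] = (),
    should: Sequence[Clause] = (),
    must_not: Sequence[Clause] = (),
    minimum_should_match: Optional[int] = None,
) -> bool:
    """Decide whether `doc` satisfies a bool query (matching only)."""
    if not all(clause(doc) for clause in must):
        return False
    if not all(clause(doc) for clause in filters):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    if minimum_should_match is None:
        # should is optional once must or filter clauses are present.
        minimum_should_match = 1 if should and not (must or filters) else 0
    should_hits = sum(1 for clause in should if clause(doc))
    return should_hits >= minimum_should_match


doc = {"title": "elasticsearch guide", "status": "published",
       "price": 150, "tags": ["tech"]}
has_title = lambda d: "elasticsearch" in d["title"]
is_published = lambda d: d["status"] == "published"
price_ok = lambda d: d["price"] >= 100
is_draft = lambda d: "draft" in d["tags"]
is_featured = lambda d: "featured" in d["tags"]
```

In this sketch the sample document matches when minimum_should_match is left unset, but is rejected once it is set to 1, because the document is not tagged "featured".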

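5. `wildcard`: wildcard query

Supports * (any sequence of characters) and ? (any single character), but performs poorly, so use it with caution.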

{
  "wildcard": {
    "email": "zhang*.com"
  }
}

Recommendation: use it on keyword fields and avoid it on text fields.
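
The semantics of the two wildcard characters can be illustrated with a short Python sketch that translates a pattern into a regular expression and matches it against a whole term. This is only a model of the matching rule; it says nothing about how Elasticsearch executes the query internally, and the function names are hypothetical.

```python
import re


def wildcard_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate * and ? into regex equivalents; escape everything else."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # any sequence of characters, possibly empty
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def wildcard_match(pattern: str, term: str) -> bool:
    """The pattern must cover the entire term, not just a substring."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None
```

A pattern with a leading * forces every term to be examined, which is one reason such patterns are usually the slowest.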

6. `prefix`: prefix query

Generally more efficient than wildcard.

{
  "prefix": {
    "title": "Elasti"
  }
}

Suitable for autocomplete and search suggestions.

7. `fuzzy`: fuzzy query (typo tolerance)

Matches terms within a given edit distance of the search value.

{
  "fuzzy": {
    "title": {
      "value": "Elascticsearch",
      "fuzziness": "AUTO"
    }
  }
}

fuzziness: 0, 1, 2, or AUTO
Used for typo-tolerant search.
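
To show what "edit distance" means here, the sketch below computes a restricted Damerau-Levenshtein distance (insertions, deletions, substitutions, and adjacent transpositions each cost one edit) and applies the documented AUTO thresholds: terms of 1 to 2 characters must match exactly, 3 to 5 characters allow one edit, and longer terms allow two. It is an approximation for illustration and ignores options such as prefix_length and max_expansions; the function names are hypothetical.

```python
from typing import Union


def edit_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or exact match
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def auto_fuzziness(term: str) -> int:
    """Maximum edits allowed by AUTO for a term of this length."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


def fuzzy_matches(query_term: str, indexed_term: str,
                  fuzziness: Union[str, int] = "AUTO") -> bool:
    """Return True if the indexed term is within the allowed edit distance."""
    max_edits = auto_fuzziness(query_term) if fuzziness == "AUTO" else int(fuzziness)
    return edit_distance(query_term, indexed_term) <= max_edits
```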

8. `regexp`: regular expression query

Powerful but slow, so use it with caution.

{
  "regexp": {
    "tags": "te[ch]+"
  }
}

9. `match_phrase`: phrase matching

Requires the terms to appear in the same order and next to each other.

{
  "match_phrase": {
    "title": "distributed search engine"
  }
}

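After analysis, the terms must appear consecutively;

slop relaxes this by allowing gaps between them:

"slop": 2  // allow the terms to be up to 2 positions apart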

10. `multi_match`: multi-field search

Searches for the same query text across multiple fields.

{
  "multi_match": {
    "query": "Elasticsearch",
    "fields": ["title^3", "content", "tags"]
  }
}

^3 boosts the title field, so matches in title count three times as much as matches in the other fields.
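
If field weights live in configuration rather than in the query text, the `field^boost` strings can be generated. The Python sketch below is one possible way to do that; `boosted_fields` and `multi_match` are hypothetical helper names, and a boost of 1 is written without a suffix.

```python
from typing import Any, Dict, List, Union

Number = Union[int, float]


def boosted_fields(boosts: Dict[str, Number]) -> List[str]:
    """Turn {"title": 3, "content": 1} into ["title^3", "content"]."""
    fields = []
    for name, boost in boosts.items():
        if boost <= 0:
            raise ValueError("boost must be positive")
        fields.append(name if boost == 1 else f"{name}^{boost}")
    return fields


def multi_match(query: str, boosts: Dict[str, Number]) -> Dict[str, Any]:
    """Build a multi_match clause from a query string and field weights."""
    return {"multi_match": {"query": query, "fields": boosted_fields(boosts)}}


clause = multi_match("Elasticsearch", {"title": 3, "content": 1, "tags": 1})
```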

11. `nested`: nested object query

Used to query fields of the nested type.

{
  "nested": {
    "path": "comments",
    "query": {
      "bool": {
        "must": [
          { "match": { "comments.user": "Zhang San" } },
          { "match": { "comments.content": "great" } }
        ]
      }
    },
    "score_mode": "avg"
  }
}

score_mode: avg (the default), sum, min, max, none
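
Why a dedicated nested type is needed can be shown with a small simulation. Without nested, an array of objects is effectively flattened into per-field value lists, so a condition on user and a condition on content can be satisfied by two different comments. The Python sketch below uses made-up sample data and hypothetical function names to contrast the two behaviours.

```python
from typing import Any, Dict


def flattened_match(doc: Dict[str, Any], user: str, content: str) -> bool:
    """Object arrays without nested: field values lose their pairing."""
    users = {c["user"] for c in doc["comments"]}
    contents = {c["content"] for c in doc["comments"]}
    return user in users and content in contents


def nested_match(doc: Dict[str, Any], user: str, content: str) -> bool:
    """nested: both conditions must hold inside the same comment object."""
    return any(
        c["user"] == user and c["content"] == content for c in doc["comments"]
    )


# Sample data invented for this illustration.
doc = {
    "comments": [
        {"user": "alice", "content": "great"},
        {"user": "bob", "content": "poor"},
    ]
}
```

Asking for a comment by "alice" that says "poor" matches under the flattened model even though no such comment exists, while the nested model correctly rejects it.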

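12. `exists` / `missing`: field existence

The standalone `missing` query was removed in Elasticsearch 5.0. To find documents that lack a field, wrap `exists` in `bool.must_not`, as in the second example.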

{
  "exists": {
    "field": "tags"
  }
}

{
  "bool": {
    "must_not": {
      "exists": { "field": "deprecated" }
    }
  }
}

V. Advanced Query Features

1. `function_score`: custom scoring

Adjusts the relevance score of matching documents.

{
  "function_score": {
    "query": { "match": { "title": "Elasticsearch" } },
    "functions": [
      {
        "field_value_factor": {
          "field": "sales_count",
          "factor": 0.01,
          "modifier": "log1p"
        }
      },
      {
        "gauss": {
          "publish_date": {
            "scale": "7d",
            "offset": "1d",
            "decay": 0.5
          }
        }
      }
    ],
    "boost_mode": "multiply"
  }
}

Useful for ranking strategies such as sales-weighted boosting and time decay.
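
The arithmetic behind the example above can be approximated in Python. The sketch assumes the documented formulas: field_value_factor with the log1p modifier computes log10(1 + factor * value), and the gauss decay is exp(-d^2 / (2 * sigma^2)) with sigma^2 = -scale^2 / (2 * ln(decay)) and d = max(0, |distance| - offset). The two function results are multiplied together (the default score_mode) and then multiplied with the query score (boost_mode "multiply"). Distances are expressed in days, and the function names are hypothetical.

```python
import math


def field_value_factor_log1p(value: float, factor: float) -> float:
    """log1p modifier: common logarithm of (1 + factor * value)."""
    return math.log10(1 + factor * value)


def gauss_decay(distance: float, scale: float, offset: float, decay: float) -> float:
    """Gaussian decay: 1.0 within `offset`, `decay` at offset + scale."""
    sigma_sq = -(scale ** 2) / (2 * math.log(decay))
    effective = max(0.0, abs(distance) - offset)
    return math.exp(-(effective ** 2) / (2 * sigma_sq))


def final_score(query_score: float, sales_count: float, age_days: float) -> float:
    """Combine both functions with the query score by multiplication."""
    sales_boost = field_value_factor_log1p(sales_count, factor=0.01)
    freshness = gauss_decay(age_days, scale=7, offset=1, decay=0.5)
    return query_score * sales_boost * freshness
```

Under these assumptions, a document published 8 days ago (offset of 1 day plus a scale of 7 days) keeps half of its freshness factor, and a document with no sales gets a sales factor of 0, which zeroes the final score when everything is multiplied.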

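2. `script_score`: script-based scoring

Uses a Painless script to compute the score dynamically. The fragment below is the form used as a function inside function_score; the standalone script_score query additionally requires a query clause. The script must not produce a negative score, and this example assumes that rating is never zero.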

"script_score": {
  "script": {
    "source": "doc['price'].value / doc['rating'].value"
  }
}

Useful for complex ranking logic.
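
One way to make the script more defensive is sketched below in Python, which builds a standalone script_score query around a base query. The Painless source returns 0.0 when rating is missing or not positive instead of dividing by it. The helper name `guarded_ratio_score` is hypothetical, the script assumes that price always has a value, and the script text is a sketch that has not been run against a cluster here.

```python
from typing import Any, Dict


def guarded_ratio_score(base_query: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap `base_query` in a script_score query with a guarded division."""
    source = (
        # Fall back to 0.0 when rating is absent or not positive.
        "doc['rating'].size() == 0 || doc['rating'].value <= 0 "
        "? 0.0 "
        # Cast so that two integer fields do not use integer division.
        ": doc['price'].value / (double) doc['rating'].value"
    )
    return {
        "script_score": {
            "query": base_query,          # required by the standalone query
            "script": {"source": source},
        }
    }


query = guarded_ratio_score({"match": {"title": "Elasticsearch"}})
```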

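VI. Query Optimization Tips ✅

Scenario	Recommendation
Exact matching	Use `term` rather than `match`
Filter conditions	Put them in `bool.filter`, which skips scoring
Pattern queries	Use `wildcard` and `regexp` sparingly
Large text fields	Set `index: false` if the field is never searched, or map short exact values as `keyword`
Deep pagination	Avoid `from + size > 10000` (the default `index.max_result_window`); use `search_after` instead
Highlighting performance	Limit `fragment_size` and `number_of_fragments`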
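
The idea behind `search_after` can be simulated in memory: instead of skipping an ever-growing offset, each request continues from the sort values of the last hit on the previous page. The Python sketch below sorts by sales_count descending with id ascending as a unique tiebreaker; the data and function names are made up, and a real request would pass the last hit's sort values in the "search_after" key rather than filter a list.

```python
from typing import Any, Dict, List, Optional, Tuple

Doc = Dict[str, Any]
Cursor = Tuple[int, int]  # (sales_count, id) of the last hit seen


def sort_key(doc: Doc) -> Tuple[int, int]:
    """sales_count descending, then id ascending as a unique tiebreaker."""
    return (-doc["sales_count"], doc["id"])


def search_page(docs: List[Doc], size: int,
                search_after: Optional[Cursor] = None) -> List[Doc]:
    """Return the next `size` docs strictly after the cursor position."""
    ordered = sorted(docs, key=sort_key)
    if search_after is not None:
        last_sales, last_id = search_after
        ordered = [d for d in ordered if sort_key(d) > (-last_sales, last_id)]
    return ordered[:size]


def iterate_all(docs: List[Doc], size: int) -> List[int]:
    """Walk through every page and collect document ids in order."""
    seen: List[int] = []
    cursor: Optional[Cursor] = None
    while True:
        page = search_page(docs, size, cursor)
        if not page:
            break
        seen.extend(d["id"] for d in page)
        last = page[-1]
        cursor = (last["sales_count"], last["id"])
    return seen


sample_docs = [{"id": i, "sales_count": i % 3} for i in range(10)]
```

The tiebreaker matters: without a unique final sort field, documents that share the same sales_count could be skipped or repeated across pages.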

VII. Complete Example: E-commerce Product Search

GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        { "multi_match": { "query": "iPhone mobile phone", "fields": ["title^3", "subtitle"] } }
      ],
      "filter": [
        { "term": { "status": "published" } },
        { "range": { "price": { "gte": 500000 } } },
        { "nested": {
            "path": "attributes",
            "query": {
              "bool": {
                "must": [
                  { "term": { "attributes.name": "color" } },
                  { "term": { "attributes.value": "white" } }
                ]
              }
            }
          }
        }
      ]
    }
  },
  "from": 0,
  "size": 20,
  "sort": [
    { "sales_count": { "order": "desc" } },
    { "_score": { "order": "desc" } }
  ],
  "_source": ["title", "price", "image"],
  "highlight": {
    "fields": {
      "title": {}
    }
  }
}

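VIII. Summary: Query DSL Key Points

Type	Purpose	Scored?
`match`	Full-text search	✅
`term`	Exact matching	❌ (typically used in filter context)
`range`	Range query	❌ (typically used in filter context)
`bool`	Combining conditions	✅/❌ (depends on the clause)
`nested`	Nested object query	✅
`function_score`	Custom scoring	✅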