Elasticsearch Document Search: The Complete Guide, from Basic Queries to Advanced Analytics in Practice

Author: IT之一小佬
Published: October 23, 2025
Reading time: 30 minutes
Audience: Elasticsearch developers, data analysts, SREs, search product managers

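🌟 Introduction: Why Is Elasticsearch the King of Search?

When you search an e-commerce site for "red dress" or a logging platform for "error 500", Elasticsearch is often what powers the results.

Its strength comes from being more than a data store. It is also:

✅ A distributed engine built on inverted indexes
✅ A near-real-time full-text search engine
✅ An aggregation and analytics platform
✅ A relevance engine with support for complex scoring

This tutorial starts from scratch and walks through the 7 core areas of Elasticsearch document search, covering syntax, principles, performance tuning, and practical techniques.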

I. Search Basics: Getting Started with the `_search` API

The simplest search

GET /products/_search?q=name:iphone

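The q= parameter uses Lucene query string syntax, the same syntax accepted by the query_string query.
It is equivalent to: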

GET /products/_search
{
  "query": {
    "query_string": {
      "query": "name:iphone"
    }
  }
}

Standard DSL search structure

GET /products/_search
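{
  "query": { ... },        // query clause (optional; defaults to match_all)
  "from": 0,               // pagination offset (default 0)
  "size": 10,              // number of hits to return (default 10)
  "sort": [ ... ],         // sort order
  "_source": [ ... ],      // fields to return
  "aggs": { ... }          // aggregations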
}
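
As a rough illustration, the same request body can be assembled in Python as a plain dictionary and serialized to JSON. The helper name `build_search_body` and the sample field names are hypothetical, and nothing here contacts a cluster.

```python
import json

def build_search_body(query=None, from_=0, size=10, sort=None, source=None, aggs=None):
    """Assemble a _search request body, leaving out sections that were not supplied."""
    body = {"from": from_, "size": size}
    # With no query given, spell out match_all, which is what Elasticsearch defaults to.
    body["query"] = query if query is not None else {"match_all": {}}
    if sort:
        body["sort"] = sort
    if source:
        body["_source"] = source
    if aggs:
        body["aggs"] = aggs
    return body

body = build_search_body(
    query={"match": {"name": "iphone"}},
    size=5,
    source=["name", "price"],
)
print(json.dumps(body, indent=2))
```

The resulting string is what would be sent as the body of GET /products/_search.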

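II. Core Query Types in Detail

1. Full Text Queries

These queries target text fields. The query string is analyzed (tokenized) before matching, and options such as fuzziness are supported.

`match` query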

{
  "query": {
    "match": {
      "description": "wireless bluetooth headphones"
    }
  }
}

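The query text is analyzed into tokens → ["wireless", "bluetooth", "headphones"]
Tokens are combined with OR by default, so a document matching any one of them is returned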
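
The OR default can be pictured with a toy model. The sketch below is not how Elasticsearch is implemented: `analyze` is a crude stand-in for the standard analyzer, and `match` is a hypothetical helper that only decides whether a document matches, without scoring. The `operator` argument mirrors the `operator` option of the real `match` query.

```python
import re

def analyze(text):
    """Crude stand-in for an analyzer: lowercase, then split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())

def match(doc_text, query_text, operator="or"):
    """Return True if the document satisfies a match-style query (no scoring)."""
    doc_terms = set(analyze(doc_text))
    query_terms = analyze(query_text)
    if operator == "and":
        return all(term in doc_terms for term in query_terms)
    return any(term in doc_terms for term in query_terms)

doc = "Wireless mouse with USB receiver"
or_result = match(doc, "wireless bluetooth headphones")          # one shared token is enough
and_result = match(doc, "wireless bluetooth headphones", "and")  # every token must be present
print(or_result, and_result)
```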

`match_phrase` query

{
  "query": {
    "match_phrase": {
      "description": "wireless bluetooth"
    }
  }
}

Requires all terms to appear next to each other and in the same order
Commonly used for exact phrase search
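
The adjacency and order requirement can be sketched the same way. This is a simplified model under the assumption of a whitespace-and-punctuation analyzer; `match_phrase` here is a hypothetical helper, not the Elasticsearch implementation, and it ignores options such as slop.

```python
import re

def analyze(text):
    """Crude stand-in for an analyzer: lowercase, then split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())

def match_phrase(doc_text, phrase):
    """True if the phrase tokens occur consecutively and in order in the document."""
    doc_terms = analyze(doc_text)
    phrase_terms = analyze(phrase)
    n = len(phrase_terms)
    if n == 0:
        return False
    # Slide a window of the phrase length over the document tokens.
    return any(doc_terms[i:i + n] == phrase_terms for i in range(len(doc_terms) - n + 1))

print(match_phrase("Wireless bluetooth headphones", "wireless bluetooth"))
print(match_phrase("Bluetooth wireless adapter", "wireless bluetooth"))
```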

`multi_match` query

{
  "query": {
    "multi_match": {
      "query": "apple",
      "fields": ["name^3", "description"]  // name field weighted x3
    }
  }
}

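Searches across multiple fields
Supports per-field boosting

2. Term-Level Queries

These queries target fields that are not analyzed, such as keyword, numeric, and date fields.

`term` query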

{
  "query": {
    "term": {
      "status.keyword": "active"
    }
  }
}

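Exact match; the search term is not analyzed
Note: for a text field, query its .keyword subfield (available when the mapping defines one, as default dynamic mapping does)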
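
Why the .keyword subfield matters can be shown with a small model. The two helpers below are hypothetical and assume a lowercasing, word-splitting analyzer for text fields; they only illustrate what a non-analyzed term is compared against in each case.

```python
import re

def analyze(text):
    """Crude stand-in for an analyzer: lowercase, then split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())

def term_matches_text_field(stored_value, term):
    """A text field indexes analyzed tokens, so the raw term is compared to those tokens."""
    return term in analyze(stored_value)

def term_matches_keyword_field(stored_value, term):
    """A keyword field indexes the whole value unchanged."""
    return term == stored_value

value = "Sony Electronics"
print(term_matches_text_field(value, "Sony Electronics"))     # no single token equals the full string
print(term_matches_keyword_field(value, "Sony Electronics"))  # whole value compared as-is
```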

`terms` query

{
  "query": {
    "terms": {
      "category.id": [101, 102, 103]
    }
  }
}

Matches any of several exact values
Similar to SQL's IN (...)

`range` query

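{
  "query": {
    "bool": {
      "filter": [
        { "range": { "price": { "gte": 100, "lte": 500 } } },
        { "range": { "created_at": { "gte": "2025-01-01", "lt": "now" } } }
      ]
    }
  }
}

Numeric/date range query
Supports gt, gte, lt, lte
Each range clause accepts a single field, so conditions on several fields are combined under bool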
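
Since a range clause targets one field, conditions on several fields are usually wrapped in a bool query. A minimal Python sketch of building such a body follows; `range_clause` and `all_of` are hypothetical helper names, and the code only produces dictionaries.

```python
def range_clause(field, **bounds):
    """Build a range clause for one field, accepting only the four range operators."""
    allowed = {"gt", "gte", "lt", "lte"}
    unknown = set(bounds) - allowed
    if unknown:
        raise ValueError(f"unsupported range operators: {sorted(unknown)}")
    return {"range": {field: bounds}}

def all_of(*clauses):
    """Require every clause to match, in filter context (no scoring)."""
    return {"bool": {"filter": list(clauses)}}

query = all_of(
    range_clause("price", gte=100, lte=500),
    range_clause("created_at", gte="2025-01-01", lt="now"),
)
print(query)
```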

3. Compound Queries

These combine multiple query clauses.

`bool` query (the most commonly used!)

{
  "query": {
    "bool": {
      "must": [
        { "match": { "description": "bluetooth" } }
      ],
      "filter": [
        { "term": { "brand.keyword": "Sony" } },
        { "range": { "price": { "lte": 200 } } }
      ],
      "must_not": [
        { "term": { "condition.keyword": "used" } }
      ],
      "should": [
        { "term": { "color.keyword": "black" } },
        { "term": { "color.keyword": "white" } }
      ],
      "minimum_should_match": 1
    }
  }
}

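must: the clause must match and contributes to _score
filter: the clause must match but does not affect scoring (results can be cached, so it is usually faster)
must_not: the clause must not match
should: optional clause that raises the score when it matches; in this example minimum_should_match: 1 requires at least one should clause to match

✅ Best practice: put non-scoring conditions in filter!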
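
The matching rules of the four clause types can be sketched as a small evaluator over one in-memory document. This is an illustrative model only: `bool_matches` is a hypothetical function, clauses are plain Python predicates, and scoring is ignored. The default for minimum_should_match follows the documented rule (1 when there are should clauses but no must or filter, otherwise 0).

```python
def bool_matches(doc, must=(), filter_=(), must_not=(), should=(), minimum_should_match=None):
    """Decide whether one document matches a bool query (scoring is ignored)."""
    if not all(clause(doc) for clause in must):
        return False
    if not all(clause(doc) for clause in filter_):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    if minimum_should_match is None:
        # should is optional unless it is the only positive clause type present.
        minimum_should_match = 1 if (should and not must and not filter_) else 0
    return sum(1 for clause in should if clause(doc)) >= minimum_should_match

doc = {"description": "bluetooth speaker", "brand": "Sony", "price": 150,
       "condition": "new", "color": "black"}

clauses = dict(
    must=[lambda d: "bluetooth" in d["description"].split()],
    filter_=[lambda d: d["brand"] == "Sony", lambda d: d["price"] <= 200],
    must_not=[lambda d: d["condition"] == "used"],
    should=[lambda d: d["color"] == "black", lambda d: d["color"] == "white"],
)
print(bool_matches(doc, minimum_should_match=1, **clauses))
```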

III. Parsing Search Results

Response structure

{
  "took": 12,                          // query time (ms)
  "timed_out": false,
  "_shards": {
    "total": 3, "successful": 3, "skipped": 0, "failed": 0
  },
  "hits": {
    "total": { "value": 45, "relation": "eq" },
    "max_score": 2.1,
    "hits": [
      {
        "_index": "products",
        "_id": "1",
        "_score": 2.1,
        "_source": { ... }             // original document
      }
    ]
  }
}
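
A response of this shape can be unpacked in Python roughly as follows. The `summarize` helper and the sample `_source` content are hypothetical; the dictionary simply mirrors the JSON above.

```python
response = {
    "took": 12,
    "timed_out": False,
    "_shards": {"total": 3, "successful": 3, "skipped": 0, "failed": 0},
    "hits": {
        "total": {"value": 45, "relation": "eq"},
        "max_score": 2.1,
        "hits": [
            # _source content is made up for the example
            {"_index": "products", "_id": "1", "_score": 2.1, "_source": {"name": "iPhone"}},
        ],
    },
}

def summarize(resp):
    """Pull out the fields most often needed by callers and monitoring."""
    total = resp["hits"]["total"]
    return {
        "took_ms": resp["took"],
        # Results may be incomplete if the request timed out or a shard failed.
        "partial": resp["timed_out"] or resp["_shards"]["failed"] > 0,
        "total": total["value"],
        "total_is_exact": total["relation"] == "eq",
        "ids": [hit["_id"] for hit in resp["hits"]["hits"]],
    }

print(summarize(response))
```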

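Key fields

"took": total execution time in milliseconds; useful for monitoring slow queries
"hits.total.value": number of matching documents (by default counted exactly only up to 10,000; beyond that "relation" becomes "gte" unless track_total_hits is set)
"_score": relevance score, computed with BM25 by default (a refinement of TF-IDF)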

IV. Sorting and Pagination

1. Sort

"sort": [
  { "price": { "order": "asc" } },
  { "rating": { "order": "desc", "unmapped_type": "float" } },
  { "_score": { "order": "desc" } }   // by relevance, descending
]

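Sorting on multiple fields is supported
Numeric and date fields sort quickly (doc values)
Sorting on a text field requires its .keyword subfield

2. Pagination options compared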

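Option	Syntax	Use case	Drawback
`from + size`	`?from=10&size=10`	Shallow pagination (from + size up to 10,000 by default)	Poor performance for deep pages
`search_after`	`"search_after": [100, "doc_id"]` (request body)	Deep pagination, live scrolling	Client must carry over the last hit's sort values
`scroll`	`?scroll=1m`	Bulk data export	Search context holds memory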

`search_after` example

{
  "size": 10,
  "query": { "match_all": {} },
  "sort": [ { "timestamp": "asc" }, { "_id": "asc" } ],
  "search_after": [ "2025-01-01T00:00:00", "product_5" ]
}

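✅ Recommendation: use search_after for deep pagination. Pass the sort values of the last hit from the previous page, one value per sort key, and end the sort with a unique tiebreaker so that no hits are skipped or repeated.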
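
The paging loop behind search_after can be sketched without a cluster. In the code below `fake_search` is a hypothetical stand-in for a `_search` call over an in-memory list, and the `id` field plays the role of the tiebreaker; a real client would send the same `search_after` values in the request body.

```python
def fake_search(docs, size, sort_keys, search_after=None):
    """Stand-in for a _search call: sort, skip everything up to search_after, return one page."""
    ordered = sorted(docs, key=lambda d: tuple(d[k] for k in sort_keys))
    if search_after is not None:
        ordered = [d for d in ordered if tuple(d[k] for k in sort_keys) > tuple(search_after)]
    return ordered[:size]

def paginate(docs, size, sort_keys):
    """Yield pages, feeding the last hit's sort values into the next request."""
    after = None
    while True:
        page = fake_search(docs, size, sort_keys, after)
        if not page:
            return
        yield page
        after = [page[-1][k] for k in sort_keys]

# Seven sample documents spread over three timestamps (made-up data).
docs = [{"timestamp": "2025-01-0%d" % (i % 3 + 1), "id": "product_%d" % i} for i in range(7)]
pages = list(paginate(docs, size=3, sort_keys=["timestamp", "id"]))
for page in pages:
    print([d["id"] for d in page])
```

Because several documents share a timestamp, dropping the `id` tiebreaker from the sort keys would make the loop skip documents that tie with the last hit of a page.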

V. Highlighting

Makes matched keywords stand out in the results.

{
  "query": {
    "match": { "description": "bluetooth" }
  },
  "highlight": {
    "fields": {
      "description": {
        "pre_tags": ["<em>"],
        "post_tags": ["</em>"]
      }
    },
    "fragment_size": 150
  }
}

Each hit in the response gains an extra section:

"highlight": {
  "description": [
    "Wireless <em>bluetooth</em> headphones with noise cancellation..."
  ]
}
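
The effect of pre_tags and post_tags can be imitated with a few lines of Python. This is a toy sketch, not the Elasticsearch highlighter: `highlight` is a hypothetical function that wraps whole-word, case-insensitive matches and does no fragmenting.

```python
import re

def highlight(text, query, pre_tag="<em>", post_tag="</em>"):
    """Wrap every word of the text that also appears in the query with the given tags."""
    terms = {t.lower() for t in re.findall(r"\w+", query)}

    def wrap(match):
        word = match.group(0)
        return f"{pre_tag}{word}{post_tag}" if word.lower() in terms else word

    return re.sub(r"\w+", wrap, text)

snippet = highlight("Wireless bluetooth headphones with noise cancellation", "bluetooth")
print(snippet)
```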

VI. Aggregations

Going beyond search into data analysis.

1. Bucket Aggregations

"aggs": {
  "by_brand": {
    "terms": { "field": "brand.keyword", "size": 10 },
    "aggs": {
      "avg_price": { "avg": { "field": "price" } }
    }
  }
}

Groups documents by brand and computes the average price per group
Similar to SQL's GROUP BY
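
What this aggregation computes can be reproduced over an in-memory list, which may help when checking results by hand. The function name `avg_price_by_brand` and the sample documents are made up; the output shape loosely follows the bucket structure of a terms aggregation.

```python
from collections import defaultdict

def avg_price_by_brand(docs, size=10):
    """Group documents by brand and compute the average price per group."""
    prices = defaultdict(list)
    for doc in docs:
        prices[doc["brand"]].append(doc["price"])
    # Largest buckets first; ties are broken by key here to keep the output deterministic.
    ranked = sorted(prices.items(), key=lambda kv: (-len(kv[1]), kv[0]))[:size]
    return [
        {"key": brand, "doc_count": len(vals), "avg_price": {"value": sum(vals) / len(vals)}}
        for brand, vals in ranked
    ]

docs = [
    {"brand": "Sony", "price": 100},
    {"brand": "Sony", "price": 200},
    {"brand": "Bose", "price": 300},
]
buckets = avg_price_by_brand(docs)
print(buckets)
```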

2. Metric Aggregations

"aggs": {
  "stats_price": { "stats": { "field": "price" } },
  "percentiles_load": { "percentiles": { "field": "response_time" } }
}

stats returns count, min, max, avg, and sum in one pass; percentiles returns percentile values
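
For reference, the five numbers a stats aggregation reports can be computed in plain Python as below. The `stats` function is a hypothetical local equivalent; the values returned for an empty input are a choice made for this sketch.

```python
def stats(values):
    """Compute count, min, max, avg and sum over a list of numbers."""
    values = list(values)
    if not values:
        # Assumption for this sketch: report no min/max/avg when there is nothing to aggregate.
        return {"count": 0, "min": None, "max": None, "avg": None, "sum": 0}
    total = sum(values)
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": total / len(values),
        "sum": total,
    }

price_stats = stats([100, 200, 300])
print(price_stats)
```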

3. Pipeline Aggregations

"aggs": {
  "sales_per_month": {
    "date_histogram": { "field": "date", "calendar_interval": "month" },
    "aggs": {
      "total": { "sum": { "field": "amount" } }
    }
  },
  "max_month": {
    "max_bucket": { "buckets_path": "sales_per_month>total" }
  }
}

Computes over the output of other aggregations; here max_bucket picks the month with the highest total
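
The two stages can be mimicked locally to see what the pipeline produces. The sketch assumes ISO-formatted date strings and made-up order data; `sales_per_month` and `max_bucket` are hypothetical functions named after the aggregations they imitate.

```python
from collections import defaultdict

def sales_per_month(orders):
    """First stage: bucket orders by calendar month and sum the amounts."""
    totals = defaultdict(float)
    for order in orders:
        totals[order["date"][:7]] += order["amount"]  # "YYYY-MM" prefix of an ISO date
    return [{"key_as_string": month, "total": {"value": value}}
            for month, value in sorted(totals.items())]

def max_bucket(buckets, metric):
    """Second stage: find the highest metric value and the bucket keys that reach it."""
    best = max(bucket[metric]["value"] for bucket in buckets)
    keys = [b["key_as_string"] for b in buckets if b[metric]["value"] == best]
    return {"value": best, "keys": keys}

orders = [
    {"date": "2025-01-05", "amount": 100},
    {"date": "2025-01-20", "amount": 50},
    {"date": "2025-02-01", "amount": 300},
    {"date": "2025-03-03", "amount": 120},
]
monthly = sales_per_month(orders)
max_month = max_bucket(monthly, "total")
print(max_month)
```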

VII. Performance Optimization and Tuning

1. Use `filter` instead of `must`

// ❌ Contributes to scoring, not cacheable
"must": [ { "term": { "status": "active" } } ]

// ✅ No scoring, cacheable, faster
"filter": [ { "term": { "status": "active" } } ]

2. Reduce `_source` size

"_source": ["title", "price", "image_url"]

Return only the fields you need

3. Enable the request cache

GET /products/_search?request_cache=true
{
  "aggs": { "price_stats": { "avg": { "field": "price" } } }
}

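Caches aggregation results for data that does not change; cached entries are invalidated when the index is refreshed with new data

4. Monitor slow queries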

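PUT /products/_settings
{
  "index.search.slowlog.threshold.query.warn": "5s",
  "index.search.slowlog.threshold.fetch.warn": "1s"
}

Slow log thresholds are index-level settings, so they are applied through the index settings API rather than the cluster settings API.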