The flattened Field Type in Elasticsearch

License: CC 4.0 BY-SA

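This article looks at improving Elasticsearch index performance, focusing on how to count the fields in an index and what to do when the field limit is exceeded. It gives examples of the field capabilities API and recommends the flattened data type as a way to avoid the performance degradation caused by too many mapped fields.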

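To tune index performance, you first need to know how many fields an index contains.
By default, Elasticsearch limits an index to 1000 fields; the limit is controlled by the index.mapping.total_fields.limit setting. Field mappings, object mappings, and field aliases all count toward it. Raising the limit can lead to memory pressure and degraded performance, especially on heavily loaded clusters.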

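In practice, mappings are often generated through dynamic templates, which raises two questions: how do you count the fields in an index, and what do you do when the count exceeds the limit?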

Counting the fields in an index

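Elasticsearch has no dedicated stats API that reports the number of fields. The field capabilities API, described below, returns field information across one or more indices: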

GET /_field_caps?fields=<fields>

POST /_field_caps?fields=<fields>

GET /<target>/_field_caps?fields=<fields>

POST /<target>/_field_caps?fields=<fields>

The following example retrieves information about the rating and title fields:

GET _field_caps?fields=rating,title

Response:

{
  "indices": [ "index1", "index2", "index3", "index4", "index5" ],
  "fields": {
    "rating": {                                   
      "long": {
        "searchable": true,
        "aggregatable": false,
        "indices": [ "index1", "index2" ],
        "non_aggregatable_indices": [ "index1" ]  
      },
      "keyword": {
        "searchable": false,
        "aggregatable": true,
        "indices": [ "index3", "index4" ],
        "non_searchable_indices": [ "index4" ]    
      }
    },
    "title": {                                    
      "text": {
        "searchable": true,
        "aggregatable": false
      }
    }
  }
}

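The response covers five indices and lists, for each field, its types and the indices in which each type applies. The title field, for example, has no indices array, which means it is mapped as text in all five indices.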

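To retrieve the fields of a single index, including unmapped fields (include_unmapped matters mainly when the target covers several indices and a field is mapped in only some of them):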

GET /index_name/_field_caps?fields=*&include_unmapped
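The field capabilities response lists fields but does not count them, so the counting has to happen on the client. The following is a minimal Python sketch, assuming the JSON response has already been parsed into a dict; the function name count_fields and the trimmed sample are illustrative, not part of Elasticsearch. The result is only an approximation of what index.mapping.total_fields.limit counts, since the exact set of entries returned (object fields, aliases, metadata fields) may differ between versions.

```python
def count_fields(field_caps, index=None):
    """Count mapped, non-metadata fields in a parsed _field_caps response.

    If index is given, count only fields that are mapped in that index.
    """
    total = 0
    for types in field_caps.get("fields", {}).values():
        for type_name, caps in types.items():
            # Skip "unmapped" entries (include_unmapped) and metadata fields
            # such as _id, when the response flags them.
            if type_name == "unmapped" or caps.get("metadata_field", False):
                continue
            # A missing "indices" array means the type applies to every index.
            indices = caps.get("indices")
            if index is None or indices is None or index in indices:
                total += 1
                break  # count each field name only once
    return total


# Trimmed copy of the sample response shown earlier in this section.
sample = {
    "indices": ["index1", "index2", "index3", "index4", "index5"],
    "fields": {
        "rating": {
            "long": {"searchable": True, "aggregatable": False,
                     "indices": ["index1", "index2"]},
            "keyword": {"searchable": False, "aggregatable": True,
                        "indices": ["index3", "index4"]},
        },
        "title": {
            "text": {"searchable": True, "aggregatable": False},
        },
    },
}

print(count_fields(sample))            # 2 (rating and title)
print(count_fields(sample, "index5"))  # 1 (only title is mapped in index5)
```

Comparing the returned count with the configured limit gives a rough idea of how much headroom an index has left.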

Exceeding the field limit

The obvious fix is to raise index.mapping.total_fields.limit. If you do, it is advisable to raise indices.query.bool.max_clause_count as well, which caps the number of clauses allowed in a boolean query.
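Raising the limit is an update to the index settings. The sketch below builds such a request with Python's standard library, assuming a cluster reachable at http://localhost:9200 and an index named my-index; both values and the helper name build_limit_request are placeholders for illustration. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_limit_request(base_url, index, new_limit):
    """Build a PUT /<index>/_settings request that changes the field limit."""
    body = json.dumps({"index.mapping.total_fields.limit": new_limit})
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_settings",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Placeholder host and index name; adjust to your environment.
request = build_limit_request("http://localhost:9200", "my-index", 2000)
print(request.get_method(), request.full_url)
# Sending is left to the caller, e.g. urllib.request.urlopen(request)
```

Keeping the request construction separate from sending makes it easy to inspect the body before touching a live cluster.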

Using the flattened data type

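By default, each subfield of an object is mapped and indexed separately. The flattened field type is another way to address the too-many-fields problem: it maps an entire object as a single field. Given an object, the flattened type parses out its leaf values and indexes them as keyword values, so the object's contents can then be searched or aggregated through simple queries. This is useful for objects with a large or unknown number of unique keys: only one field mapping is created for the whole JSON object, which keeps the mapping from accumulating too many fields, a problem known as mapping explosion.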

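The flattened type is, however, a trade-off in search functionality: only basic queries are allowed, and numeric range queries and highlighting are not supported.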

Example:

PUT bug_reports
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      },
      "labels": {
        "type": "flattened"
      }
    }
  }
}

POST bug_reports/_doc/1
{
  "title": "Results are not sorted correctly.",
  "labels": {
    "priority": "urgent",
    "release": ["v1.2.5", "v1.3.0"],
    "timestamp": {
      "created": 1541458026,
      "closed": 1541457010
    }
  }
}

POST bug_reports/_doc/2
{
  "title": "Results are not described correctly.",
  "labels": {
    "priority": "urgent",
    "release": ["v1.3.5", "v1.4.0"],
    "creator": "tester",
    "timestamp": {
      "created": 1541458026,
      "closed": 1541457010
    }
  }
}

Here labels is defined as a flattened field; the value actually indexed is an object, which can contain any number of properties.
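To make the behaviour of the flattened type more concrete, the following Python sketch mimics, on the client side, how the labels object of document 1 is reduced to keyword leaf values. It is a conceptual illustration only, not how Elasticsearch stores the data internally; flatten_leaves and matches_term are hypothetical helpers, and the paths are relative to labels (in a real query the key would be written as labels.release).

```python
import json


def flatten_leaves(obj, prefix=""):
    """Return (dotted_path, keyword_value) pairs for every leaf value in obj."""
    leaves = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            leaves.extend(flatten_leaves(value, path))
    elif isinstance(obj, list):
        # Array elements share the path of the array itself.
        for item in obj:
            leaves.extend(flatten_leaves(item, prefix))
    elif obj is not None:
        # Every leaf becomes a string, which is why numeric range
        # queries are not meaningful on a flattened field.
        value = obj if isinstance(obj, str) else json.dumps(obj)
        leaves.append((prefix, value))
    return leaves


def matches_term(leaves, value, key=None):
    """Mimic a term query on the whole field (key=None) or on one key."""
    return any(v == value and (key is None or k == key) for k, v in leaves)


labels = {
    "priority": "urgent",
    "release": ["v1.2.5", "v1.3.0"],
    "timestamp": {"created": 1541458026, "closed": 1541457010},
}

leaves = flatten_leaves(labels)
for path, value in leaves:
    print(path, "=", value)

print(matches_term(leaves, "urgent"))                 # True: matches any leaf
print(matches_term(leaves, "v1.3.0", key="release"))  # True: matches that key
print(matches_term(leaves, "urgent", key="release"))  # False: wrong key
```

Because timestamp.created ends up as the string "1541458026", it can only be compared as a keyword, which is the trade-off described above.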