对ES官网的reference的翻译,同时也是备忘,ES版本为7.5
下面是正文翻译,附上原文链接
https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started-aggregations.html
==================================================================================================
使用聚合分析结果
ES聚合能够让你获取关于搜索结果的元信息并能让你回答类似于"有多少Texas的开户者"或者“Tennessee的账户的平均余额是多少”的问题,你能够在同一个请求中搜索文档、过滤返回值并使用聚合来分析结果。
例如,下面的请求使用terms聚合对bank索引中的账户基于州分组并以降序返回账户最多的十个州:
curl http://9.25.176.228:8080/bank/_search?pretty
-H 'content-type:application/json'
-d '{
"size":0,
"aggs":{
"group_by_state":{
"terms":{
"field":"state.keyword"
}
}
}
}'
响应的buckets部分是state字段的值,doc_count显示的是每个州的账户数目,比如,你可以看到ID这个州有27个账户。由于我们设置了请求体中size=0,因此响应只会包含聚合的结果。
{
"took" : 7,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1000,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"group_by_state" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 743,
"buckets" : [
{
"key" : "TX",
"doc_count" : 30
},
{
"key" : "MD",
"doc_count" : 28
},
{
"key" : "ID",
"doc_count" : 27
},
{
"key" : "AL",
"doc_count" : 25
},
{
"key" : "ME",
"doc_count" : 25
},
{
"key" : "TN",
"doc_count" : 25
},
{
"key" : "WY",
"doc_count" : 25
},
{
"key" : "DC",
"doc_count" : 24
},
{
"key" : "MA",
"doc_count" : 24
},
{
"key" : "ND",
"doc_count" : 24
}
]
}
}
}
自己试了一下把请求体中的size=0去掉之后,除了返回上面聚合的结果还会返回前十个文档(比较发现这十个文档就是导入时在最前面的十个文档):
{
"took" : 69,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1000,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"account_number" : 1,
"balance" : 39225,
"firstname" : "Amber",
"lastname" : "Duke",
"age" : 32,
"gender" : "M",
"address" : "880 Holmes Lane",
"employer" : "Pyrami",
"email" : "amberduke@pyrami.com",
"city" : "Brogan",
"state" : "IL"
}
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "6",
"_score" : 1.0,
"_source" : {
"account_number" : 6,
"balance" : 5686,
"firstname" : "Hattie",
"lastname" : "Bond",
"age" : 36,
"gender" : "M",
"address" : "671 Bristol Street",
"employer" : "Netagy",
"email" : "hattiebond@netagy.com",
"city" : "Dante",
"state" : "TN"
}
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "13",
"_score" : 1.0,
"_source" : {
"account_number" : 13,
"balance" : 32838,
"firstname" : "Nanette",
"lastname" : "Bates",
"age" : 28,
"gender" : "F",
"address" : "789 Madison Street",
"employer" : "Quility",
"email" : "nanettebates@quility.com",
"city" : "Nogal",
"state" : "VA"
}
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "18",
"_score" : 1.0,
"_source" : {
"account_number" : 18,
"balance" : 4180,
"firstname" : "Dale",
"lastname" : "Adams",
"age" : 33,
"gender" : "M",
"address" : "467 Hutchinson Court",
"employer" : "Boink",
"email" : "daleadams@boink.com",
"city" : "Orick",
"state" : "MD"
}
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "20",
"_score" : 1.0,
"_source" : {
"account_number" : 20,
"balance" : 16418,
"firstname" : "Elinor",
"lastname" : "Ratliff",
"age" : 36,
"gender" : "M",
"address" : "282 Kings Place",
"employer" : "Scentric",
"email" : "elinorratliff@scentric.com",
"city" : "Ribera",
"state" : "WA"
}
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "25",
"_score" : 1.0,
"_source" : {
"account_number" : 25,
"balance" : 40540,
"firstname" : "Virginia",
"lastname" : "Ayala",
"age" : 39,
"gender" : "F",
"address" : "171 Putnam Avenue",
"employer" : "Filodyne",
"email" : "virginiaayala@filodyne.com",
"city" : "Nicholson",
"state" : "PA"
}
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "32",
"_score" : 1.0,
"_source" : {
"account_number" : 32,
"balance" : 48086,
"firstname" : "Dillard",
"lastname" : "Mcpherson",
"age" : 34,
"gender" : "F",
"address" : "702 Quentin Street",
"employer" : "Quailcom",
"email" : "dillardmcpherson@quailcom.com",
"city" : "Veguita",
"state" : "IN"
}
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "37",
"_score" : 1.0,
"_source" : {
"account_number" : 37,
"balance" : 18612,
"firstname" : "Mcgee",
"lastname" : "Mooney",
"age" : 39,
"gender" : "M",
"address" : "826 Fillmore Place",
"employer" : "Reversus",
"email" : "mcgeemooney@reversus.com",
"city" : "Tooleville",
"state" : "OK"
}
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "44",
"_score" : 1.0,
"_source" : {
"account_number" : 44,
"balance" : 34487,
"firstname" : "Aurelia",
"lastname" : "Harding",
"age" : 37,
"gender" : "M",
"address" : "502 Baycliff Terrace",
"employer" : "Orbalix",
"email" : "aureliaharding@orbalix.com",
"city" : "Yardville",
"state" : "DE"
}
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "49",
"_score" : 1.0,
"_source" : {
"account_number" : 49,
"balance" : 29104,
"firstname" : "Fulton",
"lastname" : "Holt",
"age" : 23,
"gender" : "F",
"address" : "451 Humboldt Street",
"employer" : "Anocha",
"email" : "fultonholt@anocha.com",
"city" : "Sunriver",
"state" : "RI"
}
}
]
},
"aggregations" : {
"group_by_state" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 743,
"buckets" : [
{
"key" : "TX",
"doc_count" : 30
},
{
"key" : "MD",
"doc_count" : 28
},
{
"key" : "ID",
"doc_count" : 27
},
{
"key" : "AL",
"doc_count" : 25
},
{
"key" : "ME",
"doc_count" : 25
},
{
"key" : "TN",
"doc_count" : 25
},
{
"key" : "WY",
"doc_count" : 25
},
{
"key" : "DC",
"doc_count" : 24
},
{
"key" : "MA",
"doc_count" : 24
},
{
"key" : "ND",
"doc_count" : 24
}
]
}
}
}
你也可以通过组合聚合来为你的数据构造更复杂的摘要。例如,下面的请求在之前的group_by_state聚合内部嵌套了一个avg聚合来计算每个州的平均账户余额:
curl http://9.25.176.228:8080/bank/_search?pretty
-H 'content-type:application/json'
-d '{
"aggs":{
"group_by_state":{
"terms":{
"field":"state.keyword"
},
"aggs":{
"average_balance":{
"avg":{
"field":"balance"
}
}
}
}
}
}'
{
"took" : 11,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1000,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"account_number" : 1,
"balance" : 39225,
"firstname" : "Amber",
"lastname" : "Duke",
"age" : 32,
"gender" : "M",
"address" : "880 Holmes Lane",
"employer" : "Pyrami",
"email" : "amberduke@pyrami.com",
"city" : "Brogan",
"state" : "IL"
}
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "6",
"_score" : 1.0,
"_source" : {
"account_number" : 6,
"balance" : 5686,
"firstname" : "Hattie",
"lastname" : "Bond",
"age" : 36,
"gender" : "M",
"address" : "671 Bristol Street",
"employer" : "Netagy",
"email" : "hattiebond@netagy.com",
"city" : "Dante",
"state" : "TN"
}
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "13",
"_score" : 1.0,
"_source" : {
"account_number" : 13,
"balance" : 32838,
"firstname" : "Nanette",
"lastname" : "Bates",
"age" : 28,
"gender" : "F",
"address" : "789 Madison Street",
"employer" : "Quility",
"email" : "nanettebates@quility.com",
"city" : "Nogal",
"state" : "VA"
}
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "18",
"_score" : 1.0,
"_source" : {
"account_number" : 18,
"balance" : 4180,
"firstname" : "Dale",
"lastname" : "Adams",
"age" : 33,
"gender" : "M",
"address" : "467 Hutchinson Court",
"employer" : "Boink",
"email" : "daleadams@boink.com",
"city" : "Orick",
"state" : "MD"
}
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "20",
"_score" : 1.0,
"_source" : {
"account_number" : 20,
"balance" : 16418,
"firstname" : "Elinor",
"lastname" : "Ratliff",
"age" : 36,
"gender" : "M",
"address" : "282 Kings Place",
"employer" : "Scentric",
"email" : "elinorratliff@scentric.com",
"city" : "Ribera",
"state" : "WA"
}
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "25",
"_score" : 1.0,
"_source" : {
"account_number" : 25,
"balance" : 40540,
"firstname" : "Virginia",
"lastname" : "Ayala",
"age" : 39,
"gender" : "F",
"address" : "171 Putnam Avenue",
"employer" : "Filodyne",
"email" : "virginiaayala@filodyne.com",
"city" : "Nicholson",
"state" : "PA"
}
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "32",
"_score" : 1.0,
"_source" : {
"account_number" : 32,
"balance" : 48086,
"firstname" : "Dillard",
"lastname" : "Mcpherson",
"age" : 34,
"gender" : "F",
"address" : "702 Quentin Street",
"employer" : "Quailcom",
"email" : "dillardmcpherson@quailcom.com",
"city" : "Veguita",
"state" : "IN"
}
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "37",
"_score" : 1.0,
"_source" : {
"account_number" : 37,
"balance" : 18612,
"firstname" : "Mcgee",
"lastname" : "Mooney",
"age" : 39,
"gender" : "M",
"address" : "826 Fillmore Place",
"employer" : "Reversus",
"email" : "mcgeemooney@reversus.com",
"city" : "Tooleville",
"state" : "OK"
}
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "44",
"_score" : 1.0,
"_source" : {
"account_number" : 44,
"balance" : 34487,
"firstname" : "Aurelia",
"lastname" : "Harding",
"age" : 37,
"gender" : "M",
"address" : "502 Baycliff Terrace",
"employer" : "Orbalix",
"email" : "aureliaharding@orbalix.com",
"city" : "Yardville",
"state" : "DE"
}
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "49",
"_score" : 1.0,
"_source" : {
"account_number" : 49,
"balance" : 29104,
"firstname" : "Fulton",
"lastname" : "Holt",
"age" : 23,
"gender" : "F",
"address" : "451 Humboldt Street",
"employer" : "Anocha",
"email" : "fultonholt@anocha.com",
"city" : "Sunriver",
"state" : "RI"
}
}
]
},
"aggregations" : {
"group_by_state" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 743,
"buckets" : [
{
"key" : "TX",
"doc_count" : 30,
"average_balance" : {
"value" : 26073.3
}
},
{
"key" : "MD",
"doc_count" : 28,
"average_balance" : {
"value" : 26161.535714285714
}
},
{
"key" : "ID",
"doc_count" : 27,
"average_balance" : {
"value" : 24368.777777777777
}
},
{
"key" : "AL",
"doc_count" : 25,
"average_balance" : {
"value" : 25739.56
}
},
{
"key" : "ME",
"doc_count" : 25,
"average_balance" : {
"value" : 21663.0
}
},
{
"key" : "TN",
"doc_count" : 25,
"average_balance" : {
"value" : 28365.4
}
},
{
"key" : "WY",
"doc_count" : 25,
"average_balance" : {
"value" : 21731.52
}
},
{
"key" : "DC",
"doc_count" : 24,
"average_balance" : {
"value" : 23180.583333333332
}
},
{
"key" : "MA",
"doc_count" : 24,
"average_balance" : {
"value" : 29600.333333333332
}
},
{
"key" : "ND",
"doc_count" : 24,
"average_balance" : {
"value" : 26577.333333333332
}
}
]
}
}
}
从上面的响应可以发现,返回的结果都是按照doc_count的值对返回的结果进行排序的,当然你也可以不这么做,你可以通过在terms聚合中指定使用嵌套聚合中的结果来排序:
curl http://9.25.176.228:8080/bank/_search?pretty
-H 'content-type:application/json'
-d '{
"size":0,
"aggs":{
"group_by_state":{
"terms":{
"field":"state.keyword",
"order":{
"average_balance":"desc"
}
},
"aggs":{
"average_balance":{
"avg":{
"field":"balance"
}
}
}
}
}
}'
观察下面的响应,发现返回结果确实是按照余额的平均值排序的。
{
"took" : 22,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1000,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"group_by_state" : {
"doc_count_error_upper_bound" : -1,
"sum_other_doc_count" : 827,
"buckets" : [
{
"key" : "CO",
"doc_count" : 14,
"average_balance" : {
"value" : 32460.35714285714
}
},
{
"key" : "NE",
"doc_count" : 16,
"average_balance" : {
"value" : 32041.5625
}
},
{
"key" : "AZ",
"doc_count" : 14,
"average_balance" : {
"value" : 31634.785714285714
}
},
{
"key" : "MT",
"doc_count" : 17,
"average_balance" : {
"value" : 31147.41176470588
}
},
{
"key" : "VA",
"doc_count" : 16,
"average_balance" : {
"value" : 30600.0625
}
},
{
"key" : "GA",
"doc_count" : 19,
"average_balance" : {
"value" : 30089.0
}
},
{
"key" : "MA",
"doc_count" : 24,
"average_balance" : {
"value" : 29600.333333333332
}
},
{
"key" : "IL",
"doc_count" : 22,
"average_balance" : {
"value" : 29489.727272727272
}
},
{
"key" : "NM",
"doc_count" : 14,
"average_balance" : {
"value" : 28792.64285714286
}
},
{
"key" : "LA",
"doc_count" : 17,
"average_balance" : {
"value" : 28791.823529411766
}
}
]
}
}
}
除了上述的基本的分桶(bucketing)和指标(metrics)聚合,ES还提供来特殊的聚合以便对多个域进行操作并分析特定类型的数据,比如日期、ip地址以及地理数据。你也可以将单个聚合的结果输入到流水线级的聚合中做进一步分析。
由聚合提供的核心分析能力使得利用机器学习检测异常这种先进的特性成为可能(也就是说,聚合提供的分析能力是机器学习的基础)。