聚合查询
在使用关系型数据库的时候,常常会用到group by等操作进行分组求和、求平均的操作。而Elasticsearch同样提供了分组计算的能力。
关系型数据库中使用Group by进行分组计算在Elasticsearch中成为桶聚合。
数据准备
使用聚合查询之前几篇的数据可能不太适合,现在重新模拟一些数据。
新建索引
PUT localhost:9200/employee
新建映射
PUT localhost:9200/employee/_mapping
{
"properties": {
"name": {
"type": "keyword"
},
"age": {
"type": "integer"
},
"group": {
"type": "keyword"
},
"entry_time": {
"type": "date"
}
}
}
新建数据
public static String INDEX = "employee";
public static String[] GROUP = new String[] {
"甲","乙","丙","丁"};
public static int[] AGE = new int[] {
15,20,25,30,35,40,45,50};
public static String[] TIME =
new String[] {
"2019-01-01","2019-01-15","2019-02-01","2019-02-15",
"2019-03-01","2019-03-15","2019-04-01","2019-04-15",
"2019-05-01","2019-05-15","2019-06-01","2019-06-15",
"2019-07-01","2019-07-15","2019-08-01","2019-08-15",
"2019-09-01","2019-09-15","2019-10-01","2019-10-15",
"2019-11-01","2019-11-15","2019-12-01","2019-12-15"};
public static void createData() throws IOException {
BulkRequest request = new BulkRequest();
for (int i = 0; i < 100; i++) {
request.add(
new IndexRequest(INDEX)
.id(String.valueOf(i+1))
.source(getBuilder(i))
);//使用SMILE格式添加索引请求
}
getClient().bulk(request, RequestOptions.DEFAULT);
}
public static XContentBuilder getBuilder(int i) throws IOException {
XContentBuilder builder = XContentFactory.jsonBuilder();
builder.startObject();
builder.field("name", GROUP[i%4] + i);
builder.field("age", AGE[i%8]);
builder.field("group", GROUP[i%4]);
builder.field("entry_time", TIME[i%24]);
builder.endObject();
return builder;
}
聚合查询
求平均值
现在模拟求所属甲
分组age
字段的平均值。可以使用下面的命令。聚合查询中这些常用的计算和SQL中是类似的avg是求平均
、sum求和
、max求最
大、min求最小
。这里我只贴出了avg
的代码
请求
POST localhost:9200/employee/_search
参数
Elasticsearch对于数据的聚合分析,都是使用aggs
来完成每个桶内的数据的分组其中aggQuery
是每次计算的一个别名,用来取出计算结果的时候使用。
{
"query": {
"term": {
"group": {
"value": "甲"
}
}
},
"aggs": {
"aggQuery": {
"avg": {
"field": "age"
}
}
},
"size": 0
}
响应结果
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 25,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"aggQuery": {
"value": 24.6
}
}
}
java代码
Elasticsearch对于数据的分组,都是使用
// 平均
public void avg() throws IOException {
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(QueryBuilders.termQuery("group","甲"));
AvgAggregationBuilder agg = AggregationBuilders
.avg("aggQuery")
.field("age");
sourceBuilder.aggregation(agg);
SearchRequest request = new SearchRequest(INDEX);
request.source(sourceBuilder);
SearchResponse search =
getClient().search(request, RequestOptions.DEFAULT);
if (search.getShardFailures().length == 0) {
ParsedAvg parsed = search.getAggregations().get("aggQuery");
double value = parsed.getValue();
System.out.println("rest:" + value);
System.out.println("do something");
}
}
求总
value_count
的操作可以获取在该字段上非空的数据的条目数。
请求
POST localhost:9200/employee/_search
参数
{
"query": {
"term": {
"group": {
"value": "甲"
}
}
},
"aggs": {
"aggQuery": {
"value_count": {
"field": "age"
}
}
},
"size": 0
}
响应结果
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 25,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"aggQuery": {
"value": 25
}
}
}
java代码
// 统计
public void count() throws IOException {
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(QueryBuilders.termQuery("group","甲"));
ValueCountAggregationBuilder agg = AggregationBuilders
.count("aggQuery")
.field("age");
sourceBuilder.aggregation(agg);
SearchRequest request = new SearchRequest(INDEX);
request.source(sourceBuilder);
SearchResponse search =
getClient().search(request, RequestOptions.DEFAULT);
if (search.getShardFailures().length == 0) {
ParsedValueCount parsed = search.getAggregations().get("aggQuery");
double value = parsed.getValue();
System.out.println("rest:" + value);
System.out.println("do something");
}
}
去重
其用法类似SQL中的 distinct 或者unique 值的数目。是有种获取数据去重后的结果的方法。
下面的内容尝试求出现在模拟求所属甲
分组age
字段后去重的内容,aggQuery
是每次计算的一个别名,用来取出计算结果的时候使用。下面的内容在忽略query条件后可能类似这个样子:
SELECT COUNT (DISTINCT age) FROM employee
请求
POST localhost:9200/employee/_search
参数
{
"query": {
"term": {
"group": {
"value": "甲"
}
}
},
"aggs": {
"aggQuery": {
"cardinality": {
"field": "age"
}
}
},
"size": 0
}
响应结果
在hits
中可以看到其命中的文档数量是25.但是在aggregations
的aggQuery
去重后只有2个数据。
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,<