note
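TDigest is the algorithm ES currently uses to compute percentiles. Its characteristics:
- Accuracy is proportional to q(1-q), so extreme percentiles are more accurate than non-extreme ones; for example, the 99th percentile is more accurate than the 50th.
- For small data sets the accuracy is very high; when the data set is small enough, it can reach 100%.
- The algorithm trades memory consumption against accuracy. Because the error depends on the data distribution and the data volume, it is hard to state an exact error level.
- Both percentiles and median (median_absolute_deviation in the examples here) rely on TDigest.
- Metrics Aggregations: compute values from the data.
- Bucket Aggregations: involve no calculation of their own; they only decide whether a doc meets a condition. By default a response may contain at most 10000 buckets, configurable via search.max_buckets.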
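To make the q(1-q) remark concrete, the short sketch below evaluates that factor for a few quantiles. The function name `tdigest_error_factor` is made up for this note; it only reproduces the arithmetic and is not part of any ES or TDigest API.

```python
def tdigest_error_factor(q):
    """Relative size of the q(1-q) term for quantile q (0 <= q <= 1)."""
    return q * (1 - q)


# The factor peaks at the median and shrinks toward the tails.
for q in (0.5, 0.9, 0.99):
    print(q, round(tdigest_error_factor(q), 4))
```

The value is largest at q = 0.5 and much smaller at q = 0.99, which is why the tail percentiles come out more accurate than the median.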
avg && median
POST kibana_sample_data_logs/_search?size=0
{
"aggs": {
"avg_grade": {
"avg":{
"field":"bytes",
"missing":10
}
}
}
}
POST kibana_sample_data_logs/_search?size=0
{
"aggs": {
"avg_grade": {
"avg":{
"script":{
"source":"doc.bytes"
}
}
}
}
}
POST kibana_sample_data_logs/_search?size=0
{
"aggs": {
"avg_grade": {
"avg":{
"field":"bytes",
"script":{
"lang":"painless",
"source":"_value+params.correction",
"params":{
"correction":10
}
}
}
}
}
}
GET /kibana_sample_data_flights/_search?size=0
{
"aggs": {
"flight_time_avg": {
"avg": {
"field":"FlightTimeMin"
}
},
"flight_time_median":{
"median_absolute_deviation":{
"field":"FlightTimeMin",
"missing":100
}
}
}
}
GET /kibana_sample_data_flights/_search?size=0
{
"aggs": {
"flight_time_median":{
"median_absolute_deviation":{
"script":{
"lang":"painless",
"source":"doc['FlightTimeMin'].value * params.factor",
"params":{
"factor":1.2
}
}
}
}
}
}
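Note that median_absolute_deviation is not the median itself: it is the median of the absolute distances from the median. A minimal exact version in Python, for intuition only (ES approximates it, and the sample numbers are invented):

```python
from statistics import median


def median_absolute_deviation(values):
    """Exact MAD: median(|x - median(x)|)."""
    center = median(values)
    return median([abs(v - center) for v in values])


flight_times = [1, 2, 3, 4, 100]  # hypothetical sample values
print(median(flight_times))                     # 3
print(median_absolute_deviation(flight_times))  # 1
```

The outlier 100 barely moves the result, which is the usual reason to prefer this measure over the standard deviation.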
weighted_avg: ∑(value * weight) / ∑(weight)
POST /exams/_doc?refresh
{
"grade": [1, 2, 3],
"weight": 2
}
GET kibana_sample_data_ecommerce/_search
{
"query": {"match_all": {}}
}
# (1*2+2*2+3*2)/(2+2+2)=2
POST /exams/_search
{
"size": 0,
"aggs" : {
"weighted_grade": {
"weighted_avg": {
"value": {
"field": "grade",
"missing":2
},
"weight": {
"field": "weight",
"missing":2
}
}
}
}
}
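The arithmetic in the comment above can be checked with a small Python sketch. It assumes every value of a multi-valued grade field is paired with the single weight of its document; `weighted_avg` here is a local helper, not an ES client call.

```python
def weighted_avg(docs):
    """Compute sum(value * weight) / sum(weight) over all values of all docs."""
    numerator = 0.0
    denominator = 0.0
    for doc in docs:
        grades = doc["grade"] if isinstance(doc["grade"], list) else [doc["grade"]]
        for grade in grades:
            numerator += grade * doc["weight"]
            denominator += doc["weight"]
    return numerator / denominator


docs = [{"grade": [1, 2, 3], "weight": 2}]
print(weighted_avg(docs))  # 2.0
```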
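# Expected (2*3+3*3+4*3)/(3+3+3) = 3, but the output is 2: doc.grade.value returns only the first value of the multi-valued field, so the result is (2*3)/3 = 2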
POST /exams/_search?explain=true
{
"size": 0,
"aggs" : {
"weighted_grade": {
"weighted_avg": {
"value": {
"script": "doc.grade.value + 1"
},
"weight": {
"script": "doc.weight.value + 1"
}
}
}
}
}
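A likely explanation for the result of 2 from the scripted request, sketched in Python. It assumes `.value` on a multi-valued field yields only the first (smallest) value, so the script sees one grade per document. The helper names are hypothetical.

```python
def first_value(field_values):
    # Assumption: doc values are sorted and .value returns only the first one.
    return sorted(field_values)[0]


def scripted_weighted_avg(docs):
    """Mimic value = doc.grade.value + 1 and weight = doc.weight.value + 1."""
    numerator = 0.0
    denominator = 0.0
    for doc in docs:
        value = first_value(doc["grade"]) + 1
        weight = first_value(doc["weight"]) + 1
        numerator += value * weight
        denominator += weight
    return numerator / denominator


docs = [{"grade": [1, 2, 3], "weight": [2]}]
print(scripted_weighted_avg(docs))  # (2*3)/3 = 2.0
```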
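cardinality: number of distinct values
note
- One way to get an exact count: add every value to a HashSet, then return its size.
- The cardinality aggregation is based on the HyperLogLog++ algorithm, which has these characteristics:
- Configurable precision.
- Very accurate when the number of distinct values is small.
- Memory use depends only on the configured precision, no matter how many values there are. With precision c, the computation needs about c*8 bytes.
- For string fields with high cardinality, hash values can be computed at index time with the mapper-murmur3 plugin, and the aggregation can then run on them.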
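The contrast between exact counting and a fixed-memory estimate can be sketched in Python. This is plain HyperLogLog, not the HyperLogLog++ variant ES uses, and the parameter `p` (number of register bits) and all function names belong to this sketch only; how ES maps its precision setting to registers is not shown.

```python
import hashlib
import math


def exact_cardinality(values):
    """Exact distinct count; memory grows with the number of distinct values."""
    return len(set(values))


def hll_memory_bytes(precision):
    """Memory figure from the note above: about c * 8 bytes for precision c."""
    return precision * 8


def hll_estimate(values, p=12):
    """Plain HyperLogLog estimate using 2**p registers."""
    m = 1 << p
    registers = [0] * m  # fixed size, independent of how many values arrive
    for v in values:
        digest = hashlib.sha1(str(v).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")     # 64-bit hash
        index = h >> (64 - p)                     # first p bits pick a register
        rest = h & ((1 << (64 - p)) - 1)          # remaining 64 - p bits
        rank = (64 - p) - rest.bit_length() + 1   # position of the leftmost 1-bit
        registers[index] = max(registers[index], rank)
    alpha = 0.7213 / (1 + 1.079 / m)
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        return m * math.log(m / zeros)  # linear counting for small cardinalities
    return raw


values = [f"city-{i % 10000}" for i in range(50000)]
print(exact_cardinality(values))  # 10000
print(hll_memory_bytes(3000))     # 24000
print(hll_estimate(values))       # an approximation near 10000, not exact
```

The register list keeps the same length however many values are fed in, which is the property behind the fixed memory cost described above.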
demo
POST /kibana_sample_data_flights/_search?size=0
{
"aggs": {
"host_count": {
"cardinality": {
"field": "OriginCityName",
"missing":"N/A"
}
}
}
}
POST /kibana_sample_data_flights/_search?size=0
{
"aggs": {
"type_promoted_count": {
"cardinality": {
"script": {
"lang": "painless",
"source": "doc['OriginCountry'].value + '-' + doc['OriginCityName'].value + ' ' + doc['DestCountry'].value + '-' + doc['DestCityName'].value"
}
}
}
}
}
POST _scripts/test_template
{
"script": {
"lang": "painless",
"source": "doc['{{country1}}'].valu