Elasticsearch Sorting

Originally published 2025-06-05 00:59:46

Licensed under CC 4.0 BY-SA

Level 1: Elasticsearch Sorting

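Task description
Task for this level: query the users whose father is older than 50, and sort the results by the user's age in ascending order.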

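Background
To sort by relevance, relevance has to be expressed as a number. In Elasticsearch the relevance score is a floating-point number returned in the search results as _score, and by default results are sorted by _score in descending order.

First, let's load some data: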

curl -H 'Content-Type: application/json' -XPUT 'http://localhost:9200/us/tweet/4?pretty' -d '
{
"date" : "2014-09-14",
"name" : "John Smith",
"tweet" : "@mary it is not just text, it does everything",
"user_id" : 1
}
'
curl -H 'Content-Type: application/json' -XPUT 'http://localhost:9200/us/tweet/6?pretty' -d '
{
"date" : "2014-09-16",
"name" : "John Smith",
"tweet" : "The Elasticsearch API is really easy to use",
"user_id" : 1
}
'
curl -H 'Content-Type: application/json' -XPUT 'http://localhost:9200/us/tweet/8?pretty' -d '
{
"date" : "2014-09-18",
"name" : "John Smith",
"user_id" : 1
}
'
curl -H 'Content-Type: application/json' -XPUT 'http://localhost:9200/us/tweet/10?pretty' -d '
{
"date" : "2014-09-20",
"name" : "John Smith",
"tweet" : "Elasticsearch surely is one of the hottest new NoSQL products",
"user_id" : 1
}
'
curl -H 'Content-Type: application/json' -XPUT 'http://localhost:9200/us/tweet/12?pretty' -d '
{
"date" : "2014-09-22",
"name" : "John Smith",
"tweet" : "Elasticsearch and I have left the honeymoon stage, and I still love her.",
"user_id" : 1
}
'
curl -H 'Content-Type: application/json' -XPUT 'http://localhost:9200/us/tweet/14?pretty' -d '
{
"date" : "2014-09-24",
"name" : "John Smith",
"tweet" : "How many more cheesy tweets do I have to write?",
"user_id" : 1
}
'
Sometimes the relevance score means nothing to you. For example, the following query returns every document whose user_id field contains 1:

{
  "query" : {
    "bool" : {
      "filter" : {
        "term" : {
          "user_id" : 1
        }
      }
    }
  }
}
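There is no meaningful score here: because we are using a filter, we are only asking for the documents that match user_id: 1, not trying to determine how relevant they are. The documents are returned in effectively random order, and each one is given a score of zero.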

Sorting by field values
In this case it makes sense to sort the tweets by date, with the most recent tweets first. We can do this with the sort parameter:

curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d '
{
  "query" : {
    "bool" : {
      "filter" : { "term" : { "user_id" : 1 }}
    }
  },
  "sort": { "date": { "order": "desc" }}
}
'
Result:

{
...
"hits" : {
"total" : 6,
"max_score" : null,
"hits" : [
{
"_id" : "14",
"_score" : null,
"_source" : {
"date" : "2014-09-24",
"name" : "John Smith",
"tweet" : "How many more cheesy tweets do I have to write?",
"user_id" : 1
},
"sort" : [
1411516800000
]
},
{
"_id" : "12",
"_score" : null,
"_source" : {
"date" : "2014-09-22",
"name" : "John Smith",
"tweet" : "Elasticsearch and I have left the honeymoon stage, and I still love her.",
"user_id" : 1
},
"sort" : [
1411344000000
]
},
{
"_id" : "10",
"_score" : null,
"_source" : {
"date" : "2014-09-20",
"name" : "John Smith",
"tweet" : "Elasticsearch surely is one of the hottest new NoSQL products",
"user_id" : 1
},
"sort" : [
1411171200000
]
}
...
]
}
}
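The result shows that _score is not calculated, because it is not used for sorting. The value of the date field is expressed as milliseconds since the epoch (January 1, 1970 00:00:00 UTC) and is returned in the sort field.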

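First, each result contains a new element named sort, which holds the value that was used for sorting. In this case we sorted on date, which is indexed internally as milliseconds since the epoch. The long value 1411516800000 is equivalent to the date string 2014-09-24 00:00:00 UTC.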
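
To double-check the sort values in the response, the conversion between epoch milliseconds and UTC dates can be reproduced offline. The following is a minimal Python sketch; the helper names millis_to_utc and utc_date_to_millis are made up for this illustration and are not part of any Elasticsearch client.

```python
from datetime import datetime, timezone


def millis_to_utc(ms: int) -> datetime:
    """Convert epoch milliseconds (as found in a hit's sort array) to a UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def utc_date_to_millis(date_str: str) -> int:
    """Convert a yyyy-MM-dd string, read as midnight UTC, to epoch milliseconds."""
    dt = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp()) * 1000


# Sort value of the first hit in the result above.
print(millis_to_utc(1411516800000).strftime("%Y-%m-%d %H:%M:%S %Z"))
# Sort value expected for the tweet dated 2014-09-22.
print(utc_date_to_millis("2014-09-22"))
```

The second call should reproduce 1411344000000, the sort value shown for the document with _id 12.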

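Second, the _score and max_score fields are both null. Calculating _score can be expensive, and it is usually needed only for sorting; since we are not sorting by relevance, tracking _score makes no sense. If you want _score to be calculated regardless, set the track_scores parameter to true.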
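
As a rough illustration of where track_scores goes, the sketch below builds the search request body as a Python dict and prints it as JSON. The function name build_sorted_search is an assumption made for this example; only the body keys (query, sort, track_scores) come from the Elasticsearch search API. The printed JSON could be passed to curl with -d in the same way as the requests above.

```python
import json


def build_sorted_search(user_id: int, track_scores: bool = False) -> dict:
    """Build a search body that filters on user_id and sorts by date, newest first."""
    body = {
        "query": {"bool": {"filter": {"term": {"user_id": user_id}}}},
        "sort": [{"date": {"order": "desc"}}],
    }
    if track_scores:
        # Top-level flag: ask Elasticsearch to compute _score even though
        # the results are sorted on a field rather than on relevance.
        body["track_scores"] = True
    return body


print(json.dumps(build_sorted_search(1, track_scores=True), indent=2))
```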

Multi-level sorting
Suppose we want to combine date and _score in one query, so that the matching results are sorted first by date and then by relevance:

curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d '
{
  "query" : {
    "bool" : {
      "must": { "match": { "tweet": "manage text search" }},
      "filter" : { "term" : { "user_id" : 2 }}
    }
  },
  "sort": [
    { "date": { "order": "desc" }},
    { "_score": { "order": "desc" }}
  ]
}
'
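The order of the sort criteria matters. Results are sorted by the first criterion; only results whose first sort value is identical are then ordered by the second criterion, and so on.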

Multi-level sorting does not have to involve _score. You can sort on several different fields, on geo-distance, or on a value computed by a script.
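
The tie-breaking rule can be mimicked locally to see what order to expect. This is only a simulation in plain Python, not how Elasticsearch sorts internally; the four hits and their scores are invented for the illustration.

```python
# Hypothetical hits: two dates, each with two different relevance scores.
hits = [
    {"_id": "a", "date": "2014-09-22", "_score": 0.4},
    {"_id": "b", "date": "2014-09-24", "_score": 0.2},
    {"_id": "c", "date": "2014-09-22", "_score": 0.9},
    {"_id": "d", "date": "2014-09-24", "_score": 0.7},
]


def multi_level_sort(docs):
    """Sort by date descending, then by _score descending within equal dates."""
    # Both criteria are descending, so a single tuple key with reverse=True is enough.
    # ISO yyyy-MM-dd strings compare in chronological order.
    return sorted(docs, key=lambda h: (h["date"], h["_score"]), reverse=True)


ordered_ids = [h["_id"] for h in multi_level_sort(hits)]
print(ordered_ids)  # ['d', 'b', 'c', 'a']
```

The score only decides the order inside each date group: d precedes b, and c precedes a, but no hit from 2014-09-22 can overtake one from 2014-09-24.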

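Sorting on multi-valued fields
Another case is sorting on a field that holds more than one value. Keep in mind that these values have no inherent order; a multi-valued field is just a bag of values. Which one should be used for sorting?

For numbers and dates, you can reduce a multi-valued field to a single value with the min, max, avg, or sum sort modes. For example, you could sort on the earliest date in each dates field as follows: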

"sort": {
  "dates": {
    "order": "asc",
    "mode": "min"
  }
}
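
To make the effect of the mode setting concrete, here is a small local simulation in Python. It is a sketch of the idea only (reduce each document's values to one number, then sort on that number); the documents, the SORT_MODES table, and the function sort_by_multi_valued are assumptions made for this example, and the dates are stored as epoch milliseconds to mirror how Elasticsearch indexes them.

```python
from statistics import mean

# Each sort mode collapses a list of values into a single sort key.
SORT_MODES = {"min": min, "max": max, "avg": mean, "sum": sum}

# Hypothetical documents; values are epoch milliseconds at midnight UTC.
docs = [
    {"_id": "x", "dates": [1411516800000, 1411171200000]},  # 2014-09-24, 2014-09-20
    {"_id": "y", "dates": [1411344000000]},                 # 2014-09-22
    {"_id": "z", "dates": [1411257600000, 1411430400000]},  # 2014-09-21, 2014-09-23
]


def sort_by_multi_valued(documents, field, mode="min", order="asc"):
    """Sort documents on a multi-valued field reduced with the given mode."""
    reduce_values = SORT_MODES[mode]
    return sorted(
        documents,
        key=lambda d: reduce_values(d[field]),
        reverse=(order == "desc"),
    )


by_earliest = [d["_id"] for d in sort_by_multi_valued(docs, "dates", mode="min")]
by_latest = [d["_id"] for d in sort_by_multi_valued(docs, "dates", mode="max")]
print(by_earliest)  # ['x', 'z', 'y']
print(by_latest)    # ['y', 'z', 'x']
```

The same documents come back in opposite orders depending on the mode, which is why the mode should be chosen deliberately rather than left implicit.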
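Programming requirements
Following the hints, complete the code between Begin and End in the editor on the right so that it queries the users whose father is older than 50 and sorts them by the user's age in ascending order. Specifically:

The user type is parent, which belongs to the index user;

Query the users whose father's age (shgx.age) is greater than 50;

Finally, sort by the user's age (age) in ascending order;

The data has the following form: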

{
"id":1,
"name":"张三",
"age":18,
"shgx":{
"id":1,
"name":"老张",
"age":50,
"gx":"父亲"
}
}
Note: before clicking Evaluate, first run the following commands on the command line to start Elasticsearch:

su es
/opt/install/elasticsearch-6.5.4/bin/elasticsearch
Once Elasticsearch has started successfully, click Evaluate.

Test description
The platform will test the code you write:

Test input: none
Expected output:

{
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"hits" : [
{
"_index" : "user",
"_type" : "parent",
"_id" : "1",
"_source" : {
"id" : 1,
"name" : "张三",
"age" : 18,
"shgx" : {
"id" : 1,
"name" : "老张",
"age" : 52,
"gx" : "父亲"
}
},
"sort" : [
18
]
},
{
"_index" : "user",
"_type" : "parent",
"_id" : "3",
"_source" : {
"id" : 3,
"name" : "王五",
"age" : 19,
"shgx" : {
"id" : 3,
"name" : "老王",
"age" : 51,
"gx" : "父亲"
}
},
"sort" : [
19
]
},
{
"_index" : "user",
"_type" : "parent",
"_id" : "2",
"_source" : {
"id" : 2,
"name" : "李四",
"age" : 25,
"shgx" : {
"id" : 2,
"name" : "老李",
"age" : 60,
"gx" : "父亲"
}
},
"sort" : [
25
]
}
]
}
}

Code

#!/bin/bash
# Write your command here
# ********** Begin ********** #
curl -X GET "localhost:9200/user/parent/_search?pretty" -H 'Content-Type: application/json' -d'
{
    "query" : {
        "bool" : {
            "filter" : { 
                "range" : {
                    "shgx.age" : {
                        "gt":50
                    }
                }
            }
        }
    },
    "sort": { "age": { "order": "asc" }}
}
'
# ********** End ********** #
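
As a sanity check on the expected output, the same filter-then-sort logic can be replayed in plain Python without a running cluster. This is a sketch under assumptions: the first three users are taken from the expected output above, while the user with id 4 is hypothetical and was added only to show that gt excludes a father whose age is exactly 50. The function name fathers_older_than is likewise made up.

```python
users = [
    {"id": 2, "name": "李四", "age": 25, "shgx": {"name": "老李", "age": 60}},
    {"id": 1, "name": "张三", "age": 18, "shgx": {"name": "老张", "age": 52}},
    {"id": 3, "name": "王五", "age": 19, "shgx": {"name": "老王", "age": 51}},
    # Hypothetical boundary case, not part of the exercise data.
    {"id": 4, "name": "boundary", "age": 20, "shgx": {"name": "n/a", "age": 50}},
]


def fathers_older_than(docs, threshold):
    """Keep docs whose shgx.age is strictly greater than threshold, sorted by age ascending."""
    matched = [d for d in docs if d["shgx"]["age"] > threshold]  # same as range gt
    return sorted(matched, key=lambda d: d["age"])               # same as sort age asc


result_ids = [d["id"] for d in fathers_older_than(users, 50)]
print(result_ids)  # [1, 3, 2]
```

The ids come back in the order 1, 3, 2, matching the _id order in the expected output. Had the query used gte instead of gt, the hypothetical user 4 would also have matched.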