Some Common ES Commands

IT小鸟鸟

Last modified 2022-02-09 15:30:43


First published 2022-01-27 15:53:19

Original link: https://blog.youkuaiyun.com/u013111855/article/details/122719018


Common ES commands

All commands in this article are written for the Kibana console (Dev Tools), and every one of them has been tested and runs successfully.

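CRUD in the Kibana console

PUT: roughly like INSERT in SQL

DELETE: like DELETE in SQL

POST: roughly like UPDATE in SQL

GET: like SELECT in SQL

(This mapping is only approximate: as the examples below show, both PUT and POST can create a document or overwrite an existing one.)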

Basic commands


Check the cluster health

GET _cat/health

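List all indices in ES (`?v` adds column headers to the _cat output; `GET _all` returns the aliases, mappings and settings of every index)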

GET /_cat/indices?v

GET _all

Delete the index named eg_index

DELETE /eg_index

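Some ES settings

Set the maximum result window (index.max_result_window, default 10000), i.e. the largest value of from + size that a search on this index may request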

PUT /ecommerce/_settings
{
    "index": {
        "max_result_window": "50000000"
    }
}
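This setting matters because ES rejects any search whose from + size exceeds index.max_result_window (10000 unless changed). A minimal Python sketch of that rule, with a hypothetical helper name `fits_result_window`; it only mirrors the check locally and does not call ES:

```python
DEFAULT_MAX_RESULT_WINDOW = 10_000  # default value of index.max_result_window


def fits_result_window(from_: int, size: int,
                       max_result_window: int = DEFAULT_MAX_RESULT_WINDOW) -> bool:
    """Return True if a from/size search stays within the result window."""
    # ES checks the sum: the last requested hit must not lie past the window
    return from_ + size <= max_result_window


print(fits_result_window(9_990, 10))                                # True
print(fits_result_window(9_995, 10))                                # False
print(fits_result_window(9_995, 10, max_result_window=50_000_000))  # True
```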

View the index mapping

GET ecommerce/_mapping

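CRUD operations in ES

Insert data

Use PUT /index/type/id. (This /index/type/id form is the ES 6.x/7.x style; mapping types are deprecated in 7.x and removed in 8.x, where the equivalent path is /index/_doc/id.)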

PUT /ecommerce/product/1
{
  "name":"zhangsan",
  "customer_full_name":{"firstname":"zhang","lastname":"san"},
  "gender":"man"
}

PUT /ecommerce/product/2
{
  "name":"隔壁老王",
  "customer_full_name":{"firstname":"wang","lastname":"wu"},
  "gender":"man"
}

# POST also works; like PUT, it replaces the whole document with the new body
POST /ecommerce/product/1
{
  "name":"张三",
  "gender":"男"
}


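# With POST + _update, only the fields inside "doc" change and all other fields are kept (a partial update). A plain PUT or POST replaces the whole document instead (a full update), so any field missing from the new body is lost.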
POST /ecommerce/product/1/_update
{
  "doc":{
    "name":"张三"
  }
}
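To make the difference concrete, here is a small Python simulation of what ends up in _source in each case. It is a local model only (the function names are hypothetical and nothing is sent to ES), assuming that a partial update merges nested objects field by field and overwrites everything else:

```python
import copy


def full_replace(_old_source: dict, new_body: dict) -> dict:
    """PUT/POST /index/type/id: the new body becomes the entire _source."""
    return copy.deepcopy(new_body)


def partial_update(old_source: dict, doc: dict) -> dict:
    """POST /index/type/id/_update {"doc": ...}: merge doc into the stored _source."""
    merged = copy.deepcopy(old_source)
    for key, value in doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # nested objects are merged field by field
            merged[key] = partial_update(merged[key], value)
        else:
            # scalars, lists and new fields simply overwrite or get added
            merged[key] = copy.deepcopy(value)
    return merged


stored = {
    "name": "zhangsan",
    "customer_full_name": {"firstname": "zhang", "lastname": "san"},
    "gender": "man",
}
print(full_replace(stored, {"name": "张三", "gender": "男"}))  # customer_full_name is gone
print(partial_update(stored, {"name": "张三"}))                # other fields are kept
```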

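Note: when inserting data, if the index and type named in the request do not exist yet, ES creates them automatically (with a dynamically generated mapping).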

Delete data

DELETE /ecommerce/product/1

# Result
{
  "_index" : "ecommerce",
  "_type" : "product",
  "_id" : "1",
  "_version" : 11,
  "result" : "deleted",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 11,
  "_primary_term" : 2
}

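# _version is 11, not 1, because every write to this document (create, overwrite, update, delete) increments its version. As in HBase, the deletion is also not physical right away: the document is only marked as deleted and is actually removed later, when segments are merged.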

View all data (by default only the first 10 hits are returned)

GET /ecommerce/product/_search

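Query DSL

ES is mainly used for search and analytics, so the query DSL is central to working with it.

All the requests below are written in RESTful style.

Query DSL: Domain Specific Language, a language designed for one particular domain.

Before running any queries, insert some sample data: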

POST /ecommerce/product/13
{
  "base_price" : 24.99,
  "discount_percentage" : 0,
  "quantity" : 1,
  "manufacturer" : "Champion Arts",
  "tax_amount" : 0,
  "product_id" : 11238,
  "category" : "Women's Clothing",
  "sku" : "ZO0489604896",
  "taxless_price" : 24.99,
  "unit_discount_amount" : 0,
  "min_price" : 11.75,
  "discount_amount" : 0,
  "created_on" : "2016-12-25T21:59:02+00:00",
  "product_name" : "Denim dress - black denim",
  "price" : 24.99,
  "taxful_price" : 24.99,
  "base_unit_price" : 24.99
}


PUT /ecommerce/product/14
{
  "base_price" : 11.99,
  "discount_percentage" : 0,
  "quantity" : 1,
  "manufacturer" : "Elitelligence",
  "tax_amount" : 0,
  "product_id" : 6283,
  "category" : "Men's Clothing",
  "sku" : "ZO0549605496",
  "taxless_price" : 11.99,
  "unit_discount_amount" : 0,
  "min_price" : 6.35,
  "discount_amount" : 0,
  "created_on" : "2016-12-26T09:28:48+00:00",
  "product_name" : "Basic T-shirt - dark blue/white",
  "price" : 11.99,
  "taxful_price" : 11.99,
  "base_unit_price" : 11.99
}



PUT /ecommerce/product/17
{
  "base_price" : 24.99,
  "discount_percentage" : 0,
  "quantity" : 1,
  "manufacturer" : "Champion Arts",
  "tax_amount" : 0,
  "product_id" : 11238,
  "category" : "Women's Clothing",
  "sku" : "ZO0489604896",
  "taxless_price" : 24.99,
  "unit_discount_amount" : 0,
  "min_price" : 11.75,
  "discount_amount" : 0,
  "created_on" : "2016-12-25T21:59:02+00:00",
  "product_name" : "Denim dress - black denim",
  "price" : 24.99,
  "taxful_price" : 24.99,
  "base_unit_price" : 24.99
}



PUT /ecommerce/product/15
{
  "base_price" : 99.99,
  "discount_percentage" : 0,
  "quantity" : 1,
  "manufacturer" : "Low Tide Media",
  "tax_amount" : 0,
  "product_id" : 22794,
  "category" : "Women's Shoes",
  "sku" : "ZO0374603746",
  "taxless_price" : 99.99,
  "unit_discount_amount" : 0,
  "min_price" : 46.01,
  "discount_amount" : 0,
  "created_on" : "2016-12-25T22:32:10+00:00",
  "product_name" : "Boots - Midnight Blue",
  "price" : 99.99,
  "taxful_price" : 99.99,
  "base_unit_price" : 99.99
}

PUT /ecommerce/product/16
{
  "base_price" : 74.99,
  "discount_percentage" : 0,
  "quantity" : 1,
  "manufacturer" : "Primemaster",
  "tax_amount" : 0,
  "product_id" : 12304,
  "category" : "Women's Shoes",
  "sku" : "ZO0360303603",
  "taxless_price" : 74.99,
  "unit_discount_amount" : 0,
  "min_price" : 34.5,
  "discount_amount" : 0,
  "created_on" : "2016-12-25T22:58:05+00:00",
  "product_name" : "High heeled sandals - argento",
  "price" : 74.99,
  "taxful_price" : 74.99,
  "base_unit_price" : 74.99 
}

match_all

match_all matches every document; it is what ES runs when no query is given.

GET /ecommerce/product/_search
{
  "query":{
    "match_all": {
      
    }
  }
}

match (full-text search)

match is the standard query: it is used for almost every search, whether full-text or exact-value.

GET /ecommerce/product/_search
{
  "query":{
    "match": {
      "category" : "Clothing"
    }
  }
}

match_phrase (phrase match: all terms must appear in the same order, next to each other)

GET /ecommerce/product/_search
{
  "query":{
    "match_phrase": {
      "category":"Women's Clothing"
    }
  }
}

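With match, if the search text contains several words, it is first analyzed (split into terms); each term is searched independently and the results are combined as a union (OR). e.g.: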

"match": {
   "category":"Women's Clothing"
}

# The text is analyzed into the terms women's and clothing; documents matching either term are returned (union)
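The difference between match and match_phrase can be modeled locally in Python. This is only an approximation: `analyze` is a hypothetical helper that lowercases the text and splits on anything other than letters, digits, underscores and apostrophes, which is close to (but not the same as) the ES standard analyzer:

```python
import re


def analyze(text: str) -> list[str]:
    """Rough stand-in for the standard analyzer: lowercase, keep word chars and apostrophes."""
    return re.findall(r"[\w']+", text.lower())


def match(field_value: str, query: str) -> bool:
    """match with the default "or" operator: any query term in the field is enough."""
    field_terms = set(analyze(field_value))
    return any(term in field_terms for term in analyze(query))


def match_phrase(field_value: str, query: str) -> bool:
    """match_phrase: all query terms, in order, at adjacent positions."""
    tokens, phrase = analyze(field_value), analyze(query)
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


for category in ["Women's Clothing", "Men's Clothing", "Women's Shoes"]:
    print(category, match(category, "Women's Clothing"),
          match_phrase(category, "Women's Clothing"))
```

In this model, "Men's Clothing" and "Women's Shoes" both satisfy match (they share one term with the query) but not match_phrase.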

sort

Sort by price. sort is not part of query but a separate top-level key next to it, so a comma is needed after the query object.

GET /ecommerce/product/_search
{
  "query":{
    "match": {
      "category" : "Clothing"
    }
  },
  "sort":[
    {
      "price":{
        "order":"desc"
      }
    }
  ]
}

# Explanation
"order":"desc"  # sort by price in descending order

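When the request body is built from code, the comma point simply means that query and sort are two keys of the same top-level object. A minimal Python sketch; `sorted_search_body` is a hypothetical helper, and the resulting JSON can be sent with any HTTP or ES client:

```python
import json


def sorted_search_body(category: str, sort_field: str, order: str = "desc") -> dict:
    """Build the same body as the Kibana sort example (hypothetical helper)."""
    if order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")
    return {
        "query": {"match": {"category": category}},
        # sort is a top-level sibling of query, not nested inside it
        "sort": [{sort_field: {"order": order}}],
    }


print(json.dumps(sorted_search_body("Clothing", "price"), indent=2))
```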
from / size (pagination)

GET /ecommerce/product/_search
{
  "query":{
    "match": {
      "category" : "Clothing"
    }
  },
  "sort":[
    {
      "price":{
        "order":"desc"
      }
    }
  ],
  "from":0, 
  "size":2
}

# Explanation
"from":0  # offset of the first hit to return (0-based)
"size":2  # number of hits per page

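Since from is an offset rather than a page number, a UI that counts pages from 1 has to convert. A small Python sketch (the function name `page_params` is hypothetical):

```python
def page_params(page: int, page_size: int) -> dict:
    """Translate a 1-based page number into ES from/size values."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    # from is a 0-based offset into the sorted hits
    return {"from": (page - 1) * page_size, "size": page_size}


print(page_params(1, 2))  # {'from': 0, 'size': 2}
print(page_params(3, 2))  # {'from': 4, 'size': 2}
```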
_source (return only selected fields)

Often we do not need the whole document; a few fields are enough.

GET /ecommerce/product/_search
{
  "query":{
    "match_all": {
      
    }
  },
  "sort":[
    {
      "price":{
        "order":"desc"
      }
    }
  ],
  "_source":["price","min_price","base_price"]
}

# Explanation
"_source":["price","min_price","base_price"]  # return only the price-related fields

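What _source filtering does to each hit can be sketched in Python. ES also accepts wildcard patterns in _source (e.g. "*price"); this sketch assumes top-level fields only and uses `fnmatch` to imitate the wildcards, so nested paths such as customer_full_name.firstname are not handled. `filter_source` is a hypothetical name:

```python
from fnmatch import fnmatchcase


def filter_source(source: dict, includes: list[str]) -> dict:
    """Keep only top-level fields whose names match one of the include patterns."""
    return {
        field: value
        for field, value in source.items()
        if any(fnmatchcase(field, pattern) for pattern in includes)
    }


# a few fields of sample product 13
doc_13 = {"base_price": 24.99, "min_price": 11.75, "price": 24.99,
          "category": "Women's Clothing", "taxful_price": 24.99}
print(filter_source(doc_13, ["price", "min_price", "base_price"]))
print(filter_source(doc_13, ["*price"]))
```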
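Conditional queries

e.g.: find products whose category contains Women's Clothing and whose price is between 20 and 50 (inclusive)

Equivalent to: select * from product where category like '%Women''s Clothing%' and price >= 20 and price <= 50;

Since there are two conditions, they have to be combined with a bool query.

The bool query combines several clauses with Boolean logic. It supports these clauses:

must: every clause must match, like AND; the clauses contribute to the score.

filter: every clause must match, like must, but the clauses do not affect the score and can be cached.

must_not: none of the clauses may match, like NOT.

should: at least one clause should match, like OR. (If the bool query also has must or filter clauses, should clauses become optional and only raise the score, unless minimum_should_match is set.)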

GET /ecommerce/product/_search
{
  "query":{
    "bool": {
      "must": [
        {
          "match": {
            "category": "Women's Clothing"
          }
        }
      ],
      "filter": [
        {
          "range": {
            "price": {
              "gte": 20,
              "lte": 50
            }
          }
        }
      ]
    }
  }
}

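# must selects and scores the matching documents; filter then keeps only those with 20 <= price <= 50, without affecting the score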
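The bool query can be imitated on the five sample products in plain Python, which shows which documents survive each clause. This is a local sketch only: scoring is ignored, the term splitting is an assumed approximation of the standard analyzer, and `bool_search` is a hypothetical name:

```python
import re

# The five sample products inserted earlier (only the fields used here)
PRODUCTS = {
    "13": {"category": "Women's Clothing", "price": 24.99},
    "14": {"category": "Men's Clothing", "price": 11.99},
    "15": {"category": "Women's Shoes", "price": 99.99},
    "16": {"category": "Women's Shoes", "price": 74.99},
    "17": {"category": "Women's Clothing", "price": 24.99},
}


def terms(text: str) -> set[str]:
    # rough approximation of the standard analyzer
    return set(re.findall(r"[\w']+", text.lower()))


def bool_search(docs: dict, must_match: str, gte: float, lte: float) -> list[str]:
    hits = []
    for doc_id, doc in docs.items():
        # must: a match query, satisfied if any analyzed term is shared
        if not terms(doc["category"]) & terms(must_match):
            continue
        # filter: range on price, both bounds inclusive; no effect on scoring
        if not gte <= doc["price"] <= lte:
            continue
        hits.append(doc_id)
    return hits


print(bool_search(PRODUCTS, "Women's Clothing", 20, 50))  # ['13', '17']
```

Because match uses OR between terms, `bool_search(PRODUCTS, "Women's Shoes", 20, 50)` also returns ['13', '17'] in this model: those Women's Clothing products share the term women's.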

Highlighting the results

# Highlight the matched terms
GET /ecommerce/product/_search
{
  "query":{
    "match_phrase": {
      "category":"Women's Clothing"
    }
  },
  "highlight":{
    "fields": {
      "category": {}
    }
  }
}

"hits" : [
      {
        "_index" : "ecommerce",
        "_type" : "product",
        "_id" : "3",
        "_score" : 0.73748296,
        "_source" : {
          "base_price" : 24.99,
          "discount_percentage" : 0,
          "quantity" : 1,
          "manufacturer" : "Champion Arts",
          "tax_amount" : 0,
          "product_id" : 11238,
          "category" : "Women's Clothing",
          "sku" : "ZO0489604896",
          "taxless_price" : 24.99,
          "unit_discount_amount" : 0,
          "min_price" : 11.75,
          "discount_amount" : 0,
          "created_on" : "2016-12-25T21:59:02+00:00",
          "product_name" : "Denim dress - black denim",
          "price" : 24.99,
          "taxful_price" : 24.99,
          "base_unit_price" : 24.99
        },
        "highlight" : {
          "category" : [
            "<em>Women's</em> <em>Clothing</em>"
          ]
        }
      }
	]

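# The output includes:
"highlight" : {
    "category" : [
    "<em>Women's</em> <em>Clothing</em>"
    ]
}
<em> is the default highlight tag and can be replaced with custom tags through the pre_tags and post_tags options of highlight. For example, with <span style="color:red"> and </span> as the tags, the output becomes
<span style="color:red">Women's</span> <span style="color:red">Clothing</span>, which shows up in red when rendered on a web page.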

# Note: highlighting only marks the terms that matched the query, so here only Women's and Clothing can be highlighted
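A hedged Python sketch of both parts: a request body with custom tags, using the highlight options pre_tags and post_tags, and a hypothetical local `highlight` function that imitates the effect (each matched term is wrapped on its own, as in the <em> output). The term matching is a simplification and not the ES highlighter itself:

```python
import re

# Request-body fragment with custom tags (pre_tags/post_tags are ES highlight options)
HIGHLIGHT_BODY = {
    "query": {"match_phrase": {"category": "Women's Clothing"}},
    "highlight": {
        "pre_tags": ['<span style="color:red">'],
        "post_tags": ["</span>"],
        "fields": {"category": {}},
    },
}


def highlight(text: str, query: str, pre_tag: str = "<em>", post_tag: str = "</em>") -> str:
    """Local imitation: wrap every token of text that matches a query term."""
    query_terms = set(re.findall(r"[\w']+", query.lower()))

    def wrap(m: re.Match) -> str:
        token = m.group(0)
        return f"{pre_tag}{token}{post_tag}" if token.lower() in query_terms else token

    return re.sub(r"[\w']+", wrap, text)


print(highlight("Women's Clothing", "Women's Clothing"))
print(highlight("Women's Clothing", "Women's Clothing",
                HIGHLIGHT_BODY["highlight"]["pre_tags"][0],
                HIGHLIGHT_BODY["highlight"]["post_tags"][0]))
```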


Aggregations

Count the products in each category

GET /ecommerce/_search
{
  "aggs":{
    "group_by_category":{
      "terms": {
        "field": "category.keyword"
      }
    }
  }
}

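# Group by category and count the products in each category
group_by_category: the name of this aggregation; any name will do
field: use `category.keyword`. category is a text field: it is analyzed (split into terms), and text fields have no doc values and fielddata disabled by default, so they cannot be aggregated on. Dynamic mapping also creates the keyword sub-field `category.keyword`, which holds the whole, unanalyzed value with doc values, so it can be used for aggregations.

# The aggregation results appear at the bottom of the response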
  "aggregations" : {
    "group_by_category" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "Women's Clothing",
          "doc_count" : 11
        },
        {
          "key" : "Women's Shoes",
          "doc_count" : 4
        },
        {
          "key" : "Men's Clothing",
          "doc_count" : 1
        },
        {
          "key" : "Men's Shoes",
          "doc_count" : 1
        }
      ]
    }
  }
# This result usually matches what the business expects



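# To aggregate on category directly, you can instead set "fielddata" to true. This is generally not recommended: the aggregation then runs on analyzed terms, so a category made of several words is split into separate buckets, and fielddata is loaded into the JVM heap, where it can use a lot of memory.
# How to set it: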
PUT /ecommerce/_mapping
{
  "properties":{
    "category" :{
      "type" : "text",
      "fielddata":true
    }
  }
}
# After that, the terms aggregation works on category directly
GET /ecommerce/product/_search
{
  "aggs":{
    "group_by_category":{
      "terms": {
        "field": "category"
      }
    }
  }
}

# Result:
  "aggregations" : {
    "group_by_category" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "women's",
          "doc_count" : 15
        },
        {
          "key" : "clothing",
          "doc_count" : 12
        },
        {
          "key" : "shoes",
          "doc_count" : 5
        },
        {
          "key" : "men's",
          "doc_count" : 2
        }
      ]
    }
  }
# Clearly this no longer fits typical business needs: the buckets are individual terms rather than whole categories
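Why the two aggregations produce such different buckets can be reproduced with a Python Counter over the five sample products inserted earlier (so the counts are smaller than in the outputs above, which come from a larger data set). The term splitting is an assumed approximation of the standard analyzer, and breaking ties by key is this sketch's own choice:

```python
import re
from collections import Counter

CATEGORIES = ["Women's Clothing", "Men's Clothing", "Women's Shoes",
              "Women's Shoes", "Women's Clothing"]  # docs 13-17


def terms_agg(values) -> list[tuple[str, int]]:
    """Bucket doc counts, largest first (ties broken by key in this sketch)."""
    return sorted(Counter(values).items(), key=lambda kv: (-kv[1], kv[0]))


# category.keyword: each document contributes its whole, unanalyzed value
keyword_buckets = terms_agg(CATEGORIES)

# category with fielddata: each document contributes each distinct analyzed term once
text_buckets = terms_agg(
    term for value in CATEGORIES for term in set(re.findall(r"[\w']+", value.lower()))
)

print(keyword_buckets)
print(text_buckets)
```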

Count the products in each category and compute their average price

GET /ecommerce/_search
{
  "aggs":{
    "group_by_category":{
      "terms": {
        "field": "category.keyword"
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}
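The nested aggs (a terms bucket per category, with an avg metric computed inside each bucket) can be modeled in a few lines of Python over the five sample products. This is a local illustration with a hypothetical function name, not how ES computes the aggregation internally:

```python
from collections import defaultdict

# (category, price) of the five sample products, docs 13-17
PRODUCTS = [("Women's Clothing", 24.99), ("Men's Clothing", 11.99),
            ("Women's Shoes", 99.99), ("Women's Shoes", 74.99),
            ("Women's Clothing", 24.99)]


def avg_price_by_category(rows):
    """terms on category.keyword with an avg sub-aggregation on price."""
    groups = defaultdict(list)
    for category, price in rows:
        groups[category].append(price)
    return {
        category: {"doc_count": len(prices), "avg_price": sum(prices) / len(prices)}
        for category, prices in groups.items()
    }


for category, bucket in avg_price_by_category(PRODUCTS).items():
    print(category, bucket["doc_count"], round(bucket["avg_price"], 2))
```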

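range aggregation

The range aggregation puts documents into buckets by value ranges (unlike the range filter used in the bool query, which only selects documents within a range). Each range includes its from value and excludes its to value.

e.g.: find the documents in ecommerce that match Women's Clothing, bucket them by the given price ranges, group each price bucket by category, compute the average price of each group, and sort the groups by that average in descending order.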

GET /ecommerce/product/_search
{
  "query":{
    "match": {
      "category": "Women's Clothing"
    }
  },
  "aggs":{
    "range_in_price":{
      "range": {
        "field": "price",
        "ranges": [
          {
            "from": 0,
            "to": 30
          },
          {
            "from": 30,
            "to": 50
          },
          {
            "from": 50,
            "to": 100
          }
        ]
      },
      "aggs": {
        "group_in_category": {
          "terms": {
            "field": "category.keyword",
            "order": {
              "avg_price":"desc"
            }
          },
          "aggs": {
            "avg_price": {
              "avg": {
                "field": "price"
              }
            }
          }
        }
      }
    }
  }
}
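The nesting (price-range buckets, category buckets inside each, and an average inside each category, with category buckets ordered by that average) can be traced in Python over the five sample products. A local sketch: the function name is hypothetical, and the tuple keys are this sketch's own, not the key format ES returns:

```python
from collections import defaultdict

# docs 13-17; all five share a term with "Women's Clothing", so the match query keeps them
PRODUCTS = [("Women's Clothing", 24.99), ("Men's Clothing", 11.99),
            ("Women's Shoes", 99.99), ("Women's Shoes", 74.99),
            ("Women's Clothing", 24.99)]
RANGES = [(0, 30), (30, 50), (50, 100)]


def range_then_terms(rows, ranges):
    result = {}
    for lo, hi in ranges:
        # range bucket: from is inclusive, to is exclusive
        groups = defaultdict(list)
        for category, price in rows:
            if lo <= price < hi:
                groups[category].append(price)
        # terms sub-buckets per category, ordered by avg_price descending
        buckets = [(cat, sum(p) / len(p)) for cat, p in groups.items()]
        buckets.sort(key=lambda b: b[1], reverse=True)
        result[(lo, hi)] = buckets
    return result


for price_range, buckets in range_then_terms(PRODUCTS, RANGES).items():
    print(price_range, [(cat, round(avg, 2)) for cat, avg in buckets])
```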

Understanding the ES search response

GET /ecommerce/product/_search

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "ecommerce",
        "_type" : "product",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "name" : "zhangsan",
          "customer_full_name" : {
            "firstname" : "zhang",
            "lastname" : "san"
          },
          "gender" : "man"
        }
      }
    ]
  }
}

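took (line 2): the time ES spent executing the search, in milliseconds.
timed_out (line 3): whether the search timed out.
_shards (line 4): how many shards were searched, and how many of them succeeded, were skipped, or failed.
hits (line 10): the actual search results.
hits.total (line 11): an object with information about the total number of documents matching the query.
hits.total.value (line 12): the total hit count; it must be read together with hits.total.relation.
hits.total.relation (line 13): by default the count is exact only up to 10,000 hits. When relation is eq, value is the exact count; when it is gte, value is only a lower bound (at least that many documents match).
hits.hits (line 16): the array of matching documents (the first 10 by default).
sort: when the request specifies sort, each entry in hits.hits carries a sort array with its sort key values; without sort, hits are ordered by _score.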
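Reading these fields from code is plain JSON navigation. A Python sketch over a trimmed copy of the response above (customer_full_name omitted); `summarize` is a hypothetical helper, and the "+" suffix for a gte total is this sketch's own convention:

```python
import json

RESPONSE = """
{
  "took": 0, "timed_out": false,
  "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
  "hits": {
    "total": {"value": 1, "relation": "eq"},
    "max_score": 1.0,
    "hits": [{"_index": "ecommerce", "_type": "product", "_id": "1", "_score": 1.0,
              "_source": {"name": "zhangsan", "gender": "man"}}]
  }
}
"""


def summarize(response_text: str) -> dict:
    body = json.loads(response_text)
    total = body["hits"]["total"]
    return {
        "took_ms": body["took"],
        "timed_out": body["timed_out"],
        "failed_shards": body["_shards"]["failed"],
        # "eq" means exact; "gte" means value is only a lower bound
        "total": str(total["value"]) + ("" if total["relation"] == "eq" else "+"),
        "ids": [hit["_id"] for hit in body["hits"]["hits"]],
        "sources": [hit["_source"] for hit in body["hits"]["hits"]],
    }


print(summarize(RESPONSE))
```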
Understanding hits.total

If we add track_total_hits to the request and set it to true, the response reports the exact number of matching documents.

# Request body:
body = {
    "track_total_hits": True,
    "query": {"match_all": {}}
}

# Response
"total" : {
  "value" : 1,
  "relation" : "eq"
},

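How to set `track_total_hits`

So what is the most sensible setting for track_total_hits? It depends on the business requirements and the use case. Three rules of thumb:

Keep the default value of 10000. This is enough for most applications: even on a large e-commerce site such as Taobao or JD.com, with 40 results per page, 10,000 results fill 250 pages, and hardly any user browses beyond page 250; most users only look at the first 10 pages.
If you need the exact number of matching documents, set track_total_hits to true, but be aware that when a very large number of documents match, this hurts query performance and consumes more resources.
If you know for certain that you do not need the hit count, set track_total_hits to false; this improves query performance.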
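The three options can be summarized as a small model of what hits.total reports. A Python sketch under simplifying assumptions: `reported_total` is hypothetical, and it ignores edge cases at exactly the threshold. An integer threshold other than 10000 can also be passed as track_total_hits:

```python
def reported_total(actual_hits: int, track_total_hits=10_000):
    """Simplified model of hits.total for a given track_total_hits setting."""
    if track_total_hits is False:
        return None  # hits.total is left out of the response
    if track_total_hits is True or actual_hits <= track_total_hits:
        return {"value": actual_hits, "relation": "eq"}
    # counting stopped at the threshold: value is only a lower bound
    return {"value": track_total_hits, "relation": "gte"}


print(reported_total(5))                # exact count
print(reported_total(25_000))           # capped at the default threshold
print(reported_total(25_000, True))     # exact, at a performance cost
print(reported_total(25_000, False))    # no total at all
```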