Elasticsearch (1): Getting Started

Article link: https://blog.youkuaiyun.com/LodbkMi/article/details/104801604

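This article covers Elasticsearch installation, mapping, indexing, and search, including creating, reading, updating, and deleting documents, bulk operations, and using the Query DSL for exact-value and full-text search. Worked examples show how to manage and query data. It is aimed at beginners and experienced developers alike.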


TODO

Analyzers, mappings, sharding

Installation

elastic.co/downloads/elasticsearch

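How the concepts map to a relational database
Relational DB -> Databases -> Tables -> Rows -> Columns
Elasticsearch -> Indices -> Types -> Documents -> Fields
Note: the /{index}/{type}/{id} URLs used throughout this article follow the older API. Mapping types were deprecated in Elasticsearch 6.x and removed in later major versions.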

Check that the node is running:
curl 'http://localhost:9200/?pretty'

Indexing a document

PUT /megacorp/employee/1
Content-Type: application/json
{
    "first_name" : "John",
    "last_name" : "Smith",
    "age" : 25,
    "about" : "I love to go rock climbing",
    "interests": [ "sports", "music" ]
}
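
The same request can be prepared from a script. The following is a minimal sketch using only the Python standard library; the helper name `build_index_request` and the base URL `http://localhost:9200` are assumptions made for illustration, and the request is only built, not sent.

```python
import json
import urllib.request


def build_index_request(base_url, index, doc_type, doc_id, document):
    """Build (but do not send) a PUT request that indexes `document`."""
    url = f"{base_url}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


employee = {
    "first_name": "John",
    "last_name": "Smith",
    "age": 25,
    "about": "I love to go rock climbing",
    "interests": ["sports", "music"],
}

request = build_index_request(
    "http://localhost:9200", "megacorp", "employee", 1, employee
)
print(request.get_method(), request.full_url)
# To send it against a running node: urllib.request.urlopen(request)
```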

Retrieving a document

GET /megacorp/employee/1

Search lite

GET /megacorp/employee/_search

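Query DSL: build the request as JSON

match: matches any of the terms
Documents containing "rock", "climbing", or "rock climbing" are all returned.
"match" : {
    "about" : "rock climbing"
}
match_phrase: matches the whole phrase
Only documents containing the terms "rock climbing" next to each other, in that order, are returned.
"match_phrase" : {
    "about" : "rock climbing"
}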
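
The difference between the two queries can be imitated in a few lines of Python. This is only an in-memory sketch: the `tokenize` function is a crude stand-in for a real analyzer, the sample sentences other than the first are made up, and relevance scoring is ignored entirely.

```python
def tokenize(text):
    """Crude stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def match(field_value, query):
    """True if the field contains at least one of the query terms."""
    field_terms = set(tokenize(field_value))
    return any(term in field_terms for term in tokenize(query))


def match_phrase(field_value, query):
    """True if the query terms appear consecutively and in order."""
    field_terms = tokenize(field_value)
    query_terms = tokenize(query)
    n = len(query_terms)
    return any(
        field_terms[i:i + n] == query_terms
        for i in range(len(field_terms) - n + 1)
    )


docs = [
    "I love to go rock climbing",      # from the employee document above
    "I like to collect rock albums",   # hypothetical
    "I like to build cabinets",        # hypothetical
]

print([d for d in docs if match(d, "rock climbing")])
print([d for d in docs if match_phrase(d, "rock climbing")])
```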

GET /megacorp/employee/_search
{
    "query" : {
        "match" : {
            "last_name" : "Smith"
        }
    }
}

Search again for employees whose last name is Smith, but this time only those older than 30:
GET /megacorp/employee/_search
{
    "query" : {
        "bool": {
            "must": {
                "match" : {
                    "last_name" : "smith" 
                }
            },
            "filter": {
                "range" : {
                    "age" : { "gt" : 30 } 
                }
            }
        }
    }
}

Full-text search

GET /megacorp/employee/_search
{
    "query" : {
        "match" : {
            "about" : "rock climbing"
        }
    }
}

Analytics

Find the most popular interests among employees:
GET /megacorp/employee/_search
{
  "aggs": {
    "all_interests": {
      "terms": { "field": "interests" }
    }
  }
}
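
What the `terms` aggregation computes can be pictured as counting documents per distinct field value. The sketch below does that in memory; the second and third employees are hypothetical sample data, and the tie-breaking by key is a choice made for this sketch.

```python
from collections import Counter


def terms_aggregation(documents, field):
    """Count how many documents hold each distinct value of `field`."""
    counts = Counter()
    for doc in documents:
        values = doc.get(field, [])
        if not isinstance(values, list):
            values = [values]
        # A document counts once per distinct value it holds.
        counts.update(set(values))
    # Buckets ordered by document count, largest first; ties ordered by key.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered]


employees = [
    {"last_name": "Smith", "interests": ["sports", "music"]},
    {"last_name": "Doe", "interests": ["music"]},      # hypothetical
    {"last_name": "Fir", "interests": ["forestry"]},   # hypothetical
]

buckets = terms_aggregation(employees, "interests")
print(buckets)
```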

Data in, data out

What is a document? It corresponds to a single row (record) in MySQL:
Relational DB -> Databases -> Tables -> Rows -> Columns
Elasticsearch -> Indices -> Types -> Documents -> Fields

Document metadata
_index
Where the document is stored
_type
The class of object the document represents
_id
The unique identifier of the document

Indexing a document
PUT /{index}/{type}/{id}

Retrieving part of a document

GET /website/blog/123?_source=title,text
{
  "_index" :   "website",
  "_type" :    "blog",
  "_id" :      "123",
  "_version" : 1,
  "found" :   true,
  "_source" : {
      "title": "My first blog entry" ,
      "text":  "Just trying this out..."
  }
}
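
The effect of the `_source=title,text` parameter can be sketched as a simple field filter over the stored document. The helper name `filter_source` is an assumption for illustration, and the `date` field is borrowed from the update example in this article to show a field being left out.

```python
def filter_source(source, source_param):
    """Keep only the fields named in a comma-separated `_source` parameter."""
    wanted = [name.strip() for name in source_param.split(",") if name.strip()]
    return {name: source[name] for name in wanted if name in source}


stored = {
    "title": "My first blog entry",
    "text": "Just trying this out...",
    "date": "2014/01/01",  # assumed extra field, not part of the response above
}

partial = filter_source(stored, "title,text")
print(partial)
```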

Updating a document
PUT /website/blog/123
{
    "title": "My first blog entry",
    "text": "I am starting to get the hang of this...",
    "date": "2014/01/02"
}
Creating a document (Elasticsearch generates the ID)
POST /website/blog/
{ … }
Deleting a document
DELETE /website/blog/123

Handling conflicts
Suppose two web processes run in parallel, and each of them handles sales of every product at the same time.

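Pessimistic concurrency control
Widely used by relational databases, this approach assumes that conflicting changes are likely to happen, so it blocks access to a resource in order to prevent conflicts. A typical example is locking a row before reading it, which ensures that only the thread that placed the lock can modify that row.
Optimistic concurrency control
Used by Elasticsearch, this approach assumes that conflicts are unlikely to happen and does not block the operations being attempted. However, if the underlying data was modified between the read and the write, the update fails. The application then decides how to resolve the conflict: for example, it can retry the update, use the fresh data, or report the situation to the user.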
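
The optimistic approach can be illustrated with a small in-memory store that keeps a version number per document and rejects writes based on a stale version. Everything here (`VersionedStore`, `VersionConflict`, `sell_one`, the `stock` field) is a hypothetical sketch of the idea, not Elasticsearch's implementation.

```python
class VersionConflict(Exception):
    """Raised when a write is based on a stale version."""


class VersionedStore:
    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def get(self, doc_id):
        return self._docs[doc_id]

    def put(self, doc_id, source, expected_version=None):
        current_version = self._docs.get(doc_id, (0, None))[0]
        # Reject the write if the caller read an older version.
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"expected version {expected_version}, found {current_version}"
            )
        self._docs[doc_id] = (current_version + 1, dict(source))
        return current_version + 1


def sell_one(store, doc_id, max_retries=5):
    """Decrement stock, re-reading and retrying when a conflict occurs."""
    for _ in range(max_retries + 1):
        version, source = store.get(doc_id)
        updated = {**source, "stock": source["stock"] - 1}
        try:
            return store.put(doc_id, updated, expected_version=version)
        except VersionConflict:
            continue  # someone else wrote first; read again and retry
    raise VersionConflict(f"gave up after {max_retries} retries")


store = VersionedStore()
store.put("widget", {"stock": 100})                      # version 1
version, source = store.get("widget")                    # reader A sees version 1
store.put("widget", {"stock": 99}, expected_version=1)   # writer B wins: version 2

try:
    store.put("widget", {"stock": 99}, expected_version=version)  # A is stale
    conflict = False
except VersionConflict:
    conflict = True

new_version = sell_one(store, "widget")
print(conflict, new_version, store.get("widget"))
```

No stock is lost: the stale write from reader A is rejected instead of silently overwriting writer B's change, and the retrying helper works from the fresh value.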

Partial updates to a document (adding fields to the existing doc)

POST /website/blog/1/_update

{
   "doc" : {
      "tags" : [ "testing" ],
      "views": 0
   }
}
# When the update hits a version conflict, retry it up to 5 times (retry_on_conflict=5)
POST /website/pageviews/1/_update?retry_on_conflict=5
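
A partial update merges the `doc` object into the existing `_source`: new fields are added, existing scalar fields are overwritten, and nested objects are merged. The function below is a rough in-memory sketch of that merge; the name `merge_doc` is an assumption, and the existing source is taken from the blog example earlier in this article.

```python
def merge_doc(source, partial):
    """Return a copy of `source` with `partial` merged into it."""
    merged = dict(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested objects are merged field by field.
            merged[key] = merge_doc(merged[key], value)
        else:
            # Scalars and arrays replace whatever was there before.
            merged[key] = value
    return merged


existing = {
    "title": "My first blog entry",
    "text": "Just trying this out...",
}
update = {"tags": ["testing"], "views": 0}

result = merge_doc(existing, update)
print(result)
```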

Retrieving multiple documents

GET /_mget
{
   "docs" : [
      {
         "_index" : "website",
         "_type" :  "blog",
         "_id" :    2
      },
      {
         "_index" : "website", # index
         "_type" :  "pageviews", # type
         "_id" :    1,
         "_source": "views"
      }
   ]
}

GET /website/blog/_mget
{
   "docs" : [
      { "_id" : 2 },
      { "_type" : "pageviews", "_id" :   1 }
   ]
}
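
The two forms above differ only in where the index and type are given: in each entry of `docs`, or once in the URL as defaults. A small sketch of that rule follows; the helper `build_mget` and its tuple-based input are assumptions for illustration.

```python
def build_mget(items, default_index=None, default_type=None):
    """Return (path, body) for a multi-get request.

    `items` is a list of (index, doc_type, doc_id) tuples. Values equal to
    the defaults in the URL are left out of the individual entries.
    """
    parts = [p for p in (default_index, default_type, "_mget") if p]
    path = "/" + "/".join(parts)
    docs = []
    for index, doc_type, doc_id in items:
        entry = {}
        if index != default_index:
            entry["_index"] = index
        if doc_type != default_type:
            entry["_type"] = doc_type
        entry["_id"] = doc_id
        docs.append(entry)
    return path, {"docs": docs}


items = [("website", "blog", 2), ("website", "pageviews", 1)]

full_path, full_body = build_mget(items)
short_path, short_body = build_mget(items, "website", "blog")
print(full_path, full_body)
print(short_path, short_body)
```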

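Bulk operations

create
Creates a document only if it does not already exist. See the section on creating documents.
index
Creates a new document or replaces an existing one. See the sections on indexing and updating whole documents.
update
Partially updates a document. See the section on partial updates.
delete
Deletes a document. See the section on deleting documents.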

POST /_bulk
{ "delete": { "_index": "website", "_type": "blog", "_id": "123" }} 
{ "create": { "_index": "website", "_type": "blog", "_id": "123" }}
{ "title":    "My first blog post" }
{ "index":  { "_index": "website", "_type": "blog" }}
{ "title":    "My second blog post" }
{ "update": { "_index": "website", "_type": "blog", "_id": "123", "_retry_on_conflict" : 3} }
{ "doc" : {"title" : "My updated blog post"} }

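Search

Each result also has a _score, which measures how well the document matches the query.
max_score is the highest _score among the documents that match the query.
took tells us how many milliseconds the whole search request took to execute.
_shards tells us the total number of shards involved in the query, and how many of them succeeded and how many failed.
timed_out tells us whether the query timed out.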
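
To make these fields concrete, the sketch below reads them from a response that has already been parsed into a Python dict. The `sample_response` is hand-written for illustration (its numbers are invented, not real output), and the helper name `summarize_search_response` is an assumption.

```python
def summarize_search_response(response):
    """Pull the fields described above out of a parsed search response."""
    shards = response["_shards"]
    hits = response["hits"]["hits"]
    best = max(hits, key=lambda hit: hit["_score"]) if hits else None
    return {
        "took_ms": response["took"],
        "timed_out": response["timed_out"],
        "all_shards_ok": shards["failed"] == 0
        and shards["successful"] == shards["total"],
        "max_score": response["hits"]["max_score"],
        "best_hit_id": best["_id"] if best else None,
    }


# Hand-written sample, shaped like a search response; values are invented.
sample_response = {
    "took": 4,
    "timed_out": False,
    "_shards": {"total": 5, "successful": 5, "failed": 0},
    "hits": {
        "total": 2,
        "max_score": 0.71,
        "hits": [
            {"_index": "megacorp", "_type": "employee", "_id": "1", "_score": 0.71},
            {"_index": "megacorp", "_type": "employee", "_id": "2", "_score": 0.23},
        ],
    },
}

summary = summarize_search_response(sample_response)
print(summary)
```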

Request body search

Empty search: matches all documents
GET /_search
Request body (JSON):
{}

Use the from and size parameters to paginate:
GET /_search
{
  "from": 30,
  "size": 10
}
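
The `from` value is the number of results to skip, so for 1-based page numbers it works out to `(page - 1) * size`. A tiny sketch, with the helper name `page_params` assumed for illustration:

```python
def page_params(page, size=10):
    """Translate a 1-based page number into from/size search parameters."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    return {"from": (page - 1) * size, "size": size}


print(page_params(1))  # first page: skip nothing
print(page_params(4))  # fourth page of 10: skip the first 30 results
```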

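must
Documents must match these clauses to be included.
must_not
Documents must not match these clauses to be included.
should
If any of these clauses match, the _score increases; otherwise they have no effect. They are mainly used to refine the relevance score of each document.
filter
Clauses that must match, but run in non-scoring, filtering mode. They do not contribute to the score; they only include or exclude documents according to the filter criteria.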

{
    "bool": {
        "must":     { "match": { "tweet": "elasticsearch" }},  # required (AND)
        "must_not": { "match": { "name":  "mary" }}, # excluded (NOT)
        "should":   { "match": { "tweet": "full text" }}, # optional; raises _score when it matches
        "filter":   { "range": { "age" : { "gt" : 30 }} } # filter: age greater than 30, no effect on score
    }
}
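
When queries are assembled in application code, a small helper that only emits the clauses actually supplied keeps the JSON tidy. The following is a sketch under that assumption; `bool_query` and its parameter names are hypothetical and not part of any client library.

```python
import json


def bool_query(must=None, must_not=None, should=None, filter_=None):
    """Build a bool query, omitting any clause that was not supplied."""
    clauses = {
        "must": must,
        "must_not": must_not,
        "should": should,
        "filter": filter_,
    }
    return {"bool": {name: c for name, c in clauses.items() if c is not None}}


query = bool_query(
    must={"match": {"tweet": "elasticsearch"}},
    must_not={"match": {"name": "mary"}},
    should={"match": {"tweet": "full text"}},
    filter_={"range": {"age": {"gt": 30}}},
)
print(json.dumps({"query": query}, indent=2))
```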

The multi_match query runs the same match query on multiple fields:
{
    "multi_match": {
        "query":    "full text search",
        "fields":   [ "title", "body" ]
    }
}

#### Creating an index
PUT /my_index
#### Deleting an index
DELETE /my_index
Deleting multiple indices
DELETE /index_one,index_two
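number_of_shards
The number of primary shards per index. The default is 5 in versions before 7.0 and 1 from 7.0 onward. This setting cannot be changed after the index is created.
number_of_replicas
The number of replica copies of each primary shard. The default is 1. This setting can be changed at any time on a live index.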
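
These two settings go in the body of the index creation request, and the replica count can later be changed through the index's `_settings` endpoint. The sketch below builds both bodies and computes the total number of shard copies; the helper names and the chosen values are assumptions for illustration.

```python
import json


def create_index_body(number_of_shards, number_of_replicas):
    """Body for PUT /my_index; the shard count is fixed once the index exists."""
    return {
        "settings": {
            "number_of_shards": number_of_shards,
            "number_of_replicas": number_of_replicas,
        }
    }


def update_replicas_body(number_of_replicas):
    """Body for PUT /my_index/_settings; replicas can change at any time."""
    return {"number_of_replicas": number_of_replicas}


def total_shard_copies(number_of_shards, number_of_replicas):
    """Each primary shard has `number_of_replicas` additional copies."""
    return number_of_shards * (1 + number_of_replicas)


print(json.dumps(create_index_body(5, 1)))
print(json.dumps(update_replicas_body(2)))
print(total_shard_copies(5, 1))  # 5 primaries + 5 replicas
```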

Index aliases
PUT /my_index_v1
PUT /my_index_v1/_alias/my_index