[ES] Learning Elasticsearch (Elasticsearch 8.17.3)

Article link: https://blog.youkuaiyun.com/zt0612xd/article/details/146420457

Run:
elasticsearch or elasticsearch -E xpack.security.enabled=false
Run in the background:
nohup ./bin/elasticsearch > es.log 2>&1 &

Simple ES installation

Reference: https://blog.youkuaiyun.com/smilehappiness/article/details/118466378
Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/current/targz.html

Download: https://www.elastic.co/cn/downloads/elasticsearch
Extract: tar -zxvf elasticsearch-8.17.3-linux-x86_64.tar.gz
Set ES_HOME and add ES_HOME/bin to PATH.
Run: elasticsearch
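Startup may fail with a JDK error if ES picks up a JDK other than the one bundled with it.
To keep ES from using the JDK already configured on the Linux host, open bin/elasticsearch and set the JDK path there. (Elasticsearch 8.x reads ES_JAVA_HOME rather than JAVA_HOME, so pointing ES_JAVA_HOME at the same directory may also be needed.)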

export JAVA_HOME=/data1/ztshao/programs/elasticsearch-8.17.3/jdk
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tool.jar

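On first startup, ES generates an initial password for the elastic user; save it.
[Figure: initialization output]
To set the password in bash: export ELASTIC_PASSWORD="your_password"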

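To check that ES is running, start it with elasticsearch -E xpack.security.enabled=false. The enabled=false part matters: with security on, ES 8.x serves HTTPS on port 9200, so plain-HTTP requests come back empty.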
https://stackoverflow.com/questions/71492404/elasticsearch-showing-received-plaintext-http-traffic-on-an-https-channel-in-con

import requests

# URL to request
url = "http://127.0.0.1:9200/"
try:
    response = requests.get(url)
    # Print what the server returned: the response object, then its body
    print("Response:")
    print(response)
    print(response.text)
except requests.exceptions.RequestException as e:
    # If the request fails, print the error
    print("Error:", e)

Output:

Response:
<Response [200]>
{
  "name" : "crowley.nju.edu.cn",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "9fN7znHVToW9PmKqGh3ITg",
  "version" : {
    "number" : "8.17.3",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "a091390de485bd4b127884f7e565c0cad59b10d2",
    "build_date" : "2025-02-28T10:07:26.089129809Z",
    "build_snapshot" : false,
    "lucene_version" : "9.12.0",
    "minimum_wire_compatibility_version" : "7.17.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },
  "tagline" : "You Know, for Search"
}

This output means the installation succeeded.
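If security is left enabled instead, the same check has to go over HTTPS with credentials. The following is a minimal sketch using only the standard library. The helper names are hypothetical, and it assumes the elastic user's password is known and that the CA certificate generated on first startup is available as a file (typically config/certs/http_ca.crt under ES_HOME).

```python
import base64
import json
import ssl
import urllib.request


def basic_auth_header(user: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def fetch_cluster_info(url: str, user: str, password: str, ca_file: str) -> dict:
    """GET the ES root endpoint over HTTPS and return the parsed JSON body."""
    # Trust only the CA that signed the node's HTTP certificate.
    context = ssl.create_default_context(cafile=ca_file)
    request = urllib.request.Request(url)
    request.add_header("Authorization", basic_auth_header(user, password))
    with urllib.request.urlopen(request, context=context, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling fetch_cluster_info("https://127.0.0.1:9200/", "elastic", password, ca_path) against a running node should return the same JSON document as shown above, with the version under ["version"]["number"].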

Viewing from a remote browser: here I simply forward ports 9200 and 8000 over ssh.

ssh -R 9200:localhost:9200 -N ztshao@114.212.85.127
ssh -R 8000:localhost:8000 -N ztshao@114.212.85.127

For remote access, edit config/elasticsearch.yml and add network.host: 0.0.0.0 to allow remote connections.
For errors, see: https://blog.youkuaiyun.com/Leon_Jinhai_Sun/article/details/126673674

Set the password in the shell:

export ELASTIC_PASSWORD="your_password"

Simple Kibana installation

Tutorial: https://www.elastic.co/guide/en/kibana/current/install.html
Download: https://www.elastic.co/guide/en/kibana/current/targz.html

Download the package: https://artifacts.elastic.co/downloads/kibana/kibana-8.17.3-linux-x86_64.tar.gz
Extract: tar -xzf kibana-8.17.3-linux-x86_64.tar.gz
Add the bin directory to PATH.
Run kibana.
For remote access, ports 5601 and 8000 are needed:
ssh -R 5601:localhost:5601 -N ztshao@114.212.85.127

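Installing the IK analyzer

ES's default analyzer does not handle Chinese well: it splits the text into individual characters.
The IK analyzer handles Chinese much better.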
https://github.com/infinilabs/analysis-ik/releases

bin/elasticsearch-plugin install https://get.infini.cloud/elasticsearch/analysis-ik/8.4.1
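If this reports a JDK error, set the JDK path in bin/elasticsearch-plugin as well. Note that the version at the end of the URL should match the installed ES version (8.17.3 here).
It still failed for me, though, this time with java.net.UnknownHostException.

Because of that error, I downloaded the zip and installed it manually.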
https://blog.youkuaiyun.com/xujingyiss/article/details/123902714

cd plugins
mkdir ik
wget https://release.infinilabs.com/analysis-ik/stable/elasticsearch-analysis-ik-8.17.3.zip
unzip elasticsearch-analysis-ik-8.17.3.zip -d ik
rm elasticsearch-analysis-ik-8.17.3.zip  # must be deleted

Remember to restart ES after installing.

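Basic usage

Reference 1
Reference 2 (Python). Note the version difference: the reference uses 6.x while I use 8.x, so some details differ.

In Kibana, use Dev Tools for testing and Management to inspect the ES cluster.

See the IK GitHub repository for analyzer usage. The two common analyzers are ik_smart and ik_max_word.
The difference: ik_max_word is the finest-grained and emits all plausible word combinations, which suits term queries; ik_smart is the coarsest-grained, which suits phrase queries. Note that the ik_smart output is not a subset of the ik_max_word output.

Testing the analyzer

Test tokenization: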

GET _analyze
{
    "analyzer": "ik_smart",
    "text": "中华人民共和国解放军"
}

Output:

{
  "tokens": [
    {
      "token": "中华人民共和国",
      "start_offset": 0,
      "end_offset": 7,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "解放军",
      "start_offset": 7,
      "end_offset": 10,
      "type": "CN_WORD",
      "position": 1
    }
  ]
}
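To compare ik_smart and ik_max_word on the same text from Python, the request bodies can be generated and the token strings extracted from each response. This is only a sketch: the helper names are hypothetical, and the sample response is the ik_smart output above, trimmed to the fields used here.

```python
import json


def analyze_request(analyzer: str, text: str) -> dict:
    """Body for GET _analyze with the given analyzer."""
    return {"analyzer": analyzer, "text": text}


def token_texts(analyze_response: dict) -> list:
    """Pull just the token strings out of an _analyze response."""
    return [t["token"] for t in analyze_response.get("tokens", [])]


# Copied from the ik_smart output above (only the fields used here).
smart_response = {
    "tokens": [
        {"token": "中华人民共和国", "start_offset": 0, "end_offset": 7},
        {"token": "解放军", "start_offset": 7, "end_offset": 10},
    ]
}

# One request body per analyzer; paste each into Dev Tools after "GET _analyze".
bodies = [analyze_request(name, "中华人民共和国解放军") for name in ("ik_smart", "ik_max_word")]
for body in bodies:
    print(json.dumps(body, ensure_ascii=False))

smart_tokens = token_texts(smart_response)
```

Running the ik_max_word body and passing its response through token_texts gives a second list that can be compared with smart_tokens, for example with set(smart_tokens) <= set(max_tokens) to check the subset claim on a given text.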

CRUD

Official tutorial
List all indices: GET _cat/indices?v

POST test/_doc/1
{
    "name":"测试",
    "price": 280
}
DELETE test/_doc/1
POST test/_update/1
{
    "doc": {
        "name":"新测试"
    }
}
GET test/_doc/1
GET test/_search

Create: PUT {index}/_doc/{id}

PUT test1/_doc/1
{
    "name": "kitty",
    "age": 12
}

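test1 is the index name, roughly analogous to a table in SQL. _doc is the document type; since ES 7.x, mapping types are deprecated and _doc is just a placeholder. 1 is the document ID, which uniquely identifies the document within the index.
The output is: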

{
  "_index": "test1",
  "_id": "1",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}

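_shards holds the shard information. On creation, result is created; running the same request again gives updated, meaning an update, and _version becomes 2.

Note that when PUT is used to modify a document, it replaces the whole document, so any field not included in the request is lost.

Read: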

GET test1/_doc/1

Output:

{
  "_index": "test1",
  "_id": "1",
  "_version": 2,
  "_seq_no": 1,
  "_primary_term": 1,
  "found": true,
  "_source": {
    "name": "kitty",
    "age": 12
  }
}

Update: POST {index}/_update/{id}

Because updating with PUT drops any fields that are not specified, use a partial update instead.
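The difference between the two write paths can be illustrated without a cluster. The sketch below is a plain in-memory model, not the ES client: a dict stands in for the index, and the two hypothetical functions mimic the replace semantics of PUT _doc and the merge semantics of POST _update.

```python
def put_document(store: dict, doc_id: str, body: dict) -> str:
    """Mimic PUT {index}/_doc/{id}: the stored document is replaced wholesale."""
    result = "updated" if doc_id in store else "created"
    store[doc_id] = dict(body)
    return result


def partial_update(store: dict, doc_id: str, doc: dict) -> str:
    """Mimic POST {index}/_update/{id} with {"doc": ...}: merge into the existing document."""
    if doc_id not in store:
        raise KeyError(f"document {doc_id} not found")
    # Shallow merge for illustration; the real API also merges nested objects.
    store[doc_id].update(doc)
    return "updated"


index = {}
print(put_document(index, "1", {"name": "kitty", "age": 12}))  # created
print(partial_update(index, "1", {"name": "tom"}))             # updated
print(index["1"])                                              # age is still present
print(put_document(index, "1", {"name": "jerry"}))             # updated
print(index["1"])                                              # age is gone
```

After the partial update the document still has its age field; after the second PUT only name remains.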

Delete an index: DELETE /{index} (a single document is deleted with DELETE {index}/_doc/{id})

DELETE /test1

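The search operation

The matches are in hits of the response. _score is the relevance score; _source is the original document.

Simple queries
Query: GET test/_search?q=price:280. After ?q= comes field:value.

Complex queries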

GET test/_search
{
    "query":{
        "match":{
            "price": 280
        }
    }
}

Return all documents:

GET goods/_search
{
    "query":{
        "match_all": {}
    }
}

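Sorting, pagination, and _source filtering

Optional parameters:
Sorting: sort. Numeric, date, and keyword fields can be sorted; analyzed text fields cannot be sorted by default.
Pagination: from and size.
Returned fields: _source. If unset, the whole document is returned; otherwise only the listed fields.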

GET goods/_search
{
    "query":{
        "match":{
            "producer":"中国"
        }
    },
    "sort":{
        "price":{
            "order":"desc"
        }
    },
    "from":0,
    "size":10,
    "_source": ["name", "producer", "price"]
}
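When paging from application code, from is usually derived from a page number. The sketch below builds the same kind of body as the request above; the function name and its parameters are hypothetical.

```python
def page_query(field, text, page, page_size=10, sort_field=None,
               descending=True, source_fields=None):
    """Build a _search body for one page of a match query (pages start at 1)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    body = {
        "query": {"match": {field: text}},
        "from": (page - 1) * page_size,  # number of hits to skip
        "size": page_size,
    }
    if sort_field:
        body["sort"] = {sort_field: {"order": "desc" if descending else "asc"}}
    if source_fields is not None:
        body["_source"] = list(source_fields)
    return body


third_page = page_query("producer", "中国", page=3, sort_field="price",
                        source_fields=["name", "producer", "price"])
print(third_page["from"], third_page["size"])  # 20 10
```

By default ES rejects requests where from + size exceeds 10000 (the index.max_result_window setting), so this style of paging suits shallow pages only.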

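bool queries: must, should, must_not, filter

Combining multiple conditions:
must: every clause must match.
should: any one of the clauses matches.
must_not: none of the clauses may match.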

GET goods/_search
{
    "query":{
        "bool": {
            "must": [
              {
                "match": {
                  "producer": "中国"
                }
              },
              {
                "match":{
                    "price": 25
                }
              }
            ]
        }
    }
}

Filtering with filter (clauses must match but do not affect the score)

GET goods/_search
{
    "query":{
        "bool": {
            "filter":{
                "range":{
                    "price":{
                        "gt":25
                    }
                }
            }
        }
    }
}
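The four clause types can be combined in one bool query. A small builder such as the following keeps the nesting manageable when the conditions come from code; the function and parameter names are hypothetical, and the example reuses the goods fields from the requests above.

```python
def bool_query(must=None, should=None, must_not=None, filter_=None):
    """Build a _search body with a bool query, omitting empty clause lists."""
    clauses = {
        "must": must,
        "should": should,
        "must_not": must_not,
        "filter": filter_,
    }
    return {"query": {"bool": {name: value for name, value in clauses.items() if value}}}


body = bool_query(
    must=[{"match": {"producer": "中国"}}],
    filter_=[{"range": {"price": {"gt": 25}}}],
)
print(sorted(body["query"]["bool"]))  # ['filter', 'must']
```

Here the match clause contributes to _score while the range clause only narrows the result set.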

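Phrase search, highlighting, and aggregations

Separate multiple values with spaces.
This style of search is used on list-valued fields.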

GET goods/_search
{
    "query":{
        "match": {
          "tags": "甜的 红色的"
        }
    }
}

Highlighting:

GET goods/_search
{
    "query":{
        "match": {
          "name": "香蕉"
        }
    },
    "highlight": {
        "pre_tags": "<b style='color:red'>",
        "post_tags": "</b>",
        "fields": {
            "name": {}
        }
    }
}

Average aggregation:

GET goods/_search
{
    "from":0,
    "size": 0,
    "aggs":{
        "avg_price":{
            "avg": {
              "field": "price"
            }
        }
    }
}

Note that size controls how many hits are shown; it does not affect aggs.
Other common metric aggregations include sum, min, and max.
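Several metric aggregations on the same field can go into one request. The sketch below generates such a body; the function name and the "<metric>_<field>" naming scheme for the aggregation keys are assumptions made for illustration.

```python
def metric_aggs(field, metrics=("avg", "sum", "min", "max")):
    """Build a _search body that returns only the requested metric aggregations."""
    return {
        "size": 0,  # no hits needed, only the aggregation results
        "aggs": {f"{metric}_{field}": {metric: {"field": field}} for metric in metrics},
    }


price_stats = metric_aggs("price")
print(sorted(price_stats["aggs"]))  # ['avg_price', 'max_price', 'min_price', 'sum_price']
```

In the response, each result is found under aggregations, keyed by the same name, for example response["aggregations"]["avg_price"]["value"].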

mapping

A mapping describes the properties of a document's fields and the data type of each field. The fields sub-entries below can all be omitted.

PUT goods2
{
    "mappings": {
        "properties": {
            "description": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    }
                }
            },
            "name": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    }
                }
            },
            "price": {
                "type": "long"
            },
            "producer": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    }
                }
            }
        }
    }
}
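Since the three text fields in the mapping above repeat the same structure, the body can also be generated. The following sketch uses hypothetical helper names and reproduces the goods mapping, with the keyword sub-field and an analyzer as optional extras.

```python
def text_field(analyzer=None, with_keyword=True):
    """Mapping for one text field, optionally with an analyzer and a keyword sub-field."""
    field = {"type": "text"}
    if analyzer:
        field["analyzer"] = analyzer
    if with_keyword:
        field["fields"] = {"keyword": {"type": "keyword", "ignore_above": 256}}
    return field


def goods_mapping(description_analyzer=None, with_keyword=True):
    """Index-creation body for the goods example."""
    return {
        "mappings": {
            "properties": {
                "description": text_field(description_analyzer, with_keyword),
                "name": text_field(None, with_keyword),
                "price": {"type": "long"},
                "producer": text_field(None, with_keyword),
            }
        }
    }


full = goods_mapping()                    # same shape as the request above
slim = goods_mapping(with_keyword=False)  # the "fields" entries left out
```

The keyword sub-field is what makes exact-match filtering and sorting on a text field possible (via name.keyword, for instance), so dropping it is only reasonable when those are not needed.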

Analyzer usage

A mapping with an analyzer:

PUT goods3
{
    "mappings": {
        "properties":{
            "description": {
                "type": "text",
                "analyzer": "ik_smart"
            },
            "name": {
                "type": "text"
            },
            "price": {
                "type": "long"
            },
            "producer": {
                "type": "text"
            }
        }
    }
}

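Alternatively, instead of setting an analyzer on every property, set a default for the whole index. The 6.x-style request (PUT goods4/_doc/_mapping with _all) no longer works in 8.x, where mapping types and _all have been removed; the default analyzer goes in the index settings instead:

PUT goods4
{
    "settings": {
        "analysis": {
            "analyzer": {
                "default": {
                    "type": "ik_smart"
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "title": {
                "type": "text"
            },
            "content": {
                "type": "text"
            }
        }
    }
}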

An ordinary match query:

GET book/_search
{
    "query": {
        "match": {
          "content": "目标检测"
        }
    }
}

A phrase query (the effect is very noticeable for Chinese: only the whole phrase matches):

GET book/_search
{
    "query": {
        "match_phrase": {
          "content": "目标检测"
        }
    }
}

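match: full-text matching. The input is analyzed first and the resulting terms are searched; a document is returned as long as it contains any part of the query.
match_phrase: the input is analyzed, and all resulting terms must appear in the document in the same order.
match_phrase_prefix: like match_phrase, but the last term is treated as a prefix.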
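The three behaviours can be modelled on token lists to make the difference concrete. This is a simplified in-memory sketch, not how ES evaluates queries: it assumes the text has already been tokenized, ignores scoring, and treats a phrase as adjacent terms (the default slop of 0). The token lists are invented for illustration and are not actual IK output.

```python
def match(doc_tokens, query_tokens):
    """True if the document contains at least one query term (default OR behaviour)."""
    return any(term in doc_tokens for term in query_tokens)


def match_phrase(doc_tokens, query_tokens):
    """True if the query terms occur consecutively and in order."""
    n = len(query_tokens)
    if n == 0:
        return False
    return any(doc_tokens[i:i + n] == query_tokens
               for i in range(len(doc_tokens) - n + 1))


def match_phrase_prefix(doc_tokens, query_tokens):
    """Like match_phrase, but the last query term only needs to be a prefix."""
    n = len(query_tokens)
    if n == 0:
        return False
    head, last = query_tokens[:-1], query_tokens[-1]
    for i in range(len(doc_tokens) - n + 1):
        window = doc_tokens[i:i + n]
        if window[:-1] == head and window[-1].startswith(last):
            return True
    return False


docs = {
    "a": ["目标", "检测", "算法"],
    "b": ["检测", "目标"],
    "c": ["目标", "跟踪"],
}
query = ["目标", "检测"]

print({k: match(v, query) for k, v in docs.items()})         # a, b, c all True
print({k: match_phrase(v, query) for k, v in docs.items()})  # only a is True
```

Document c is returned by match because it shares one term with the query, while match_phrase keeps only document a, where both terms appear next to each other in order.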

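Python usage

elasticsearch-py documentation: https://elasticsearch-py.readthedocs.io/en/latest/
Install: pip install elasticsearch

Bulk-loading example: https://github.com/elastic/elasticsearch-py/tree/main/examples/bulk-ingest

streaming_bulk suits large datasets: documents can be read and sent to ES piece by piece instead of all being held in memory.
bulk suits small datasets: load everything into memory, then write it to ES.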
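A minimal sketch of both helpers follows. The generator and the two wrapper functions are hypothetical names; streaming_bulk and bulk come from elasticsearch.helpers and are imported inside the functions so that the action generator can be tried without the client installed. Using the row position as _id is an assumption made for illustration.

```python
from typing import Dict, Iterable, Iterator


def generate_actions(rows: Iterable[Dict], index_name: str) -> Iterator[Dict]:
    """Lazily turn rows into bulk actions, one at a time."""
    for i, row in enumerate(rows, start=1):
        yield {"_index": index_name, "_id": str(i), "_source": row}


def load_streaming(client, rows, index_name: str) -> int:
    """Index rows with streaming_bulk and return how many succeeded."""
    from elasticsearch.helpers import streaming_bulk  # needs: pip install elasticsearch

    succeeded = 0
    # streaming_bulk yields one (ok, item) tuple per action as chunks are sent.
    for ok, _item in streaming_bulk(client, generate_actions(rows, index_name), chunk_size=500):
        succeeded += int(ok)
    return succeeded


def load_all_at_once(client, rows, index_name: str) -> int:
    """Index rows with bulk, which returns only a summary."""
    from elasticsearch.helpers import bulk

    success_count, _errors = bulk(client, generate_actions(rows, index_name))
    return success_count


sample_rows = [{"name": "apple", "price": 5}, {"name": "banana", "price": 3}]
actions = list(generate_actions(sample_rows, "goods"))
print(actions[0]["_id"], actions[1]["_source"]["name"])  # 1 banana
```

Because generate_actions is a generator, rows can just as well come from a file read line by line, which is where streaming_bulk is most useful.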

CRUD operations: https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-search-application-get

search: pass in a body
https://stackoverflow.com/questions/62878805/how-to-search-in-elasticsearch-python-using-get-method
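With the 8.x Python client, the query can also be passed as keyword arguments to search. The sketch below assumes a client created elsewhere (for example Elasticsearch("http://127.0.0.1:9200") with security disabled) and reuses the goods index from the earlier examples. The function names are hypothetical, and the sample response is hand-written in the shape ES returns, not real output.

```python
def extract_sources(response) -> list:
    """Collect the _source of every hit in a search response."""
    return [hit["_source"] for hit in response["hits"]["hits"]]


def search_goods(client, producer: str, size: int = 10) -> list:
    """Run a match query on producer and return the matching documents."""
    response = client.search(
        index="goods",
        query={"match": {"producer": producer}},
        size=size,
    )
    return extract_sources(response)


# Hand-written stand-in for a response, used only to exercise extract_sources.
fake_response = {
    "hits": {
        "total": {"value": 1, "relation": "eq"},
        "hits": [{"_index": "goods", "_id": "1", "_score": 1.0,
                  "_source": {"name": "apple", "price": 5}}],
    }
}
print(extract_sources(fake_response))  # [{'name': 'apple', 'price': 5}]
```

The response object returned by the client supports the same dictionary-style access, so extract_sources works on it unchanged.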