【原创】elasticsearch 一些整理总结

16年公司检索系统是用的 solr, 但使用过程中发现太糟心, 十分难用- - 17年初换到 es. 把当时整理的知识点记录一下.
机器配置:
1台 centos, 2台 ubuntu.
内存: 64G, CPU: 8核 硬盘: 8TB(SAS, es 数据), 250GB(SSD, 系统)
es 版本: 5.4.0

数据量: 目前为止一共51亿数据, 2副本 102亿 (10209——)
内存: jvm 分配31G, 整机实际使用内存35G
硬盘: 三台分别使用2.5T, 一共使用 7.4G (8133790——)

install elasticsearch

configure es user (root)

# configure es user.
cd ~
u=es
p=ES_PASSWORD
g=sudo
path=/solr_mongo
mem_size=20
adduser $u --ingroup $g --disabled-password --gecos "" && echo "$u:$p" |chpasswd

download and install es (root)

wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.4.0.tar.gz
tar zxvf elasticsearch-5.4.0.tar.gz -C /usr/local/
mv /usr/local/elasticsearch-5.4.0/ /usr/local/elasticsearch
chown -R $u /usr/local/elasticsearch

configure system setting (root)

## configure limits and vm.map
echo "$u -  nofile  65536" >> /etc/security/limits.conf
echo "$u -  nproc  65536"   >> /etc/security/limits.conf
echo "$u - memlock unlimited" >> /etc/security/limits.conf
echo "es - memlock unlimited" >> /etc/security/limits.conf
echo "es - memlock unlimited" >> /etc/security/limits.conf
echo "vm.max_map_count=262144" >> /etc/sysctl.conf

# configure /etc/hosts
echo "192.168.3.101    es-001" >> /etc/hosts
echo "192.168.3.102    es-002" >> /etc/hosts
echo "192.168.3.103    es-003" >> /etc/hosts

## install java
sudo yum install -y java java-openjdk

configure jvm memory for elasticsearch(root)

cat /usr/local/elasticsearch/config/jvm.options |sed "s/^-Xms2g/-Xms$mem_sizeg/" | sed "s/^-Xmx2g/-Xmx$mem_sizeg/" > .jvm.options
mv .jvm.options  /usr/local/elasticsearch/config/jvm.options
chown -R $u /usr/local/elasticsearch

configure data/log path for es(root)

path=/data/es/
mkdir -p $path/$u/{data,logs}
chown -R $u $path/$u

install analysis-ik

cd ~
wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v5.4.0/elasticsearch-analysis-ik-5.4.0.zip
unzip elasticsearch-analysis-ik-5.4.0.zip -d /usr/local/elasticsearch/plugins/ik

mv /usr/local/elasticsearch/plugins/ik/config/ /usr/local/elasticsearch/config/analysis-ik
chown -R $u /usr/local/elasticsearch

elasticsearch.yml demo

# ======================== Elasticsearch Configuration =========================

# https://www.elastic.co/guide/en/elasticsearch/reference/index.html

# ---------------------------------- Cluster -----------------------------------
# Use a descriptive name for your cluster:
cluster.name: es_weibo

# ------------------------------------ Node ------------------------------------
# Use a descriptive name for the node:
node.name: es-001 
node.master: true
node.data: true

# Add custom attributes to the node:
node.attr.rack: r1

# ----------------------------------- Paths ------------------------------------
# Path to directory where to store the data (separate multiple locations by comma):
path.data: /data/es/data
#path.data: /tmp/es0/data,/tmp/es1/data

# Path to log files:
path.logs: /data/es/logs

#path.plugins: /tmp/es/plugins

# ----------------------------------- Memory -----------------------------------
# Lock the memory on startup:
## 锁定内存, 不使用交换区
bootstrap.memory_lock: true

# ---------------------------------- Network -----------------------------------
# Set the bind address to a specific IP (IPv4 or IPv6):
network.host: 10.30.57.230

http.port: 9200
transport.tcp.port: 9300

# --------------------------------- Discovery ----------------------------------
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]

discovery.zen.ping.unicast.hosts: ["es-001","es-002","es-003"]
#
# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):
#
## 解决数据一致性, 1 无法确保数据正确
discovery.zen.minimum_master_nodes: 2
## 没有主节点时, 锁定哪些功能. all/write
discovery.zen.no_master_block: write
#
# For more information, consult the zen discovery module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
## 最少多少个节点在线 集群可用
gateway.recover_after_nodes: 1
## 集群节点总数 执行数据恢复
gateway.expected_nodes: 2
## 等待多长时间 执行数据恢复
gateway.recover_after_time: 5m
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
action.destructive_requires_name: true

processors: 8


http.cors.enabled: true
http.cors.allow-origin: "*"
http.max_content_length: 200mb
-------------------------------------------------------------------------------------

start es (es)

# start es
cd /usr/local/elasticsearch
screen -dm es
screen -ls
screen -S es -X stuff './bin/elasticsearch\n'

启动 es 之前, 最好重启一下系统sudo reboot

一些问题

# Q
"type": "cluster_block_exception",
"reason": "blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];"
# A
gateway.expected_nodes: 1
gateway.recover_after_nodes: 1
# Q
413 Request Entity Too Large
# A
http.max_content_length: 500mb

https://github.com/elastic/elasticsearch/issues/2902
https://github.com/elastic/elasticsearch/blob/5.4/docs/reference/modules/http.asciidoc
https://github.com/elastic/elasticsearch/blob/5.4/core/src/main/java/org/elasticsearch/http/HttpTransportSettings.java
es 默认100MB
# 实现SQL: select books.* from books where books.id in (select users.books_id from user where users.id=1);

curl es:9200/books/_search -d
{
    "query": {
        "filtered": {
            "query" : {
                "match_all" :{}
            }
        }
        "filter":{
            "terms" : {
                "id" : {
                    "index" : "users",
                    "index" : "user",
                    "id" : 1,
                    "path" : "books_id"
                },
                "_cache_key" : "terms_lookup_user_1_books"
            }
        }
    }
}
# Q 
查询某个字段不为空的数据
# A
{
  "query": {
    "bool": {
      "must": [
        {
          "exists": {
            "field": "email"
          }
        }
      ],
      "must_not": {
        "term": {
          "email": ""
        }
      }
    }
  },
  "from": 0,
  "size": 10
}

__all – 6.0.0 会弃用. 可以用来做全局索引, 类似搜索引擎

status -- text

{
"index": "not_analyzed",  # 不作分析, 不会解析日期
"type":"date"
}

"ignore_above" : 20, ## 超过20个字符的那个字段 不作索引/存储
"settings": {
    "mapping.single_type": false  # 允许添加新字段
}

"_cache_key": 对查询结果做缓存, 可以用30分钟做一个 key
注意: 没看到怎么清理历史 "_cache_key"
需要存储: "_source"  -- mapping {"test_filed": {"_source": true}}
应该是指 store

"ignore_malformed" 忽略异常, 该字段不加入索引
"coerce"  强制转换类型, 例如把 "1" 转换为 1, 默认为 true, 如果为 false, 会被拒绝写入

"null_value" 不会更改原数据,返回值还是 null, 只可以在查询的时候使用 null_value 查询.
时间 null_value 可以设置为 "", 可以使用 
"must_not":[{"exists":{"field":"date"}}] 
查出 时间为空的, 没设置时间的, 时间为 null"enabled": false 只存储 不索引
"index": false 不索引 不可查询
"store": false

{"index":{"_index":"weibo", "_type":"test"}}
{"name":"111", "date":"2017-01-01"}
{"index":{"_index":"weibo", "_type":"test"}}
{"name":"222", "date":"149664828100"}
{"index":{"_index":"weibo", "_type":"test"}}
{"name":"333", "date":""}
{"index":{"_index":"weibo", "_type":"test"}}
{"name":"444"}
{"index":{"_index":"weibo", "_type":"test"}}
{"name":"555", "date":null}

https://stackoverflow.com/questions/22796103/null-value-mapping-in-elasticsearch
es 默认 mapping 为:

{
    "email": {
        "type": "text",
        "fields": {
            "keyword": {
                "type": "keyword",
            }
        }
    }
}

注: 这里的 keyword 可以替换为任何名称, 可以认为是别名

查询语句需要调整为: 
{
  "query": {
    "bool": {
      "must": [
        {
          "exists": {
            "field": "email"
          }
        }
      ],
      "must_not": {
        "term": {
          "email.keyword": ""
        }
      }
    }
  },
  "from": 0,
  "size": 10
}
中文分词查询
prefix 是分词结果的开始
term 是分词结果 模糊查询
prefix = like WORD%
term  = like %WORD%

查询多个词组
{
  "query": {
    "terms": {
      "content": [
        "洪水",
        "你好"
      ]
    }
  }
}

Error:
curl -XPOST http://127.0.0.1:9200/testindex/fulltext/1 -H 'Content-Type:application/json' -d'{"content":"美国留给伊拉克的是个烂摊子吗"}'
 index_closed_exception, status:403

fixed:
curl -i -XPOST 'http://localhost:9200/testindex/_open/?pretty'

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值