First Experience with Elasticsearch

抓不到老鼠的汤姆

Article link: https://blog.youkuaiyun.com/u013496080/article/details/80388715

1. Installation and configuration

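Installation is straightforward. The notes below apply to Linux: download the archive from the official website, unpack it, and set the Java environment variables (a Java JDK must be installed), and Elasticsearch is ready to start. One very important point is that Elasticsearch's security model is minimal. Every operation can be performed through the RESTful API: deleting an index takes nothing more than a DELETE request, and wildcards are supported, which is frightening. Run it on an internal network only and do not expose it to the public internet; otherwise, at best your data will be modified, and at worst all of it will be deleted.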
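Besides keeping the node off the public network, one way to reduce the wildcard-delete risk described above is the `action.destructive_requires_name` cluster setting, which makes Elasticsearch reject destructive requests that use wildcards or `_all`. The sketch below only builds the request and does not send it; the helper names and the `localhost:9200` address are assumptions made for illustration.

```python
import json
import urllib.request


def is_wildcard_expression(index_expr):
    """Return True if the index expression contains a wildcard or _all."""
    parts = [p.strip() for p in index_expr.split(",")]
    return any(p == "_all" or "*" in p for p in parts)


def build_require_name_request(base_url="http://localhost:9200"):
    """Build (but do not send) a PUT /_cluster/settings request.

    base_url is an assumed address; replace it with your own node.
    """
    body = {"persistent": {"action.destructive_requires_name": True}}
    return urllib.request.Request(
        url=base_url + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_require_name_request()
print(req.get_method(), req.full_url)
print(is_wildcard_expression("i_selection_*_201801*"))  # a wildcard pattern
print(is_wildcard_expression("au_test"))                # a single named index
```

To apply the setting, the request could be sent with `urllib.request.urlopen(req)` against a reachable node.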

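The configuration file is elasticsearch.yml in the elasticsearch-5.6.9/config directory. The settings that usually need changing are the log file location, the data file location, the backup (snapshot repository) location, and the port number. For reference, see "Important Configuration Changes" in Elasticsearch: The Definitive Guide (Elastic).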

path.repo: /data/backups,/data2/backups  # multiple backup repository directories may be listed
path.data: data/es_data,/data2/es_data,/data3/es_data  # multiple data directories may be listed
http.cors.enabled: true
http.cors.allow-origin: "*"
network.bind_host: 0.0.0.0
network.host: 0.0.0.0  # allow access from external networks
http.port: 9200

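Elasticsearch is a memory-hungry application, and more memory helps performance considerably. Even so, give the Elasticsearch JVM heap only about half of the physical memory, because the underlying Lucene library relies on the operating system's file cache. Set the JVM's maximum and minimum heap sizes to the same value so that the heap is never resized at runtime. The official advice is not to use very high-end machines: a machine with 32 GB of memory is enough, and beyond that the price/performance ratio drops quickly.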
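The heap sizing rule above can be sketched as a small calculation. This is only an illustration: the function names are made up, and the 31 GB cap is an assumption based on the common advice to keep the heap below roughly 32 GB so that the JVM can keep using compressed object pointers. In Elasticsearch 5.x the resulting flags belong in config/jvm.options.

```python
def recommended_heap_gb(physical_gb, cap_gb=31):
    """Half of physical memory, capped at cap_gb (assumed cap, see lead-in)."""
    return max(1, min(physical_gb // 2, cap_gb))


def jvm_heap_flags(physical_gb):
    """Return identical -Xms/-Xmx flags so the heap is never resized."""
    heap = recommended_heap_gb(physical_gb)
    return ["-Xms{}g".format(heap), "-Xmx{}g".format(heap)]


for ram in (16, 32, 128):
    print(ram, "GB RAM ->", " ".join(jvm_heap_flags(ram)))
```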

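Raise the operating system's ulimit to a sufficiently large value, because Elasticsearch performs a large number of file operations and needs many file descriptors, and the system default is pitifully small. Also disable memory swapping. The exact settings are easy to find with a web search; make them permanent rather than effective only in the current session. If several data disks are available, spreading the data directories across them improves I/O performance. To start Elasticsearch, go to the bin directory under the installation directory and run ./elasticsearch -d, which runs it in the background (on Linux).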
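Whether the file descriptor limit is large enough can be checked from Python before starting the node. The following is a minimal sketch for Linux; the threshold of 65536 is an assumption taken from the value Elasticsearch's own startup check commonly asks for, and `nofile_ok` is a hypothetical helper name.

```python
import resource

# Assumed threshold; adjust to what your Elasticsearch version requires.
REQUIRED_NOFILE = 65536


def nofile_ok(soft_limit, required=REQUIRED_NOFILE):
    """True if the soft limit is unlimited or at least `required`."""
    return soft_limit == resource.RLIM_INFINITY or soft_limit >= required


# Limits of the current process, i.e. what a node started from here would inherit.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("open files: soft={}, hard={}, ok={}".format(soft, hard, nofile_ok(soft)))
```

If the check fails, the limit has to be raised permanently (for example in /etc/security/limits.conf on many distributions) for the user that runs Elasticsearch, not just in the current shell.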

2. First use

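At the start, a large amount of data has to be imported into Elasticsearch. This can be done with Java or with Python. Choose a well-equipped server for the import if possible; CPU, memory, and disk (SSD recommended) matter most. I recently imported 1.6 TB of data into Elasticsearch with Python, and importing that much on a single server takes a very long time. One option is to start a batch of cloud servers, have each import part of the data, then take snapshots and restore them on the production server, since importing is far slower than backup and restore.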

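Elasticsearch provides the _bulk API for batch inserts. It can insert several thousand documents in one request; the right number depends on the size of each document, so keep the request from growing too large (about 5 MB per request is probably reasonable). Before importing, settle the field types (the mapping): a wrong field type cannot be changed after the data has been indexed and can only be fixed by reindexing, which is very time-consuming. A practical approach is to first create an index with a simple mapping that allows fields to be added dynamically ("dynamic": true; note that "dynamic": "strict" does the opposite and rejects undeclared fields), insert a few documents, copy out the generated mapping and adjust it, and then start the real import with the adjusted mapping.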
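The batching guideline above (a few thousand documents, roughly 5 MB per request) can be sketched as a generator that groups records by both count and serialized size. The function name, the default limits, and the sample records are assumptions for illustration; the size is measured on the JSON of each record only and ignores the small action line that _bulk adds per document.

```python
import json


def chunk_by_size(records, max_bytes=5 * 1024 * 1024, max_docs=5000):
    """Yield lists of records whose total serialized size stays under max_bytes."""
    chunk, chunk_bytes = [], 0
    for record in records:
        size = len(json.dumps(record).encode("utf-8"))
        # Start a new chunk when adding this record would exceed either limit.
        if chunk and (chunk_bytes + size > max_bytes or len(chunk) >= max_docs):
            yield chunk
            chunk, chunk_bytes = [], 0
        chunk.append(record)
        chunk_bytes += size
    if chunk:
        yield chunk


# Hypothetical sample data, with tiny limits so the chunking is visible.
records = [{"asin": "A{:04d}".format(i), "title": "x" * 100} for i in range(10)]
chunks = list(chunk_by_size(records, max_bytes=500, max_docs=3))
print([len(c) for c in chunks])
```

A record larger than `max_bytes` is still emitted, alone in its own chunk. The `bulk` helper in elasticsearch-py accepts `chunk_size` and `max_chunk_bytes` arguments and does similar batching internally, so a hand-written chunker like this is mainly useful when talking to the _bulk endpoint directly.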

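Some import-related parameters should be set when the index is created (alongside the mapping), mainly the number of shards and merge throttling. Setting the number of shards to 1 and turning off merge throttling with "indices.store.throttle.type": "none" speeds up the import. Check that your Elasticsearch version still accepts the throttle setting, since the store throttling options have changed between major releases.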
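As a rough illustration of how the settings and the mapping fit together in one create-index request body for Elasticsearch 5.x, consider the sketch below. The type name `product` and the field list are hypothetical, and the zero replicas and disabled refresh are common import-time assumptions rather than values stated in this article; only the single shard comes from the text above.

```python
import json


def build_index_body(type_name, properties, shards=1, replicas=0):
    """Build a create-index body with import-friendly settings and a fixed mapping."""
    return {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,  # assumption: add replicas after the import
            "refresh_interval": "-1",        # assumption: re-enable after the import
        },
        "mappings": {
            type_name: {
                "dynamic": "strict",         # reject fields that are not declared below
                "properties": properties,
            }
        },
    }


# Hypothetical fields for illustration only.
body = build_index_body(
    "product",
    {
        "asin": {"type": "keyword"},
        "product_name": {"type": "text"},
    },
)
print(json.dumps(body, indent=2))
```

The printed JSON is what would be sent as the body of `PUT /<index_name>` when creating the index.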

Below is a piece of Python code that inserts records into Elasticsearch:

from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk


def bulkInsert(records, param):
    # param holds the connection host, the target index, and the document type
    es = Elasticsearch(param['host'])
    es_index = param['es_index']
    es_type = param['es_type']
    actions = []
    for r in records:
        action = {
            "_index": es_index,
            "_type": es_type,
            "_id": r['asin'],  # the record's asin is used as the document id
            "_source": r       # pass the dict itself; the helper serializes it
        }
        actions.append(action)
    # raise_on_error=True raises an exception if any document fails to index
    success, _ = bulk(es, actions, index=es_index, request_timeout=100, raise_on_error=True)
    print('insert result:', success)

Common Elasticsearch commands:

curl -XPUT 'http://localhost:9200/_snapshot/lostdata' -H 'Content-Type: application/json' -d '{"type": "fs","settings": { "location":"/data/backups/lostdata", "max_snapshot_bytes_per_sec" : "200mb", "max_restore_bytes_per_sec" : "200mb", "compress": true }}' # create a repository
curl -XPUT 'http://localhost:9200/_snapshot/201801/full' -H 'Content-Type: application/json' -d '{ "indices": "i_selection_*_201801*" }' # create a snapshot of the matching indices
curl -XPUT 'http://139.219.15.161:9200/_snapshot/backups/us01' -H 'Content-Type: application/json' -d '{"indices": "t_selection_in_20180428,t_selection_in_20180505"}' # snapshot specific indices (with no request body, all indices are backed up)
curl -XPUT 'http://localhost:9200/_snapshot/backups/othercountries' # snapshot all indices
curl -XGET 'http://18.191.217.107:9200/_snapshot/201801/full/_status' # check snapshot progress
curl -XPOST 'http://42.159.81.24:9200/_snapshot/backups/othercountries/_restore' # restore a snapshot
curl -XGET 'http://18.191.217.107:9200/_recovery' # check restore progress
curl -XPOST 'http://42.159.81.24:9200/_reindex' -H 'Content-Type: application/json' -d '{ "source": { "index": "t_selection_au_20180106" }, "dest": { "index": "au_test" } }' # reindex
# Japanese tokenization (add to text fields in the mapping; the kuromoji analyzer requires the analysis-kuromoji plugin)
"product_name": { "type": "text", "analyzer": "kuromoji" }
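The snapshot, status, and restore commands above follow one URL pattern, which can be sketched in Python using only the standard library. The helper names and the `localhost:9200` address are assumptions for illustration; the requests are built but not sent, so the sketch runs without a cluster.

```python
import json
import urllib.request


def es_request(method, base_url, path, body=None):
    """Build (but do not send) a JSON request for the Elasticsearch REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data else {}
    return urllib.request.Request(base_url + path, data=data, headers=headers, method=method)


def snapshot_requests(base_url, repo, snapshot, indices=None):
    """Return the create / status / restore requests for one snapshot, in order."""
    prefix = "/_snapshot/{}/{}".format(repo, snapshot)
    # With no body, Elasticsearch snapshots every index.
    body = {"indices": ",".join(indices)} if indices else None
    return [
        es_request("PUT", base_url, prefix, body),           # take the snapshot
        es_request("GET", base_url, prefix + "/_status"),    # check its progress
        es_request("POST", base_url, prefix + "/_restore"),  # restore it
    ]


reqs = snapshot_requests(
    "http://localhost:9200",  # assumed address
    "backups",
    "us01",
    ["t_selection_in_20180428", "t_selection_in_20180505"],
)
for r in reqs:
    print(r.get_method(), r.full_url)
```

Each request could then be sent with `urllib.request.urlopen(r)`, waiting for the status call to report completion before issuing the restore.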

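GET /_cat/nodes?v # Unix cat-style API for viewing nodes; append ?help to get usage help

http://xxx.xx.x.xx.x:9200/_cluster/stats # view cluster status (statistics)

GET my_index,another_index/_stats # view index stats (multiple indices can be separated with commas)

GET _cluster/pending_tasks # view the tasks pending execution in the cluster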

POST /_snapshot/my_backup/snapshot_1/_restore?wait_for_completion=true
{
"indices": "index_1",
"rename_pattern": "index_(.+)",
"rename_replacement": "restored_index_$1"
}
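The effect of rename_pattern and rename_replacement in the restore request above can be approximated with Python's `re` module. This is only a sketch: Elasticsearch applies Java regular expression rules with `$1`-style group references, so the hypothetical helper below converts those references to Python's syntax and may differ from Elasticsearch on unusual patterns.

```python
import re


def restored_name(index_name, rename_pattern, rename_replacement):
    """Approximate the restore rename by translating $N group references for Python."""
    python_replacement = re.sub(r"\$(\d+)", r"\\g<\1>", rename_replacement)
    return re.sub(rename_pattern, python_replacement, index_name)


for name in ("index_1", "index_2"):
    print(name, "->", restored_name(name, "index_(.+)", "restored_index_$1"))
```

With the values from the request, `index_1` is restored under the name `restored_index_1`, which leaves the existing `index_1` untouched.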

PUT /my_index/_settings
{
"index.search.slowlog.threshold.query.warn" : "10s",
"index.search.slowlog.threshold.fetch.debug": "500ms",
"index.indexing.slowlog.threshold.index.info": "5s" ,
"refresh_interval": "30s"
}

PUT /my_logs/_settings
{ "refresh_interval": -1 }
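Disabling refresh is normally temporary: it is switched off for a large import and switched back on afterwards. A minimal sketch of that pattern as a context manager is shown below. The `put_settings` argument is an assumed callable with the same keyword arguments as `es.indices.put_settings` in elasticsearch-py, and a fake is used here so the example runs without a cluster; the restore value of 30s is taken from the settings example earlier in this list.

```python
from contextlib import contextmanager


@contextmanager
def refresh_disabled(put_settings, index, restore_to="30s"):
    """Disable refresh for the duration of a bulk import, then restore it."""
    put_settings(index=index, body={"refresh_interval": "-1"})
    try:
        yield
    finally:
        # Runs even if the import raises, so refresh is not left disabled.
        put_settings(index=index, body={"refresh_interval": restore_to})


calls = []


def fake_put_settings(index, body):
    """Stand-in for a real client call; records what would be sent."""
    calls.append((index, body))


with refresh_disabled(fake_put_settings, "my_logs"):
    pass  # the bulk import would run here

print(calls)
```

With a real client, the call would look like `with refresh_disabled(es.indices.put_settings, "my_logs"): ...`.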

PUT /_cluster/settings
{
"transient" : {
"logger.index.search.slowlog" : "DEBUG",
"logger.index.indexing.slowlog" : "WARN"
}
}