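Concepts
This article does not cover Elasticsearch in depth or explain its concepts in detail. It focuses on the points that newcomers usually want to know first; for the full picture, see Elasticsearch: The Definitive Guide.
localhost:9200/<index>/<type>/<id>
Index: comparable to a database
Document: comparable to a row (record), made up of multiple fields
Document type (type): an index can store several different kinds of objects, and the document type tells them apart; documents with different structures belong to different types, so a type is comparable to a table
Identifier (id): identifies a document within its index and type
Field: part of a document, consisting of a name and a value
Term: a unit of search, representing a word in the text
Token: an occurrence of a term in the field text, consisting of the term text, its start and end offsets, and a type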
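As a small illustration of the index/type/id address pattern shown near the top of this section, here is a minimal Python sketch. The helper name `document_url` and the default host are assumptions made for this example; they are not part of Elasticsearch.

```python
from urllib.parse import quote

def document_url(index, doc_type, doc_id, host="localhost:9200"):
    """Build the REST address of one document: host/index/type/id."""
    # Percent-encode each path segment so unusual ids cannot break the URL.
    parts = [quote(str(p), safe="") for p in (index, doc_type, doc_id)]
    return "http://" + host + "/" + "/".join(parts)

print(document_url("blog", "article", 1))
```

The same three segments appear in every document-level curl command in this article.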
Working with data through the REST API
- Get basic information
curl -XGET http://localhost:9200/
- Get information about all nodes in the cluster
curl -XGET http://localhost:9200/_cluster/state/nodes/
- Cluster health
curl -XGET 'http://localhost:9200/_cluster/health?pretty'
- Shut down the cluster by sending a shutdown request to every node
curl -XPOST http://localhost:9200/_cluster/nodes/_shutdown
- Shut down a single node
curl -XPOST 'http://localhost:9200/_cluster/nodes/<node_id>/_shutdown'
- #### Create, read, update, delete
- Create a document
curl -XPUT http://localhost:9200/blog/article/1 -d '{"title":"let me use ElasticSearch","content":"make a storage way","tags":["skill","storage","elasticSearch"]}'
- Retrieve a document
curl -XGET http://localhost:9200/blog/article/1
- Update a document
curl -XPOST http://localhost:9200/blog/article/1/_update -d '{"script":"ctx._source.content=\"new version\""}'
- Upsert (update the document, or insert it if it does not exist)
curl -XPOST http://localhost:9200/blog/article/1/_update -d '{"script":"ctx._source.counter+=1","upsert":{"counter":0}}'
- Delete a document
curl -XDELETE http://localhost:9200/blog/article/1
Searching with URI requests
- Get the index mapping
curl -XGET 'http://localhost:9200/blog/_mapping?pretty'
- Queries are sent to the _search endpoint
curl -XGET 'http://localhost:9200/blog/_search?pretty'
curl -XGET 'http://localhost:9200/blog,index2/_search?pretty'
curl -XGET 'http://localhost:9200/blog/article/_search?pretty'
curl -XGET 'http://localhost:9200/_search?pretty'
- Query response
curl -XGET 'localhost:9200/blog/_search?pretty&q=title:elastic'
- Query analysis
curl -XGET 'localhost:9200/blog/_analyze?field=title' -d 'elasticsearch server'
- Query parameters
curl -XGET 'localhost:9200/blog/_search?pretty&q=title:elasticsearch&df=title&explain=true&default_operator=AND&fields=title&sort=title:asc&size=2&from=2'
Parameter | Description |
---|---|
q | Query condition, comparable to a WHERE clause |
df | Default field, used when the query in q does not name a field |
analyzer | Analyzer to use for the query |
default_operator | Boolean operator used between terms, OR by default |
explain | Adds an explanation of the scoring to each returned hit |
fields | Fields to return |
sort | Sort order of the results |
timeout | Search timeout |
size,from | Result window: number of hits to return and offset of the first hit |
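The parameters in the table can also be assembled programmatically. The following is a minimal sketch, assuming a hypothetical helper named `uri_search`; it only builds the URL string and does not contact a server.

```python
from urllib.parse import urlencode

def uri_search(index, **params):
    """Build a URI search request for the _search endpoint of one index."""
    # Keep ':' and ',' readable, since they are part of the q and sort syntax.
    query = urlencode(params, safe=":,")
    return "http://localhost:9200/%s/_search?%s" % (index, query)

# 'from' is a Python keyword, so it is passed through a dict.
url = uri_search("blog", q="title:elasticsearch", default_operator="AND",
                 sort="title:asc", size=2, **{"from": 2})
print(url)
```

Building the query string with `urlencode` avoids shell quoting problems with `&` and makes sure values containing spaces are escaped.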
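Indexing
Elasticsearch is a schemaless search engine: it can infer the data structure from the documents you PUT, or you can define the structure yourself.
1. Change automatic index creation
Edit the configuration file elasticsearch.yml to turn off automatic index creation: action.auto_create_index: false
2. Set the number of shards and replicas instead of using the defaults
curl -XPUT localhost:9200/blog/ -d '{"settings":{"number_of_shards":1,"number_of_replicas":2}}'
3. Define the index mapping
curl -XPUT localhost:9200/blog/ -d @blog.json
//blog.json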
{
"mappings":{
"article":{
"dynamic":"false",//do not add new fields automatically
"_index":{
"enabled":true
},
"_id":{
"index":"not_analyzed",//indexed without analysis
"store":"no"//not stored
//"path":"article_id"
},
"_routing":{
"required":true,
"path":"userID"
},
"properties":{
"id":
{"type":"long",
"store":"yes",
"precision_step":0,
"postings_format":"pulsing"//postings format, speeds up lookups
},
"userID":{"type":"long","store":"yes"},
"name":{"type":"string","store":"yes","index":"analyzed"},
"published":{"type":"date","store":"yes","precision_step":0,"format":"yyyy-MM-dd"},
"contents":{"type":"string","store":"no","index":"analyzed"},
"allowed":{"type":"boolean","store":"yes"},
"image":{"type":"binary"},
"address":{"type":"ip","store":"yes"},
"votes":{
"type":"integer",
"doc_values_format":"memory"//doc values configuration, for efficient sorting and faceting
},
"fields":{
"type":"string",
"field":{"type":"string","store":"yes"}
}
}
},
"user":{
"properties":{
"id":{"type":"long","store":"yes"},
"name":{"type":"string","store":"yes","index":"analyzed"}
}
}
}
}
//Query using the routing parameter
curl -XGET 'localhost:9200/blog/_search?routing=12,13&q=title:elasticsearch'
- Bulk indexing
curl -XPOST 'localhost:9200/_bulk?pretty' --data-binary @article.json
//article.json
{"index":{"_index":"blog","_type":"article","_id":1}}
{"title":"elasticsearch","content":"it is amazing"}
{"create":{"_index":"blog","_type":"article","_id":2}}
{"title":"elasticsearch","content":"it is interesting"}
{"create":{"_index":"blog","_type":"article","_id":2}}
{"title":"elasticsearch","content":"it is funny"}
{"delete":{"_index":"blog","_type":"article","_id":2}}
//The maximum size of a bulk request defaults to 100MB; it can be changed in elasticsearch.yml
http.max_content_length: 200mb
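The bulk file format is easy to get wrong by hand: every action is one JSON object on its own line, index and create actions are followed by a source line, delete actions are not, and the body has to end with a newline. The sketch below generates such a body; the helper name `bulk_body` and the sample actions are assumptions for illustration.

```python
import json

def bulk_body(actions):
    """Serialize (operation, metadata, source) triples into a _bulk request body."""
    lines = []
    for op, meta, source in actions:
        lines.append(json.dumps({op: meta}))   # action line
        if source is not None:                 # delete has no source line
            lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"             # trailing newline is required

body = bulk_body([
    ("index", {"_index": "blog", "_type": "article", "_id": 1},
     {"title": "elasticsearch", "content": "it is amazing"}),
    ("delete", {"_index": "blog", "_type": "article", "_id": 2}, None),
])
print(body, end="")
```

The resulting string can be written to a file such as article.json and sent with `--data-binary`, which preserves the newlines.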
Queries
curl -XGET 'localhost:9200/blog/article/_search?pretty=true' -d '{
"fields":["title","id","year"],//fields to return, like select id,title...
"min_score":0.01,//Elasticsearch scores every document; this sets the minimum score a returned document must have
"from":1,
"size":10,
"script_fields":{//compute returned values with a script
"changeYear":{
"script":"_source.year-paramYear",
"params":{
"paramYear":100
}
}
},
"query":{
"bool":{//a request takes a single top-level query clause, so the three examples are combined with bool
"must":[
{"ids":{//query by identifier
"values":["1","2","3"]
}},
{"prefix":{//prefix query
"title":{
"value":"elastic"
}
}},
{"query_string":{
"query":"title:elasticsearch"
}}
]
}
},
"sort":{
"year":"desc"
}
}'
//See which shards the search would be executed on
curl -XGET 'localhost:9200/blog/article/_search_shards?pretty=true' -d '{
"query":{"match_all":{}}
}'
//Validate a query
curl -XGET 'localhost:9200/blog/article/_validate/query?pretty&explain' -d '{
"query":{"match_all":{}}
}'
- Term query: the query text is not analyzed
{
"query":{
"term":{
"title":"elasticsearch"
}
}
}
{
"query":{
"term":{
"title":{
"value":"elasticsearch",
"boost":10.0//boost value, changes the importance of the term
}
}
}
}
//Multiple terms
{
"query":{
"terms":{
"title":["elasticsearch","stored"],
"minimum_should_match":2 //at least two of the terms must match
}
}
}
- Common terms query: scores high-frequency and low-frequency terms separately for better performance
{
"query":{
"common":{
"title":{
"query":"elasticsearch and store",//and is a high-frequency term
"cutoff_frequency":0.001
}
}
}
}
- match
//match_all: returns every document in the index
{
"query":{
"match_all":{}
}
}
//match: unlike term, the query text is analyzed
{
"query":{
"match":{
"title":{
"query":"elasticsearch and store",//the text is analyzed into three terms, which are then matched against documents
"operator":"and"//default is or
}
}
}
}
//match_phrase: builds a phrase from the analyzed terms
{
"query":{
"match_phrase":{
"title":{
"query":"elasticsearch store",
"slop":1 //one unknown term (such as and) may appear between elasticsearch and store
}
}
}
}
//match_phrase_prefix: same as match_phrase, except that the last term is matched as a prefix
{
"query":{
"match_phrase_prefix":{
"title":{
"query":"elasticsearch st",
"slop":1,
"max_expansions":20
}
}
}
}
//multi_match: match against several fields
{
"query":{
"multi_match":{
"query":"elasticsearch and store",
"fields":["title","content"]
}
}
}
//query_string: supports the full Apache Lucene query syntax
{
"query":{
"query_string":{
"query":"title:elasticsearch -content:store"
}
}
}
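The difference between term and match comes down to whether the query text goes through analysis before it is looked up. The toy model below illustrates this with a tiny inverted index; the `analyze` function is a deliberately simplified stand-in (lowercase, split on non-alphanumerics), not the real standard analyzer, and all names are assumptions for this sketch.

```python
import re

def analyze(text):
    """Toy analyzer: lowercase the text and split it into tokens."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

docs = {1: "Let me use ElasticSearch", 2: "A storage way"}

# Inverted index: analyzed token -> ids of the documents containing it.
index = {}
for doc_id, title in docs.items():
    for token in analyze(title):
        index.setdefault(token, set()).add(doc_id)

def term_query(value):
    """Look the value up exactly as given, without analysis."""
    return index.get(value, set())

def match_query(text, operator="or"):
    """Analyze the text first, then combine the per-term results."""
    sets = [index.get(t, set()) for t in analyze(text)]
    if not sets:
        return set()
    return set.intersection(*sets) if operator == "and" else set.union(*sets)

print(term_query("ElasticSearch"))   # mixed case never appears in the index
print(match_query("ElasticSearch"))  # analyzed to 'elasticsearch' first
```

This is why a term query against an analyzed field usually needs the lowercase form of the word, while match accepts the text as a user would type it.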
- Fuzzy queries
{
"query":{
"fuzzy":{//CPU-intensive
"title":"elasearch"
}
}
}
{
"query":{
"fuzzy_like_this_field":{
"title":{
"like_text":"elasticsearch and "
}
}
}
}
{
"query":{
"fuzzy_like_this":{
"fields":["title","content"],
"like_text":"elasticsearch and ",
"min_similarity":0.7//minimum similarity, default 0.5
}
}
}
- Wildcards: ? and *
{
"query":{
"wildcard":{
"title":"e?as*search"
}
}
}
- Regular expressions
{
"query":{
"regexp":{
"title":{
"value":"el.sear[abc]h"
}
}
}
}
- Range
{
"query":{
"range":{
"year":{
"gte":2012,
"lte":2017
}
}
}
}
Compound queries
- Bool query
{
"query":{
"bool":{
"must":{
"term":{"title":"elasticsearch"}
},
"should":{
"range":{
"year":{"from":2010,"to":2017} }
},
"must_not":{
"term":{"content":"struct"}
}
}
}
}
- Boosting query
{
"query":{
"boosting":{
"positive":{
"term":{"title":"elasticsearch"}
},
"negative":{//documents that also match this query have their score multiplied by negative_boost
"term":{"content":"struct"}
},
"negative_boost":0.333
}
}
}
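The effect of negative_boost in the boosting query is multiplicative: a document that matches the positive query keeps its score unless it also matches the negative query, in which case the score is scaled down. A minimal sketch of that rule, with a hypothetical function name and made-up scores:

```python
def boosting_score(positive_score, matches_negative, negative_boost):
    """Score of a document under a boosting query.

    positive_score is the score from the positive query; it is scaled by
    negative_boost only when the document also matches the negative query.
    """
    if matches_negative:
        return positive_score * negative_boost
    return positive_score

print(boosting_score(2.0, False, 0.5))  # unaffected by the negative query
print(boosting_score(2.0, True, 0.5))   # demoted, but still returned
```

Unlike must_not in a bool query, the negative clause does not exclude documents; it only pushes them down the ranking.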
- constant_score: wraps a query or filter and gives every matching document the same score
{
"query":{
"constant_score":{
"query":{
"term":{"title":"elasticsearch"}
},
"boost":3.0
}
}
}
- Indices query
{
"query":{
"indices":{
"indices":["blog"],
"query":{//query run against the listed indices
"term":{"title":"elasticsearch"}
},
"no_match_query":{//query run against indices that are not listed
"term":{"title":"store"}
}
}
}
}
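Filtering query results
The queries described above compute a score for every document, which makes searching more complex and costs CPU time. Filters do not affect the score. A filter is applied to the contents of the whole index; its result is independent of the documents the query finds and of the relationships between documents, which also makes filters easy to cache.
//Filtering happens after the documents have been found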
{
"query":{
"term":{"title":"elasticsearch"}
},
"post_filter":{
"term":{"year":2017}
}
}
//Filtering happens before the documents are found, which is faster
{
"query":{
"filtered":{
"query":{
"term":{"title":"elasticsearch"}
},
"filter":{
"term":{"year":2017}
}
}
}
}
- Range filter
{
"post_filter":{
"range":{
"year":{"gte":2010,"lte":2017}
}
}
}
- exists
{
"post_filter":{
"exists":{"field":"title"}//filters out documents that have no value in the field
}
}
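- missing: the counterpart of exists
{
"post_filter":{
"missing":{
"field":"year",
"null_value":true,//also match documents whose field holds an explicit null value
"existence":true//match documents that have no value in the field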
}
}
}
- script
{
"post_filter":{
"script":{
"_cache":true,//cache the filter result
"script":"now-doc['year'].value>100",
"params":{"now":2017}
}
}
}
- type
{
"post_filter":{
"type":{"value":"article"}
}
}
- limit
{
"post_filter":{
"limit":{"value":1}//limits the number of documents returned per shard
}
}
- id
{
"post_filter":{
"ids":{
//"type":["article"],
"values":[1]
}
}
}
- and, not, or: combine filters; and/or take an array of filters
{
"post_filter":{
"not":{
"and":[
{"range":{
"year":{"gte":2010,"lte":2017}
}},
{
"or":[
{"term":{"title":"elasticsearch"}},
{"term":{"title":"store"}}
]
}
]
}
}
}
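To make the semantics of nested and, or and not filters concrete, here is a toy evaluator that checks one document against a filter tree written as a Python dict. It is only a sketch: `matches` is a hypothetical name, term is modelled as plain equality on the stored value, and range only understands gte and lte.

```python
def matches(flt, doc):
    """Evaluate a filter tree (term, range, and, or, not) against one document."""
    (kind, body), = flt.items()          # each filter object has exactly one key
    if kind == "term":
        (field, value), = body.items()
        return doc.get(field) == value
    if kind == "range":
        (field, bounds), = body.items()
        v = doc.get(field)
        return v is not None and bounds.get("gte", v) <= v <= bounds.get("lte", v)
    if kind == "and":
        return all(matches(f, doc) for f in body)
    if kind == "or":
        return any(matches(f, doc) for f in body)
    if kind == "not":
        return not matches(body, doc)
    raise ValueError("unsupported filter: " + kind)

flt = {"not": {"and": [
    {"range": {"year": {"gte": 2010, "lte": 2017}}},
    {"or": [{"term": {"title": "elasticsearch"}},
            {"term": {"title": "store"}}]},
]}}

print(matches(flt, {"title": "elasticsearch", "year": 2012}))  # inner and holds, so not rejects it
print(matches(flt, {"title": "elasticsearch", "year": 2005}))  # range fails, so not accepts it
```

Reading the tree from the inside out, as the evaluator does, is a convenient way to check what a combined filter will keep.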
Highlighting
{
"query":{
"term":{"title":"elasticsearch"}
},
"highlight":{//global settings
"pre_tags":["<b>"],//default <em>
"post_tags":["</b>"],//default </em>
"fields":{
"title":{}
}
}
}
{
"query":{
"term":{"title":"elasticsearch"}
},
"highlight":{
"require_field_match":true,
"fields":{
"title":{//per-field settings
"pre_tags":["<b>"],//default <em>
"post_tags":["</b>"]//default </em>
},
"content":{//per-field settings; if require_field_match is false, matches in content are highlighted as well
"pre_tags":["<b>"],//default <em>
"post_tags":["</b>"]//default </em>
}
}
}
}
Extending the index structure
//Non-flat document
{
"article":{
"author":{
"name":{
"firstName":"lig",
"lastName":"bee"
}
},
"isbn":"1424123131",
"year":2017,
"tags":[
{"headline":"elasticsearch"},
{"headline":"store"}
],
"copies":1
}
}
//Mapping definition
{
"article":{
"properties":{
"author":{
"type":"object",//object type
"properties":{
"name":{
"type":"object",
"properties":{
"firstName":{"type":"string","store":"yes"},
"lastName":{"type":"string","store":"yes"}
}
}
}
},
"isbn":{"type":"string","store":"yes"},
"year":{"type":"integer","store":"yes"},
"tags":{
"properties":{
"headline":{"type":"string","store":"yes"}
}
},
"copies":{"type":"integer","store":"yes"}
}
}
}
- Nested objects
{
"name":"t-shirt",
"kinds":[
{"size":"M","color":"black"},
{"size":"XXL","color":"white"}
]
}
//Mapping
{
"cloth":{
"properties":{
"name":{"type":"string","store":"yes"},
"kinds":{
"type":"nested",
"properties":{
"size":{"type":"string","store":"yes"},
"color":{"type":"string","store":"yes"}
}
}
}
}
}
//Search
curl -XGET 'localhost:9200/shop/cloth/_search?pretty=true' -d '
{
"query":{
"nested":{
"path":"kinds",//the nested object to query
"query":{
"bool":{
"must":[
{"term":{
"kinds.size":"m"//term queries are not analyzed; the standard analyzer indexes M as m
}},
{"term":{
"kinds.color":"white"
}}
]
}
}
}
}
}
'
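The reason for the nested type is that a plain array of objects is flattened at index time, so the link between size and color inside one element is lost. The toy model below shows the difference using the t-shirt document; the function names are assumptions for this sketch and the matching is simplified to exact equality.

```python
doc = {"name": "t-shirt",
       "kinds": [{"size": "M", "color": "black"},
                 {"size": "XXL", "color": "white"}]}

def flatten(doc, path):
    """Model of a plain object array: values are collected per field name."""
    flat = {}
    for item in doc[path]:
        for key, value in item.items():
            flat.setdefault(path + "." + key, []).append(value)
    return flat

def object_match(doc, path, wanted):
    """Each condition may be satisfied by a different array element."""
    flat = flatten(doc, path)
    return all(v in flat.get(path + "." + k, []) for k, v in wanted.items())

def nested_match(doc, path, wanted):
    """All conditions must hold inside the same array element."""
    return any(all(item.get(k) == v for k, v in wanted.items())
               for item in doc[path])

wanted = {"size": "M", "color": "white"}
print(object_match(doc, "kinds", wanted))  # matches across two different kinds
print(nested_match(doc, "kinds", wanted))  # no single kind is both M and white
```

With the nested mapping, a search for a white t-shirt in size M correctly finds nothing, because no single entry in kinds has both values.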
Aggregations
Metric aggregations: take a set of documents and produce at least one statistic
- min, max, sum, avg, value_count
{
"aggs":{
"min_year":{
"min":{
"field":"year"
}
}
}
}
//script
{
"aggs":{
"min_year":{
"min":{
"field":"year",
"script":"_value-100"
}
}
}
}
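- stats: returns all of the metrics above (min, max, sum, avg, count) in one aggregation
{
"aggs":{
"all_agg":{
"stats":{
"field":"year"
}
}
}
}
- extended_stats: also includes extended statistics such as the sum of squares, variance and standard deviation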
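As a rough picture of what a stats-style metric aggregation returns, the following sketch computes the same five numbers over a plain list of values. The function name `stats` and the sample years are assumptions for illustration, and the input is expected to be non-empty.

```python
def stats(values):
    """Compute count, min, max, avg and sum over a non-empty list of numbers."""
    values = list(values)
    count = len(values)
    total = sum(values)
    return {"count": count, "min": min(values), "max": max(values),
            "avg": total / count, "sum": total}

print(stats([2010, 2012, 2017]))
```

Asking for one combined aggregation is cheaper than sending min, max, sum and avg as four separate aggregations over the same field.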
Bucket aggregations: split the documents into subsets and return the count of each (like GROUP BY plus COUNT)
- terms
{
"aggs":{
"all_term":{
"terms":{
"field":"year",
"order":{"_count":"desc"}
}
}
}
}
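Conceptually, a terms aggregation groups documents by the value of a field and counts each group. A minimal sketch of that idea, assuming a hypothetical `terms_agg` helper and a made-up list of year values:

```python
from collections import Counter

def terms_agg(values, size=10):
    """Group equal values into buckets and return the largest buckets first."""
    return [{"key": key, "doc_count": count}
            for key, count in Counter(values).most_common(size)]

years = [2017, 2012, 2017, 2010, 2017, 2012]
for bucket in terms_agg(years):
    print(bucket)
```

The `size` argument mirrors the fact that only the top buckets are returned rather than every distinct value.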
- range
{
"aggs":{
"all_years":{
"range":{
"field":"year",
"ranges":[
{"to":2000},
{"from":2001,"to":2011},
{"from":2012,"to":2017}
]
}
}
}
}
- date_range: range aggregation for date fields
{
"aggs":{
"all_date":{
"date_range":{
"field":"published",
"format":"yyyy/MM/dd",
"ranges":[
{"to":"2000/01/01"},
{"from":"2001/01/02","to":"2011/12/31"},
{"from":"2012/01/01","to":"2017/01/01"}
]
}
}
}
}
{
"aggs":{
"all_date":{
"date_range":{
"field":"published",
"format":"yyyy/MM/dd",
"ranges":[
{"to":"2000/01/01"},
{"from":"now","to":"now+1y"}
]
}
}
}
}
- histogram: buckets of a fixed interval
{
"aggs":{
"years":{
"histogram":{
"field":"year",
"interval":4
}
}
}
}
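A histogram aggregation assigns each value to the bucket whose key is the value rounded down to a multiple of the interval. The sketch below reproduces that rule for an interval of 4; `histogram` is a hypothetical helper, the years are made up, and no offset is applied.

```python
import math
from collections import Counter

def histogram(values, interval):
    """Count values per bucket, where the bucket key is the value rounded
    down to the nearest multiple of the interval."""
    keys = (math.floor(v / interval) * interval for v in values)
    return dict(sorted(Counter(keys).items()))

print(histogram([2010, 2011, 2012, 2015, 2017], 4))
```

Because the keys depend only on the interval, buckets from different requests line up with each other, which is what makes histograms convenient for charts.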
- date_histogram
{
"aggs":{
"publish":{
"date_histogram":{
"field":"published",
"format":"yyyy-MM-dd HH:mm",
"interval":"31d"
}
}
}
}
- ip_range (IPv4 addresses)
{
"aggs":{
"ip_access":{
"ip_range":{
"field":"ip",
"ranges":[
{"from":"192.168.0.1","to":"192.168.0.254"},
{"mask":"192.168.0.0/24"}
]
}
}
}
}
- Nested
{
"aggs":{
"nested-agg":{
"nested":{
"path":"kinds"
},
"aggs":{
"sizes":{
"terms":{ "field":"kinds.size" } }
}
}
}
}
{
"aggs":{
"all_years":{
"range":{
"field":"year",
"ranges":[
{"to":2000},
{"from":2001,"to":2011},
{"from":2012,"to":2017}
]
},
"aggs":{
"status_all":{
"stats":{"field":"year"} }//stats needs a field; year is used here for illustration
}
}
}
}
- Bucket ordering with nested aggregations
{
"aggs":{
"all_term":{
"terms":{
"field":"copies",
"order":{"defindNum.avg":"desc"}
},
"aggs":{
"defindNum":{
"stats":{"field":"year"} }//stats needs a field; year is used here for illustration
}
}
}
}