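1. ElasticSearch Basic Concepts
GitHub repository of the complete search client: https://github.com/cweeyii/elasticsearch-parent
For Elasticsearch basic concepts, see: https://es.xiaoleilu.com/010_Intro/05_What_is_it.html
Cluster-mode installation: http://blog.youkuaiyun.com/cweeyii/article/details/71055884
2. Key Concepts
- Search type (searchType)
When all you need is the number of documents that match a condition, you can set the search type to count, which returns only the hit count and no documents (equivalent to MySQL's select count(1) from table where valid=0).
PS: The count search type has since been deprecated. Setting from=0 and size=0 in the search request is just as efficient.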
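#Build the search request:
SearchRequestBuilder builder = client.prepareSearch(indexName).setQuery(searchCondition.getQueryBuilder()).setFrom(0).setSize(0);
#Execute it and read the hit count:
SearchResponse searchResponse = builder.execute().actionGet();
SearchHits hits = searchResponse.getHits();
long totalHits = hits.getTotalHits();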
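- Metadata fields
Every document in an ES index carries several metadata fields, each starting with an underscore, such as _type, _id, _index and _source.
These fields are very useful. For example, if you use the primary key of a user record as the _id, you can update ES records by primary key, fetch a record by id, or keep/exclude specific ids in a query (see IdsQueryBuilder). In addition, when no custom routing is set, the shard a document is stored on is determined by its _id.
_index: index name; _type: index type; _id: document id; _source: the original JSON body of the document, as stored by Elasticsearch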
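The routing rule mentioned above is usually described as shard = hash(routing) % number_of_primary_shards, where routing defaults to the _id. The plain-Java sketch below illustrates it under one loud assumption: String.hashCode() stands in for the hash function Elasticsearch really uses (Murmur3 in 2.x), and the class name is hypothetical. It shows why the same id always lands on the same shard, and why number_of_shards cannot be changed after the index is created.

```java
/** Hypothetical sketch of ES-style shard routing; String.hashCode() stands in for ES's real hash. */
final class ShardRoutingSketch {
    private ShardRoutingSketch() {}

    /**
     * Picks the primary shard for a document.
     *
     * @param id                    the document _id, used as the routing value when no custom routing is set
     * @param customRouting         optional routing value; null means "route by _id"
     * @param numberOfPrimaryShards fixed at index creation (5 in the mapping shown later)
     */
    static int shardFor(String id, String customRouting, int numberOfPrimaryShards) {
        String routing = (customRouting != null) ? customRouting : id;
        // floorMod keeps the result non-negative even when hashCode() is negative
        return Math.floorMod(routing.hashCode(), numberOfPrimaryShards);
    }

    public static void main(String[] args) {
        int shards = 5;
        for (String id : new String[] {"1001", "1002", "1003"}) {
            System.out.println(id + " -> shard " + shardFor(id, null, shards));
        }
        // With a custom routing value, documents with different ids share a shard
        System.out.println(shardFor("1001", "beijing", shards) == shardFor("1002", "beijing", shards));
    }
}
```

Changing the divisor (the shard count) would send existing ids to different shards, which is why documents indexed under the old count could no longer be found by id.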
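- Dynamic mapping
When Elasticsearch encounters an unknown field, it uses dynamic mapping to determine the field's data type and automatically adds the field to the type's mapping. This means you don't have to create the mapping yourself: ES infers it from the types of the fields in the documents you index; for example, a string field is analyzed (tokenized) and stored. This is very handy when you add a field to the object you index: you don't need to delete or modify the mapping, you just reindex the data and the new field appears in the index automatically.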
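To make the inference concrete, here is a toy model in plain Java of the ES 2.x-era defaults: JSON booleans become boolean, whole numbers long, floating-point numbers double, and strings an analyzed string. It is a simplification and not Elasticsearch's code; date detection for date-like strings and nested objects are deliberately left out, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model (hypothetical) of ES 2.x dynamic type detection; date detection and objects omitted. */
final class DynamicMappingSketch {
    private DynamicMappingSketch() {}

    /** Maps a parsed JSON value to the field type dynamic mapping would pick. */
    static String inferType(Object jsonValue) {
        if (jsonValue instanceof Boolean) return "boolean";
        if (jsonValue instanceof Integer || jsonValue instanceof Long) return "long";
        if (jsonValue instanceof Float || jsonValue instanceof Double) return "double";
        if (jsonValue instanceof String) return "string"; // analyzed with the default analyzer
        throw new IllegalArgumentException("not modelled in this sketch: " + jsonValue);
    }

    /** Builds field -> type entries for a new document, as if adding unseen fields to the mapping. */
    static Map<String, String> inferMapping(Map<String, Object> doc) {
        Map<String, String> mapping = new LinkedHashMap<>();
        doc.forEach((field, value) -> mapping.put(field, inferType(value)));
        return mapping;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", 1L);
        doc.put("cityName", "Beijing");
        doc.put("latitude", 39.9);
        doc.put("valid", 1);
        System.out.println(inferMapping(doc));
    }
}
```

Note that cityName comes out as a plain analyzed string here, which is exactly the kind of choice the hand-written mapping in this post overrides.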
Sometimes, however, this behavior is not what you want. For example, the following mapping cannot be produced by dynamic mapping:
{
  "settings": {
    "index": {
      "number_of_replicas": "1",
      "number_of_shards": "5"
    }
  },
  "mappings": {
    "enterprise_basic_info": {
      "_all": {
        "enabled": false
      },
      "properties": {
        "id": { "type": "long" },
        "enterpriseName": { "type": "string", "analyzer": "ik_max_word" },
        "address": { "type": "string", "analyzer": "ik_max_word" },
        "latitude": { "type": "double" },
        "longitude": { "type": "double" },
        "phone": { "type": "string", "index": "not_analyzed" },
        "businessCategory": { "type": "string", "index": "not_analyzed" },
        "cityName": { "type": "string", "index": "not_analyzed" },
        "districtName": { "type": "string", "index": "not_analyzed" },
        "valid": { "type": "long" },
        "location": {
          "type": "geo_point",
          "geohash_prefix": true,
          "geohash_precision": 12
        }
      }
    }
  }
}
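Here cityName and districtName (like phone and businessCategory) are strings, but they should not be analyzed; enterpriseName and address need the ik_max_word analyzer; and location must be declared as a geo_point. Dynamic mapping would not make any of these choices, so when creating an index it is recommended to define the mapping yourself.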
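Why not_analyzed matters for exact (term) matching can be shown with a toy inverted index in plain Java. Assumption: a lowercase-and-split-on-whitespace tokenizer stands in for the real analyzer (ik_max_word segments Chinese text differently), and all names are hypothetical. An analyzed field stores only its tokens, so a term query for the whole value misses; a not_analyzed field stores the value as a single term, so the same term query hits.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class AnalyzedVsNotAnalyzedSketch {
    private AnalyzedVsNotAnalyzedSketch() {}

    /** Stand-in analyzer (assumption): lowercase, then split on whitespace. */
    static List<String> analyze(String value) {
        List<String> tokens = new ArrayList<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** Terms stored in the inverted index for one field value. */
    static Set<String> indexedTerms(String value, boolean analyzed) {
        if (analyzed) {
            return new HashSet<>(analyze(value));
        }
        return Collections.singleton(value); // not_analyzed: the whole value is one term
    }

    /** A term query is not analyzed: it looks up the exact term it was given. */
    static boolean termQueryMatches(String fieldValue, boolean analyzed, String term) {
        return indexedTerms(fieldValue, analyzed).contains(term);
    }

    public static void main(String[] args) {
        System.out.println(termQueryMatches("New York", true, "New York"));  // false (analyzed)
        System.out.println(termQueryMatches("New York", false, "New York")); // true  (not_analyzed)
    }
}
```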
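- Index aliases
An index alias works somewhat like a pointer: it stores no data and does not create a new index; it simply points to an index. Aliases are commonly used to switch indexes quickly. For example, if the alias my_index initially points to my_index1, you can repoint my_index to my_index2 without restarting anything or changing your code.
- Difference between QueryBuilder and FilterBuilder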
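A FilterBuilder filters documents during a search: it first narrows all documents down by the filter conditions, and the QueryBuilder query then runs on the filtered result. So a FilterBuilder can also select the documents that meet given conditions. Its results are cached, so a later filter with the same conditions does not need to be recomputed, and unlike a QueryBuilder it does not compute relevance scores, which makes it faster. [Official explanation]
PS: In my experiment I saw hardly any speedup. Perhaps QueryBuilder has been optimized, or my index (10,000 documents) is too small to show the difference:
See ElasticSearchConditionTest in the GitHub repository linked later in this post: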
int times = 10000;
// The same two conditions, first as a filter...
SearchCondition filterCondition = new SearchCondition();
filterCondition.setFilterBuilder(OperationBuilderFactory.builder().queryString("address", "云中", OperationType.MUST)
        .term("valid", 1, OperationType.MUST).builder());
long beginTime1 = System.currentTimeMillis();
for (int i = 0; i < times; i++) {
    List<EnterpriseBasicInfoDTO> basicInfoDTOList = enterpriseSearchHandle.getListByCondition(filterCondition, null);
}
long beginTime2 = System.currentTimeMillis();
LOGGER.info("Ran filter {} times, took {} s", times, (beginTime2 - beginTime1) / 1000);
// ...then as a query
SearchCondition queryCondition = new SearchCondition();
queryCondition.setQueryBuilder(OperationBuilderFactory.builder().queryString("address", "云中", OperationType.MUST)
        .term("valid", 1, OperationType.MUST).builder());
for (int i = 0; i < times; i++) {
    List<EnterpriseBasicInfoDTO> basicInfoDTOList = enterpriseSearchHandle.getListByCondition(queryCondition, null);
}
long beginTime3 = System.currentTimeMillis();
LOGGER.info("Ran query {} times, took {} s", times, (beginTime3 - beginTime2) / 1000);
Result: over 10,000 runs the filter version was only about 7 seconds faster (27 s vs. 34 s), so there is not a big difference.
18:10:51.718 INFO (ElasticSearchConditionTest.java:41) - Ran filter 10000 times, took 27 s
18:11:26.416 INFO (ElasticSearchConditionTest.java:49) - Ran query 10000 times, took 34 s
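The mechanism from the official explanation (filter first, cache the matching-document set, score only what survives) can be modeled in a few lines of plain Java. This is a hypothetical sketch, not Elasticsearch's implementation: a filter is a predicate whose result is cached as a BitSet under a string key, and "scoring" is a placeholder function applied only to documents that passed the filter. It is also consistent with the guess above: on 10,000 documents, evaluating a filter is cheap to begin with, so caching it saves little.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.function.ToDoubleFunction;

/** Hypothetical model of filter caching; doc id = position in the list. */
final class FilterCacheSketch {
    private final List<String> docs;
    private final Map<String, BitSet> filterCache = new HashMap<>();
    int filterEvaluations = 0; // counts filter passes that were not served from the cache

    FilterCacheSketch(List<String> docs) {
        this.docs = docs;
    }

    /** Returns the matching doc set for a filter, scanning the documents only on a cache miss. */
    BitSet filter(String cacheKey, Predicate<String> predicate) {
        return filterCache.computeIfAbsent(cacheKey, k -> {
            filterEvaluations++;
            BitSet bits = new BitSet(docs.size());
            for (int i = 0; i < docs.size(); i++) {
                if (predicate.test(docs.get(i))) bits.set(i);
            }
            return bits;
        });
    }

    /** Scores only the documents that passed the filter; the filter itself adds no score. */
    Map<Integer, Double> search(BitSet filtered, ToDoubleFunction<String> score) {
        Map<Integer, Double> hits = new LinkedHashMap<>();
        for (int i = filtered.nextSetBit(0); i >= 0; i = filtered.nextSetBit(i + 1)) {
            hits.put(i, score.applyAsDouble(docs.get(i)));
        }
        return hits;
    }

    public static void main(String[] args) {
        FilterCacheSketch index = new FilterCacheSketch(Arrays.asList("valid:1 a", "valid:0 b", "valid:1 c"));
        for (int run = 0; run < 3; run++) {
            BitSet valid = index.filter("valid=1", d -> d.startsWith("valid:1"));
            System.out.println(index.search(valid, d -> d.length()));
        }
        System.out.println("filter evaluated " + index.filterEvaluations + " time(s)");
    }
}
```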
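- Fast distance-range lookup with GeoHash
GeoHash is an algorithm mainly used to quickly find other records within a neighborhood (e.g. merchants within 500 m). The idea is to divide the Earth's surface into a grid of 32 cells (4 rows × 8 columns), each represented by a different character, and to subdivide every cell recursively in the same way. The longer the geohash you configure, the smaller the area each cell covers. To find the points within a range, you therefore only need to fetch the cell containing the center point plus its neighboring cells, and then run the expensive exact distance calculation only on these pre-selected candidates.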
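The figure above shows the precision of geohashes of different lengths. For example, an 11-character geohash cell measures only about 15 cm × 15 cm (14.9 cm × 14.9 cm), while an 8-character cell is about 38 m × 19 m.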
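The recursive subdivision described above is the standard geohash encoding: bisect the longitude and latitude ranges alternately (longitude first), record 1 for the upper half and 0 for the lower half, and turn every 5 bits into one base32 character, i.e. one of 32 sub-cells. A minimal plain-Java sketch follows; the class name is hypothetical, and computing the neighboring cells is not included.

```java
/** Minimal sketch of standard geohash encoding (neighbor-cell computation omitted). */
final class GeoHashSketch {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    private GeoHashSketch() {}

    /** Encodes (lat, lon) into a geohash with the given number of characters. */
    static String encode(double lat, double lon, int length) {
        double[] latRange = {-90.0, 90.0};
        double[] lonRange = {-180.0, 180.0};
        StringBuilder hash = new StringBuilder(length);
        boolean evenBit = true; // bits alternate: longitude first, then latitude
        int bitCount = 0;
        int charIndex = 0;
        while (hash.length() < length) {
            double[] range = evenBit ? lonRange : latRange;
            double value = evenBit ? lon : lat;
            double mid = (range[0] + range[1]) / 2;
            charIndex <<= 1;
            if (value >= mid) {
                charIndex |= 1; // upper half: keep [mid, max]
                range[0] = mid;
            } else {
                range[1] = mid; // lower half: keep [min, mid]
            }
            evenBit = !evenBit;
            if (++bitCount == 5) { // 5 bits = one base32 character = one of 32 sub-cells
                hash.append(BASE32.charAt(charIndex));
                bitCount = 0;
                charIndex = 0;
            }
        }
        return hash.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode(42.6, -5.6, 5)); // ezs42
        String coarse = encode(39.9, 116.4, 5);
        String fine = encode(39.9, 116.4, 11);
        System.out.println(coarse + " is a prefix of " + fine + ": " + fine.startsWith(coarse));
    }
}
```

Because a longer hash only refines a shorter one, points in the same cell share a prefix; that is what indexing geohash prefixes exploits. A point just across a cell boundary can have a completely different prefix, which is why the geohash cell query used later also checks the neighboring cells (neighbors(true) in OperationBuilderFactory.geoHash).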
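To enable geohash lookups on coordinates, you need to add a dedicated field. For example, I have a set of POIs with latitude/longitude coordinates; to support geohash range queries, I add a location field to the mapping:
#In the mapping: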
"latitude": {
  "type": "double"
},
"longitude": {
  "type": "double"
},
"location": {
  "type": "geo_point",
  "geohash_prefix": true,
  "geohash_precision": 12
}
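#In the class of the indexed objects, the following getter is all you need. (fastjson decides which fields to serialize from the getter and setter methods, so you don't need an actual location field; the full code is in the GitHub repository linked below.)
public String getLocation() {
    // geo_point accepts a "lat,lon" string
    return latitude + "," + longitude;
}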
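- Common ElasticSearch operations
term (not analyzed), exact match: TermQueryBuilder
queryString (analyzed), tokenized match: QueryStringQueryBuilder. Whether QueryStringQueryBuilder.Operator is AND or OR decides if a document must contain all of the resulting terms or just one of them.
prefix, exact prefix match: if the indexed field is analyzed, a document matches when any of its terms starts with the given prefix; if the field is not analyzed, it matches when the field value itself starts with the prefix. PrefixQueryBuilder
range (greater than, less than, between ... and): matches when the field value falls within the range. RangeQueryBuilder
notInId / idIn: exclude or keep documents by id. IdsQueryBuilder
fuzzy: matches strings by edit distance. FuzzyQueryBuilder
wildcard: matches strings against a wildcard pattern. WildcardQueryBuilder
geoDistance: distance-range match on geographic coordinates, supporting several distance calculation methods. GeoDistanceQueryBuilder
geoHash: approximate range match based on geohash cells. GeohashCellQuery.Builder
- The general-purpose Elasticsearch package I developed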
GitHub: https://github.com/cweeyii/elasticsearch-parent
client package: wraps query construction and search operations; very convenient to use.
Key classes:
Query and filter condition builder:
package com.cweeyii.operation;

import org.elasticsearch.common.unit.DistanceUnit;
import org.elasticsearch.index.query.*;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Created by wenyi on 17/5/9.
 * Email:caowenyi@meituan.com
 */
public class OperationBuilderFactory {
    public static Builder builder() {
        return new Builder();
    }

    /** Collects clauses per OperationType (MUST / MUST_NOT / SHOULD) and combines them into one bool query. */
    public static class Builder {
        private final Map<OperationType, List<QueryBuilder>> queryBuilderMap = new ConcurrentHashMap<>();

        private Builder() {
        }

        // Exact match against the indexed term (the value is not analyzed)
        public Builder term(String field, Object value, OperationType operationType) {
            getQueryBuilders(operationType).add(new TermQueryBuilder(field, value));
            return this;
        }

        // Analyzed match: AND requires all resulting terms, OR any one of them
        public Builder queryString(String field, String value, OperationType operationType, QueryStringQueryBuilder.Operator operator) {
            getQueryBuilders(operationType).add(new QueryStringQueryBuilder(value).field(field).defaultOperator(operator));
            return this;
        }

        public Builder queryString(String field, String value, OperationType operationType) {
            return queryString(field, value, operationType, QueryStringQueryBuilder.Operator.OR);
        }

        public Builder prefix(String field, String prefix, OperationType operationType) {
            getQueryBuilders(operationType).add(new PrefixQueryBuilder(field, prefix));
            return this;
        }

        // Bounds are inclusive by default
        public Builder range(String field, Object from, Object to, OperationType operationType) {
            getQueryBuilders(operationType).add(new RangeQueryBuilder(field).from(from).to(to));
            return this;
        }

        // Keeps the given ids with MUST, excludes them with MUST_NOT
        public Builder notInId(List<String> ids, OperationType operationType) {
            getQueryBuilders(operationType).add(new IdsQueryBuilder().ids(ids));
            return this;
        }

        public Builder fuzzy(String field, Object value, OperationType operationType) {
            getQueryBuilders(operationType).add(new FuzzyQueryBuilder(field, value));
            return this;
        }

        public Builder wildcard(String field, String value, OperationType operationType) {
            getQueryBuilders(operationType).add(new WildcardQueryBuilder(field, value));
            return this;
        }

        // Points within `distance` meters of (lat, lon)
        public Builder geoDistance(String field, double lat, double lon, double distance, OperationType operationType) {
            getQueryBuilders(operationType).add(new GeoDistanceQueryBuilder(field).point(lat, lon).distance(distance, DistanceUnit.METERS));
            return this;
        }

        // Points in the geohash cell of (lat, lon) at the given precision, plus its neighboring cells
        public Builder geoHash(String field, double lat, double lon, int precisionLevel, OperationType operationType) {
            getQueryBuilders(operationType).add(new GeohashCellQuery.Builder(field).point(lat, lon).precision(precisionLevel).neighbors(true));
            return this;
        }

        public QueryBuilder builder() {
            BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
            for (QueryBuilder queryBuilder : getQueryBuilders(OperationType.MUST)) {
                boolQueryBuilder.must(queryBuilder);
            }
            for (QueryBuilder queryBuilder : getQueryBuilders(OperationType.MUST_NOT)) {
                boolQueryBuilder.mustNot(queryBuilder);
            }
            for (QueryBuilder queryBuilder : getQueryBuilders(OperationType.SHOULD)) {
                boolQueryBuilder.should(queryBuilder);
            }
            return boolQueryBuilder;
        }

        // Creates the clause list on first use; computeIfAbsent is atomic on a ConcurrentHashMap
        public List<QueryBuilder> getQueryBuilders(OperationType operationType) {
            return queryBuilderMap.computeIfAbsent(operationType, key -> new ArrayList<>());
        }
    }
}
Search condition wrapper: encapsulates the query/filter conditions, sorting and aggregations:
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.common.geo.GeoDistance;
import org.elasticsearch.common.geo.GeoPoint;
import org.elasticsearch.common.unit.DistanceUnit;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.aggregations.AbstractAggregationBuilder;
import org.elasticsearch.search.aggregations.bucket.range.geodistance.GeoDistanceBuilder;
import org.elasticsearch.search.sort.FieldSortBuilder;
import org.elasticsearch.search.sort.GeoDistanceSortBuilder;
import org.elasticsearch.search.sort.SortBuilder;
import org.elasticsearch.search.sort.SortOrder;
// Assumed: any Pair type with getFirst()/getSecond(), e.g. Spring Data's Pair
import org.springframework.data.util.Pair;
import org.springframework.util.StringUtils;

import java.util.ArrayList;
import java.util.List;

public class SearchCondition {
    private QueryBuilder queryBuilder = null;
    private QueryBuilder filterBuilder = null;
    private List<SortBuilder> orders = new ArrayList<>();
    private List<AbstractAggregationBuilder> aggregationBuilders = new ArrayList<>();
    private SearchType searchType;
    private int limit = 20;
    private int offset = 0;
    private int total = 0;

    public List<AbstractAggregationBuilder> getAggregationBuilders() {
        return aggregationBuilders;
    }

    public void setAggregationBuilders(List<AbstractAggregationBuilder> aggregationBuilders) {
        this.aggregationBuilders = aggregationBuilders;
    }

    public List<SortBuilder> getOrders() {
        return orders;
    }

    public void setOrders(List<SortBuilder> orders) {
        this.orders = orders;
    }

    public int getTotal() {
        return total;
    }

    public SearchCondition setTotal(int total) {
        this.total = total;
        return this;
    }

    // Sort by distance from (lat, lon) on a geo_point field
    public SearchCondition orderBy(String field, double lat, double lon, SortOrder order, GeoDistance geoDistance) {
        if (!StringUtils.isEmpty(field)) {
            orders.add(new GeoDistanceSortBuilder(field).order(order).point(lat, lon).geoDistance(geoDistance));
        }
        return this;
    }

    public SearchCondition orderBy(String field, double lat, double lon) {
        return orderBy(field, lat, lon, SortOrder.ASC, GeoDistance.DEFAULT);
    }

    public SearchCondition orderBy(String field, SortOrder order) {
        if (!StringUtils.isEmpty(field)) {
            orders.add(new FieldSortBuilder(field).order(order));
        }
        return this;
    }

    public SearchCondition orderBy(String field) {
        return orderBy(field, SortOrder.ASC);
    }

    // Falls back to match_all when no query is set
    public QueryBuilder getQueryBuilder() {
        if (queryBuilder == null) {
            return QueryBuilders.matchAllQuery();
        }
        return queryBuilder;
    }

    public SearchCondition setQueryBuilder(QueryBuilder queryBuilder) {
        this.queryBuilder = queryBuilder;
        return this;
    }

    public QueryBuilder getFilterBuilder() {
        return filterBuilder;
    }

    public SearchCondition setFilterBuilder(QueryBuilder filterBuilder) {
        this.filterBuilder = filterBuilder;
        return this;
    }

    // Geo-distance range aggregation: counts documents in each [from, to) distance band, in meters
    public SearchCondition setAggregation(String field, double lat, double lon, Pair<Double, Double>... rangePoints) {
        if (!StringUtils.isEmpty(field)) {
            GeoDistanceBuilder geoDistanceBuilder = new GeoDistanceBuilder(field).point(new GeoPoint(lat, lon)).unit(DistanceUnit.METERS);
            for (Pair<Double, Double> rangePoint : rangePoints) {
                geoDistanceBuilder.addRange(rangePoint.getFirst(), rangePoint.getSecond());
            }
            aggregationBuilders.add(geoDistanceBuilder);
        }
        return this;
    }

    public SearchType getSearchType() {
        return searchType;
    }

    public SearchCondition setSearchType(SearchType searchType) {
        this.searchType = searchType;
        return this;
    }

    public int getLimit() {
        return limit;
    }

    public SearchCondition setLimit(int limit) {
        this.limit = limit;
        return this;
    }

    public int getOffset() {
        return offset;
    }

    public SearchCondition setOffset(int offset) {
        this.offset = offset;
        return this;
    }
}
Usage:
double lat = 39.98, lon = 116.31; // hypothetical center point
SearchCondition searchCondition = new SearchCondition();
// Put all filter clauses into one builder: calling setFilterBuilder twice would overwrite the first filter
searchCondition.setFilterBuilder(OperationBuilderFactory.builder()
                .queryString("address", "云中", OperationType.MUST)
                .term("valid", 1, OperationType.MUST)
                .geoHash("location", lat, lon, 5, OperationType.MUST)
                .builder())
        .orderBy("location", lat, lon, SortOrder.ASC, GeoDistance.ARC)
        .orderBy("id", SortOrder.ASC)
        .setOffset(0)
        .setLimit(100);