I. Basic Concepts
1. Introduction
- Elasticsearch is a distributed, scalable, real-time search and analytics engine built on top of the Apache Lucene(TM) full-text search library.
2. Main capabilities
- Distributed real-time document store, in which every field is indexed and searchable.
- Distributed search engine with real-time analytics.
- Scales out to hundreds of servers, handling petabytes of structured or unstructured data.
3. Data model
- Elasticsearch is a document-oriented store: one record is one document, serialized as JSON.
{ "name": "小明", "age" : "18" }
A record like name = 小明 would be a row of a user table in a relational database; in Elasticsearch it is a document, and name is a field of that document. The document belongs to a type (analogous to the user table), and the type lives inside an index:
Elasticsearch ---> index ---> type ---> document ---> field
An Elasticsearch instance can hold multiple indices, an index can hold multiple types, a type can hold multiple documents, and a document can hold multiple fields.
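The row-to-document mapping above can be sketched with plain Java collections (illustrative only; this models the JSON document, it is not an Elasticsearch client API):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DocumentModel {
    // Build the example document as a field -> value map,
    // mirroring { "name": "小明", "age": "18" }.
    static Map<String, String> userDocument() {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("name", "小明");
        doc.put("age", "18");
        return doc;
    }

    // Serialize the map as a minimal JSON object (string values only).
    static String toJson(Map<String, String> doc) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : doc.entrySet()) {
            if (!first) sb.append(", ");
            sb.append('"').append(e.getKey()).append("\": \"")
              .append(e.getValue()).append('"');
            first = false;
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        System.out.println(toJson(userDocument()));
        // {"name": "小明", "age": "18"}
    }
}
```

Each key in the map is a field, the whole map is the document, and the document would be stored under a type inside an index.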
4. Key terms
- Node and Cluster
Elasticsearch is essentially a distributed database that lets many servers work together, and each server can run multiple Elasticsearch instances.
A single Elasticsearch instance is called a node; a group of nodes forms a cluster.
- shards
An index shard. Elasticsearch can split a complete index into multiple shards, so that one large index is broken up and distributed across different nodes, enabling distributed search. The number of primary shards can only be set when the index is created and cannot be changed afterwards.
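Elasticsearch routes each document to a shard with a formula of the form shard = hash(routing) % number_of_primary_shards, which is also why the primary shard count is fixed at creation time: changing the modulus would move existing documents. A minimal sketch (the real cluster uses a Murmur3 hash; String.hashCode below is only a stand-in):

```java
public class ShardRouting {
    // Stand-in for Elasticsearch's routing formula:
    //   shard = hash(_routing) % number_of_primary_shards
    // floorMod keeps the result non-negative even for negative hashes.
    static int shardFor(String routing, int numberOfPrimaryShards) {
        return Math.floorMod(routing.hashCode(), numberOfPrimaryShards);
    }

    public static void main(String[] args) {
        int shards = 5; // the pre-7.x default primary shard count
        // The same document id always lands on the same shard:
        System.out.println(shardFor("123", shards) == shardFor("123", shards));
        // Changing the shard count would re-route documents, which is
        // why the count cannot be changed after index creation.
    }
}
```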
- replicas
Index replicas. Elasticsearch can keep multiple replica copies of an index. Replicas serve two purposes: first, fault tolerance, since a shard that is lost or corrupted on one node can be recovered from a replica; second, query throughput, since Elasticsearch automatically load-balances search requests across the copies.
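The load-balancing idea can be sketched as round-robin selection over a shard's copies (a simplification; the node names are made up, and recent Elasticsearch versions use adaptive replica selection rather than plain round-robin):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class ReplicaBalancer {
    private final AtomicLong counter = new AtomicLong();

    // Round-robin over the available copies (primary + replicas) of a
    // shard, a simplified model of how a coordinating node spreads reads.
    String pickCopy(List<String> copies) {
        int i = (int) Math.floorMod(counter.getAndIncrement(), copies.size());
        return copies.get(i);
    }

    public static void main(String[] args) {
        ReplicaBalancer lb = new ReplicaBalancer();
        List<String> copies = List.of("primary@node-1", "replica@node-2", "replica@node-3");
        for (int k = 0; k < 4; k++) {
            System.out.println(lb.pickCopy(copies));
        }
        // Requests cycle over node-1, node-2, node-3, node-1, ...
    }
}
```

Because any copy can serve a read, adding replicas raises both availability and search capacity.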
- recovery
Data recovery, also called shard reallocation. When a node joins or leaves the cluster, Elasticsearch redistributes index shards across the machines according to their load; when a failed node restarts, its data is likewise recovered.
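The reallocation step can be sketched as evenly re-assigning shard ids over whichever nodes are live (a toy model; the real allocator also weighs disk usage, awareness attributes, and recovery throttling):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ShardAllocator {
    // Assign shard ids round-robin over the live nodes, so every node
    // ends up holding roughly the same number of shards.
    static Map<String, List<Integer>> allocate(int shardCount, List<String> nodes) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String node : nodes) assignment.put(node, new ArrayList<>());
        for (int shard = 0; shard < shardCount; shard++) {
            assignment.get(nodes.get(shard % nodes.size())).add(shard);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // 6 shards over 3 nodes: 2 shards per node.
        System.out.println(allocate(6, List.of("node-1", "node-2", "node-3")));
        // If node-2 leaves, the same 6 shards are re-spread over the
        // remaining 2 nodes: 3 shards per node.
        System.out.println(allocate(6, List.of("node-1", "node-3")));
    }
}
```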
II. Elasticsearch Configuration
1. Installing Elasticsearch (this walkthrough uses elasticsearch-6.6.1 on Windows)
Official site: https://www.elastic.co/products/elasticsearch
Go into the elasticsearch-6.6.1/bin directory and double-click elasticsearch.bat to start the server, then open localhost:9200. A response like the one below means the installation succeeded.
Note: the cluster_name in the response is the one you define in your own configuration, e.g. spring.data.elasticsearch.cluster-name=my-application
2. Installing elasticsearch-head alongside Elasticsearch
Official site: https://github.com/mobz/elasticsearch-head
Purpose: inspect the cluster's status and data in a browser.
Prerequisites: since I am running Elasticsearch 6.x locally, Node.js and grunt must be installed first; with earlier versions, elasticsearch-head could reportedly be installed as a plain plugin.
Installing Node.js is not covered here (see my other posts, or the many guides online); we start with grunt.
Run "npm install -g grunt-cli" to install grunt, then run "grunt -version" to confirm it installed. Now set up elasticsearch-head:
Installation: (1) first edit the config/elasticsearch.yml file in the Elasticsearch directory, as follows.
# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#
# ---------------------------------- Cluster -----------------------------------
# Use a descriptive name for your cluster:
cluster.name: my-application
#
# ------------------------------------ Node ------------------------------------
# Use a descriptive name for the node:
node.name: node-1
node.master: true
node.data: true
#
# ---------------------------------- Network -----------------------------------
# Set the bind address to a specific IP (IPv4 or IPv6):
network.host: 0.0.0.0
# Set a custom port for HTTP:
http.port: 9200
#
# Allow cross-origin requests so elasticsearch-head (served on port 9100)
# can talk to the cluster:
http.cors.enabled: true
http.cors.allow-origin: "*"
(2) Unzip the downloaded elasticsearch-head archive, open its directory, find Gruntfile.js, and change the connect section as follows:
connect: {
    server: {
        options: {
            hostname: '*',
            port: 9100,
            base: '.',
            keepalive: true
        }
    }
}
(3) Start elasticsearch.bat again, then open a command prompt in the elasticsearch-head-master directory, run npm install, and once that finishes run npm run start. A successful start looks like the following.
Then open localhost:9100 to see the cluster details.
III. Integrating Elasticsearch with Spring Boot
1. Minimal application.properties configuration
spring.data.elasticsearch.cluster-name=my-application
spring.data.elasticsearch.cluster-nodes=127.0.0.1:9300
spring.data.elasticsearch.repositories.enabled=true
2. Define a repository interface extending ElasticsearchRepository
public interface ProductRepository extends ElasticsearchRepository<Product, Long> {
    // Custom finder methods can be declared here; Spring Data derives
    // the query from the method name
    List<Product> findByCategoryAndBrand(String category, String brand);
}
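Spring Data turns the findByCategoryAndBrand name into a query that must match both fields. The derivation step can be sketched with plain string handling (a toy parser, not Spring's actual implementation, which also handles Or, operators such as Between/Like, and property-name edge cases):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class MethodNameParser {
    // Toy version of Spring Data's query derivation: strip the "findBy"
    // prefix and split the remainder on "And" to get the fields that a
    // matching document must satisfy.
    static List<String> fieldsOf(String methodName) {
        String criteria = methodName.substring("findBy".length());
        return Arrays.stream(criteria.split("And"))
                .map(f -> Character.toLowerCase(f.charAt(0)) + f.substring(1))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // findByCategoryAndBrand -> must match "category" and "brand"
        System.out.println(fieldsOf("findByCategoryAndBrand"));
        // [category, brand]
    }
}
```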
3. Create the entity class
package org.medical.domain;

import java.io.Serializable;

import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;

import lombok.Data;

// @Document marks the class as an Elasticsearch document:
//   indexName - the index it is stored in
//   type      - the mapping type inside the index
//   shards    - number of primary shards (default 5)
//   replicas  - number of replicas (default 1)
@Document(indexName = "product", type = "product", shards = 1, replicas = 1)
@Data
public class Product implements Serializable {

    // Marks this field as the document id
    @Id
    private Long id;

    // @Field maps the property to a document field:
    // FieldType.Text is analyzed (tokenized) and indexed for full-text
    // search; analyzer selects the tokenizer, here the IK analyzer
    @Field(type = FieldType.Text, analyzer = "ik_max_word")
    private String title;      // title

    // FieldType.Keyword is indexed as-is, without tokenization
    @Field(type = FieldType.Keyword)
    private String category;   // category

    @Field(type = FieldType.Keyword)
    private String brand;      // brand

    // FieldType.Double maps the field as a double
    @Field(type = FieldType.Double)
    private Double price;      // price

    // index: whether to index the field, boolean, default true;
    // false stores the value without making it searchable
    @Field(index = false, type = FieldType.Keyword)
    private String images;     // image URL
}
4. Create a controller and test the REST endpoints with Postman
package org.medical.controller.ealisticSearch;

import java.util.List;

import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.terms.LongTerms;
import org.elasticsearch.search.aggregations.metrics.avg.InternalAvg;
import org.elasticsearch.search.sort.SortBuilders;
import org.elasticsearch.search.sort.SortOrder;
import org.medical.domain.Product;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.domain.Page;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.elasticsearch.core.aggregation.AggregatedPage;
import org.springframework.data.elasticsearch.core.query.FetchSourceFilter;
import org.springframework.data.elasticsearch.core.query.NativeSearchQueryBuilder;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/api")
public class EsController {

    // Also import ProductRepository from the package where it is declared
    @Autowired
    private ProductRepository productRepository;

    // Insert a test document
    @RequestMapping(value = "/save", method = RequestMethod.POST)
    public String save() {
        Product pro = new Product();
        pro.setId(123L);
        pro.setPrice(0.1);
        pro.setBrand("123");
        pro.setCategory("123");
        pro.setImages("123");
        pro.setTitle("456");
        productRepository.save(pro);
        return "success";
    }

    // Query all documents
    @RequestMapping(value = "/search", method = RequestMethod.POST)
    public void search() {
        Iterable<Product> all = productRepository.findAll();
        for (Product product : all) {
            System.out.println("Found: " + product);
        }
    }

    // Query through the custom derived repository method
    @RequestMapping(value = "/searchBy", method = RequestMethod.POST)
    public void searchBy() {
        Iterable<Product> all = productRepository.findByCategoryAndBrand("123", "123");
        for (Product product : all) {
            System.out.println("Found: " + product);
        }
    }

    // matchQuery analyzes the input text and matches on the resulting terms.
    // termQuery matches exact terms instead, and works not only for strings
    // but for other data types as well.
    @RequestMapping(value = "/matchQuery", method = RequestMethod.POST)
    public void matchQuery() {
        // Build the query
        NativeSearchQueryBuilder queryBuilder = new NativeSearchQueryBuilder();
        // Add a basic analyzed match query
        queryBuilder.withQuery(QueryBuilders.matchQuery("category", "123"));
        // Execute
        Page<Product> products = productRepository.search(queryBuilder.build());
        long total = products.getTotalElements();
        System.out.println("Total hits: " + total);
        for (Product product : products) {
            System.out.println("Brand: " + product.getBrand());
        }
    }

    // fuzzyQuery performs fuzzy term matching.
    // QueryBuilders.wildcardQuery(name, value) is another fuzzy-style query,
    // where value may contain wildcards.
    @RequestMapping(value = "/fuzzyQuery", method = RequestMethod.POST)
    public void fuzzyQuery() {
        // Build the query
        NativeSearchQueryBuilder queryBuilder = new NativeSearchQueryBuilder();
        // Add a wildcard query
        queryBuilder.withQuery(QueryBuilders.wildcardQuery("category", "1*"));
        // Execute
        Page<Product> products = productRepository.search(queryBuilder.build());
        long total = products.getTotalElements();
        System.out.println("Total hits: " + total);
        for (Product product : products) {
            System.out.println("Brand: " + product.getBrand());
        }
    }

    // Paged query
    @RequestMapping(value = "searchByPage", method = RequestMethod.POST)
    public void searchByPage() {
        // Build the query
        NativeSearchQueryBuilder queryBuilder = new NativeSearchQueryBuilder();
        // Add a basic term query
        queryBuilder.withQuery(QueryBuilders.termQuery("category", "123"));
        // Paging: first page, one document per page (pages are zero-based)
        int page = 0;
        int size = 1;
        queryBuilder.withPageable(PageRequest.of(page, size));
        // Execute
        Page<Product> res = productRepository.search(queryBuilder.build());
        long total = res.getTotalElements();
        int totalPages = res.getTotalPages();
        int number = res.getNumber() + 1;
        int size2 = res.getSize();
        System.out.println("Total hits: " + total + ", total pages: " + totalPages
                + ", current page: " + number + ", page size: " + size2);
        // Results
        for (Product product : res) {
            System.out.println("Result: " + product.getBrand());
        }
    }

    // Sorted query via queryBuilder.withSort
    @RequestMapping(value = "searchSort", method = RequestMethod.POST)
    public void searchSort() {
        // Build the query
        NativeSearchQueryBuilder queryBuilder = new NativeSearchQueryBuilder();
        // Add a basic term query
        queryBuilder.withQuery(QueryBuilders.termQuery("category", "123"));
        // Sort ascending by id
        queryBuilder.withSort(SortBuilders.fieldSort("id").order(SortOrder.ASC));
        // Execute
        Page<Product> res = productRepository.search(queryBuilder.build());
        // Results
        for (Product product : res) {
            System.out.println("Result: " + product.getId());
        }
    }

    /*
     * Common aggregation builders:
     * (1)  count of a field:              AggregationBuilders.count("count_uid").field("uid")
     * (2)  distinct count (approximate):  AggregationBuilders.cardinality("distinct_count_uid").field("uid")
     * (3)  filter aggregation:            AggregationBuilders.filter("uid_filter").filter(QueryBuilders.queryStringQuery("uid:001"))
     * (4)  group by a field:              AggregationBuilders.terms("group_name").field("name")
     * (5)  sum:                           AggregationBuilders.sum("sum_price").field("price")
     * (6)  average:                       AggregationBuilders.avg("avg_price").field("price")
     * (7)  max:                           AggregationBuilders.max("max_price").field("price")
     * (8)  min:                           AggregationBuilders.min("min_price").field("price")
     * (9)  group by date interval:        AggregationBuilders.dateHistogram("dh").field("date")
     * (10) top hits inside an aggregation: AggregationBuilders.topHits("top_result")
     * (11) nested aggregation:            AggregationBuilders.nested("negsted_path").path("quests")
     * (12) reverse nested:                AggregationBuilders.reverseNested("res_negsted").path("kps")
     */

    // Aggregations in Elasticsearch
    // Bucket: groups documents by some criterion; each group is called a bucket.
    // A bucket often nests another aggregation type, metrics aggregations:
    // computations such as avg, max, min, and sum, which ES calls metrics.
    // Note: fields used for aggregation or sorting must not be analyzed.
    @RequestMapping(value = "searchBucket", method = RequestMethod.POST)
    public void searchBucket() {
        // Build the query
        NativeSearchQueryBuilder queryBuilder = new NativeSearchQueryBuilder();
        // Do not fetch any source fields
        queryBuilder.withSourceFilter(new FetchSourceFilter(new String[]{""}, null));
        // 1. Add a terms aggregation named "brands" on the id field
        queryBuilder.addAggregation(AggregationBuilders.terms("brands").field("id"));
        // 2. Execute; the result must be cast to AggregatedPage
        AggregatedPage<Product> aggPage = (AggregatedPage<Product>) productRepository.search(queryBuilder.build());
        // 3. Parse the result
        // 3.1 Take the aggregation named "brands"; since id is a Long, the
        //     terms result is cast to LongTerms (a String field would yield
        //     StringTerms, a Double field DoubleTerms)
        LongTerms agg = (LongTerms) aggPage.getAggregation("brands");
        // 3.2 Get the buckets
        List<LongTerms.Bucket> buckets = agg.getBuckets();
        // 3.3 Iterate
        for (LongTerms.Bucket bucket : buckets) {
            // 3.4 The bucket key, i.e. the id value
            System.out.println(bucket.getKeyAsString());
            // 3.5 The number of documents in the bucket
            System.out.println(bucket.getDocCount());
        }
    }

    // Nested aggregation: average price inside each bucket
    @RequestMapping(value = "searchBucketAvg", method = RequestMethod.POST)
    public void searchBucketAvg() {
        // Build the query
        NativeSearchQueryBuilder queryBuilder = new NativeSearchQueryBuilder();
        // Do not fetch any source fields
        queryBuilder.withSourceFilter(new FetchSourceFilter(new String[]{""}, null));
        // 1. Terms aggregation named "brands" on the id field, with a nested
        //    avg sub-aggregation on price
        queryBuilder.addAggregation(AggregationBuilders.terms("brands").field("id")
                .subAggregation(AggregationBuilders.avg("priceAvg").field("price")));
        // 2. Execute; the result must be cast to AggregatedPage
        AggregatedPage<Product> aggPage = (AggregatedPage<Product>) productRepository.search(queryBuilder.build());
        // 3. Parse the result
        // 3.1 Take the aggregation named "brands" and cast it to LongTerms
        //     (since id is a Long)
        LongTerms agg = (LongTerms) aggPage.getAggregation("brands");
        // 3.2 Get the buckets
        List<LongTerms.Bucket> buckets = agg.getBuckets();
        // 3.3 Iterate
        for (LongTerms.Bucket bucket : buckets) {
            // 3.4 The bucket key, i.e. the id value
            System.out.println(bucket.getKeyAsString());
            // 3.5 The number of documents in the bucket
            System.out.println(bucket.getDocCount());
            // 3.6 The sub-aggregation result
            InternalAvg avg = (InternalAvg) bucket.getAggregations().asMap().get("priceAvg");
            System.out.println("Average price: " + avg.getValue());
        }
    }
}
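What the terms bucket plus avg sub-aggregation computes can be mirrored with plain Java streams: group documents by a key, and average a metric within each group (a conceptual model of the result, not how Elasticsearch executes it across shards; the sample data is made up):

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class BucketAvgSketch {
    static final class Doc {
        final String brand;
        final double price;
        Doc(String brand, double price) { this.brand = brand; this.price = price; }
    }

    // Equivalent of a terms aggregation on "brand" with an avg
    // sub-aggregation on "price": each distinct brand becomes a bucket,
    // and the price is averaged per bucket.
    static Map<String, Double> avgPriceByBrand(List<Doc> docs) {
        return docs.stream().collect(Collectors.groupingBy(
                d -> d.brand,
                TreeMap::new,
                Collectors.averagingDouble(d -> d.price)));
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                new Doc("acme", 10.0), new Doc("acme", 20.0), new Doc("zen", 5.0));
        System.out.println(avgPriceByBrand(docs));
        // {acme=15.0, zen=5.0}
    }
}
```

This also makes the note above concrete: the grouping key must be an exact value (a Keyword or numeric field), because an analyzed field would split into tokens and scatter one logical value across many buckets.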