Leyou Mall Day 10 (Elasticsearch, Spring Data Elasticsearch)


License: CC 4.0 BY-SA


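This article covers installing, configuring, and using Elasticsearch: full-text search, the REST-style API, index and mapping operations, document create/read/update/delete, query techniques, aggregations, and Spring Data Elasticsearch integration, to help readers apply Elasticsearch in real projects.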


All code is published at [https://github.com/hades0525/leyou]

Day 10
January 30, 2019
17:28

Installing Elastic:
• Full-text search technology
• Elastic has a complete product line: Elasticsearch, Kibana, Logstash, and others. These three are what people usually call the ELK stack.

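1. Install Elasticsearch in the virtual machine
a. Elasticsearch cannot run as root; install it under a separate user, leyou (su - leyou)
• chown leyou:leyou elasticsearch/ -R (makes leyou the owner of every file in the directory)
b. Edit the configuration files jvm.options and elasticsearch.yml
jvm.options:
    -Xms512m
    -Xmx512m
elasticsearch.yml:
    path.data: /home/leyou/elasticsearch/data # data directory; must belong to the leyou user
    path.logs: /home/leyou/elasticsearch/logs # log directory; must belong to the leyou user
c. Startup fails with three errors; fix them by changing the user and system limits
vim /etc/security/limits.conf
    * soft nofile 65536
    * hard nofile 131072
    * soft nproc 4096
    * hard nproc 4096
vim /etc/security/limits.d/20-nproc.conf
    * soft nproc 4096
vim /etc/systemd/system.conf
    DefaultLimitNOFILE=4096
    DefaultLimitNPROC=4096
vim /etc/sysctl.conf
    vm.max_map_count=655360
Finally, restart the terminal session.
d. Start Elasticsearch and open 192.168.163.128:9200 to check that it is running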

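2. Install Kibana on the host machine (kibana.bat must stay running)
a. In the config directory, edit kibana.yml
elasticsearch.url: "http://192.168.163.128:9200"

b. Start Kibana and open http://127.0.0.1:5601
3. Install the ik analyzer in the virtual machine
a. Install it under /elasticsearch/plugins (must be done as the leyou user)
b. Restart Elasticsearch (elasticsearch/bin/elasticsearch)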
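Using Elastic:

1. Basic concepts
• Elastic provides a REST-style API
• Elasticsearch is a full-text search library built on Lucene. At its core it also stores data, and many of its concepts resemble those of MySQL.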
2. Index operations
• Create, delete, query, and update all follow REST conventions. For example, to create an index:
PUT /heima
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 2
  }
}
3. Mapping operations
Syntax:
PUT /index_name/_mapping/type_name
{
  "properties": {
    "field_name": {
      "type": "type",
      "index": true,
      "store": false,
      "analyzer": "analyzer"
    }
  }
}
(store defaults to false)
Example:
PUT heima/_mapping/goods
{
  "properties": {
    "title": {
      "type": "text",
      "analyzer": "ik_max_word"
    },
    "images": {
      "type": "keyword",
      "index": false
    },
    "price": {
      "type": "float"
    }
  }
}
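a. Field attributes
i. type
String fields come in two kinds:
• text: analyzed (tokenized); cannot be used in aggregations
• keyword: not analyzed; the value is matched as a whole and can be used in aggregations
ii. index
index controls whether the field is indexed.
• true: the field is indexed and can be searched. This is the default.
• false: the field is not indexed and cannot be searched
iii. store
In Lucene and Solr, a field whose store is set to false has no value in the document list, so it does not show up in the user's search results. In Elasticsearch, however, the field still comes back in the results even when store is false (the default).
The reason is that when Elasticsearch indexes a document, it keeps a copy of the original data in a property called _source. We can also filter _source to choose which fields are returned and which are not.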
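4. Document operations
a. Random id
POST /index_name/type_name
{
  "key": "value"
}
b. Custom id
POST /index_name/type_name/id
{
  …
}
c. Dynamic mapping
With Solr, new documents could only use fields whose mapping had been configured in advance; anything else raised an error. Elasticsearch is smarter: even with no mapping set on the index, it infers the type from the data you submit and adds the mapping dynamically.
d. Update / delete
DELETE: the id must be specified
PUT: the id must be specified
• if a document with that id exists, it is updated
• if no document with that id exists, it is created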
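The PUT rule above (update when the id exists, create when it does not) can be sketched in memory. This is only an illustration in plain Java, not Elasticsearch code; the class PutByIdSketch and its method names are made up for this note.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for an index; illustrates PUT-by-id semantics only. */
class PutByIdSketch {
    private final Map<String, Map<String, Object>> docs = new HashMap<>();

    /** Stores doc under id and reports whether the id was new ("created") or not ("updated"). */
    String put(String id, Map<String, Object> doc) {
        boolean existed = docs.containsKey(id);
        docs.put(id, new HashMap<>(doc)); // the stored document is replaced as a whole
        return existed ? "updated" : "created";
    }

    /** Removes the document with this id; returns false if there was none. */
    boolean delete(String id) {
        return docs.remove(id) != null;
    }

    Map<String, Object> get(String id) {
        return docs.get(id);
    }

    public static void main(String[] args) {
        PutByIdSketch index = new PutByIdSketch();
        System.out.println(index.put("1", Map.of("title", "phone A")));
        System.out.println(index.put("1", Map.of("title", "phone B")));
        System.out.println(index.get("1"));
        System.out.println(index.delete("1"));
    }
}
```

The second put on the same id overwrites the first document, which is why both PUT and DELETE need an explicit id.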
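5. Basic queries
a. Match all: match_all
GET /heima/_search
{
  "query": {
    "match_all": {}
  }
}
b. Match query: match
GET /heima/_search
{
  "query": {
    "match": {
      "title": "小米电视"
    }
  }
}

A match query analyzes the query text into terms and then searches; by default the terms are combined with or.

When a more precise search is needed, the relation can be changed to and with the operator parameter:
"match": {
  "title": { "query": "小米电视", "operator": "and" }
}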
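The difference between the or and and operators can be sketched on already-analyzed terms. This is a simplified illustration in plain Java: the term lists are assumed to be what an analyzer would produce, and MatchOperatorSketch is a hypothetical name, not part of any Elasticsearch API.

```java
import java.util.List;
import java.util.Set;

/** Compares "or" and "and" matching on already-analyzed terms. */
class MatchOperatorSketch {
    /** operator "or": the document matches if it contains at least one query term. */
    static boolean matchOr(Set<String> docTerms, List<String> queryTerms) {
        return queryTerms.stream().anyMatch(docTerms::contains);
    }

    /** operator "and": the document matches only if it contains every query term. */
    static boolean matchAnd(Set<String> docTerms, List<String> queryTerms) {
        return queryTerms.stream().allMatch(docTerms::contains);
    }

    public static void main(String[] args) {
        // Assumed analyzer output for a query such as "xiaomi tv".
        List<String> query = List.of("xiaomi", "tv");
        Set<String> phoneDoc = Set.of("xiaomi", "phone");
        Set<String> tvDoc = Set.of("xiaomi", "tv");

        System.out.println(matchOr(phoneDoc, query));
        System.out.println(matchAnd(phoneDoc, query));
        System.out.println(matchAnd(tvDoc, query));
    }
}
```

With or, a phone document matches because it shares one term with the query; with and, only the document containing both terms matches.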

c. Multi-field query: multi_match
GET /heima/_search
{
  "query": {
    "multi_match": {
      "query": "小米",
      "fields": [ "title", "subTitle" ]
    }
  }
}
d. Term query: term
GET /heima/_search
{
  "query": {
    "term": {
      "price": 2699.00
    }
  }
}
e. Multi-term query: terms
GET /heima/_search
{
  "query": {
    "terms": {
      "price": [2699.00, 2899.00, 3899.00]
    }
  }
}
f. Result filtering: _source
List the fields to return directly:
GET /heima/_search
{
  "_source": ["title", "price"],
  "query": {
    "term": {
      "price": 2699
    }
  }
}
Or list them under includes:
GET /heima/_search
{
  "_source": {
    "includes": ["title", "price"]
  },
  "query": {
    "term": {
      "price": 2699
    }
  }
}
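The effect of an includes list on _source can be sketched as a projection over a map. This is a minimal illustration in plain Java, assuming a flat document; SourceFilterSketch and its method are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Keeps only the listed fields of a document, like an includes list on _source. */
class SourceFilterSketch {
    static Map<String, Object> includes(Map<String, Object> source, List<String> fields) {
        Map<String, Object> filtered = new LinkedHashMap<>();
        for (String field : fields) {
            if (source.containsKey(field)) {
                filtered.put(field, source.get(field)); // copy only requested fields
            }
        }
        return filtered;
    }

    public static void main(String[] args) {
        Map<String, Object> source = new LinkedHashMap<>();
        source.put("title", "phone");
        source.put("price", 2699.0);
        source.put("images", "http://image.leyou.com/13123.jpg");

        System.out.println(includes(source, List.of("title", "price")));
    }
}
```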
6. Advanced queries
a. Boolean combination: bool
GET /heima/_search
{
  "query": {
    "bool": {
      "must": { "match": { "title": "大米" }},
      "must_not": { "match": { "title": "电视" }},
      "should": { "match": { "title": "手机" }}
    }
  }
}
bool combines other queries through must (and), must_not (not), and should (or).
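How the three clause types interact can be sketched with predicates. This is a simplified model in plain Java, not the real scoring algorithm: it assumes that when a must clause is present, should clauses do not decide whether a document matches and only count toward a higher rank. BoolQuerySketch and its method names are made up for this note.

```java
import java.util.List;
import java.util.function.Predicate;

/** Simplified model of a bool query built from must, must_not, and should clauses. */
class BoolQuerySketch {
    /** A document matches when every must clause holds and no must_not clause holds. */
    static <T> boolean matches(T doc, List<Predicate<T>> must, List<Predicate<T>> mustNot) {
        boolean allMust = must.stream().allMatch(p -> p.test(doc));
        boolean noneMustNot = mustNot.stream().noneMatch(p -> p.test(doc));
        return allMust && noneMustNot;
    }

    /** Number of should clauses the document satisfies; used here as a stand-in for a score bonus. */
    static <T> long shouldMatched(T doc, List<Predicate<T>> should) {
        return should.stream().filter(p -> p.test(doc)).count();
    }

    public static void main(String[] args) {
        Predicate<String> rice = title -> title.contains("rice");
        Predicate<String> tv = title -> title.contains("tv");
        Predicate<String> phone = title -> title.contains("phone");

        for (String title : List.of("rice phone", "rice tv", "rice cooker")) {
            System.out.println(title + " matches=" + matches(title, List.of(rice), List.of(tv))
                    + " shouldMatched=" + shouldMatched(title, List.of(phone)));
        }
    }
}
```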
b. Range query: range
GET /heima/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 1000.0,
        "lt": 2800.00
      }
    }
  }
}
Here gte means greater than or equal to, and lt means less than.

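c. Fuzzy query: fuzzy
GET /heima/_search
{
  "query": {
    "fuzzy": {
      "title": "appla"
    }
  }
}

The fuzzy query is the fuzzy counterpart of the term query.

It tolerates spelling differences between the search term and the indexed term, as long as the edit distance does not exceed 2.

The allowed edit distance can be set with fuzziness:
"fuzzy": {
  "title": {
    "value": "appla",
    "fuzziness": 1
  }
}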
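To make "edit distance" concrete, here is a minimal sketch in plain Java. It computes the classic Levenshtein distance (insert, delete, substitute), which is a simplification of what a search engine does internally; EditDistanceSketch and fuzzyMatches are hypothetical names.

```java
/** Levenshtein edit distance, used to illustrate what fuzziness bounds. */
class EditDistanceSketch {
    /** Minimum number of single-character inserts, deletes, or substitutions turning a into b. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning the empty prefix of a into b[0..j) takes j inserts
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into the empty string takes i deletes
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** True when the indexed term is within the allowed edit distance of the query term. */
    static boolean fuzzyMatches(String queryTerm, String indexedTerm, int fuzziness) {
        return distance(queryTerm, indexedTerm) <= fuzziness;
    }

    public static void main(String[] args) {
        System.out.println(distance("appla", "apple"));
        System.out.println(fuzzyMatches("appla", "apple", 1));
        System.out.println(fuzzyMatches("appla", "orange", 2));
    }
}
```

Under this measure "appla" is one substitution away from "apple", so it matches with fuzziness 1.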
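d. Filtering: filter
Every query clause affects document scoring and ranking. To filter the results without the filter condition affecting the score, do not put the condition in the query part; put it in filter instead.
GET /heima/_search
{
  "query": {
    "bool": {
      "must": { "match": { "title": "小米手机" }},
      "filter": {
        "range": { "price": { "gt": 2000.00, "lt": 3800.00 }}
      }
    }
  }
}
If a search consists only of a filter, with no query condition and no need for scoring, constant_score can replace a bool query that contains only a filter clause.

GET /heima/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "range": { "price": { "gt": 2000.00, "lt": 3000.00 }}
      }
    }
  }
}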

e. Sorting: sort
GET /goods/_search
{
  "query": {
    "bool": {
      "must": { "match": { "title": "小米手机" }},
      "filter": {
        "range": { "price": { "gt": 200000, "lt": 300000 }}
      }
    }
  },
  "sort": [
    { "price": { "order": "desc" }},
    { "_score": { "order": "desc" }}
  ]
}
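The two-level sort above (price descending, then _score descending for equal prices) behaves like a chained comparator. A minimal sketch in plain Java; the Hit class and its fields are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Sorts hits by price descending, breaking ties by score descending. */
class SortSketch {
    static final class Hit {
        final String title;
        final long price;
        final double score;

        Hit(String title, long price, double score) {
            this.title = title;
            this.price = price;
            this.score = score;
        }
    }

    static List<Hit> sort(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits); // leave the input list untouched
        sorted.sort(Comparator.comparingLong((Hit h) -> h.price).reversed()
                .thenComparing(Comparator.comparingDouble((Hit h) -> h.score).reversed()));
        return sorted;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
                new Hit("a", 250000L, 1.0),
                new Hit("b", 290000L, 0.5),
                new Hit("c", 250000L, 2.0));
        for (Hit h : sort(hits)) {
            System.out.println(h.title + " " + h.price + " " + h.score);
        }
    }
}
```

The second sort key only matters when two hits have the same price.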
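7. Aggregations
• Aggregations make it very easy to compute statistics and analyze data.
• Elasticsearch has many kinds of aggregations; the two most common are buckets and metrics.
a. Buckets group data in some way; each group is called a bucket in ES.
Elasticsearch offers many ways to divide data into buckets:
• Date Histogram Aggregation: groups by date interval; for example, with an interval of one week, each week becomes one group
• Histogram Aggregation: groups by numeric interval, similar to the date version
• Terms Aggregation: groups by term; documents whose term matches exactly fall into one group
• Range Aggregation: groups by numeric or date ranges; you specify the start and end of each range
Bucket aggregations only group data and do not compute anything, so a bucket usually has another kind of aggregation nested inside it: metrics aggregations.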
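Grouping by a numeric interval can be sketched as rounding each value down to the start of its interval. This is a minimal illustration in plain Java, assuming no offset and counting only non-empty buckets; HistogramSketch is a hypothetical name.

```java
import java.util.Map;
import java.util.TreeMap;

/** Groups numeric values into fixed-width buckets and counts each bucket. */
class HistogramSketch {
    static Map<Double, Long> histogram(double[] values, double interval) {
        Map<Double, Long> buckets = new TreeMap<>();
        for (double value : values) {
            // The bucket key is the lower bound of the interval containing the value.
            double key = Math.floor(value / interval) * interval;
            buckets.merge(key, 1L, Long::sum);
        }
        return buckets;
    }

    public static void main(String[] args) {
        double[] prices = {500, 1200, 1800, 2699, 2899};
        System.out.println(histogram(prices, 1000));
    }
}
```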
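b. After grouping, we usually run a calculation over the data in each group, such as average, maximum, minimum, or sum. In ES these are called metrics.
Commonly used metric aggregations:
• Avg Aggregation: average
• Max Aggregation: maximum
• Min Aggregation: minimum
• Percentiles Aggregation: percentiles
• Stats Aggregation: returns avg, max, min, sum, and count together
• Sum Aggregation: sum
• Top hits Aggregation: the top few documents
• Value Count Aggregation: number of values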
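What a Stats-style metric returns can be sketched with the JDK's own summary statistics. This is only an in-memory analogy in plain Java, not Elasticsearch code; StatsSketch is a hypothetical name.

```java
import java.util.DoubleSummaryStatistics;
import java.util.stream.DoubleStream;

/** Computes count, min, max, average, and sum in one pass over the values. */
class StatsSketch {
    static DoubleSummaryStatistics stats(double[] prices) {
        return DoubleStream.of(prices).summaryStatistics();
    }

    public static void main(String[] args) {
        DoubleSummaryStatistics s = stats(new double[]{1000, 2000, 3000});
        System.out.println("count=" + s.getCount() + " min=" + s.getMin() + " max=" + s.getMax()
                + " avg=" + s.getAverage() + " sum=" + s.getSum());
    }
}
```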
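c. Aggregating into buckets
GET /cars/_search
{
  "size": 0,
  "aggs": {
    "popular_colors": {
      "terms": {
        "field": "color"
      }
    }
  }
}
• size: number of hits to return; set to 0 here because we only care about the aggregation result, not the matching documents, which is more efficient
• aggs: declares an aggregation; short for aggregations
• popular_colors: a name for this aggregation; any name works
• terms: how to divide the buckets, here by term
• field: the field used to divide the buckets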
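d. Metrics inside buckets
GET /cars/_search
{
  "size": 0,
  "aggs": {
    "popular_colors": {
      "terms": {
        "field": "color"
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}
• aggs: a new aggs is added inside the previous aggregation (popular_colors), which shows that a metric is also an aggregation
• avg_price: the name of the aggregation
• avg: the metric type, here the average
• field: the field the metric is computed on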
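The shape of this result (one bucket per color, each with a document count and an average price) can be reproduced in memory with stream collectors. A minimal sketch in plain Java; the Car class and the sample values are made up for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Groups cars by color (the bucket) and computes a count and an average price per bucket (the metrics). */
class BucketAvgSketch {
    static final class Car {
        final String color;
        final double price;

        Car(String color, double price) {
            this.color = color;
            this.price = price;
        }
    }

    /** Bucket only: how many cars fall into each color. */
    static Map<String, Long> countByColor(List<Car> cars) {
        return cars.stream()
                .collect(Collectors.groupingBy(c -> c.color, TreeMap::new, Collectors.counting()));
    }

    /** Bucket plus nested metric: average price within each color. */
    static Map<String, Double> avgPriceByColor(List<Car> cars) {
        return cars.stream()
                .collect(Collectors.groupingBy(c -> c.color, TreeMap::new,
                        Collectors.averagingDouble(c -> c.price)));
    }

    public static void main(String[] args) {
        List<Car> cars = List.of(new Car("red", 10000), new Car("red", 20000), new Car("blue", 15000));
        System.out.println(countByColor(cars));
        System.out.println(avgPriceByColor(cars));
    }
}
```

The grouping step plays the role of the terms bucket and the downstream collector plays the role of the nested aggs.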

Spring Data Elasticsearch

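The mission of Spring Data is to provide a unified programming interface for all kinds of data access, whether a relational database (such as MySQL), a non-relational store (such as Redis), or an index store like Elasticsearch. This simplifies the code developers write and improves development efficiency.
Using Spring Data Elasticsearch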
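• Add the dependencies
<parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>2.0.6.RELEASE</version>
</parent>

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>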

• Configuration file
spring:
  data:
    elasticsearch:
      cluster-name: elasticsearch
      cluster-nodes: 192.168.163.128:9300
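• Prepare the entity class
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;

@AllArgsConstructor
@NoArgsConstructor
@Data
@Document(indexName = "heima2", type = "item", shards = 1)
public class Item {

    @Field(type = FieldType.Long)
    @Id
    Long id;

    @Field(type = FieldType.Text, analyzer = "ik_smart")
    String title; // title, analyzed with ik_smart

    @Field(type = FieldType.Keyword)
    String category; // category

    @Field(type = FieldType.Keyword)
    String brand; // brand

    @Field(type = FieldType.Double)
    Double price; // price

    @Field(type = FieldType.Keyword, index = false)
    String images; // image URL; does not need to be indexed
}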
• Create the index and its mapping
@Autowired
ElasticsearchTemplate template;

public void testCreate() {
    // create the index
    template.createIndex(Item.class);
    // register the mapping
    template.putMapping(Item.class);
}
• Insert
@Autowired
ItemRepository repository;

public void indexList() {
    List<Item> list = new ArrayList<>();
    list.add(new Item(1L, "小米手机7", "手机", "小米", 3299.00, "http://image.leyou.com/13123.jpg"));
    list.add(new Item(2L, "坚果手机R1", "手机", "锤子", 3699.00, "http://image.leyou.com/13123.jpg"));
    // saveAll accepts a collection, so this is a batch insert
    repository.saveAll(list);
}

• Query
public void testFind() {
    Iterable<Item> items = repository.findAll();
    for (Item item : items) {
        System.out.println(item.toString());
    }
}
• Custom queries
• Another powerful feature of Spring Data is that it implements a query automatically from the method name.
public interface ItemRepository extends ElasticsearchRepository<Item, Long> {
    List<Item> findByPriceBetween(Double begin, Double end);
}
• No implementation class is needed; the method can be used directly
public void testFindBy() {
    List<Item> list = repository.findByPriceBetween(2000d, 4000d);
    for (Item item : list) {
        System.out.println("item=" + item);
    }
}
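For intuition, a query derived from findByPriceBetween behaves roughly like the in-memory filter below. This is a sketch in plain Java, not what Spring Data generates: it assumes both bounds are inclusive, and the Item class here is a cut-down stand-in for the entity above.

```java
import java.util.List;
import java.util.stream.Collectors;

/** In-memory analogy for a derived "price between begin and end" query. */
class PriceBetweenSketch {
    static final class Item {
        final long id;
        final String title;
        final double price;

        Item(long id, String title, double price) {
            this.id = id;
            this.title = title;
            this.price = price;
        }
    }

    /** Keeps items whose price lies in [begin, end]; inclusive bounds are an assumption of this sketch. */
    static List<Item> findByPriceBetween(List<Item> items, double begin, double end) {
        return items.stream()
                .filter(item -> item.price >= begin && item.price <= end)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Item> items = List.of(
                new Item(1L, "phone A", 3299.00),
                new Item(2L, "phone B", 3699.00),
                new Item(3L, "phone C", 4499.00));
        for (Item item : findByPriceBetween(items, 2000d, 4000d)) {
            System.out.println(item.id + " " + item.title + " " + item.price);
        }
    }
}
```

The repository version runs the same kind of condition as a range query on the server, so nothing has to be loaded into memory first.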