Elasticsearch Summary (Windows 7)

Latest recommended article published on 2022-04-11 14:02:55

汪少~

Licensed under CC 4.0 BY-SA

Article link: https://blog.youkuaiyun.com/TimeShare1/article/details/105633647

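This article describes how to sync MySQL data to Elasticsearch with Logstash and how to use Spring Data to simplify ES operations, including creating the entity class, the repository interface, and test methods that query and update data.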


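Elasticsearch, or ES for short, is an open-source, highly scalable, distributed full-text search engine that stores and retrieves data in near real time. It scales well: a cluster can grow to hundreds of servers and handle petabytes of data. ES is written in Java and uses Lucene at its core for all indexing and search functionality, but its goal is to hide Lucene's complexity behind a simple RESTful API so that full-text search becomes easy.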
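To make the "simple RESTful API" point concrete, here is a minimal sketch that builds a URI search request (`GET /{index}/_search?q=field:value`) with the JDK's own HTTP client. The host `localhost:9200` and the index name `first_test` match the Logstash output used later in this post; the field name `title` and the search term are only illustrative assumptions.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class EsRestSketch {
    // Assumed address of a local single-node ES instance.
    static final String BASE_URL = "http://localhost:9200";

    // Builds a URI search request: GET /{index}/_search?q=field:value
    static HttpRequest searchRequest(String index, String field, String value) {
        // The query string is URL-encoded, so ':' becomes %3A.
        String q = URLEncoder.encode(field + ":" + value, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(BASE_URL + "/" + index + "/_search?q=" + q))
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Needs a running ES node; prints the raw JSON response.
        HttpRequest request = searchRequest("first_test", "title", "hello");
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```

No Lucene types appear anywhere: the whole search is one HTTP GET that returns JSON.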

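Installation

To start the ES visual interface, open cmd in its directory and run: grunt server

Syncing database data to ES:

In the Logstash directory, create three new files.

File 1: mysql.conf, with the following content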

input {
   jdbc {
       jdbc_driver_library => "D:\mymaven\myLocalRepository\mysql\mysql-connector-java\5.1.43\mysql-connector-java-5.1.43.jar"
       jdbc_driver_class => "com.mysql.jdbc.Driver"
       jdbc_connection_string => "jdbc:mysql://127.0.0.1:3306/pubs?useUnicode=true&characterEncoding=UTF-8&allowMultiQueries=true&useSSL=false"
       jdbc_user => "root"
       jdbc_password => "1234"
       # Cron-style schedule for running the SQL; "* * * * *" means once every minute
       schedule => "* * * * *"
       # Whether to clear the record stored at last_run_metadata_path; if true, every run queries all database records from scratch
       clean_run => "false"
       # The commented-out settings below enable incremental import tracked by a column value
       #record_last_run => true
       #use_column_value => true
       # tracking_column: the table column used as the reference
       #tracking_column => "id"
       # tracking_column_type: type of that column; "int" caused a startup error that asked for "numeric"
       #tracking_column_type => "numeric"
       # Alternatively, give the SQL inline; :sql_last_value starts at 1970-01-01 00:00:00
       #statement => "SELECT * FROM student"
       # Path and name of the SQL file to execute
       statement_filepath => "jdbc.sql"
       jdbc_paging_enabled => "true"
       jdbc_page_size => "100000"
       # Index type
       type => "jdbc"
   }
   stdin {
   }
}

#filter { json { source => "message" remove_field => ["message"] } }

filter {
   json {
       source => "message"
       remove_field => ["message"]
   }
   date {
       match => ["timestamp","dd/MM/yyyy:HH:mm:ss Z"]
   }
}

output {
   elasticsearch {
       hosts => "localhost:9200"
       index => "first_test"
       document_id => "%{id}"
       # Template file configuration
       template_overwrite => true
       template => "template.json"
   }
   stdout {
       # Output in JSON format
       codec => "json_lines"
   }
}
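The `document_id => "%{id}"` setting is what keeps repeated imports from piling up duplicates: each event is written under the value of its `id` field, so a changed row replaces its earlier document. The sketch below imitates that in plain Java, with a map standing in for the index. It is not Logstash code; the class and method names are hypothetical, and leaving an unresolved `%{...}` reference as literal text is how Logstash behaves as far as I know.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DocumentIdSketch {
    // Matches field references of the form %{fieldName}.
    private static final Pattern FIELD_REF = Pattern.compile("%\\{([^}]+)\\}");

    // Replaces each %{field} with the event's value; references to missing fields stay as-is.
    static String resolve(String template, Map<String, Object> event) {
        Matcher m = FIELD_REF.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            Object value = event.get(m.group(1));
            String replacement = value == null ? m.group() : value.toString();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Stand-in for an index: a map keyed by document id, so the same id overwrites.
    static Map<String, Map<String, Object>> indexAll(String idTemplate,
                                                     Iterable<Map<String, Object>> events) {
        Map<String, Map<String, Object>> index = new LinkedHashMap<>();
        for (Map<String, Object> event : events) {
            index.put(resolve(idTemplate, event), event);
        }
        return index;
    }
}
```

If the query result has no `id` column, every event would resolve to the same literal id, so it is worth checking that the SELECT actually returns that column.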

File 2: jdbc.sql. Whenever rows are added or modified, the data in ES is updated accordingly.

select * from studentBoos s where s.createtime > :sql_last_value or s.updatetime > :sql_last_value
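To see which rows this WHERE clause picks up on each scheduled run, here is a small plain-Java simulation. It assumes `:sql_last_value` holds the time of the previous run (starting at 1970-01-01 00:00:00) and that `updatetime` may be NULL for rows that were never modified. The `Row` type and method names are hypothetical; this only mirrors the selection logic, not how Logstash executes the query.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

public class IncrementalSyncSketch {
    // Hypothetical row type carrying only the columns the query looks at.
    static final class Row {
        final int id;
        final LocalDateTime createtime;
        final LocalDateTime updatetime; // may be null if the row was never updated

        Row(int id, LocalDateTime createtime, LocalDateTime updatetime) {
            this.id = id;
            this.createtime = createtime;
            this.updatetime = updatetime;
        }
    }

    // Like SQL's "col > value": a NULL column never matches.
    static boolean after(LocalDateTime column, LocalDateTime value) {
        return column != null && column.isAfter(value);
    }

    // Mirrors: createtime > :sql_last_value OR updatetime > :sql_last_value
    static List<Row> changedSince(List<Row> table, LocalDateTime sqlLastValue) {
        List<Row> changed = new ArrayList<>();
        for (Row row : table) {
            if (after(row.createtime, sqlLastValue) || after(row.updatetime, sqlLastValue)) {
                changed.add(row);
            }
        }
        return changed;
    }
}
```

On the first run every row is newer than 1970-01-01, so the whole table is imported; later runs only return rows created or updated since the previous run.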

File 3: template.json. When importing data from MySQL into ES, a template is needed in order to add an analyzer (tokenizer).

{
  "template": "*",
  "version": 50001,
  "settings": {
    "index.refresh_interval": "5s"
  },
  "mappings": {
    "_default_": {
      "_all": {
        "enabled": true,
        "norms": false
      },
      "dynamic_templates": [
        {
          "message_field": {
            "path_match": "message",
            "match_mapping_type": "string",
            "mapping": {
              "type": "text",
              "norms": false
            }
          }
        },
        {
          "string_fields": {
            "match": "*",
            "match_mapping_type": "string",
            "mapping": {
              "type": "text",
              "norms": false,
              "analyzer": "ik_max_word"
            }
          }
        }
      ],
      "properties": {
        "@timestamp": {
          "type": "date",
          "include_in_all": false
        },
        "@version": {
          "type": "text",
          "include_in_all": false
        }
      }
    }
  }
}
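One way to check that the `ik_max_word` analyzer named in the template is actually available is to call the `_analyze` API and look at the tokens it returns. The sketch below builds that request with the JDK HTTP client. It assumes ES runs on `localhost:9200` and that the IK analysis plugin is installed; the class and method names are hypothetical, and the sample text is arbitrary.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class AnalyzeRequestSketch {
    // Assumed address of a local ES instance with the IK plugin installed.
    static final String BASE_URL = "http://localhost:9200";

    // Minimal escaping of backslashes and double quotes; not a full JSON encoder.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds the JSON body for the _analyze API.
    static String analyzeBody(String analyzer, String text) {
        return "{\"analyzer\":\"" + escape(analyzer) + "\",\"text\":\"" + escape(text) + "\"}";
    }

    // POST /_analyze with the analyzer name and the text to tokenize.
    static HttpRequest analyzeRequest(String analyzer, String text) {
        return HttpRequest.newBuilder(URI.create(BASE_URL + "/_analyze"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(analyzeBody(analyzer, text)))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Needs a running ES node; prints the token list for each analyzer.
        HttpClient client = HttpClient.newHttpClient();
        for (String analyzer : new String[] {"ik_max_word", "ik_smart"}) {
            HttpResponse<String> response = client.send(
                    analyzeRequest(analyzer, "hello world"),
                    HttpResponse.BodyHandlers.ofString());
            System.out.println(analyzer + ": " + response.body());
        }
    }
}
```

Running it against both `ik_max_word` and `ik_smart` with some Chinese text should show the difference between the fine-grained and the coarse-grained segmentation.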

This uses the template provided by Spring Data.

Step 1: create an entity class

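import lombok.Data;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;

// @Document marks a document object (index information, document type)
@Document(indexName="index_hello2",type="article")
@Data
public class Article {

    /*
     * @Document(indexName="blob3",type="article"):
     *   indexName: name of the index (required)
     *   type: type within the index
     * @Id: unique identifier (primary key)
     * @Field(index=true,analyzer="ik_smart",store=true,searchAnalyzer="ik_smart",type = FieldType.Text)
     *   index: whether the field is indexed (analyzed and searchable)
     *   analyzer: analyzer used when storing
     *   searchAnalyzer: analyzer used when searching
     *   store: whether the field is stored
     *   type: data type
     */

    // @Id: document primary key, the unique identifier
    @Id
    // @Field: per-field configuration (type, whether indexed, whether stored, analyzer); analyzers: ik_max_word, ik_smart
    @Field(store=true, index = false,type = FieldType.Integer)
    private Integer id;
    @Field(index=true,analyzer="ik_smart",store=true,searchAnalyzer="ik_smart",type = FieldType.Text)
    private String title;
    @Field(index=true,analyzer="ik_smart",store=true,searchAnalyzer="ik_smart",type = FieldType.Text)
    private String content;
}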

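Create an interface that provides the various query APIs. You can declare custom query methods, but do not declare your own findAll method: calling it raises an error.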

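import java.util.List;
import org.springframework.data.domain.Pageable;
import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;

public interface ArticleRepository extends ElasticsearchRepository<Article,Integer> {

    List<Article> findByTitle(String title);

    // Matches on either of several fields
    List<Article> findByTitleOrContent(String title, String content);

    // With paging
    List<Article> findByTitleOrContent(String title, String content, Pageable pageable);
}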
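These methods have no implementation because Spring Data derives the query from the method name. As a rough illustration of that naming convention, the sketch below splits a `findBy...` name into property names and `AND`/`OR` connectors. It is a simplified stand-in written for this post, not Spring Data's actual parser, and it ignores keywords such as `Like`, `Between`, or `OrderBy`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DerivedQuerySketch {
    // "And"/"Or" only count as connectors when followed by an upper-case letter.
    private static final Pattern CONNECTOR = Pattern.compile("(And|Or)(?=[A-Z])");

    // Lower-cases the first character, e.g. "Title" -> "title".
    static String decapitalize(String s) {
        return s.isEmpty() ? s : Character.toLowerCase(s.charAt(0)) + s.substring(1);
    }

    // Splits e.g. "findByTitleOrContent" into ["title", "OR", "content"].
    static List<String> parse(String methodName) {
        if (!methodName.startsWith("findBy")) {
            throw new IllegalArgumentException("expected a findBy... method name");
        }
        String criteria = methodName.substring("findBy".length());
        List<String> parts = new ArrayList<>();
        Matcher m = CONNECTOR.matcher(criteria);
        int start = 0;
        while (m.find()) {
            if (m.start() == start) {
                continue; // a property that itself begins with "And"/"Or"
            }
            parts.add(decapitalize(criteria.substring(start, m.start())));
            parts.add(m.group(1).toUpperCase());
            start = m.end();
        }
        parts.add(decapitalize(criteria.substring(start)));
        return parts;
    }
}
```

Read this way, `findByTitleOrContent(title, content)` matches documents whose `title` matches the first argument or whose `content` matches the second.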

ElasticsearchTemplate is used to create indexes (roughly, databases) and document mappings (roughly, tables), to add data, and so on.

Test methods

@Autowired
private ElasticsearchTemplate template;

@Autowired
private StudentBookRepository bookRepository;

// Look up a document in ES by id
@Test
public void test() {
    Integer i = 29;
    Optional<StudentBooks> byId = bookRepository.findById(i);
    StudentBooks studentBooks = byId.get();
    System.out.println(studentBooks);
}
// Fuzzy query
@Test
public void test1() {
    List<StudentBooks> byUsername = bookRepository.findByUsername(null);
    byUsername.forEach(System.out::println);
}
// Query on two fields combined with AND
@Test
public void testq() {
    List<StudentBooks> byUsername = bookRepository.findByUsernameAndPasswd("今天","23");
    byUsername.forEach(System.out::println);
}

// Query everything, with paging
@Test
public void test2() {
    Pageable of = PageRequest.of(0, 3);
    Iterable<StudentBooks> all = bookRepository.findAll(of);
    all.forEach(System.out::println);
}
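In the paging test, `PageRequest.of(0, 3)` means "page 0, three items per page": page numbers are zero-based, so the first hit of a page sits at offset `page * size`. The sketch below works through that arithmetic on an in-memory list. It is plain Java for illustration only, with hypothetical class and method names, and does not call Spring Data or ES.

```java
import java.util.List;

public class PageWindowSketch {
    // Zero-based page number and page size -> offset of the first hit on that page.
    static long offset(int page, int size) {
        if (page < 0 || size < 1) {
            throw new IllegalArgumentException("page must be >= 0 and size >= 1");
        }
        return (long) page * size;
    }

    // Returns the items that fall on the requested page; empty if the page is past the end.
    static <T> List<T> slice(List<T> all, int page, int size) {
        int from = (int) Math.min(offset(page, size), all.size());
        int to = (int) Math.min((long) from + size, all.size());
        return all.subList(from, to);
    }
}
```

With seven documents and a page size of 3, page 0 holds the first three, page 2 holds only the seventh, and page 3 is empty.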

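Where I used this:

In the business application, one query joined two very large database views and contained more than ten subqueries, so it ran very slowly.

With ES, the query results are imported into ES. The import query leaves out the subqueries, so it is not slow; the subqueries were the main bottleneck.

Afterwards, any data that is added or modified is synchronized to ES automatically.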