Lucene Notes: A Full-Text Search Engine Toolkit


License: CC 4.0 BY-SA

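This article shows how to use Lucene to build a full-text index and search it. Java examples first walk through index creation, including configuring the analyzer, the index path, and the document fields, and then demonstrate the search flow: opening the index, parsing a query, and printing the results.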


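Lucene is an open-source full-text search engine toolkit from the Apache Software Foundation. It began as a subproject of the Jakarta project and is now a top-level Apache project.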

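Lucene's underlying model: the inverted index (also called a reverse index), which maps each term to the documents that contain it.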
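
To make the idea concrete, here is a minimal sketch of an inverted index in plain Java. It is not how Lucene stores its index on disk. The class name `InvertedIndexSketch`, the whitespace tokenizer, and the sample documents are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy inverted index: maps each term to the ids of the documents containing it.
// Illustration only; Lucene's real index format is far more elaborate.
final class InvertedIndexSketch {

    // term -> document ids, in the order the documents were added
    private final Map<String, List<Integer>> postings = new TreeMap<>();

    // Lowercases the text, splits it on whitespace, and records docId under each term.
    void add(int docId, String text) {
        for (String term : text.toLowerCase().split("\\s+")) {
            if (term.isEmpty()) continue;
            List<Integer> docs = postings.computeIfAbsent(term, k -> new ArrayList<>());
            // A term repeated within one document must be recorded only once;
            // in that case docId is already the last entry of the list.
            if (docs.isEmpty() || docs.get(docs.size() - 1) != docId) {
                docs.add(docId);
            }
        }
    }

    // Returns the ids of the documents containing the term, or an empty list.
    List<Integer> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(), List.of());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(0, "lucene is a search library");
        index.add(1, "an inverted index maps terms to documents");
        index.add(2, "lucene builds an inverted index");
        System.out.println(index.lookup("lucene"));   // [0, 2]
        System.out.println(index.lookup("inverted")); // [1, 2]
    }
}
```

A lookup never scans the documents themselves: it goes straight from the term to its list of document ids, which is what makes full-text search fast.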

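Lucene's file structure:

Index: one index lives in one directory
- Segment: an index can have many segments, and segments are independent of one another. Adding new documents may produce a new segment, and different segments can be merged into a new one.
  - Document: the document is the basic unit of indexing. Documents are spread across segments, and one segment can contain many documents.
    - Field: a document can contain different kinds of information, and each kind can be split out and indexed separately.
      - Term: the term is the smallest unit of the index. It is the data left after lexical analysis and language processing.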
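
As a rough illustration of how a field's text becomes terms, the sketch below runs a very simple analysis step in plain Java. It is an assumption-laden stand-in, not Lucene's `StandardAnalyzer`: the class name `TermSketch` and its rules (lowercase, split on non-alphanumeric characters, emit each Han character as its own term) are chosen only for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Toy analyzer: turns field text into terms. Illustration only, not Lucene's StandardAnalyzer.
final class TermSketch {

    // Lowercases the text, splits on anything that is not a letter or digit,
    // and emits each Han character as a separate term.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            boolean letter = Character.isLetterOrDigit(cp);
            boolean han = letter
                    && Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN;
            if (letter && !han) {
                word.appendCodePoint(Character.toLowerCase(cp));
            } else {
                flush(word, terms);          // a separator or Han character ends the current word
                if (han) {
                    terms.add(new String(Character.toChars(cp)));
                }
            }
        }
        flush(word, terms);
        return terms;
    }

    // Moves the pending word, if any, into the term list.
    private static void flush(StringBuilder word, List<String> terms) {
        if (word.length() > 0) {
            terms.add(word.toString());
            word.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(analyze("Full-Text Search")); // [full, text, search]
        System.out.println(analyze("巫巫，啦"));           // [巫, 巫, 啦]
    }
}
```

In Lucene terms this roughly corresponds to what happens to a `TextField` value. A `StringField` value, by contrast, is indexed as a single term without analysis.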

Java demo

Creating the index:

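import java.nio.file.Paths;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.IntField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// Note: this listing appears to target the Lucene 5.x API.
// From Lucene 6 on, numeric fields are indexed with IntPoint instead of this IntField.
public class Lucene01 {

    public static void main(String[] args) throws Exception {
        // Create the standard analyzer.
        Analyzer analyzer = new StandardAnalyzer();
        // Create the index writer configuration from the analyzer.
        IndexWriterConfig indexWriterConfig = new IndexWriterConfig(analyzer);
        // Open mode: create the index if it does not exist, otherwise append to it.
        indexWriterConfig.setOpenMode(OpenMode.CREATE_OR_APPEND);

        // Directory object for the location of the index.
        Directory directory = FSDirectory.open(Paths.get("D:/index/test"));
        // Create the index writer from the directory and the configuration.
        IndexWriter indexWriter = new IndexWriter(directory, indexWriterConfig);

        // Create a document and add fields to it.
        Document doc1 = new Document();
        doc1.add(new StringField("id", "abcdef", Store.YES));
        // Sample Chinese text; StandardAnalyzer indexes each Han character as its own term.
        doc1.add(new TextField("content", "巫巫巫巫巫，啦啦啦啦啦啦啦啦", Store.YES));
        doc1.add(new IntField("num", 1, Store.YES));

        indexWriter.addDocument(doc1); // add the document to the index writer
        indexWriter.commit();          // commit, which writes the index
        indexWriter.close();           // close the index writer
        directory.close();             // close the directory and release resources
    }
}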

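Steps in the code:

Step	Key classes
1. Create the analyzer	Analyzer
2. Create the index configuration and set the open mode	IndexWriterConfig
3. Create the index path object	Directory FSDirectory Paths
4. Create the index writer	IndexWriter
5. Create a document and add field objects to it	Document StringField TextField IntField
6. Commit the index and release resources	IndexWriter Directory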

Searching the index:

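import java.nio.file.Paths;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// Note: this listing appears to target the Lucene 5.x API, like the indexing code.
// The class name Lucene02 is added here only so the method compiles on its own.
public class Lucene02 {

    public static void main(String[] args) throws Exception {
        // Directory that holds the index files.
        Directory directory = FSDirectory.open(Paths.get("D:/index/test"));
        // Create the index reader and, on top of it, the index searcher.
        DirectoryReader directoryReader = DirectoryReader.open(directory);
        IndexSearcher searcher = new IndexSearcher(directoryReader);

        // The analyzer must be the same one that was used to build the index.
        Analyzer analyzer = new StandardAnalyzer();
        // Query parser with "content" as the default field.
        QueryParser parser = new QueryParser("content", analyzer);
        // Parse the query string into a Query object.
        Query query = parser.parse("巫");

        // Retrieve the top 10 documents matching the query.
        TopDocs topDocs = searcher.search(query, 10);
        if (topDocs != null) {
            System.out.println("Total matching documents: " + topDocs.totalHits);
            // topDocs.scoreDocs is an array of hits.
            for (int i = 0; i < topDocs.scoreDocs.length; i++) {
                // scoreDocs[i].doc is an int: the id of the document within the index.
                Document doc = searcher.doc(topDocs.scoreDocs[i].doc);
                System.out.println(doc.get("id"));      // stored value of the id field
                System.out.println(doc.get("content")); // stored value of the content field
            }
        }

        directoryReader.close(); // close the reader
        directory.close();       // close the directory and release resources
    }
}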
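
The retrieval flow above (look up the query term, rank the matching documents, return the top N) can be sketched without Lucene. The following is only an illustration under stated assumptions: the class name `TopNSearchSketch` is hypothetical, tokenization is a plain whitespace split, and ranking uses raw term frequency rather than Lucene's actual scoring.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy top-N search over an in-memory index. Illustration only.
final class TopNSearchSketch {

    // term -> (docId -> number of times the term occurs in that document)
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();

    // Lowercases the text, splits it on whitespace, and counts each term for docId.
    void add(int docId, String text) {
        for (String term : text.toLowerCase().split("\\s+")) {
            if (term.isEmpty()) continue;
            postings.computeIfAbsent(term, k -> new HashMap<>())
                    .merge(docId, 1, Integer::sum);
        }
    }

    // Returns up to n matching document ids, highest term frequency first;
    // ties are broken by the lower document id.
    List<Integer> search(String term, int n) {
        Map<Integer, Integer> hits = postings.getOrDefault(term.toLowerCase(), Map.of());
        List<Integer> docIds = new ArrayList<>(hits.keySet());
        docIds.sort((a, b) -> {
            int byFreq = Integer.compare(hits.get(b), hits.get(a));
            return byFreq != 0 ? byFreq : Integer.compare(a, b);
        });
        return docIds.subList(0, Math.min(n, docIds.size()));
    }

    public static void main(String[] args) {
        TopNSearchSketch index = new TopNSearchSketch();
        index.add(0, "index search index");
        index.add(1, "search engine");
        index.add(2, "index index index");
        System.out.println(index.search("index", 10)); // [2, 0]
        System.out.println(index.search("search", 1)); // [0]
    }
}
```

The second argument plays the same role as the `10` passed to `searcher.search(query, 10)`: it caps how many ranked hits are returned, not how many documents match.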