[Lucene] An Introduction to the Core Search Classes

Note: this post is based on Lucene 3.4.

 

IndexReader

IndexSearcher

Term

QueryParser

Query

TermQuery

TopDocs

ScoreDoc

 

The basic classes for searching: Directory, IndexReader, IndexSearcher


Figure 1. Relationships among the classes used in searching

 

QueryParser

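QueryParser converts the query expression entered by the user (see Apache Lucene - Query Parser Syntax) into the corresponding Query instance.

To do so it needs an analyzer, which splits the expression into terms (the analyzer is used to find terms in the query text).

Note: QueryParser is the only class in the search process that uses an analyzer.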

 

Basic usage:

 

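import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

// Any Analyzer can be used; StandardAnalyzer is only an example
Version matchVersion = Version.LUCENE_34;
Analyzer analyzer = new StandardAnalyzer(matchVersion);

// "content" is the defaultField, i.e. the field searched by default
QueryParser parser = new QueryParser(matchVersion, "content", analyzer);

// Terms are searched in the defaultField given to the constructor unless the
// expression names a field itself, e.g. field:expression
// queryExpression is the user's query String; parse() throws ParseException
Query query = parser.parse(queryExpression);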

 

 

Query

 

TermQuery

 

 

IndexReader

IndexReader is an abstract class, providing an interface for accessing an index. Search of an index is done entirely through this abstract interface, so that any subclass which implements it is searchable.

 

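1. Opening a reader is expensive, so reuse the same IndexReader instance (cache the IndexReader of each index directory) and open a new one only when necessary (see item 2).

2. When to open a new reader: an IndexReader searches a point-in-time snapshot of the index, taken when the reader was created. If another thread adds or deletes documents after that point, a new reader must be opened before those changes become visible.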

 

boolean isCurrent() 
          Check whether any new changes have occurred to the index since this reader was opened.

 

IndexReader reader = ..
IndexSearcher searcher = ..
// In Lucene 3.4, reopen() returns the same instance (not null) when the index has not changed
IndexReader newReader = reader.reopen();
if (newReader != reader) {
    reader.close();
    reader = newReader;
    searcher = new IndexSearcher(reader);
}

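Note: in a real application other threads may still be searching with the old reader, so the code above has to be made thread-safe. (TODO: improve the code above.)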
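One common way to make the swap safe is reference counting: every search takes a reference before using the reader and gives it back afterwards, and the old reader is closed only when the last reference is released. The following is a minimal sketch of that idea using plain JDK classes only. `RefCounted` and `Swappable` are hypothetical names introduced for illustration, and any `Closeable` stands in for the reader.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical wrapper: closes the resource when the last reference is released. */
final class RefCounted<T extends Closeable> {
    private final T resource;
    private final AtomicInteger refs = new AtomicInteger(1); // starts with the holder's own reference

    RefCounted(T resource) { this.resource = resource; }

    T get() { return resource; }

    /** Takes a reference unless the count already dropped to zero (resource closed). */
    boolean tryAcquire() {
        for (;;) {
            int n = refs.get();
            if (n == 0) return false;
            if (refs.compareAndSet(n, n + 1)) return true;
        }
    }

    /** Drops one reference; the last release closes the resource. */
    void release() {
        if (refs.decrementAndGet() == 0) {
            try {
                resource.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}

/** Hypothetical holder: one thread can swap the resource while others keep using the old one. */
final class Swappable<T extends Closeable> {
    private volatile RefCounted<T> current;

    Swappable(T initial) { current = new RefCounted<>(initial); }

    /** Callers must pair this with release() in a finally block. */
    RefCounted<T> acquire() {
        for (;;) {
            RefCounted<T> c = current;
            if (c.tryAcquire()) return c; // otherwise swap() won the race; retry on the new one
        }
    }

    /** Publishes the fresh resource, then drops the holder's reference to the old one. */
    synchronized void swap(T fresh) {
        RefCounted<T> old = current;
        current = new RefCounted<>(fresh);
        old.release();
    }
}
```

In Lucene 3.4 the reader already carries such a counter: IndexReader offers incRef() and decRef(), so a real implementation would probably build on those rather than on a separate wrapper.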

 

3. Obtaining an IndexReader from an IndexWriter (see near real time search)

 

// In class IndexReader
public static IndexReader open(IndexWriter writer,
                               boolean applyAllDeletes)
                        throws CorruptIndexException,
                               IOException

 applyAllDeletes - If true, all buffered deletes will be applied (made visible) in the returned reader. If false, the deletes are not applied but remain buffered (in IndexWriter) so that they will be applied in the future. Applying deletes can be costly, so if your app can tolerate deleted documents being returned you might gain some performance by passing false.

 

4. Obtaining an IndexReader from a Directory

 

// In class IndexReader
public static IndexReader open(Directory directory,
                               boolean readOnly)
                        throws CorruptIndexException,
                               IOException

 

Setting readOnly when obtaining an IndexReader: you should pass readOnly=true, since it gives much better concurrent performance, unless you intend to do write operations (delete documents or change norms) with the reader.

In practice, we delete documents from the index with IndexWriter's deleteDocuments, so a read-only reader is sufficient.

 

 

 

Code example: IndexFileManagerImpl

 

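The code below obtains an IndexReader in the following order: first from the reader cache; if it is not there, from the IndexWriter of that index directory; failing that, directly from the Directory.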

 

public IndexReader getIndexReader(String name) throws IOException {
        Lock lock = rwLock.readLock();
        lock.lock();
        try {
            IndexReader reader = readerCache.get(name);
            if (reader == null) {
                // a read lock cannot be upgraded, so release it before taking the write lock
                lock.unlock();
                lock = rwLock.writeLock();
                lock.lock();
                // another thread may have cached a reader between the unlock and the lock
                reader = readerCache.get(name);
                if (reader != null) {
                    return reader;
                }
                IndexWriter writer = writerCache.get(name);
                if (writer != null) {
                    reader = IndexReader.open(writer, false); // near-real-time reader; buffered deletes are not applied yet
                } else {
                    File dir = new File(rootDir, name);
                    if (dir.exists()) {
                        // note: NIOFSDirectory.open resolves to the inherited static FSDirectory.open,
                        // so both branches end up making the same call
                        if(isWinOS)
                            reader = IndexReader.open(FSDirectory.open(dir), true); // read only reader
                        else
                            reader = IndexReader.open(NIOFSDirectory.open(dir), true); // read only reader
                    } else {
                        reader = null; // the index files do not exist yet
                    }
                }
                if (reader != null) {
                    readerCache.put(name, reader);
                }
            }
            return reader;
        } finally {
            lock.unlock();
        }
    }
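
The "look in the cache first, open only on a miss" part of this method can also be written without explicit locks. As a hedged alternative rather than the approach used in IndexFileManagerImpl, the sketch below relies on ConcurrentHashMap.computeIfAbsent from the JDK (Java 8 or later). `OpenOnceCache` and the `opener` function are hypothetical names; in the setting above the opener would try the IndexWriter first and then the Directory, returning null when the index does not exist yet.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical cache: opens each named resource at most once and reuses it afterwards. */
final class OpenOnceCache<V> {
    private final ConcurrentHashMap<String, V> cache = new ConcurrentHashMap<>();
    private final Function<String, V> opener; // returns null when nothing can be opened

    OpenOnceCache(Function<String, V> opener) {
        this.opener = opener;
    }

    /**
     * Returns the cached value for the name, opening it on first use.
     * A null result from the opener is not cached, so a later call tries again.
     */
    V get(String name) {
        return cache.computeIfAbsent(name, opener);
    }
}
```

The trade-off is that the opener cannot throw a checked IOException directly, so a real implementation would have to wrap and unwrap it.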
 

That covers the basic usage of IndexReader; more advanced topics are left for later study.


IndexSearcher

The module that runs a search for a given query (a Query object).

 

The API documentation says:
For performance reasons, if your index is unchanging, you should share a single IndexSearcher instance across multiple searches instead of creating a new one per-search.
In other words, when the index is guaranteed not to change, cache the IndexSearcher for performance reasons.

The API documentation also says:

If your index has changed and you wish to see the changes reflected in searching, you should use IndexReader.reopen() to obtain a new reader and then create a new IndexSearcher from that.

Question 1: how does the program know whether the index has changed?

 

 

 

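Obtaining an IndexSearcher instance

1. Directory --> IndexReader --> IndexSearcher, e.g. new IndexSearcher(IndexReader.open(directory, true))

2. Directory --> IndexSearcher, e.g. new IndexSearcher(directory, true)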

 

 

Term

 

 

TopDocs

Ordering of search results: by relevance score (the default), that is, results are sorted by how well each document matches the query.
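
To make "sorted by relevance score" concrete, here is a small JDK-only sketch that keeps the n best-scoring hits and returns them best first. It is an illustration under stated assumptions, not Lucene's code: `Hit` is a hypothetical stand-in for a ScoreDoc (which exposes a document id `doc` and a `score`), `TopHits` is a hypothetical helper, and breaking ties by the lower document id is a choice made for this sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical stand-in for a hit: a document id plus its relevance score. */
final class Hit {
    final int doc;
    final float score;

    Hit(int doc, float score) {
        this.doc = doc;
        this.score = score;
    }
}

final class TopHits {
    /** Returns the n best hits, highest score first; ties go to the lower doc id. */
    static List<Hit> top(List<Hit> all, int n) {
        if (n <= 0) return new ArrayList<Hit>();
        Comparator<Hit> bestFirst = (a, b) -> {
            int byScore = Float.compare(b.score, a.score);
            return byScore != 0 ? byScore : Integer.compare(a.doc, b.doc);
        };
        // With the reversed comparator, the head of the heap is the worst hit kept so far.
        PriorityQueue<Hit> heap = new PriorityQueue<>(n, bestFirst.reversed());
        for (Hit h : all) {
            if (heap.size() < n) {
                heap.add(h);
            } else if (bestFirst.compare(h, heap.peek()) < 0) {
                heap.poll(); // evict the current worst hit
                heap.add(h);
            }
        }
        List<Hit> result = new ArrayList<>(heap);
        result.sort(bestFirst);
        return result;
    }
}
```

Keeping only n entries in the heap means memory stays proportional to the number of results requested, not to the number of matching documents.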

Other scoring strategies

ScoreDoc
