Lucene: the core searching classes

最新推荐文章于 2023-03-18 18:03:26 发布

原创最新推荐文章于 2023-03-18 18:03:26 发布 · 116 阅读

0 ·

CC 4.0 BY-SA版权

Lucene 专栏收录该内容

9 篇文章

订阅专栏

本文详细介绍了Lucene的IndexSearcher类及其在全文检索中的应用，包括如何使用Directory打开索引、创建Term和TermQuery进行搜索，以及TopDocs类用于存储搜索结果的基本信息。此外，还提到了Lucene提供的不同类型的Query以及搜索结果的TopDocs类。

IndexSearcher

IndexSearcher is to searching what IndexWriter is to indexing: the central link to the index that exposes several search methods. You can think of IndexSearcher as a class that opens an index in a read-only mode. It requires a Directory instance, holding the previously created index, and then offers a number of search methods, some of which are implemented in its abstract parent class Searcher; the simplest takes a Query object and an int topN count as parameters and returns a TopDocs object.

Directory dir = FSDirectory.open(new File("/tmp/index"));
IndexSearcher searcher = new IndexSearcher(dir);
Query q = new TermQuery(new Term("contents", "lucene"));
TopDocs hits = searcher.search(q, 10);
searcher.close();

Term

A Term is the basic unit for searching. Similar to the Field object, it consists of a pair of string elements: the name of the field and the word (text value) of that field. Note that Term objects are also involved in the indexing process. However, they’re created by Lucene’s internals, so you typically don’t need to think about them while indexing. During searching, you may construct Term objects and use them together with
TermQuery:

Query q = new TermQuery(new Term("contents", "lucene"));
TopDocs hits = searcher.search(q, 10);

Query

Lucene comes with a number of concrete Query subclasses. So far in this chapter we’ve mentioned only the most basic Lucene Query: TermQuery. Other Query types are BooleanQuery, PhraseQuery, PrefixQuery, PhrasePrefixQuery, TermRangeQuery, NumericRangeQuery, FilteredQuery, and SpanQuery.

TopDocs

The TopDocs class is a simple container of pointers to the top N ranked search results—documents that match a given query. For each of the top N results, TopDocs records the int docID (which you can use to retrieve the document) as well as the float score.