The various queries in Lucene (Part 1)

wwty1314

License: CC 4.0 BY-SA

Category: Crawling and Search. Tags: lucene, Perl

Article link: https://blog.youkuaiyun.com/wwty1314/article/details/83638612

TermQuery

Let's start with the most basic query. If you want to run a query such as "documents whose content field contains 'lucene'", you can use TermQuery:

// The term text must equal the indexed token exactly, so no leading space.
Term t = new Term("content", "lucene");

Query query = new TermQuery(t);

BooleanQuery

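If you want to query for "documents whose content field contains java or perl", you can build two TermQuery objects and join them with a BooleanQuery:
    TermQuery termQuery1 = new TermQuery(new Term("content", "java"));
    TermQuery termQuery2 = new TermQuery(new Term("content", "perl"));
    // Older Lucene API; recent versions build this with BooleanQuery.Builder.
    BooleanQuery booleanQuery = new BooleanQuery();
    // SHOULD on both clauses means either term is enough for a match.
    booleanQuery.add(termQuery1, BooleanClause.Occur.SHOULD);
    booleanQuery.add(termQuery2, BooleanClause.Occur.SHOULD);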
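
To make the meaning of the Occur flags concrete, here is a minimal sketch in plain Java that models how clauses combine. It is not Lucene code: the class BooleanClauseSketch and its methods are hypothetical names introduced for illustration, documents are reduced to arrays of tokens, and MUST and MUST_NOT are included only to show how SHOULD differs from them.

```java
// Simplified model of BooleanQuery clause semantics; not Lucene code.
final class BooleanClauseSketch {
    enum Occur { MUST, SHOULD, MUST_NOT }

    // True if the document's tokens contain the given term (exact match).
    static boolean containsTerm(String[] docTokens, String term) {
        for (String token : docTokens) {
            if (token.equals(term)) {
                return true;
            }
        }
        return false;
    }

    // terms[i] is combined with occurs[i], like booleanQuery.add(query, occur).
    static boolean matches(String[] docTokens, String[] terms, Occur[] occurs) {
        boolean hasMust = false;
        boolean anyShouldHit = false;
        for (int i = 0; i < terms.length; i++) {
            boolean hit = containsTerm(docTokens, terms[i]);
            switch (occurs[i]) {
                case MUST:
                    hasMust = true;
                    if (!hit) return false;   // a required clause is missing
                    break;
                case MUST_NOT:
                    if (hit) return false;    // a prohibited clause is present
                    break;
                case SHOULD:
                    anyShouldHit |= hit;
                    break;
            }
        }
        // Without MUST clauses, at least one SHOULD clause has to match.
        return hasMust || anyShouldHit;
    }

    public static void main(String[] args) {
        String[] terms = {"java", "perl"};
        Occur[] occurs = {Occur.SHOULD, Occur.SHOULD};
        System.out.println(matches(new String[] {"learning", "perl"}, terms, occurs)); // true
        System.out.println(matches(new String[] {"learning", "ruby"}, terms, occurs)); // false
    }
}
```

With two SHOULD clauses the query behaves like a logical OR, which is exactly the "java or perl" case above.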

PhraseQuery

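Suppose you are interested in Sino-Japanese relations and want articles in which '中' and '日' appear close together (within 5 characters of each other), ignoring anything farther apart. You can write:
    // Older Lucene API; recent versions build this with PhraseQuery.Builder.
    PhraseQuery query = new PhraseQuery();
    query.setSlop(5);
    query.add(new Term("content", "中"));
    query.add(new Term("content", "日"));
This may find "中日合作……" and "中方和日方……", but not "中国某高层领导说日本欠扁", where the two characters are too far apart.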

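Note: when I first tested this query it did not work. The reason is that every term you search for must exist among the tokens the analyzer produced at index time; searching for a term that is not in the index can never match.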
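
The effect of the slop value can be sketched in plain Java. This is a simplified model for a two-term phrase, not Lucene code: the class PhraseSlopSketch and its methods are hypothetical, and it assumes the analyzer emits one token per character, so a character's index in the text is its token position. Under that assumption, "中日合作" has the terms at positions 0 and 1, "中方和日方" at 0 and 3, and "中国某高层领导说日本欠扁" at 0 and 8.

```java
// Simplified model of PhraseQuery slop for a two-term phrase; not Lucene code.
final class PhraseSlopSketch {
    // Smallest slop at which the phrase "first second" matches, given the
    // token positions of each term in one document. Returns -1 if either
    // term never occurs, because then no slop value can produce a match.
    static int requiredSlop(int[] firstPositions, int[] secondPositions) {
        int best = -1;
        for (int p : firstPositions) {
            for (int q : secondPositions) {
                // "second" is expected at p + 1; count how far off it is.
                int needed = Math.abs(q - (p + 1));
                if (best < 0 || needed < best) {
                    best = needed;
                }
            }
        }
        return best;
    }

    static boolean matches(int[] firstPositions, int[] secondPositions, int slop) {
        int needed = requiredSlop(firstPositions, secondPositions);
        return needed >= 0 && needed <= slop;
    }

    public static void main(String[] args) {
        System.out.println(requiredSlop(new int[] {0}, new int[] {1}));  // 0
        System.out.println(requiredSlop(new int[] {0}, new int[] {3}));  // 2
        System.out.println(requiredSlop(new int[] {0}, new int[] {8}));  // 7
        System.out.println(matches(new int[] {0}, new int[] {8}, 5));    // false
        System.out.println(requiredSlop(new int[] {0}, new int[] {}));   // -1
    }
}
```

The last call mirrors the note above: if one of the terms is missing from the index, the phrase cannot match no matter how large the slop is.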

DisjunctionMaxQuery

A query that generates the union of the documents produced by its subqueries, and that scores each document with the maximum score any subquery gives it, plus a tie-breaking increment for any additional matching subqueries. This is useful when searching for a word in multiple fields with different boost factors.

For example, if the query is "albino elephant", this ensures that "albino" matching one field and "elephant" matching another gets a higher score than "albino" matching both fields.

DisjunctionMaxQuery(Collection<Query> disjuncts, float tieBreakerMultiplier)
Creates a new DisjunctionMaxQuery
DisjunctionMaxQuery(float tieBreakerMultiplier)
Creates a new empty DisjunctionMaxQuery.

The float tieBreakerMultiplier parameter:

The score of each non-maximum disjunct for a document is multiplied by this weight and added into the final score. If non-zero, the value should be small, on the order of 0.1...

void add(Collection<Query> disjuncts)
Add a collection of disjuncts to this disjunction via Iterable
void add(Query query)
Add a subquery to this disjunction

Looking at the constructors and the two add methods gives a rough idea of how the class is used.
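
How tieBreakerMultiplier enters the final score can be shown with a few lines of plain Java. This is a sketch of the scoring rule described above, not Lucene's implementation: the class DisMaxScoreSketch is a hypothetical name, and the scores are assumed to be non-negative values already computed by the subqueries that matched one document.

```java
// Sketch of the DisjunctionMaxQuery scoring rule; not Lucene code.
final class DisMaxScoreSketch {
    // subScores holds the scores of the subqueries that matched one document.
    // Assumption: every score is non-negative.
    static float score(float[] subScores, float tieBreakerMultiplier) {
        float max = 0f;
        float sum = 0f;
        for (float s : subScores) {
            sum += s;
            if (s > max) {
                max = s;
            }
        }
        // The best subquery counts in full; every other one is scaled down.
        return max + tieBreakerMultiplier * (sum - max);
    }

    public static void main(String[] args) {
        float[] subScores = {2.0f, 1.0f};
        System.out.println(score(subScores, 0.0f)); // only the maximum counts
        System.out.println(score(subScores, 0.1f)); // maximum plus a small bonus
    }
}
```

With a multiplier of 0 the result is a pure maximum; with a multiplier of 1 it would become a plain sum, which is why a small value is recommended.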

FilteredQuery

A query that applies a filter to the results of another query. The examples in this section use Lucene.Net (C#) syntax.

Filter filter = new DateFilter(FieldDate, DateTime.Parse("2005-10-10"), DateTime.Parse("2005-10-15"));

Query query = QueryParser.Parse("name*", FieldName, analyzer);

query = new FilteredQuery(query, filter);

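As the code shows, FilteredQuery attaches a filter to a query that has already been defined. The filter does not affect document scoring: when scoring, FilteredQuery considers only the query part, not the filter part.

FilteredQuery can also filter on several conditions, by wrapping one FilteredQuery in another: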

Filter filter = new DateFilter(FieldDate, DateTime.Parse("2005-10-10"), DateTime.Parse("2005-10-15"));
Filter filter2 = new RangeFilter(FieldNumber, NumberTools.LongToString(11L), NumberTools.LongToString(13L), true, true);

Query query = QueryParser.Parse("name*", FieldName, analyzer);
query = new FilteredQuery(query, filter);
query = new FilteredQuery(query, filter2);

IndexSearcher searcher = new IndexSearcher(reader);
Hits hits = searcher.Search(query);
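
The two properties described in this section, that filters only restrict the result set and that chained filters must all accept a document, can be sketched in plain Java. This is an illustrative model, not Lucene code: the class FilteredQuerySketch, the NO_MATCH marker, and the array representation of scores and filters are all assumptions made for the example.

```java
// Simplified model of FilteredQuery behavior; not Lucene code.
final class FilteredQuerySketch {
    // Marker for "this document is not in the result set".
    static final float NO_MATCH = -1f;

    // queryScores[doc] is the wrapped query's score for that document, or
    // NO_MATCH. Each filter holds one accept/reject flag per document.
    static float[] apply(float[] queryScores, boolean[]... filters) {
        float[] result = new float[queryScores.length];
        for (int doc = 0; doc < queryScores.length; doc++) {
            boolean accepted = true;
            for (boolean[] filter : filters) {
                accepted &= filter[doc];   // every filter has to accept
            }
            // Accepted documents keep the query's score unchanged.
            result[doc] = accepted ? queryScores[doc] : NO_MATCH;
        }
        return result;
    }

    public static void main(String[] args) {
        float[] queryScores = {0.9f, 0.4f, NO_MATCH, 0.7f};
        boolean[] dateFilter = {true, true, true, false};
        boolean[] rangeFilter = {true, false, true, true};
        float[] result = apply(queryScores, dateFilter, rangeFilter);
        for (float score : result) {
            System.out.println(score);
        }
    }
}
```

Only document 0 survives both filters, and it keeps the score 0.9 that the query alone assigned to it.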

MatchAllDocsQuery

A query that matches all documents. Construct it without any conditions to get back every document in the index.