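Preface
Recommended Lucene reading, which I also mentioned in my previous post:
this time it is the article series <<Lucene 6.0实战>> ("Lucene 6.0 in Action"),
link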
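Sorting results
A business requirement now calls for sorting the results from the previous post by the text's publish time,
so each Document needs an additional time field.
For comparing times I prefer timestamps, since comparing textual date representations is less efficient.
So time is defined as a long holding a Unix timestamp (unixTime).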
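Here is a minimal sketch of producing that long. It assumes two things the post does not state: the publish time is available as an ISO-8601 UTC string, and unixTime means seconds since 1970-01-01T00:00:00Z (rather than milliseconds). The class `UnixTime` and its method are hypothetical and rely only on `java.time.Instant`.

```java
import java.time.Instant;

// Hypothetical helper: converts an ISO-8601 UTC instant such as "2016-05-01T08:00:00Z"
// into Unix time in seconds, i.e. the long stored in the time field.
final class UnixTime {
    private UnixTime() {}

    static long toUnixSeconds(String isoInstant) {
        // Instant.parse expects the ISO-8601 instant format ending in 'Z'
        return Instant.parse(isoInstant).getEpochSecond();
    }
}
```

Two longs compare with a single numeric comparison. Date strings, by contrast, only sort correctly when every value uses the same zero-padded format. Compared as strings, "2016-5-10" sorts before "2016-5-9".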
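Then define a Sort to pass in when searching:
final static Sort sort = new Sort(new SortField(TIME_COLUMN, SortField.Type.LONG, true));
Passing this sort to search() orders the results by time from largest to smallest, i.e. newest first. The third argument, reverse = true, flips the default ascending order. Sorting on a field only works if the field is indexed with doc values, which is why the test code below also adds a NumericDocValuesField for time.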
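To see what reverse = true does, here is a plain-Java analogue. It is not Lucene code, and `TimeSortDemo` and `Hit` are hypothetical. Hits are ordered by their long time value with the natural ascending order reversed, so the largest (newest) timestamp comes first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java analogue of new SortField(TIME_COLUMN, SortField.Type.LONG, true):
// order hits by their long time value, largest (newest) first.
final class TimeSortDemo {
    // Hypothetical stand-in for a search hit: a document id plus its Unix time
    static final class Hit {
        final long id;
        final long time;

        Hit(long id, long time) {
            this.id = id;
            this.time = time;
        }
    }

    static List<Long> idsNewestFirst(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        // Natural long order is ascending; reversed() mirrors reverse = true
        sorted.sort(Comparator.comparingLong((Hit h) -> h.time).reversed());
        List<Long> ids = new ArrayList<>();
        for (Hit h : sorted) {
            ids.add(h.id);
        }
        return ids;
    }
}
```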
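Range queries
The range here is a time range: only data from one point in time to another is searched.
Since time is stored as a long (indexed as a LongPoint), the range restriction must use the same type:
Query query = LongPoint.newRangeQuery("field", lowerValue, upperValue);
Then search with this query. Both bounds are inclusive; pass Long.MIN_VALUE or Long.MAX_VALUE for an open-ended range.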
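The bounds semantics can be pinned down with a small plain-Java model. It is not Lucene code, and `RangeDemo` is hypothetical. `LongPoint.newRangeQuery` matches values with lowerValue <= value <= upperValue. Its javadoc suggests shifting a bound with `Math.addExact` when an exclusive bound is needed.

```java
// Plain-Java model of the bounds LongPoint.newRangeQuery(field, lower, upper) uses:
// both ends are inclusive. It only illustrates which values match.
final class RangeDemo {
    private RangeDemo() {}

    static boolean inInclusiveRange(long value, long lower, long upper) {
        return lower <= value && value <= upper;
    }

    // To exclude the lower endpoint, shift it by one before building the query,
    // e.g. LongPoint.newRangeQuery(field, Math.addExact(lower, 1), upper)
    static boolean inRangeExclusiveLower(long value, long lower, long upper) {
        return inInclusiveRange(value, Math.addExact(lower, 1), upper);
    }
}
```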
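Compound queries
Each query expresses only one condition; combining several conditions takes a BooleanQuery.
Add each sub-query to a BooleanQuery.Builder together with an Occur (MUST, SHOULD, ...), build the final Query, and search with it.
One pitfall here cost me half an hour. Sub-queries do not themselves have to be BooleanQuerys. However, when a BooleanQuery contains a MUST clause, its SHOULD clauses become optional and no longer filter results. To require "any one of several keywords", group the SHOULD clauses into their own BooleanQuery and add that as a single MUST clause.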
BooleanQuery.Builder queryBuilder = new BooleanQuery.Builder();
queryBuilder.add(......);
TopDocs docs = searcher.search(queryBuilder.build(), count);
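The pitfall can be reproduced with a plain-Java model of the matching rule. This is not Lucene code: `BooleanMatchDemo` is hypothetical and ignores MUST_NOT, FILTER and scoring. It follows BooleanQuery's default minimumNumberShouldMatch of 0:

- every MUST clause must match;
- SHOULD clauses are required (at least one) only when there is no MUST clause.

```java
import java.util.List;
import java.util.function.Predicate;

// Plain-Java model of how a BooleanQuery with the default
// minimumNumberShouldMatch of 0 decides whether a document matches.
final class BooleanMatchDemo {
    private BooleanMatchDemo() {}

    static <D> boolean matches(D doc, List<Predicate<D>> must, List<Predicate<D>> should) {
        for (Predicate<D> p : must) {
            if (!p.test(doc)) {
                return false; // every MUST clause is required
            }
        }
        if (!must.isEmpty()) {
            return true; // with a MUST clause present, SHOULD clauses only affect scoring
        }
        for (Predicate<D> p : should) {
            if (p.test(doc)) {
                return true; // SHOULD-only query: at least one clause must match
            }
        }
        return false; // an empty query matches nothing
    }
}
```

The two behaviours differ as follows:

- **Flat query.** A time-range MUST plus keyword SHOULDs in one query matches an in-range document with no keyword at all.
- **Nested query.** Wrapping the SHOULDs in their own query and adding it as MUST (as `keysLimit` does in the test code) makes at least one keyword required.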
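Test code
Requirement:
Within a given time range, find N documents by keyword (containing any one of the keywords is enough), and sort the results by time.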
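import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexableField;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
// JcsegAnalyzer5X and JcsegTaskConfig come from the Jcseg library;
// import them from the packages of the Jcseg version you use.

public class LuceneManager {
    final static String ID_COLUMN = "id";
    final static String ITEM_COLUMN = "item";
    final static String TIME_COLUMN = "time";
    final static Analyzer analyzer = new JcsegAnalyzer5X(JcsegTaskConfig.SIMPLE_MODE);
    // Sort by time, descending (newest first); relies on the NumericDocValuesField added in buildDocument
    final static Sort sort = new Sort(new SortField(TIME_COLUMN, SortField.Type.LONG, true));

    Directory dir;
    IndexWriter writer;

    public LuceneManager(Directory dir) throws IOException {
        this.dir = dir;
        IndexWriterConfig config = new IndexWriterConfig(analyzer);
        writer = new IndexWriter(dir, config);
    }

    @Override
    protected void finalize() throws Throwable {
        // Safety net only; callers should call close() explicitly
        this.close();
        super.finalize();
    }

    public void close() throws IOException {
        if (writer != null) {
            writer.close();
            writer = null;
        }
    }

    // Splits str into tokens with the shared analyzer
    public static List<String> analyse(String str) throws IOException {
        List<String> result = new ArrayList<>();
        try (TokenStream ts = analyzer.tokenStream("", str)) {
            // The attribute instance is updated in place by each incrementToken()
            CharTermAttribute cta = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                result.add(cta.toString());
            }
            ts.end(); // required after the last token, before close()
        }
        return result;
    }

    public Directory getDirectory() {
        return dir;
    }

    Document buildDocument(long id, String item, long unixTime) {
        Document doc = new Document();
        doc.add(new StoredField(ID_COLUMN, id));// stored so search results can return the id
        doc.add(new StringField(ID_COLUMN, Long.toString(id), Field.Store.NO));// indexed so delete/update can find the document by id
        doc.add(new TextField(ITEM_COLUMN, item, Field.Store.NO));
        doc.add(new LongPoint(TIME_COLUMN, unixTime));// used for range queries
        doc.add(new NumericDocValuesField(TIME_COLUMN, unixTime));// used for sorting
        return doc;
    }

    Term buildTerm(long id) {
        // Must match the StringField indexed in buildDocument
        return new Term(ID_COLUMN, Long.toString(id));
    }

    public void append(long id, String item, long unixTime) throws IOException {
        writer.addDocument(buildDocument(id, item, unixTime));
    }

    public void delete(long id) throws IOException {
        writer.deleteDocuments(buildTerm(id));
    }

    public void update(long id, String item, long unixTime) throws IOException {
        writer.updateDocument(buildTerm(id), buildDocument(id, item, unixTime));
    }

    List<Long> buildSearchResult(IndexSearcher searcher, TopDocs topDocs) throws IOException {
        List<Long> result = new ArrayList<>();
        for (ScoreDoc sd : topDocs.scoreDocs) {
            Document doc = searcher.doc(sd.doc);
            IndexableField field = doc.getField(ID_COLUMN);// only stored fields come back, i.e. the StoredField
            long id = field.numericValue().longValue();
            result.add(id);
        }
        return result;
    }

    // Keywords are matched as exact terms, so they should be tokens as produced by the analyzer (e.g. via analyse())
    public List<Long> search(String[] keyWords, int count, long startTime, long stopTime) throws IOException {
        IndexReader reader = DirectoryReader.open(writer);// near-real-time reader that also sees uncommitted changes
        try {
            IndexSearcher searcher = new IndexSearcher(reader);
            BooleanQuery.Builder timeLimit = new BooleanQuery.Builder();
            timeLimit.add(LongPoint.newRangeQuery(TIME_COLUMN, startTime, stopTime), BooleanClause.Occur.MUST);
            BooleanQuery.Builder keysLimit = new BooleanQuery.Builder();
            for (String s : keyWords) {
                keysLimit.add(new TermQuery(new Term(ITEM_COLUMN, s)), BooleanClause.Occur.SHOULD);
            }
            // Query condition:
            // if (inTimeRange() && (hasKeyword1() || hasKeyword2() || hasKeyword3() || ...))
            BooleanQuery.Builder queryBuilder = new BooleanQuery.Builder();
            queryBuilder.add(timeLimit.build(), BooleanClause.Occur.MUST);
            // The SHOULD clauses live in their own BooleanQuery, so "at least one keyword" is required
            queryBuilder.add(keysLimit.build(), BooleanClause.Occur.MUST);
            TopDocs docs = searcher.search(queryBuilder.build(), count, sort);
            return buildSearchResult(searcher, docs);
        } finally {
            reader.close();
        }
    }

    public List<Long> search(String[] keyWords) throws IOException {
        return search(keyWords, Integer.MAX_VALUE, Long.MIN_VALUE, Long.MAX_VALUE);
    }
}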
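For unit-testing LuceneManager.search, it can help to compare its output with a brute-force, in-memory version of the same requirement. The sketch below is hypothetical: `SearchOracle` and `Doc` are not part of the post. It assumes each document's text is already split into tokens, for example with `analyse`, since `TermQuery` matches exact indexed terms. Time bounds are inclusive to mirror `LongPoint.newRangeQuery`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical in-memory oracle for the requirement (not Lucene code): keep documents whose
// time lies in [startTime, stopTime] and whose tokens contain at least one keyword,
// order them newest first, and return at most count ids.
final class SearchOracle {
    static final class Doc {
        final long id;
        final List<String> tokens; // e.g. the output of LuceneManager.analyse(item)
        final long time;

        Doc(long id, List<String> tokens, long time) {
            this.id = id;
            this.tokens = tokens;
            this.time = time;
        }
    }

    static List<Long> search(List<Doc> docs, String[] keyWords, int count, long startTime, long stopTime) {
        List<Doc> hits = new ArrayList<>();
        for (Doc d : docs) {
            boolean inRange = startTime <= d.time && d.time <= stopTime;
            boolean anyKeyword = false;
            for (String k : keyWords) {
                if (d.tokens.contains(k)) {
                    anyKeyword = true;
                    break;
                }
            }
            if (inRange && anyKeyword) {
                hits.add(d);
            }
        }
        // Newest first, like the Sort with reverse = true
        hits.sort(Comparator.comparingLong((Doc d) -> d.time).reversed());
        List<Long> ids = new ArrayList<>();
        for (int i = 0; i < hits.size() && i < count; i++) {
            ids.add(hits.get(i).id);
        }
        return ids;
    }
}
```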
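The calling code is much the same as in the previous post, so I won't spend space on it here.
Closing remarks
TIME_COLUMN is indexed twice: once as a LongPoint for range queries, and once as a NumericDocValuesField for sorting.
My understanding of Lucene is still shallow; if you spot any mistakes, please point them out.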