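In Lucene, an index directory can contain several index segments, and each segment has its own reader.
These readers are completely independent of one another: each holds its own data and is searched on its own.
Now let's look at how a collector that sorts by a field is implemented: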
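Because each segment numbers its documents from 0, a collector needs a per-segment offset (the docBase seen in the code that follows) to turn a segment-local doc ID into an index-wide one. The following is only a minimal sketch of that arithmetic, assuming each segment's docBase is the sum of the sizes of the segments before it. The class DocBaseSketch and its methods are hypothetical names for illustration and are not part of Lucene.

```java
import java.util.Arrays;

public class DocBaseSketch {
    /**
     * Given the number of docs in each segment, returns the starting
     * global doc ID (docBase) of each segment.
     */
    public static int[] docBases(int[] segmentMaxDocs) {
        int[] bases = new int[segmentMaxDocs.length];
        int next = 0;
        for (int i = 0; i < segmentMaxDocs.length; i++) {
            bases[i] = next;            // this segment starts where the previous one ended
            next += segmentMaxDocs[i];
        }
        return bases;
    }

    /** Converts a segment-local doc ID into an index-wide doc ID. */
    public static int globalDoc(int docBase, int segmentDoc) {
        return docBase + segmentDoc;
    }

    public static void main(String[] args) {
        int[] bases = docBases(new int[] {5, 3, 4});
        System.out.println(Arrays.toString(bases));  // prints [0, 5, 8]
        System.out.println(globalDoc(bases[1], 2));  // prints 7
    }
}
```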
private static class OneComparatorScoringMaxScoreCollector extends
    OneComparatorNonScoringCollector {

  Scorer scorer;

  public OneComparatorScoringMaxScoreCollector(FieldValueHitQueue<Entry> queue,
      int numHits, boolean fillFields) throws IOException {
    super(queue, numHits, fillFields);
    // Must set maxScore to NEG_INF, or otherwise Math.max always returns NaN.
    maxScore = Float.NEGATIVE_INFINITY;
  }

  final void updateBottom(int doc, float score) {
    bottom.doc = docBase + doc;
    bottom.score = score;
    bottom = pq.updateTop();
  }

  @Override
  public void collect(int doc) throws IOException {
    final float score = scorer.score();
    if (score > maxScore) {
      maxScore = score;
    }
    ++totalHits;
    if (queueFull) {
      if ((reverseMul * comparator.compareBottom(doc)) <= 0) {
        // since docs are visited in doc Id order, if compare is 0, it means
        // this document is largest than anything else in the queue, and
        // therefore not competitive.
        return;
      }
      // This hit is competitive - replace bottom element in queue & adjustTop
      comparator.copy(bottom.slot, doc);
      updateBottom(doc, score);
      comparator.setBottom(bottom.slot);
    } else {
      // Startup transient: queue hasn't gathered numHits yet
      final int slot = totalHits - 1;
      // Copy hit into queue
      comparator.copy(slot, doc);
      add(slot, doc, score);
      if (queueFull) {
        comparator.setBottom(bottom.slot);
      }
    }
  }

  @Override
  public void setScorer(Scorer scorer) throws IOException {
    this.scorer = scorer;
    super.setScorer(scorer);
  }
}
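While the priority queue is not yet full, each hit's value is copied straight into the values array; as soon as the queue fills up, bottom is set to its weakest entry. From then on, a new hit only has to be compared against bottom.
A hit that does not beat bottom is dropped (ties included, because docs are visited in doc ID order). Otherwise it replaces the bottom entry in the queue, and bottom is updated.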
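The "fill the queue, then compare only against bottom" idea can be sketched in isolation. The example below is not Lucene's implementation: it uses java.util.PriorityQueue in place of FieldValueHitQueue, assumes an ascending sort on a single int value per doc, and the names TopNSketch, Hit, and topN are made up for illustration. It also assumes numHits is at least 1.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopNSketch {
    /** A kept hit: the doc ID and the sort value it was collected with. */
    public static final class Hit {
        public final int doc;
        public final int value;
        Hit(int doc, int value) { this.doc = doc; this.value = value; }
    }

    /**
     * Returns the numHits docs with the smallest values, best first.
     * valuesByDoc[doc] is the sort value of each doc; docs are visited
     * in increasing doc ID order, so a tie with bottom never wins.
     */
    public static List<Hit> topN(int[] valuesByDoc, int numHits) {
        // The heap head is the *worst* kept hit (largest value, then largest doc).
        // That head plays the role of "bottom".
        Comparator<Hit> worstFirst = (a, b) -> a.value != b.value
                ? Integer.compare(b.value, a.value)
                : Integer.compare(b.doc, a.doc);
        PriorityQueue<Hit> pq = new PriorityQueue<>(numHits, worstFirst);

        for (int doc = 0; doc < valuesByDoc.length; doc++) {
            int v = valuesByDoc[doc];
            if (pq.size() < numHits) {
                pq.add(new Hit(doc, v));      // startup: queue not full yet
            } else if (v < pq.peek().value) {
                pq.poll();                    // competitive: evict bottom
                pq.add(new Hit(doc, v));      // the heap now exposes the new bottom
            }
            // otherwise the hit is not competitive and is dropped
        }

        List<Hit> result = new ArrayList<>(pq);
        result.sort(worstFirst.reversed());   // best hit first
        return result;
    }

    public static void main(String[] args) {
        for (Hit h : topN(new int[] {7, 3, 9, 3, 1, 8}, 3)) {
            System.out.println("doc=" + h.doc + " value=" + h.value);
        }
        // prints doc=4 value=1, then doc=1 value=3, then doc=3 value=3
    }
}
```

Each hit costs one comparison against the heap head when it is not competitive, which is why only the bottom needs to be tracked once the queue is full.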
For now, let's use the comparator for integer fields, IntComparator:
IntComparator(int numHits, String field, FieldCache.Parser parser, Integer missingValue) {
  super(field, missingValue);
  values = new int[numHits];
  this.parser = (IntParser) parser;
}
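The constructor allocates an array, values, of length numHits. It stores the field values of the hits currently in the heap, and the priority queue compares entries through values. When a matching doc is competitive enough to enter the priority queue, we need the value of the sort field for that doc. How do we get it?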
@Override
public void setNextReader(IndexReader reader, int docBase) throws IOException {
  // NOTE: must do this before calling super otherwise
  // we compute the docsWithField Bits twice!
  currentReaderValues = FieldCache.DEFAULT.getInts(reader, field, parser, missingValue != null);
  super.setNextReader(reader, docBase);
}
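Here the values of the field for every doc in this reader are fetched from the cache into the currentReaderValues array.
currentReaderValues[doc] then gives the field value for that doc directly, and copy stores it in the values array: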
public void copy(int slot, int doc) {
  int v2 = currentReaderValues[doc];
  // Test for v2 == 0 to save Bits.get method call for
  // the common case (doc has value and value is non-zero):
  if (docsWithField != null && v2 == 0 && !docsWithField.get(doc)) {
    v2 = missingValue;
  }
  values[slot] = v2;
}
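To see how the pieces fit together across segments, here is a simplified, self-contained sketch of a slot-based int comparator. It is not Lucene's IntComparator. It assumes plain int[] and boolean[] arrays in place of FieldCache and Bits, and the class SlotComparatorSketch, the method valueAt, and the two-array form of setNextReader are hypothetical. The point it illustrates is that values is indexed by queue slot and survives segment switches, while currentReaderValues is indexed by segment-local doc ID and is replaced for every segment.

```java
public class SlotComparatorSketch {
    private final int[] values;          // one entry per queue slot
    private final int missingValue;      // substituted for docs without a value
    private int[] currentReaderValues;   // per segment, indexed by segment-local doc ID
    private boolean[] docsWithField;     // null means every doc has a value
    private int bottom;                  // value of the weakest hit in the queue

    public SlotComparatorSketch(int numHits, int missingValue) {
        this.values = new int[numHits];
        this.missingValue = missingValue;
    }

    /** Called once per segment, before any doc of that segment is collected. */
    public void setNextReader(int[] segmentValues, boolean[] segmentDocsWithField) {
        this.currentReaderValues = segmentValues;
        this.docsWithField = segmentDocsWithField;
    }

    /** Reads the doc's value for the current segment, applying missingValue. */
    private int valueOf(int doc) {
        int v = currentReaderValues[doc];
        // Only consult docsWithField when v == 0: a non-zero value is always real.
        if (docsWithField != null && v == 0 && !docsWithField[doc]) {
            v = missingValue;
        }
        return v;
    }

    /** Stores the doc's value in the given queue slot. */
    public void copy(int slot, int doc) {
        values[slot] = valueOf(doc);
    }

    /** Remembers which slot currently holds the weakest hit. */
    public void setBottom(int slot) {
        bottom = values[slot];
    }

    /** Negative if bottom sorts before the doc, positive if after, 0 on a tie. */
    public int compareBottom(int doc) {
        return Integer.compare(bottom, valueOf(doc));
    }

    /** Hypothetical accessor for inspecting a slot. */
    public int valueAt(int slot) {
        return values[slot];
    }

    public static void main(String[] args) {
        SlotComparatorSketch cmp = new SlotComparatorSketch(2, -1);
        // Segment 1: doc 1 has no value, doc 2 really stores 0.
        cmp.setNextReader(new int[] {10, 0, 0}, new boolean[] {true, false, true});
        cmp.copy(0, 0);
        cmp.copy(1, 1);
        System.out.println(cmp.valueAt(0) + " " + cmp.valueAt(1)); // prints 10 -1
        // Segment 2: slot 1 still holds the value copied from segment 1.
        cmp.setNextReader(new int[] {4}, null);
        cmp.copy(0, 0);
        System.out.println(cmp.valueAt(0) + " " + cmp.valueAt(1)); // prints 4 -1
    }
}
```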