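Below is a copy-and-run example for Lucene 8.5.0 that walks through the full two-phase verification flow of `TwoPhaseIterator`.
Scenario: a `PhraseQuery`, whose scorer exposes a `TwoPhaseIterator`. The first phase uses the postings lists to find candidate documents (those containing every term); the second phase verifies term order and distance.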
---
1. Dependencies (Maven)
```xml
<dependencies>
  <dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>8.5.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-analyzers-common</artifactId>
    <version>8.5.0</version>
  </dependency>
</dependencies>
```
---
2. Full code
```java
package demo;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.*;
import org.apache.lucene.search.*;
import org.apache.lucene.store.ByteBuffersDirectory;

public class TwoPhaseDemo {
    public static void main(String[] args) throws Exception {
        // 1. Build the index
        ByteBuffersDirectory dir = new ByteBuffersDirectory();
        IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer());
        IndexWriter w = new IndexWriter(dir, cfg);
        // Document 1: contains "powerful search" as an exact phrase
        Document doc1 = new Document();
        doc1.add(new TextField("body", "lucene is a powerful search engine", Field.Store.YES));
        w.addDocument(doc1);
        // Document 2: contains both words, but reversed and far apart, so the phrase does not match
        Document doc2 = new Document();
        doc2.add(new TextField("body", "search lucene engine is powerful", Field.Store.YES));
        w.addDocument(doc2);
        w.commit();
        w.close();
        // 2. Open a reader
        IndexReader reader = DirectoryReader.open(dir);
        IndexSearcher searcher = new IndexSearcher(reader);
        // 3. Build the PhraseQuery
        PhraseQuery query = new PhraseQuery.Builder()
                .add(new Term("body", "powerful"))
                .add(new Term("body", "search"))
                .setSlop(1) // tolerate a positional edit distance of at most 1
                .build();
        // 4. Get the Weight and Scorer (createWeight expects a rewritten query)
        Weight weight = searcher.createWeight(searcher.rewrite(query), ScoreMode.COMPLETE, 1.0f);
        LeafReaderContext ctx = reader.leaves().get(0); // both documents sit in one segment
        Scorer scorer = weight.scorer(ctx);
        // 5. Two-phase iteration
        if (scorer == null) {
            System.out.println("No matching documents");
        } else {
            TwoPhaseIterator twoPhase = scorer.twoPhaseIterator();
            if (twoPhase != null) {
                DocIdSetIterator approx = twoPhase.approximation();
                int doc;
                while ((doc = approx.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
                    // Phase 1: the postings lists quickly produce a candidate
                    System.out.println("Candidate doc=" + doc);
                    // Phase 2: verify that the phrase really matches
                    if (twoPhase.matches()) {
                        // doc is segment-local; add docBase to get the top-level id
                        Document d = searcher.doc(ctx.docBase + doc);
                        System.out.println("✅ Passed verification -> " + d.get("body"));
                    } else {
                        System.out.println("❌ Failed verification");
                    }
                }
            }
        }
        reader.close();
        dir.close();
    }
}
```
---
3. Output (example)
```
Candidate doc=0
✅ Passed verification -> lucene is a powerful search engine
Candidate doc=1
❌ Failed verification
```
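
Both documents show up as candidates because the first phase only intersects the postings lists of the query terms. The following plain-Java sketch illustrates that intersection on sorted doc-id arrays. It is a simplified illustration, not Lucene's actual conjunction code, and the class name `PostingsIntersection` and the sample doc ids are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of phase 1; this is NOT Lucene code.
class PostingsIntersection {
    // Returns the doc ids present in both sorted postings lists.
    static List<Integer> intersect(int[] a, int[] b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {        // doc contains both terms: it is a candidate
                out.add(a[i]);
                i++;
                j++;
            } else if (a[i] < b[j]) {  // advance whichever list is behind
                i++;
            } else {
                j++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] postingsA = {0, 1, 5, 9}; // hypothetical doc ids for one term
        int[] postingsB = {0, 1, 3, 9}; // hypothetical doc ids for the other term
        System.out.println(intersect(postingsA, postingsB)); // prints [0, 1, 9]
    }
}
```

No positions are read at this stage, which is why a document with the words in the wrong order still becomes a candidate.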
---
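4. Key points
- Phase 1: `approximation().nextDoc()` only checks that both terms occur in the document (postings lists); positions are ignored.
- Phase 2: `twoPhase.matches()` is what checks whether order and distance satisfy the phrase rules.
- `TwoPhaseIterator` is the core mechanism Lucene uses for "coarse filter first, precise filter second".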
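
To make the coarse-then-precise split concrete without any Lucene dependency, here is a minimal plain-Java sketch of the same idea. Everything in it is hypothetical: the class `TwoPhaseSketch`, its methods, and the third sample document are invented for illustration. Its position rule (the second term must follow the first with at most `maxGap` tokens in between) is a deliberate simplification and not Lucene's real slop semantics.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of two-phase matching; this is NOT Lucene code.
class TwoPhaseSketch {
    // Builds one map per document: term -> positions of that term.
    static List<Map<String, List<Integer>>> index(String... docs) {
        List<Map<String, List<Integer>>> out = new ArrayList<>();
        for (String text : docs) {
            Map<String, List<Integer>> positions = new HashMap<>();
            String[] tokens = text.toLowerCase().split("\\s+");
            for (int pos = 0; pos < tokens.length; pos++) {
                positions.computeIfAbsent(tokens[pos], t -> new ArrayList<>()).add(pos);
            }
            out.add(positions);
        }
        return out;
    }

    // Phase 1 (approximation): cheap check that both terms occur at all.
    static boolean isCandidate(Map<String, List<Integer>> doc, String first, String second) {
        return doc.containsKey(first) && doc.containsKey(second);
    }

    // Phase 2 (matches): costlier position check, simplified rule (see lead-in).
    static boolean matches(Map<String, List<Integer>> doc, String first, String second, int maxGap) {
        for (int p1 : doc.get(first)) {
            for (int p2 : doc.get(second)) {
                int gap = p2 - p1 - 1; // tokens between the two terms
                if (gap >= 0 && gap <= maxGap) {
                    return true;
                }
            }
        }
        return false;
    }

    // Runs both phases and returns one label per document.
    static List<String> run(List<Map<String, List<Integer>>> docs,
                            String first, String second, int maxGap) {
        List<String> labels = new ArrayList<>();
        for (Map<String, List<Integer>> doc : docs) {
            if (!isCandidate(doc, first, second)) {
                labels.add("skipped");   // never reaches phase 2
            } else if (matches(doc, first, second, maxGap)) {
                labels.add("match");
            } else {
                labels.add("rejected");  // candidate that failed verification
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        List<Map<String, List<Integer>>> docs = index(
                "lucene is a powerful search engine",
                "search lucene engine is powerful",
                "lucene is fast");
        System.out.println(run(docs, "powerful", "search", 1)); // prints [match, rejected, skipped]
    }
}
```

The point of the split is cost: the position check runs only on documents that survive the cheap term-presence check, so documents like the third one are never examined in detail.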