Implementing Custom Sorting in Lucene

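This article explains how to implement custom sorting in Lucene. It walks through the steps of implementing the SortComparatorSource and ScoreDocComparator interfaces and gives an example that searches for nearby restaurants.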

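When you search content with Lucene, the order in which results are displayed matters. Lucene's built-in sort options often do not fit what an application needs, and when none of them matches your scenario you have to implement the sorting yourself. This section shows how to implement custom sorting in Lucene.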

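Custom sorting in Lucene works much like custom sorting of Java collections: in both cases you implement a comparison interface. In Java it is enough to implement the Comparable interface, whereas in Lucene you implement both the SortComparatorSource interface and the ScoreDocComparator interface. Before getting to the implementation, let's look at how these two interfaces are defined.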
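For comparison, here is a minimal sketch of the Java collections side of that analogy. The Place class and its fields are made up for illustration and have nothing to do with Lucene; the point is only that the class supplies the ordering and Collections.sort applies it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical value class used only for this illustration; not part of Lucene.
class Place implements Comparable<Place> {
    final String name;
    final int rating;

    Place(String name, int rating) {
        this.name = name;
        this.rating = rating;
    }

    // Natural order: higher rating first, ties broken alphabetically by name.
    @Override
    public int compareTo(Place other) {
        if (rating != other.rating) {
            return rating > other.rating ? -1 : 1;
        }
        return name.compareTo(other.name);
    }

    // Returns the names of the given places in their natural order.
    static List<String> sortedNames(List<Place> places) {
        List<Place> copy = new ArrayList<Place>(places);
        Collections.sort(copy);
        List<String> names = new ArrayList<String>();
        for (Place p : copy) {
            names.add(p.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Place> places = Arrays.asList(
                new Place("gamma", 3),
                new Place("beta", 5),
                new Place("alpha", 5));
        System.out.println(sortedNames(places));
    }
}
```

Running main prints [alpha, beta, gamma]. A java.util.Comparator, which keeps the ordering outside the class being sorted, is arguably the closer analogue to what Lucene asks for, since the Lucene comparator is also a separate object.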


The SortComparatorSource interface returns a comparator used to sort ScoreDocs (Expert: returns a comparator for sorting ScoreDocs). It defines a single method:

public ScoreDocComparator newComparator(IndexReader reader,String fieldname) throws IOException

 

Creates a comparator for the field in the given index.

Parameters:
reader – Index to create comparator for.
fieldname – Field to create comparator for.

Returns:
Comparator of ScoreDoc objects.

Throws:
IOException – If an error occurs reading the index.

 


 

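This method only creates a ScoreDocComparator instance that performs the sorting, so we also have to implement the ScoreDocComparator interface. Its job is to compare two ScoreDoc objects for sorting (Compares two ScoreDoc objects for sorting). It defines two static instances that Lucene implements itself: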

public static final ScoreDocComparator RELEVANCE

 

Special comparator for sorting hits according to computed relevance
(document score).

 

public static final ScoreDocComparator INDEXORDER

 

Special comparator for sorting hits according to index order (document
number).

 

Three methods relate to sorting, and we have to implement all of them:

public int compare(ScoreDoc i,ScoreDoc j)

 

Compares two ScoreDoc objects and returns a result indicating their sort
order.


 

 

Parameters:
i – First ScoreDoc

j – Second ScoreDoc

Returns:
-1 if i should come before j

1 if i should come after j

0 if they are equal

 


 

public Comparable sortValue(ScoreDoc i)

 

Returns the value used to sort the given document. The object returned
must implement the java.io.Serializable interface. This is used by
multisearchers to determine how to collate results from their searchers.


 

 

Parameters:
i – Document

Returns:
Serializable object

 


 

public int sortType()

 

Returns the type of sort. Should return SortField.SCORE, SortField.DOC, SortField.STRING, SortField.INTEGER, SortField.FLOAT or SortField.CUSTOM. It is not valid to return SortField.AUTO. This is used by multisearchers to determine how to collate results from their searchers.


 

 

Returns:
One of the constants in SortField.
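The Javadoc says sortValue and sortType exist so that multisearchers can collate results. One way to picture why: document numbers are local to each index, so results coming from different searchers have to be compared by the values they carry rather than by compare(i, j). The following is a rough plain-Java sketch of that idea, not Lucene's actual merging code; the Hit class and the merge method are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are not Lucene classes.
class MergeBySortValueDemo {

    // A hit as one searcher might report it: a document label plus its sort value.
    static final class Hit {
        final String doc;
        final Float sortValue;

        Hit(String doc, float sortValue) {
            this.doc = doc;
            this.sortValue = Float.valueOf(sortValue);
        }
    }

    // Merges two lists that are each already sorted by ascending sortValue.
    static List<String> merge(List<Hit> a, List<Hit> b) {
        List<String> merged = new ArrayList<String>();
        int i = 0;
        int j = 0;
        while (i < a.size() || j < b.size()) {
            boolean takeA;
            if (i == a.size()) {
                takeA = false;
            } else if (j == b.size()) {
                takeA = true;
            } else {
                // Only the sort values are compared; no per-index state is needed.
                takeA = a.get(i).sortValue.compareTo(b.get(j).sortValue) <= 0;
            }
            merged.add(takeA ? a.get(i++).doc : b.get(j++).doc);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Hit> first = new ArrayList<Hit>();
        first.add(new Hit("a1", 1.0f));
        first.add(new Hit("a2", 4.5f));
        List<Hit> second = new ArrayList<Hit>();
        second.add(new Hit("b1", 2.0f));
        second.add(new Hit("b2", 3.0f));
        System.out.println(merge(first, second));
    }
}
```

Running main prints [a1, b1, b2, a2]. For this kind of merge to agree with each searcher's own ordering, the value returned by sortValue has to order documents the same way compare does.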

 


 

 

Let's look at an example!


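This example is an implementation from Lucene in Action that searches for the name of the restaurant closest to you. Each restaurant's coordinates are stored as a string "x,y", as shown in the figure below: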





Figure 6.1 Which Mexican restaurant is closest to home (at 0,0)
or work (at 10,10)?



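In this case Lucene's built-in sorting cannot do the job, because the order depends on a distance computed from the stored coordinates rather than on the stored value itself. Let's see how to implement it ourselves.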
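Before the Lucene code, it may help to see the arithmetic in isolation. The sketch below parses an "x,y" string and computes the Euclidean distance to a reference point, using the four locations indexed in this example. The DistanceDemo class and its distance method are hypothetical helpers, not part of the book's code.

```java
// Hypothetical helper for illustration; not part of the book's code or of Lucene.
class DistanceDemo {

    // Parses a location stored as "x,y" and returns its distance to (x, y).
    static float distance(String location, int x, int y) {
        String[] xy = location.split(",");
        int deltax = Integer.parseInt(xy[0]) - x;
        int deltay = Integer.parseInt(xy[1]) - y;
        return (float) Math.sqrt(deltax * deltax + deltay * deltay);
    }

    public static void main(String[] args) {
        // The four restaurant locations used in this example.
        String[] locations = {"1,2", "5,9", "9,6", "3,8"};
        for (String location : locations) {
            System.out.println(location
                    + "  home: " + distance(location, 0, 0)
                    + "  work: " + distance(location, 10, 10));
        }
    }
}
```

From home at (0,0) the nearest point is 1,2 at sqrt(5) and the furthest is 9,6 at sqrt(117). From work at (10,10) the nearest is 9,6 at sqrt(17). Sorting the raw strings would instead give a lexicographic order that says nothing about distance.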




package lia.extsearch.sorting;

import org.apache.lucene.search.SortComparatorSource;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.ScoreDocComparator;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;

import java.io.IOException;

// DistanceComparatorSource implements the SortComparatorSource interface
public class DistanceComparatorSource implements SortComparatorSource {
  // x and y hold the coordinates of the reference point
  private int x;
  private int y;

  public DistanceComparatorSource(int x, int y) {
    this.x = x;
    this.y = y;
  }

  // Returns the ScoreDocComparator that performs the sorting
  public ScoreDocComparator newComparator(IndexReader reader, String fieldname)
      throws IOException {
    return new DistanceScoreDocLookupComparator(reader, fieldname, x, y);
  }

  // DistanceScoreDocLookupComparator implements ScoreDocComparator and does the sorting
  private static class DistanceScoreDocLookupComparator implements
      ScoreDocComparator {
    private float[] distances;  // distance from each restaurant to the given point

    // The constructor does almost all of the preparation work.
    public DistanceScoreDocLookupComparator(IndexReader reader,
        String fieldname, int x, int y) throws IOException {

      final TermEnum enumerator = reader.terms(new Term(fieldname, ""));
      distances = new float[reader.maxDoc()];  // one slot per document number
      if (distances.length > 0) {
        TermDocs termDocs = reader.termDocs();
        try {
          if (enumerator.term() == null) {
            throw new RuntimeException("no terms in field "
                + fieldname);
          }
          int i = 0, j = 0;  // counters used only by the debug output
          do {
            System.out.println("in do-while :" + i++);

            Term term = enumerator.term(); // fetch the current Term
            // Stop as soon as the enumeration moves past the given field
            if (!term.field().equals(fieldname))
              break;
            // Sets this to the data for the current term in a TermEnum.
            // This may be optimized in some implementations.
            termDocs.seek(enumerator); // see the TermDocs Javadoc
            while (termDocs.next()) {
              System.out.println("    in while :" + j++);
              System.out.println("    in while ,Term :" + term.toString());

              String[] xy = term.text().split(","); // extract x and y
              int deltax = Integer.parseInt(xy[0]) - x;
              int deltay = Integer.parseInt(xy[1]) - y;
              // compute the distance
              distances[termDocs.doc()] = (float) Math
                  .sqrt(deltax * deltax + deltay * deltay);
            }
          } while (enumerator.next());
        } finally {
          termDocs.close();
        }
      }
    }

    // With the preparation done in the constructor, comparing is simple
    public int compare(ScoreDoc i, ScoreDoc j) {
      if (distances[i.doc] < distances[j.doc])
        return -1;
      if (distances[i.doc] > distances[j.doc])
        return 1;
      return 0;
    }

    // Returns the distance
    public Comparable sortValue(ScoreDoc i) {
      return new Float(distances[i.doc]);
    }

    // Specifies the sort type
    public int sortType() {
      return SortField.FLOAT;
    }
  }

  public String toString() {
    return "Distance from (" + x + "," + y + ")";
  }

}
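The key idea in the listing is to compute every distance once, store it in an array indexed by document number, and let compare do nothing but array lookups. Stripped of the Lucene types, that pattern can be sketched as follows. LookupComparatorDemo is a hypothetical class and the distances in main are arbitrary sample values.

```java
import java.util.Arrays;
import java.util.Comparator;

// Plain-Java sketch of the precompute-then-look-up idea; no Lucene types involved.
class LookupComparatorDemo {

    // Returns document ids 0..n-1 ordered by ascending precomputed distance.
    static Integer[] sortDocsByDistance(final float[] distances) {
        Integer[] docs = new Integer[distances.length];
        for (int d = 0; d < docs.length; d++) {
            docs[d] = d;
        }
        Arrays.sort(docs, new Comparator<Integer>() {
            public int compare(Integer i, Integer j) {
                // Same three-way comparison as the compare method in the listing.
                if (distances[i] < distances[j]) return -1;
                if (distances[i] > distances[j]) return 1;
                return 0;
            }
        });
        return docs;
    }

    public static void main(String[] args) {
        // The array index plays the role of the document number.
        float[] distances = {2.5f, 9.0f, 0.5f, 4.0f};
        System.out.println(Arrays.toString(sortDocsByDistance(distances)));
    }
}
```

Running main prints [2, 0, 3, 1]. The trade-off is memory: the array has one float per document in the index, whether or not that document matches the query.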


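DistanceComparatorSource and its nested class DistanceScoreDocLookupComparator implement the two interfaces described above, and the comments explain each step. As you can see, custom sorting is not very hard.
Let's run some test code to check that the implementation works correctly.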




package lia.extsearch.sorting;

import junit.framework.TestCase;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.FieldDoc;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopFieldDocs;
import org.apache.lucene.store.RAMDirectory;

import java.io.IOException;

// Tests the custom sorting implementation
public class DistanceSortingTest extends TestCase {
  private RAMDirectory directory;

  private IndexSearcher searcher;

  private Query query;

  // Set up the test environment
  protected void setUp() throws Exception {
    directory = new RAMDirectory();
    IndexWriter writer = new IndexWriter(directory,
        new WhitespaceAnalyzer(), true);
    addPoint(writer, "El Charro", "restaurant", 1, 2);
    addPoint(writer, "Cafe Poca Cosa", "restaurant", 5, 9);
    addPoint(writer, "Los Betos", "restaurant", 9, 6);
    addPoint(writer, "Nico's Taco Shop", "restaurant", 3, 8);

    writer.close();

    searcher = new IndexSearcher(directory);

    query = new TermQuery(new Term("type", "restaurant"));
  }

  private void addPoint(IndexWriter writer, String name, String type, int x,
      int y) throws IOException {
    Document doc = new Document();
    doc.add(Field.Keyword("name", name));
    doc.add(Field.Keyword("type", type));
    doc.add(Field.Keyword("location", x + "," + y));
    writer.addDocument(doc);
  }

  public void testNearestRestaurantToHome() throws Exception {
    // Build a SortField from a DistanceComparatorSource; home is at 0,0
    Sort sort = new Sort(new SortField("location",
        new DistanceComparatorSource(0, 0)));

    Hits hits = searcher.search(query, sort);  // search

    // Check the order
    assertEquals("closest", "El Charro", hits.doc(0).get("name"));
    assertEquals("furthest", "Los Betos", hits.doc(3).get("name"));
  }

  public void testNeareastRestaurantToWork() throws Exception {
    Sort sort = new Sort(new SortField("location",
        new DistanceComparatorSource(10, 10)));  // work is at 10,10

    // The test above sorts with the custom comparator but cannot access
    // details of the sort; TopFieldDocs exposes that information
    TopFieldDocs docs = searcher.search(query, null, 3, sort);

    assertEquals(4, docs.totalHits);
    assertEquals(3, docs.scoreDocs.length);

    // Get the FieldDoc, which carries more detail about the sort;
    // see the FieldDoc Javadoc
    FieldDoc fieldDoc = (FieldDoc) docs.scoreDocs[0];

    assertEquals("(10,10) -> (9,6) = sqrt(17)", new Float(Math.sqrt(17)),
        fieldDoc.fields[0]);

    Document document = searcher.doc(fieldDoc.doc);
    assertEquals("Los Betos", document.get("name"));

    dumpDocs(sort, docs);  // print the details
  }

  // Prints information about the sort
  private void dumpDocs(Sort sort, TopFieldDocs docs) throws IOException {
    System.out.println("Sorted by: " + sort);
    ScoreDoc[] scoreDocs = docs.scoreDocs;
    for (int i = 0; i < scoreDocs.length; i++) {
      FieldDoc fieldDoc = (FieldDoc) scoreDocs[i];
      Float distance = (Float) fieldDoc.fields[0];
      Document doc = searcher.doc(fieldDoc.doc);
      System.out.println("   " + doc.get("name") + " @ ("
          + doc.get("location") + ") -> " + distance);
    }
  }
}
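The second test asks for only the top 3 of 4 hits and then reads the sort value of the first one. The plain-Java sketch below mimics those checks without Lucene. TopNDemo, Entry, nearest, and restaurants are hypothetical names, and Entry.distance only plays the role that fieldDoc.fields[0] plays in the test.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-ins for illustration; not Lucene's TopFieldDocs or FieldDoc.
class TopNDemo {

    static final class Entry {
        final String name;
        final Float distance; // plays the role of fieldDoc.fields[0]

        Entry(String name, int px, int py, int x, int y) {
            int dx = px - x;
            int dy = py - y;
            this.name = name;
            this.distance = Float.valueOf((float) Math.sqrt(dx * dx + dy * dy));
        }
    }

    // Returns at most n entries, nearest first; the input list is left untouched.
    static List<Entry> nearest(List<Entry> all, int n) {
        List<Entry> sorted = new ArrayList<Entry>(all);
        Collections.sort(sorted, new Comparator<Entry>() {
            public int compare(Entry a, Entry b) {
                return a.distance.compareTo(b.distance);
            }
        });
        return new ArrayList<Entry>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    // The four restaurants from the test, measured from the point (x, y).
    static List<Entry> restaurants(int x, int y) {
        List<Entry> all = new ArrayList<Entry>();
        all.add(new Entry("El Charro", 1, 2, x, y));
        all.add(new Entry("Cafe Poca Cosa", 5, 9, x, y));
        all.add(new Entry("Los Betos", 9, 6, x, y));
        all.add(new Entry("Nico's Taco Shop", 3, 8, x, y));
        return all;
    }

    public static void main(String[] args) {
        List<Entry> all = restaurants(10, 10);
        List<Entry> top = nearest(all, 3);
        System.out.println(all.size() + " hits, " + top.size() + " returned");
        for (Entry e : top) {
            System.out.println(e.name + " -> " + e.distance);
        }
    }
}
```

From (10,10) the first entry is Los Betos with a distance equal to (float) Math.sqrt(17), which is the value the test expects in fieldDoc.fields[0]. The comparison there is between Float objects, so the expected value has to go through the same narrowing to float as the computed one.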


The tests in DistanceSortingTest all pass.

The output is shown below for anyone who wants to study the details:


in do-while :0

    in while :0

    in while ,Term :location:1,2

in do-while :1

    in while :1

    in while ,Term :location:3,8

in do-while :2

    in while :2

    in while ,Term :location:5,9

in do-while :3

    in while :3

    in while ,Term :location:9,6

in do-while :4

in do-while :0

    in while :0

    in while ,Term :location:1,2

in do-while :1

    in while :1

    in while ,Term :location:3,8

in do-while :2

    in while :2

    in while ,Term :location:5,9

in do-while :3

    in while :3

    in while ,Term :location:9,6

in do-while :4

Sorted by:


For more detailed information about the sort results, see the implementation of the testNeareastRestaurantToWork method.
 
