Two ways to join in Lucene

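Filtering with Lucene is rarely a matter of writing independent documents one at a time; documents often have a hierarchical relationship. Take filtering at Meituan as an example: there is merchant information and deal information, and one merchant has multiple deals attached to it. Filtering and sorting have to take both the merchant conditions and the deal conditions into account. Sorting by price, for instance, has to be considered at the deal level.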

Lucene provides two ways to handle filtering in this kind of scenario.

Index-time joins

Index-time joining supports joins at search time, where joined documents are indexed as a single document block using IndexWriter.addDocuments(). This is useful for any normalized content (XML documents or database tables). In database terms, all rows for all joined tables matching a single row of the primary table must be indexed as a single document block, with the parent document being last in the group.

When you index in this way, the documents in your index are divided into parent documents (the last document of each block) and child documents (all others). You provide a Filter that identifies the parent documents, as Lucene does not currently record any information about doc blocks.
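The block layout described above can be modeled in plain Java. This is only a sketch of the idea, not Lucene code: the class `BlockIndexSketch`, its methods, and the merchant/deal strings are hypothetical names chosen for illustration, and a `java.util.BitSet` stands in for the parent filter.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Plain-Java model of a block-indexed index; hypothetical, not Lucene code.
final class BlockIndexSketch {
    // Position in the list plays the role of the doc ID.
    final List<String> docs = new ArrayList<>();
    // Stand-in for the parent filter: a set bit marks a parent document.
    final BitSet parentBits = new BitSet();

    // Appends one block: all children first, the parent last.
    void addBlock(List<String> children, String parent) {
        docs.addAll(children);
        parentBits.set(docs.size()); // doc ID the parent is about to receive
        docs.add(parent);
    }

    // Because the parent closes its block, the parent of any document is the
    // first parent bit at or after that document's ID.
    int parentOf(int docId) {
        return parentBits.nextSetBit(docId);
    }
}
```

Adding the block (deal-1, deal-2, merchant-A) followed by (deal-3, merchant-B) yields doc IDs 0 to 4, with parent bits set at 2 and 4. Since nothing else in this model records block boundaries, the parent bits are the only way to tell where one block ends and the next begins.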

At search time, use ToParentBlockJoinQuery to remap/join matches from any child Query (i.e., a query that matches only child documents) up to the parent document space. The resulting query can then be used as a clause in any query that matches parent documents.
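To make the remapping step concrete, here is a minimal plain-Java sketch, assuming the parent-last block layout. It does not call Lucene: `ToParentJoinSketch` and `joinToParents` are hypothetical names, and keeping the best child score per parent is just one possible aggregation picked for the illustration.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of joining child matches up to parents; not Lucene code.
final class ToParentJoinSketch {

    // childScores: child doc ID -> score produced by the child query.
    // Returns: parent doc ID -> highest score among its matching children.
    static Map<Integer, Float> joinToParents(BitSet parentBits,
                                             Map<Integer, Float> childScores) {
        Map<Integer, Float> parents = new TreeMap<>();
        for (Map.Entry<Integer, Float> e : childScores.entrySet()) {
            // The parent is the first parent bit at or after the child.
            int parent = parentBits.nextSetBit(e.getKey());
            if (parent < 0) {
                continue; // no parent follows this child: malformed block, skip
            }
            parents.merge(parent, e.getValue(), Float::max);
        }
        return parents;
    }
}
```

With parent bits at 2 and 4 and child matches at docs 0, 1 and 3, the result contains only docs 2 and 4, so it can be combined with other clauses that match parent documents.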

If you only care about the parent documents matching the query, you can use any collector to collect the parent hits, but if you'd also like to see which child documents match for each parent document, use the ToParentBlockJoinCollector to collect the hits. Once the search is done, you retrieve a TopGroups instance from the ToParentBlockJoinCollector.getTopGroups() method.

To map/join in the opposite direction, use ToChildBlockJoinQuery. This wraps any query matching parent documents, creating the joined query matching only child documents.
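The opposite direction can be sketched the same way. Again this is a plain-Java model under the parent-last assumption, with the hypothetical names `ToChildJoinSketch` and `childrenOf`; it only shows how the children of a parent can be located from the parent bits.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Plain-Java model of joining a parent match down to its children; not Lucene code.
final class ToChildJoinSketch {

    // Returns the doc IDs of all children in the block closed by parentDoc.
    static List<Integer> childrenOf(BitSet parentBits, int parentDoc) {
        if (!parentBits.get(parentDoc)) {
            throw new IllegalArgumentException("doc " + parentDoc + " is not a parent");
        }
        // The block starts right after the previous parent, or at doc 0.
        // previousSetBit(-1) returns -1, so the first block needs no special case.
        int first = parentBits.previousSetBit(parentDoc - 1) + 1;
        List<Integer> children = new ArrayList<>();
        for (int doc = first; doc < parentDoc; doc++) {
            children.add(doc);
        }
        return children;
    }
}
```

With parent bits at 2 and 4, `childrenOf(parentBits, 2)` covers docs 0 and 1, and `childrenOf(parentBits, 4)` covers doc 3 only.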

Query-time joins

Query-time joining is index-term based and implemented as a two-pass search. The first pass collects all the terms from a fromField that match the fromQuery. The second pass returns all documents that have matching terms in a toField to the terms collected in the first pass.
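The two passes can be illustrated with a small in-memory model. This is a sketch, not how Lucene implements it internally: documents are modeled as maps with a single value per field, the from-query is a simple predicate, and `TwoPassJoinSketch`, `doc` and `join` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Plain-Java model of a two-pass, term-based join; not Lucene code.
final class TwoPassJoinSketch {

    // Builds a toy document from alternating field names and values.
    static Map<String, String> doc(String... fieldsAndValues) {
        Map<String, String> d = new HashMap<>();
        for (int i = 0; i + 1 < fieldsAndValues.length; i += 2) {
            d.put(fieldsAndValues[i], fieldsAndValues[i + 1]);
        }
        return d;
    }

    // Returns the positions of the "to" documents selected by the join.
    static List<Integer> join(List<Map<String, String>> fromDocs,
                              Predicate<Map<String, String>> fromQuery,
                              String fromField,
                              List<Map<String, String>> toDocs,
                              String toField) {
        // Pass 1: collect fromField terms of every document matching fromQuery.
        Set<String> terms = new HashSet<>();
        for (Map<String, String> d : fromDocs) {
            String value = d.get(fromField);
            if (value != null && fromQuery.test(d)) {
                terms.add(value);
            }
        }
        // Pass 2: keep the documents whose toField holds one of those terms.
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < toDocs.size(); i++) {
            String value = toDocs.get(i).get(toField);
            if (value != null && terms.contains(value)) {
                hits.add(i);
            }
        }
        return hits;
    }
}
```

In this model only the collected terms travel from the first pass to the second; the from-side documents themselves are not part of the result.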

Query time joining has the following input:

  • fromField: The from field to join from.
  • fromQuery: The query executed to collect the from terms. This is usually the user specified query.
  • multipleValuesPerDocument: Whether the fromField contains more than one value per document.
  • scoreMode: Defines how scores are translated to the other join side. If you don't care about scoring use ScoreMode.None mode. This will disable scoring and is therefore more efficient (requires less memory and is faster).
  • toField: The to field to join to.
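As a rough illustration of what "translating scores to the other join side" can mean, the sketch below combines the scores of all from-side documents that share one join term into a single score for the to-side document. The enum `Mode` and the method `combine` are hypothetical and only loosely mirror the idea behind ScoreMode; returning 0 for `NONE` is a placeholder chosen for this sketch, not a statement about Lucene's behavior.

```java
// Plain-Java model of combining from-side scores per join term; not Lucene code.
final class ScoreModeSketch {

    // Hypothetical modes for this illustration.
    enum Mode { NONE, MAX, TOTAL, AVG }

    // Combines the scores of all from-side documents that share one join term.
    static float combine(Mode mode, float[] scores) {
        if (mode == Mode.NONE || scores.length == 0) {
            return 0f; // scoring disabled: nothing has to be tracked per term
        }
        float max = scores[0];
        float sum = 0f;
        for (float s : scores) {
            max = Math.max(max, s);
            sum += s;
        }
        switch (mode) {
            case MAX:
                return max;
            case TOTAL:
                return sum;
            default: // AVG
                return sum / scores.length;
        }
    }
}
```

The `NONE` branch shows why disabling scoring is cheaper in this model: no per-term score has to be kept between the two passes.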

Basically, query-time joining is accessible from one static method. The caller supplies the method with the input described above and an IndexSearcher from which the from terms are collected. The returned query can be executed with the same IndexSearcher, but also with another IndexSearcher. Example usage of JoinUtil.createJoinQuery():


  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.Query;
  import org.apache.lucene.search.TermQuery;
  import org.apache.lucene.search.TopDocs;
  import org.apache.lucene.search.join.JoinUtil;
  import org.apache.lucene.search.join.ScoreMode;

  // searchTerm, fromSearcher and toSearcher are assumed to be defined by the caller.
  String fromField = "from"; // Name of the from field
  boolean multipleValuesPerDocument = false; // Set to true only when your fromField has multiple values per document in your index
  String toField = "to"; // Name of the to field
  ScoreMode scoreMode = ScoreMode.Max; // Defines how the scores are translated into the other side of the join
  Query fromQuery = new TermQuery(new Term("content", searchTerm)); // Query executed to collect from values to join to the to values

  Query joinQuery = JoinUtil.createJoinQuery(fromField, multipleValuesPerDocument, toField, fromQuery, fromSearcher, scoreMode);
  TopDocs topDocs = toSearcher.search(joinQuery, 10); // Note: toSearcher can be the same as the fromSearcher


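A query-time join amounts to two searches. The first uses the fromQuery to find the matching documents, then maps their fromField values onto the toField to produce a join query. The second search runs that join query.

A query-time join therefore has to search twice, and the returned results do not include the results of the first search, let alone any way to sort by them.

An index-time join instead organizes the index into blocks when it is written. This approach is more flexible and can do things that a query-time join cannot, but it may write more index data.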
