Join Queries in Elasticsearch: An Introduction to the Nested Type and How Its Queries Work

Article link: https://blog.youkuaiyun.com/weixin_31887307/article/details/114612998

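This article introduces the nested type in Elasticsearch and its use in join queries. The nested type addresses the shortcomings of the object type for parent-child relationships and handles complex data structures effectively. The article explains query-time join and index-time join, along with Lucene's JoinUtil.createJoinQuery method. It also covers queries and aggregations on nested fields, including how NestedQueryBuilder and NestedAggregator work, and how Elasticsearch implements join.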


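A first look at nested

Before looking at nested, you need two basic concepts: index-time join and query-time join. You also need to know two other Elasticsearch types, object and join.

Joins in Lucene

1. Query-time join

Overview

The most direct approach resembles how a traditional relational database handles multi-table joins: set up a foreign-key relationship between two "tables" and resolve the join through it. Below is one of the utility methods Lucene exposes for this. There are others, which are not listed here.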

public final class JoinUtil {

    /**
     * A query time join using global ordinals over a dedicated join field.
     *
     * This join has certain restrictions and requirements:
     * 1) A document can only refer to one other document. (but can be referred by one or more documents)
     * 2) Documents on each side of the join must be distinguishable. Typically this can be done by adding an extra field
     *    that identifies the "from" and "to" side and then the fromQuery and toQuery must take this into account.
     * 3) There must be a single sorted doc values join field used by both the "from" and "to" documents. This join field
     *    should store the join values as UTF-8 strings.
     * 4) An ordinal map must be provided that is created on top of the join field.
     *
     * Note: min and max filtering and the avg score mode will require this join to keep track of the number of times
     * a document matches per join value. This will increase the per join cost in terms of execution time and memory.
     *
     * @param joinField  The {@link SortedDocValues} field containing the join values
     * @param fromQuery  The query containing the actual user query. Also the fromQuery can only match "from" documents.
     * @param toQuery    The query identifying all documents on the "to" side.
     * @param searcher   The index searcher used to execute the from query
     * @param scoreMode  Instructs how scores from the fromQuery are mapped to the returned query
     * @param ordinalMap The ordinal map constructed over the joinField. In case of a single segment index, no ordinal map
     *                   needs to be provided.
     * @param min        Optionally the minimum number of "from" documents that are required to match for a "to" document
     *                   to be a match. The min is inclusive. Setting min to 0 and max to Integer.MAX_VALUE
     *                   disables the min and max "from" documents filtering
     * @param max        Optionally the maximum number of "from" documents that are allowed to match for a "to" document
     *                   to be a match. The max is inclusive. Setting min to 0 and max to Integer.MAX_VALUE
     *                   disables the min and max "from" documents filtering
     * @return a {@link Query} instance that can be used to join documents based on the join field
     * @throws IOException If I/O related errors occur
     */
    public static Query createJoinQuery(String joinField,
                                        Query fromQuery,
                                        Query toQuery,
                                        IndexSearcher searcher,
                                        ScoreMode scoreMode,
                                        OrdinalMap ordinalMap,
                                        int min,
                                        int max) throws IOException {
        // ... (method body omitted in this excerpt)
    }
}

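Query process

First, fromQuery is executed against the child documents and the matches are gathered in a collector.

Then the collector's results are used as the condition for running toQuery.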
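The two passes described above can be sketched in plain Java without Lucene. This is only an illustration of the idea, not how JoinUtil is implemented: the Doc class, the "from"/"to" side labels, the sample index, and the substring match that stands in for fromQuery are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class QueryTimeJoinSketch {
    // Hypothetical document model: every doc carries a join value.
    // A "from" (child) doc stores the id of the doc it refers to; a "to" doc stores its own id.
    static final class Doc {
        final String id, side, joinValue, text;
        Doc(String id, String side, String joinValue, String text) {
            this.id = id; this.side = side; this.joinValue = joinValue; this.text = text;
        }
    }

    // Sample data, invented for this sketch.
    static List<Doc> sampleIndex() {
        List<Doc> index = new ArrayList<>();
        index.add(new Doc("1", "from", "A", "lucene join"));
        index.add(new Doc("2", "from", "A", "index"));
        index.add(new Doc("A", "to", "A", "parent A"));
        index.add(new Doc("4", "from", "B", "nested"));
        index.add(new Doc("B", "to", "B", "parent B"));
        index.add(new Doc("5", "from", "C", "lucene query"));
        index.add(new Doc("C", "to", "C", "parent C"));
        return index;
    }

    static List<String> join(List<Doc> index, String term) {
        // Pass 1: run the "from" query and collect the join values of the matching docs.
        Set<String> collected = new HashSet<>();
        for (Doc d : index) {
            if (d.side.equals("from") && d.text.contains(term)) {
                collected.add(d.joinValue);
            }
        }
        // Pass 2: run the "to" query, keeping only docs whose join value was collected.
        List<String> hits = new ArrayList<>();
        for (Doc d : index) {
            if (d.side.equals("to") && collected.contains(d.joinValue)) {
                hits.add(d.id);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(join(sampleIndex(), "lucene")); // prints [A, C]
    }
}
```

The second loop over the whole index is where the extra cost of a query-time join comes from in this sketch.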

Summary

Because a query-time join executes two queries, performance suffers.

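2. Index-time join

Overview

As the name suggests, an index-time join links the two kinds of documents when they are indexed.

The idea

Documents in Lucene are stored in order. The most convenient approach, then, is to write related documents in a fixed order so that a document's relatives can be found quickly. The current practice is to index the child documents first and then the parent document, so the data is laid out in the index as follows.

Take three docs A, B, and C, with child documents 1, 2, ... :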

A - 1,2,3

B - 4

C - 5,6

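In the index: 1,2,3,A,4,B,5,6,C

To find the parent of child document 1, locate document 1 and scan forward until reaching a document that is not a child; that document is its parent.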
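The forward scan can be sketched with java.util.BitSet, assuming the nine documents above receive doc IDs 0 through 8 in index order and that a bit is set for each parent position. The class and method names are invented for this illustration; Lucene uses its own bit set classes rather than java.util.BitSet.

```java
import java.util.BitSet;

class BlockJoinLayoutSketch {
    // Doc IDs:  0 1 2 3 4 5 6 7 8
    // Content:  1 2 3 A 4 B 5 6 C
    static BitSet sampleParents() {
        BitSet parents = new BitSet(9);
        parents.set(3); // A
        parents.set(5); // B
        parents.set(8); // C
        return parents;
    }

    // The parent of a child is the first parent bit at or after the child's position.
    static int parentOf(BitSet parents, int childDoc) {
        return parents.nextSetBit(childDoc);
    }

    // The first child of a parent sits right after the previous parent
    // (previousSetBit returns -1 when there is none, which gives doc 0).
    static int firstChildOf(BitSet parents, int parentDoc) {
        return parents.previousSetBit(parentDoc - 1) + 1;
    }

    public static void main(String[] args) {
        BitSet parents = sampleParents();
        System.out.println(parentOf(parents, 0));     // prints 3 (child "1" belongs to A)
        System.out.println(firstChildOf(parents, 8)); // prints 6 (C's children start at doc 6)
    }
}
```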

Usage in Lucene

public class ToParentBlockJoinQuery extends Query {

    /**
     * Create a ToParentBlockJoinQuery.
     *
     * @param childQuery    Query matching child documents.
     * @param parentsFilter Filter identifying the parent documents.
     * @param scoreMode     How to aggregate multiple child scores
     *                      into a single parent score.
     */
    public ToParentBlockJoinQuery(Query childQuery, BitSetProducer parentsFilter, ScoreMode scoreMode) {
        // ... (constructor body omitted in this excerpt)
    }
}

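The Query above is the class Lucene uses for index-time join queries. Its first argument, childQuery, is the condition that child documents must match.

Summary

According to this article, this approach is somewhat faster than a query-time join, by roughly 30%. The current recommendation is to pick whichever of the two kinds of join suits the situation.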

object

In Elasticsearch, an object field is actually handled as a set of flat, multi-valued fields. For example:

{
  "group" : "fans",
  "user" : [
    {
      "first" : "John",
      "last" : "Smith"
    },
    {
      "first" : "Alice",
      "last" : "White"
    }
  ]
}

If user is mapped as the object type, this JSON is indexed in flattened form, producing the same index as the following JSON:

{
  "group" : "fans",
  "user.first" : ["John","Alice"],
  "user.last" : ["Smith","White"]
}

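Clearly, a search that requires the first name to be "Alice" and the last name to be "Smith" at the same time will also return this document, even though no single user has that name.

For this reason, object cannot be used to index documents that have a parent-child relationship.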
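The false match can be reproduced with a small Java sketch that mimics the flattening. It models only the behaviour described here, not Elasticsearch's actual indexing code; the flatten and matches helpers are hypothetical names introduced for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ObjectFlatteningSketch {
    // Flatten an array of {first, last} pairs into two multi-valued fields,
    // losing the association between a given first name and last name.
    static Map<String, List<String>> flatten(String[][] users) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        fields.put("user.first", new ArrayList<>());
        fields.put("user.last", new ArrayList<>());
        for (String[] user : users) {
            fields.get("user.first").add(user[0]);
            fields.get("user.last").add(user[1]);
        }
        return fields;
    }

    // Each condition is checked against its own field independently.
    static boolean matches(Map<String, List<String>> fields, String first, String last) {
        return fields.get("user.first").contains(first)
            && fields.get("user.last").contains(last);
    }

    public static void main(String[] args) {
        String[][] users = { {"John", "Smith"}, {"Alice", "White"} };
        Map<String, List<String>> fields = flatten(users);
        System.out.println(fields);                            // prints {user.first=[John, Alice], user.last=[Smith, White]}
        System.out.println(matches(fields, "Alice", "Smith")); // prints true, although no such user exists
    }
}
```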

nested

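nested was created to solve the problem above. In the index, nested actually creates one parent document plus multiple child documents (in the example above, the number depends on the size of the user array).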
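As a rough model of this, the sketch below expands one source document into a block of separate documents, children first and the parent last, and checks both conditions against the same child. The toBlock and matchesNested helpers and the use of plain maps as documents are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class NestedDocsSketch {
    // One hidden child doc per user object, followed by the parent doc.
    static List<Map<String, String>> toBlock(String group, String[][] users) {
        List<Map<String, String>> block = new ArrayList<>();
        for (String[] user : users) {
            block.add(Map.of("user.first", user[0], "user.last", user[1]));
        }
        block.add(Map.of("group", group)); // the parent is always the last doc in the block
        return block;
    }

    // The parent matches only if a single child doc satisfies both conditions.
    static boolean matchesNested(List<Map<String, String>> block, String first, String last) {
        for (int i = 0; i < block.size() - 1; i++) { // skip the parent at the end
            Map<String, String> child = block.get(i);
            if (first.equals(child.get("user.first")) && last.equals(child.get("user.last"))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[][] users = { {"John", "Smith"}, {"Alice", "White"} };
        List<Map<String, String>> block = toBlock("fans", users);
        System.out.println(block.size());                            // prints 3 (two children plus the parent)
        System.out.println(matchesNested(block, "Alice", "Smith")); // prints false
        System.out.println(matchesNested(block, "Alice", "White")); // prints true
    }
}
```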

How nested search works

Query

The main query logic is in the NestedQueryBuilder.java class. It builds an ESToParentBlockJoinQuery object, which in turn wraps a ToParentBlockJoinQuery. ToParentBlockJoinQuery is a Lucene query used for index-time joins.

protected Query doToQuery(QueryShardContext context) throws IOException {
    ObjectMapper nestedObjectMapper = context.getObjectMapper(path);
    if (nestedObjectMapper == null) {
        if (ignoreUnmapped) {
            return new MatchNoDocsQuery();
        } else {
            throw new IllegalStateException("[" + NAME + "] failed to find nested object under path [" + path + "]");
        }
    }
    if (!nestedObjectMapper.nested().isNested()) {
        throw new IllegalStateException("[" + NAME + "] nested object under path [" + path + "] is not of nested type");
    }
    final BitSetProducer parentFilter;
    Query innerQuery;
    ObjectMapper objectMapper = context.nestedScope().getObjectMapper();
    if (objectMapper == null) {
        parentFilter = context.bitsetFilter(Queries.newNonNestedFilter(context.indexVersionCreated()));
    } else {
        parentFilter = context.bitsetFilter(objectMapper.nestedTypeFilter());
    }

    try {
        context.nestedScope().nextLevel(nestedObjectMapper);
        innerQuery = this.query.toQuery(context);
    } finally {
        context.nestedScope().previousLevel();
    }

    // ToParentBlockJoinQuery requires that the inner query only matches documents
    // in its child space
    if (new NestedHelper(context.getMapperService()).mightMatchNonNestedDocs(innerQuery, path)) {
        innerQuery = Queries.filtered(innerQuery, nestedObjectMapper.nestedTypeFilter());
    }

    return new ESToParentBlockJoinQuery(innerQuery, parentFilter, scoreMode,
        objectMapper == null ? null : objectMapper.fullPath());
}

Aggregation

Aggregations on nested fields are carried out by NestedAggregator. The main implementation is in the NestedAggregator.java class:

public LeafBucketCollector getLeafCollector(final LeafReaderContext ctx, final LeafBucketCollector sub) throws IOException {
    IndexReaderContext topLevelContext = ReaderUtil.getTopLevelContext(ctx);
    IndexSearcher searcher = new IndexSearcher(topLevelContext);
    searcher.setQueryCache(null);
    Weight weight = searcher.createWeight(searcher.rewrite(childFilter), ScoreMode.COMPLETE_NO_SCORES, 1f);
    Scorer childDocsScorer = weight.scorer(ctx);

    final BitSet parentDocs = parentFilter.getBitSet(ctx);
    final DocIdSetIterator childDocs = childDocsScorer != null ? childDocsScorer.iterator() : null;
    if (collectsFromSingleBucket) {
        return new LeafBucketCollectorBase(sub, null) {
            @Override
            public void collect(int parentDoc, long bucket) throws IOException {
                // if parentDoc is 0 then this means that this parent doesn't have child docs (b/c these appear always before the parent
                // doc), so we can skip:
                if (parentDoc == 0 || parentDocs == null || childDocs == null) {
                    return;
                }

                final int prevParentDoc = parentDocs.prevSetBit(parentDoc - 1);
                int childDocId = childDocs.docID();
                if (childDocId <= prevParentDoc) {
                    childDocId = childDocs.advance(prevParentDoc + 1);
                }

                for (; childDocId < parentDoc; childDocId = childDocs.nextDoc()) {
                    collectBucket(sub, childDocId, bucket);
                }
            }
        };
    } else {
        return bufferingNestedLeafBucketCollector = new BufferingNestedLeafBucketCollector(sub, parentDocs, childDocs);
    }
}

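It consists of the following steps:

First, obtain the set of parent doc IDs (parentDocs).

Obtain an iterator over the doc IDs of the matching child documents (childDocs).

For each parent, walk the child documents that lie between the previous parent and the current parent, and pass each one to collectBucket.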
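The range logic in collect can be sketched with java.util.BitSet and a sorted array of matching child doc IDs. This is a simplified stand-in: the real code advances a DocIdSetIterator instead of scanning an array, and the class name, method names, and sample layout (children at 0-2, 4, 6-7; parents at 3, 5, 8) are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

class NestedCollectSketch {
    // Parents at doc IDs 3, 5 and 8; every other doc ID below 8 is a child.
    static BitSet sampleParents() {
        BitSet parents = new BitSet(9);
        parents.set(3);
        parents.set(5);
        parents.set(8);
        return parents;
    }

    // Returns the child doc IDs that would be handed to collectBucket for one parent.
    static List<Integer> childrenToCollect(BitSet parentDocs, int[] matchingChildDocs, int parentDoc) {
        List<Integer> collected = new ArrayList<>();
        if (parentDoc == 0) {
            return collected; // children always precede their parent, so doc 0 has none
        }
        int prevParentDoc = parentDocs.previousSetBit(parentDoc - 1); // -1 if there is no earlier parent
        for (int childDoc : matchingChildDocs) { // assumed sorted in ascending order
            if (childDoc > prevParentDoc && childDoc < parentDoc) {
                collected.add(childDoc);
            }
        }
        return collected;
    }

    public static void main(String[] args) {
        int[] matchingChildren = {0, 2, 4, 6, 7};
        System.out.println(childrenToCollect(sampleParents(), matchingChildren, 3)); // prints [0, 2]
        System.out.println(childrenToCollect(sampleParents(), matchingChildren, 8)); // prints [6, 7]
    }
}
```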

join

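The join in Elasticsearch is in fact an implementation of query-time join: one field serves as the join key, and the join query runs against it. The query logic is shown below; underneath, it still relies on JoinUtil.createJoinQuery.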

// Abridged excerpt
public class HasChildQueryBuilder extends AbstractQueryBuilder {

    public Query rewrite(IndexReader reader) throws IOException {
        Query rewritten = super.rewrite(reader);
        if (rewritten != this) {
            return rewritten;
        }
        if (reader instanceof DirectoryReader) {
            IndexSearcher indexSearcher = new IndexSearcher(reader);
            indexSearcher.setQueryCache(null);
            indexSearcher.setSimilarity(similarity);
            IndexOrdinalsFieldData indexParentChildFieldData = fieldDataJoin.loadGlobal((DirectoryReader) reader);
            OrdinalMap ordinalMap = indexParentChildFieldData.getOrdinalMap();
            return JoinUtil.createJoinQuery(joinField, innerQuery, toQuery, indexSearcher, scoreMode,
                ordinalMap, minChildren, maxChildren);
        } else {
            if (reader.leaves().isEmpty() && reader.numDocs() == 0) {
                // asserting reader passes down a MultiReader during rewrite which makes this
                // blow up since for this query to work we have to have a DirectoryReader otherwise
                // we can't load global ordinals - for this to work we simply check if the reader has no leaves
                // and rewrite to match nothing
                return new MatchNoDocsQuery();
            }
            throw new IllegalStateException("can't load global ordinals for reader of type: " +
                reader.getClass() + " must be a DirectoryReader");
        }
    }
}