1. Walkthrough of Druid's pre-aggregation (rollup) logic
Main call chain for ingestion-time pre-aggregation:
KafkaRecordSupplier.poll() --> IncrementalPublishingKafkaIndexTaskRunner.getRecords() --> SeekableStreamIndexTaskRunner.getRecords()
--> StreamAppenderatorDriver.add(record) --> BaseAppenderatorDriver.append() --> AppenderatorImpl.add() --> Sink.add() --> IncrementalIndex.add()
(implementations: OnheapIncrementalIndex, OffHeap..., Map...) addToFacts() --> factorizeAggs(), doAggregate()
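The tail of the chain (addToFacts --> factorizeAggs/doAggregate) boils down to: for each incoming row, find or create the aggregators for its rollup group, then feed the row's metric values to them. A minimal sketch of that pattern, with illustrative names rather than Druid's real signatures:

```java
import java.util.HashMap;
import java.util.Map;

public class RollupSketch {
    // One aggregator per (group, metric); here a simple long sum, like Druid's longSum.
    static class LongSumAggregator {
        long sum = 0;
        void aggregate(long value) { sum += value; }
    }

    // facts: rollup group key -> aggregator, mirroring IncrementalIndex's facts map.
    private final Map<String, LongSumAggregator> facts = new HashMap<>();

    // addToFacts-style entry point: create the group's aggregators on first sight
    // ("factorizeAggs"), then apply the row ("doAggregate").
    void add(String groupKey, long metricValue) {
        facts.computeIfAbsent(groupKey, k -> new LongSumAggregator())
             .aggregate(metricValue);
    }

    long get(String groupKey) { return facts.get(groupKey).sum; }

    public static void main(String[] args) {
        RollupSketch index = new RollupSketch();
        index.add("page=A", 1);
        index.add("page=A", 2);
        index.add("page=B", 5);
        System.out.println(index.get("page=A")); // prints 3: two input rows rolled up into one
    }
}
```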
Column-value selection logic, IncrementalIndex.makeColumnSelectorFactory():
final RowBasedColumnSelectorFactory baseSelectorFactory = RowBasedColumnSelectorFactory.create(in::get, null);
final ColumnValueSelector selector = baseSelectorFactory.makeColumnValueSelector(column);
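The key point of `RowBasedColumnSelectorFactory.create(in::get, null)` is the supplier indirection: each selector closes over a column name and reads through the supplier, so one selector instance serves every row the ingest loop points `in` at. A simplified sketch of that indirection (the types here are illustrative stand-ins, not Druid's):

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

public class SelectorSketch {
    interface ColumnValueSelector { Object getObject(); }

    // Mirrors RowBasedColumnSelectorFactory: selectors read through a supplier of
    // the "current row", so they need no per-row construction.
    static class RowBasedFactory {
        private final Supplier<Map<String, Object>> rowSupplier;
        RowBasedFactory(Supplier<Map<String, Object>> rowSupplier) { this.rowSupplier = rowSupplier; }
        ColumnValueSelector makeColumnValueSelector(String column) {
            return () -> rowSupplier.get().get(column);
        }
    }

    public static void main(String[] args) {
        AtomicReference<Map<String, Object>> current = new AtomicReference<>();
        RowBasedFactory factory = new RowBasedFactory(current::get);
        ColumnValueSelector selector = factory.makeColumnValueSelector("bytes");

        current.set(Map.<String, Object>of("page", "A", "bytes", 10L));
        System.out.println(selector.getObject()); // 10
        current.set(Map.<String, Object>of("page", "B", "bytes", 7L));
        System.out.println(selector.getObject()); // 7 -- same selector, new row
    }
}
```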
IncrementalIndex:
maxTime/minTime are tracked only when sortFacts is true
Dimension data structures:
Core grouping method: IncrementalIndex.toIncrementalIndexRow()
Dimension dictionary: dimsDictionary
Structures linking dimensions, metrics, and aggregators:
dims -- rowIndex -- agg metric -- selector
IncrementalIndexRow: truncated timestamp plus the dims values (the rollup group key)
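The grouping above can be sketched as a map key: the timestamp truncated to the query granularity plus the dictionary-encoded dim values, with equals/hashCode over both so rows with the same key land in the same rollup row. A minimal sketch, assuming MINUTE granularity and a toy dictionary (names are illustrative, not Druid's real classes):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class GroupKeySketch {
    // Mirrors dimsDictionary: each distinct string value gets a small int id.
    static class DimDictionary {
        private final Map<String, Integer> ids = new HashMap<>();
        int getId(String value) { return ids.computeIfAbsent(value, v -> ids.size()); }
    }

    // Mirrors IncrementalIndexRow: truncated timestamp + encoded dims as the group key.
    static class IncrementalIndexRowKey {
        final long truncatedTimestamp;
        final int[] dims;
        IncrementalIndexRowKey(long truncatedTimestamp, int[] dims) {
            this.truncatedTimestamp = truncatedTimestamp;
            this.dims = dims;
        }
        @Override public boolean equals(Object o) {
            if (!(o instanceof IncrementalIndexRowKey)) return false;
            IncrementalIndexRowKey other = (IncrementalIndexRowKey) o;
            return truncatedTimestamp == other.truncatedTimestamp && Arrays.equals(dims, other.dims);
        }
        @Override public int hashCode() {
            return 31 * Long.hashCode(truncatedTimestamp) + Arrays.hashCode(dims);
        }
    }

    public static void main(String[] args) {
        long granMillis = 60_000; // assume MINUTE query granularity
        DimDictionary dict = new DimDictionary();
        // Two events 10s apart with identical dims truncate to the same key.
        IncrementalIndexRowKey a = new IncrementalIndexRowKey(
            100_000 - 100_000 % granMillis, new int[]{dict.getId("US")});
        IncrementalIndexRowKey b = new IncrementalIndexRowKey(
            110_000 - 110_000 % granMillis, new int[]{dict.getId("US")});
        System.out.println(a.equals(b)); // true -> they share one rollup row
    }
}
```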
Post-aggregation (persist) logic:
Sink:
canAppendRow() ----> index.canAppendRow(): checks maxRowCount and maxBytesInMemory --> persist = true/false
FireHydrant: wraps an IncrementalIndex
AppenderatorImpl:
add() ---> when persist is true: persistAll() --> persistExecutor.submit(): persistHydrant(FireHydrant) ---> IndexMerger.persist(index) (implementation IndexMergerV9) ---> merge(new IncrementalIndexAdapter(index), metricAggs, outDir)
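The persist decision above is essentially a threshold check on the in-memory index, with the actual persist handed off to an executor so ingestion keeps going. A schematic sketch, checking only row count (Druid also checks maxBytesInMemory); the class and thresholds are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PersistSketch {
    private final int maxRowCount;
    private final ExecutorService persistExecutor = Executors.newSingleThreadExecutor();
    private List<String> inMemoryRows = new ArrayList<>();

    PersistSketch(int maxRowCount) { this.maxRowCount = maxRowCount; }

    // Sink.canAppendRow()-style check: append only while under the in-memory limit.
    boolean canAppendRow() { return inMemoryRows.size() < maxRowCount; }

    // AppenderatorImpl.add()-style flow: if the index is full, freeze it, hand it to
    // the persist executor, and start a fresh in-memory index before appending.
    Future<?> add(String row) {
        Future<?> persistFuture = null;
        if (!canAppendRow()) {
            final List<String> toPersist = inMemoryRows;
            inMemoryRows = new ArrayList<>();
            persistFuture = persistExecutor.submit(
                () -> System.out.println("persisted " + toPersist.size() + " rows to disk"));
        }
        inMemoryRows.add(row);
        return persistFuture;
    }

    void shutdown() { persistExecutor.shutdown(); }

    public static void main(String[] args) throws Exception {
        PersistSketch appenderator = new PersistSketch(2);
        for (int i = 0; i < 5; i++) {
            Future<?> f = appenderator.add("row-" + i);
            if (f != null) f.get(); // wait so output order is deterministic
        }
        appenderator.shutdown();
    }
}
```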
Field parsing in Druid ingestion tasks
SeekableStreamIndexTaskRunner.parseWithInputFormat() ---> InputEntityReader (implementations: JsonReader, Orc..., ...) JsonReader.parseInputRows()
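Conceptually, parseInputRows() turns each raw record into a map of field name to value from which timestamp, dimensions, and metrics are then picked. A deliberately naive stand-in (real Druid uses Jackson inside its InputEntityReader implementations; this sketch assumes one flat JSON object with no nesting, escapes, or embedded commas):

```java
import java.util.HashMap;
import java.util.Map;

public class ParseSketch {
    // Naive flat-JSON parse standing in for JsonReader.parseInputRows(): strips the
    // braces, splits fields on commas, keys/values on the first colon, drops quotes.
    static Map<String, String> parseInputRow(String json) {
        Map<String, String> row = new HashMap<>();
        String body = json.trim();
        body = body.substring(1, body.length() - 1); // strip { }
        for (String field : body.split(",")) {
            String[] kv = field.split(":", 2);
            row.put(strip(kv[0]), strip(kv[1]));
        }
        return row;
    }

    private static String strip(String s) { return s.trim().replace("\"", ""); }

    public static void main(String[] args) {
        Map<String, String> row =
            parseInputRow("{\"timestamp\":\"1700000000\",\"page\":\"A\",\"bytes\":\"12\"}");
        System.out.println(row.get("page"));  // A
        System.out.println(row.get("bytes")); // 12
    }
}
```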