Introduction
When Flink writes to Iceberg, the open() method of IcebergStreamWriter calls TaskWriterFactory.create(), which produces one of four writer types (UnpartitionedDeltaWriter, UnpartitionedWriter, PartitionedDeltaWriter, RowDataPartitionedFanoutWriter). This article traces these four write paths.
The IcebergStreamWriter.open() method:
public void open() {
    this.subTaskId = this.getRuntimeContext().getIndexOfThisSubtask();
    this.attemptId = this.getRuntimeContext().getAttemptNumber();
    this.taskWriterFactory.initialize(this.subTaskId, this.attemptId);
    this.writer = this.taskWriterFactory.create();
}
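The writer created here only lives until the next checkpoint: every incoming RowData is pushed through it, and just before the checkpoint barrier the operator completes it, forwards the resulting files to the downstream committer, and asks the factory for a fresh writer. Roughly, paraphrasing IcebergStreamWriter (the exact shape of this code varies between Iceberg versions, so treat it as an illustration rather than the literal source):

public void processElement(StreamRecord<RowData> element) throws Exception {
    // each record is delegated to the TaskWriter created in open()
    this.writer.write(element.getValue());
}

public void prepareSnapshotPreBarrier(long checkpointId) throws Exception {
    // complete the current writer and hand its files to the committer operator
    // (emit() stands for the operator's helper that collects the result downstream),
    // then start a new writer for the next checkpoint interval
    this.emit(this.writer.complete());
    this.writer = this.taskWriterFactory.create();
}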
The create() implementation in RowDataTaskWriterFactory (the Flink connector's TaskWriterFactory):
public TaskWriter<RowData> create() {
    Preconditions.checkNotNull(this.outputFileFactory,
        "The outputFileFactory shouldn't be null if we have invoked the initialize().");
    if (this.equalityFieldIds != null && !this.equalityFieldIds.isEmpty()) {
        // equality fields configured: delta writers, which emit delete files as well as data files
        return this.spec.isUnpartitioned()
            ? new UnpartitionedDeltaWriter(this.spec, this.format, this.appenderFactory, this.outputFileFactory, this.io, this.targetFileSizeBytes, this.schema, this.flinkSchema, this.equalityFieldIds)
            : new PartitionedDeltaWriter(this.spec, this.format, this.appenderFactory, this.outputFileFactory, this.io, this.targetFileSizeBytes, this.schema, this.flinkSchema, this.equalityFieldIds);
    } else {
        // no equality fields: append-only data-file writers
        return this.spec.isUnpartitioned()
            ? new UnpartitionedWriter<>(this.spec, this.format, this.appenderFactory, this.outputFileFactory, this.io, this.targetFileSizeBytes)
            : new RowDataTaskWriterFactory.RowDataPartitionedFanoutWriter(this.spec, this.format, this.appenderFactory, this.outputFileFactory, this.io, this.targetFileSizeBytes, this.schema, this.flinkSchema);
    }
}
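Which branch create() takes is decided on the job side: equalityFieldIds is populated when the sink is built with equality field columns, and spec.isUnpartitioned() simply reflects the table's PartitionSpec. As a hedged illustration (the input stream and table loader below are placeholders, not taken from this article; the builder methods are from the Iceberg Flink connector), a job that ends up on the delta-writer branch could be wired like this:

import java.util.Arrays;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.table.data.RowData;
import org.apache.iceberg.flink.TableLoader;
import org.apache.iceberg.flink.sink.FlinkSink;

public class DeltaWriterWiring {
    // input and tableLoader are assumed to be created elsewhere in the job
    static void wire(DataStream<RowData> input, TableLoader tableLoader) {
        FlinkSink.forRowData(input)
            .tableLoader(tableLoader)
            // declaring equality field columns fills equalityFieldIds, so create()
            // returns UnpartitionedDeltaWriter or PartitionedDeltaWriter
            .equalityFieldColumns(Arrays.asList("id"))
            .append();
        // without equalityFieldColumns(...), create() falls back to
        // UnpartitionedWriter or RowDataPartitionedFanoutWriter
    }
}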
Inheritance hierarchy
In create(), the writer type is chosen along two axes: whether equality field ids were configured (delta writers vs. plain append writers) and whether the table's PartitionSpec is partitioned, yielding the partitioned writers (PartitionedDeltaWriter / RowDataPartitionedFanoutWriter) or the unpartitioned ones (UnpartitionedDeltaWriter / UnpartitionedWriter).
Class relationship diagram
As the diagram shows, all of these writers extend the abstract BaseTaskWriter class; the main difference is that the partitioned writers additionally have to derive a partition key for every row (sketched after the package overview below).
Regarding where these classes live:
- TaskWriter, BaseTaskWriter, UnpartitionedWriter, PartitionedWriter, and PartitionedFanoutWriter are in the org.apache.iceberg.io package of the iceberg-core module, which defines the engine-agnostic write logic shared by all Iceberg writers;
- PartitionedDeltaWriter, UnpartitionedDeltaWriter, and the RowDataPartitionedFanoutWriter inner class of RowDataTaskWriterFactory are in the org.apache.iceberg.flink.sink package, i.e. they are implemented by the Flink connector module.
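As promised above, here is a simplified sketch of the partition-key logic that distinguishes the partitioned writers, paraphrased from the RowDataPartitionedFanoutWriter inner class (field and modifier details are trimmed, so treat it as an illustration rather than the exact source):

static class RowDataPartitionedFanoutWriter extends PartitionedFanoutWriter<RowData> {
    private final PartitionKey partitionKey;
    private final RowDataWrapper rowDataWrapper;

    RowDataPartitionedFanoutWriter(PartitionSpec spec, FileFormat format,
                                   FileAppenderFactory<RowData> appenderFactory,
                                   OutputFileFactory fileFactory, FileIO io,
                                   long targetFileSize, Schema schema, RowType flinkSchema) {
        super(spec, format, appenderFactory, fileFactory, io, targetFileSize);
        this.partitionKey = new PartitionKey(spec, schema);                        // evaluates the spec's partition transforms
        this.rowDataWrapper = new RowDataWrapper(flinkSchema, schema.asStruct());  // adapts Flink RowData to Iceberg StructLike
    }

    @Override
    protected PartitionKey partition(RowData row) {
        // re-evaluate the partition transforms for this row; the base class then
        // routes the row to the open file appender of that partition (fanout)
        partitionKey.partition(rowDataWrapper.wrap(row));
        return partitionKey;
    }
}

PartitionedDeltaWriter computes the key in the same way, but additionally keeps one delta writer per partition so it can also emit equality deletes.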
All of the classes above implement the TaskWriter interface:
public interface TaskWriter<T> extends Closeable {
    void write(T row) throws IOException;

    void abort() throws IOException;

    default DataFile[] dataFiles() throws IOException {
        WriteResult result = this.complete();
        Preconditions.checkArgument(result.deleteFiles() == null || result.deleteFiles().length == 0,
            "Should have no delete files in this write result.");
        return result.dataFiles();