Flink 源码解析3--ExecutionGraph的生成

最新推荐文章于 2024-03-28 10:57:21 发布

午后的红茶meton

最新推荐文章于 2024-03-28 10:57:21 发布

阅读量635

点赞数 2

分类专栏： Flink学习文章标签： flink ExecutionGraph ExecutionVertex ExecutionEdge

本文链接：https://blog.youkuaiyun.com/u012151684/article/details/108836390

版权

Flink学习专栏收录该内容

25 篇文章

订阅专栏

针对前面程序的transformation操作转化分析过程，能够得到StreamGraph、JobGraph的划分、具体生成过程以及链路形式，这两个执行图的转化生成都是在client本地客户端。而最终flink Job运行时调度层核心的执行图-ExecutionGraph是在服务端(JobManager)中生成的。Client向JobManager提交JobGraph后，JobManage就会根据JobGraph来创建对应的ExecutionGraph，并以最终生成的ExecutionGraph来调度任务。ExecutionGraph在实际处理转化上只是改动了JobGraph的每个节点，而没有对整个拓扑结构进行变动；在JobGraph转换到ExecutionGraph的过程中，主要发生了以下转变：

加入了并行度的概念，成为真正可调度的图结构
生成了与JobVertex对应的ExecutionJobVertex，ExecutionVertex，与IntermediateDataSet对应的IntermediateResult和IntermediateResultPartition等，并行将通过这些类实现

所以针对其代码ExecutionGraphBuilder.buildGraph(JobGraph jobGraph)的具体转化分析如下：

1、创建ExecutionGraph对象并设置基本属性

public class ExecutionGraph implements AccessExecutionGraph {
   /** Job specific information like the job id, job name, job configuration, etc. */
   private final JobInformation jobInformation; // job任务的基本信息
   
   /** All job vertices that are part of this graph. */
   private final ConcurrentHashMap<JobVertexID, ExecutionJobVertex> tasks; // 具体可执行的ExecutionJobVertex任务

   /** All vertices, in the order in which they were created. **/
   private final List<ExecutionJobVertex> verticesInCreationOrder;

   /** All intermediate results that are part of this graph. */
   private final ConcurrentHashMap<IntermediateDataSetID, IntermediateResult> intermediateResults; // 数据集

   /** The currently executed tasks, for callbacks. */
   private final ConcurrentHashMap<ExecutionAttemptID, Execution> currentExecutions;

   /** Listeners that receive messages when the entire job switches it status
    * (such as from RUNNING to FINISHED). */
   private final List<JobStatusListener> jobStatusListeners;

   /** Listeners that receive messages whenever a single task execution changes its status. */
   private final List<ExecutionStatusListener> executionListeners;
   
   /** The slot provider to use for allocating slots for tasks as they are needed. */
   private final SlotProvider slotProvider;
   
   // ------ Fields that are only relevant for archived execution graphs ------------
   private String jsonPlan;
   ......

对象创建即是调用其基本构造函数进行变量的初始化，主要设置JobInformation，SlotProvider以及上述比较重要的数据集、ExecutionJobVertex等容器对象的初始化；

其中生成ExecutionGraph最重要的代码如下：

生成创建初始化executionGraph对象；
将JobVertex在Master上进行初始化；
对所有的Jobvertext进行拓扑排序(所谓拓扑排序，即保证如果存在A->B的有向边，那么在排序后的列表中A 节点一定在B节点之前。这里对应着JobGraph从Source出发排序到结束)；
对排序后的拓扑结构生成ExecutionGraph内部的节点和连接；

public static ExecutionGraph buildGraph(..., JobGraph jobGraph, ...){
    ......
    // 赋值jobId、jobName、JobInformation等
    final ExecutionGraph executionGraph;
    try {
       executionGraph = (prior != null) ? prior :
          new ExecutionGraph(..., ..., ...); // 构造ExecutionGraph对象
    } catch (IOException e) {
       throw new JobException("Could not create the ExecutionGraph.", e);
    }
    ......
    // JobVertex在Master上进行初始化, 主要关注vertex.initializeOnMaster(classLoader);对应的不同类型OutputFormatVertex和InputFormatVertex
    // 其他类型的vertex在这里没有什么特殊操作. File output format在这一步准备好输出目录, Input splits在这一步创建对应的splits
    for (JobVertex vertex : jobGraph.getVertices()) {
       ......
       try {
          vertex.initializeOnMaster(classLoader);
       }
       catch (Throwable t) {
             throw new JobExecutionException(jobId,
                   "Cannot initialize task '" + vertex.getName() + "': " + t.getMessage(), t);
       }
    }
    // 对所有的Jobvertext进行拓扑排序, 从source出发到结束的JobVertex
    // topologically sort the job vertices and attach the graph to the existing one
    List<JobVertex> sortedTopology = jobGraph.getVerticesSortedTopologicallyFromSources();
    if (log.isDebugEnabled()) {
       log.debug("Adding {} vertices from job graph {} ({}).", sortedTopology.size(), jobName, jobId);
    }
    executionGraph.attachJobGraph(sortedTopology); // 对排序后的拓扑结构生成ExecutionGraph内部的节点和连接
    ......
    // 后面就是一些针对ckp配置的恢复操作配置
}

其中最主要的是：针对排序后的拓扑结构生成ExecutionGraph内部的节点和连接

// 对排序了的拓扑结构进行ExecutionGraph内部的节点和连接
public void attachJobGraph(List<JobVertex> topologiallySorted) throws JobException {
   LOG.debug("Attaching {} topologically sorted vertices to existing job graph with {} " +
         "vertices and {} intermediate results.",
         topologiallySorted.size(), tasks.size(), intermediateResults.size());

   final ArrayList<ExecutionJobVertex> newExecJobVertices = new ArrayList<>(topologiallySorted.size());
   final long createTimestamp = System.currentTimeMillis();

   for (JobVertex jobVertex : topologiallySorted) { // 对已排序的拓扑结构jobVertex, 进行ExecutionJobVertex节点的生成和连接
      if (jobVertex.isInputVertex() && !jobVertex.isStoppable()) {
         this.isStoppable = false;
      }
      // create the execution job vertex and attach it to the graph
      // 在这里创建生成ExecutionGraph的每个节点
      // 首先是进行了一堆赋值，将任务信息交给要生成的图节点，以及设定并行度等
      // 然后是创建本节点的IntermediateResult，根据本节点的下游节点的个数确定创建几份
      // 最后是根据设定好的并行度创建用于执行task的ExecutionVertex
      // 如果job有设定inputsplit的话，这里还要指定inputsplits
      ExecutionJobVertex ejv = new ExecutionJobVertex(  
         this,
         jobVertex,
         1,
         rpcTimeout,
         globalModVersion,
         createTimestamp);

      // 这里要处理所有的JobEdge
      // 对每个edge，获取对应的intermediateResult，并记录到本节点的输入上
      // 最后，把每个ExecutorVertex和对应的IntermediateResult关联起来
      ejv.connectToPredecessors(this.intermediateResults);

      ExecutionJobVertex previousTask = this.tasks.putIfAbsent(jobVertex.getID(), ejv);
      if (previousTask != null) {
         throw new JobException(String.format("Encountered two job vertices with ID %s : previous=[%s] / new=[%s]",
               jobVertex.getID(), ejv, previousTask));
      }

      for (IntermediateResult res : ejv.getProducedDataSets()) {
         IntermediateResult previousDataSet = this.intermediateResults.putIfAbsent(res.getId(), res);
         if (previousDataSet != null) {
            throw new JobException(String.format("Encountered two intermediate data set with ID %s : previous=[%s] / new=[%s]",
                  res.getId(), res, previousDataSet));
         }
      }

      this.verticesInCreationOrder.add(ejv);
      this.numVerticesTotal += ejv.getParallelism();
      newExecJobVertices.add(ejv);
   }

   terminationFuture = new CompletableFuture<>();
   failoverStrategy.notifyNewVertices(newExecJobVertices);
}

按照拓扑排序的结果依次遍历为每个JobVertex创建对应的ExecutionJobVertex。在创建ExecutionJobVertex的时候会创建对应的ExecutionVertex，IntermediateResult，ExecutionEdge，IntermediateResultPartition等对象；我们知道，Flink Job是可以指定任务的并行度的，在实际运行时，会有多个并行的任务同时在执行，对应到这里就是ExecutionVertex。ExecutionVertex是并行任务的一个子任务，算子的并行度是多少，那么就会有多少个ExecutionVertex其内部对象具体关联对应情况如下：

每一个JobVertex对应一个ExecutionJobVertex；
每一个ExecutionJobVertex有parallelism个ExecutionVertex
每一个JobVertex可能有n(n>=0)个IntermediateDataSet，在ExecutionJobVertex中，一个IntermediateDataSet对应一个IntermediateResult,，每一个IntermediateResult都有parallelism个生产者produceDataSet，对应parallelism个IntermediateResultPartition；
每一个ExecutionJobVertex都会和前向的IntermediateResult连接，实际上是ExecutionVertex和IntermediateResultPartition建立连接，生成多个ExecutionEdge

其构造初始化结果如下：

public class ExecutionJobVertex implements AccessExecutionJobVertex, Archiveable<ArchivedExecutionJobVertex> {	
    private final ExecutionGraph graph; // 
    private final JobVertex jobVertex; // 对应的jobVertex

    /**
	 * The IDs of all operators contained in this execution job vertex.
	 * <p>The ID's are stored depth-first post-order; for the forking chain below the ID's would be stored as [D, E, B, C, A].
	 *  A - B - D
	 *   \    \
	 *    C    E
	 * This is the same order that operators are stored in the {@code StreamTask}.
	 */
	private final List<OperatorID> operatorIDs;
 
   /**
	 * The alternative IDs of all operators contained in this execution job vertex.
	 * <p>The ID's are in the same order as {@link ExecutionJobVertex#operatorIDs}.
	 */
	private final List<OperatorID> userDefinedOperatorIds;

	// ExecutionVertex对应一个并行的子任务
	private final ExecutionVertex[] taskVertices;
	private final IntermediateResult[] producedDataSets;
	private final List<IntermediateResult> inputs;

	private final int parallelism;
	private final SlotSharingGroup slotSharingGroup;
	private final CoLocationGroup coLocationGroup;
	private final InputSplit[] inputSplits;
	private int maxParallelism;

public ExecutionJobVertex(ExecutionGraph graph, JobVertex jobVertex, int defaultParallelism, ...) throws JobException {
	......
	this.graph = graph;
	this.jobVertex = jobVertex;
 
    // 并行度设置
    // 获取jobVertex中的并行度 并与默认的并行度(1)进行比较; 后续会按照并行度大小 设置初始化ExecutionVertex
	int vertexParallelism = jobVertex.getParallelism(); 
	int numTaskVertices = vertexParallelism > 0 ? vertexParallelism : defaultParallelism;
   ......
	this.parallelism = numTaskVertices;
	this.serializedTaskInformation = null;

	this.taskVertices = new ExecutionVertex[numTaskVertices]; // parallelism个ExecutionVertex
	this.operatorIDs = Collections.unmodifiableList(jobVertex.getOperatorIDs());
	this.userDefinedOperatorIds = Collections.unmodifiableList(jobVertex.getUserDefinedOperatorIDs());

	this.inputs = new ArrayList<>(jobVertex.getInputs().size());

	// take the sharing group
	this.slotSharingGroup = jobVertex.getSlotSharingGroup();
	this.coLocationGroup = jobVertex.getCoLocationGroup();

	// create the intermediate results // 每一个JobVertex可能有n(n>=0)个IntermediateDataSet --> n个IntermediateResult
	this.producedDataSets = new IntermediateResult[jobVertex.getNumberOfProducedIntermediateDataSets()];

	for (int i = 0; i < jobVertex.getProducedDataSets().size(); i++) {
		final IntermediateDataSet result = jobVertex.getProducedDataSets().get(i);
		this.producedDataSets[i] = new IntermediateResult( // 构造IntermediateResult中，共有numTaskVertices并行度个IntermediateResultPartition
				result.getId(),
				this,
				numTaskVertices,
				result.getResultType());
	}
   ......
	// create all task vertices
	for (int i = 0; i < numTaskVertices; i++) {
		ExecutionVertex vertex = new ExecutionVertex( // 每一个ExecutionJobVertex有parallelism个ExecutionVertex；ExecutionVertex是并行任务的一个子任务
				this,
				i,
				producedDataSets,
				timeout,
				initialGlobalModVersion,
				createTimestamp,
				maxPriorAttemptsHistoryLength);
		this.taskVertices[i] = vertex;
	}
   ......

在JobGraph中用IntermediateDataSet表示JobVertex的对外输出，一个JobGraph可能有n(n >=0)个IntermediateDataSet输出。在ExecutionGraph中，与此对应的就是IntermediateResult。在IntermediateResult中又因为每一个IntermediateResult都有parallelism个生产者produceDataSet，对应parallelism个IntermediateResultPartition；

public class IntermediateResult {
	private final IntermediateDataSetID id; // 对应的IntermediateDataSet的ID
 
	private final ExecutionJobVertex producer; // 生产者
	private final IntermediateResultPartition[] partitions; // 对应ExecutionJobVertex的并行度
	private final IntermediateResultPartition[] partitions = new IntermediateResultPartition[numParallelProducers];
	private final ResultPartitionType resultType;
}

在每一个ExecutionVertex中都会有其对应的上游ExecutionEdge和下游IntermediateResultPartition

public class ExecutionVertex implements AccessExecutionVertex, Archiveable<ArchivedExecutionVertex> {
	// --------------------------------------------------------------------------------------------
	private final ExecutionJobVertex jobVertex;
	private final Map<IntermediateResultPartitionID, IntermediateResultPartition> resultPartitions;
	private final ExecutionEdge[][] inputEdges;

	private final int subTaskIndex;
	private final EvictingBoundedList<ArchivedExecution> priorExecutions;
	private final Time timeout;

	/** The name in the format "myTask (2/7)", cached to avoid frequent string concatenations. */
	private final String taskNameWithSubtask;
	private volatile CoLocationConstraint locationConstraint;
 
	/** The current or latest execution attempt of this vertex's task. */
	private volatile Execution currentExecution;	// this field must never be null
}

Execution是对ExecutionVertex的一次执行，通过ExecutionAttemptId来唯一标识

public class Execution implements AccessExecution, Archiveable<ArchivedExecution>, LogicalSlot.Payload {
	/** The executor which is used to execute futures. */
	private final Executor executor;
	/** The execution vertex whose task this execution executes. */
	private final ExecutionVertex vertex;
	/** The unique ID marking the specific execution instant of the task. */
	private final ExecutionAttemptID attemptId;
}

由于ExecutionJobVertex有numParallelProducers个并行的子任务，自然对应的每一个IntermediateResult 就有numParallelProducers个生产者，每个生产者的在相应的IntermediateResult上的输出对应一个IntermediateResultPartition。IntermediateResultPartition表示的是ExecutionVertex的一个输出分区，即：

ExecutionJobVertex ------>  IntermediateResult
ExecutionVertex ------>  IntermediateResultPartition

一个ExecutionJobVertex可能包含多个(n)个IntermediateResult，那实际上每一个并行的子任务ExecutionVertex可能会包含(n)个IntermediateResultPartition。IntermediateResultPartition的生产者是ExecutionVertex，消费者是一个或若干个ExecutionEdge。

ExecutionEdge表示ExecutionVertex的输入，通过ExecutionEdge将ExecutionVertex和IntermediateResultPartition连接起来，进而在不同的ExecutionVertex之间建立联系。

public class ExecutionEdge {
	private final IntermediateResultPartition source;
	private final ExecutionVertex target;
	private final int inputNum;
}

最终到目前为止，我们了解了StreamGraph，JobGraph和ExecutionGraph的生成过程，以及他们内部的节点和连接的对应关系。总的来说，streamGraph是最原始的，更贴近用户逻辑的DAG执行图；JobGraph是对StreamGraph的进一步优化，将能够合并的算子合并为一个节点以降低运行时数据传输的开销；ExecutionGraph则是作业运行时用来调度的执行图，可以看作是并行化版本的JobGraph，将DAG拆分到基本的调度单元。

最终可以得到从StreamGraph到JobGraph再到ExecutionGraph的流程转换图；

StreamGraph：

JobGraph：

ExecutionGraph：

JsonPlan：
{
   "jid": "5ec92c59c91b7b9ac75dd86d16188946",
   "name": "word count run",
   "nodes": [
       {
           "id": "90bea66de1c231edf33913ecd54406c1",
           "parallelism": 1,
           "operator": "",
           "operator_strategy": "",
           "description": "Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$2, PassThroughWindowFunction) -> Filter -> Sink: Print to Std. Out",
           "inputs": [
               {
                   "num": 0,
                   "id": "cbc357ccb763df2852fee8c4fc7d55f2",
                   "ship_strategy": "HASH",
                   "exchange": "pipelined_bounded"
               }
           ],
           "optimizer_properties": {}
       },
       {
           "id": "cbc357ccb763df2852fee8c4fc7d55f2",
           "parallelism": 1,
           "operator": "",
           "operator_strategy": "",
           "description": "Source: Custom Source -> Map",
           "optimizer_properties": {}
       }
   ]
}