First of all, thanks to 尚硅谷!!!
Note: these are my personal study notes.
Version: flink-1.12
Quite a few classes show up in this walkthrough, but once you sort out how they relate, you'll find only a handful of classes really matter, and not many methods either. ("Main methods" here means the methods from which most of the others are called; grab those and the analysis follows.)
The classes and methods analyzed this time:
the CliFrontend class and its main and run methods (we follow the run command; the other commands are skipped)
the AbstractJobClusterExecutor class and its execute method
the YarnClusterDescriptor class and its deployJobCluster and deployInternal methods
Let's put two methods up front for a first look; once these two are analyzed, most of this walkthrough is done.
In the CliFrontend class:
the main method
public static void main(final String[] args) {
    EnvironmentInformation.logEnvironmentInfo(LOG, "Command Line Client", args);

    // 1. find the configuration directory
    /*TODO get the path of Flink's conf directory*/
    final String configurationDirectory = getConfigurationDirectoryFromEnv();

    // 2. load the global configuration
    /*TODO load the configuration from that conf path*/
    final Configuration configuration = GlobalConfiguration.loadConfiguration(configurationDirectory);

    // 3. load the custom command lines
    /*TODO wrap the command-line interfaces, in order: Generic, Yarn, Default*/
    final List<CustomCommandLine> customCommandLines = loadCustomCommandLines(
            configuration,
            configurationDirectory);

    try {
        final CliFrontend cli = new CliFrontend(
                configuration,
                customCommandLines);

        SecurityUtils.install(new SecurityConfiguration(cli.configuration));
        int retCode = SecurityUtils.getInstalledContext()
                .runSecured(() -> cli.parseAndRun(args));
        System.exit(retCode);
    }
    catch (Throwable t) {
        final Throwable strippedThrowable = ExceptionUtils.stripException(t, UndeclaredThrowableException.class);
        LOG.error("Fatal error while running command line interface.", strippedThrowable);
        strippedThrowable.printStackTrace();
        System.exit(31);
    }
}
the run method
protected void run(String[] args) throws Exception {
    LOG.info("Running 'run' command.");

    /*TODO get the default options for the 'run' action*/
    final Options commandOptions = CliFrontendParser.getRunCommandOptions();
    /*TODO parse against the options the user actually specified*/
    final CommandLine commandLine = getCommandLine(commandOptions, args, true);

    // evaluate help flag
    if (commandLine.hasOption(HELP_OPTION.getOpt())) {
        CliFrontendParser.printHelpForRun(customCommandLines);
        return;
    }

    /*TODO in the order they were added, check which one is active: Generic, Yarn, Default*/
    final CustomCommandLine activeCommandLine =
            validateAndGetActiveCommandLine(checkNotNull(commandLine));

    final ProgramOptions programOptions = ProgramOptions.create(commandLine);

    /*TODO get the user's jar and its dependencies*/
    final List<URL> jobJars = getJobJarAndDependencies(programOptions);

    /*TODO build the effective configuration: HA id, target (session, per-job), JobManager memory, TaskManager memory, slots per TM, ...*/
    final Configuration effectiveConfiguration = getEffectiveConfiguration(
            activeCommandLine, commandLine, programOptions, jobJars);

    LOG.debug("Effective executor configuration: {}", effectiveConfiguration);

    final PackagedProgram program = getPackagedProgram(programOptions, effectiveConfiguration);

    try {
        /*TODO execute the program*/
        executeProgram(effectiveConfiguration, program);
    } finally {
        program.deleteExtractedLibraries();
    }
}
0. First, we need a starting point
The launch command is: flink run -t yarn-per-job -c ... xxx.jar
Open the flink script under bin; its last line execs the JVM on the entry class (roughly — eliding the classpath construction, and the exact line may differ slightly across versions):
exec "${JAVA_RUN}" $JVM_ARGS $FLINK_ENV_JAVA_OPTS "${log_setting[@]}" -classpath ... org.apache.flink.client.cli.CliFrontend "$@"
That class, org.apache.flink.client.cli.CliFrontend, is the starting point of our analysis.
Open the source code and find the class.
We only read the main code paths: null checks, unmodifiable wrappers, security wrapping, exception handling and the like are skipped, and when a call goes through many layers we only pull out the last one.
Find the main method.
1. Get the conf path
final String configurationDirectory = getConfigurationDirectoryFromEnv(); // resolves to the conf directory under FLINK_HOME (the FLINK_CONF_DIR environment variable is checked first)
2. Load the configuration from the conf path
final Configuration configuration = GlobalConfiguration.loadConfiguration(configurationDirectory);
This calls GlobalConfiguration's loadConfiguration method with the conf path.
Inside loadConfiguration:
final File yamlConfigFile = new File(confDirFile, FLINK_CONF_FILENAME); // FLINK_CONF_FILENAME = "flink-conf.yaml"
Configuration configuration = loadYAMLResource(yamlConfigFile); // loads the config: loadYAMLResource reads the file and, in a loop, calls config.setString(key, value), finally returning the config; a simplified sketch of that loop follows
return configuration;
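As a hedged illustration of that loop — a simplified re-implementation, not Flink's actual code (the real loadYAMLResource also logs warnings for malformed lines and strips trailing comments, and uses Configuration.setString where this sketch uses a plain Map):

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.HashMap;
import java.util.Map;

public class ConfSketch {
    public static void main(String[] args) throws Exception {
        Map<String, String> config = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(new FileReader("conf/flink-conf.yaml"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                // skip blank lines and comment lines
                if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                    continue;
                }
                // flink-conf.yaml is "flat YAML": one "key: value" pair per line
                int idx = trimmed.indexOf(':');
                if (idx > 0) {
                    config.put(trimmed.substring(0, idx).trim(), trimmed.substring(idx + 1).trim());
                }
            }
        }
        System.out.println(config);
    }
}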
3. Wrap the command-line interfaces: in order, GenericCLI, FlinkYarnSessionCli, DefaultCLI, giving us the customCommandLines object
final List<CustomCommandLine> customCommandLines = loadCustomCommandLines(
        configuration,
        configurationDirectory);
loadCustomCommandLines is called with the configuration and configurationDirectory obtained in the previous two steps.
It returns an ArrayList with three elements: GenericCLI, FlinkYarnSessionCli, DefaultCLI.
public static List<CustomCommandLine> loadCustomCommandLines(Configuration configuration, String configurationDirectory) {
    List<CustomCommandLine> customCommandLines = new ArrayList<>();

    // first
    customCommandLines.add(new GenericCLI(configuration, configurationDirectory));

    // second
    final String flinkYarnSessionCLI = "org.apache.flink.yarn.cli.FlinkYarnSessionCli";
    customCommandLines.add(
            loadCustomCommandLine(flinkYarnSessionCLI,
                    configuration,
                    configurationDirectory,
                    "y",
                    "yarn"));

    // third
    customCommandLines.add(new DefaultCLI());

    return customCommandLines;
}
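Why a String class name instead of just new FlinkYarnSessionCli(...)? flink-yarn is an optional dependency, so loadCustomCommandLine instantiates the class reflectively and the client still works when it is absent from the classpath. A paraphrased sketch (error handling dropped; the constructor signature is from memory, so treat details as approximate):

Class<? extends CustomCommandLine> cliClass =
        Class.forName("org.apache.flink.yarn.cli.FlinkYarnSessionCli").asSubclass(CustomCommandLine.class);
java.lang.reflect.Constructor<? extends CustomCommandLine> ctor =
        cliClass.getConstructor(Configuration.class, String.class, String.class, String.class);
CustomCommandLine yarnCli = ctor.newInstance(configuration, configurationDirectory, "y", "yarn");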
4. Create the CliFrontend object
final CliFrontend cli = new CliFrontend(
        configuration,
        customCommandLines);
One thing worth noting in this step:
the constructor iterates over the three elements of customCommandLines and pulls each one's Options into customCommandLineOptions; later, this customCommandLineOptions gets merged with the run command's own options.
this.customCommandLineOptions = new Options();
for (CustomCommandLine customCommandLine : customCommandLines) {
    customCommandLine.addGeneralOptions(customCommandLineOptions);
    customCommandLine.addRunOptions(customCommandLineOptions);
}
So what gets pulled in? For example, FlinkYarnSessionCli (constructed above with prefixes "y"/"yarn") contributes the YARN-specific options such as -yid/--yarnapplicationId; a hedged sketch of the pattern follows.
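Paraphrased from FlinkYarnSessionCli (the exact option set and description strings may differ; treat this as an illustration of the pattern, not the literal source):

// inside FlinkYarnSessionCli's constructor, options are built with the prefixes,
// which is where names like -yid / --yarnapplicationId come from
applicationId = new Option(shortPrefix + "id", longPrefix + "applicationId", true,
        "Attach to running YARN session");

// and addGeneralOptions/addRunOptions hand them over to the shared Options object
@Override
public void addGeneralOptions(Options baseOptions) {
    super.addGeneralOptions(baseOptions);
    baseOptions.addOption(applicationId);
}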
5. cli.parseAndRun(args)
SecurityUtils.install(new SecurityConfiguration(cli.configuration));
int retCode = SecurityUtils.getInstalledContext()
        .runSecured(() -> cli.parseAndRun(args));
// looks like a lot, but in essence it just runs cli.parseAndRun(args)
6. Parse and dispatch the command
public int parseAndRun(String[] args) {
    // get action
    // say our command is: flink run -t yarn-per-job -c ... xxx.jar
    // then args[0] is "run"
    String action = args[0];

    // remove action from parameters
    final String[] params = Arrays.copyOfRange(args, 1, args.length);

    // do action
    // action is "run", so we take the first branch below
    switch (action) {
        case ACTION_RUN: // ACTION_RUN = "run";
            run(params); // our path; the remaining params are passed along. The other branches are skipped
            return 0;
        case ACTION_RUN_APPLICATION:
            runApplication(params);
            return 0;
        case ACTION_LIST:
            list(params);
            return 0;
        case ACTION_INFO:
            info(params);
            return 0;
        case ACTION_CANCEL:
            cancel(params);
            return 0;
        case ACTION_STOP:
            stop(params);
            return 0;
        case ACTION_SAVEPOINT:
            savepoint(params);
            return 0;
        case "-h":
        case "--help":
            CliFrontendParser.printHelp(customCommandLines);
            return 0;
        case "-v":
        case "--version":
            ...
            return 0;
        default:
            ...
            return 1;
    }
}
That's it for main; now on to the run method.
7. Get the default options for the run action
final Options commandOptions = CliFrontendParser.getRunCommandOptions();
// this one is for the run command; other commands fetch their own option sets
// e.g. list uses CliFrontendParser.getListCommandOptions()
CliFrontendParser is a simple command-line parser that collects the options: it consists of a pile of Option constants plus methods like getRunCommandOptions() and getListCommandOptions(), each adding a different set of Options.
public static Options getRunCommandOptions() {
    Options options = buildGeneralOptions(new Options());
    options = getProgramSpecificOptions(options);
    options.addOption(SAVEPOINT_PATH_OPTION);
    return options.addOption(SAVEPOINT_ALLOW_NON_RESTORED_OPTION);
}
That adds a lot of run-related options; there is no need to read them all.
Example: the SAVEPOINT_ALLOW_NON_RESTORED_OPTION used above is one of the Option constants inside CliFrontendParser; a reconstruction of its definition follows.
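Reconstructed from CliFrontendParser (the description string is paraphrased from memory, so treat the wording as approximate):

static final Option SAVEPOINT_ALLOW_NON_RESTORED_OPTION =
        new Option("n", "allowNonRestoredState", false,
                "Allow to skip savepoint state that cannot be restored. " +
                        "You need to allow this if you removed an operator from your " +
                        "program that was part of the program when the savepoint was triggered.");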
8. Parse the options
final CommandLine commandLine = getCommandLine(commandOptions, args, true);
getCommandLine takes the run options from the previous step plus the args, and returns a CommandLine object:
// merge the two Options objects: commandOptions is the argument we passed in, while customCommandLineOptions is the one built in step 4 when CliFrontend was instantiated
final Options commandLineOptions = CliFrontendParser.mergeOptions(commandOptions, customCommandLineOptions);
// start parsing; from here it is layer after layer of calls, mostly because parse has many overloads
return CliFrontendParser.parse(commandLineOptions, args, stopAtNonOptions);
- The CommandLine class is essentially its options member plus various operations on it (there is also an args member holding the unrecognized tokens, i.e. the parameters that matched no Option)
private final List<Option> options = new ArrayList<Option>(); // every matched Option ends up here via addOption(option)
The parse() calls pass through several layers and bottom out in the DefaultParser class,
whose parse method loops over args and finally yields the CommandLine object:
cmd = new CommandLine();

for (String argument : arguments)
{
    handleToken(argument);
}

return cmd;
handleToken is where all the token-by-token parsing happens: tokens starting with --, tokens starting with -, tokens containing =. Whenever a token matches exactly one Option (the list returned by getMatchingOptions has one and only one element) it goes into cmd, and option values are handled along the way. No need to dig deeper; a standalone demo follows the method.
private void handleToken(String token) throws ParseException
{
    currentToken = token;

    if (skipParsing)
    {
        cmd.addArg(token);
    }
    else if ("--".equals(token))
    {
        skipParsing = true;
    }
    else if (currentOption != null && currentOption.acceptsArg() && isArgument(token))
    {
        currentOption.addValueForProcessing(Util.stripLeadingAndTrailingQuotes(token));
    }
    else if (token.startsWith("--"))
    {
        handleLongOption(token);
    }
    else if (token.startsWith("-") && !"-".equals(token))
    {
        handleShortAndLongOption(token);
    }
    else
    {
        handleUnknownToken(token);
    }

    if (currentOption != null && !currentOption.acceptsArg())
    {
        currentOption = null;
    }
}
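To see this machinery in isolation, here is a small standalone commons-cli demo (not Flink code; the class and the exact option descriptions are made up for illustration) that mirrors how flink run parses -t/-c and leaves the jar path in the leftover args:

import org.apache.commons.cli.CommandLine;
import org.apache.commons.cli.DefaultParser;
import org.apache.commons.cli.Option;
import org.apache.commons.cli.Options;

public class ParseDemo {
    public static void main(String[] args) throws Exception {
        Options options = new Options();
        options.addOption(Option.builder("t").longOpt("target").hasArg()
                .desc("deployment target, e.g. yarn-per-job").build());
        options.addOption(Option.builder("c").longOpt("class").hasArg()
                .desc("entry-point class").build());

        // stopAtNonOption=true: the first unrecognized token (the jar path) and
        // everything after it land in cmd.getArgs() instead of failing the parse
        CommandLine cmd = new DefaultParser().parse(options,
                new String[]{"-t", "yarn-per-job", "-c", "com.example.MyJob", "job.jar"},
                true);

        System.out.println(cmd.getOptionValue("t")); // yarn-per-job
        System.out.println(cmd.getArgList());        // [job.jar]
    }
}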
9. Figure out which command-line client we are using (i.e. which mode we launched with)
final CustomCommandLine activeCommandLine = validateAndGetActiveCommandLine(checkNotNull(commandLine));
Each CustomCommandLine subclass implements boolean isActive(CommandLine commandLine); this decides whether it is active, i.e. which command-line client (CustomCommandLine) is actually in use.
The three clients in customCommandLines from step 3 were added in a fixed order; here they are checked one by one, and the last one, DefaultCLI, always reports itself active.
GenericCLI's isActive:
public boolean isActive(CommandLine commandLine) {
    return configuration.getOptional(DeploymentOptions.TARGET).isPresent()
            || commandLine.hasOption(executorOption.getOpt()) // is -e present? (deprecated)
            || commandLine.hasOption(targetOption.getOpt()); // is -t present?
}
FlinkYarnSessionCli's isActive:
public boolean isActive(CommandLine commandLine) {
    if (!super.isActive(commandLine)) { // see its parent class below
        return (isYarnPropertiesFileMode(commandLine) && yarnApplicationIdFromYarnProperties != null);
    }
    return true;
}
Its abstract parent class's isActive:
public boolean isActive(CommandLine commandLine) {
    final String jobManagerOption = commandLine.getOptionValue(addressOption.getOpt(), null);
    /*TODO ID is the fixed string "yarn-cluster"*/
    final boolean yarnJobManager = ID.equals(jobManagerOption);
    /*TODO check whether an AppID for a Yarn session exists*/
    final boolean hasYarnAppId = commandLine.hasOption(applicationId.getOpt())
            || configuration.getOptional(YarnConfigOptions.APPLICATION_ID).isPresent();
    final boolean hasYarnExecutor = YarnSessionClusterExecutor.NAME.equalsIgnoreCase(configuration.get(DeploymentOptions.TARGET))
            || YarnJobClusterExecutor.NAME.equalsIgnoreCase(configuration.get(DeploymentOptions.TARGET));
    /*TODO active if: -m yarn-cluster, or a yarn appID exists (in config or on the command line), or the executor is a yarn one*/
    return hasYarnExecutor || yarnJobManager || hasYarnAppId;
}
// YarnSessionClusterExecutor.NAME is "yarn-session"
// YarnJobClusterExecutor.NAME is "yarn-per-job"
DefaultCLI's isActive simply returns true:
public boolean isActive(CommandLine commandLine) {
    // always active because we can try to read a JobManager address from the config
    return true;
}
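For completeness, validateAndGetActiveCommandLine itself is essentially this loop (a simplified sketch; the real method also logs what it checks). Because DefaultCLI always answers true, the loop is guaranteed to return something:

public CustomCommandLine validateAndGetActiveCommandLine(CommandLine commandLine) {
    // walk the list in registration order: Generic, Yarn, Default
    for (CustomCommandLine cli : customCommandLines) {
        if (cli.isActive(commandLine)) {
            return cli;
        }
    }
    throw new IllegalStateException("No valid command-line found.");
}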
10. Wrap the command line into a ProgramOptions
final ProgramOptions programOptions = ProgramOptions.create(commandLine);
11. Get the user's jar and its dependencies
final List<URL> jobJars = getJobJarAndDependencies(programOptions);
12. Build the effective configuration: HA id, target (session, per-job), JobManager memory, TaskManager memory, slots per TM, ...
final Configuration effectiveConfiguration = getEffectiveConfiguration(activeCommandLine, commandLine, programOptions, jobJars);
Inside the getEffectiveConfiguration method:
final Configuration commandLineConfiguration = checkNotNull(activeCustomCommandLine).toConfiguration(commandLine); // toConfiguration is an interface method; we look at AbstractCustomCommandLine's implementation
// it creates a Configuration object and calls a series of setters on it, i.e. it lifts certain commandLine entries into a Configuration; a paraphrased sketch follows
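A paraphrased sketch of AbstractCustomCommandLine.toConfiguration (simplified and partly from memory, so treat the details as approximate; the real method does more validation):

public Configuration toConfiguration(CommandLine commandLine) throws FlinkException {
    final Configuration resultingConfiguration = new Configuration();
    if (commandLine.hasOption(addressOption.getOpt())) { // -m, e.g. "-m yarn-cluster"
        String addressWithPort = commandLine.getOptionValue(addressOption.getOpt());
        // parse host:port and set the JobManager address/port keys on the config
        setJobManagerAddressInConfig(resultingConfiguration, ClientUtils.parseHostPortAddress(addressWithPort));
    }
    if (commandLine.hasOption(zookeeperNamespaceOption.getOpt())) { // -z
        String zkNamespace = commandLine.getOptionValue(zookeeperNamespaceOption.getOpt());
        resultingConfiguration.setString(HighAvailabilityOptions.HA_CLUSTER_ID, zkNamespace);
    }
    return resultingConfiguration;
}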
13. Wrap programOptions and effectiveConfiguration into a PackagedProgram
final PackagedProgram program = getPackagedProgram(programOptions, effectiveConfiguration);
14. Execute the program
executeProgram(effectiveConfiguration, program);
This lands in ClientUtils' executeProgram method:
final ClassLoader userCodeClassLoader = program.getUserCodeClassLoader();
final ClassLoader contextClassLoader = Thread.currentThread().getContextClassLoader();

/*TODO set up the execution-environment context; getExecutionEnvironment in user code picks up exactly this environment info*/
ContextEnvironment.setAsContext(
        executorServiceLoader,
        configuration,
        userCodeClassLoader,
        enforceSingleJobExecution,
        suppressSysout);

StreamContextEnvironment.setAsContext(
        executorServiceLoader,
        configuration,
        userCodeClassLoader,
        enforceSingleJobExecution,
        suppressSysout);

program.invokeInteractiveModeForExecution(); // this in turn calls callMainMethod(mainClass, args);

Inside the callMainMethod method:
mainMethod = entryClass.getMethod("main", String[].class);
/*TODO invoke the user code's main method*/
mainMethod.invoke(null, (Object) args);
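Stripped of everything Flink-specific, that reflective call looks like this (com.example.MyJob and the arguments are hypothetical; in Flink the class comes from -c or the jar manifest):

import java.lang.reflect.Method;

public class InvokeMainDemo {
    public static void main(String[] args) throws Exception {
        Class<?> entryClass = Class.forName("com.example.MyJob");
        Method mainMethod = entryClass.getMethod("main", String[].class);
        // the (Object) cast stops varargs from spreading the array into N separate args
        mainMethod.invoke(null, (Object) new String[]{"--input", "hdfs:///data"});
    }
}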
15. The execute method, the one the main() we wrote calls at the very end
It lives in the StreamExecutionEnvironment class, in the org.apache.flink.streaming.api.environment package.
// drilling in layer by layer:
execute(String jobName)
execute(getStreamGraph(jobName)) // which builds the StreamGraph and continues
final JobClient jobClient = executeAsync(streamGraph); // which calls this
execute(streamGraph, configuration, userClassloader) // one more layer down; this execute is declared on the PipelineExecutor interface (a user-side example is sketched below)
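To tie the two ends together, here is a trivial user job (the class DemoJob is hypothetical) whose final line enters exactly the call chain above:

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class DemoJob {
    public static void main(String[] args) throws Exception {
        // because executeProgram() called setAsContext() beforehand, this returns
        // a StreamContextEnvironment wired to the configured PipelineExecutor
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements("a", "b", "a").print();

        // this is the execute(String jobName) at the top of the call chain above
        env.execute("demo-job");
    }
}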
Implementations of PipelineExecutor (from memory, so treat the exact list as approximate) include LocalExecutor, RemoteExecutor, YarnSessionClusterExecutor, KubernetesSessionClusterExecutor and YarnJobClusterExecutor; the last one extends AbstractJobClusterExecutor.
Taking AbstractJobClusterExecutor as the example:
here is its execute method in full; the analysis starts below.
public CompletableFuture<JobClient> execute(@Nonnull final Pipeline pipeline, @Nonnull final Configuration configuration, @Nonnull final ClassLoader userCodeClassloader) throws Exception {
    /*TODO turn the StreamGraph into a JobGraph*/
    final JobGraph jobGraph = PipelineExecutorUtils.getJobGraph(pipeline, configuration);

    /*TODO cluster descriptor: creates and starts a YarnClient; carries some yarn/flink configuration and environment info*/
    try (final ClusterDescriptor<ClusterID> clusterDescriptor = clusterClientFactory.createClusterDescriptor(configuration)) {
        final ExecutionConfigAccessor configAccessor = ExecutionConfigAccessor.fromConfiguration(configuration);

        /*TODO cluster-specific resource configuration: JobManager memory, TaskManager memory, slots per TM*/
        final ClusterSpecification clusterSpecification = clusterClientFactory.getClusterSpecification(configuration);

        final ClusterClientProvider<ClusterID> clusterClientProvider = clusterDescriptor
                .deployJobCluster(clusterSpecification, jobGraph, configAccessor.getDetachedMode());
        LOG.info("Job has been submitted with JobID " + jobGraph.getJobID());

        return CompletableFuture.completedFuture(
                new ClusterClientJobClientAdapter<>(clusterClientProvider, jobGraph.getJobID(), userCodeClassloader));
    }
}
16. Turn the StreamGraph into a JobGraph
final JobGraph jobGraph = PipelineExecutorUtils.getJobGraph(pipeline, configuration);
17. Create the cluster descriptor
final ClusterDescriptor<ClusterID> clusterDescriptor = clusterClientFactory.createClusterDescriptor(configuration)
This in turn calls getClusterDescriptor(configuration), which creates, initializes and starts the YarnClient, then returns a YarnClusterDescriptor:
private YarnClusterDescriptor getClusterDescriptor(Configuration configuration) {
    /*TODO create the YarnClient*/
    final YarnClient yarnClient = YarnClient.createYarnClient();
    final YarnConfiguration yarnConfiguration = new YarnConfiguration();

    /*TODO initialize and start the YarnClient*/
    yarnClient.init(yarnConfiguration);
    yarnClient.start();

    return new YarnClusterDescriptor(
            configuration,
            yarnConfiguration,
            yarnClient,
            YarnClientYarnClusterInformationRetriever.create(yarnClient),
            false);
}
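The YarnClient lifecycle here is plain Hadoop API; a self-contained sanity check (assuming yarn-site.xml/core-site.xml are on the classpath, e.g. via HADOOP_CONF_DIR, just like in the Flink code above) looks like this:

import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class YarnClientDemo {
    public static void main(String[] args) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();
        try {
            // list the applications currently known to the ResourceManager
            yarnClient.getApplications().forEach(report ->
                    System.out.println(report.getApplicationId() + " " + report.getYarnApplicationState()));
        } finally {
            yarnClient.stop();
        }
    }
}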
18. Cluster resource configuration
final ClusterSpecification clusterSpecification = clusterClientFactory.getClusterSpecification(configuration);
In the AbstractContainerizedClusterClientFactory class:
public ClusterSpecification getClusterSpecification(Configuration configuration) {
    checkNotNull(configuration);

    final int jobManagerMemoryMB = JobManagerProcessUtils.processSpecFromConfigWithNewOptionToInterpretLegacyHeap(
            configuration,
            JobManagerOptions.TOTAL_PROCESS_MEMORY)
            .getTotalProcessMemorySize()
            .getMebiBytes();

    final int taskManagerMemoryMB = TaskExecutorProcessUtils
            .processSpecFromConfig(TaskExecutorProcessUtils.getConfigurationMapLegacyTaskManagerHeapSizeToConfigOption(
                    configuration, TaskManagerOptions.TOTAL_PROCESS_MEMORY))
            .getTotalProcessMemorySize()
            .getMebiBytes();

    int slotsPerTaskManager = configuration.getInteger(TaskManagerOptions.NUM_TASK_SLOTS);

    return new ClusterSpecification.ClusterSpecificationBuilder()
            .setMasterMemoryMB(jobManagerMemoryMB)
            .setTaskManagerMemoryMB(taskManagerMemoryMB)
            .setSlotsPerTaskManager(slotsPerTaskManager)
            .createClusterSpecification();
}
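The three numbers above come straight from the configuration keys jobmanager.memory.process.size, taskmanager.memory.process.size and taskmanager.numberOfTaskSlots. With -t yarn-per-job they can therefore be set per submission through generic -D options, e.g. (an illustrative command; the jar and class names are placeholders):

flink run -t yarn-per-job \
  -Djobmanager.memory.process.size=1600m \
  -Dtaskmanager.memory.process.size=1728m \
  -Dtaskmanager.numberOfTaskSlots=2 \
  -c com.example.MyJob xxx.jar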
19. Create the appContext and submit the application
final ClusterClientProvider<ClusterID> clusterClientProvider = clusterDescriptor
        .deployJobCluster(clusterSpecification, jobGraph, configAccessor.getDetachedMode());
This calls:
the deployJobCluster method
return deployInternal(
        clusterSpecification,
        "Flink per-job cluster",
        getYarnJobClusterEntrypoint(),
        jobGraph,
        detached);
the deployInternal method
// validation first
isReadyForDeployment(clusterSpecification);
checkYarnQueues(yarnClient);

/*TODO start the ApplicationMaster*/
ApplicationReport report = startAppMaster(
        flinkConfiguration,
        applicationName,
        yarnClusterEntrypoint,
        jobGraph,
        yarnClient,
        yarnApplication,
        validClusterSpecification);
the startAppMaster method
This part packs the configuration, jar files and everything else the job needs into the appContext; that appContext is what finally gets submitted. The messier details can be skimmed:
/*TODO initialize/create Hadoop's FileSystem*/
org.apache.flink.core.fs.FileSystem.initialize(
        configuration,
        PluginUtils.createPluginManagerFromRootFolder(configuration));
final FileSystem fs = FileSystem.get(yarnConfiguration);

// create the application submission context: the parameter that gets submitted at the end
ApplicationSubmissionContext appContext = yarnApplication.getApplicationSubmissionContext();

/*TODO the Yarn application file uploader: the FS and the HDFS path it maps to; used to upload the user jar, Flink's dependencies and Flink's config files*/
final YarnApplicationFileUploader fileUploader = YarnApplicationFileUploader.from(
        fs,
        getStagingDir(fs),
        providedLibDirs,
        appContext.getApplicationId(),
        getFileReplication());

/*TODO HA configuration: number of retries, default 2*/
/*TODO add the user jar*/
/*TODO upload Flink's configuration file, flink-conf.yaml*/
/*TODO jobmanager memory configuration*/
...
fileUploader.close();
// build the appMasterEnv
// Setup CLASSPATH and environment variables for ApplicationMaster
/*TODO create the Map that holds the AM's environment variables and classpath*/
final Map<String, String> appMasterEnv = new HashMap<>();
// what follows is a pile of puts; skim as needed

// set user specified app master environment variables
appMasterEnv.putAll(
        ConfigurationUtils.getPrefixedKeyValuePairs(ResourceManagerOptions.CONTAINERIZED_MASTER_ENV_PREFIX, configuration));
// set Flink app class path
appMasterEnv.put(YarnConfigKeys.ENV_FLINK_CLASSPATH, classPathBuilder.toString());

// set Flink on YARN internal configuration values
appMasterEnv.put(YarnConfigKeys.FLINK_DIST_JAR, localResourceDescFlinkJar.toString());
appMasterEnv.put(YarnConfigKeys.ENV_APP_ID, appId.toString());
appMasterEnv.put(YarnConfigKeys.ENV_CLIENT_HOME_DIR, fileUploader.getHomeDir().toString());
appMasterEnv.put(YarnConfigKeys.ENV_CLIENT_SHIP_FILES, encodeYarnLocalResourceDescriptorListToString(fileUploader.getEnvShipResourceList()));
appMasterEnv.put(YarnConfigKeys.ENV_ZOOKEEPER_NAMESPACE, getZookeeperNamespace());
appMasterEnv.put(YarnConfigKeys.FLINK_YARN_FILES, fileUploader.getApplicationDir().toUri().toString());

// https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnApplicationSecurity.md#identity-on-an-insecure-cluster-hadoop_user_name
appMasterEnv.put(YarnConfigKeys.ENV_HADOOP_USER_NAME, UserGroupInformation.getCurrentUser().getUserName());

if (localizedKeytabPath != null) {
    appMasterEnv.put(YarnConfigKeys.LOCAL_KEYTAB_PATH, localizedKeytabPath);
    String principal = configuration.getString(SecurityOptions.KERBEROS_LOGIN_PRINCIPAL);
    appMasterEnv.put(YarnConfigKeys.KEYTAB_PRINCIPAL, principal);
    if (remotePathKeytab != null) {
        appMasterEnv.put(YarnConfigKeys.REMOTE_KEYTAB_PATH, remotePathKeytab.toString());
    }
}

// To support Yarn Secure Integration Test Scenario
if (remoteYarnSiteXmlPath != null) {
    appMasterEnv.put(YarnConfigKeys.ENV_YARN_SITE_XML_PATH, remoteYarnSiteXmlPath.toString());
}
if (remoteKrb5Path != null) {
    appMasterEnv.put(YarnConfigKeys.ENV_KRB5_PATH, remoteKrb5Path.toString());
}

// set classpath from YARN configuration
Utils.setupYarnClassPath(yarnConfiguration, appMasterEnv);

/*TODO set the Map built above (the AM's environment info and classpath) on the container*/
amContainer.setEnvironment(appMasterEnv);
// build the capability
// Set up resource type requirements for ApplicationMaster
Resource capability = Records.newRecord(Resource.class);
capability.setMemory(clusterSpecification.getMasterMemoryMB());
capability.setVirtualCores(flinkConfiguration.getInteger(YarnConfigOptions.APP_MASTER_VCORES));

final String customApplicationName = customName != null ? customName : applicationName;

// fill the appContext: the amContainer, capability, etc. built above are set into it
// (plenty of other appContext setters were called earlier too; the messy ones are not shown here)
appContext.setApplicationName(customApplicationName);
appContext.setApplicationType(applicationType != null ? applicationType : "Apache Flink");
appContext.setAMContainerSpec(amContainer);
appContext.setResource(capability);
...

// submit the application
/*TODO submit the application*/
yarnClient.submitApplication(appContext); // and from here on we are in org.apache.hadoop.yarn.client.api — YARN territory
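Zooming out, the submit-and-poll pattern around yarnClient is standard YARN client code; a minimal skeleton (pure Hadoop API, with the AM container setup elided — Flink's deployInternal/startAppMaster does essentially this plus all the packaging above) looks like:

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SubmitDemo {
    public static void main(String[] args) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        // 1. ask the RM for a new application (Flink does this earlier, in deployInternal)
        YarnClientApplication app = yarnClient.createApplication();
        // 2. the same kind of appContext object that startAppMaster fills in above
        ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
        // ... setAMContainerSpec / setResource / setApplicationName as shown above ...

        // 3. submit, then poll the report until the AM is up
        ApplicationId appId = yarnClient.submitApplication(appContext);
        ApplicationReport report = yarnClient.getApplicationReport(appId);
        while (report.getYarnApplicationState() == YarnApplicationState.ACCEPTED) {
            Thread.sleep(250);
            report = yarnClient.getApplicationReport(appId);
        }
        System.out.println("state: " + report.getYarnApplicationState());
    }
}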