Flume-1.6.0 Partial Source Code Analysis

License: CC 4.0 BY-SA

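This article walks through the startup process of Flume-1.6.0: the Application entry point, configuration monitoring by PollingPropertiesFileConfigurationProvider, how components are started and supervised, and the configuration loading mechanism. It describes how the configurationProvider's start method periodically checks the configuration file for changes, how the eventBus coordinates component updates, and how a Source passes data to a Channel.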

1. Flume's entry point: Application.java
The key code in the main function is as follows:
EventBus eventBus = new EventBus(agentName + "-event-bus");
PollingPropertiesFileConfigurationProvider configurationProvider =
    new PollingPropertiesFileConfigurationProvider(agentName, configurationFile, eventBus, 30);
components.add(configurationProvider);
application = new Application(components);
logger.info("add by  徐国坤 EventBus");
eventBus.register(application);
// ... (omitted) ...
application.start();
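To make the register/post relationship concrete, here is a minimal sketch of the publish/subscribe pattern in plain Java. MiniEventBus and EventBusSketch are hypothetical names made up for illustration. This is not Guava's EventBus and not Flume's code, only the same idea reduced to a list of callbacks: whatever has been registered gets called each time something is posted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical stand-in for an event bus; this is NOT Guava's EventBus. */
class MiniEventBus<T> {
    private final List<Consumer<T>> subscribers = new ArrayList<>();

    /** Plays the role of eventBus.register(application). */
    void register(Consumer<T> subscriber) {
        subscribers.add(subscriber);
    }

    /** Plays the role of eventBus.post(...): every subscriber sees the event. */
    void post(T event) {
        for (Consumer<T> subscriber : subscribers) {
            subscriber.accept(event);
        }
    }
}

class EventBusSketch {
    /** Posts two fake configurations and returns what the subscriber recorded. */
    static List<String> demo() {
        List<String> applied = new ArrayList<>();
        MiniEventBus<String> bus = new MiniEventBus<>();
        // The lambda stands in for the Application object that reacts to new configs.
        bus.register(conf -> applied.add("reconfigured with " + conf));
        bus.post("conf-v1");
        bus.post("conf-v2");
        return applied;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

The point of the indirection is that the poster (the configuration provider) never needs a direct reference to the object that applies the configuration.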
2. About PollingPropertiesFileConfigurationProvider configurationProvider = new PollingPropertiesFileConfigurationProvider(agentName, configurationFile, eventBus, 30)
Key code:
PollingPropertiesFileConfigurationProvider has one key method, start:
public void start() {
  Preconditions.checkState(file != null, "parameter file must not be null");
  // Single-threaded scheduler dedicated to polling the configuration file
  executorService = Executors.newSingleThreadScheduledExecutor(
      new ThreadFactoryBuilder().setNameFormat("conf-file-poller-%d")
          .build());
  FileWatcherRunnable fileWatcherRunnable =
      new FileWatcherRunnable(file, counterGroup);
  // Run the watcher immediately, then again `interval` seconds after each run finishes
  executorService.scheduleWithFixedDelay(fileWatcherRunnable, 0, interval, TimeUnit.SECONDS);
  lifecycleState = LifecycleState.START;
  LOGGER.debug("Configuration provider started");
}
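The main purpose of this code is to check the configuration file for changes at a fixed interval (30 seconds here, the last constructor argument), so a change is picked up within one polling interval rather than instantly. The key statement inside fileWatcherRunnable is eventBus.post(getConfiguration()): when the configuration file has changed, getConfiguration() re-reads it and the result is posted to the eventBus, which causes the Sources, Channels and Sinks to be reloaded. All of this is transparent to the user.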
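The polling idea can be sketched in a few lines of plain Java. This is a simplified illustration, not Flume's FileWatcherRunnable: the class name ConfigFilePollerSketch, the checkOnce method and the onChange callback are all assumptions made for the example. It compares the file's last-modified timestamp with the last one it saw and fires the callback only when the timestamp has moved forward.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of a timestamp-based config file poller. */
class ConfigFilePollerSketch {
    private final File file;
    private final Runnable onChange;
    private long lastChange = 0L; // timestamp of the last change we reacted to

    ConfigFilePollerSketch(File file, Runnable onChange) {
        this.file = file;
        this.onChange = onChange;
    }

    /** One polling pass: runs onChange only if the file is newer than last time. */
    synchronized boolean checkOnce() {
        long lastModified = file.lastModified();
        if (lastModified > lastChange) {
            lastChange = lastModified;
            onChange.run();
            return true;
        }
        return false;
    }

    /** Schedules checkOnce() with a fixed delay between runs. */
    ScheduledExecutorService start(long intervalSeconds) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleWithFixedDelay(this::checkOnce, 0, intervalSeconds, TimeUnit.SECONDS);
        return executor;
    }

    /** Drives three passes by hand and returns how many reloads were triggered. */
    static int demo() {
        try {
            File conf = File.createTempFile("flume-sketch", ".properties");
            conf.deleteOnExit();
            AtomicInteger reloads = new AtomicInteger();
            ConfigFilePollerSketch poller =
                new ConfigFilePollerSketch(conf, reloads::incrementAndGet);
            poller.checkOnce(); // first pass: a new file counts as a change
            poller.checkOnce(); // second pass: nothing changed, no reload
            conf.setLastModified(conf.lastModified() + 10_000L); // simulate an edit
            poller.checkOnce(); // third pass: timestamp moved forward, reload
            return reloads.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In real use one would call start(30) and shut the returned executor down on exit; demo() calls checkOnce() directly so the behaviour does not depend on timing.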
3. Key points of application.start()
Key code:
logger.info("add by  徐国坤 Application started");
for (LifecycleAware component : components) {
  supervisor.supervise(component, new SupervisorPolicy.AlwaysRestartPolicy(), LifecycleState.START);
}
Every element of components implements the LifecycleAware interface.
The main code of supervise is as follows:
Supervisoree process = new Supervisoree();
process.status = new Status();
process.policy = policy;
process.status.desiredState = desiredState;
process.status.error = false;
MonitorRunnable monitorRunnable = new MonitorRunnable();
monitorRunnable.lifecycleAware = lifecycleAware;
monitorRunnable.supervisoree = process;
monitorRunnable.monitorService = monitorService;
supervisedProcesses.put(lifecycleAware, process);
ScheduledFuture<?> future = monitorService.scheduleWithFixedDelay(
    monitorRunnable, 0, 3, TimeUnit.SECONDS);
monitorFutures.put(lifecycleAware, future);
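The key piece in the code above is monitorRunnable. MonitorRunnable is a nested class that implements the Runnable interface, and it is scheduled to run every 3 seconds. The call that matters in its logic is lifecycleAware.start(). Because the call is dispatched through the LifecycleAware interface, lifecycleAware.start() actually invokes the start method of the supervised component, which here is the start method of PollingPropertiesFileConfigurationProvider. That start method was already covered in section 2 above.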
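The supervision loop can be reduced to the following sketch. It is a deliberately simplified illustration and not Flume's MonitorRunnable: the State enum, the Component interface and the Monitor class are hypothetical names, and error handling and restart policies are left out. Each pass compares the component's current state with the desired state and calls start() only when they differ.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical, trimmed-down model of a supervisor's monitoring pass. */
class SupervisorSketch {
    enum State { IDLE, START, STOP }

    /** Stand-in for a lifecycle-aware component. */
    interface Component {
        void start();
        State getState();
    }

    /** One monitoring pass per run() call; a scheduler would invoke it repeatedly. */
    static class Monitor implements Runnable {
        private final Component component;
        private final State desiredState;

        Monitor(Component component, State desiredState) {
            this.component = component;
            this.desiredState = desiredState;
        }

        @Override
        public void run() {
            // Only act when the component is not yet where we want it to be.
            if (desiredState == State.START && component.getState() != State.START) {
                component.start(); // dispatched to the concrete component's start()
            }
        }
    }

    /** Runs two passes and returns how many times start() was actually called. */
    static int demo() {
        AtomicInteger starts = new AtomicInteger();
        Component provider = new Component() {
            private State state = State.IDLE;
            @Override public void start() { starts.incrementAndGet(); state = State.START; }
            @Override public State getState() { return state; }
        };
        Monitor monitor = new Monitor(provider, State.START);
        monitor.run(); // first pass: component is IDLE, so it gets started
        monitor.run(); // second pass: already in START, nothing to do
        return starts.get();
    }
}
```

Checking the state on every pass is what lets a periodic monitor also bring a component back up if it later leaves the desired state.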
4. A closer look at the getConfiguration() method
PollingPropertiesFileConfigurationProvider extends the PropertiesFileConfigurationProvider class, which in turn extends the AbstractConfigurationProvider class. Given this inheritance chain, the getConfiguration() call actually runs AbstractConfigurationProvider's getConfiguration() method. Its main code is as follows:
public MaterializedConfiguration getConfiguration() {
    MaterializedConfiguration conf = new SimpleMaterializedConfiguration();
    FlumeConfiguration fconfig = getFlumeConfiguration();
    AgentConfiguration agentConf = fconfig.getConfigurationFor(getAgentName());
    if (agentConf != null) {
      Map<String, ChannelComponent> channelComponentMap = Maps.newHashMap();
      Map<String, SourceRunner> sourceRunnerMap = Maps.newHashMap();
      Map<String, SinkRunner> sinkRunnerMap = Maps.newHashMap();
      try {
        loadChannels(agentConf, channelComponentMap);
        loadSources(agentConf, channelComponentMap, sourceRunnerMap);
        loadSinks(agentConf, channelComponentMap, sinkRunnerMap);
        Set<String> channelNames = new HashSet<String>(channelComponentMap.keySet());
        for(String channelName : channelNames) {
          ChannelComponent channelComponent = channelComponentMap.get(channelName);
          if(channelComponent.components.isEmpty()) {
            LOGGER.warn(String.format("Channel %s has no components connected" +" and has been removed.", channelName));
            channelComponentMap.remove(channelName);
            Map<String, Channel> nameChannelMap = channelCache.get(channelComponent.channel.getClass());
            if(nameChannelMap != null) {
              nameChannelMap.remove(channelName);
            }
          } else {
            LOGGER.info(String.format("Channel %s connected to %s",channelName, channelComponent.components.toString()));
            conf.addChannel(channelName, channelComponent.channel);
          }
        }
        for(Map.Entry<String, SourceRunner> entry : sourceRunnerMap.entrySet()) {
          conf.addSourceRunner(entry.getKey(), entry.getValue());
        }
        for(Map.Entry<String, SinkRunner> entry : sinkRunnerMap.entrySet()) {
          conf.addSinkRunner(entry.getKey(), entry.getValue());
        }
      } catch (InstantiationException ex) {
        LOGGER.error("Failed to instantiate component", ex);
      } finally {
        channelComponentMap.clear();
        sourceRunnerMap.clear();
        sinkRunnerMap.clear();
      }
    } else {
      // No configuration exists for this agent name, so conf is returned empty.
    }
    return conf;
  }
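loadChannels, loadSources and loadSinks are the key calls in the code above. Channels are loaded first, and a Channel that ends up with no Source or Sink attached is dropped from the configuration.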
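The pruning step in getConfiguration() (iterate over a copy of the channel names, drop every channel that has no connected components) can be isolated as below. This is only a sketch under simplifying assumptions: ChannelPruningSketch and pruneUnconnected are hypothetical names, and each channel is represented by nothing more than the list of component names attached to it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of dropping channels that nothing is connected to. */
class ChannelPruningSketch {
    /**
     * Removes channels with no connected components from the map and
     * returns the names of the channels that were kept.
     */
    static Set<String> pruneUnconnected(Map<String, List<String>> channelComponents) {
        Set<String> kept = new TreeSet<>();
        // Iterate over a copy of the key set so removing entries is safe.
        for (String name : new HashSet<>(channelComponents.keySet())) {
            if (channelComponents.get(name).isEmpty()) {
                channelComponents.remove(name);
            } else {
                kept.add(name);
            }
        }
        return kept;
    }

    /** c1 has a source and a sink attached; c2 has nothing attached. */
    static Set<String> demo() {
        Map<String, List<String>> channels = new HashMap<>();
        channels.put("c1", new ArrayList<>(Arrays.asList("src1", "sink1")));
        channels.put("c2", new ArrayList<>());
        return pruneUnconnected(channels);
    }
}
```

Copying the key set first is the same trick the original code uses with channelNames: removing from a map while iterating over its live key set would throw a ConcurrentModificationException.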
So how does a Source know which Channel to hand its data to? Look at the code below.
Take SpoolDirectorySource as an example. It also has a key start method, whose main code is:
Runnable runner = new SpoolDirectoryRunnable(reader, sourceCounter);
executor.scheduleWithFixedDelay(runner, 0, POLL_DELAY_MS, TimeUnit.MILLISECONDS);
SpoolDirectoryRunnable is an inner class whose main method is run. The main content of run is:
    public void run() {
      int backoffInterval = 250;
      try {
        while (!Thread.interrupted()) {
          List<Event> events = reader.readEvents(batchSize);
          if (events.isEmpty()) {
            break;
          }
          sourceCounter.addToEventReceivedCount(events.size());
          sourceCounter.incrementAppendBatchReceivedCount();

          try {
            getChannelProcessor().processEventBatch(events);
            reader.commit();
          } catch (ChannelException ex) {
            hitChannelException = true;
            if (backoff) {
              TimeUnit.MILLISECONDS.sleep(backoffInterval);
              backoffInterval = backoffInterval << 1;
              backoffInterval = backoffInterval >= maxBackoff ? maxBackoff :
                  backoffInterval;
            }
            continue;
          }
          backoffInterval = 250;
          sourceCounter.addToEventAcceptedCount(events.size());
          sourceCounter.incrementAppendBatchAcceptedCount();
        }
      } catch (Throwable t) {
        hasFatalError = true;
        Throwables.propagate(t);
      }
    }
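The catch block in run() implements a capped exponential backoff: sleep for the current interval, double it, and clamp it at maxBackoff. The sketch below isolates just that arithmetic. BackoffSketch, nextBackoff and sleepSequence are hypothetical names, and the cap of 4000 ms used in main is an assumed value chosen for illustration, not something taken from the code above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the doubling-with-cap backoff used in run(). */
class BackoffSketch {
    /** Doubles the interval and clamps it at maxBackoff, as in the catch block. */
    static int nextBackoff(int current, int maxBackoff) {
        int doubled = current << 1;
        return doubled >= maxBackoff ? maxBackoff : doubled;
    }

    /** Sleep durations (ms) for a run of consecutive ChannelException failures. */
    static List<Integer> sleepSequence(int initial, int maxBackoff, int failures) {
        List<Integer> sleeps = new ArrayList<>();
        int interval = initial;
        for (int i = 0; i < failures; i++) {
            sleeps.add(interval);                        // sleep first ...
            interval = nextBackoff(interval, maxBackoff); // ... then grow the interval
        }
        return sleeps;
    }

    public static void main(String[] args) {
        // Assumed cap of 4000 ms; prints [250, 500, 1000, 2000, 4000, 4000]
        System.out.println(sleepSequence(250, 4000, 6));
    }
}
```

After a successful batch the original code resets backoffInterval to 250, so the sequence starts over the next time the Channel rejects events.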
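In the run method, getChannelProcessor().processEventBatch(events) is the call that hands the batch of events to the Channel (or Channels) configured for this Source, where they wait to be consumed. How the events in a Channel reach the corresponding Sink follows similar reasoning, so it is not repeated here. This article will be updated when further code analysis is available.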
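As a rough mental model of that hand-off, the sketch below copies every event of a batch into each channel the processor was configured with. It is an illustration only: MemChannel and Processor are hypothetical classes, events are plain strings, and the transactions and selector logic that a real channel processor would need are left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

/** Hypothetical sketch of a source handing a batch to its configured channels. */
class ChannelProcessorSketch {
    /** Stand-in for a channel: just a named in-memory queue. */
    static class MemChannel {
        final String name;
        final Queue<String> queue = new ArrayDeque<>();

        MemChannel(String name) {
            this.name = name;
        }
    }

    /** Stand-in for a processor that replicates events to all of its channels. */
    static class Processor {
        private final List<MemChannel> channels;

        Processor(List<MemChannel> channels) {
            this.channels = new ArrayList<>(channels);
        }

        /** The source only calls this; it never touches a channel directly. */
        void processEventBatch(List<String> events) {
            for (MemChannel channel : channels) {
                for (String event : events) {
                    channel.queue.add(event);
                }
            }
        }
    }

    /** Sends a batch of three events to two channels and returns the queue sizes. */
    static int[] demo() {
        MemChannel c1 = new MemChannel("c1");
        MemChannel c2 = new MemChannel("c2");
        Processor processor = new Processor(Arrays.asList(c1, c2));
        processor.processEventBatch(Arrays.asList("e1", "e2", "e3"));
        return new int[] { c1.queue.size(), c2.queue.size() };
    }
}
```

The design point is the indirection: the Source knows only its processor, and the processor is wired to channels when the configuration is loaded, which is why the Source code itself contains no channel names.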