Flume 1.6.0: A Partial Source Code Walkthrough

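This article walks through the startup process of Flume 1.6.0: the Application entry point, configuration monitoring in PollingPropertiesFileConfigurationProvider, how components are started and supervised, and how the configuration is loaded. It explains how the configurationProvider's start method periodically checks the configuration file for changes, how the eventBus coordinates component updates, and how a Source hands its data to a Channel.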


1. Flume's entry point: Application.java
The key code in the main function is as follows:
EventBus eventBus = new EventBus(agentName + "-event-bus");
PollingPropertiesFileConfigurationProvider configurationProvider =
    new PollingPropertiesFileConfigurationProvider(agentName, configurationFile, eventBus, 30);
components.add(configurationProvider);
application = new Application(components);
logger.info("add by  徐国坤 EventBus");
eventBus.register(application);
// ...
application.start();
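The register/post pattern used in the snippet above can be illustrated with a small sketch. MiniEventBus below is a hypothetical, JDK-only stand-in; it is not Guava's EventBus (which dispatches to methods annotated with @Subscribe), and all names in it are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical, simplified stand-in for an event bus; NOT Guava's EventBus.
class MiniEventBus {
    private final String name;
    private final List<Consumer<Object>> subscribers = new ArrayList<>();

    MiniEventBus(String name) {
        this.name = name;
    }

    String name() {
        return name;
    }

    // Register a subscriber that will receive every posted event.
    synchronized void register(Consumer<Object> subscriber) {
        subscribers.add(subscriber);
    }

    // Deliver the event to all subscribers; returns how many were notified.
    synchronized int post(Object event) {
        for (Consumer<Object> subscriber : subscribers) {
            subscriber.accept(event);
        }
        return subscribers.size();
    }

    public static void main(String[] args) {
        MiniEventBus bus = new MiniEventBus("agent1-event-bus");
        List<Object> received = new ArrayList<>();
        // Plays the role of eventBus.register(application).
        bus.register(received::add);
        // Plays the role of eventBus.post(getConfiguration()).
        bus.post("new configuration");
        System.out.println(received);
    }
}
```

The point of the pattern is decoupling: the configuration provider only needs a reference to the bus, not to the application that reacts to a new configuration.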
2. Explanation of PollingPropertiesFileConfigurationProvider configurationProvider = new PollingPropertiesFileConfigurationProvider(agentName, configurationFile, eventBus, 30)
Key code:
PollingPropertiesFileConfigurationProvider has one key method, start:
public void start() {
  Preconditions.checkState(file != null, "The parameter file must not be null");
  executorService = Executors.newSingleThreadScheduledExecutor(
      new ThreadFactoryBuilder().setNameFormat("conf-file-poller-%d")
          .build());
  FileWatcherRunnable fileWatcherRunnable =
      new FileWatcherRunnable(file, counterGroup);
  executorService.scheduleWithFixedDelay(fileWatcherRunnable, 0, interval, TimeUnit.SECONDS);
  lifecycleState = LifecycleState.START;
  LOGGER.debug("Configuration provider started");
}
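The main purpose of this code is to check the configuration file for changes at a fixed interval (30 seconds here, the last argument passed to the constructor in section 1), so a change is picked up within one polling interval rather than instantly. The key statement inside fileWatcherRunnable is eventBus.post(getConfiguration()): when the configuration file has changed, the Source, Channel, and Sink components are reloaded, which is transparent to the user.
eventBus.post(getConfiguration()) re-reads the configuration file and posts the resulting configuration to the event bus, which triggers the reload of the Sources, Channels, and Sinks.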
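The polling idea described above can be sketched with the JDK alone. This is a minimal illustration, not Flume's FileWatcherRunnable: the class ConfigFilePollerSketch, the LongSupplier used to read the modification time, and the onChange callback (which stands in for eventBus.post(getConfiguration())) are all assumptions, as is the file name flume.conf.

```java
import java.io.File;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical sketch of a configuration poller; names are illustrative only.
class ConfigFilePollerSketch implements Runnable {
    private final LongSupplier lastModified; // reads the file's modification time
    private final Runnable onChange;         // stands in for eventBus.post(getConfiguration())
    private long lastChange = 0L;

    ConfigFilePollerSketch(LongSupplier lastModified, Runnable onChange) {
        this.lastModified = lastModified;
        this.onChange = onChange;
    }

    // One polling pass: fire the callback only if the file is newer than last seen.
    @Override
    public void run() {
        long modified = lastModified.getAsLong();
        if (modified > lastChange) {
            lastChange = modified;
            onChange.run();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        File conf = new File(args.length > 0 ? args[0] : "flume.conf");
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        // Poll once per second here; the code discussed above polls every `interval` seconds.
        executor.scheduleWithFixedDelay(
            new ConfigFilePollerSketch(conf::lastModified,
                () -> System.out.println("configuration changed, reloading " + conf)),
            0, 1, TimeUnit.SECONDS);
        TimeUnit.SECONDS.sleep(5);
        executor.shutdownNow();
    }
}
```

Because scheduleWithFixedDelay waits for one pass to finish before timing the next, a slow reload never overlaps with the following check.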
3. Key points of application.start()
Key code:
logger.info("add by  徐国坤 Application started");
for (LifecycleAware component : components) {
  supervisor.supervise(component, new SupervisorPolicy.AlwaysRestartPolicy(), LifecycleState.START);
}
Every element of components implements the LifecycleAware interface.
The main code of supervise is as follows:
Supervisoree process = new Supervisoree();
process.status = new Status();
process.policy = policy;
process.status.desiredState = desiredState;
process.status.error = false;
MonitorRunnable monitorRunnable = new MonitorRunnable();
monitorRunnable.lifecycleAware = lifecycleAware;
monitorRunnable.supervisoree = process;
monitorRunnable.monitorService = monitorService;
supervisedProcesses.put(lifecycleAware, process);
ScheduledFuture<?> future = monitorService.scheduleWithFixedDelay(
    monitorRunnable, 0, 3, TimeUnit.SECONDS);
monitorFutures.put(lifecycleAware, future);
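The key piece in the code above is monitorRunnable. MonitorRunnable is a nested class that implements the Runnable interface, and it is scheduled to run every 3 seconds. The call that matters in its logic is lifecycleAware.start(). Because lifecycleAware is declared with the interface type, the call is dispatched to the start method of the concrete component, which here is the start method of PollingPropertiesFileConfigurationProvider. That start method was covered in section 2.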
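The supervise-and-monitor idea can be reduced to the following sketch. It is a simplification under stated assumptions, not Flume's LifecycleSupervisor: the types State, Component, Monitor, and Provider are hypothetical, and the monitor only handles the "desired state is START" case.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical, reduced lifecycle types for illustration only.
class SupervisorSketch {
    enum State { IDLE, START, STOP }

    interface Component {
        void start();
        State state();
    }

    // One monitoring pass: start the component if it has not reached the desired state.
    static class Monitor implements Runnable {
        private final Component component;
        private final State desired;

        Monitor(Component component, State desired) {
            this.component = component;
            this.desired = desired;
        }

        @Override
        public void run() {
            if (desired == State.START && component.state() != State.START) {
                component.start(); // interface dispatch reaches the concrete start()
            }
        }
    }

    // Stand-in for a concrete component such as a configuration provider.
    static class Provider implements Component {
        private volatile State state = State.IDLE;
        volatile int starts = 0;

        @Override
        public void start() {
            starts++;
            state = State.START;
        }

        @Override
        public State state() {
            return state;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Provider provider = new Provider();
        ScheduledExecutorService monitorService = Executors.newSingleThreadScheduledExecutor();
        // Same scheduling shape as the supervise code above: no initial delay, 3 s between passes.
        ScheduledFuture<?> future = monitorService.scheduleWithFixedDelay(
            new Monitor(provider, State.START), 0, 3, TimeUnit.SECONDS);
        TimeUnit.SECONDS.sleep(1);
        future.cancel(false);
        monitorService.shutdown();
        System.out.println("start() calls: " + provider.starts);
    }
}
```

Comparing the actual state with the desired state on every pass is what lets a supervisor both start a component initially and bring it back if it later leaves the desired state.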
4. Analysis of the getConfiguration() method
PollingPropertiesFileConfigurationProvider extends the PropertiesFileConfigurationProvider class, which in turn extends the AbstractConfigurationProvider class. Because of this inheritance chain, a call to getConfiguration() actually runs AbstractConfigurationProvider's getConfiguration() method. Its main code is as follows:
public MaterializedConfiguration getConfiguration() {
    MaterializedConfiguration conf = new SimpleMaterializedConfiguration();
    FlumeConfiguration fconfig = getFlumeConfiguration();
    AgentConfiguration agentConf = fconfig.getConfigurationFor(getAgentName());
    if (agentConf != null) {
      Map<String, ChannelComponent> channelComponentMap = Maps.newHashMap();
      Map<String, SourceRunner> sourceRunnerMap = Maps.newHashMap();
      Map<String, SinkRunner> sinkRunnerMap = Maps.newHashMap();
      try {
        loadChannels(agentConf, channelComponentMap);
        loadSources(agentConf, channelComponentMap, sourceRunnerMap);
        loadSinks(agentConf, channelComponentMap, sinkRunnerMap);
        Set<String> channelNames = new HashSet<String>(channelComponentMap.keySet());
        for(String channelName : channelNames) {
          ChannelComponent channelComponent = channelComponentMap.get(channelName);
          if(channelComponent.components.isEmpty()) {
            LOGGER.warn(String.format("Channel %s has no components connected" +" and has been removed.", channelName));
            channelComponentMap.remove(channelName);
            Map<String, Channel> nameChannelMap = channelCache.get(channelComponent.channel.getClass());
            if(nameChannelMap != null) {
              nameChannelMap.remove(channelName);
            }
          } else {
            LOGGER.info(String.format("Channel %s connected to %s",channelName, channelComponent.components.toString()));
            conf.addChannel(channelName, channelComponent.channel);
          }
        }
        for(Map.Entry<String, SourceRunner> entry : sourceRunnerMap.entrySet()) {
          conf.addSourceRunner(entry.getKey(), entry.getValue());
        }
        for(Map.Entry<String, SinkRunner> entry : sinkRunnerMap.entrySet()) {
          conf.addSinkRunner(entry.getKey(), entry.getValue());
        }
      } catch (InstantiationException ex) {
        LOGGER.error("Failed to instantiate component", ex);
      } finally {
        channelComponentMap.clear();
        sourceRunnerMap.clear();
        sinkRunnerMap.clear();
      }
    } else {
    }
    return conf;
  }
The key calls in the code above are loadChannels, loadSources, and loadSinks.
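The binding between a source and its channels comes from the agent's properties file, where each source lists its channels under a key of the form agent.sources.source.channels. The sketch below only illustrates that lookup with java.util.Properties; it is not the code of loadSources, and the class SourceChannelLookup as well as the names a1, r1, c1, c2, k1 and the path /tmp/spool are assumptions chosen for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Hypothetical helper showing how a source's channel list can be read from the config.
class SourceChannelLookup {
    // Returns the channel names configured for one source of one agent.
    static List<String> channelsFor(Properties props, String agent, String source) {
        String value = props.getProperty(agent + ".sources." + source + ".channels", "").trim();
        return value.isEmpty() ? List.of() : Arrays.asList(value.split("\\s+"));
    }

    // Parses properties text; wraps the checked exception for brevity.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        String conf = String.join("\n",
            "a1.sources = r1",
            "a1.channels = c1 c2",
            "a1.sinks = k1",
            "a1.sources.r1.type = spooldir",
            "a1.sources.r1.spoolDir = /tmp/spool",
            "a1.sources.r1.channels = c1 c2",
            "a1.sinks.k1.channel = c1");
        System.out.println(channelsFor(parse(conf), "a1", "r1"));
    }
}
```

Note the asymmetry in the configuration keys: a source can name several channels (channels), while a sink reads from exactly one (channel).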
So how does a Source know which Channel to hand its data to? Consider the following code.
Take SpoolDirectorySource as an example. It also has a key start method, whose main code is:
Runnable runner = new SpoolDirectoryRunnable(reader, sourceCounter);
executor.scheduleWithFixedDelay(runner, 0, POLL_DELAY_MS, TimeUnit.MILLISECONDS);
SpoolDirectoryRunnable is an inner class whose main method is run. The main content of run is:
    public void run() {
      int backoffInterval = 250;
      try {
        while (!Thread.interrupted()) {
          List<Event> events = reader.readEvents(batchSize);
          if (events.isEmpty()) {
            break;
          }
          sourceCounter.addToEventReceivedCount(events.size());
          sourceCounter.incrementAppendBatchReceivedCount();

          try {
            getChannelProcessor().processEventBatch(events);
            reader.commit();
          } catch (ChannelException ex) {
            hitChannelException = true;
            if (backoff) {
              TimeUnit.MILLISECONDS.sleep(backoffInterval);
              backoffInterval = backoffInterval << 1;
              backoffInterval = backoffInterval >= maxBackoff ? maxBackoff :
                                backoffInterval;
            }
            continue;
          }
          backoffInterval = 250;
          sourceCounter.addToEventAcceptedCount(events.size());
          sourceCounter.incrementAppendBatchAcceptedCount();
        }
      } catch (Throwable t) {
        hasFatalError = true;
        Throwables.propagate(t);
      }
    }
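In the run method, getChannelProcessor().processEventBatch(events) hands the batch of events to the channel processor, which delivers them to the Channel(s) configured for this Source. If the Channel rejects the batch with a ChannelException, the loop sleeps, doubles the backoff interval up to maxBackoff, and retries; after a successful delivery the interval is reset to 250 ms. How a Channel's events reach the corresponding Sink follows a similar idea, so it is not repeated here; I will update this article when I have further code analysis.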
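The retry behaviour of the run loop can be isolated into a small sketch. It is an illustration under stated assumptions rather than Flume code: BackoffSketch is a hypothetical class, the channel is modelled as a Predicate that returns false where the real code would catch a ChannelException, the sleep is injected as a callback so the example runs instantly, and the cap of 4000 ms is an assumed value for maxBackoff.

```java
import java.util.List;
import java.util.function.IntConsumer;
import java.util.function.Predicate;

// Hypothetical sketch of the doubling-with-cap backoff used when a channel is full.
class BackoffSketch {
    static final int INITIAL_BACKOFF_MS = 250;

    // Doubles the wait and caps it at maxBackoff, as in the run() loop above.
    static int nextBackoff(int current, int maxBackoff) {
        int doubled = current << 1;
        return doubled >= maxBackoff ? maxBackoff : doubled;
    }

    // Retries until the channel accepts the batch; returns the number of retries.
    // channel.test(batch) == false plays the role of a ChannelException.
    static int deliverWithBackoff(List<String> batch, Predicate<List<String>> channel,
                                  int maxBackoff, IntConsumer sleeper) {
        int backoff = INITIAL_BACKOFF_MS;
        int retries = 0;
        while (!channel.test(batch)) {
            sleeper.accept(backoff);
            backoff = nextBackoff(backoff, maxBackoff);
            retries++;
        }
        return retries;
    }

    public static void main(String[] args) {
        int[] attempts = {0};
        // Assumed channel that is "full" for the first three attempts.
        Predicate<List<String>> channel = batch -> ++attempts[0] > 3;
        int retries = deliverWithBackoff(List.of("event-1", "event-2"), channel, 4000,
            ms -> System.out.println("channel full, waiting " + ms + " ms"));
        System.out.println("delivered after " + retries + " retries");
    }
}
```

Capping the interval keeps the source responsive once the channel drains, while the doubling avoids hammering a channel that stays full.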
