1. Flume's entry point: Application.java
The key code in the main method is as follows:
EventBus eventBus = new EventBus(agentName + "-event-bus");
PollingPropertiesFileConfigurationProvider configurationProvider = new PollingPropertiesFileConfigurationProvider(agentName, configurationFile, eventBus, 30);
components.add(configurationProvider);
application = new Application(components);
logger.info("add by 徐国坤 EventBus");
eventBus.register(application);
// ......
application.start();
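The wiring above is a publish/subscribe pattern. Application registers itself on the EventBus, and the configuration provider later posts new configurations to that bus. The sketch below reproduces the idea using only the JDK. MiniEventBus, ConfigListener and EventBusDemo are hypothetical stand-ins for Guava's EventBus and for Application's subscriber method; they are not Flume or Guava code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical, minimal stand-in for Guava's EventBus: subscribers are plain
// Consumers instead of objects with @Subscribe-annotated methods.
class MiniEventBus<T> {
    private final List<Consumer<T>> subscribers = new ArrayList<>();

    void register(Consumer<T> subscriber) {
        subscribers.add(subscriber);
    }

    // Deliver the event synchronously to every registered subscriber.
    void post(T event) {
        for (Consumer<T> s : subscribers) {
            s.accept(event);
        }
    }
}

// Hypothetical stand-in for Application: it reacts to each new configuration.
// Here it only records what it received.
class ConfigListener implements Consumer<String> {
    final List<String> received = new ArrayList<>();

    @Override
    public void accept(String configuration) {
        // In Flume, this is where the old components would be stopped
        // and the new Sources, Channels and Sinks started.
        received.add(configuration);
    }
}

class EventBusDemo {
    public static void main(String[] args) {
        MiniEventBus<String> bus = new MiniEventBus<>();
        ConfigListener application = new ConfigListener();
        bus.register(application);        // like eventBus.register(application)
        bus.post("agent config v1");      // like eventBus.post(getConfiguration())
        System.out.println(application.received); // [agent config v1]
    }
}
```

Guava's real EventBus finds subscriber methods by annotation and can deliver events asynchronously. The sketch keeps only the register/post contract.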
2. Explaining PollingPropertiesFileConfigurationProvider configurationProvider = new PollingPropertiesFileConfigurationProvider(agentName, configurationFile, eventBus, 30)
Key code:
PollingPropertiesFileConfigurationProvider has one key method, start:
public void start() {
  Preconditions.checkState(file != null, "The parameter file must not be null");
  executorService = Executors.newSingleThreadScheduledExecutor(
      new ThreadFactoryBuilder().setNameFormat("conf-file-poller-%d")
          .build());
  FileWatcherRunnable fileWatcherRunnable =
      new FileWatcherRunnable(file, counterGroup);
  // Check the file immediately, then again every interval seconds
  executorService.scheduleWithFixedDelay(fileWatcherRunnable, 0, interval,
      TimeUnit.SECONDS);
  lifecycleState = LifecycleState.START;
  LOGGER.debug("Configuration provider started");
}
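This code periodically checks whether the configuration file has changed, so that a change is picked up while the agent is running. A single-threaded scheduler runs FileWatcherRunnable immediately and then every interval seconds. That interval is the last constructor argument, 30 in the call above.
The key line inside FileWatcherRunnable is eventBus.post(getConfiguration()). When the file has changed, getConfiguration() re-reads it and builds a new set of Sources, Channels and Sinks, and post publishes that result on the EventBus. Because Application was registered on the bus, it receives the new configuration and reloads the components. The reload is transparent to the user.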
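The change detection can be sketched as a Runnable that remembers the file's last modification time and fires only when that time increases. ConfigFileWatcher is a hypothetical class written to show the idea, and the onChange callback stands in for eventBus.post(getConfiguration()). Flume's own FileWatcherRunnable may differ in its details.

```java
import java.io.File;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch of the polling idea: react only when the file's
// modification time is newer than the last one we acted on.
class ConfigFileWatcher implements Runnable {
    private final File file;
    private final Consumer<File> onChange; // stands in for eventBus.post(getConfiguration())
    private long lastChange = 0L;

    ConfigFileWatcher(File file, Consumer<File> onChange) {
        this.file = file;
        this.onChange = onChange;
    }

    @Override
    public void run() {
        long lastModified = file.lastModified(); // 0L if the file does not exist
        if (lastModified > lastChange) {
            lastChange = lastModified;
            onChange.accept(file);
        }
    }

    // Schedule the check the way start() does: first run immediately,
    // then intervalSeconds after each previous run finishes.
    static ScheduledExecutorService watch(ConfigFileWatcher watcher, long intervalSeconds) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleWithFixedDelay(watcher, 0, intervalSeconds, TimeUnit.SECONDS);
        return executor;
    }
}
```

The first run always fires, because lastChange starts at 0. That first firing is what loads the initial configuration. Later runs fire only after the file is modified again.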
3. Key points of application.start()
Key code:
logger.info("add by 徐国坤 Application started");
for (LifecycleAware component : components) {
  supervisor.supervise(component, new SupervisorPolicy.AlwaysRestartPolicy(), LifecycleState.START);
}
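Every object in components implements the LifecycleAware interface. Each one is handed to the supervisor with an AlwaysRestartPolicy and a desired state of START.
The main code of supervise is as follows: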
Supervisoree process = new Supervisoree();
process.status = new Status();
process.policy = policy;
process.status.desiredState = desiredState;
process.status.error = false;
MonitorRunnable monitorRunnable = new MonitorRunnable();
monitorRunnable.lifecycleAware = lifecycleAware;
monitorRunnable.supervisoree = process;
monitorRunnable.monitorService = monitorService;
supervisedProcesses.put(lifecycleAware, process);
ScheduledFuture<?> future = monitorService.scheduleWithFixedDelay(
monitorRunnable, 0, 3, TimeUnit.SECONDS);
monitorFutures.put(lifecycleAware, future);
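The key piece above is monitorRunnable, an instance of the inner class MonitorRunnable, which implements Runnable and is scheduled to run every 3 seconds. The call that matters in its logic is lifecycleAware.start(), made when the component is not yet in the desired state (START here). Because lifecycleAware is declared with the LifecycleAware interface type, the call is dispatched to the start method of the actual component object. For the configuration provider added in main, that method is PollingPropertiesFileConfigurationProvider.start(), which is described above.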
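The supervision loop works by reconciling state: a periodic task compares each component's current state with its desired state, then calls the interface method. Dynamic dispatch selects the concrete start(). The sketch below illustrates this. State, Lifecycle, DemoComponent and Monitor are hypothetical simplified copies of Flume's LifecycleState, LifecycleAware, a component, and MonitorRunnable. They are not Flume's own classes, and restart policies are omitted.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical simplified versions of LifecycleState / LifecycleAware.
enum State { IDLE, START, STOP }

interface Lifecycle {
    void start();
    void stop();
    State getState();
}

// A toy component standing in for PollingPropertiesFileConfigurationProvider.
class DemoComponent implements Lifecycle {
    private volatile State state = State.IDLE;
    volatile int startCalls = 0; // written only by the monitoring thread

    public void start() { startCalls++; state = State.START; }
    public void stop()  { state = State.STOP; }
    public State getState() { return state; }
}

// Sketch of the MonitorRunnable idea: if actual != desired, call the
// interface method; the JVM dispatches to the component's own start().
class Monitor implements Runnable {
    private final Lifecycle component;
    private final State desired;

    Monitor(Lifecycle component, State desired) {
        this.component = component;
        this.desired = desired;
    }

    @Override
    public void run() {
        if (component.getState() != desired) {
            switch (desired) {
                case START: component.start(); break;
                case STOP:  component.stop();  break;
                default:    break;
            }
        }
    }

    // Like supervise(): check immediately, then every 3 seconds.
    static ScheduledFuture<?> supervise(ScheduledExecutorService service,
                                        Lifecycle component, State desired) {
        return service.scheduleWithFixedDelay(
            new Monitor(component, desired), 0, 3, TimeUnit.SECONDS);
    }
}
```

Because the check repeats, a component that stops unexpectedly is started again on the next run. This repetition is what makes the "always restart" behaviour possible.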
4. Analysis of getConfiguration()
PollingPropertiesFileConfigurationProvider extends the PropertiesFileConfigurationProvider class, which in turn extends AbstractConfigurationProvider. Because of this inheritance chain, calling getConfiguration() on the provider runs the getConfiguration() method defined in AbstractConfigurationProvider. Its main code is as follows:
public MaterializedConfiguration getConfiguration() {
  MaterializedConfiguration conf = new SimpleMaterializedConfiguration();
  FlumeConfiguration fconfig = getFlumeConfiguration();
  AgentConfiguration agentConf = fconfig.getConfigurationFor(getAgentName());
  if (agentConf != null) {
    Map<String, ChannelComponent> channelComponentMap = Maps.newHashMap();
    Map<String, SourceRunner> sourceRunnerMap = Maps.newHashMap();
    Map<String, SinkRunner> sinkRunnerMap = Maps.newHashMap();
    try {
      loadChannels(agentConf, channelComponentMap);
      loadSources(agentConf, channelComponentMap, sourceRunnerMap);
      loadSinks(agentConf, channelComponentMap, sinkRunnerMap);
      // Iterate over a copy of the names so entries can be removed inside the loop
      Set<String> channelNames = new HashSet<String>(channelComponentMap.keySet());
      for (String channelName : channelNames) {
        ChannelComponent channelComponent = channelComponentMap.get(channelName);
        if (channelComponent.components.isEmpty()) {
          // No source or sink is connected to this channel: drop it
          LOGGER.warn(String.format("Channel %s has no components connected" +
              " and has been removed.", channelName));
          channelComponentMap.remove(channelName);
          Map<String, Channel> nameChannelMap =
              channelCache.get(channelComponent.channel.getClass());
          if (nameChannelMap != null) {
            nameChannelMap.remove(channelName);
          }
        } else {
          LOGGER.info(String.format("Channel %s connected to %s",
              channelName, channelComponent.components.toString()));
          conf.addChannel(channelName, channelComponent.channel);
        }
      }
      for (Map.Entry<String, SourceRunner> entry : sourceRunnerMap.entrySet()) {
        conf.addSourceRunner(entry.getKey(), entry.getValue());
      }
      for (Map.Entry<String, SinkRunner> entry : sinkRunnerMap.entrySet()) {
        conf.addSinkRunner(entry.getKey(), entry.getValue());
      }
    } catch (InstantiationException ex) {
      LOGGER.error("Failed to instantiate component", ex);
    } finally {
      channelComponentMap.clear();
      sourceRunnerMap.clear();
      sinkRunnerMap.clear();
    }
  } else {
    // No configuration for this agent name: the empty conf is returned
  }
  return conf;
}
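The key calls in this code are loadChannels, loadSources and loadSinks. They instantiate the components described in the agent's configuration and record which components are connected to each channel. Any channel with no connected component is then removed with a warning. The remaining channels, source runners and sink runners are added to the returned MaterializedConfiguration.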
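The channel-pruning loop is a common Java idiom: remove map entries during iteration by iterating over a copy of the key set. Removing from the map while iterating its own keySet() would normally throw ConcurrentModificationException. ChannelEntry and ChannelPruner in the sketch below are hypothetical simplifications of ChannelComponent and of the loop above. Each channel is modelled as a plain string.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical simplification of ChannelComponent: a channel plus the names
// of the sources/sinks wired to it by loadSources/loadSinks.
class ChannelEntry {
    final String channel;
    final List<String> components = new ArrayList<>();

    ChannelEntry(String channel) { this.channel = channel; }
}

class ChannelPruner {
    // Remove channels with no connected component from channelMap and
    // return the kept ones (like conf.addChannel(...) in the real code).
    // Iterating over a copy of the key set makes removal inside the loop safe.
    static Map<String, String> prune(Map<String, ChannelEntry> channelMap) {
        Map<String, String> kept = new HashMap<>();
        Set<String> names = new HashSet<>(channelMap.keySet());
        for (String name : names) {
            ChannelEntry entry = channelMap.get(name);
            if (entry.components.isEmpty()) {
                channelMap.remove(name);        // unused channel: dropped
            } else {
                kept.put(name, entry.channel);  // connected channel: kept
            }
        }
        return kept;
    }
}
```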
So how does a Source know which Channel to hand its data to? Look at the following code.
Take SpoolDirectorySource as an example. It also has a key start method, whose main code is:
Runnable runner = new SpoolDirectoryRunnable(reader, sourceCounter);
executor.scheduleWithFixedDelay(runner, 0, POLL_DELAY_MS, TimeUnit.MILLISECONDS);
SpoolDirectoryRunnable is an inner class, and its main method is run. The main body of run is:
public void run() {
  int backoffInterval = 250;
  try {
    while (!Thread.interrupted()) {
      List<Event> events = reader.readEvents(batchSize);
      if (events.isEmpty()) {
        break;
      }
      sourceCounter.addToEventReceivedCount(events.size());
      sourceCounter.incrementAppendBatchReceivedCount();
      try {
        // Hand the batch to the channel(s); commit the read only on success
        getChannelProcessor().processEventBatch(events);
        reader.commit();
      } catch (ChannelException ex) {
        hitChannelException = true;
        if (backoff) {
          // Channel full: sleep, then double the wait, capped at maxBackoff
          TimeUnit.MILLISECONDS.sleep(backoffInterval);
          backoffInterval = backoffInterval << 1;
          backoffInterval = backoffInterval >= maxBackoff ? maxBackoff :
              backoffInterval;
        }
        continue;
      }
      backoffInterval = 250;
      sourceCounter.addToEventAcceptedCount(events.size());
      sourceCounter.incrementAppendBatchAcceptedCount();
    }
  } catch (Throwable t) {
    hasFatalError = true;
    Throwables.propagate(t);
  }
}
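The backoff logic in run() is capped exponential backoff. The wait starts at 250 ms, doubles after each ChannelException, never exceeds maxBackoff, and resets to 250 ms after a batch succeeds. The hypothetical Backoff class below isolates that arithmetic so the sequence can be checked. The cap of 4000 ms used in the demo is an assumed value.

```java
// Hypothetical class isolating the backoff arithmetic from run().
class Backoff {
    static final int INITIAL_MS = 250;
    private final int maxBackoff;
    private int interval = INITIAL_MS;

    Backoff(int maxBackoff) { this.maxBackoff = maxBackoff; }

    // Return how long to sleep now, then double the next wait (capped),
    // matching "sleep; interval <<= 1; interval = min(interval, maxBackoff)".
    int onChannelFull() {
        int sleep = interval;
        interval = Math.min(interval << 1, maxBackoff);
        return sleep;
    }

    // A successful batch resets the wait, like "backoffInterval = 250".
    void onSuccess() { interval = INITIAL_MS; }
}

class BackoffDemo {
    public static void main(String[] args) {
        Backoff b = new Backoff(4000); // assumed cap for illustration
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 6; i++) {
            sb.append(b.onChannelFull()).append(' ');
        }
        System.out.println(sb.toString().trim()); // 250 500 1000 2000 4000 4000
    }
}
```

The doubling lets the sinks catch up when a channel is full, and the cap bounds how long the source can stall.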
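In run, the call getChannelProcessor().processEventBatch(events) is what delivers the batch to the channels this source is connected to. reader.commit() runs only after that call succeeds. On a ChannelException the commit is skipped, and the source optionally backs off before trying again. A channel's events reach the matching sink along similar lines, so that part is not repeated here. I will update this post as I analyze more of the code.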
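To make the source-to-channel-to-sink flow concrete, here is a toy model. Channels are modelled as JDK queues. The source side copies each batch into every channel it is connected to, assuming replicating behaviour. The sink side pulls from exactly one channel. MiniChannelProcessor and MiniSink are hypothetical names, not Flume's API. Flume's ChannelProcessor also runs a per-channel transaction that is rolled back on failure, and this sketch has no rollback.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: each batch is copied into every connected channel.
class MiniChannelProcessor {
    private final List<BlockingQueue<String>> channels;

    MiniChannelProcessor(List<BlockingQueue<String>> channels) {
        this.channels = channels;
    }

    // A full channel surfaces as an exception, loosely mirroring
    // ChannelException; events already put are NOT rolled back here.
    void processEventBatch(List<String> events) {
        for (BlockingQueue<String> channel : channels) {
            for (String event : events) {
                if (!channel.offer(event)) {
                    throw new IllegalStateException("channel full");
                }
            }
        }
    }
}

// Hypothetical sink: it takes events from the single channel it is bound to.
class MiniSink {
    private final BlockingQueue<String> channel;

    MiniSink(BlockingQueue<String> channel) { this.channel = channel; }

    // Drain up to max events that are currently available.
    List<String> process(int max) {
        List<String> out = new ArrayList<>();
        channel.drainTo(out, max);
        return out;
    }
}
```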
Flume-1.6.0 partial source code analysis