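Introduction
FailoverSinkProcessor is a processor for sink groups. It chooses the sink that sends data by priority: the highest-priority sink keeps the right to write until it fails. A failed sink is moved to a failed queue and is retried once its backoff period, which is capped at the maximum penalty, has elapsed.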
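Configuration parameters
Parameter | Default | Description |
---|---|---|
type | - | Must be failover |
priority.<sinkName> | - | Priority of the sink; a larger value means a higher priority |
maxpenalty | 30000 (ms) | Maximum backoff time for a failed sink |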
Configuration example
agent.sinks = s1 s2 s3 s4
agent.sinkgroups = group1
agent.sinkgroups.group1.sinks = s1 s2 s3 s4
agent.sinkgroups.group1.processor.type = failover
agent.sinkgroups.group1.processor.priority.s1 = 90
agent.sinkgroups.group1.processor.priority.s2 = 100
agent.sinkgroups.group1.processor.priority.s4 = 110
agent.sinkgroups.group1.processor.maxpenalty = 10000
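With this configuration, s1 has priority 90, s2 has 100, and s4 has 110. s3 has no configured priority, so it gets -1 rather than 0: the counter starts at 0 and is decremented before it is used (priority = --nextPrio). Any further sinks without a configured priority get -2, -3, and so on.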
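The priority rule can be tried out in isolation. The following is a minimal sketch, not Flume code: the class PriorityAssignmentSketch and its assign method are hypothetical names, sinks are represented by their names only, and a missing map entry stands in for a priority that could not be parsed.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PriorityAssignmentSketch {

    /**
     * Mirrors the priority rule of configure(): a configured priority is used
     * as-is, a sink without one takes the next value of a counter that is
     * decremented before use, and a sink whose priority is already taken is skipped.
     */
    static TreeMap<Integer, String> assign(List<String> sinkNames, Map<String, Integer> configured) {
        TreeMap<Integer, String> liveSinks = new TreeMap<>();
        int nextPrio = 0;
        for (String name : sinkNames) {
            Integer priority = configured.get(name);
            if (priority == null) {
                priority = --nextPrio; // first unconfigured sink gets -1, the next -2, ...
            }
            // Keep the first sink registered under a given priority.
            liveSinks.putIfAbsent(priority, name);
        }
        return liveSinks;
    }

    public static void main(String[] args) {
        Map<String, Integer> configured = Map.of("s1", 90, "s2", 100, "s4", 110);
        TreeMap<Integer, String> live = assign(List.of("s1", "s2", "s3", "s4"), configured);
        System.out.println(live);                     // {-1=s3, 90=s1, 100=s2, 110=s4}
        System.out.println(live.get(live.lastKey())); // s4, the initial active sink
    }
}
```

Because TreeMap keeps its keys sorted, lastKey() always yields the largest priority, which is why s4 is the first active sink in the example configuration.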
Source code analysis
The configure method
public void configure(Context context) {
  // TreeMap is backed by a red-black tree and keeps its keys sorted, here by priority.
  liveSinks = new TreeMap<Integer, Sink>();
  // PriorityQueue is not FIFO: its head is the least element by FailedSink's ordering,
  // i.e. the failed sink that is due for a retry first.
  failedSinks = new PriorityQueue<FailedSink>();
  // Counter used for sinks that have no configured priority
  Integer nextPrio = 0;
  // Maximum backoff time for a failed sink
  String maxPenaltyStr = context.getString(MAX_PENALTY_PREFIX);
  if (maxPenaltyStr == null) {
    // Defaults to 30000 ms
    maxPenalty = DEFAULT_MAX_PENALTY;
  } else {
    try {
      maxPenalty = Integer.parseInt(maxPenaltyStr);
    } catch (NumberFormatException e) {
      logger.warn("{} is not a valid value for {}",
          new Object[] { maxPenaltyStr, MAX_PENALTY_PREFIX });
      maxPenalty = DEFAULT_MAX_PENALTY;
    }
  }
  for (Entry<String, Sink> entry : sinks.entrySet()) {
    String priStr = PRIORITY_PREFIX + entry.getKey();
    Integer priority;
    try {
      priority = Integer.parseInt(context.getString(priStr));
    } catch (Exception e) {
      // Sinks without a (valid) configured priority get -1, -2, ... in turn
      priority = --nextPrio;
    }
    if (!liveSinks.containsKey(priority)) {
      liveSinks.put(priority, sinks.get(entry.getKey()));
    } else {
      // A sink whose priority is already taken is not added
      logger.warn("Sink {} not added to FailverSinkProcessor as priority" +
          "duplicates that of sink {}", entry.getKey(),
          liveSinks.get(priority));
    }
  }
  // Take the value of the last (largest) key: the highest-priority sink becomes the active sink
  activeSink = liveSinks.get(liveSinks.lastKey());
}
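Summary:
1. The sink group selects by priority; at initialization the sink with the highest priority becomes the active sink.
2. The failed queue is a PriorityQueue, so the sink taken out first for a recovery attempt is the one whose retry time comes first. That is usually, but not necessarily, the sink that failed first.
3. Of several sinks with the same priority, only the first one configured is added. Without this check, sinks with the same priority kept replacing each other; this bug was fixed in Flume 1.2.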
[FLUME-1002] - FailoverSinkProcessor replaces sinks with same priority
Source: flume-1.2 Release
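The ordering of the failed queue is easy to get wrong, because java.util.PriorityQueue is not a first-in, first-out queue. Here is a minimal sketch under the assumption that failed sinks compare by their retry time, as the peek().getRefresh() check in process suggests. The Entry class is a hypothetical stand-in for FailedSink, and the times are made-up values.

```java
import java.util.PriorityQueue;

class RetryQueueSketch {

    /** Hypothetical stand-in for FailedSink, ordered by the time it may be retried. */
    static final class Entry implements Comparable<Entry> {
        final String name;
        final long refresh; // earliest retry time in ms

        Entry(String name, long refresh) {
            this.name = name;
            this.refresh = refresh;
        }

        @Override
        public int compareTo(Entry other) {
            return Long.compare(refresh, other.refresh);
        }
    }

    /** Adds the entries in the given order and returns the name at the head of the queue. */
    static String headAfterAdding(Entry... entries) {
        PriorityQueue<Entry> queue = new PriorityQueue<>();
        for (Entry e : entries) {
            queue.add(e);
        }
        return queue.peek().name;
    }

    public static void main(String[] args) {
        // s1 is added first but has the later retry time, so s2 ends up at the head.
        System.out.println(headAfterAdding(new Entry("s1", 5_000L), new Entry("s2", 2_000L))); // s2
    }
}
```

Insertion order only decides the head when all retry times are equal or increasing, which is the common case after a first failure of each sink.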
The process method
public Status process() throws EventDeliveryException {
  // Retry any failed sinks that have gone through their "cooldown" period
  Long now = System.currentTimeMillis();
  // Look at the head of the failed queue; if its retry time is already in the past, try to recover it
  while (!failedSinks.isEmpty() && failedSinks.peek().getRefresh() < now) {
    // Remove the head element
    FailedSink cur = failedSinks.poll();
    Status s;
    try {
      // Let the sink try to process events
      s = cur.getSink().process();
      if (s == Status.READY) {
        // The sink can deliver data again, so restore it to the live sinks
        liveSinks.put(cur.getPriority(), cur.getSink());
        // The active sink is the highest-priority live sink after the recovery
        activeSink = liveSinks.get(liveSinks.lastKey());
        logger.debug("Sink {} was recovered from the fail list",
            cur.getSink().getName());
      } else {
        // if it's a backoff it needn't be penalized.
        // Put it back in the failed queue without increasing its failure count
        failedSinks.add(cur);
      }
      return s;
    } catch (Exception e) {
      // A real failure: increase the failure count, which lengthens the backoff, and requeue
      cur.incFails();
      failedSinks.add(cur);
    }
  }
  // If the failed queue is empty or no sink could be recovered, process with the current active sink
  Status ret = null;
  while (activeSink != null) {
    try {
      ret = activeSink.process();
      return ret;
    } catch (Exception e) {
      logger.warn("Sink {} failed and has been sent to failover list",
          activeSink.getName(), e);
      // On failure, move the active sink to the failed queue and switch to the next live sink
      activeSink = moveActiveToDeadAndGetNext();
    }
  }
  throw new EventDeliveryException("All sinks failed to process, " +
      "nothing left to failover to");
}
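Summary:
1. Recovery of failed sinks is attempted at the start of every process call. The sink at the head of the failed queue is tried first; this order depends on retry time and has nothing to do with priority.
2. The active sink is always chosen by priority.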
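How long a failed sink waits before its next retry is not visible in the listing above. The following sketch shows one way a capped exponential backoff can be computed. The 1000 ms base penalty and the doubling per consecutive failure are assumptions made for illustration, not values taken from the code shown here; only the cap corresponds to maxpenalty. PenaltySketch and penaltyMs are hypothetical names.

```java
class PenaltySketch {

    // Assumed base penalty; not taken from the listing above.
    static final long BASE_PENALTY_MS = 1_000L;

    /** Delay before the next retry after the given number of consecutive failures. */
    static long penaltyMs(int consecutiveFailures, long maxPenaltyMs) {
        // Clamp the shift so that the doubling cannot overflow a long.
        int shift = Math.max(0, Math.min(consecutiveFailures, 30));
        return Math.min(maxPenaltyMs, BASE_PENALTY_MS << shift);
    }

    public static void main(String[] args) {
        long maxPenalty = 10_000L; // maxpenalty from the configuration example
        for (int failures = 1; failures <= 5; failures++) {
            System.out.println(failures + " -> " + penaltyMs(failures, maxPenalty) + " ms");
        }
        // 1 -> 2000 ms, 2 -> 4000 ms, 3 -> 8000 ms, 4 -> 10000 ms, 5 -> 10000 ms
    }
}
```

Under these assumptions the cap of 10000 ms from the example configuration is reached at the fourth consecutive failure, after which the sink is retried at a fixed interval.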
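Open question:
Oddly, the book 《Flume 构建高可用、可扩展的海量日志采集系统》 describes failure recovery as follows:
In the process method, however, a recovered sink is put back into liveSinks, so if its priority is higher than that of the current activeSink, it should replace the active sink: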
// The sink can deliver data again, so restore it to the live sinks
liveSinks.put(cur.getPriority(), cur.getSink());
// The active sink is the highest-priority live sink after the recovery
activeSink = liveSinks.get(liveSinks.lastKey());
logger.debug("Sink {} was recovered from the fail list",
    cur.getSink().getName());
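The effect of these two statements can be checked with a small stand-alone sketch. It is not Flume code: RecoverySketch and activeAfterRecovery are hypothetical names, and sinks are represented by their names only.

```java
import java.util.TreeMap;

class RecoverySketch {

    /** Puts a recovered sink back and returns the sink that is active afterwards. */
    static String activeAfterRecovery(TreeMap<Integer, String> liveSinks,
                                      int recoveredPriority, String recoveredSink) {
        liveSinks.put(recoveredPriority, recoveredSink);
        // Same selection as in process(): the largest key wins.
        return liveSinks.get(liveSinks.lastKey());
    }

    public static void main(String[] args) {
        // s4 (priority 110) has failed, so s2 (100) is currently active.
        TreeMap<Integer, String> live = new TreeMap<>();
        live.put(90, "s1");
        live.put(100, "s2");
        System.out.println(activeAfterRecovery(live, 110, "s4")); // s4 takes over again

        // A recovered sink with a lower priority does not displace the active one.
        TreeMap<Integer, String> other = new TreeMap<>();
        other.put(100, "s2");
        System.out.println(activeAfterRecovery(other, 90, "s1")); // s2 stays active
    }
}
```

In this sketch a recovered sink becomes active only if its priority is the largest among the live sinks, which matches the reading of process given above.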