Flume Channel Selectors

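This article covers the two channel selectors built into Flume: the replicating selector and the multiplexing selector. The replicating selector copies each event to all channels, while the multiplexing selector picks specific channels based on the value of an event header. It walks through the configuration parameters, configuration examples, and the source code.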

Introduction

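A channel selector decides which channels an event received by a source is written to. It hands that list to the channel processor, which then writes the event to those channels.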

Selector types

Flume ships with two selectors:
1. replicating (the replicating selector), used by default
2. multiplexing (the multiplexing selector)

1. Replicating selector

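The replicating selector copies every event the source receives and delivers it to all channels configured for that source. Configuration parameters control how those channels are treated, for example which of them are optional.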

Configuration parameters
Parameter   Description
type        replicating
optional    Channel names to treat as optional; a failed write to these channels is not retried
Configuration example:
agent.sources.s1.type=avro
agent.sources.s1.channels = c1 c2 c3
agent.sources.s1.selector.type=replicating
agent.sources.s1.selector.optional= c3

With this configuration, a failed write to c3 does not raise a ChannelException to the source, so c3 never causes the source to retry sending.
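To make the required/optional distinction concrete, here is a minimal sketch of the delivery rule described above. It is not Flume's ChannelProcessor (which also deals with transactions and batching); `DeliverySketch` and `FakeChannel` are hypothetical names invented for this illustration, and channels are reduced to a name plus a flag that makes writes fail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class DeliverySketch {

  /** Hypothetical stand-in for a channel: a name plus a flag that makes writes fail. */
  static final class FakeChannel {
    final String name;
    final boolean failing;

    FakeChannel(String name, boolean failing) {
      this.name = name;
      this.failing = failing;
    }

    void put(String event) {
      if (failing) {
        throw new IllegalStateException("cannot write to " + name);
      }
    }
  }

  /** Writes the event and returns the names of the channels that accepted it. */
  static List<String> deliver(String event, List<FakeChannel> required, List<FakeChannel> optional) {
    List<String> written = new ArrayList<>();
    for (FakeChannel channel : required) {
      channel.put(event); // a failure here propagates to the caller (the source)
      written.add(channel.name);
    }
    for (FakeChannel channel : optional) {
      try {
        channel.put(event);
        written.add(channel.name);
      } catch (RuntimeException e) {
        // optional channel: the failure is swallowed, so the source never sees it
      }
    }
    return written;
  }

  public static void main(String[] args) {
    List<FakeChannel> required =
        Arrays.asList(new FakeChannel("c1", false), new FakeChannel("c2", false));
    List<FakeChannel> optional =
        Arrays.asList(new FakeChannel("c3", true)); // c3 always fails in this demo
    System.out.println(deliver("event", required, optional));
  }
}
```

In this demo c3 fails on every write, yet `deliver` returns normally with c1 and c2 as the channels written. If c1 or c2 were the failing channel, the exception would reach the caller instead.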

Source code analysis:

/**
 * Replicating channel selector. This selector allows the event to be placed
 * in all the channels that the source is configured with.
 */
public class ReplicatingChannelSelector extends org.apache.flume.channel.AbstractChannelSelector {

  /**
   * Configuration to set a subset of the channels as optional.
   */
  public static final String CONFIG_OPTIONAL = "optional";
  List<Channel> requiredChannels = null;
  List<Channel> optionalChannels = new ArrayList<Channel>();

  @Override
  // Returns every channel except the optional ones
  public List<Channel> getRequiredChannels(Event event) {
    /*
     * Seems like there are lot of components within flume that do not call
     * configure method. It is conceiveable that custom component tests too
     * do that. So in that case, revert to old behavior.
     */
    if (requiredChannels == null) {
      return getAllChannels();
    }
    return requiredChannels;
  }

  @Override
  // Returns the optional channels
  public List<Channel> getOptionalChannels(Event event) {
    return optionalChannels;
  }

  @Override
  public void configure(Context context) {
    String optionalList = context.getString(CONFIG_OPTIONAL);
    requiredChannels = new ArrayList<Channel>(getAllChannels());
    Map<String, Channel> channelNameMap = getChannelNameMap();
    if (optionalList != null && !optionalList.isEmpty()) {
      for (String optional : optionalList.split("\\s+")) {
        Channel optionalChannel = channelNameMap.get(optional);
        // A channel listed under both channels and optional is removed from the required list and kept only as optional
        requiredChannels.remove(optionalChannel);
        if (!optionalChannels.contains(optionalChannel)) {
          optionalChannels.add(optionalChannel);
        }
      }
    }
  }
}
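The effect of `configure` can be reproduced on plain channel names. The sketch below is an approximation rather than the Flume class: `ReplicatingSplitSketch` and `split` are hypothetical names, channels are represented by their names only, and (a choice made just for this sketch) names in the optional property that are not in the channel list are skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ReplicatingSplitSketch {

  /** Returns a two-element list: required channel names, then optional channel names. */
  static List<List<String>> split(List<String> allChannels, String optionalProperty) {
    List<String> required = new ArrayList<>(allChannels);
    List<String> optional = new ArrayList<>();
    if (optionalProperty != null && !optionalProperty.trim().isEmpty()) {
      for (String name : optionalProperty.trim().split("\\s+")) {
        // Sketch-only choice: ignore names that are not configured on the source
        if (allChannels.contains(name) && !optional.contains(name)) {
          required.remove(name); // an optional channel is no longer required
          optional.add(name);
        }
      }
    }
    return Arrays.asList(required, optional);
  }

  public static void main(String[] args) {
    // Mirrors the configuration example: channels = c1 c2 c3, selector.optional = c3
    System.out.println(split(Arrays.asList("c1", "c2", "c3"), "c3"));
  }
}
```

For the example configuration this yields c1 and c2 as required and c3 as optional. With no optional property at all, every channel stays required.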

2. Multiplexing selector

The multiplexing selector typically picks the channels to write to based on the value of a specific event header.

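Configuration parameters
Parameter                 Description
type                      multiplexing
header                    Name of the event header to route on
mapping.<header.value>    Channels that receive events whose header has this value
optional.<header.value>   Matched the same way as mapping, but write failures on these channels are ignored
default                   Channels used when the header value has no mapping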

Configuration example:

agent.sources.s1.type=avro
agent.sources.s1.channels = c1 c2 c3 c4 c5
agent.sources.s1.selector.type= multiplexing
agent.sources.s1.selector.header= priority
agent.sources.s1.selector.mapping.1 = c1 c2
agent.sources.s1.selector.mapping.2 = c2
agent.sources.s1.selector.optional.1= c3 
agent.sources.s1.selector.optional.2= c4 
agent.sources.s1.selector.optional.3= c4
agent.sources.s1.selector.default= c5

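With the configuration above, the selector routes on the value of the priority header.
When the value is 1, the event is written to c1, c2, and c3; a failed write to c3 is not retried.
When the value is 2, the event is written to c2 and c4; a failed write to c4 is not retried.
When the value is 3, the event is written to c4 and c5: no mapping exists for 3, so the event goes to the default channel c5, and a failed write to c4 is not retried.
For any other value, the event is written to c5 only.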
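The routing rules above can be checked with a small lookup-table model. This is a sketch under simplifying assumptions, not the Flume selector: `MultiplexingSketch` and its methods are hypothetical names, and channels are plain strings. The table in `example()` mirrors the configuration example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MultiplexingSketch {
  private final Map<String, List<String>> mapping = new HashMap<>();
  private final Map<String, List<String>> optional = new HashMap<>();
  private final List<String> defaults;

  MultiplexingSketch(String... defaults) {
    this.defaults = Arrays.asList(defaults);
  }

  MultiplexingSketch addMapping(String value, String... channels) {
    mapping.put(value, Arrays.asList(channels));
    return this;
  }

  MultiplexingSketch addOptional(String value, String... channels) {
    optional.put(value, Arrays.asList(channels));
    return this;
  }

  /** Required channels: the mapping for the value, or the defaults when none applies. */
  List<String> requiredFor(String headerValue) {
    if (headerValue == null || headerValue.trim().isEmpty()) {
      return defaults;
    }
    return mapping.getOrDefault(headerValue, defaults);
  }

  /** Optional channels: empty when nothing is configured for the value. */
  List<String> optionalFor(String headerValue) {
    return optional.getOrDefault(headerValue, Collections.<String>emptyList());
  }

  /** The routing table from the configuration example (default = c5). */
  static MultiplexingSketch example() {
    return new MultiplexingSketch("c5")
        .addMapping("1", "c1", "c2")
        .addMapping("2", "c2")
        .addOptional("1", "c3")
        .addOptional("2", "c4")
        .addOptional("3", "c4");
  }

  public static void main(String[] args) {
    MultiplexingSketch selector = example();
    for (String value : new String[] {"1", "2", "3", "9"}) {
      System.out.println("priority=" + value
          + " required=" + selector.requiredFor(value)
          + " optional=" + selector.optionalFor(value));
    }
  }
}
```

Running `main` lists, for priority values 1, 2, 3, and an unmapped value, the same required and optional channels as the walkthrough above.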

Source code walkthrough:
public class MultiplexingChannelSelector extends org.apache.flume.channel.AbstractChannelSelector {

  public static final String CONFIG_MULTIPLEX_HEADER_NAME = "header";
  public static final String DEFAULT_MULTIPLEX_HEADER ="flume.selector.header";
  public static final String CONFIG_PREFIX_MAPPING = "mapping.";
  public static final String CONFIG_DEFAULT_CHANNEL = "default";
  public static final String CONFIG_PREFIX_OPTIONAL = "optional";

  @SuppressWarnings("unused")
  private static final Logger LOG = LoggerFactory.getLogger(MultiplexingChannelSelector.class);

  private static final List<Channel> EMPTY_LIST =
      Collections.emptyList();

  // Name of the header used for routing (the "header" property)
  private String headerName;

  // Header value -> required channels
  private Map<String, List<Channel>> channelMapping;
  // Header value -> optional channels
  private Map<String, List<Channel>> optionalChannels;
  // Default channels
  private List<Channel> defaultChannels;

  @Override
  public List<Channel> getRequiredChannels(Event event) {
    String headerValue = event.getHeaders().get(headerName);
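    // Header missing or blank: return the default channels
    if (headerValue == null || headerValue.trim().length() == 0) {
      return defaultChannels;
    }

    // Look up the channels mapped to this header value
    List<Channel> channels = channelMapping.get(headerValue);

    // No mapping for this value: fall back to the default channels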
    if (channels == null) {
      channels = defaultChannels;
    }

    return channels;
  }

  @Override
  public List<Channel> getOptionalChannels(Event event) {
    String hdr = event.getHeaders().get(headerName);
    List<Channel> channels = optionalChannels.get(hdr);

    if (channels == null) {
      channels = EMPTY_LIST;
    }
    return channels;
  }

  @Override
  public void configure(Context context) {
    this.headerName = context.getString(CONFIG_MULTIPLEX_HEADER_NAME,
        DEFAULT_MULTIPLEX_HEADER);
    // All channels configured for the source, by name (mapping, optional, and default alike)
    Map<String, Channel> channelNameMap = getChannelNameMap();

    //default Channels
    defaultChannels = getChannelListFromNames(
        context.getString(CONFIG_DEFAULT_CHANNEL), channelNameMap);

    Map<String, String> mapConfig =
        context.getSubProperties(CONFIG_PREFIX_MAPPING);

    // Channels configured via mapping.<value>
    channelMapping = new HashMap<String, List<Channel>>();

    for (String headerValue : mapConfig.keySet()) {
      List<Channel> configuredChannels = getChannelListFromNames(
          mapConfig.get(headerValue),
          channelNameMap);

      //This should not go to default channel(s)
      //because this seems to be a bad way to configure.
      if (configuredChannels.size() == 0) {
        throw new FlumeException("No channel configured for when "
            + "header value is: " + headerValue);
      }
      // The same header value was configured twice
      if (channelMapping.put(headerValue, configuredChannels) != null) {
        throw new FlumeException("Selector channel configured twice");
      }
    }
    //If no mapping is configured, it is ok.
    //All events will go to the default channel(s).
    // A header value with no mapping uses the default channels as its required channels
    Map<String, String> optionalChannelsMapping = context.getSubProperties(CONFIG_PREFIX_OPTIONAL + ".");

    optionalChannels = new HashMap<String, List<Channel>>();

    for (String hdr : optionalChannelsMapping.keySet()) {
      List<Channel> confChannels = getChannelListFromNames(
              optionalChannelsMapping.get(hdr), channelNameMap);
      if (confChannels.isEmpty()) {
        confChannels = EMPTY_LIST;
      }
      //Remove channels from optional channels, which are already
      //configured to be required channels.
      // A value with only optional channels and no mapping takes the default channels as its required channels
      List<Channel> reqdChannels = channelMapping.get(hdr);
      //Check if there are required channels, else defaults to default channels
      if (reqdChannels == null || reqdChannels.isEmpty()) {
        reqdChannels = defaultChannels;
      }
      for (Channel c : reqdChannels) {
        if (confChannels.contains(c)) {
          confChannels.remove(c);
        }
      }

      if (optionalChannels.put(hdr, confChannels) != null) {
        throw new FlumeException("Selector channel configured twice");
      }
    }

  }

}
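One detail in `configure` is easy to miss: a channel listed under optional for a header value is dropped from that optional list if it is already required for the same value, where "required" means the mapped channels or, when there is no mapping, the default channels. The sketch below isolates that step on plain channel names. `OptionalDedupSketch` and `effectiveOptional` are hypothetical names, and `removeAll` is used for brevity in place of the original loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class OptionalDedupSketch {

  /**
   * Returns the optional channels that remain for one header value after
   * removing those that are already required for the same value.
   */
  static List<String> effectiveOptional(
      List<String> configuredOptional, List<String> mapped, List<String> defaults) {
    // No mapping for the value: the default channels act as the required ones
    List<String> required = (mapped == null || mapped.isEmpty()) ? defaults : mapped;
    List<String> result = new ArrayList<>(configuredOptional);
    result.removeAll(required);
    return result;
  }

  public static void main(String[] args) {
    List<String> defaults = Arrays.asList("c5");
    // optional.1 = c1 c3 with mapping.1 = c1 c2: c1 is already required
    System.out.println(
        effectiveOptional(Arrays.asList("c1", "c3"), Arrays.asList("c1", "c2"), defaults));
    // optional.3 = c4 c5 with no mapping.3: c5 is required as the default
    System.out.println(effectiveOptional(Arrays.asList("c4", "c5"), null, defaults));
  }
}
```

In the first call only c3 stays optional; in the second only c4 does. A channel is therefore never written twice for the same event just because it appears in both lists.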