Flume - MemoryChannel源码解析

本文详细解析了MemoryChannel的工作原理,包括其配置参数、内部信号量机制、MemoryTransaction类结构及核心方法,展示了数据如何在source与sink间流动。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

MemoryChannel的简易类结构:


内部类MemoryTransaction的简易类结构:

一,configure(Context context)

1,capacity:MemroyChannel的容量,默认是100。

2,transCapacity:每个事务最大的容量,也就是每个事务能够获取的最大Event数量。默认也是100。

3,byteCapacityBufferPercentage:定义Channle中Event所占的百分比,需要考虑在Header中的数据。

4,byteCapacity:byteCapacity等于设置的byteCapacity值或堆的80%乘以1减去byteCapacityBufferPercentage的百分比,然后除以100。源码如下:

    try {
      byteCapacity = (int)((context.getLong("byteCapacity", defaultByteCapacity).longValue() * (1 - byteCapacityBufferPercentage * .01 )) /byteCapacitySlotSize);
      if (byteCapacity < 1) {
        byteCapacity = Integer.MAX_VALUE;
      }
    } catch(NumberFormatException e) {
      byteCapacity = (int)((defaultByteCapacity * (1 - byteCapacityBufferPercentage * .01 )) /byteCapacitySlotSize);
    }

5,keep-alive:增加和删除一个Event的超时时间(单位:秒)

6,初始化LinkedBlockingDeque对象,大小为capacity。以及各种信号量对象。

7,最后初始化计数器。


二,信号量

MemoryChannel有三个信号量用来控制事务,防止容量越界。这三个信号量分别是:

queueRemaining : 表示空闲的容量大小。

queueStored : 表示已经存储的容量大小。

bytesRemaining : 表示可以使用的内存大小。该大小就是计算后的byteCapacity值。


三,MemoryTransaction

MemoryTransaction用来接收数据和事务控制。该类继承BasicTransactionSemantics类,BasicTransactionSemantics类有三个主要的方法:

1,put(Event event) 往Channel放入数据。
2, take() 从Channel获取数据。
3, begin() 开始事务
3,commit() 提交事务
4,rollback() 回滚事务

无论是Sink,还是Source都会调用getTransaction()方法,获取该事务实例。下面我们看看MemoryTransaction的初始化过程:

    public MemoryTransaction(int transCapacity, ChannelCounter counter) {
      putList = new LinkedBlockingDeque<Event>(transCapacity);
      takeList = new LinkedBlockingDeque<Event>(transCapacity);

      channelCounter = counter;
    }

由上可见,MemoryTransaction维护了两个队列,一个用于Source的put,一个用于Sink的take,容量大小为事务的容量(即:transCapacity)。

最后让我们了解一下MemoryTransaction类中,几个比较重要方法:

1,doPut(Event event)方法是往Channel插入数据

    protected void doPut(Event event) throws InterruptedException {
      channelCounter.incrementEventPutAttemptCount();<pre name="code" class="java">      //estimateEventSize方法返回Event的大小(只是Body的大小,header没有计算在内),并且除以byteCapacitySlotSize后向上取整计算
      int eventByteSize = (int)Math.ceil(estimateEventSize(event)/byteCapacitySlotSize);
      //添加到队列的尾部,如果超过了容量的限制,则添加失败,抛出ChannelException异常。
      if (!putList.offer(event)) {
        throw new ChannelException(
          "Put queue for MemoryTransaction of capacity " +
            putList.size() + " full, consider committing more frequently, " +
            "increasing capacity or increasing thread count");
      }
      //记录Event的byte值
      putByteCounter += eventByteSize;
    }

 

2,doTake() 方法是从Channel中获取数据。

protected Event doTake() throws InterruptedException {
      channelCounter.incrementEventTakeAttemptCount();
      //takeList队列的剩余容量,如果为0,则抛异常
      if(takeList.remainingCapacity() == 0) {
        throw new ChannelException("Take list for MemoryTransaction, capacity " +
            takeList.size() + " full, consider committing more frequently, " +
            "increasing capacity, or increasing thread count");
      }
      //尝试获取许可,如果可以获取到许可的话,证明queue队列有空间,否则返回null
      if(!queueStored.tryAcquire(keepAlive, TimeUnit.SECONDS)) {
        return null;
      }
      Event event;
      synchronized(queueLock) {
        //从queue取出一个Event
        event = queue.poll();
      }
      Preconditions.checkNotNull(event, "Queue.poll returned NULL despite semaphore " +
          "signalling existence of entry");
      //加入到takeList队列中
      takeList.put(event);
      //更新takeByteCounter大小
      int eventByteSize = (int)Math.ceil(estimateEventSize(event)/byteCapacitySlotSize);
      takeByteCounter += eventByteSize;

      return event;
    }


3,doCommit() 提交

    protected void doCommit() throws InterruptedException {
      int remainingChange = takeList.size() - putList.size();
      //如果takeList更小,说明该MemoryChannel放的数据比取的数据要多,所以需要判断该MemoryChannel是否有空间来放
     if(remainingChange < 0) {
        //利用该信号量判断是否有剩余空间
        if(!bytesRemaining.tryAcquire(putByteCounter, keepAlive,
          TimeUnit.SECONDS)) {
          throw new ChannelException("Cannot commit transaction. Byte capacity " +
            "allocated to store event body " + byteCapacity * byteCapacitySlotSize +
            "reached. Please increase heap space/byte capacity allocated to " +
            "the channel as the sinks may not be keeping up with the sources");
        }
        //检查queue队列是否有足够的空间。
        if(!queueRemaining.tryAcquire(-remainingChange, keepAlive, TimeUnit.SECONDS)) {
          bytesRemaining.release(putByteCounter);
          throw new ChannelFullException("Space for commit to queue couldn't be acquired." +
              " Sinks are likely not keeping up with sources, or the buffer size is too tight");
        }
      }
      int puts = putList.size();
      int takes = takeList.size();
      //如果上述两个信号量都有空间的话,那么把putList中的Event放到该MemoryChannel中的queue中。
      synchronized(queueLock) {
        if(puts > 0 ) {
          while(!putList.isEmpty()) {
            if(!queue.offer(putList.removeFirst())) {
              throw new RuntimeException("Queue add failed, this shouldn't be able to happen");
            }
          }
        }
        //最后情况putList和takeList
        putList.clear();
        takeList.clear();
      }
      //更新queue大小控制的信号量bytesRemaining
      bytesRemaining.release(takeByteCounter);
      takeByteCounter = 0;
      putByteCounter = 0;

      queueStored.release(puts);
      //takeList比putList大,说明该MemoryChannel中queue的数量应该是减少了,所以把(takeList-putList)的差值加到信号量queueRemaining
      if(remainingChange > 0) {
        queueRemaining.release(remainingChange);
      }
      //更新计数器
      if (puts > 0) {
        channelCounter.addToEventPutSuccessCount(puts);
      }
      if (takes > 0) {
        channelCounter.addToEventTakeSuccessCount(takes);
      }

      channelCounter.setChannelSize(queue.size());
    }

4,doRollback() 回滚

    protected void doRollback() {
      //获取takeList的大小,然后bytesRemaining中释放
      int takes = takeList.size();
      //将takeList中的Event重新放回到queue队列中。
      synchronized(queueLock) {
        Preconditions.checkState(queue.remainingCapacity() >= takeList.size(), "Not enough space in memory channel " +
            "queue to rollback takes. This should never happen, please report");
        while(!takeList.isEmpty()) {
          queue.addFirst(takeList.removeLast());
        }
        //最后清空putList
        putList.clear();
      }
      //清空了putList,所以需要把putList占用的空间添加到bytesRemaining中
      bytesRemaining.release(putByteCounter);
      putByteCounter = 0;
      takeByteCounter = 0;

      queueStored.release(takes);
      channelCounter.setChannelSize(queue.size());
    }

MemoryChannel的数据流向:

source -> putList -> queue -> takeList -> sink


MemoryChannel还是比较简单的,主要是通过MemoryTransaction中的putList、takeList与MemoryChannel中的queue进行数据流转和事务控制,MemoryChannel受内存空间的影响,如果数据产生的过快,同时获取信号量超时容易造成数据的丢失。而且Flume进程挂掉,数据也会丢失。

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值