Disruptor Ring Buffer as a Blocking Queue


Author: Wang, Xinglang 

Abstract

For any concurrent multi-threaded system, distributed or otherwise, the inter-thread messaging component is a very important building block. In Java, the JDK provides ArrayBlockingQueue, LinkedBlockingQueue, and LinkedTransferQueue. Disruptor (http://lmaxexchange.github.io/disruptor/) is famous for its high-performance inter-thread messaging, but it does not expose a BlockingQueue interface. This blog introduces a new blocking queue built on the Disruptor ring buffer, together with benchmark results.

Why a BlockingQueue interface is needed

The BlockingQueue interface is widely used by existing code, and switching to Disruptor directly forces large changes, since Disruptor wants to control the whole thread scheduling. Second, Disruptor only calls back when an event arrives; it gives the application no chance to control the behavior when the queue builds up and to do some pro-active throttling. This blog introduces a BlockingQueue implementation on top of the RingBuffer, with one limitation: the queue can only be consumed by one consumer thread, while the producer side can be a single thread or multiple threads. This is useful for the Actor pattern, which uses a blocking queue and one thread to drain it. The reason for the limitation is that the consumer-side offset is hard to maintain with multiple consumer threads; multi-threaded consumers should use the Disruptor WorkerPool instead of a JDK Executor.
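
To make this concrete, here is a minimal actor-style usage sketch. It only assumes the SingleConsumerDisruptorQueue class introduced below; the message handling is a placeholder:

import java.util.concurrent.BlockingQueue;

public class ActorExample {
    public static void main(String[] args) {
        // Multi-producer mode: many threads may offer, exactly one thread takes.
        final BlockingQueue<String> queue = new SingleConsumerDisruptorQueue<String>(1024, false);

        // The single consumer thread drains the queue, actor style.
        Thread consumer = new Thread(new Runnable() {
            public void run() {
                try {
                    while (true) {
                        String msg = queue.take();
                        System.out.println("handled: " + msg); // placeholder for the actor's handler
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        consumer.start();

        // Producers keep the plain BlockingQueue contract and can throttle pro-actively.
        if (!queue.offer("hello")) {
            // queue is full: back off, drop, or fall back to a blocking put()
        }
    }
}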

Implementation

The source code is available on GitHub: https://github.com/xinglang/disruptorqueue/tree/master/disruptorqueue

Since this queue only supports one consumer, let's call it SingleConsumerDisruptorQueue. The SingleConsumerDisruptorQueue has a ring buffer and a sequence (consumedSeq) for the consumer; consumedSeq is the gating sequence of the ring buffer. There is also a knownPublishedSeq field, which remembers the last known published sequence. Since this is a blocking queue, the wait strategy is BlockingWaitStrategy (the default one).

private final RingBuffer<Event<T>> ringBuffer;
private final Sequence consumedSeq;
private final SequenceBarrier barrier;
private long knownPublishedSeq;

public SingleConsumerDisruptorQueue(int bufferSize, boolean singleProducer) {
    if (singleProducer) {
        ringBuffer = RingBuffer.createSingleProducer(new Factory<T>(),
                normalizeBufferSize(bufferSize));
    } else {
        ringBuffer = RingBuffer.createMultiProducer(new Factory<T>(),
                normalizeBufferSize(bufferSize));
    }
    consumedSeq = new Sequence();
    // The consumer sequence gates the producers so the buffer cannot wrap over unconsumed events.
    ringBuffer.addGatingSequences(consumedSeq);
    barrier = ringBuffer.newBarrier();
    // Start consuming from the current cursor position.
    long cursor = ringBuffer.getCursor();
    consumedSeq.set(cursor);
    knownPublishedSeq = cursor;
}

For publishing, we just use the ring buffer's publish. Inside the ring buffer there is an event holder, which acts as a value holder for the item.

@Override
public boolean offer(T e) {
    long seq;
    try {
        seq = ringBuffer.tryNext(); // claim the next slot, failing fast when the buffer is full
    } catch (InsufficientCapacityException e1) {
        return false;
    }
    publish(e, seq);
    return true;
}

private void publish(T e, long seq) {
    Event<T> holder = ringBuffer.get(seq); // pre-allocated holder for this slot
    holder.setValue(e);
    ringBuffer.publish(seq); // make the event visible to the consumer
}

For consuming, an optimization is possible since there is only one consumer thread. Each call to waitFor returns the last known published sequence, which we cache; if the next consumer sequence is not beyond the last known published sequence, the barrier's waitFor method does not need to be called at all.

@Override
public T take() throws InterruptedException {
    long l = consumedSeq.get() + 1;
    while (knownPublishedSeq < l) {
        try {
            // waitFor may return a sequence beyond l, caching a whole batch of publishes
            knownPublishedSeq = barrier.waitFor(l);
        } catch (AlertException e) {
            throw new IllegalStateException(e);
        } catch (TimeoutException e) {
            throw new IllegalStateException(e);
        }
    }
    Event<T> eventHolder = ringBuffer.get(l);
    T value = eventHolder.getValue(); // read the value before the slot is released
    consumedSeq.incrementAndGet();    // advance the gating sequence, freeing the slot
    return value;
}
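
A non-blocking poll() can reuse the same cached sequence. Below is a sketch, assuming Disruptor 3.x where RingBuffer.isAvailable(long) reports whether a sequence has been published; the repository's actual poll() may differ:

@Override
public T poll() {
    long l = consumedSeq.get() + 1;
    if (knownPublishedSeq < l) {
        // Refresh the cached published sequence without parking the thread.
        if (!ringBuffer.isAvailable(l)) {
            return null; // nothing published yet
        }
        knownPublishedSeq = l;
    }
    Event<T> eventHolder = ringBuffer.get(l);
    T value = eventHolder.getValue(); // read the value before the slot is released
    consumedSeq.incrementAndGet();
    return value;
}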

Performance analysis

First of all, it gets all the benefits of the ring buffer design:

  • Avoids false sharing
  • Pre-allocated ring buffer: no instances are created during publish/consume
  • Fewer context switches: the consumer can get a whole batch of events without being interrupted (see the drainTo sketch below)
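
For example, the batching benefit can be surfaced through drainTo. Here is a sketch under the same single-consumer assumption; it only drains events already covered by knownPublishedSeq, so it never blocks:

@Override
public int drainTo(java.util.Collection<? super T> c) {
    long next = consumedSeq.get() + 1;
    int drained = 0;
    // knownPublishedSeq may already cover a whole batch from a previous waitFor.
    while (next <= knownPublishedSeq) {
        c.add(ringBuffer.get(next).getValue()); // read before releasing the slot
        consumedSeq.incrementAndGet();
        next++;
        drained++;
    }
    return drained;
}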

Below is a benchmark comparing this queue with LinkedBlockingQueue, ArrayBlockingQueue and LinkedTransferQueue. The benchmark ran on a bare-metal machine with Ubuntu, using 1 consumer thread and 1 to 4 producer threads; each round performs 32M put/take operations per producer thread. The item being put is a constant string, so there is no GC overhead from object creation.
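
The benchmark harness itself is in the repository; the sketch below only illustrates the measurement loop described above (class and method names here are made up, not the repository's):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;

public class QueueBenchmarkSketch {
    static final int PER_PRODUCER = 32 * 1024 * 1024; // 32M put/take per producer
    static final String ITEM = "constant";            // constant item, so no GC overhead

    // Returns the milliseconds needed for one consumer thread to drain everything.
    static long run(final BlockingQueue<String> queue, final int producers) throws InterruptedException {
        final CountDownLatch done = new CountDownLatch(1);
        final long start = System.currentTimeMillis();
        for (int i = 0; i < producers; i++) {
            new Thread(new Runnable() {
                public void run() {
                    try {
                        for (int n = 0; n < PER_PRODUCER; n++) {
                            queue.put(ITEM);
                        }
                    } catch (InterruptedException ignored) {
                    }
                }
            }).start();
        }
        new Thread(new Runnable() {
            public void run() {
                try {
                    long total = (long) PER_PRODUCER * producers;
                    for (long n = 0; n < total; n++) {
                        queue.take(); // the single consumer thread
                    }
                    done.countDown();
                } catch (InterruptedException ignored) {
                }
            }
        }).start();
        done.await();
        return System.currentTimeMillis() - start;
    }
}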

Single Producer benchmark

 

$ perf stat java -jar disruptortest.jar type=dbq
Producers :1, buffer size: 262144, batch:0
SingleConsumerDisruptorQueue transfer rate : 19890 per ms, Used 1687ms for 33554432

Performance counter stats for 'java -jar disruptortest.jar type=dbq':

      3729.421847 task-clock                # 1.998 CPUs utilized
            1,891 context-switches          # 0.001 M/sec
               76 CPU-migrations            # 0.000 M/sec
            9,357 page-faults               # 0.003 M/sec
    9,434,280,791 cycles                    # 2.530 GHz [83.38%]
    5,489,619,603 stalled-cycles-frontend   # 58.19% frontend cycles idle [83.35%]
    2,618,037,087 stalled-cycles-backend    # 27.75% backend cycles idle [66.99%]
   10,797,968,145 instructions              # 1.14 insns per cycle
                                            # 0.51 stalled cycles per insn [83.55%]
    1,742,973,721 branches                  # 467.358 M/sec [83.28%]
       10,213,770 branch-misses             # 0.59% of all branches [83.12%]

      1.866803438 seconds time elapsed

$ perf stat java -jar disruptortest.jar type=abq
Producers :1, buffer size: 262144, batch:0
ArrayBlockingQueue transfer rate : 2694 per ms, Used 12451ms for 33554432

Performance counter stats for 'java -jar disruptortest.jar type=abq':

     22976.952946 task-clock                # 1.824 CPUs utilized
          232,766 context-switches          # 0.010 M/sec
               80 CPU-migrations            # 0.000 M/sec
           68,531 page-faults               # 0.003 M/sec
   58,643,663,103 cycles                    # 2.552 GHz [83.14%]
   51,767,105,241 stalled-cycles-frontend   # 88.27% frontend cycles idle [83.32%]
   47,084,355,024 stalled-cycles-backend    # 80.29% backend cycles idle [66.51%]
   12,035,035,540 instructions              # 0.21 insns per cycle
                                            # 4.30 stalled cycles per insn [83.44%]
    2,016,738,256 branches                  # 87.772 M/sec [83.56%]
       20,147,764 branch-misses             # 1.00% of all branches [83.49%]

     12.596555382 seconds time elapsed

$ perf stat java -jar disruptortest.jar type=lbq
Producers :1, buffer size: 262144, batch:0
LinkedBlockingQueue transfer rate : 1132 per ms, Used 29632ms for 33554432

Performance counter stats for 'java -jar disruptortest.jar type=lbq':

     58707.942294 task-clock                # 1.968 CPUs utilized
           82,377 context-switches          # 0.001 M/sec
               97 CPU-migrations            # 0.000 M/sec
          133,543 page-faults               # 0.002 M/sec
  151,825,969,348 cycles                    # 2.586 GHz [83.27%]
  139,833,905,165 stalled-cycles-frontend   # 92.10% frontend cycles idle [83.40%]
  131,712,244,095 stalled-cycles-backend    # 86.75% backend cycles idle [66.67%]
   10,997,843,405 instructions              # 0.07 insns per cycle
                                            # 12.71 stalled cycles per insn [83.26%]
    1,701,879,665 branches                  # 28.989 M/sec [83.31%]
       23,369,660 branch-misses             # 1.37% of all branches [83.35%]

     29.830928757 seconds time elapsed

$ perf stat java -jar disruptortest.jar type=tq
Producers :1, buffer size: 262144, batch:0
LinkedTransferQueue transfer rate : 2139 per ms, Used 15685ms for 33554432

Performance counter stats for 'java -jar disruptortest.jar type=tq':

    107428.492713 task-clock                # 6.737 CPUs utilized
           10,542 context-switches          # 0.000 M/sec
              100 CPU-migrations            # 0.000 M/sec
          245,909 page-faults               # 0.002 M/sec
  278,182,169,187 cycles                    # 2.589 GHz [83.33%]
  204,478,913,414 stalled-cycles-frontend   # 73.51% frontend cycles idle [83.36%]
  164,497,727,638 stalled-cycles-backend    # 59.13% backend cycles idle [66.73%]
   90,952,113,104 instructions              # 0.33 insns per cycle
                                            # 2.25 stalled cycles per insn [83.37%]
   32,522,385,525 branches                  # 302.735 M/sec [83.30%]
       57,227,684 branch-misses             # 0.18% of all branches [83.28%]

     15.947024802 seconds time elapsed

Multiple Producer benchmark

$ perf stat java -jar disruptortest.jar type=dq producer=4
Producers :4, buffer size: 262144, batch:0
SingleConsumerDisruptorQueue transfer rate : 2859 per ms, Used 46941ms for 134217728

Performance counter stats for 'java -jar disruptortest.jar type=dq producer=4':

    118905.839793 task-clock                # 2.523 CPUs utilized
        2,172,912 context-switches          # 0.018 M/sec
              280 CPU-migrations            # 0.000 M/sec
           28,697 page-faults               # 0.000 M/sec
  141,597,737,150 cycles                    # 1.191 GHz [83.18%]
  113,618,387,640 stalled-cycles-frontend   # 80.24% frontend cycles idle [83.42%]
   96,562,209,060 stalled-cycles-backend    # 68.19% backend cycles idle [66.86%]
   55,227,379,587 instructions              # 0.39 insns per cycle
                                            # 2.06 stalled cycles per insn [83.45%]
    9,312,400,407 branches                  # 78.317 M/sec [83.19%]
       64,375,263 branch-misses             # 0.69% of all branches [83.35%]

     47.133747893 seconds time elapsed

$ perf stat java -jar disruptortest.jar type=abq producer=4
Producers :4, buffer size: 262144, batch:0
ArrayBlockingQueue transfer rate : 2047 per ms, Used 65546ms for 134217728

Performance counter stats for 'java -jar disruptortest.jar type=abq producer=4':

     79345.046656 task-clock                # 1.208 CPUs utilized
        3,003,905 context-switches          # 0.038 M/sec
              594 CPU-migrations            # 0.000 M/sec
           77,227 page-faults               # 0.001 M/sec
  102,931,605,765 cycles                    # 1.297 GHz [83.10%]
   78,913,722,891 stalled-cycles-frontend   # 76.67% frontend cycles idle [83.46%]
   65,701,179,927 stalled-cycles-backend    # 63.83% backend cycles idle [66.99%]
   52,891,419,177 instructions              # 0.51 insns per cycle
                                            # 1.49 stalled cycles per insn [83.41%]
    9,307,141,741 branches                  # 117.300 M/sec [83.21%]
       79,855,221 branch-misses             # 0.86% of all branches [83.23%]

     65.694123910 seconds time elapsed

$ perf stat java -jar disruptortest.jar type=lbq producer=4
Producers :4, buffer size: 262144, batch:0
LinkedBlockingQueue transfer rate : 2795 per ms, Used 48014ms for 134217728

Performance counter stats for 'java -jar disruptortest.jar type=lbq producer=4':

    110080.375452 task-clock                # 2.284 CPUs utilized
        3,644,802 context-switches          # 0.033 M/sec
              597 CPU-migrations            # 0.000 M/sec
          136,440 page-faults               # 0.001 M/sec
  185,250,018,068 cycles                    # 1.683 GHz [83.46%]
  144,448,559,949 stalled-cycles-frontend   # 77.97% frontend cycles idle [83.62%]
  118,250,468,418 stalled-cycles-backend    # 63.83% backend cycles idle [66.28%]
   73,113,563,433 instructions              # 0.39 insns per cycle
                                            # 1.98 stalled cycles per insn [83.21%]
   12,028,209,235 branches                  # 109.268 M/sec [83.25%]
      129,234,077 branch-misses             # 1.07% of all branches [83.40%]

     48.189813503 seconds time elapsed

$ perf stat java -jar disruptortest.jar type=tq producer=4
Producers :4, buffer size: 262144, batch:0
LinkedTransferQueue transfer rate : 1438 per ms, Used 93273ms for 134217728

Performance counter stats for 'java -jar disruptortest.jar type=tq producer=4':

    761878.416668 task-clock                # 8.122 CPUs utilized
           71,371 context-switches          # 0.000 M/sec
              203 CPU-migrations            # 0.000 M/sec
          670,788 page-faults               # 0.001 M/sec
1,976,200,012,808 cycles                    # 2.594 GHz [83.33%]
1,584,264,715,610 stalled-cycles-frontend   # 80.17% frontend cycles idle [83.34%]
1,368,861,011,899 stalled-cycles-backend    # 69.27% backend cycles idle [66.68%]
  487,816,405,509 instructions              # 0.25 insns per cycle
                                            # 3.25 stalled cycles per insn [83.34%]
  169,135,278,863 branches                  # 221.998 M/sec [83.33%]
      615,658,238 branch-misses             # 0.36% of all branches [83.33%]

     93.798977802 seconds time elapsed

Conclusion

Using the Disruptor RingBuffer to build a blocking queue is possible. In the single producer/single consumer case, it can be 5x faster than the default JDK blocking queue implementations. In the multiple producer case, it is much faster than ArrayBlockingQueue and LinkedTransferQueue; LinkedBlockingQueue can achieve similar throughput, but the Disruptor-based queue has fewer context switches and a smaller memory footprint. The only limitation is that it supports a single consumer thread. The benefit of a BlockingQueue implementation on top of the RingBuffer is that it can be a drop-in replacement in existing code, and it gives the user more control via the BlockingQueue interface, whereas the WorkerPool provided by Disruptor only lets the user supply an event handler for callbacks.
