Disruptor Ring Buffer as a Blocking Queue


Author: Wang, Xinglang 

Abstract

For any concurrent multi-threaded system, distributed or otherwise, the inter-thread messaging component is a very important building block. In Java, the JDK provides ArrayBlockingQueue, LinkedBlockingQueue, and LinkedTransferQueue. Disruptor (http://lmaxexchange.github.io/disruptor/) is famous for its high-performance inter-thread messaging, but it does not expose a BlockingQueue interface. This blog introduces a new blocking queue built on the Disruptor ring buffer, together with benchmark results.

Why a BlockingQueue interface is needed

The BlockingQueue interface is widely used by existing code, and switching to Disruptor directly forces large changes, since Disruptor wants to control the whole thread scheduling. Second, Disruptor only calls back when an event arrives; it gives the application no chance to control the behavior when the queue builds up and to do some pro-active throttling. This blog introduces a BlockingQueue implementation on top of the RingBuffer, with one limitation: the queue can only be consumed by one consumer thread, while the producer side can be a single thread or multiple threads. This is useful for the Actor pattern, which uses a blocking queue and one thread to drain it. The reason for the limitation is that the consumer-side offset is hard to maintain with multiple consumer threads; multi-threaded consumers should use the Disruptor WorkerPool instead of a JDK Executor.
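
To make this concrete, here is a minimal actor-style usage sketch. It only assumes the SingleConsumerDisruptorQueue class introduced below; the message handling is a placeholder:

import java.util.concurrent.BlockingQueue;

public class ActorExample {
    public static void main(String[] args) {
        // Multi-producer mode: many threads may offer, exactly one thread takes.
        final BlockingQueue<String> queue = new SingleConsumerDisruptorQueue<String>(1024, false);

        // The single consumer thread drains the queue, actor style.
        Thread consumer = new Thread(new Runnable() {
            public void run() {
                try {
                    while (true) {
                        String msg = queue.take();
                        System.out.println("handled: " + msg); // placeholder for the actor's handler
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        consumer.start();

        // Producers keep the plain BlockingQueue contract and can throttle pro-actively.
        if (!queue.offer("hello")) {
            // queue is full: back off, drop, or fall back to a blocking put()
        }
    }
}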

Implementation

The source code is available on GitHub: https://github.com/xinglang/disruptorqueue/tree/master/disruptorqueue

Since this queue only supports one consumer, let's call it SingleConsumerDisruptorQueue. The SingleConsumerDisruptorQueue has a ring buffer and a sequence (consumedSeq) for the consumer; consumedSeq is the gating sequence of the ring buffer. There is also a knownPublishedSeq field, which remembers the last known published sequence. Since this is a blocking queue, the wait strategy is BlockingWaitStrategy (the default one).

private final RingBuffer<Event<T>> ringBuffer;
private final Sequence consumedSeq;
private final SequenceBarrier barrier;
private long knownPublishedSeq;

public SingleConsumerDisruptorQueue(int bufferSize, boolean singleProducer) {
    if (singleProducer) {
        ringBuffer = RingBuffer.createSingleProducer(new Factory<T>(),
                normalizeBufferSize(bufferSize));
    } else {
        ringBuffer = RingBuffer.createMultiProducer(new Factory<T>(),
                normalizeBufferSize(bufferSize));
    }
    consumedSeq = new Sequence();
    // The consumer sequence gates the producers so the buffer cannot wrap over unconsumed events.
    ringBuffer.addGatingSequences(consumedSeq);
    barrier = ringBuffer.newBarrier();
    // Start consuming from the current cursor position.
    long cursor = ringBuffer.getCursor();
    consumedSeq.set(cursor);
    knownPublishedSeq = cursor;
}

For publishing, we just use the ring buffer's publish. Inside the ring buffer there is an event holder, which acts as a value holder for the item.

@Override
public boolean offer(T e) {
    long seq;
    try {
        seq = ringBuffer.tryNext(); // claim the next slot, failing fast when the buffer is full
    } catch (InsufficientCapacityException e1) {
        return false;
    }
    publish(e, seq);
    return true;
}

private void publish(T e, long seq) {
    Event<T> holder = ringBuffer.get(seq); // pre-allocated holder for this slot
    holder.setValue(e);
    ringBuffer.publish(seq); // make the event visible to the consumer
}

For consuming, an optimization is possible since there is only one consumer thread. Each call to waitFor returns the last known published sequence, which we cache; if the next consumer sequence is not beyond the last known published sequence, the barrier's waitFor method does not need to be called at all.

@Override
public T take() throws InterruptedException {
    long l = consumedSeq.get() + 1;
    while (knownPublishedSeq < l) {
        try {
            // waitFor may return a sequence beyond l, caching a whole batch of publishes
            knownPublishedSeq = barrier.waitFor(l);
        } catch (AlertException e) {
            throw new IllegalStateException(e);
        } catch (TimeoutException e) {
            throw new IllegalStateException(e);
        }
    }
    Event<T> eventHolder = ringBuffer.get(l);
    T value = eventHolder.getValue(); // read the value before the slot is released
    consumedSeq.incrementAndGet();    // advance the gating sequence, freeing the slot
    return value;
}
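
A non-blocking poll() can reuse the same cached sequence. Below is a sketch, assuming Disruptor 3.x where RingBuffer.isAvailable(long) reports whether a sequence has been published; the repository's actual poll() may differ:

@Override
public T poll() {
    long l = consumedSeq.get() + 1;
    if (knownPublishedSeq < l) {
        // Refresh the cached published sequence without parking the thread.
        if (!ringBuffer.isAvailable(l)) {
            return null; // nothing published yet
        }
        knownPublishedSeq = l;
    }
    Event<T> eventHolder = ringBuffer.get(l);
    T value = eventHolder.getValue(); // read the value before the slot is released
    consumedSeq.incrementAndGet();
    return value;
}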

Performance analysis

First of all, it gets all the benefits of the ring buffer design:

  • Avoids false sharing
  • Pre-allocated ring buffer: no instances are created during publish/consume
  • Fewer context switches: the consumer can get a whole batch of events without being interrupted (see the drainTo sketch below)
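
For example, the batching benefit can be surfaced through drainTo. Here is a sketch under the same single-consumer assumption; it only drains events already covered by knownPublishedSeq, so it never blocks:

@Override
public int drainTo(java.util.Collection<? super T> c) {
    long next = consumedSeq.get() + 1;
    int drained = 0;
    // knownPublishedSeq may already cover a whole batch from a previous waitFor.
    while (next <= knownPublishedSeq) {
        c.add(ringBuffer.get(next).getValue()); // read before releasing the slot
        consumedSeq.incrementAndGet();
        next++;
        drained++;
    }
    return drained;
}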

Below is a benchmark comparing this queue with LinkedBlockingQueue, ArrayBlockingQueue and LinkedTransferQueue. The benchmark ran on a bare-metal machine with Ubuntu, using 1 consumer thread and 1 to 4 producer threads; each round performs 32M put/take operations per producer thread. The item being put is a constant string, so there is no GC overhead from object creation.
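
The benchmark harness itself is in the repository; the sketch below only illustrates the measurement loop described above (class and method names here are made up, not the repository's):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;

public class QueueBenchmarkSketch {
    static final int PER_PRODUCER = 32 * 1024 * 1024; // 32M put/take per producer
    static final String ITEM = "constant";            // constant item, so no GC overhead

    // Returns the milliseconds needed for one consumer thread to drain everything.
    static long run(final BlockingQueue<String> queue, final int producers) throws InterruptedException {
        final CountDownLatch done = new CountDownLatch(1);
        final long start = System.currentTimeMillis();
        for (int i = 0; i < producers; i++) {
            new Thread(new Runnable() {
                public void run() {
                    try {
                        for (int n = 0; n < PER_PRODUCER; n++) {
                            queue.put(ITEM);
                        }
                    } catch (InterruptedException ignored) {
                    }
                }
            }).start();
        }
        new Thread(new Runnable() {
            public void run() {
                try {
                    long total = (long) PER_PRODUCER * producers;
                    for (long n = 0; n < total; n++) {
                        queue.take(); // the single consumer thread
                    }
                    done.countDown();
                } catch (InterruptedException ignored) {
                }
            }
        }).start();
        done.await();
        return System.currentTimeMillis() - start;
    }
}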

Single Producer benchmark

 

$ perf stat java -jar disruptortest.jar type=dbq
Producers :1, buffer size: 262144, batch:0
SingleConsumerDisruptorQueue transfer rate : 19890 per ms, Used 1687ms for 33554432

Performance counter stats for 'java -jar disruptortest.jar type=dbq':

      3729.421847 task-clock                # 1.998 CPUs utilized
            1,891 context-switches          # 0.001 M/sec
               76 CPU-migrations            # 0.000 M/sec
            9,357 page-faults               # 0.003 M/sec
    9,434,280,791 cycles                    # 2.530 GHz [83.38%]
    5,489,619,603 stalled-cycles-frontend   # 58.19% frontend cycles idle [83.35%]
    2,618,037,087 stalled-cycles-backend    # 27.75% backend cycles idle [66.99%]
   10,797,968,145 instructions              # 1.14 insns per cycle
                                            # 0.51 stalled cycles per insn [83.55%]
    1,742,973,721 branches                  # 467.358 M/sec [83.28%]
       10,213,770 branch-misses             # 0.59% of all branches [83.12%]

      1.866803438 seconds time elapsed

$ perf stat java -jar disruptortest.jar type=abq
Producers :1, buffer size: 262144, batch:0
ArrayBlockingQueue transfer rate : 2694 per ms, Used 12451ms for 33554432

Performance counter stats for 'java -jar disruptortest.jar type=abq':

     22976.952946 task-clock                # 1.824 CPUs utilized
          232,766 context-switches          # 0.010 M/sec
               80 CPU-migrations            # 0.000 M/sec
           68,531 page-faults               # 0.003 M/sec
   58,643,663,103 cycles                    # 2.552 GHz [83.14%]
   51,767,105,241 stalled-cycles-frontend   # 88.27% frontend cycles idle [83.32%]
   47,084,355,024 stalled-cycles-backend    # 80.29% backend cycles idle [66.51%]
   12,035,035,540 instructions              # 0.21 insns per cycle
                                            # 4.30 stalled cycles per insn [83.44%]
    2,016,738,256 branches                  # 87.772 M/sec [83.56%]
       20,147,764 branch-misses             # 1.00% of all branches [83.49%]

     12.596555382 seconds time elapsed

$ perf stat java -jar disruptortest.jar type=lbq
Producers :1, buffer size: 262144, batch:0
LinkedBlockingQueue transfer rate : 1132 per ms, Used 29632ms for 33554432

Performance counter stats for 'java -jar disruptortest.jar type=lbq':

     58707.942294 task-clock                # 1.968 CPUs utilized
           82,377 context-switches          # 0.001 M/sec
               97 CPU-migrations            # 0.000 M/sec
          133,543 page-faults               # 0.002 M/sec
  151,825,969,348 cycles                    # 2.586 GHz [83.27%]
  139,833,905,165 stalled-cycles-frontend   # 92.10% frontend cycles idle [83.40%]
  131,712,244,095 stalled-cycles-backend    # 86.75% backend cycles idle [66.67%]
   10,997,843,405 instructions              # 0.07 insns per cycle
                                            # 12.71 stalled cycles per insn [83.26%]
    1,701,879,665 branches                  # 28.989 M/sec [83.31%]
       23,369,660 branch-misses             # 1.37% of all branches [83.35%]

     29.830928757 seconds time elapsed

$ perf stat java -jar disruptortest.jar type=tq
Producers :1, buffer size: 262144, batch:0
LinkedTransferQueue transfer rate : 2139 per ms, Used 15685ms for 33554432

Performance counter stats for 'java -jar disruptortest.jar type=tq':

    107428.492713 task-clock                # 6.737 CPUs utilized
           10,542 context-switches          # 0.000 M/sec
              100 CPU-migrations            # 0.000 M/sec
          245,909 page-faults               # 0.002 M/sec
  278,182,169,187 cycles                    # 2.589 GHz [83.33%]
  204,478,913,414 stalled-cycles-frontend   # 73.51% frontend cycles idle [83.36%]
  164,497,727,638 stalled-cycles-backend    # 59.13% backend cycles idle [66.73%]
   90,952,113,104 instructions              # 0.33 insns per cycle
                                            # 2.25 stalled cycles per insn [83.37%]
   32,522,385,525 branches                  # 302.735 M/sec [83.30%]
       57,227,684 branch-misses             # 0.18% of all branches [83.28%]

     15.947024802 seconds time elapsed

Multiple Producer benchmark

$ perf stat java -jar disruptortest.jar type=dq producer=4
Producers :4, buffer size: 262144, batch:0
SingleConsumerDisruptorQueue transfer rate : 2859 per ms, Used 46941ms for 134217728

Performance counter stats for 'java -jar disruptortest.jar type=dq producer=4':

    118905.839793 task-clock                # 2.523 CPUs utilized
        2,172,912 context-switches          # 0.018 M/sec
              280 CPU-migrations            # 0.000 M/sec
           28,697 page-faults               # 0.000 M/sec
  141,597,737,150 cycles                    # 1.191 GHz [83.18%]
  113,618,387,640 stalled-cycles-frontend   # 80.24% frontend cycles idle [83.42%]
   96,562,209,060 stalled-cycles-backend    # 68.19% backend cycles idle [66.86%]
   55,227,379,587 instructions              # 0.39 insns per cycle
                                            # 2.06 stalled cycles per insn [83.45%]
    9,312,400,407 branches                  # 78.317 M/sec [83.19%]
       64,375,263 branch-misses             # 0.69% of all branches [83.35%]

     47.133747893 seconds time elapsed

$ perf stat java -jar disruptortest.jar type=abq producer=4
Producers :4, buffer size: 262144, batch:0
ArrayBlockingQueue transfer rate : 2047 per ms, Used 65546ms for 134217728

Performance counter stats for 'java -jar disruptortest.jar type=abq producer=4':

     79345.046656 task-clock                # 1.208 CPUs utilized
        3,003,905 context-switches          # 0.038 M/sec
              594 CPU-migrations            # 0.000 M/sec
           77,227 page-faults               # 0.001 M/sec
  102,931,605,765 cycles                    # 1.297 GHz [83.10%]
   78,913,722,891 stalled-cycles-frontend   # 76.67% frontend cycles idle [83.46%]
   65,701,179,927 stalled-cycles-backend    # 63.83% backend cycles idle [66.99%]
   52,891,419,177 instructions              # 0.51 insns per cycle
                                            # 1.49 stalled cycles per insn [83.41%]
    9,307,141,741 branches                  # 117.300 M/sec [83.21%]
       79,855,221 branch-misses             # 0.86% of all branches [83.23%]

     65.694123910 seconds time elapsed

$ perf stat java -jar disruptortest.jar type=lbq producer=4
Producers :4, buffer size: 262144, batch:0
LinkedBlockingQueue transfer rate : 2795 per ms, Used 48014ms for 134217728

Performance counter stats for 'java -jar disruptortest.jar type=lbq producer=4':

    110080.375452 task-clock                # 2.284 CPUs utilized
        3,644,802 context-switches          # 0.033 M/sec
              597 CPU-migrations            # 0.000 M/sec
          136,440 page-faults               # 0.001 M/sec
  185,250,018,068 cycles                    # 1.683 GHz [83.46%]
  144,448,559,949 stalled-cycles-frontend   # 77.97% frontend cycles idle [83.62%]
  118,250,468,418 stalled-cycles-backend    # 63.83% backend cycles idle [66.28%]
   73,113,563,433 instructions              # 0.39 insns per cycle
                                            # 1.98 stalled cycles per insn [83.21%]
   12,028,209,235 branches                  # 109.268 M/sec [83.25%]
      129,234,077 branch-misses             # 1.07% of all branches [83.40%]

     48.189813503 seconds time elapsed

$ perf stat java -jar disruptortest.jar type=tq producer=4
Producers :4, buffer size: 262144, batch:0
LinkedTransferQueue transfer rate : 1438 per ms, Used 93273ms for 134217728

Performance counter stats for 'java -jar disruptortest.jar type=tq producer=4':

    761878.416668 task-clock                # 8.122 CPUs utilized
           71,371 context-switches          # 0.000 M/sec
              203 CPU-migrations            # 0.000 M/sec
          670,788 page-faults               # 0.001 M/sec
1,976,200,012,808 cycles                    # 2.594 GHz [83.33%]
1,584,264,715,610 stalled-cycles-frontend   # 80.17% frontend cycles idle [83.34%]
1,368,861,011,899 stalled-cycles-backend    # 69.27% backend cycles idle [66.68%]
  487,816,405,509 instructions              # 0.25 insns per cycle
                                            # 3.25 stalled cycles per insn [83.34%]
  169,135,278,863 branches                  # 221.998 M/sec [83.33%]
      615,658,238 branch-misses             # 0.36% of all branches [83.33%]

     93.798977802 seconds time elapsed

Conclusion

Using the Disruptor RingBuffer to build a blocking queue is possible. In the single producer/single consumer case, it can be 5x faster than the default JDK blocking queue implementations. In the multiple producer case, it is much faster than ArrayBlockingQueue and LinkedTransferQueue; LinkedBlockingQueue can achieve similar throughput, but the Disruptor-based queue has fewer context switches and a smaller memory footprint. The only limitation is that it supports a single consumer thread. The benefit of a BlockingQueue implementation on top of the RingBuffer is that it can be a drop-in replacement in existing code, and it gives the user more control via the BlockingQueue interface, whereas the WorkerPool provided by Disruptor only lets the user supply an event handler for callbacks.
