Memory matters - even in Erlang (once again, a reminder of why you need to understand how memory works)

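This post examines a significant slowdown of the basic.get operation in Erlang caused by a change in memory layout. Experiments show that the cause is a change in the distance between pointers to objects, and prefetching memory is proposed as a fix.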


Original article: http://www.lshift.net/blog/2010/02/28/memory-matters-even-in-erlang
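Translator's note: I really admire how the author approached this problem! I never would have guessed that after hibernation, objects being moved around makes memory access non-contiguous, defeats the CPU cache, and slows things down this much!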

Some time ago we got an interesting bug report for RabbitMQ. Surprisingly, unlike other complex bugs, this one is easy to describe:

At some point basic.get suddenly starts being very slow - about 9 times slower!

Basic.get doesn’t do anything complex: it just pops a message from a queue, so this behaviour was quite unexpected. Our initial tests confirmed that there is a problem when a queue contains tens of thousands of elements:

queue_length: 90001 basic_get 3333 times took: 1421.250ms
queue_length: 83335 basic_get 3333 times took: 1576.664ms
queue_length: 60004 basic_get 3333 times took: 1403.086ms
queue_length: 53338 basic_get 3333 times took: 9659.434ms [ look at that! ]
queue_length: 50005 basic_get 3333 times took: 9885.598ms
queue_length: 46672 basic_get 3333 times took: 8562.136ms

Let me repeat that. Usually popping a message from a queue takes Xms. At some point, it slows down to 9*Xms.

It turned out that the problem lies in the queue:len() function, which is executed during basic.get. In fact, queue:len() just calls the erlang:length() builtin, and at some point that call switches into a “slow” mode.

erlang:length() is a builtin that iterates through a linked list and counts its length. Its complexity is O(N), where N is the length of the list. Because it is implemented in the VM, it is expected to be very, very fast.

The problem is not that erlang:length() is slow; it is that it is unpredictably slow. Let’s take a look at the Erlang VM source code (erl_bif_guard.c:erts_gc_length_1). Here is the main loop of erlang:length():

i = 0;
while (is_list(list)) {          /* stops at [] or at an improper tail */
    i++;
    list = CDR(list_val(list));  /* follow the tail pointer to the next cons cell */
}
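The ERTS macros used above (is_list, list_val, CDR) only exist inside the VM. To experiment with the loop outside it, here is a minimal, hypothetical model in plain C: `struct cons` and `list_length` are illustrative names, and a NULL tail stands in for `[]`. It shows the key property of the walk: each iteration must load the current cell before it knows where the next one is, so the traversal is a chain of dependent memory loads.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an Erlang cons cell: two machine words,
   the head (CAR) and the tail (CDR). */
struct cons {
    void *car;          /* the element (not used by the length walk) */
    struct cons *cdr;   /* next cell, or NULL for the empty list [] */
};

static_assert(sizeof(struct cons) == 2 * sizeof(void *),
              "a cons cell is two words");

/* Counts cells by chasing tail pointers, like the VM loop: O(N) time,
   and every step depends on the load made by the previous step. */
static size_t list_length(const struct cons *list)
{
    size_t i = 0;
    while (list != NULL) {   /* plays the role of is_list(list) */
        i++;
        list = list->cdr;    /* plays the role of CDR(list_val(list)) */
    }
    return i;
}
```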

It does nothing unusual: it just iterates through the list elements. However, recompiling Erlang with some timing instrumentation confirms that the problem is indeed here:

clock_gettime(CLOCK_REALTIME, &t0);
while (is_list(list)) {
    i++;
    list = CDR(list_val(list));
}
clock_gettime(CLOCK_REALTIME, &t1);
td_ms = TIMESPEC_NSEC_SUBTRACT(t1, t0) / 1000000.0;
if (i > 200000 || td_ms > 2.0) {
    fprintf(stderr, "gc_length_1(%p)=%i %.3fms\n\r", reg[live], i, td_ms);
}
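The TIMESPEC_NSEC_SUBTRACT macro used above is not defined in the post. Here is a minimal sketch of an equivalent helper, assuming it returns the difference between two timespec values in nanoseconds. `timespec_nsec_subtract` and `time_call_ms` are hypothetical names.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical equivalent of the post's TIMESPEC_NSEC_SUBTRACT:
   returns t1 - t0 in nanoseconds. */
static int64_t timespec_nsec_subtract(struct timespec t1, struct timespec t0)
{
    return ((int64_t)t1.tv_sec - (int64_t)t0.tv_sec) * 1000000000LL
         + ((int64_t)t1.tv_nsec - (int64_t)t0.tv_nsec);
}

/* Times a single call of fn(arg) and returns the elapsed time in ms.
   CLOCK_MONOTONIC is used here instead of CLOCK_REALTIME so that
   wall-clock adjustments cannot skew the measurement. */
static double time_call_ms(void (*fn)(void *), void *arg)
{
    struct timespec t0, t1;
    int rc0 = clock_gettime(CLOCK_MONOTONIC, &t0);
    fn(arg);
    int rc1 = clock_gettime(CLOCK_MONOTONIC, &t1);
    assert(rc0 == 0 && rc1 == 0);   /* the clock should always be available */
    (void)rc0;
    (void)rc1;
    return (double)timespec_nsec_subtract(t1, t0) / 1000000.0;
}
```

The nanosecond field may go negative in the subtraction (for example 3.1 s minus 1.9 s), which is fine because both fields are widened to signed 64-bit before they are combined.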

gc_length_1(0x7f4dbfa7fc19)=499999 2.221ms
gc_length_1(0x7f4dbfa7fc19)=499999 2.197ms
gc_length_1(0x7f4dbfa7fc19)=499999 2.208ms
(hibernation)
gc_length_1(0x7f4db0572049)=499999 13.793ms
gc_length_1(0x7f4db0572049)=499999 12.806ms
gc_length_1(0x7f4db0572049)=499999 12.531ms

This confirms Matthias’ initial guess - the slowdown starts after Erlang process hibernation.

For those who aren’t Erlang experts: hibernation is an operation that compacts an Erlang process. It performs an aggressive garbage collection and reduces the memory footprint of the process to the absolute minimum.

The intended result of hibernation is to reclaim free memory from the process. However, a side effect is a new memory layout for the objects allocated on its heap.

Ah, how could I have forgotten! Memory is slow nowadays! What happens is that before hibernation the list cells are laid out densely, whereas after hibernation they are sparse. This is easy to test: let’s compute the average distance between consecutive pointers to list cells:

gc_length_1(0x7f5c626fbc19)=499999 2.229ms avg=16.000 dev=0.023
gc_length_1(0x7f5c626fbc19)=499999 3.349ms avg=16.000 dev=0.023
gc_length_1(0x7f5c626fbc19)=499999 3.345ms avg=16.000 dev=0.023
(hibernation)
gc_length_1(0x7f5c61f7d049)=499999 13.800ms avg=136.000 dev=0.266
gc_length_1(0x7f5c61f7d049)=499999 12.726ms avg=136.000 dev=0.266
gc_length_1(0x7f5c61f7d049)=499999 12.367ms avg=136.000 dev=0.266
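The post does not show how avg and dev were computed. One plausible way, sketched here on a hypothetical cons-cell model (`struct cons` and `cell_distance_stats` are illustrative names), is to walk the list and accumulate the byte distance from each cell to the next.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a cons cell: head (CAR) and tail (CDR) words. */
struct cons {
    void *car;
    struct cons *cdr;
};

/* Mean and (population) variance of the byte distance from each cell to
   the next. The dev= column corresponds to sqrt(*var), using sqrt()
   from <math.h> (link with -lm). */
static void cell_distance_stats(const struct cons *list, double *avg, double *var)
{
    assert(avg != NULL && var != NULL);
    double sum = 0.0, sum_sq = 0.0;
    size_t n = 0;
    for (; list != NULL && list->cdr != NULL; list = list->cdr) {
        double d = (double)((intptr_t)list->cdr - (intptr_t)list);
        sum += d;
        sum_sq += d * d;
        n++;
    }
    if (n == 0) {          /* fewer than two cells: no distances to measure */
        *avg = 0.0;
        *var = 0.0;
        return;
    }
    *avg = sum / (double)n;
    *var = sum_sq / (double)n - *avg * *avg;
    if (*var < 0.0)        /* clamp tiny negative values caused by rounding */
        *var = 0.0;
}
```

A list whose cells sit back to back reports an average equal to `sizeof(struct cons)` (16 bytes on a 64-bit machine) with zero variance, which is the shape of the pre-hibernation numbers.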

Confirmed! The standard deviation is surprisingly small, so we can read the numbers as follows:

* Before hibernation the list cells are packed exactly one after another, 16 bytes apart (one two-word cons cell on a 64-bit machine); the values live somewhere else.
* After hibernation the list cells are interleaved with their values, 136 bytes apart.

This behavior does make sense. In most cases, when you traverse a list you actually do something with the values. After hibernation, when you access a list cell, its value is likely to be in the CPU cache already.
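A rough way to see why the interleaved layout hurts a walk that only follows tail pointers is to count the cache lines it touches. The sketch below is a first-order model only: it assumes 64-byte cache lines, ignores hardware prefetchers, and `cache_lines_touched` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of distinct cache lines touched when visiting n cells of
   cell_size bytes laid out stride bytes apart, starting at address base,
   with cache lines of line bytes. Addresses only increase, so counting
   changes of line index counts distinct lines. */
static size_t cache_lines_touched(uintptr_t base, size_t n, size_t stride,
                                  size_t cell_size, size_t line)
{
    assert(line > 0 && cell_size > 0 && stride >= cell_size);
    size_t count = 0;
    uintptr_t last = UINTPTR_MAX;                 /* no line seen yet */
    for (size_t k = 0; k < n; k++) {
        uintptr_t start = base + (uintptr_t)(k * stride);
        uintptr_t first_line = start / line;
        uintptr_t last_line = (start + cell_size - 1) / line;
        for (uintptr_t l = first_line; l <= last_line; l++) {
            if (l != last) {                      /* a new line */
                count++;
                last = l;
            }
        }
    }
    return count;
}
```

For the 499999-cell list from the measurements, starting at a line-aligned base, the dense 16-byte layout touches 125000 lines (four cells per line), while the 136-byte layout touches at least one new line for every cell, so at least 499999.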

Knowing the mechanism, it’s easy to write a test case that reproduces the problem.
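One way to write such a test case without the Erlang VM is to lay out the same list twice: once with the cells packed back to back, and once with them spaced 136 bytes apart, as measured after hibernation. This is a hypothetical C harness (`build_strided_list` and `list_length` are illustrative names), not the author's Erlang test script.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of a cons cell: head (CAR) and tail (CDR) words. */
struct cons {
    void *car;
    struct cons *cdr;
};

/* Lays out n cells in one malloc'd buffer, stride bytes apart, linked in
   address order. stride == sizeof(struct cons) mimics the dense layout;
   stride == 136 mimics the interleaved one (the gaps are where the values
   would sit). *buf_out receives the buffer, which the caller must free(). */
static struct cons *build_strided_list(size_t n, size_t stride, void **buf_out)
{
    assert(stride >= sizeof(struct cons));
    assert(stride % _Alignof(struct cons) == 0);   /* keep every cell aligned */
    char *buf = malloc(n > 0 ? n * stride : 1);
    *buf_out = buf;
    if (buf == NULL)
        return NULL;
    struct cons *head = NULL, *prev = NULL;
    for (size_t k = 0; k < n; k++) {
        struct cons *c = (struct cons *)(buf + k * stride);
        c->car = NULL;                 /* values are not modelled here */
        c->cdr = NULL;
        if (prev != NULL)
            prev->cdr = c;
        else
            head = c;
        prev = c;
    }
    return head;
}

/* Plain length walk: one dependent load per cell. */
static size_t list_length(const struct cons *list)
{
    size_t i = 0;
    for (; list != NULL; list = list->cdr)
        i++;
    return i;
}
```

Timing `list_length()` on both layouts for a few hundred thousand cells should show a gap; how large it is will depend on the CPU, its cache sizes and its hardware prefetchers.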

The average distance between pointers in my case is constant - the standard deviation is negligible. This information has a practical implication - we can “predict” where the next pointer will be. Let’s use that information to “fix” the Erlang VM by prefetching memory!

Eterm list2;
while (is_list(list)) {
    i++;
    list2 = CDR(list_val(list));
    /* Assume the cells keep the same spacing and prefetch
       the cell 128 hops ahead. */
    __builtin_prefetch((char*)list2 + 128*((long)list2-(long)list));
    list = list2;
}
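The same idea can be tried outside the VM on a plain C model of the list. This is a sketch, not the patch itself: `struct cons` and `list_length_prefetch` are hypothetical names, and how far ahead to prefetch is a parameter, since 128 was not tuned. The target address is computed with integer arithmetic to avoid forming an out-of-bounds pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a cons cell: head (CAR) and tail (CDR) words. */
struct cons {
    void *car;
    struct cons *cdr;
};

/* Length walk that predicts where future cells are: it assumes the spacing
   between the current cell and the next one stays constant and prefetches
   the cell `ahead` hops away. __builtin_prefetch (GCC/Clang) is only a
   hint and does not fault on an invalid address, so a wrong guess only
   costs wasted memory traffic. */
static size_t list_length_prefetch(const struct cons *list, intptr_t ahead)
{
    assert(ahead >= 0);
    size_t i = 0;
    while (list != NULL) {
        const struct cons *next = list->cdr;
        i++;
        if (next != NULL) {
            intptr_t step = (intptr_t)next - (intptr_t)list;  /* observed spacing */
            __builtin_prefetch(
                (const void *)((uintptr_t)next + (uintptr_t)(ahead * step)));
        }
        list = next;
    }
    return i;
}
```

Called as `list_length_prefetch(head, 128)`, this mirrors the multiplier used in the patch above.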

Test script running on original Erlang VM:

length: 300001 avg:0.888792ms dev:0.061587ms
length: 300001 avg:0.881030ms dev:0.040961ms
length: 300001 avg:0.875158ms dev:0.019436ms
hibernate
length: 300001 avg:14.861762ms dev:0.150635ms
length: 300001 avg:14.833733ms dev:0.017405ms
length: 300001 avg:14.884861ms dev:0.220119ms

Patched Erlang VM:

length: 300001 avg:0.742822ms dev:0.029322ms
length: 300001 avg:0.739149ms dev:0.012897ms
length: 300001 avg:0.739465ms dev:0.014417ms
hibernate
length: 300001 avg:7.543693ms dev:0.284355ms
length: 300001 avg:7.342802ms dev:0.330158ms
length: 300001 avg:7.265960ms dev:0.053176ms

The test runs only a tiny bit faster for the “fast” case (dense conses) and twice as fast for the “slow” case (sparse conses).

Should this patch be merged into mainline Erlang? Not really. I have set the prefetch multiplier value to 128 and I don’t even know if it’s optimal. This was only an experiment. But it was fun to see how low-level system architecture can affect high-level applications.