Go 1.24 new feature: the spinning mutex (lock2) optimization, with a measurable performance gain!


煎鱼（EDDYCJY）



Article link: https://blog.youkuaiyun.com/EDDYCJY/article/details/145272722


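Hi everyone, I'm 煎鱼.

Last time I covered how map moved from its original hashmap implementation to Swiss Tables. Go 1.24 ships several other worthwhile optimizations as well.

In this article we continue with one of them: the spinning mutex (lock2) optimization.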

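Background

In 2024 the proposal author, @Rhys Hiltner, raised a request to improve the performance of the runtime's mutex:

Part of his experience with runtime.mutex values is that process-wide demand for a single mutex can slow the whole program to a crawl.

In his words: I don't think that alone will come as a surprise, though the magnitude of the slowdown is more than I expected. The main surprise is that once a program falls off the performance cliff, it has a hard time recovering.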

Benchmarks

In the ChanContended benchmark, the author found that mutex performance drops sharply as GOMAXPROCS increases.

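Intel i7-13700H (linux/amd64):
- With 4 threads allowed, whole-process throughput is half of what it is with a single thread.
- With 8 threads allowed, throughput halves again.
- With 12 threads allowed, throughput halves yet again.
- At GOMAXPROCS=20, 200 channel operations take 44 microseconds on average, so unlock2 is called about once every 220 nanoseconds, and each call has a chance to wake a sleeping thread.

M1 MacBook Air (darwin/arm64):
- With 5 threads allowed, throughput is less than half of what it is with a single thread.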
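To get a feel for this kind of workload outside the runtime's own test suite, here is a minimal stand-alone sketch. It is not the runtime's ChanContended benchmark; the names `contend` and `opsPerRound` are made up for illustration. Every goroutine sends to and receives from one shared buffered channel, and since each channel operation takes that channel's internal lock (a runtime.mutex), all workers end up fighting over a single mutex.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

// opsPerRound mirrors the "200 channel operations" figure from the text:
// 100 sends followed by 100 receives per round. (Illustrative constant.)
const opsPerRound = 200

// contend starts `workers` goroutines that all operate on one shared
// buffered channel. It returns the total number of channel operations
// performed and the elapsed wall-clock time.
func contend(workers, rounds int) (int, time.Duration) {
	// The buffer holds every item that can be in flight at once
	// (at most opsPerRound/2 per worker), so sends never block.
	ch := make(chan int, workers*opsPerRound/2)

	var wg sync.WaitGroup
	start := time.Now()
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for r := 0; r < rounds; r++ {
				for i := 0; i < opsPerRound/2; i++ {
					ch <- 0
				}
				for i := 0; i < opsPerRound/2; i++ {
					<-ch
				}
			}
		}()
	}
	wg.Wait()
	return workers * rounds * opsPerRound, time.Since(start)
}

func main() {
	// Try a few thread counts and report the average time per round.
	for _, procs := range []int{1, 2, 4, 8} {
		runtime.GOMAXPROCS(procs)
		const rounds = 2000
		_, elapsed := contend(procs, rounds)
		perRound := elapsed / time.Duration(procs*rounds)
		fmt.Printf("GOMAXPROCS=%d: %v per round of %d ops\n", procs, perRound, opsPerRound)
	}
}
```

The absolute numbers depend heavily on the machine and the Go version, so treat the output as a rough indication of the trend rather than a reproduction of the figures above.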

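Another angle is the process's CPU time.

The data below shows that during 1.78 seconds of wall-clock time, the process's 20 threads spent a combined 27.74 seconds on-CPU inside lock2 calls.

Here is the test report: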

$ go test runtime -test.run='^$' -test.bench=ChanContended -test.cpu=20 -test.count=1 -test.cpuprofile=/tmp/p
goos: linux
goarch: amd64
pkg: runtime
cpu: 13th Gen Intel(R) Core(TM) i7-13700H
BenchmarkChanContended-20        26667      44404 ns/op
PASS
ok   runtime 1.785s

$ go tool pprof -peek runtime.lock2 /tmp/p
File: runtime.test
Type: cpu
Time: Jul 24, 2024 at 8:45pm (UTC)
Duration: 1.78s, Total samples = 31.32s (1759.32%)
Showing nodes accounting for 31.32s, 100% of 31.32s total
----------------------------------------------------------+-------------
      flat  flat%   sum%        cum   cum%   calls calls% + context
----------------------------------------------------------+-------------
                                            27.74s   100% |   runtime.lockWithRank
     4.57s 14.59% 14.59%     27.74s 88.57%                | runtime.lock2
                                            19.50s 70.30% |   runtime.procyield
                                             2.74s  9.88% |   runtime.futexsleep
                                             0.84s  3.03% |   runtime.osyield
                                             0.07s  0.25% |   runtime.(*lockTimer).begin
                                             0.02s 0.072% |   runtime.(*lockTimer).end
----------------------------------------------------------+-------------
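The figures in this report can be cross-checked with a bit of arithmetic. The small program below is only a back-of-the-envelope check; the helper names are made up, and the inputs are copied from the benchmark and pprof output above.

```go
package main

import "fmt"

// perOpNanos returns the average time per channel operation, given the
// time for one benchmark iteration and the operations per iteration.
func perOpNanos(nsPerIter float64, opsPerIter int) float64 {
	return nsPerIter / float64(opsPerIter)
}

// avgBusyThreads returns how many threads were on-CPU on average, given
// the accumulated CPU time and the wall-clock duration.
func avgBusyThreads(cpuSeconds, wallSeconds float64) float64 {
	return cpuSeconds / wallSeconds
}

// share returns part/whole as a percentage.
func share(part, whole float64) float64 {
	return 100 * part / whole
}

func main() {
	// 44404 ns/op, with 200 channel operations per benchmark iteration.
	fmt.Printf("time per channel op: %.0f ns\n", perOpNanos(44404, 200))

	// 27.74 s of CPU time in lock2 during 1.78 s of wall-clock time.
	fmt.Printf("threads busy in lock2 on average: %.1f of 20\n", avgBusyThreads(27.74, 1.78))

	// 19.50 s of the 27.74 s in lock2 was spent in procyield.
	fmt.Printf("share of lock2 time in procyield: %.1f%%\n", share(19.50, 27.74))
}
```

This works out to roughly 222 ns per channel operation, about 15.6 of the 20 threads sitting inside lock2 at any given moment, and about 70.3% of that time going to procyield, which matches the percentage pprof prints.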

One of the key problems: the threads stuck in lock2 are not sleeping. They are spinning the whole time!
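To make "spinning" concrete, here is a user-level sketch of a lock that retries in a tight loop and then yields. It is not the runtime's lock2: the type `spinLock`, the constant `activeSpins`, and the spin counter are all invented for illustration, and `runtime.Gosched` merely stands in for the yield step (the real lock2 eventually parks the thread with futexsleep, which this sketch leaves out). What it does show is that every failed attempt costs CPU time without doing any useful work.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// activeSpins is a hypothetical limit on tight-loop retries before the
// waiter starts yielding between attempts.
const activeSpins = 4

// spinLock is an illustrative spin-then-yield lock, not runtime code.
type spinLock struct {
	held  atomic.Bool
	spins atomic.Int64 // number of failed acquisition attempts
}

// Lock retries until it wins the compare-and-swap. A waiter never
// sleeps: it either retries immediately or yields and retries.
func (l *spinLock) Lock() {
	for attempt := 0; !l.held.CompareAndSwap(false, true); attempt++ {
		l.spins.Add(1)
		if attempt >= activeSpins {
			runtime.Gosched() // let other goroutines run, then retry
		}
	}
}

// Unlock releases the lock.
func (l *spinLock) Unlock() {
	l.held.Store(false)
}

// countWithSpinLock has `workers` goroutines each increment a shared
// counter `perWorker` times under the spin lock. It returns the final
// counter value and the number of failed lock attempts.
func countWithSpinLock(workers, perWorker int) (int, int64) {
	var l spinLock
	total := 0
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				l.Lock()
				total++
				l.Unlock()
			}
		}()
	}
	wg.Wait()
	return total, l.spins.Load()
}

func main() {
	total, wasted := countWithSpinLock(8, 100000)
	fmt.Printf("counter=%d, failed lock attempts=%d\n", total, wasted)
}
```

The number of failed attempts varies from run to run and from machine to machine, but under contention it tends to grow with the number of workers, which is the same kind of wasted effort that shows up as procyield time in the profile above.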