[SysBench] Testing CPU, file I/O, memory, threads, and POSIX mutexes with sysbench

独上西楼影三人

Last modified: 2024-03-21 13:15:22

Categories: SysBench, Benchmarking. Tags: sysbench, benchmarking, benchmark

First published: 2024-03-20 14:59:40

Article link: https://blog.youkuaiyun.com/wudi53433927/article/details/136715279

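Test environment

The tests run in a virtual machine hosted on a low-end laptop whose physical disk is a mechanical HDD. The VM has 2 CPU cores, 3.7 GB of physical memory, a 2 GB swap area, and a virtualized mechanical disk.

The results below may therefore be skewed or distorted, and they are not sufficient to support firm conclusions.

1. CPU

The CPU benchmark uses 64-bit integers and measures how long it takes to compute prime numbers up to a given maximum. The --cpu-max-prime=N option sets the upper limit of the prime generator; the default is 10000.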
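To make the workload concrete, here is a minimal Python sketch of the kind of trial-division loop that a single CPU event performs. It assumes each event tests every candidate from 3 up to the limit against divisors no larger than its square root. The function name count_primes_below is hypothetical, and this is an illustration rather than sysbench's actual source.

```python
import math


def count_primes_below(max_prime: int) -> int:
    """Count primes in [3, max_prime) by trial division (illustrative only)."""
    found = 0
    for candidate in range(3, max_prime):
        limit = math.isqrt(candidate)  # divisors above sqrt(candidate) are redundant
        is_prime = True
        for divisor in range(2, limit + 1):
            if candidate % divisor == 0:
                is_prime = False
                break
        if is_prime:
            found += 1
    return found


if __name__ == "__main__":
    # Same limit as the run in this section; one call stands in for one event.
    print(count_primes_below(20000))
```

Raising --cpu-max-prime makes every event more expensive, which is why event counts are only comparable between runs that use the same limit.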

First, take a look at the CPU information:

$ cat /proc/cpuinfo

The output shows that the test VM has two CPU cores.

Now test CPU performance, raising the maximum prime to 20000.

$ sysbench cpu --cpu-max-prime=20000 run
sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Prime numbers limit: 20000

Initializing worker threads...

Threads started!

CPU speed:
    events per second:   230.80

General statistics:
    total time:                          10.0040s
    total number of events:              2327

Latency (ms):
         min:                                    4.16
         avg:                                    4.30
         max:                                    5.52
         95th percentile:                        4.41
         sum:                                 9999.46

Threads fairness:
    events (avg/stddev):           2327.0000/0.00
    execution time (avg/stddev):   9.9995/0.00

2. File I/O

First, check the basic disk information:

$ fdisk -l

Disk /dev/sda: 53.7 GB, 53687091200 bytes, 104857600 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000b4260

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048     2099199     1048576   83  Linux
/dev/sda2         2099200   104857599    51379200   83  Linux

Disk /dev/mapper/centos-root: 50.5 GB, 50457477120 bytes, 98549760 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/mapper/centos-swap: 2147 MB, 2147483648 bytes, 4194304 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

$ lsblk
NAME            MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda               8:0    0   50G  0 disk 
├─sda1            8:1    0    1G  0 part /boot
└─sda2            8:2    0   49G  0 part 
  ├─centos-root 253:0    0   47G  0 lvm  /
  └─centos-swap 253:1    0    2G  0 lvm  [SWAP]
sr0              11:0    1 1024M  0 rom  
$

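The sector size is 512 bytes, so the first set of tests below specifies --file-block-size=512. The default for this option is 16384.

A test has two phases. The first is the prepare phase, which is not needed when testing writes. The following command creates a data set: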

sysbench fileio --file-total-size=1G prepare

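This command creates the test files in the current working directory.

The run phase then performs the test by reading and writing these files. A test mode is available for each type of I/O: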

seqwr
Sequential write.
seqrewr
Sequential rewrite; "rewr" stands for "rewrite".
seqrd
Sequential read.
rndrd
Random read.
rndwr
Random write.
rndrw
Combined random read/write.

When relying on defaults, keep the default fileio configuration in mind:

fileio options:
  --file-num=N                  number of files to create [128]
  --file-block-size=N           block size to use in all IO operations [16384]
  --file-total-size=SIZE        total size of files to create [2G]
  --file-test-mode=STRING       test mode {seqwr, seqrewr, seqrd, rndrd, rndwr, rndrw}
  --file-io-mode=STRING         file operations mode {sync,async,mmap} [sync]
  --file-async-backlog=N        number of asynchronous operatons to queue per thread [128]
  --file-extra-flags=[LIST,...] list of additional flags to use to open files {sync,dsync,direct} []
  --file-fsync-freq=N           do fsync() after this number of requests (0 - don't use fsync()) [100]
  --file-fsync-all[=on|off]     do fsync() after each write operation [off]
  --file-fsync-end[=on|off]     do fsync() at the end of test [on]
  --file-fsync-mode=STRING      which method to use for synchronization {fsync, fdatasync} [fsync]
  --file-merged-requests=N      merge at most this number of IO requests if possible (0 - don't merge) [0]
  --file-rw-ratio=N             reads/writes ratio for combined test [1.5]

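All tests below share the same base options: a total test file size of 10G, 2 files, 2 threads, and a maximum run time of 300s, simulating file I/O requests from two clients. A sufficiently long run time is advisable so that the run does not end before files of the specified size have been fully written.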
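Since every run in this section repeats the same base options, a small helper can assemble the command lines. The sketch below is a hypothetical convenience (the name fileio_command and its parameters are not part of sysbench); it only builds the argument list and does not execute anything.

```python
def fileio_command(phase, test_mode=None, block_size=512, **extra):
    """Build a sysbench fileio command with the base options used in this article.

    phase      -- "prepare", "run" or "cleanup"
    test_mode  -- value for --file-test-mode (omitted when None)
    extra      -- further options, e.g. file_fsync_freq=1 -> --file-fsync-freq=1
    """
    args = ["sysbench"]
    if phase == "run":
        args.append("--time=300")  # run time limit only matters for the run phase
    args += [
        "--threads=2",
        "fileio",
        "--file-total-size=10G",
        "--file-num=2",
        f"--file-block-size={block_size}",
    ]
    if test_mode is not None:
        args.append(f"--file-test-mode={test_mode}")
    args.append(phase)
    for name, value in extra.items():
        args.append(f"--{name.replace('_', '-')}={value}")
    return args


if __name__ == "__main__":
    print(" ".join(fileio_command("prepare")))
    print(" ".join(fileio_command("run", "seqwr")))
    print(" ".join(fileio_command("run", "rndrw", file_fsync_freq=1)))
```

Passing the resulting list to subprocess.run would execute the test, assuming sysbench is installed and the working directory has enough free space.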

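2.1 File block size of 512B

Because these are write tests, no prepare step is needed.

2.1.1 Sequential write `seqwr`

The following runs vary a few options one at a time to find the best performance. For sequential writes, the key metric is throughput.

1B: all other options left at their defaults

"B" in the title stands for "Baseline".

Note that --file-fsync-end=on is the default here.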

$ sysbench --time=300 --threads=2 fileio --file-total-size=10G --file-num=2 --file-block-size=512 --file-test-mode=seqwr run
sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 2
Initializing random number generator from current time


Extra file open flags: (none)
2 files, 5GiB each
10GiB total file size
Block size 512B
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing sequential write (creation) test
Initializing worker threads...

Threads started!


File operations:
    reads/s:                      0.00
    writes/s:                     72938.60
    fsyncs/s:                     1458.78

Throughput:
    read, MiB/s:                  0.00
    written, MiB/s:               35.61

General statistics:
    total time:                          300.0256s
    total number of events:              22330893

Latency (ms):
         min:                                    0.00
         avg:                                    0.03
         max:                                  762.05
         95th percentile:                        0.01
         sum:                               590498.93

Threads fairness:
    events (avg/stddev):           11165446.5000/15403.50
    execution time (avg/stddev):   295.2495/0.01

The write throughput is 35.61 MiB/s.
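The throughput figure follows directly from the write rate and the block size. As a quick cross-check, this sketch recomputes it from the writes/s value in the output above; the function name throughput_mib_s is only illustrative.

```python
def throughput_mib_s(ops_per_second: float, block_size_bytes: int) -> float:
    """Convert an operation rate into MiB/s for a fixed block size."""
    return ops_per_second * block_size_bytes / 2**20


if __name__ == "__main__":
    # writes/s from the 1B run, with --file-block-size=512
    print(f"{throughput_mib_s(72938.60, 512):.2f} MiB/s")
```

The same conversion applies to every run in this article, so a change in block size alters the throughput even when the operation rate stays the same.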

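During the test, the two threads did not write to the two files concurrently. Instead, sysbench first created two empty files, wrote the first, and then wrote the second. If the time limit set by the time option had not been reached once the second file was complete, it overwrote the first file again, and so on in a loop. File metadata such as size and modification time is updated continuously while writing.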

1T1: using `async`

"T" in the title stands for "Test".

Starting from 1B, measure how performance changes with --file-io-mode=async.

$ sysbench --time=300 --threads=2 fileio --file-total-size=10G --file-num=2 --file-block-size=512 --file-test-mode=seqwr run --file-io-mode=async
sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 2
Initializing random number generator from current time


Extra file open flags: (none)
2 files, 5GiB each
10GiB total file size
Block size 512B
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using asynchronous I/O mode
Doing sequential write (creation) test
Initializing worker threads...

Threads started!


File operations:
    reads/s:                      0.00
    writes/s:                     69716.98
    fsyncs/s:                     1394.35

Throughput:
    read, MiB/s:                  0.00
    written, MiB/s:               34.04

General statistics:
    total time:                          300.3779s
    total number of events:              21401348

Latency (ms):
         min:                                    0.00
         avg:                                    0.03
         max:                                 1544.58
         95th percentile:                        0.01
         sum:                               591189.81

Threads fairness:
    events (avg/stddev):           10700674.0000/8615.00
    execution time (avg/stddev):   295.5949/0.00

The write throughput is 34.04 MiB/s, lower than 1B.

1T2: using `file-fsync-freq`

file-fsync-freq defaults to 100; the runs below raise it to 1000.

1T2-1: based on 1B

$ sysbench --time=300 --threads=2 fileio --file-total-size=10G --file-num=2 --file-block-size=512 --file-test-mode=seqwr run --file-fsync-freq=1000
sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 2
Initializing random number generator from current time


Extra file open flags: (none)
2 files, 5GiB each
10GiB total file size
Block size 512B
Periodic FSYNC enabled, calling fsync() each 1000 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing sequential write (creation) test
Initializing worker threads...

Threads started!


File operations:
    reads/s:                      0.00
    writes/s:                     72619.56
    fsyncs/s:                     145.25

Throughput:
    read, MiB/s:                  0.00
    written, MiB/s:               35.46

General statistics:
    total time:                          300.0326s
    total number of events:              21838262

Latency (ms):
         min:                                    0.00
         avg:                                    0.03
         max:                                 1068.54
         95th percentile:                        0.01
         sum:                               591649.79

Threads fairness:
    events (avg/stddev):           10919131.0000/58361.00
    execution time (avg/stddev):   295.8249/0.01

The write throughput is 35.46 MiB/s, slightly lower than 1B.

1T2-2: based on 1T1

$ sysbench --time=300 --threads=2 fileio --file-total-size=10G --file-num=2 --file-block-size=512 --file-test-mode=seqwr run --file-fsync-freq=1000 --file-io-mode=async
sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 2
Initializing random number generator from current time


Extra file open flags: (none)
2 files, 5GiB each
10GiB total file size
Block size 512B
Periodic FSYNC enabled, calling fsync() each 1000 requests.
Calling fsync() at the end of test, Enabled.
Using asynchronous I/O mode
Doing sequential write (creation) test
Initializing worker threads...

Threads started!


File operations:
    reads/s:                      0.00
    writes/s:                     70541.87
    fsyncs/s:                     141.09

Throughput:
    read, MiB/s:                  0.00
    written, MiB/s:               34.44

General statistics:
    total time:                          300.0306s
    total number of events:              21215034

Latency (ms):
         min:                                    0.00
         avg:                                    0.03
         max:                                  730.71
         95th percentile:                        0.01
         sum:                               592107.01

Threads fairness:
    events (avg/stddev):           10607517.0000/215686.00
    execution time (avg/stddev):   296.0535/0.08

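The write throughput is 34.44 MiB/s, higher than 1T1 but still lower than 1T2-1. In these runs, sequential writes gained nothing from asynchronous I/O (async) and were in fact slightly slower.

2.1.2 Random read/write `rndrw`

Because the I/O bottleneck of applications, databases in particular, is usually random I/O (especially when traditional mechanical disks store the data), the tests below focus on random reads and writes.

Random I/O is judged differently from sequential I/O: what matters most is the number of I/O operations per second (IOPS), as the product specifications of mechanical and SSD drives confirm. sysbench 1.0.20 does not print this figure directly, so it has to be calculated as "total number of events" / "time", where time is 300 in this example. For simplicity, we compare total number of events directly and refer to it below as the "total requests".

In addition, the 95th percentile under Latency serves as a second metric, reflecting the response time of the majority of requests. It is referred to below as the "response time".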
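The IOPS formula above can be written as a one-line calculation. The sketch below applies it to the total number of events reported by the 2B baseline run later in this section; the function name iops is illustrative, and the 300-second default mirrors the --time value used throughout.

```python
def iops(total_events: int, seconds: float = 300.0) -> float:
    """Approximate IOPS as total number of events divided by run time."""
    return total_events / seconds


if __name__ == "__main__":
    # total number of events from the 2B baseline run
    print(f"{iops(29797):.2f} events per second")
```

Because every run uses the same 300-second limit, ranking runs by total requests gives the same order as ranking them by this IOPS estimate.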

The rndrw test requires a prepare step:

$ sysbench --threads=2 fileio --file-total-size=10G --file-num=2 --file-block-size=512 prepare

2B: all other options left at their defaults

$ sysbench --time=300 --threads=2 fileio --file-total-size=10G --file-num=2 --file-block-size=512 --file-test-mode=rndrw run
sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 2
Initializing random number generator from current time


Extra file open flags: (none)
2 files, 5GiB each
10GiB total file size
Block size 512B
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
Initializing worker threads...

Threads started!


File operations:
    reads/s:                      58.42
    writes/s:                     38.95
    fsyncs/s:                     1.96

Throughput:
    read, MiB/s:                  0.03
    written, MiB/s:               0.02

General statistics:
    total time:                          300.0184s
    total number of events:              29797

Latency (ms):
         min:                                    0.00
         avg:                                   20.13
         max:                                  988.55
         95th percentile:                       59.99
         sum:                               599958.77

Threads fairness:
    events (avg/stddev):           14898.5000/97.50
    execution time (avg/stddev):   299.9794/0.01

total number of events: 29797
avg: 20.13
95th percentile: 59.99
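The three figures above are copied by hand from each report. As a sketch of how that step could be automated, the following parser pulls the same fields out of sysbench's text output with regular expressions. The function name parse_summary and the dictionary keys are hypothetical, and the patterns assume the sysbench 1.0.20 report layout shown in this article.

```python
import re

SAMPLE_OUTPUT = """\
General statistics:
    total time:                          300.0184s
    total number of events:              29797

Latency (ms):
         min:                                    0.00
         avg:                                   20.13
         max:                                  988.55
         95th percentile:                       59.99
         sum:                               599958.77

Threads fairness:
    events (avg/stddev):           14898.5000/97.50
"""


def parse_summary(output: str) -> dict:
    """Extract total events, average latency and 95th percentile latency (ms)."""
    patterns = {
        "events": (r"total number of events:\s+(\d+)", int),
        # Anchored to the line start so "events (avg/stddev)" is not matched.
        "avg_ms": (r"^\s*avg:\s+([\d.]+)", float),
        "p95_ms": (r"95th percentile:\s+([\d.]+)", float),
    }
    result = {}
    for key, (pattern, convert) in patterns.items():
        match = re.search(pattern, output, flags=re.MULTILINE)
        if match is None:
            raise ValueError(f"field not found in output: {key}")
        result[key] = convert(match.group(1))
    return result


if __name__ == "__main__":
    print(parse_summary(SAMPLE_OUTPUT))
```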

2T1: `file-fsync-all=on`

The file-fsync-all option defaults to off; turning it on performs an fsync after every write operation.

file-fsync-all=on:

$ sysbench --time=300 --threads=2 fileio --file-total-size=10G --file-num=2 --file-block-size=512 --file-test-mode=rndrw run --file-fsync-all=on
sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 2
Initializing random number generator from current time


Extra file open flags: (none)
2 files, 5GiB each
10GiB total file size
Block size 512B
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Calling fsync() after each write operation.
Using synchronous I/O mode
Doing random r/w test
Initializing worker threads...

Threads started!


File operations:
    reads/s:                      56.92
    writes/s:                     37.94
    fsyncs/s:                     37.94

Throughput:
    read, MiB/s:                  0.03
    written, MiB/s:               0.02

General statistics:
    total time:                          300.0233s
    total number of events:              28461

Latency (ms):
         min:                                    0.00
         avg:                                   21.08
         max:                                  961.86
         95th percentile:                       57.87
         sum:                               599971.97

Threads fairness:
    events (avg/stddev):           14230.5000/3.50
    execution time (avg/stddev):   299.9860/0.01

total number of events: 28461
avg: 21.08
95th percentile: 57.87

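Compared with 2B, total requests dropped slightly and the average response time rose slightly, while the 95% response time fell slightly. Overall, performance is slightly lower.

2T2: adjusting `file-fsync-freq`

2T2-1: lowering `file-fsync-freq`

The file-fsync-freq option defaults to 100 and sets how many requests are issued between fsync calls. Set file-fsync-freq=1 and run the following test.

Ordinarily this configuration should give the worst I/O performance, because every request triggers an fsync that flushes to disk, whereas with file-fsync-freq > 1 several random I/Os can be merged in the cache into sequential I/O, improving read/write performance. The actual measurements showed the opposite. The likely causes include the operating system kernel, default settings, the VM's default disk configuration, or the small amount of memory; this is not investigated further here and is left for later analysis.

file-fsync-freq=1: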

$ sysbench --time=300 --threads=2 fileio --file-total-size=10G --file-num=2 --file-block-size=512 --file-test-mode=rndrw run --file-fsync-freq=1
sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 2
Initializing random number generator from current time


Extra file open flags: (none)
2 files, 5GiB each
10GiB total file size
Block size 512B
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 1 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
Initializing worker threads...

Threads started!


File operations:
    reads/s:                      51.09
    writes/s:                     34.06
    fsyncs/s:                     68.14

Throughput:
    read, MiB/s:                  0.02
    written, MiB/s:               0.02

General statistics:
    total time:                          300.0141s
    total number of events:              45989

Latency (ms):
         min:                                    0.00
         avg:                                   13.04
         max:                                  960.19
         95th percentile:                       38.25
         sum:                               599925.83

Threads fairness:
    events (avg/stddev):           22994.5000/204.50
    execution time (avg/stddev):   299.9629/0.01

total number of events: 45989
avg: 13.04
95th percentile: 38.25

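Compared with 2B, total requests increased significantly, and the average and 95% response times fell significantly, so performance improved noticeably. The number of file reads and writes per second, however, decreased.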
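To put numbers on "significantly", the comparison can be expressed as a relative change against the baseline. This sketch uses the 2B and 2T2-1 figures reported above; the helper name pct_change is illustrative.

```python
def pct_change(baseline: float, value: float) -> float:
    """Relative change of value against baseline, in percent."""
    return (value - baseline) / baseline * 100.0


# Figures copied from the 2B and 2T2-1 reports in this article.
RUN_2B = {"events": 29797, "avg_ms": 20.13, "p95_ms": 59.99}
RUN_2T2_1 = {"events": 45989, "avg_ms": 13.04, "p95_ms": 38.25}

if __name__ == "__main__":
    for key in RUN_2B:
        print(f"{key}: {pct_change(RUN_2B[key], RUN_2T2_1[key]):+.1f}%")
```

In this pair of runs total requests rise by roughly 54%, while the average and 95th percentile latencies fall by roughly 35% and 36%.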

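Note that fsync runs after every request here, about 68 times per second. A rate of roughly one fsync per second, which corresponds to InnoDB with innodb_flush_log_at_trx_commit=0 and innodb_flush_log_at_timeout at its default of 1, appears instead in the 2T2-2 run below (1.02 fsyncs/s).

2T2-2: raising `file-fsync-freq`

Based on 2B: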

$ sysbench --time=300 --threads=2 fileio --file-total-size=10G --file-num=2 --file-block-size=512 --file-test-mode=rndrw run --file-fsync-freq=150
sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 2
Initializing random number generator from current time


Extra file open flags: (none)
2 files, 5GiB each
10GiB total file size
Block size 512B
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 150 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
Initializing worker threads...

Threads started!


File operations:
    reads/s:                      45.53
    writes/s:                     30.35
    fsyncs/s:                     1.02

Throughput:
    read, MiB/s:                  0.02
    written, MiB/s:               0.01

General statistics:
    total time:                          300.0069s
    total number of events:              23068

Latency (ms):
         min:                                    0.00
         avg:                                   26.01
         max:                                 1067.99
         95th percentile:                       74.46
         sum:                               599921.35

Threads fairness:
    events (avg/stddev):           11534.0000/54.00
    execution time (avg/stddev):   299.9607/0.00

Key figures:

total number of events: 23068
avg: 26.01
95th percentile: 74.46

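Compared with 2B, total requests fell significantly and response times rose significantly; performance dropped markedly.

Note that file-fsync-freq cannot be combined with file-fsync-all; if both are given, file-fsync-freq is ignored.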

$ sysbench --time=300 --threads=2 fileio --file-total-size=10G --file-num=2 --file-block-size=512 --file-test-mode=rndrw run --file-fsync-all=on --file-fsync-freq=40
sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 2
Initializing random number generator from current time


Extra file open flags: (none)
2 files, 5GiB each
10GiB total file size
Block size 512B
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Calling fsync() after each write operation.
Using synchronous I/O mode
Doing random r/w test
Initializing worker threads...

Threads started!

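The run information contains no line such as Periodic FSYNC enabled, calling fsync() each 40 requests.; only Calling fsync() after each write operation. appears. This confirms that only file-fsync-all took effect.

2T3: using `async`

Up to this point 2T2-1 is the best result, so it serves as the reference below.

Starting from 2T2-1, measure how performance changes with --file-io-mode=async.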

$ sysbench --time=300 --threads=2 fileio --file-total-size=10G --file-num=2 --file-block-size=512 --file-test-mode=rndrw run --file-fsync-freq=1 --file-io-mode=async
sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 2
Initializing random number generator from current time


Extra file open flags: (none)
2 files, 5GiB each
10GiB total file size
Block size 512B
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 1 requests.
Calling fsync() at the end of test, Enabled.
Using asynchronous I/O mode
Doing random r/w test
Initializing worker threads...

Threads started!


File operations:
    reads/s:                      43.45
    writes/s:                     29.30
    fsyncs/s:                     58.62

Throughput:
    read, MiB/s:                  0.02
    written, MiB/s:               0.01

General statistics:
    total time:                          300.2207s
    total number of events:              39437

Latency (ms):
         min:                                    0.00
         avg:                                   15.22
         max:                                  904.93
         95th percentile:                       56.84
         sum:                               600336.15

Threads fairness:
    events (avg/stddev):           19718.5000/94.50
    execution time (avg/stddev):   300.1681/0.00

total number of events: 39437
avg: 15.22
95th percentile: 56.84

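Compared with 2T2-1, total requests fell significantly, and the average and 95% response times rose significantly; performance dropped markedly.

2T4: adjusting `file-async-backlog`

file-async-backlog is the number of asynchronous operations to queue per thread. Its default is 128, which seems to mirror the file-num default of 128, so here we try setting it to 2 and see what happens.

Based on 2T3: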

$ sysbench --time=300 --threads=2 fileio --file-total-size=10G --file-num=2 --file-block-size=512 --file-test-mode=rndrw run --file-fsync-freq=1 --file-io-mode=async --file-async-backlog=2
sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 2
Initializing random number generator from current time


Extra file open flags: (none)
2 files, 5GiB each
10GiB total file size
Block size 512B
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 1 requests.
Calling fsync() at the end of test, Enabled.
Using asynchronous I/O mode
Doing random r/w test
Initializing worker threads...

Threads started!


File operations:
    reads/s:                      46.87
    writes/s:                     31.25
    fsyncs/s:                     62.50

Throughput:
    read, MiB/s:                  0.02
    written, MiB/s:               0.02

General statistics:
    total time:                          300.0433s
    total number of events:              42189

Latency (ms):
         min:                                    0.00
         avg:                                   14.22
         max:                                  999.98
         95th percentile:                       46.63
         sum:                               599961.22

Threads fairness:
    events (avg/stddev):           21094.5000/40.50
    execution time (avg/stddev):   299.9806/0.01

total number of events: 42189
avg: 14.22
95th percentile: 46.63
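
Compared with 2T3, total requests rose slightly, the average response time fell slightly, and the 95% response time fell significantly; performance improved slightly.
Compared with 2T2-1, total requests fell significantly, the average response time rose slightly, and the 95% response time rose significantly; performance dropped markedly.

In this environment, asynchronous I/O did not improve performance; it reduced it.

2T5: using `file-merged-requests`

2T2-1 is still the best result, so it remains the baseline for trying to improve further. In the 2T2-1 result, 45989 / 300 ≈ 153.3, so we set file-merged-requests=160.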
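The arithmetic behind this choice, and the merge window it implies, can be checked in a few lines. Rounding up to the next multiple of 10 is an assumption about how 160 was picked from 153.3; the resulting window of 160 blocks of 512 bytes matches the "Merging requests up to 80KiB" line in the output below.

```python
import math

TOTAL_EVENTS_2T2_1 = 45989   # total number of events from the 2T2-1 run
RUN_TIME_S = 300             # --time value used in all runs
BLOCK_SIZE = 512             # --file-block-size in bytes

events_per_second = TOTAL_EVENTS_2T2_1 / RUN_TIME_S

# Assumption: round the per-second rate up to the next multiple of 10.
merged_requests = math.ceil(events_per_second / 10) * 10

# Size of the largest merged request, in KiB.
merge_window_kib = merged_requests * BLOCK_SIZE / 1024

if __name__ == "__main__":
    print(f"{events_per_second:.1f} events/s -> file-merged-requests={merged_requests}")
    print(f"merge window: {merge_window_kib:.0f} KiB")
```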

$ sysbench --time=300 --threads=2 fileio --file-total-size=10G --file-num=2 --file-block-size=512 --file-test-mode=rndrw run --file-fsync-freq=1 --file-merged-requests=160
sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 2
Initializing random number generator from current time


Extra file open flags: (none)
2 files, 5GiB each
10GiB total file size
Block size 512B
Merging requests up to 80KiB for sequential IO.
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 1 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
Initializing worker threads...

Threads started!


File operations:
    reads/s:                      40.77
    writes/s:                     27.18
    fsyncs/s:                     54.37

Throughput:
    read, MiB/s:                  0.02
    written, MiB/s:               0.01

General statistics:
    total time:                          300.0027s
    total number of events:              36694

Latency (ms):
         min:                                    0.00
         avg:                                   16.35
         max:                                 1058.73
         95th percentile:                       54.83
         sum:                               599910.22

Threads fairness:
    events (avg/stddev):           18347.0000/12.00
    execution time (avg/stddev):   299.9551/0.00

total number of events: 36694
avg: 16.35
95th percentile: 54.83

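Compared with 2T2-1, the line Merging requests up to 80KiB for sequential IO. shows that multiple random I/Os are being merged into sequential I/O, yet total requests still fell significantly and both the average and 95% response times rose significantly; performance dropped markedly.

2T6: using `fdatasync`

2T6-1: based on 2B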

$ sysbench --time=300 --threads=2 fileio --file-total-size=10G --file-num=2 --file-block-size=512 --file-test-mode=rndrw run --file-fsync-mode=fdatasync
sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 2
Initializing random number generator from current time


Extra file open flags: (none)
2 files, 5GiB each
10GiB total file size
Block size 512B
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
Initializing worker threads...

Threads started!


File operations:
    reads/s:                      57.40
    writes/s:                     38.27
    fsyncs/s:                     1.93

Throughput:
    read, MiB/s:                  0.03
    written, MiB/s:               0.02

General statistics:
    total time:                          300.0671s
    total number of events:              29281

Latency (ms):
         min:                                    0.00
         avg:                                   20.49
         max:                                  990.92
         95th percentile:                       62.19
         sum:                               600055.06

Threads fairness:
    events (avg/stddev):           14640.5000/26.50
    execution time (avg/stddev):   300.0275/0.00

total number of events: 29281
avg: 20.49
95th percentile: 62.19

Compared with 2B, total requests fell slightly, and the average and 95% response times rose slightly; performance is marginally lower.

2T6-2: based on 2T2-1

$ sysbench --time=300 --threads=2 fileio --file-total-size=10G --file-num=2 --file-block-size=512 --file-test-mode=rndrw run --file-fsync-freq=1 --file-fsync-mode=fdatasync
sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 2
Initializing random number generator from current time


Extra file open flags: (none)
2 files, 5GiB each
10GiB total file size
Block size 512B
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 1 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
Initializing worker threads...

Threads started!


File operations:
    reads/s:                      51.88
    writes/s:                     34.59
    fsyncs/s:                     69.19

Throughput:
    read, MiB/s:                  0.03
    written, MiB/s:               0.02

General statistics:
    total time:                          300.0144s
    total number of events:              46698

Latency (ms):
         min:                                    0.00
         avg:                                   12.85
         max:                                  960.98
         95th percentile:                       47.47
         sum:                               599922.94

Threads fairness:
    events (avg/stddev):           23349.0000/23.00
    execution time (avg/stddev):   299.9615/0.00

total number of events: 46698
avg: 12.85
95th percentile: 47.47

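Compared with 2T2-1, total requests rose slightly and the average response time fell slightly, although the 95% response time rose significantly. By total requests, performance improved slightly, making this the best rndrw result so far.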
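For context on what --file-fsync-mode switches between: fsync() flushes a file's data and its metadata, while fdatasync() may skip metadata that is not needed to read the data back, such as timestamps. The sketch below shows both calls through Python's os module in a write-then-flush loop that loosely resembles --file-fsync-freq=1. The helper write_blocks is hypothetical, and it falls back to os.fsync on platforms where os.fdatasync is unavailable.

```python
import os
import tempfile


def write_blocks(path, blocks, block_size=512, use_fdatasync=False):
    """Write fixed-size blocks, flushing after each one; return the file size."""
    if use_fdatasync and hasattr(os, "fdatasync"):
        flush = os.fdatasync   # data, plus only the metadata needed to read it
    else:
        flush = os.fsync       # data and all metadata
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for _ in range(blocks):
            os.write(fd, b"\0" * block_size)
            flush(fd)
    finally:
        os.close(fd)
    return os.path.getsize(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(write_blocks(os.path.join(tmp, "fsync.bin"), 8))
        print(write_blocks(os.path.join(tmp, "fdatasync.bin"), 8, use_fdatasync=True))
```

Whether the lighter call is measurably faster depends on the file system and the device, which is consistent with the small and mixed differences seen in 2T6-1 and 2T6-2.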

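2.2 File block size of 16K

The following tests only refine the best random read/write configuration found above.

2.2.1 Sequential write `seqwr`

Measure sequential write performance with a 16KB file block size.

3T1: compared with 1B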

$ sysbench --time=300 --threads=2 fileio --file-total-size=10G --file-num=2 --file-block-size=16K --file-test-mode=seqwr run
sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 2
Initializing random number generator from current time


Extra file open flags: (none)
2 files, 5GiB each
10GiB total file size
Block size 16KiB
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing sequential write (creation) test
Initializing worker threads...

Threads started!


File operations:
    reads/s:                      0.00
    writes/s:                     2379.17
    fsyncs/s:                     47.59

Throughput:
    read, MiB/s:                  0.00
    written, MiB/s:               37.17

General statistics:
    total time:                          300.0183s
    total number of events:              728075

Latency (ms):
         min:                                    0.00
         avg:                                    0.82
         max:                                 3171.58
         95th percentile:                        0.05
         sum:                               599503.06

Threads fairness:
    events (avg/stddev):           364037.5000/811.50
    execution time (avg/stddev):   299.7515/0.00

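Compared with 1B, the write throughput is 37.17 MiB/s, a slight improvement, but the average and 95% response times rose significantly.

2.2.2 Random read/write `rndrw`

3T2: compared with 2T6-2

Prepare test files with a 16K block size.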

$ sysbench --threads=2 fileio --file-total-size=10G --file-num=2 --file-block-size=16K prepare

The --file-block-size=16K option can also be omitted, since 16K is already its default.

$ sysbench --time=300 --threads=2 fileio --file-total-size=10G --file-num=2 --file-block-size=16K --file-test-mode=rndrw run --file-fsync-freq=1 --file-fsync-mode=fdatasync
sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 2
Initializing random number generator from current time


Extra file open flags: (none)
2 files, 5GiB each
10GiB total file size
Block size 16KiB
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 1 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
Initializing worker threads...

Threads started!


File operations:
    reads/s:                      65.39
    writes/s:                     43.59
    fsyncs/s:                     87.20

Throughput:
    read, MiB/s:                  1.02
    written, MiB/s:               0.68

General statistics:
    total time:                          300.0973s
    total number of events:              58871

Latency (ms):
         min:                                    0.00
         avg:                                   10.19
         max:                                 1642.47
         95th percentile:                       34.33
         sum:                               600072.54

Threads fairness:
    events (avg/stddev):           29435.5000/46.50
    execution time (avg/stddev):   300.0363/0.00

total number of events: 58871
avg: 10.19
95th percentile: 34.33

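Compared with 2T6-2, total requests rose significantly, and the average and 95% response times fell significantly; performance improved markedly.

2.3 Summary

Overall, the best result in these tests is 3T2: --file-block-size=16K --file-fsync-freq=1 --file-fsync-mode=fdatasync.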
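As a compact recap, the sketch below collects the rndrw figures reported in this section and ranks the runs by total requests. The numbers are copied from the reports above; the table layout and variable names are only one possible way to present them.

```python
# run name -> (total number of events, avg latency ms, 95th percentile ms)
RESULTS = {
    "2B": (29797, 20.13, 59.99),
    "2T1": (28461, 21.08, 57.87),
    "2T2-1": (45989, 13.04, 38.25),
    "2T2-2": (23068, 26.01, 74.46),
    "2T3": (39437, 15.22, 56.84),
    "2T4": (42189, 14.22, 46.63),
    "2T5": (36694, 16.35, 54.83),
    "2T6-1": (29281, 20.49, 62.19),
    "2T6-2": (46698, 12.85, 47.47),
    "3T2": (58871, 10.19, 34.33),
}

RUN_TIME_S = 300  # --time value shared by all runs

# Highest total requests first.
ranked = sorted(RESULTS.items(), key=lambda item: item[1][0], reverse=True)

if __name__ == "__main__":
    for name, (events, avg_ms, p95_ms) in ranked:
        print(f"{name:6s} events={events:6d} iops~{events / RUN_TIME_S:6.1f} "
              f"avg={avg_ms:5.2f}ms p95={p95_ms:5.2f}ms")
```

Ranking by the 95th percentile instead would place 2T2-1 ahead of 2T6-2 among the 512B runs, so the choice of metric matters when the differences are small.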

3. Memory

The test VM has 3.7G of memory and a 2G swap area.

3.1 Sequential write

$ sysbench memory --memory-block-size=16K --memory-total-size=2G run
sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Running memory speed test with the following options:
  block size: 16KiB
  total size: 2048MiB
  operation: write
  scope: global

Initializing worker threads...

Threads started!

Total operations: 131072 (581340.94 per second)

2048.00 MiB transferred (9083.45 MiB/sec)


General statistics:
    total time:                          0.2231s
    total number of events:              131072

Latency (ms):
         min:                                    0.00
         avg:                                    0.00
         max:                                    0.17
         95th percentile:                        0.00
         sum:                                  197.38

Threads fairness:
    events (avg/stddev):           131072.0000/0.00
    execution time (avg/stddev):   0.1974/0.00
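The operation count and transfer rate in this report follow from the two size options. The sketch below reproduces both from the values above; parse_size, total_operations and transfer_mib_s are hypothetical helper names, and the size suffixes are assumed to be binary (K = 1024 bytes).

```python
UNITS = {"K": 2**10, "M": 2**20, "G": 2**30}


def parse_size(text: str) -> int:
    """Convert a size such as '16K' or '2G' into bytes (binary units assumed)."""
    suffix = text[-1].upper()
    if suffix in UNITS:
        return int(text[:-1]) * UNITS[suffix]
    return int(text)


def total_operations(total_size: str, block_size: str) -> int:
    """Number of block-sized operations needed to transfer total_size."""
    return parse_size(total_size) // parse_size(block_size)


def transfer_mib_s(ops_per_second: float, block_size: str) -> float:
    """Transfer rate in MiB/s for a given operation rate and block size."""
    return ops_per_second * parse_size(block_size) / 2**20


if __name__ == "__main__":
    print(total_operations("2G", "16K"))
    print(f"{transfer_mib_s(581340.94, '16K'):.2f} MiB/s")
```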

3.2 Sequential read

$ sysbench memory --memory-block-size=16K --memory-total-size=2G --memory-oper=read run
sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Running memory speed test with the following options:
  block size: 16KiB
  total size: 2048MiB
  operation: read
  scope: global

Initializing worker threads...

Threads started!

Total operations: 131072 (1361140.38 per second)

2048.00 MiB transferred (21267.82 MiB/sec)


General statistics:
    total time:                          0.0940s
    total number of events:              131072

Latency (ms):
         min:                                    0.00
         avg:                                    0.00
         max:                                    0.18
         95th percentile:                        0.00
         sum:                                   70.39

Threads fairness:
    events (avg/stddev):           131072.0000/0.00
    execution time (avg/stddev):   0.0704/0.00

3.3 Random write

$ sysbench memory --memory-block-size=16K --memory-total-size=2G --memory-access-mode=rnd run
sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Running memory speed test with the following options:
  block size: 16KiB
  total size: 2048MiB
  operation: write
  scope: global

Initializing worker threads...

Threads started!

Total operations: 131072 (94320.51 per second)

2048.00 MiB transferred (1473.76 MiB/sec)


General statistics:
    total time:                          1.3873s
    total number of events:              131072

Latency (ms):
         min:                                    0.01
         avg:                                    0.01
         max:                                    0.22
         95th percentile:                        0.01
         sum:                                 1359.07

Threads fairness:
    events (avg/stddev):           131072.0000/0.00
    execution time (avg/stddev):   1.3591/0.00

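3.4 Random read

Note: the command recorded below passes --memory-oper=write, and the output accordingly reports "operation: write", so this run repeats the random-write test from 3.3. A random-read run would pass --memory-oper=read instead.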

$ sysbench memory --memory-block-size=16K --memory-total-size=2G --memory-access-mode=rnd --memory-oper=write run
sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Running memory speed test with the following options:
  block size: 16KiB
  total size: 2048MiB
  operation: write
  scope: global

Initializing worker threads...

Threads started!

Total operations: 131072 (94733.62 per second)

2048.00 MiB transferred (1480.21 MiB/sec)


General statistics:
    total time:                          1.3813s
    total number of events:              131072

Latency (ms):
         min:                                    0.01
         avg:                                    0.01
         max:                                    0.25
         95th percentile:                        0.01
         sum:                                 1353.19

Threads fairness:
    events (avg/stddev):           131072.0000/0.00
    execution time (avg/stddev):   1.3532/0.00
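
The memory tests in this section vary two options, --memory-oper (read or write) and --memory-access-mode (seq or rnd). The following sketch lists all four combinations with the block size and total size used above; memory_cmd is a hypothetical helper name introduced for illustration.

```shell
# Hypothetical helper: print the sysbench command for one memory test.
# $1 = operation (read or write), $2 = access mode (seq or rnd)
memory_cmd() {
    echo "sysbench memory --memory-block-size=16K --memory-total-size=2G --memory-oper=$1 --memory-access-mode=$2 run"
}

# List the four combinations: sequential and random, write and read.
for mode in seq rnd; do
    for oper in write read; do
        memory_cmd "$oper" "$mode"
    done
done
```

The block only prints the commands. To run them and keep just the throughput line, replace the call inside the loop with $(memory_cmd "$oper" "$mode") | grep "MiB/sec".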

4. Threads

Two threads:

$ sysbench threads --threads=2 run
sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 2
Initializing random number generator from current time


Initializing worker threads...

Threads started!


General statistics:
    total time:                          10.0007s
    total number of events:              24590

Latency (ms):
         min:                                    0.70
         avg:                                    0.81
         max:                                    4.88
         95th percentile:                        1.18
         sum:                                19977.52

Threads fairness:
    events (avg/stddev):           12295.0000/22.00
    execution time (avg/stddev):   9.9888/0.00

Four threads:

$ sysbench threads --threads=4 run
sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 4
Initializing random number generator from current time


Initializing worker threads...

Threads started!


General statistics:
    total time:                          10.1937s
    total number of events:              17597

Latency (ms):
         min:                                    0.82
         avg:                                    2.29
         max:                                  193.69
         95th percentile:                        3.07
         sum:                                40367.02

Threads fairness:
    events (avg/stddev):           4399.2500/15.93
    execution time (avg/stddev):   10.0918/0.10

Eight threads:

$ sysbench threads --threads=8 run
sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 8
Initializing random number generator from current time


Initializing worker threads...

Threads started!


General statistics:
    total time:                          10.0059s
    total number of events:              11374

Latency (ms):
         min:                                    0.74
         avg:                                    7.03
         max:                                  223.72
         95th percentile:                       23.10
         sum:                                79953.72

Threads fairness:
    events (avg/stddev):           1421.7500/35.77
    execution time (avg/stddev):   9.9942/0.01

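Because the test environment has only two CPU cores, two threads give the best result: the total number of events falls from 24590 with two threads to 17597 with four and 11374 with eight, while the average latency rises from 0.81 ms to 2.29 ms and 7.03 ms.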
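
To compare the three runs on a single figure, the "total number of events" can be divided by the "total time" from each output above. A minimal sketch, where events_per_sec is a hypothetical helper name and awk is assumed to be available:

```shell
# Events per second from the total event count and the total time in seconds.
# $1 = total number of events, $2 = total time (seconds)
events_per_sec() {
    awk -v e="$1" -v t="$2" 'BEGIN { printf "%.1f\n", e / t }'
}

events_per_sec 24590 10.0007   # 2 threads, prints 2458.8
events_per_sec 17597 10.1937   # 4 threads, prints 1726.3
events_per_sec 11374 10.0059   # 8 threads, prints 1136.7
```

On this two-core virtual machine, throughput drops by roughly 30% when going from two to four threads and by more than half with eight.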

5. POSIX mutex

$ sysbench mutex --threads=2 run
sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 2
Initializing random number generator from current time


Initializing worker threads...

Threads started!


General statistics:
    total time:                          0.2546s
    total number of events:              2

Latency (ms):
         min:                                  254.01
         avg:                                  254.22
         max:                                  254.43
         95th percentile:                      253.35
         sum:                                  508.44

Threads fairness:
    events (avg/stddev):           1.0000/0.00
    execution time (avg/stddev):   0.2542/0.00