How do I tune RHEL for better TCP performance over a specific network connection?

https://access.redhat.com/solutions/168483

SOLUTION VERIFIED - Updated April 3, 2017 22:06 -

Environment

  • Red Hat Enterprise Linux
  • A network connection with known performance characteristics, such as a WAN connection with high throughput and high latency (although it does not necessarily have to be a WAN)
  • TCP networking

Issue

  • We are transmitting a large amount of data over a WAN link and wish to tune the transfer to be as fast as possible
  • We are trying to backhaul traffic from one datacenter to another and appear to be running into small TCP window sizes that are impacting the transmission speed
  • We are looking for sysctl parameters which can be tuned to allow better performance over our network
  • How do I accurately tune TCP socket buffer sizes?

Resolution

What you are looking to calculate is called the Bandwidth Delay Product (BDP). This is the product of a link's capacity and its latency, i.e. how many bits can actually be on the wire at a given time.
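
As a minimal worked example (the 100 Mbit/s bandwidth and 70 ms round-trip time are assumed figures chosen only so the numbers line up with the sample results table further down, not values taken from this solution), the BDP can be calculated as follows:

Raw

# BDP = bandwidth (bits/s) x round-trip time (s)
# 100,000,000 bits/s x 0.070 s = 7,000,000 bits = 875,000 bytes
[root@host]# echo $(( 100000000 * 70 / 1000 / 8 ))
875000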

Socket Buffers

Once this is calculated, you should then tune your network buffers to accommodate this amount of traffic, plus some extra. Tune write buffers on the sender (net.core.wmem_max, net.core.wmem_default, net.ipv4.tcp_wmem, net.ipv4.tcp_mem) and read buffers on the receiver (net.core.rmem_max, net.core.rmem_default, net.ipv4.tcp_rmem, net.ipv4.tcp_mem).

We make these buffer changes by adding lines to the /etc/sysctl.conf file and running:

Raw

[root@host]# sysctl -p

You may also choose to make the change temporarily by running:

Raw

[root@host]# sysctl -w key.name=value

For example:

Raw

[root@host]# sysctl -w net.core.rmem_max=16777216

To prepare for this, we will increase the maximum amount of memory available for network sockets:

Raw

net.core.rmem_max = 16777216
net.core.wmem_max = 16777216

net.ipv4.tcp_mem = 8388608 12582912 16777216

We will also need to make sure that TCP Window Scaling is on. Run the command:

Raw

[root@host]# sysctl -a | grep window_scaling

and ensure the returned result is:

Raw

[root@host]# sysctl -a | grep window_scaling
net.ipv4.tcp_window_scaling = 1

If window scaling is set to 0, change it to 1.
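
One way to do that, for example, is to enable it at runtime and also append it to /etc/sysctl.conf so the setting persists across reboots:

Raw

[root@host]# sysctl -w net.ipv4.tcp_window_scaling=1
[root@host]# echo "net.ipv4.tcp_window_scaling = 1" >> /etc/sysctl.conf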

Your action here will be to tune socket buffers. After you make each change, test the speed to a client and record the result.
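
As one possible way of measuring the speed to a client between changes (iperf3 is used here purely as an assumed example tool; it is not part of the original solution), you could run a short throughput test such as:

Raw

[root@client]# iperf3 -s
[root@host]# iperf3 -c <client-ip> -t 30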

The tunables here are:

Raw

net.ipv4.tcp_rmem = 8192 Y 4194304
net.ipv4.tcp_wmem = 8192 Y 4194304

The first value is the smallest buffer size. We recommend keeping this at 8192 bytes so that it can hold two memory pages of data.

The last value is the largest buffer size. You could probably safely leave this at 4194304 (4 MB).

The middle value Y is the default buffer size. This is the most important value. You might wish to start at 524288 (512 KB) and move up from there. You will generally wish to try small increments of your Bandwidth Delay Product: try BDP x1, then BDP x1.25, then BDP x1.5 and so on. Once you start to see increased speeds, you may wish to refine your testing with smaller steps, for example BDP x2.5 then BDP x2.6 and so on. It is unlikely you will need a value larger than BDP x5.
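
As an illustrative sketch only, using 1.5x the assumed example BDP of 875,000 bytes from earlier as the middle value (a starting point for your own testing, not a recommendation from this solution), the entries might look like:

Raw

net.ipv4.tcp_rmem = 8192 1312500 4194304
net.ipv4.tcp_wmem = 8192 1312500 4194304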

Unfortunately there is no "silver bullet" buffer value which is perfect or which can be calculated definitively. Each individual link is different and requires individual testing to attain the best throughput.

We expect you will come up with a table of results something like:

Raw

------------------------------------------------------------
Buffer |  BDP*1.5  | BDP*1.75  |   BDP*2   | ... and so on
Size   | (1312500) | (1531250) | (1750000) | ...
------------------------------------------------------------
Client |    A kbps |    B kbps |    C kbps | ...
Speed  |           |           |           | ...
------------------------------------------------------------

This will help you to see the best buffer size for the best client speed. You would then set this buffer permanently.

TCP Window

If you run a packet capture at the same time, you'll see the TCP window size grow. The window size will never hit the maximum value of your buffers, as TCP reserves some of the buffer space as overhead while automatically tuning the window.
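
For example, a capture could be collected on the sender while the transfer runs and inspected afterwards (the interface name eth0 and port 5001 below are assumptions, not values from this solution):

Raw

[root@host]# tcpdump -i eth0 -s 96 -w /tmp/transfer.pcap 'tcp port 5001'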

TCP does not use the full window size instantly when a new connection is established. Rather, TCP uses a "slow start" algorithm which gradually increases the amount of data sent over the life of the connection. The purpose of this gradual increase is to allow TCP to calculate how much data can be sent over the network without dropping packets.

By default, TCP will reset this calculation (called the congestion window) back to its initial value after a period of idle time equal to the Round Trip Timeout between the two hosts (i.e. double the network latency). This can significantly reduce the transfer data rate of a connection if there are idle periods of application processing.

If the application alternates between a "data transmit" period and a "processing" period during which no data is sent, and the processing period is longer than double the WAN latency, there may be some advantage to disabling the slow start algorithm for established connections. This is done by changing the net.ipv4.tcp_slow_start_after_idle tunable:

Raw

net.ipv4.tcp_slow_start_after_idle = 0
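
The same change can also be made at runtime without editing /etc/sysctl.conf, for example:

Raw

[root@host]# sysctl -w net.ipv4.tcp_slow_start_after_idle=0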

Testing

If you're using file copies as tests, ensure you drop caches between tests (echo 3 > /proc/sys/vm/drop_caches). We would suggest using direct I/O or some other method to bypass caches (such as dd conv=fsync) so that cached data does not produce artificial test results.
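
A minimal sketch of such a test run, assuming a file copy onto a remote mount (the paths below are placeholders, not values from this solution):

Raw

# drop the page cache, dentries and inodes before each run
[root@host]# sync; echo 3 > /proc/sys/vm/drop_caches
# read the source file with direct I/O so cached data is not reused, and fsync the output
[root@host]# dd if=/data/testfile of=/mnt/remote/testfile bs=1M iflag=direct conv=fsync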

You are always better off testing with your actual production workload, or a simulation of the production workload crafted with a tool such as iozone. Tuning a system for artificial bulk-transfer benchmarks when your application sends small amounts of data and requires low latency, or vice versa, will only result in incorrect tuning and will hurt overall application performance.
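
If you do use iozone to approximate the workload, a simple sketch might look like the following (the 1g file size, 128k record size, direct I/O flag and mount point are assumptions that should be replaced with values resembling your application's real I/O pattern):

Raw

[root@host]# iozone -i 0 -i 1 -s 1g -r 128k -I -f /mnt/remote/iozone.tmp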
