You may see "cpu soft lockup" messages in the log files on 96-core systems under heavy load. These are informational messages indicating that a CPU did not respond to a softlockup timer within the timer window (currently 10 seconds on Red Hat Enterprise Linux). They do not indicate a problem with the system.
The system continues to function correctly, although with some performance anomalies related to a CPU spinning on a lock for 10 seconds or more.
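To gauge how often these messages occur, a small shell helper can count them in a log file. This is only a sketch: count_soft_lockups is a hypothetical name, /var/log/messages is an assumed default log location, and the match is a case-insensitive search for "soft lockup" rather than an exact message format.

```shell
# Hypothetical helper: count log lines that mention a soft lockup.
# Usage: count_soft_lockups [LOGFILE]
# LOGFILE defaults to /var/log/messages (an assumption; adjust for your system).
count_soft_lockups() {
    log=${1:-/var/log/messages}
    # grep -c prints the number of matching lines, -i ignores case.
    # grep exits non-zero when nothing matches, so "|| true" keeps the
    # helper's exit status at 0 while the printed count remains "0".
    grep -ci 'soft lockup' "$log" || true
}
```

Running count_soft_lockups before and after changing the threshold gives a rough indication of whether the messages have stopped.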
The current upstream setting for this softlockup timer parameter is 60 seconds. Testing has shown that we do not see these messages if the parameter is set to 30 seconds or more.
We suggest altering the default value of kernel.softlockup_thresh from 10 to 30, by doing one of the following:
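- Recommended: change the value on the running system. On its own, sysctl -w does not persist across a reboot, so combine it with the /etc/sysctl.conf entry in the next item if the setting should survive one:
  sysctl -w kernel.softlockup_thresh=30
- Add this line to /etc/sysctl.conf (takes effect on next reboot):
  kernel.softlockup_thresh=30
- Change the value dynamically; this only affects the system's current value:
  echo 30 > /proc/sys/kernel/softlockup_thresh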
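The persistent variant of the change can be scripted so that it is safe to run more than once. The following is a minimal sketch, not a Red Hat tool: set_sysctl_conf is a hypothetical helper, and the optional third argument (the file to edit) exists only so the function can be tried against a scratch file before touching /etc/sysctl.conf.

```shell
# Hypothetical helper: add or update a "key=value" line in a sysctl file.
# Usage: set_sysctl_conf KEY VALUE [FILE]
# FILE defaults to /etc/sysctl.conf; writing to it requires root.
set_sysctl_conf() {
    key=$1
    value=$2
    file=${3:-/etc/sysctl.conf}

    # Escape dots so the key is matched literally in the patterns below.
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')

    touch "$file" || return 1
    if grep -q "^[[:space:]]*${pattern}[[:space:]]*=" "$file"; then
        # Key already present: rewrite the existing line via a temp file.
        tmp=$(mktemp) || return 1
        sed "s|^[[:space:]]*${pattern}[[:space:]]*=.*|${key}=${value}|" "$file" > "$tmp" &&
            cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        # Key absent: append a new line.
        echo "${key}=${value}" >> "$file"
    fi
}

# Example (as root):
#   set_sysctl_conf kernel.softlockup_thresh 30
#   sysctl -p                          # reload /etc/sysctl.conf immediately
#   sysctl kernel.softlockup_thresh    # print the value now in effect
```

Updating an existing line instead of always appending avoids leaving two conflicting entries for the same key in the file.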