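sssd process on Linux stuck in the starting state and using a large amount of memory
A brief record of how I resolved it.
Running free -h showed that memory usage on the server was very high: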
[root@dt-us-pre-sftp-02 ~]# free -h
total used free shared buff/cache available
Mem: 6.6Gi 4.8Gi 974Mi 326Mi 910Mi 744Mi
Swap: 0B 0B 0B
A quick look at top showed that the sssd process was using the most CPU and memory.
The status of the sssd service was as follows:
[root@dt-us-pre-sftp-02 tools]# systemctl status sssd
● sssd.service - System Security Services Daemon
Loaded: loaded (/usr/lib/systemd/system/sssd.service; enabled; vendor preset: disabled)
Active: activating (start) since Tue 2025-05-06 09:02:08 CST; 56s ago
Main PID: 594235 (sssd)
Tasks: 4
Memory: 4.2G
CGroup: /system.slice/sssd.service
├─594235 /usr/sbin/sssd -i --logger=files
├─594236 /usr/libexec/sssd/sssd_be --domain implicit_files --uid 0 --gid 0 --logger=files
├─594237 /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --logger=files
└─594238 /usr/libexec/sssd/sssd_pam --uid 0 --gid 0 --logger=files
5月 06 09:02:08 dt-us-pre-sftp-02 systemd[1]: Starting System Security Services Daemon...
5月 06 09:02:08 dt-us-pre-sftp-02 sssd[594235]: Starting up
5月 06 09:02:08 dt-us-pre-sftp-02 sssd[be[implicit_files]][594236]: Starting up
5月 06 09:02:08 dt-us-pre-sftp-02 sssd[nss][594237]: Starting up
5月 06 09:02:08 dt-us-pre-sftp-02 sssd[pam][594238]: Starting up
Checking the logs:
[root@dt-us-pre-sftp-02 tools]# journalctl -u sssd
-- Logs begin at Wed 2025-04-30 08:47:05 CST, end at Tue 2025-05-06 09:03:38 CST. --
4月 30 08:47:09 dt-us-pre-sftp-02 systemd[1]: Starting System Security Services Daemon...
4月 30 08:47:10 dt-us-pre-sftp-02 sssd[822]: Starting up
4月 30 08:47:10 dt-us-pre-sftp-02 sssd[be[implicit_files]][900]: Starting up
4月 30 08:47:10 dt-us-pre-sftp-02 sssd[nss][935]: Starting up
4月 30 08:47:10 dt-us-pre-sftp-02 sssd[pam][936]: Starting up
4月 30 08:48:39 dt-us-pre-sftp-02 systemd[1]: sssd.service: start operation timed out. Terminating.
4月 30 08:50:10 dt-us-pre-sftp-02 systemd[1]: sssd.service: State 'stop-sigterm' timed out. Killing.
4月 30 08:50:10 dt-us-pre-sftp-02 systemd[1]: sssd.service: Killing process 822 (sssd) with signal SIGKILL.
4月 30 08:50:10 dt-us-pre-sftp-02 systemd[1]: sssd.service: Killing process 900 (sssd_be) with signal SIGKILL.
4月 30 08:50:10 dt-us-pre-sftp-02 systemd[1]: sssd.service: Killing process 935 (sssd_nss) with signal SIGKILL.
4月 30 08:50:10 dt-us-pre-sftp-02 systemd[1]: sssd.service: Killing process 936 (sssd_pam) with signal SIGKILL.
4月 30 08:50:10 dt-us-pre-sftp-02 systemd[1]: sssd.service: Main process exited, code=killed, status=9/KILL
4月 30 08:50:10 dt-us-pre-sftp-02 systemd[1]: sssd.service: Failed with result 'timeout'.
4月 30 08:50:10 dt-us-pre-sftp-02 systemd[1]: Failed to start System Security Services Daemon.
4月 30 08:50:10 dt-us-pre-sftp-02 systemd[1]: sssd.service: Service RestartSec=100ms expired, scheduling restart.
4月 30 08:50:10 dt-us-pre-sftp-02 systemd[1]: sssd.service: Scheduled restart job, restart counter is at 1.
4月 30 08:50:10 dt-us-pre-sftp-02 systemd[1]: Stopped System Security Services Daemon.
4月 30 08:50:10 dt-us-pre-sftp-02 systemd[1]: Starting System Security Services Daemon...
4月 30 08:50:10 dt-us-pre-sftp-02 sssd[1704]: Starting up
4月 30 08:50:10 dt-us-pre-sftp-02 sssd[be[implicit_files]][1713]: Starting up
4月 30 08:50:10 dt-us-pre-sftp-02 sssd[nss][1715]: Starting up
4月 30 08:50:10 dt-us-pre-sftp-02 sssd[pam][1716]: Starting up
4月 30 08:51:40 dt-us-pre-sftp-02 systemd[1]: sssd.service: start operation timed out. Terminating.
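In the journal above, each start attempt is terminated about 90 seconds after it begins (08:47:09 to 08:48:39), killed, and then scheduled again. To get a feel for how long the loop has been running, the timeouts can be counted. This is a minimal sketch; the function name count_start_timeouts is made up for illustration.

```shell
# Count how many start attempts were cut off by systemd's start timeout.
# Reads journal text on stdin, so it works on live output or a saved file.
count_start_timeouts() {
    grep -c 'start operation timed out'
}

# Usage on the affected host:
#   journalctl -u sssd --no-pager | count_start_timeouts
```

A count that keeps growing between two runs confirms that the service is still cycling through start, timeout, kill and restart.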
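Restarting with systemctl restart sssd did not solve the problem: the service stayed in the activating (start) state. Searching online did not turn up a suitable fix either. Along the way I also tried a few things, such as:
- checking the logs under /var/log/sssd/ and tracing the cause from the errors;
- finding a server where sssd runs normally and comparing its configuration;
- clearing the sssd cache with rm -rf /var/lib/sss/db/* and then restarting sssd with systemctl restart sssd.
None of these worked.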
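For the log check in the first bullet, one way to narrow things down is to count the lines tagged (0x0010) in each log file; that is the debug level SSSD uses for fatal failures. This is only a sketch, and the function name sssd_fatal_counts is an assumption, not an SSSD tool.

```shell
# Print "<file>:<count>" for every *.log file in the given directory,
# where <count> is the number of lines logged at level 0x0010 (fatal).
sssd_fatal_counts() {
    # $1 = log directory, for example /var/log/sssd
    grep -Hc '(0x0010)' "$1"/*.log
}

# Usage on the affected host:
#   sssd_fatal_counts /var/log/sssd
```

The file with the highest count is usually the best place to start reading.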
Solution:
In the end, reinstalling the sssd package fixed it:
[root@dt-us-pre-sftp-02 db]# yum update sssd
[root@dt-us-pre-sftp-02 db]# yum reinstall sssd
[root@dt-us-pre-sftp-02 db]# systemctl status sssd
● sssd.service - System Security Services Daemon
Loaded: loaded (/usr/lib/systemd/system/sssd.service; enabled; vendor preset: disabled)
Active: active (running) since Tue 2025-05-06 09:36:25 CST; 7s ago
Main PID: 596724 (sssd)
Tasks: 4
Memory: 64.8M
CGroup: /system.slice/sssd.service
├─596724 /usr/sbin/sssd -i --logger=files
├─596725 /usr/libexec/sssd/sssd_be --domain implicit_files --uid 0 --gid 0 --logger=files
├─596726 /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --logger=files
└─596727 /usr/libexec/sssd/sssd_pam --uid 0 --gid 0 --logger=files
5月 06 09:36:25 dt-us-pre-sftp-02 systemd[1]: Starting System Security Services Daemon...
5月 06 09:36:25 dt-us-pre-sftp-02 sssd[596724]: Starting up
5月 06 09:36:25 dt-us-pre-sftp-02 sssd[be[implicit_files]][596725]: Starting up
5月 06 09:36:25 dt-us-pre-sftp-02 sssd[nss][596726]: Starting up
5月 06 09:36:25 dt-us-pre-sftp-02 sssd[pam][596727]: Starting up
5月 06 09:36:25 dt-us-pre-sftp-02 systemd[1]: Started System Security Services Daemon.
5月 06 09:36:26 dt-us-pre-sftp-02 systemd[1]: /usr/lib/systemd/system/sssd.service:12: PIDFile= references a path below legacy directory /var/run/, updating /var/run/sssd.pid → /run/sssd.pid; please update the unit file accordingly.
[root@dt-us-pre-sftp-02 db]# free -h
total used free shared buff/cache available
Mem: 6.6Gi 622Mi 4.9Gi 326Mi 1.1Gi 4.8Gi
Swap: 0B 0B 0B
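I did not dig into the root cause here; getting it fixed was enough, and I almost never deal with sssd in my day-to-day work.
Addendum:
Later I ran into a similar problem again. Checking the log: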
[root@dt-us-pre-sftp-02 ~]# tail -20f /var/log/sssd/sssd_implicit_files.log
(Tue May 27 14:05:58 2025) [sssd[be[implicit_files]]] [ldb] (0x0010): ltdb: tdb(/var/lib/sss/db/timestamps_implicit_files.ldb): transaction_write: failed at off=4294904444 len=4
(Tue May 27 14:05:58 2025) [sssd[be[implicit_files]]] [ldb] (0x0010): ltdb: tdb(/var/lib/sss/db/timestamps_implicit_files.ldb): merge_with_left_record: update_tailer failed at 47784
(Tue May 27 14:06:00 2025) [sssd[be[implicit_files]]] [ldb] (0x0010): ltdb: tdb(/var/lib/sss/db/timestamps_implicit_files.ldb): transaction_write: failed at off=4294904444 len=4
(Tue May 27 14:06:00 2025) [sssd[be[implicit_files]]] [ldb] (0x0010): ltdb: tdb(/var/lib/sss/db/timestamps_implicit_files.ldb): merge_with_left_record: update_tailer failed at 47784
(Tue May 27 14:06:02 2025) [sssd[be[implicit_files]]] [ldb] (0x0010): ltdb: tdb(/var/lib/sss/db/timestamps_implicit_files.ldb): transaction_write: failed at off=4294904444 len=4
(Tue May 27 14:06:02 2025) [sssd[be[implicit_files]]] [ldb] (0x0010): ltdb: tdb(/var/lib/sss/db/timestamps_implicit_files.ldb): merge_with_left_record: update_tailer failed at 47784
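It looks like sssd hit a corrupted database during startup: tdb is the lightweight database engine that SSSD uses for its cache files, and these errors indicate that the database file is corrupted or inconsistent.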
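To see which cache files the errors refer to, the paths inside tdb(...) can be pulled out of the log. A minimal sketch follows; the function name failing_tdb_files is hypothetical and the pattern assumes the log format shown above.

```shell
# Read SSSD log text on stdin and print each distinct database path
# that appears inside a "tdb(<path>)" error message.
failing_tdb_files() {
    grep -o 'tdb([^)]*)' | sed -e 's/^tdb(//' -e 's/)$//' | sort -u
}

# Usage on the affected host:
#   failing_tdb_files < /var/log/sssd/sssd_implicit_files.log
```

For the excerpt above this yields a single file, /var/lib/sss/db/timestamps_implicit_files.ldb, which is the one to back up and remove.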
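Solution (essentially the same as above):
- Stop the sssd service: systemctl stop sssd
- Back up (omitted here) and delete the corrupted database files (that is, clear the sssd cache): rm -rf /var/lib/sss/db/*
- Update and reinstall the sssd package: yum update sssd && yum reinstall sssd
- Start the sssd service again: systemctl start sssd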
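The steps above can be sketched as a small script, including the backup step that was omitted. The function names and the backup location (/root) are assumptions made for illustration; adjust them before using anything like this on a real host.

```shell
# Copy the contents of the cache directory into a timestamped backup
# directory, then empty the cache directory. Prints the backup path.
backup_and_clear_cache() {
    db_dir="$1"        # for example /var/lib/sss/db
    backup_root="$2"   # assumed backup location, for example /root
    backup_dir="${backup_root}/sss-db-$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$backup_dir" || return 1
    # cp -a keeps ownership, permissions and timestamps
    cp -a "$db_dir"/. "$backup_dir"/ || return 1
    # Same effect as the rm -rf in the list, guarded against an empty variable
    rm -rf -- "${db_dir:?}"/*
    printf '%s\n' "$backup_dir"
}

# Full procedure from the list; defined here but not run automatically.
repair_sssd() {
    systemctl stop sssd || return 1
    backup_and_clear_cache /var/lib/sss/db /root >/dev/null || return 1
    yum update sssd && yum reinstall sssd || return 1
    systemctl start sssd
}

# Usage (as root): repair_sssd && systemctl status sssd
```

Keeping the backup makes it possible to inspect the broken .ldb files later if the root cause ever needs investigating.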