Nagios Daily Administration and Debugging Methods

Article link: https://blog.youkuaiyun.com/weixin_40283570/article/details/106570218

Checking Service Status

# Check the status of the main Nagios process
[root@nagios-server services]# service nagios status
nagios (pid 29350) is running...
# Check that the NRPE port (5666) is listening
[root@nagios-server services]# netstat -tlnp |grep 5666
tcp        0      0 :::5666                     :::*                        LISTEN      26747/xinetd
# Check the status of the NRPE service (run here under xinetd)
[root@nagios-server services]# service xinetd status
xinetd (pid  26747) is running...

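Restarting the Main Service

Modifying Configuration Files

After editing a configuration file, run nagios -v to check the main configuration file for errors. If it reports none, restart the sendmail, nagios, and http services so that the changes to the monitored-object configuration take effect.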

[root@nagios-server services]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

Nagios Core 4.2.4
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 12-07-2016
License: GPL

Website: https://www.nagios.org
Reading configuration data...
   Read main config file okay...
   Read object config files okay...

Running pre-flight check on configuration data...

Checking objects...
        Checked 228 services.
        Checked 61 hosts.
        Checked 15 host groups.
        Checked 0 service groups.
        Checked 3 contacts.
        Checked 3 contact groups.
        Checked 29 commands.
        Checked 2 time periods.
        Checked 0 host escalations.
        Checked 0 service escalations.
Checking for circular paths...
        Checked 61 hosts
        Checked 0 service dependencies
        Checked 0 host dependencies
        Checked 2 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...

Total Warnings: 0
Total Errors:   0


Things look okay - No serious problems were detected during the pre-flight check
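The verify-then-restart step above can be scripted so that a restart only happens when the pre-flight check is clean. The following is a minimal sketch, not the author's procedure: the function names preflight_errors and safe_restart are hypothetical, and it assumes the "Total Errors:" summary line appears exactly as in the output shown above. If that line is missing (for example because the check aborted early), the sketch refuses to restart.

```shell
#!/bin/sh
# preflight_errors: read `nagios -v` output on stdin and print the number
# reported on the "Total Errors:" summary line (prints nothing if absent).
preflight_errors() {
    awk -F: '/^Total Errors:/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# safe_restart: restart Nagios only when the pre-flight check reports 0 errors.
# Paths are the ones used in this article.
safe_restart() {
    errors=$(/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg | preflight_errors)
    if [ "$errors" = "0" ]; then
        service nagios restart
    else
        echo "pre-flight check reported ${errors:-unknown} error(s); not restarting" >&2
        return 1
    fi
}
```

Calling safe_restart as root after editing a configuration file would then replace the manual check-and-restart sequence.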

Manually Restarting NRPE

Stop the xinetd service first, then start NRPE by hand.

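service xinetd stop
# Sometimes NRPE then fails to start, for example if something is still holding port 5666
netstat -lantp | grep 5666
# Look up the PID of that process and kill it
kill -9 <PID>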

Start nrpe manually: /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
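The stop, find, kill, and start steps above can be combined. This is a sketch under assumptions: pid_on_port and restart_nrpe_standalone are hypothetical names, the netstat column layout is assumed to match the output shown earlier in this article, and the sketch sends a plain kill (SIGTERM) rather than the kill -9 used above.

```shell
#!/bin/sh
# pid_on_port PORT: read netstat output (with -n and -p) on stdin and print
# the PID of every process listening on PORT.
pid_on_port() {
    awk -v port="$1" '
        $6 == "LISTEN" {
            n = split($4, addr, ":")          # local address, e.g. :::5666
            if (addr[n] == port) {
                split($7, owner, "/")         # "26747/xinetd" -> "26747"
                print owner[1]
            }
        }' | sort -u
}

# restart_nrpe_standalone: stop xinetd, clear port 5666, start nrpe as a daemon.
restart_nrpe_standalone() {
    service xinetd stop
    for pid in $(netstat -tlnp | pid_on_port 5666); do
        kill "$pid"                           # SIGTERM; escalate to -9 only if needed
    done
    /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
}
```

Filtering on the LISTEN state and the exact port avoids the false matches that a bare grep 5666 can produce, such as a PID that happens to contain 5666.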

Testing NRPE

/usr/local/nagios/libexec/check_nrpe

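Run without a -c option, this command reports the NRPE version. It can be run on a client to check that the client's own NRPE is working, or on the server to check a client's NRPE. In this setup, however, it fails when pointed at NRPE on the server itself: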
[root@nagios-server hosts]# /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1
CHECK_NRPE: Error - Could not complete SSL handshake.
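The article does not explain this failure. A commonly reported cause of "Could not complete SSL handshake" is that the connecting address is not permitted by the daemon, for example because it is missing from allowed_hosts in nrpe.cfg (or from only_from in the xinetd service definition when NRPE runs under xinetd). Whether that applies to this server is an assumption. The sketch below, with the hypothetical function name host_allowed, only tests for an exact address in allowed_hosts; it does not understand network ranges or hostnames.

```shell
#!/bin/sh
# host_allowed ADDR: read nrpe.cfg on stdin and succeed (exit 0) if ADDR is
# listed literally in an allowed_hosts line; otherwise exit 1.
host_allowed() {
    awk -v addr="$1" -F= '
        /^[ \t]*allowed_hosts[ \t]*=/ {
            n = split($2, hosts, ",")
            for (i = 1; i <= n; i++) {
                gsub(/[ \t]/, "", hosts[i])   # tolerate spaces around commas
                if (hosts[i] == addr) found = 1
            }
        }
        END { exit found ? 0 : 1 }'
}
```

For example, host_allowed 127.0.0.1 < /usr/local/nagios/etc/nrpe.cfg would show whether the loopback address is listed on the machine being tested.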

Local Debugging

This mainly covers testing the check commands locally on a Linux client.
[root@app1 ~]# cat /usr/local/nagios/etc/nrpe.cfg

An excerpt follows. Every entry is a command that NRPE executes, so each one can be run locally to see its result.

# System uptime
command[check_uptime]=/usr/local/nagios/libexec/check_uptime -v

# Monitor CPU load (1-, 5-, and 15-minute load averages)
command[check_load]=/usr/local/nagios/libexec/check_load -w 5.0,4.0,3.0 -c 10.0,6.0,4.0

# Monitor swap usage
# WARNING when less than 20% of swap is free, CRITICAL when less than 10% is free.
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%

# Monitor the number of logged-in users
# WARNING when more than 3 users are logged in, CRITICAL when more than 5 are.
command[check_users]=/usr/local/nagios/libexec/check_users -w 3 -c 5

# Monitor usage of the / partition
# WARNING when less than 20% of the space is free, CRITICAL when less than 10% is free.
command[check_disk_sda3]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda3

# Monitor usage of the /data partition
# WARNING when less than 20% of the space is free, CRITICAL when less than 10% is free.
command[check_disk_sdb1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sdb1

# Monitor the CPU usage of all processes
# WARNING when any process uses more than 80% CPU, CRITICAL above 90%.
command[check_procs_cpu]=/usr/local/nagios/libexec/check_procs -w 80 -c 90 --metric=CPU

# Check memory usage
command[check_used_mem]=/usr/local/nagios/libexec/check_used_mem.sh 80 90
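Since every command[...] entry is just a command line, the entries can be pulled out of nrpe.cfg and run one after another on the client. The following is a rough sketch; nrpe_commands and run_all_local are hypothetical helper names, and it assumes each definition sits on a single line in the form shown above.

```shell
#!/bin/sh
# nrpe_commands: read nrpe.cfg on stdin and print "name<TAB>command line"
# for every uncommented command[name]=... entry.
nrpe_commands() {
    awk '/^[ \t]*command\[/ {
        lb = index($0, "["); rb = index($0, "]"); eq = index($0, "=")
        if (rb > lb && eq > rb)
            print substr($0, lb + 1, rb - lb - 1) "\t" substr($0, eq + 1)
    }'
}

# run_all_local: execute every configured command locally and show its exit code.
run_all_local() {
    tab=$(printf '\t')
    nrpe_commands < /usr/local/nagios/etc/nrpe.cfg |
    while IFS="$tab" read -r name cmd; do
        printf '%s: ' "$name"
        sh -c "$cmd" < /dev/null      # keep the command from eating the loop input
        echo "  (exit $?)"
    done
}
```

Running the commands directly bypasses NRPE, so a check that works here but fails through check_nrpe points at the NRPE layer (permissions, allowed hosts, the nagios user's environment) rather than at the plugin.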

Taking check_uptime as an example:

# Run the command through check_nrpe
[root@app1 ~]# /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -c check_uptime
Uptime OK: 213 day(s) 2 hour(s) 59 minute(s) | uptime=306899.000000;;;
# Run the command directly
[root@app1 ~]# /usr/local/nagios/libexec/check_uptime -v
Uptime OK: 213 day(s) 2 hour(s) 59 minute(s) | uptime=306899.000000;;;


# Test NRPE itself (prints the version)
[root@app1 ~]# /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1
NRPE v2.15
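In the check_uptime output above, everything after the "|" is performance data in label=value form. The value 306899 matches the human-readable part when read as minutes: 213 × 1440 + 2 × 60 + 59 = 306899. When debugging it can help to extract one such value. A minimal sketch follows, with the hypothetical function name perfdata_value and the assumption that labels contain no spaces or quotes.

```shell
#!/bin/sh
# perfdata_value LABEL: read one line of plugin output on stdin and print the
# value recorded for LABEL in the performance data after the "|".
perfdata_value() {
    awk -v label="$1" -F'|' 'NF > 1 {
        n = split($2, items, " ")             # one item per label=value;...
        for (i = 1; i <= n; i++) {
            split(items[i], kv, "=")
            if (kv[1] == label) {
                split(kv[2], fields, ";")     # value;warn;crit;min;max
                print fields[1]
            }
        }
    }'
}
```

For instance, piping the output of check_nrpe -H 127.0.0.1 -c check_uptime into perfdata_value uptime would print only the numeric value.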

Remote Debugging Commands

# Test whether NRPE on the client is reachable
[root@nagios-server services]# /usr/local/nagios/libexec/check_nrpe -H 192.168.12.50
NRPE v2.15
# Check the remote client's uptime through NRPE
[root@nagios-server services]# /usr/local/nagios/libexec/check_nrpe -H 192.168.12.50 -c check_uptime
Uptime OK: 213 day(s) 3 hour(s) 5 minute(s) | uptime=306905.000000;;;
# In this setup the server's own NRPE cannot be checked with this command
[root@nagios-server services]# /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1
CHECK_NRPE: Error - Could not complete SSL handshake.
# Check a remote host's HTTP service directly with check_http (this check does not go through NRPE)
[root@nagios-server services]# /usr/local/nagios/libexec/check_http -I 192.168.12.6
HTTP OK: HTTP/1.1 200 OK - 54196 bytes in 0.042 second response time |time=0.041735s;;;0.000000 size=54196B
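The remote checks above can be repeated across several clients in one pass. Nagios plugins, check_nrpe included, report their state through the exit code: 0 is OK, 1 is WARNING, 2 is CRITICAL, and 3 is UNKNOWN. The sketch below uses the hypothetical names status_name and check_hosts, and treats any other exit code as UNKNOWN, which is a simplification.

```shell
#!/bin/sh
# status_name CODE: translate a plugin exit code into its Nagios state name.
status_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# check_hosts HOST...: run check_uptime through NRPE on each host and print
# "host<TAB>state<TAB>plugin output".
check_hosts() {
    for host in "$@"; do
        output=$(/usr/local/nagios/libexec/check_nrpe -H "$host" -c check_uptime)
        rc=$?                                  # exit code of check_nrpe
        printf '%s\t%s\t%s\n' "$host" "$(status_name "$rc")" "$output"
    done
}
```

Calling check_hosts 192.168.12.50 from the Nagios server would give a one-line summary for that client; more addresses can be appended as extra arguments.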