Nagios Detailed Installation and Configuration Guide (Part 2) - herb

Monitoring Linux with NRPE

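NRPE consists of two parts:

- the check_nrpe plugin, which lives on the monitoring host

- the NRPE daemon, which runs on the remote Linux host (usually the monitored machine)

The overall monitoring process is as follows.

When Nagios needs to check a service or resource on a remote Linux host:

1. Nagios runs the check_nrpe plugin and tells it what to check.

2. The check_nrpe plugin connects to the remote NRPE daemon over SSL.

3. The NRPE daemon runs the corresponding Nagios plugin to perform the check.

4. The NRPE daemon returns the result to check_nrpe, which hands it to Nagios for processing.

Note: the NRPE daemon requires the Nagios plugins to be installed on the remote Linux host; without them the daemon cannot perform any checks.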
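The result handed back in step 4 follows the usual Nagios plugin convention: the exit status carries the state (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and the first line of output carries the message. Below is a minimal shell sketch of how a caller could interpret such a result. The function names state_name and run_check are made up for illustration and are not part of NRPE or Nagios.

```shell
# Map a plugin exit status to the Nagios state name.
state_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# Run any plugin command line and print "<STATE>: <first line of output>".
# The plugin's own exit status is passed through to the caller.
run_check() {
    output=$("$@" 2>&1)
    status=$?
    first_line=$(printf '%s\n' "$output" | head -n 1)
    printf '%s: %s\n' "$(state_name "$status")" "$first_line"
    return "$status"
}
```

For example, run_check /usr/local/nagios/libexec/check_nrpe -H 192.168.0.100 -c check_users would print the state name followed by the message returned by the remote plugin.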

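Installing the NRPE plugin

tar -zxvf nrpe-***.tar.gz

cd nrpe-2.8.1

./configure

make all

make install-plugin

make install-daemon

make install-daemon-config

With the NRPE daemon running, checking localhost returns the current NRPE version:

# /usr/local/nagios/libexec/check_nrpe -H localhost

NRPE v2.8.1

This confirms that check_nrpe can connect to the local NRPE daemon.

Note: so that the later steps go smoothly, make sure the local firewall opens port 5666 so that the external monitoring host can reach it.

Run /usr/local/nagios/libexec/check_nrpe -h to see the usage of this command.

The usage is: check_nrpe -H <monitored host> -c <check command to run>

Note: the command given after -c must be defined in nrpe.cfg. The NRPE daemon only runs commands that are defined in nrpe.cfg.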

Viewing the NRPE check commands

cd /usr/local/nagios/etc

vi nrpe.cfg

Find the following section:

# The following examples use hardcoded command arguments...

command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10

command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20

command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20 -c 10 -p /dev/hda1

command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z

command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200

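The name inside the square brackets (shown in red in the original post) is the command name, which is what the -c option of check_nrpe accepts. The part after the equals sign is the plugin command that actually gets executed. This closely resembles how commands are defined in commands.cfg, except that everything is written on one line. In other words, check_users is simply shorthand for /usr/local/nagios/libexec/check_users -w 5 -c 10.

The five commands defined above check, respectively, the number of logged-in users, the CPU load, the disk space on hda1, the number of zombie processes, and the total number of processes. For the exact meaning of each option, see the plugin's usage (run "<plugin name> -h").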
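To see at a glance which names check_nrpe -c will accept, the command[...] lines can be pulled out of nrpe.cfg with sed. This is only a sketch: the function names list_nrpe_commands and show_nrpe_command are invented here, and it assumes each definition sits on one line starting with command[.

```shell
# Print the command names defined in an nrpe.cfg-style file, one per line.
list_nrpe_commands() {
    sed -n 's/^command\[\([^]]*\)]=.*/\1/p' "$1"
}

# Print the plugin command line behind one command name.
# $1: config file, $2: command name (plain names such as check_users).
show_nrpe_command() {
    sed -n "s/^command\[$2]=//p" "$1"
}
```

For example, list_nrpe_commands /usr/local/nagios/etc/nrpe.cfg lists the names, and show_nrpe_command /usr/local/nagios/etc/nrpe.cfg check_users prints the plugin command line that the name stands for.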

Because -c only accepts commands defined in nrpe.cfg, for now only these five commands are available. They can be tried out on the local machine by running:

/usr/local/nagios/libexec/check_nrpe -H localhost -c check_users

/usr/local/nagios/libexec/check_nrpe -H localhost -c check_load

/usr/local/nagios/libexec/check_nrpe -H localhost -c check_hda1

/usr/local/nagios/libexec/check_nrpe -H localhost -c check_zombie_procs

/usr/local/nagios/libexec/check_nrpe -H localhost -c check_total_procs
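The five test commands above can also be run in one loop. The sketch below assumes nothing beyond the check_nrpe -H and -c options already used in this post; the function name run_nrpe_checks is made up for illustration.

```shell
# Run every listed NRPE command against one host and report each result.
# $1: path to check_nrpe, $2: host, remaining arguments: command names.
run_nrpe_checks() {
    check_nrpe=$1
    host=$2
    shift 2
    not_ok=0
    for name in "$@"; do
        # A zero exit status means the plugin reported OK.
        if result=$("$check_nrpe" -H "$host" -c "$name" 2>&1); then
            echo "$name: $result"
        else
            echo "$name: NOT OK ($result)"
            not_ok=$((not_ok + 1))
        fi
    done
    # The return status is the number of checks that were not OK.
    return "$not_ok"
}
```

A call such as run_nrpe_checks /usr/local/nagios/libexec/check_nrpe localhost check_users check_load check_hda1 check_zombie_procs check_total_procs covers all five commands defined in nrpe.cfg.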

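Starting NRPE

/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d

Once the client is installed, test the communication between check_nrpe on the monitoring host and the NRPE daemon running on the monitored host:

/usr/local/nagios/libexec/check_nrpe -H 192.168.0.100

NRPE v2.8.1

The NRPE version is returned correctly, which means everything is working.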

Client (monitored host) configuration:

Add a user

useradd nagios

Set its password

passwd nagios

Installing the Nagios plugins

tar -zxvf nagios-plugins-***.tar.gz

cd nagios-plugins-***

./configure

make

make install

chown nagios:nagios /usr/local/nagios

chown -R nagios:nagios /usr/local/nagios/libexec

Installing NRPE (on both the monitoring host and the monitored host)

The steps are the same as on the monitoring host (see above).

tar -zxvf nrpe-***.tar.gz

cd nrpe-2.8.1

./configure

make all

make install-plugin

make install-daemon

make install-daemon-config

make install-xinetd

The output is as follows:

/usr/bin/install -c -m 644 sample-config/nrpe.xinetd /etc/xinetd.d/nrpe

This shows that the file /etc/xinetd.d/nrpe has been created.

Edit this file:

vi /etc/xinetd.d/nrpe

# default: on

# description: NRPE (Nagios Remote Plugin Executor)

service nrpe

{

flags = REUSE

socket_type = stream

port = 5666

wait = no

user = nagios

group = nagios

server = /usr/local/nagios/bin/nrpe

server_args = -c /usr/local/nagios/etc/nrpe.cfg --inetd

log_on_failure += USERID

disable = no

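only_from = 127.0.0.1

}

In the only_from line, append the address of the monitoring host (192.168.0.111 in this setup), separated by a space. After the change:

only_from = 127.0.0.1 192.168.0.111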
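The same only_from change can be scripted so that it is safe to repeat. The following is a rough sketch rather than a tested deployment tool: add_only_from is a made-up helper name, and the dots in the address are treated as regular-expression wildcards when checking for an existing entry.

```shell
# Append an address to the only_from line of an xinetd service file,
# unless that address is already listed. $1: file, $2: address.
add_only_from() {
    file=$1
    addr=$2
    if grep -Eq "^[[:space:]]*only_from[[:space:]]*=(.*[[:space:]])?$addr([[:space:]]|\$)" "$file"; then
        return 0
    fi
    tmp=$(mktemp) || return 1
    # Replace any trailing whitespace on the only_from line with " <addr>".
    if sed "/^[[:space:]]*only_from[[:space:]]*=/ s/[[:space:]]*\$/ $addr/" "$file" > "$tmp"; then
        cat "$tmp" > "$file"
    fi
    rm -f "$tmp"
}
```

For example, add_only_from /etc/xinetd.d/nrpe 192.168.0.111 makes the edit shown above; xinetd still has to be restarted for it to take effect.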

Edit /etc/services and add the NRPE service:

vi /etc/services

Add the following:

# Local services

nrpe 5666/tcp # nrpe
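Some systems may already list port 5666 in /etc/services, so it is worth checking before appending. A small sketch, with register_nrpe_service as an invented helper name:

```shell
# Add the nrpe entry to a services file only if 5666/tcp is not registered yet.
# $1: services file (defaults to /etc/services).
register_nrpe_service() {
    file=${1:-/etc/services}
    # Ignore commented-out lines when looking for an existing entry.
    if grep -Eq '^[^#]*[[:space:]]5666/tcp' "$file"; then
        echo "5666/tcp already registered in $file"
    else
        printf 'nrpe\t\t5666/tcp\t\t\t# nrpe\n' >> "$file"
        echo "added nrpe 5666/tcp to $file"
    fi
}
```

Running register_nrpe_service as root with no argument applies it to /etc/services; running it twice leaves a single entry.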

Restart the xinetd service:

[root@dbpi nrpe-2.8.1]# service xinetd restart

Stopping xinetd: [ OK ]

Starting xinetd: [ OK ]

Check whether NRPE is running:

[root@dbpi nrpe-2.8.1]# netstat -at|grep nrpe

tcp 0 0 *:nrpe *:* LISTEN

[root@dbpi nrpe-2.8.1]# netstat -an|grep 5666

tcp 0 0 0.0.0.0:5666 0.0.0.0:* LISTEN

Port 5666 is now listening.

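Configuring Nagios (the tricky and important part)

The main Nagios configuration file is /usr/local/nagios/etc/nagios.cfg.

Any new .cfg file must be registered in nagios.cfg before Nagios will load it.

Create the following configuration files in /usr/local/nagios/etc:
touch contactgroups.cfg contacts.cfg hostgroups.cfg hosts.cfg services.cfg timeperiods.cfg

After adding content to these files, the corresponding entries still have to be added to nagios.cfg, for example: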

cfg_file=/usr/local/nagios/etc/objects/commands.cfg

cfg_file=/usr/local/nagios/etc/objects/contacts.cfg

cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg

cfg_file=/usr/local/nagios/etc/objects/templates.cfg

cfg_file=/usr/local/nagios/etc/hosts.cfg

cfg_file=/usr/local/nagios/etc/hostgroups.cfg

cfg_file=/usr/local/nagios/etc/services.cfg

cfg_file=/usr/local/nagios/etc/contacts.cfg

cfg_file=/usr/local/nagios/etc/contactgroups.cfg
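Adding cfg_file lines by hand makes it easy to forget one or to add a duplicate. A hedged sketch of doing it in a loop follows; register_cfg_files is a hypothetical helper, and it only matches lines that are exactly cfg_file=<path>.

```shell
# Append a cfg_file= line to nagios.cfg for each given object file,
# skipping files that are already registered.
# $1: path to nagios.cfg, remaining arguments: object config files.
register_cfg_files() {
    main_cfg=$1
    shift
    for cfg in "$@"; do
        if grep -Fqx "cfg_file=$cfg" "$main_cfg"; then
            echo "already registered: $cfg"
        else
            echo "cfg_file=$cfg" >> "$main_cfg"
            echo "registered: $cfg"
        fi
    done
}
```

For example, register_cfg_files /usr/local/nagios/etc/nagios.cfg /usr/local/nagios/etc/hosts.cfg /usr/local/nagios/etc/services.cfg. Afterwards, /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg verifies the whole configuration before Nagios is restarted.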

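Adding the hosts to be monitored to the configuration

--- Begin

There are two approaches. The first is to spread what you want to monitor on the machine across hosts.cfg, contacts.cfg, and services.cfg, and then add the paths of those files to nagios.cfg.

The second is to create one dedicated file named after the host's IP address (<ip>.cfg) and put everything to be monitored into that single file.

Example 1: monitor the web service on the local machine.

vi nagios/etc/localhost.cfg
Add: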
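define service{
        use                             local-service
        host_name                       localhost
        service_description             Current http
        check_command                   check_http
        }

This definition is adapted from the PING service check above it. The PING arguments (100.0,20% and 500.0,60%) are dropped because they are round-trip and packet-loss thresholds, which have no meaning for check_http. Other services can be added the same way.

Then restart Nagios. About 5 minutes after opening the web page, the status of Apache on the local machine is visible.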

Example 2: add the remote machine 192.168.0.128.

vi 192.168.0.128.cfg (create a new file)

define host{
use generic-host ; Name of host template to use
host_name test_nrpe
alias client
address 192.168.0.128
check_command check-host-alive
max_check_attempts 1
check_period 24x7
notification_interval 120
notification_period 24x7
notification_options d,r
contact_groups ghbadmin
}

define service{
use generic-service ; Name of service template to use
host_name test_nrpe
service_description apache
is_volatile 0                     ; not a volatile service
check_period 24x7                 ; checked around the clock
max_check_attempts 1              ; maximum number of check attempts
normal_check_interval 1           ; regular check interval: 1 minute
retry_check_interval 1            ; retry interval: 1 minute
contact_groups ghbadmin           ; contact group to notify
notification_options w,u,c,r      ; notify on warning, unknown, critical, and recovery
notification_interval 960         ; interval between repeated notifications
notification_period 24x7          ; time period during which notifications are sent
check_command check_http
}

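The first part of the file checks whether the host is alive, and the second part monitors the Apache service. If it goes down, mail is sent through the ghbadmin contact group. Where is ghbadmin defined? I defined it in localhost.cfg, so it does not have to be repeated in the other configuration files.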

vi localhost.cfg

define contactgroup{
        contactgroup_name       ghbadmin
        alias                   Nagios Administrators
        members                 ghbspecial
        }
define contact{
        contact_name                    ghbspecial
        alias                           Nagios Admin
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    w,u,c,r
        host_notification_options       d,r
        service_notification_commands   notify-by-email
        host_notification_commands      host-notify-by-email
        email                           ghb@aaa.com
        }

Once the above is done, remember to edit nagios.cfg and add 192.168.0.128.cfg to it:

cfg_file=/usr/local/nagios/etc/192.168.0.128.cfg
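When many machines follow the one-file-per-host approach, the host block can be generated instead of copied by hand. The sketch below reuses the directive values from the 192.168.0.128 example; make_host_cfg is an invented name, and the alias is simply set to the host name as an assumption.

```shell
# Print a minimal host definition for one machine.
# $1: host_name, $2: IP address, $3: contact group.
make_host_cfg() {
    cat <<EOF
define host{
        use                     generic-host
        host_name               $1
        alias                   $1
        address                 $2
        check_command           check-host-alive
        max_check_attempts      1
        check_period            24x7
        notification_interval   120
        notification_period     24x7
        notification_options    d,r
        contact_groups          $3
        }
EOF
}
```

For example, make_host_cfg test_nrpe 192.168.0.128 ghbadmin > /usr/local/nagios/etc/192.168.0.128.cfg writes the host part of the file. The service definitions and the cfg_file entry in nagios.cfg still have to be added separately.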

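--- End

The section between Begin and End was found online and is included to aid understanding. The more standard approach is the following: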

cfg_file=/usr/local/nagios/etc/objects/commands.cfg

cfg_file=/usr/local/nagios/etc/objects/contacts.cfg

cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg

cfg_file=/usr/local/nagios/etc/objects/templates.cfg

cfg_file=/usr/local/nagios/etc/hosts.cfg

cfg_file=/usr/local/nagios/etc/hostgroups.cfg

cfg_file=/usr/local/nagios/etc/services.cfg

cfg_file=/usr/local/nagios/etc/contacts.cfg

cfg_file=/usr/local/nagios/etc/contactgroups.cfg

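commands.cfg defines the check commands

objects/contacts.cfg defines the contacts

timeperiods.cfg defines the time periods (it ships with default settings)

templates.cfg defines the object templates

The files you mainly edit are:

hosts.cfg defines the monitored hosts

hostgroups.cfg defines the monitored host groups

services.cfg defines the monitored services

contacts.cfg defines the contacts to notify

contactgroups.cfg defines the contact groups

Note: the check command and the check period referenced in services.cfg must match definitions in the corresponding configuration files. You can use the ones shipped with Nagios or define your own.

For example: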

define service{

host_name web

service_description CPU

check_command check_snmp_cpu

# check_snmp_cpu is defined in /usr/local/nagios/etc/objects/commands.cfg

contact_groups chb

check_period service

# the "service" time period is defined in /usr/local/nagios/etc/objects/timeperiods.cfg

max_check_attempts 4

normal_check_interval 10

retry_check_interval 60

notifications_enabled 1

notification_options u,c,r

}

# /usr/local/nagios/etc/objects/commands.cfg: definition of the check command

define command{

command_name check_snmp_cpu

command_line $USER1$/check_snmp -H $HOSTADDRESS$ -P 2c -o .1.3.6.1.4.1.2021.11.9.0,.1.3.6.1.4.1.2021.11.10.0,.1.3.6.1.4.1.2021.11.11.0 -l 'CPU usage (user system idle)' -u '%'

}

# /usr/local/nagios/etc/objects/timeperiods.cfg: definition of the check period

define timeperiod{

timeperiod_name service

alias 18 Hours A Day, 7 Days A Week

sunday 06:00-24:00

monday 06:00-24:00

tuesday 06:00-24:00

wednesday 06:00-24:00

thursday 06:00-24:00

friday 06:00-24:00

saturday 06:00-24:00

}

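Summary: define the list of monitored hosts in hosts.cfg, group them in hostgroups.cfg, then define the checks in services.cfg and state which hosts they apply to (hostgroup_name applies a service to every member of a group, host_name applies it to a single host). Contacts work the same way: in services.cfg, contact_groups notifies every member of the group, while contacts notifies only the named contacts.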

13 Common problems

13.1 Firewall configuration

iptables -A INPUT -i eth0 -p tcp --dport 5666 -j ACCEPT

13.2 cannot find ssl libraries

Resolving the error "CHECK_NRPE: Error - Could not complete SSL handshake"

13.3 No permission

It appears as though you do not have permission to view information for any of the hosts you requested...
If you believe this is an error, check the HTTP server authentication requirements for accessing this CGI
and check the authorization options in your CGI configuration file.
1. Edit cgi.cfg (under /usr/local/nagios/etc in this installation)

13.4 Connection refused by host

[root@localhost nrpe-2.8.1]# /usr/local/nagios/libexec/check_nrpe -H localhost

Connection refused by host

vi /usr/local/nagios/etc/nrpe.cfg

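allowed_hosts=127.0.0.1

Append the monitoring host's address, separated by a comma (for example allowed_hosts=127.0.0.1,192.168.0.111).

vi /etc/xinetd.d/nrpe

only_from = 127.0.0.1

Append the monitoring host's address, separated by a space (for example only_from = 127.0.0.1 192.168.0.111).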
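The allowed_hosts edit in nrpe.cfg can be scripted in the same spirit. This is a sketch under the assumption that the file has a single allowed_hosts= line; add_allowed_host is a made-up helper name.

```shell
# Append an address to allowed_hosts in an nrpe.cfg-style file,
# unless it is already listed. $1: file, $2: address.
add_allowed_host() {
    file=$1
    addr=$2
    # Current comma-separated value, with any spaces removed.
    current=$(sed -n 's/^allowed_hosts=//p' "$file" | tr -d ' ')
    case ",$current," in
        *",$addr,"*) return 0 ;;
    esac
    tmp=$(mktemp) || return 1
    if sed "s/^allowed_hosts=.*/allowed_hosts=${current:+$current,}$addr/" "$file" > "$tmp"; then
        cat "$tmp" > "$file"
    fi
    rm -f "$tmp"
}
```

For example, add_allowed_host /usr/local/nagios/etc/nrpe.cfg 192.168.0.111 turns allowed_hosts=127.0.0.1 into allowed_hosts=127.0.0.1,192.168.0.111 and does nothing on a second run.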
