Introduction to Nagios Configuration

 Before working through the configuration in this article, please read http://blog.youkuaiyun.com/u010257584/article/details/56278009 first. For more on Nagios, watch for the author's upcoming posts!


1. Overview of Nagios configuration files


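  First, a word on "private" and "public" services, applications and protocols. "Private" services here are host-internal attributes such as CPU load, memory usage, disk usage, logged-in users and running processes. "Public" services are those reachable over the local network or the Internet, such as HTTP, POP3, IMAP, FTP and SSH; many more such basic services are in everyday use. These services and applications, together with the protocols underneath them, can be monitored by Nagios directly over the network, with no extra agent or add-on needed on the monitored host. "Private" services, by contrast, cannot be monitored by Nagios without some intermediary agent. For monitoring "private" services on remote Linux/UNIX hosts, see the NRPE material covered in a later post.

      This section gives a brief explanation of the monitoring configuration for some common services.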

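1. Nagios directory layout

The contents of each directory are as follows:

Directory       Contents
bin             Nagios executables
etc             Nagios configuration files
sbin            Nagios CGI files, i.e. the files needed to run external commands
share           Nagios web pages
libexec         Nagios plugins
var             Nagios log files, lock files and similar
var/archives    Automatically archived Nagios logs
var/rw          External command file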

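2. How the Nagios configuration files relate

        Nagios configuration consists of the main configuration file, resource files, object definition files and the CGI configuration file. The main configuration file contains many directives that affect how the Nagios Core daemon operates; it is read by both the Nagios Core daemon and the CGIs. Resource files store user-defined macros, and their main purpose is to hold sensitive configuration information (such as passwords) without making it available to the CGIs. Object definition files define hosts, services, host groups, contacts, contact groups, commands and so on. The CGI configuration file contains directives that affect the operation of the CGIs; it also contains a reference to the main configuration file, so the CGIs know how Nagios is configured and where the object definitions are stored. In short, the main configuration file references the resource files and object definition files, and the CGI configuration file references the main configuration file.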
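
As a rough sketch of how these files point at one another, assuming the default source-install prefix /usr/local/nagios used throughout this article, the relevant directives look something like the following. The comment lines naming each file are only labels for this illustration.

```cfg
# --- in nagios.cfg (main configuration file) ---
# Resource file holding user macros such as $USER1$
resource_file=/usr/local/nagios/etc/resource.cfg
# Object definition files are pulled in one by one
cfg_file=/usr/local/nagios/etc/objects/commands.cfg
cfg_file=/usr/local/nagios/etc/objects/contacts.cfg

# --- in cgi.cfg (CGI configuration file) ---
# The CGIs locate everything else through the main configuration file
main_config_file=/usr/local/nagios/etc/nagios.cfg
```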


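3. The Nagios configuration files at a glance

      After a successful installation, Nagios creates the host, service, command, template and other configuration files under /usr/local/nagios/etc. The htpasswd.users authentication file set up earlier for the protected Nagios directory is there as well. The objects directory holds configuration file templates, which are mainly used to define Nagios objects.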

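        The configuration files and directories, including the object template files under objects, are described below:

File                       Description
cgi.cfg                    Configuration file controlling CGI access
nagios.cfg                 Nagios main configuration file
resource.cfg               Variable definition file, also called the resource file; variables such as $USER1$ are defined here so that other configuration files can reference them
objects                    A directory holding many configuration file templates used to define Nagios objects
objects/commands.cfg       Command definitions, which other configuration files can reference
objects/contacts.cfg       Contacts and contact groups
objects/localhost.cfg      Monitoring of the local host
objects/printer.cfg        Template for monitoring printers; not enabled by default
objects/switch.cfg         Template for monitoring routers and switches; not enabled by default
objects/templates.cfg      Host and service templates, which other configuration files can reference
objects/timeperiods.cfg    Time periods used by Nagios monitoring
objects/windows.cfg        Template for monitoring Windows hosts; not enabled by default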
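
To illustrate how a variable from resource.cfg is referenced elsewhere, here is a minimal sketch. The $USER1$ line matches what a default source install typically ships, but check your own resource.cfg. The command name check_ssh_example is hypothetical and exists only for this illustration.

```cfg
# resource.cfg: $USER1$ conventionally holds the plugin directory
$USER1$=/usr/local/nagios/libexec

# objects/commands.cfg: the macro is expanded when the command runs
define command{
        command_name    check_ssh_example       ; hypothetical name for this sketch
        command_line    $USER1$/check_ssh $HOSTADDRESS$
        }
```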

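Note:

Nagios is very flexible in how it is configured, and the default configuration files are not mandatory. You can use these default files, or create your own configuration files and reference them from the main configuration file nagios.cfg.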

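2. Overview of Nagios configuration templates

   Configuring Nagios can be approached in five steps, covered in the five subsections that follow:

2.1. Define the contacts and contact groups to notify when a host or service has a problem

   The configuration file involved in this step is /usr/local/nagios/etc/objects/contacts.cfg. Base settings that contacts build on are defined elsewhere: templates such as generic-contact are in /usr/local/nagios/etc/objects/templates.cfg, and the time periods they reference (for example 24x7) are in /usr/local/nagios/etc/objects/timeperiods.cfg.

   1. A contact identifies a person who should be contacted when a problem occurs on the network.

   Definition format (directives marked (*) are required, the others are optional):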

define contact{       
		contact_name       contact_name(*)      
		alias       alias(*)       
		contactgroups       contactgroup_names       
		host_notifications_enabled       [0/1](*)      
		service_notifications_enabled       [0/1](*)  
		host_notification_period       timeperiod_name(*)      
		service_notification_period       timeperiod_name(*)  
		host_notification_options       [d,u,r,f,s,n](*)       
		service_notification_options       [w,u,c,r,f,s,n](*)      
		host_notification_commands       command_name(*)     
		service_notification_commands       command_name(*)       
		email       email_address      
		pager       pager_number or pager_email_gateway       
		addressx       additional_contact_address 
		can_submit_commands       [0/1]      
		retain_status_information       [0/1]  
		retain_nonstatus_information       [0/1]
		...  
		}

Sample definition:

define contact{
		contact_name                    jdoe
		alias                           John Doe
		host_notifications_enabled		1
		service_notifications_enabled	1
		service_notification_period     24x7
		host_notification_period        24x7
		service_notification_options    w,u,c,r
		host_notification_options       d,u,r
		service_notification_commands   notify-by-email
		host_notification_commands      host-notify-by-email
		email			jdoe@localhost.localdomain
		pager			555-5555@pagergateway.localhost.localdomain
		address1			xxxxx.xyyy@icq.com
		address2			555-555-5555
		can_submit_commands	1
		}

   A brief explanation of host_notification_options and service_notification_options:

1) host_notification_options:

  • d = notify on DOWN host states,
  • u = notify on UNREACHABLE host states
  • r = notify on host recoveries (UP states)
  • f = notify when the host starts and stops flapping
  • s = send notifications when host or service scheduled downtime starts and ends
  • n = none; with this option the contact will not receive any type of host notifications

2)service_notification_options:

  • w = notify on WARNING service states
  • u = notify on UNKNOWN service states
  • c = notify on CRITICAL service states
  • r = notify on service recoveries (OK states)
  • f = notify when the service starts and stops flapping
  • n = none; with this option the contact will not receive any type of service notifications

 Commonly used settings:

  • host_notification_options: d,u,r
  • service_notification_options: w,u,c,r

  2. A contactgroup groups one or more contacts together for sending alert/recovery notifications. Definition format:

define contactgroup{ 
		contactgroup_name       contactgroup_name(*) 
		alias       alias(*) 
		members       contacts(*) 
		contactgroup_members       contactgroups 
		... 
		}

Sample definition:

define contactgroup{
		contactgroup_name		novell-admins
		alias			Novell Administrators
		members			jdoe,rtobert,tzach
		}

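2.2. Define hosts, host groups, services and service groups

  The configuration files involved in this step are /usr/local/nagios/etc/objects/hosts.cfg and /usr/local/nagios/etc/objects/services.cfg. Base settings that hosts and services build on are defined elsewhere: templates are in /usr/local/nagios/etc/objects/templates.cfg, and the time periods they reference are in /usr/local/nagios/etc/objects/timeperiods.cfg.

1. hosts.cfg configuration

 1) hosts.cfg configures hosts and host groups; for the format, see the host and hostgroup definitions in localhost.cfg.

  A host is a physical server, workstation, device or similar that resides on the network. Detailed format (directives marked (*) are required, the others are optional):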

define host{  
		host_name       host_name(*)       
		alias       alias(*)
		display_name       display_name       
		address       address(*)       
		parents       host_names       
		hostgroups       hostgroup_names       
		check_command       command_name       
		initial_state       [o,d,u]       
		max_check_attempts       #(*)       
		check_interval       #       
		retry_interval       #       
		active_checks_enabled       [0/1]       
		passive_checks_enabled       [0/1]       
		check_period       timeperiod_name(*)       
		obsess_over_host       [0/1]       
		check_freshness       [0/1]       
		freshness_threshold       #       
		event_handler       command_name       
		event_handler_enabled       [0/1]       
		low_flap_threshold       #       
		high_flap_threshold       #       
		flap_detection_enabled       [0/1]       
		flap_detection_options       [o,d,u]       
		process_perf_data       [0/1]       
		retain_status_information       [0/1]       
		retain_nonstatus_information       [0/1]       
		contacts       contacts(*)       
		contact_groups       contact_groups(*)       
		notification_interval       #(*)       
		first_notification_delay       #       
		notification_period       timeperiod_name(*)       
		notification_options       [d,u,r,f,s]       
		notifications_enabled       [0/1]       
		stalking_options       [o,d,u]       
		notes       note_string       
		notes_url       url       
		action_url       url       
		icon_image       image_file       
		icon_image_alt       alt_string       
		vrml_image       image_file       
		statusmap_image       image_file       
		2d_coords       x_coord,y_coord       
		3d_coords       x_coord,y_coord,z_coord       
		...
		}

Sample definition:

define host{
		host_name			bogus-router
		alias				Bogus Router #1
		address				192.168.1.254
		parents				server-backbone
		check_command			check-host-alive
		check_interval			5
		retry_interval			1
		max_check_attempts		5
		check_period			24x7
		process_perf_data		0
		retain_nonstatus_information	0
		contact_groups			router-admins
		notification_interval		30
		notification_period		24x7
		notification_options		d,u,r
		}
  2) A host group (hostgroup) is a group of one or more hosts. It can simplify configuration or be used for display purposes in the CGIs. Format:

define hostgroup{       
		hostgroup_name       hostgroup_name(*)       
		alias       alias(*)       
		members       hosts    
		hostgroup_members       hostgroups       
		notes       note_string       
		notes_url       url       
		action_url       url       
		...       
		}

Sample definition:

define hostgroup{
		hostgroup_name		novell-servers
		alias			Novell Servers
		members		netware1,netware2,netware3,netware4
		}
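
Group membership can also be declared from the host side, using the hostgroups directive listed in the host format above. The sketch below assumes the novell-servers group from the sample exists. The host name netware5 and its address are made up for illustration, and the linux-server template is assumed to supply the remaining required directives.

```cfg
define host{
        use             linux-server        ; assumed template providing check/notification settings
        host_name       netware5            ; hypothetical host
        alias           NetWare Server 5
        address         192.168.1.15        ; hypothetical address
        hostgroups      novell-servers      ; join the group from the host definition
        }
```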
2. services.cfg configuration

  1) A service is an "application service" running on a host. Definition format:

define service{       
		host_name       host_name(*)       
		hostgroup_name       hostgroup_name     
		service_description       service_description(*)    
		display_name       display_name     
		servicegroups       servicegroup_names    
		is_volatile       [0/1]       
		check_command       command_name(*)      
		initial_state       [o,w,u,c]       
		max_check_attempts       #(*)       
		check_interval       #(*)       
		retry_interval       #(*)     
		active_checks_enabled       [0/1] 
		passive_checks_enabled       [0/1]     
		check_period       timeperiod_name(*)  
		obsess_over_service       [0/1] 
		check_freshness       [0/1]      
		freshness_threshold       #       
		event_handler       command_name     
		event_handler_enabled       [0/1]
		low_flap_threshold       #       
		high_flap_threshold       #       
		flap_detection_enabled       [0/1]       
		flap_detection_options       [o,w,c,u]       
		process_perf_data       [0/1]       
		retain_status_information       [0/1]       
		retain_nonstatus_information       [0/1]       
		notification_interval       #(*)       
		first_notification_delay       #       
		notification_period       timeperiod_name(*)      
		notification_options       [w,u,c,r,f,s]
		notifications_enabled       [0/1]       
		contacts       contacts(*)       
		contact_groups       contact_groups(*)       
		stalking_options       [o,w,u,c]       
		notes       note_string  
		notes_url       url       
		action_url       url       
		icon_image       image_file       
		icon_image_alt       alt_string       
		...       
		}
Sample definition:
define service{
		host_name		linux-server
		service_description	check-disk-sda1
		check_command		check-disk!/dev/sda1
		max_check_attempts	5
		check_interval	5
		retry_interval	3
		check_period		24x7
		notification_interval	30
		notification_period	24x7
		notification_options	w,c,r
		contact_groups		linux-admins
		}
  2) A servicegroup groups one or more services together, which simplifies service configuration. Definition format:
define servicegroup{       
		servicegroup_name       servicegroup_name(*)      
		alias       alias(*)      
		members       services     
		servicegroup_members       servicegroups    
		notes       note_string      
		notes_url       url      
		action_url       url    
		...     
    		}

  Sample definition:

define servicegroup{
		servicegroup_name	dbservices
		alias			Database Services
		members			ms1,SQL Server,ms1,SQL Server Agent,ms1,SQL DTC
		}

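2.3 Define monitoring commands

  Command definitions cover service checks, service notifications, service event handlers, host checks, host notifications and host event handlers. The configuration file is /usr/local/nagios/etc/objects/commands.cfg.

  Definition format: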

define command{ 
		command_name       command_name(*) 
		command_line       command_line(*) 
		...       
		}

  Sample definition:

define command{
		command_name	check_pop
		command_line	/usr/local/nagios/libexec/check_pop -H $HOSTADDRESS$	
		}
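
Commands usually take arguments. In a service's check_command, the values separated by ! after the command name are passed to the command as $ARG1$, $ARG2$ and so on. The following sketch mirrors the check_local_disk command as it appears in a typical default commands.cfg (verify against your own copy), paired with the Root Partition service used later in this article.

```cfg
define command{
        command_name    check_local_disk
        command_line    $USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
        }

define service{
        use                     local-service
        host_name               monitors
        service_description     Root Partition
        check_command           check_local_disk!20%!10%!/    ; $ARG1$=20%, $ARG2$=10%, $ARG3$=/
        }
```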

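2.4 Define monitoring time periods

  A timeperiod is a list of times that are considered "valid" for notifications and service checks. Its time ranges rotate on a weekly basis. The configuration file is /usr/local/nagios/etc/objects/timeperiods.cfg.

  Definition format: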

define timeperiod{       
		timeperiod_name       timeperiod_name(*) 
		alias       alias(*) 
		[weekday]       timeranges 
		[exception]       timeranges 
		exclude       [timeperiod1,timeperiod2,...,timeperiodn] 
		... 
		}

 Sample definitions:


define timeperiod{
		timeperiod_name		nonworkhours
		alias			Non-Work Hours
		sunday			00:00-24:00			; Every Sunday of every week
		monday			00:00-09:00,17:00-24:00		; Every Monday of every week
		tuesday			00:00-09:00,17:00-24:00		; Every Tuesday of every week
		wednesday			00:00-09:00,17:00-24:00		; Every Wednesday of every week
		thursday			00:00-09:00,17:00-24:00		; Every Thursday of every week
		friday			00:00-09:00,17:00-24:00		; Every Friday of every week
		saturday			00:00-24:00			; Every Saturday of every week
		}

define timeperiod{
		timeperiod_name		misc-single-days
		alias			Misc Single Days
		1999-01-28		00:00-24:00 		; January 28th, 1999
		monday 3			00:00-24:00		; 3rd Monday of every month
		day 2			00:00-24:00		; 2nd day of every month
		february 10		00:00-24:00		; February 10th of every year
		february -1		00:00-24:00		; Last day in February of every year
		friday -2			00:00-24:00		; 2nd to last Friday of every month
		thursday -1 november	00:00-24:00		; Last Thursday in November of every year
		}

define timeperiod{
		timeperiod_name		misc-date-ranges
		alias			Misc Date Ranges
		2007-01-01 - 2008-02-01	00:00-24:00		; January 1st, 2007 to February 1st, 2008
		monday 3 - thursday 4	00:00-24:00		; 3rd Monday to 4th Thursday of every month
		day 1 - 15		00:00-24:00		; 1st to 15th day of every month
		day 20 - -1		00:00-24:00		; 20th to the last day of every month
		july 10 - 15		00:00-24:00		; July 10th to July 15th of every year
		april 10 - may 15		00:00-24:00		; April 10th to May 15th of every year
		tuesday 1 april - friday 2 may	00:00-24:00	; 1st Tuesday in April to 2nd Friday in May of every year
		}

define timeperiod{
		timeperiod_name		misc-skip-ranges
		alias			Misc Skip Ranges
		2007-01-01 - 2008-02-01 / 3		00:00-24:00	; Every 3 days from January 1st, 2007 to February 1st, 2008
		2008-04-01 / 7			00:00-24:00	; Every 7 days from April 1st, 2008 (continuing forever)
		monday 3 - thursday 4 / 2		00:00-24:00	; Every other day from 3rd Monday to 4th Thursday of every month
		day 1 - 15 / 5			00:00-24:00	; Every 5 days from the 1st to the 15th day of every month
		july 10 - 15 / 2			00:00-24:00	; Every other day from July 10th to July 15th of every year
		tuesday 1 april - friday 2 may / 6	00:00-24:00	; Every 6 days from the 1st Tuesday in April to the 2nd Friday in May of every year
		}
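
The exclude directive from the definition format does not appear in the samples here. A minimal sketch of its use follows; the names company-holidays and 24x7-sans-holidays and the chosen dates are illustrative only.

```cfg
define timeperiod{
        timeperiod_name     company-holidays        ; hypothetical name
        alias               Company Holidays
        january 1           00:00-24:00             ; January 1st of every year
        december 25         00:00-24:00             ; December 25th of every year
        }

define timeperiod{
        timeperiod_name     24x7-sans-holidays      ; hypothetical name
        alias               24x7 Except Holidays
        sunday              00:00-24:00
        monday              00:00-24:00
        tuesday             00:00-24:00
        wednesday           00:00-24:00
        thursday            00:00-24:00
        friday              00:00-24:00
        saturday            00:00-24:00
        exclude             company-holidays        ; times in that period are removed from this one
        }
```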

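2.5 Configure the main configuration file nagios.cfg

       Add the files configured in the four steps above to /usr/local/nagios/etc/nagios.cfg using cfg_file and/or cfg_dir directives. The entries already present in that file serve as a reference, so the details are not repeated here. Once all of the above is configured, restart Nagios and any plugin-related services, and the result is visible in the Nagios web interface.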
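
As a sketch of what those additions might look like: cfg_file pulls in a single file, while cfg_dir makes Nagios read every .cfg file in a directory and its subdirectories. The servers directory below is a hypothetical location, not something this article creates.

```cfg
# Load individual object files
cfg_file=/usr/local/nagios/etc/objects/hosts.cfg
cfg_file=/usr/local/nagios/etc/objects/services.cfg

# Or load every *.cfg file found under a directory (hypothetical path)
cfg_dir=/usr/local/nagios/etc/servers
```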

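3. Nagios configuration for remotely monitoring a Linux/UNIX host

This section uses the monitoring of common services on the local machine as its example and covers only the simplest configuration. For more comprehensive monitoring, study the configuration files further.

1. Configure contacts.cfg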

[root@monitors objects]# vi contacts.cfg 
define contact{
contact_name                    nagiosadminnn
use                             generic-contact
alias                           Nagios Admin
host_notifications_enabled              1
service_notifications_enabled   1
service_notification_period     24x7
host_notification_period        24x7
service_notification_options    w,u,c,r
host_notification_options       d,u,r
service_notification_commands   notify-service-by-email
host_notification_commands      notify-host-by-email
email                           88414341@qq.com
}
define contact{
contact_name                    nagiosadminkk
use                             generic-contact
alias                           Nagios Admin
host_notifications_enabled              1
service_notifications_enabled   1
service_notification_period     24x7
host_notification_period        24x7
service_notification_options    w,u,c,r
host_notification_options       d,u,r
service_notification_commands   notify-service-by-email
host_notification_commands      notify-host-by-email
email                           nnwan0110@163.com
}

# CONTACT GROUPS
define contactgroup{
contactgroup_name       admins
alias                   Nagios Administrators
members                 nagiosadminnn,nagiosadminkk
}

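     The email notifications mentioned later rely on the notify-service-by-email and notify-host-by-email commands named in the contact definitions; their command definitions can be found in commands.cfg.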
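
For orientation, a notification command is an ordinary command definition that builds a message from Nagios macros and pipes it to a mail program. The following is a simplified sketch in the spirit of the stock definitions, not a copy of them. The command name is hypothetical, and the /bin/mail path is an assumption that varies between systems.

```cfg
define command{
        command_name    notify-host-by-email-short      ; hypothetical name, not the stock command
        command_line    /usr/bin/printf "%b" "Notification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\nDate/Time: $LONGDATETIME$\n" | /bin/mail -s "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" $CONTACTEMAIL$
        }
```

$CONTACTEMAIL$ expands to the email directive of the contact being notified, which is why each contact above sets one.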

2. Configure hosts.cfg

[root@monitors ~]# vi /usr/local/nagios/etc/objects/hosts.cfg

# Define a host for the remote machine

define host{
          host_name monitors
          alias monitor-server
          use linux-server
          address 172.16.56.131
          check_period 24x7
          check_interval 5
          retry_interval 1
          max_check_attempts 10
          check_command check-host-alive
          notification_period 24x7
          notification_interval 30
          notification_options d,r
          contact_groups admins
         }
# Define an optional hostgroup for Linux machines

define hostgroup{
        hostgroup_name     local-linux-servers ; The name of the hostgroup
        alias           Linux Servers ; Long name of the group
        members         *    ; Comma separated list of hosts that belong to this group
        }

3. Configure the services file (saved here as linuxserver.cfg)

[root@monitors ~]# vi /usr/local/nagios/etc/objects/linuxserver.cfg
# Define a service to "ping" the local machine
define service{
use                     local-service    ; Name of service template to use
host_name               monitors
service_description     PING
check_command           check_ping!100.0,20%!500.0,60%
contact_groups          admins
}
# Define a service to check the disk space of the root partition on the local machine.
# Warning if < 20% free, critical if < 10% free space on partition.
define service{
use                     local-service    ; Name of service template to use
host_name               monitors
service_description     Root Partition
check_command           check_local_disk!20%!10%!/
contact_groups          admins
}
# Define a service to check the number of currently logged in users on the local machine.
# Warning if > 20 users, critical if > 50 users.
define service{
use                     local-service    ; Name of service template to use
host_name               monitors
service_description     Current Users
check_command           check_local_users!20!50
contact_groups          admins
}
# Define a service to check the number of currently running procs on the local machine.
# Warning if > 250 processes, critical if > 400 processes.
define service{
use                     local-service    ; Name of service template to use
host_name               monitors
service_description     Total Processes
check_command           check_local_procs!250!400!RSZDT
contact_groups          admins
}
# Define a service to check the load on the local machine.
define service{
use                     local-service    ; Name of service template to use
host_name               monitors
service_description     Current Load
check_command           check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
contact_groups          admins
}
# Define a service to check the swap usage of the local machine.
# Critical if less than 10% of swap is free, warning if less than 20% is free.
define service{
use                     local-service    ; Name of service template to use
host_name               monitors
service_description     Swap Usage
check_command           check_local_swap!20!10
contact_groups          admins
}
# Define a service to check SSH on the local machine.
# Disable notifications for this service by default, as not all users may have SSH enabled.
define service{
use                     local-service    ; Name of service template to use
host_name               monitors
service_description     SSH
check_command           check_ssh
notifications_enabled   0
contact_groups          admins
}
# Define a service to check HTTP on the local machine.
# Disable notifications for this service by default, as not all users may have HTTP enabled.
define service{
use                     local-service    ; Name of service template to use
host_name               monitors
service_description     HTTP
check_command           check_http
notifications_enabled   0
contact_groups          admins
}
4. Configure the main configuration file

[root@monitors ~]# vi /usr/local/nagios/etc/nagios.cfg 
#definitions for monitoring the remote(linux/unix)host
cfg_file=/usr/local/nagios/etc/objects/hosts.cfg

#definitions for monitoring the remote(linux/unix)host services
cfg_file=/usr/local/nagios/etc/objects/linuxserver.cfg

5. Verify that the configuration is correct

[root@monitors ~]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
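      Note: if you configure the contact exactly as in the official documentation sample (with notify-by-email and host-notify-by-email as the notification commands), the check may report errors.

  The cause is that commands.cfg does not define these two commands, so the fix is straightforward. The stock commands.cfg already ships default host and service notification commands (this article uses those defaults unchanged), so simply change the contact definition in contacts.cfg to use them; there is no need to define the two commands anew:

service_notification_commands notify-service-by-email
host_notification_commands notify-host-by-email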

6. View the result

  If the previous step reported no errors, restart the nagios and httpd services:

[root@monitors ~]# /etc/init.d/nagios restart
Running configuration check...
Stopping nagios:. done.
Starting nagios: done.
[root@monitors ~]# /etc/init.d/httpd restart
Stopping httpd:                                          [  OK  ]
Starting httpd: httpd: Could not reliably determine the server's fully qualified domain name, using 127.0.0.1 for ServerName                     [  OK  ]

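  Log in at http://172.16.56.131/nagios/ to view the status of the host.


     If any warnings show up, start tracking down the problem right away.

     This article was written with reference to the official Nagios documentation and will continue to be improved; corrections and criticism are welcome!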











