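Monitoring WebLogic Services with Nagios
1. Introduction
This article describes how to use Nagios to monitor the health of WebLogic services, mainly the runtime state of WebLogic Server and of WebLogic JDBC connection pools. The stock Nagios plugins do not provide WebLogic monitoring, so we write our own script against the Nagios Plugin API to extend them. The WebLogic runtime information is obtained through JMX.
This article draws on the plugin section of the official Nagios 3 documentation and on the JMX and command-line sections of the official WebLogic documentation. The WebLogic version used is 8.14.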
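2. Overview of the Nagios Plugin API
A Nagios plugin, whether it is a script (shell, Perl, and so on) or a compiled C executable, must do at least two things:
1) Exit with a return code.
2) Print at least one line of text to standard output (STDOUT).
Return code definitions: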
Plugin Return Code | Service State | Host State
0 | OK | UP
1 | WARNING | UP or DOWN/UNREACHABLE*
2 | CRITICAL | DOWN/UNREACHABLE
3 | UNKNOWN | DOWN/UNREACHABLE
The output must contain at least one line of text, which describes the state of the monitored application or service.
For example: DISK OK - free space: / 3326 MB (56%);
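To make the two requirements concrete, here is a minimal sketch of a plugin body in POSIX shell. The function name check_free_mb and the thresholds are hypothetical and chosen only for illustration; the return codes are the ones listed in the table above.

```shell
#!/bin/sh
# Minimal Nagios plugin skeleton (illustrative sketch, not a real disk check).
# check_free_mb is a hypothetical name; the return codes follow the table above.
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3

# check_free_mb FREE_MB WARN_MB CRIT_MB
# Prints exactly one status line and returns the matching plugin return code.
check_free_mb() {
    free_mb=$1 warn_mb=$2 crit_mb=$3
    case "$free_mb" in
        ''|*[!0-9]*)
            # Not a non-negative integer: the state cannot be determined.
            echo "DISK UNKNOWN - invalid value: '$free_mb'"
            return $STATE_UNKNOWN ;;
    esac
    if [ "$free_mb" -le "$crit_mb" ]; then
        echo "DISK CRITICAL - free space: $free_mb MB"
        return $STATE_CRITICAL
    elif [ "$free_mb" -le "$warn_mb" ]; then
        echo "DISK WARNING - free space: $free_mb MB"
        return $STATE_WARNING
    fi
    echo "DISK OK - free space: $free_mb MB"
    return $STATE_OK
}

# A real plugin would end with something like:
#   check_free_mb "$measured_value" 500 100; exit $?
```

Keeping the check in a function that returns the state, and calling exit only once at the end, makes the logic easy to exercise by hand before Nagios ever runs it.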
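3. How WebLogic Is Monitored
We obtain WebLogic's runtime state from the command line by calling WebLogic's weblogic.Admin class. This class is powerful: it can also be used to administer and configure WebLogic.
The commands used most often are shown below.
1) Get the server runtime state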
$ java weblogic.Admin -url ${URL} -username ${USER_NAME} -password ${PASS_WORD} get -pretty \
-mbean "${DOMAIN_NAME}:Location=${SERVER_NAME},Name=${SERVER_NAME},Type=ServerRuntime"
2) Get the JDBC pool runtime state
$ java weblogic.Admin -url ${URL} -username ${USER_NAME} -password ${PASS_WORD} GET -pretty \
-mbean "${DOMAIN_NAME}:Location=${SERVER_NAME},Name=${POOL_NAME},ServerRuntime=${SERVER_NAME},Type=JDBCConnectionPoolRuntime"
Replace the ${...} variables (highlighted in yellow in the original document) with the values for your environment.
${URL} | WebLogic URL, for example t3://192.168.1.2:7002
${USER_NAME} | User name
${PASS_WORD} | Password
${DOMAIN_NAME} | Name of the WebLogic domain, for example mydomain
${SERVER_NAME} | Server name
${POOL_NAME} | JDBC pool name
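The two MBean names differ only in a few key/value pairs, so they can be assembled from the variables above. The following sketch does that in POSIX shell; the function names server_mbean and jdbcpool_mbean are hypothetical helpers, and the sample names (mydomain, myserver, mypool) are placeholders.

```shell
#!/bin/sh
# Build the MBean object names passed to "weblogic.Admin ... get -pretty -mbean".
# server_mbean and jdbcpool_mbean are hypothetical helper names for this sketch.

# server_mbean DOMAIN SERVER
server_mbean() {
    echo "$1:Location=$2,Name=$2,Type=ServerRuntime"
}

# jdbcpool_mbean DOMAIN SERVER POOL
jdbcpool_mbean() {
    echo "$1:Location=$2,Name=$3,ServerRuntime=$2,Type=JDBCConnectionPoolRuntime"
}

# Placeholder values; replace them with the names from your own domain.
server_mbean mydomain myserver
jdbcpool_mbean mydomain myserver mypool

# The result is then used as the -mbean argument, for example:
#   java weblogic.Admin -url "$URL" -username "$USER_NAME" -password "$PASS_WORD" \
#       get -pretty -mbean "$(server_mbean "$DOMAIN_NAME" "$SERVER_NAME")"
```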
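Before running the commands above, set JAVA_HOME, add $JAVA_HOME/bin to PATH, and add WebLogic's weblogic81/server/lib/weblogic.jar to CLASSPATH.
4. The Shell Script
With the monitoring commands in hand, we write the shell script according to the Nagios Plugin API rules. The script is as follows: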
check_wls.sh
#!/bin/ksh
#check_wls.sh --jdbcpool url username password domainname servername poolname
#check_wls.sh --server url username password domainname servername
PROGNAME=`basename $0`
PROGPATH=`echo $0 | sed -e 's,[\\/][^\\/][^\\/]*$,,'`
REVISION=`echo '$Revision: 1749 $' | sed -e 's/[^0-9.]//g'`
. $PROGPATH/utils.sh
print_usage() {
echo "Usage:"
echo " $PROGNAME --jdbcpool url username password domainname servername poolname"
echo " $PROGNAME --server url username password domainname servername"
echo " $PROGNAME --help"
echo " $PROGNAME --version"
}
print_help() {
print_revision $PROGNAME $REVISION
echo ""
print_usage
echo ""
echo "Check Weblogic status"
echo ""
echo "--jdbcpool url username password domainname servername poolname"
echo " Check Weblogic JDBC Pool"
echo "--server url username password domainname servername"
echo " Check Weblogic Server"
}
if [[ -z "$JAVA_HOME" ]]
then
echo "Please set JAVA_HOME!"
exit $STATE_UNKNOWN
fi
if [[ -z "$CLASSPATH" ]]
then
echo "Please set CLASSPATH!"
exit $STATE_UNKNOWN
else
N=$(echo "$CLASSPATH" | grep -c "weblogic.jar")
if [[ "$N" = "0" ]]
then
echo "Please add weblogic.jar to CLASSPATH!"
exit $STATE_UNKNOWN
fi
fi
PATH=$JAVA_HOME/bin:$PATH
export PATH
JDBC_TYPE="JDBCConnectionPoolRuntime"
SERVER_TYPE="ServerRuntime"
cmd="$1"
# Information options
case "$cmd" in
--help)
print_help
exit $STATE_OK
;;
-h)
print_help
exit $STATE_OK
;;
--version)
print_revision $PROGNAME $REVISION
exit $STATE_OK
;;
-V)
print_revision $PROGNAME $REVISION
exit $STATE_OK
;;
esac
case "$cmd" in
--server)
URL=${2}
USER_NAME=${3}
PASS_WORD=${4}
DOMAIN_NAME=${5}
SERVER_NAME=${6}
SERVER_INFO="${DOMAIN_NAME}:${SERVER_NAME}"
RE=`java weblogic.Admin -url ${URL} -username ${USER_NAME} -password ${PASS_WORD} get -pretty \
-mbean "${DOMAIN_NAME}:Location=${SERVER_NAME},Name=${SERVER_NAME},Type=${SERVER_TYPE}"`
# Count the dash separator lines; never pass ${RE} as the printf format string
N=$(printf '%s\n' "${RE}" | grep -c '^-')
if [[ "$N" -lt "1" ]]
then
#error
ERR_INFO=$(printf '%s' "${RE}" | tr '\n' ' ')
echo "CRITICAL - ${ERR_INFO}"
exit $STATE_CRITICAL
fi
if [[ "$N" -ge "1" ]]
then
HEALTH_STATE=""
RUN_STATE=""
#HealthState State
printf '%s\n' "${RE}" | while read NAME VALUE
do
#PoolState WaitingForConnectionCurrentCount State
#echo "NAME:${NAME} VALUE:${VALUE}"
case "${NAME}" in
HealthState:)
HEALTH_STATE=${VALUE}
;;
State:)
RUN_STATE=${VALUE}
;;
esac
done
#echo "HEALTH_STATE:${HEALTH_STATE}"
#echo "RUN_STATE:${RUN_STATE}"
HEALTH_STATE_INFO=${HEALTH_STATE}
HEALTH_STATE=$(echo "${HEALTH_STATE_INFO}" | awk -F, '{ print $1 }' | awk -F: '{ print $2 }')
#echo "HEALTH_STATE:${HEALTH_STATE}"
#HEALTH_OK HEALTH_WARN HEALTH_CRITICAL HEALTH_FAILED
if [[ "${RUN_STATE}" != "RUNNING" ]]
then
echo "CRITICAL - ${SERVER_INFO} State is ${RUN_STATE}"
exit $STATE_CRITICAL
fi
case "${HEALTH_STATE}" in
HEALTH_OK)
;;
HEALTH_WARN)
echo "WARNING - ${SERVER_INFO} HealthState is ${HEALTH_STATE_INFO}"
exit $STATE_WARNING
;;
HEALTH_CRITICAL)
echo "CRITICAL - ${SERVER_INFO} HealthState is ${HEALTH_STATE_INFO}"
exit $STATE_CRITICAL
;;
HEALTH_FAILED)
echo "CRITICAL - ${SERVER_INFO} HealthState is ${HEALTH_STATE_INFO}"
exit $STATE_CRITICAL
;;
esac
fi
echo "OK - ${SERVER_INFO} State is ${RUN_STATE},HealthState is ${HEALTH_STATE_INFO}"
exit $STATE_OK
;;
--jdbcpool)
URL=${2}
USER_NAME=${3}
PASS_WORD=${4}
DOMAIN_NAME=${5}
SERVER_NAME=${6}
POOL_NAME=${7}
POOL_INFO="${DOMAIN_NAME}:${SERVER_NAME}:${POOL_NAME}"
RE=`java weblogic.Admin -url ${URL} -username ${USER_NAME} -password ${PASS_WORD} GET -pretty \
-mbean "${DOMAIN_NAME}:Location=${SERVER_NAME},Name=${POOL_NAME},ServerRuntime=${SERVER_NAME},Type=${JDBC_TYPE}"`
# Count the dash separator lines; never pass ${RE} as the printf format string
N=$(printf '%s\n' "${RE}" | grep -c '^-')
if [[ "$N" -lt "1" ]]
then
#error
ERR_INFO=$(printf '%s' "${RE}" | tr '\n' ' ')
echo "CRITICAL - ${ERR_INFO}"
exit $STATE_CRITICAL
fi
if [[ "$N" -ge "1" ]]
then
POOL_STATE=""
WAIT_CNT=""
RUN_STATE=""
printf '%s\n' "${RE}" | while read NAME VALUE
do
#PoolState WaitingForConnectionCurrentCount State
#echo "NAME:${NAME} VALUE:${VALUE}"
case "${NAME}" in
PoolState:)
POOL_STATE=${VALUE}
;;
WaitingForConnectionCurrentCount:)
WAIT_CNT=${VALUE}
;;
State:)
RUN_STATE=${VALUE}
;;
esac
done
#echo "POOL_STATE:${POOL_STATE}"
#echo "WAIT_CNT:${WAIT_CNT}"
#echo "RUN_STATE:${RUN_STATE}"
if [[ "${POOL_STATE}" != "true" ]]
then
echo "CRITICAL - ${POOL_INFO} PoolState is ${POOL_STATE}"
exit $STATE_CRITICAL
fi
if [[ "${RUN_STATE}" != "Running" ]]
then
echo "CRITICAL - ${POOL_INFO} State is ${RUN_STATE}"
exit $STATE_CRITICAL
fi
if [[ "${WAIT_CNT}" -gt "0" ]]
then
echo "WARNING - ${POOL_INFO} WaitingForConnectionCurrentCount is ${WAIT_CNT}"
exit $STATE_WARNING
fi
else
#error
ERR_INFO=$(printf '%s' "${RE}" | tr '\n' ' ')
echo "CRITICAL - ${ERR_INFO}"
exit $STATE_CRITICAL
fi
echo "OK - ${POOL_INFO} State is ${RUN_STATE},PoolState is ${POOL_STATE},WaitingForConnectionCurrentCount is ${WAIT_CNT}"
exit $STATE_OK
;;
*)
print_usage
exit $STATE_UNKNOWN
;;
esac
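The heart of the script is reading "Name: value" lines from the get -pretty output and then reducing the HealthState value to a code such as HEALTH_OK. The sketch below isolates that step in POSIX shell so it can be tried without a running server. The helper names mbean_attr and health_code are hypothetical, and SAMPLE_OUTPUT is an invented sample shaped like what the script expects (a line of dashes followed by attribute lines); it was not captured from a real WebLogic instance.

```shell
#!/bin/sh
# Sketch of the parsing step only. SAMPLE_OUTPUT is made-up data in the shape
# the script above assumes; real weblogic.Admin output may differ in detail.
SAMPLE_OUTPUT=$(printf '%s\n' \
    '---------------------------' \
    'MBeanName: "mydomain:Location=myserver,Name=myserver,Type=ServerRuntime"' \
    '    HealthState: State:HEALTH_OK,ReasonCode:[]' \
    '    State: RUNNING')

# mbean_attr ATTR_NAME
# Reads "Name: value" lines on stdin and prints the value of the first match.
mbean_attr() {
    awk -v key="$1:" '$1 == key { sub(/^[ \t]*[^ \t]+[ \t]+/, ""); print; exit }'
}

# health_code HEALTH_STATE_VALUE
# Same reduction as the script: first comma field, then second colon field.
health_code() {
    echo "$1" | awk -F, '{ print $1 }' | awk -F: '{ print $2 }'
}

RUN_STATE=$(printf '%s\n' "$SAMPLE_OUTPUT" | mbean_attr State)
HEALTH_INFO=$(printf '%s\n' "$SAMPLE_OUTPUT" | mbean_attr HealthState)
echo "State=$RUN_STATE HealthState=$(health_code "$HEALTH_INFO")"
# prints: State=RUNNING HealthState=HEALTH_OK
```

Using command substitution instead of "| read VAR" avoids relying on the shell running the last pipeline stage in the current process, which holds for ksh but not for every shell.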
5. Configuring WebLogic Monitoring
Upload check_wls.sh to the libexec directory of the Nagios installation and create a symbolic link to it named check_wls.
$ ln -s ./check_wls.sh ./check_wls
Add the corresponding command definitions to the NRPE configuration file.
The WebLogic settings used in this example are:
${URL} | t3://172.2.10.2:7001
${USER_NAME} | weblogic
${PASS_WORD} | weblogic
${DOMAIN_NAME} | mydomain
${SERVER_NAME} | myserver
${POOL_NAME} | mypool
Edit nrpe.cfg and add the following:
$ vi ./nrpe.cfg
... .... ... .... ... .... ... .... ... .... ... ....
#check weblogic [check_wls]
command[check_wls_server_myserver]=/usr/local/nagios/libexec/check_wls --server t3://172.2.10.2:7001 weblogic weblogic mydomain myserver
command[check_wls_jdbcpool_mypool]=/usr/local/nagios/libexec/check_wls --jdbcpool t3://172.2.10.2:7001 weblogic weblogic mydomain myserver mypool
Add the environment variables (CLASSPATH, JAVA_HOME) to the NRPE startup script:
... .... ... .... ... .... ... .... ... .... ... ....
JAVA_HOME=/data/bea/bea/jdk142_05
export JAVA_HOME
CLASSPATH=/data/bea/bea/weblogic81/server/lib/weblogic.jar
export CLASSPATH
... .... ... .... ... .... ... .... ... .... ... ....
Edit nagios.cfg on the monitoring host and add the following:
$ vi ./nagios.cfg
... .... ... .... ... .... ... .... ... .... ... ....
# Define a host for the local machine
define host{
use linux-box ; Name of host template to use
; This host definition will inherit all variables that are defined
; in (or inherited by) the linux-server host template definition.
host_name sol_172.2.10.2
alias sol_172.2.10.2
address 172.2.10.2
}
#the check_wls_server_myserver on the remote host.
define service{
use generic-service
host_name sol_172.2.10.2
service_description Weblogic Server myserver
check_command check_nrpe!check_wls_server_myserver
}
#the check_wls_jdbcpool_mypool on the remote host.
define service{
use generic-service
host_name sol_172.2.10.2
service_description Weblogic JDBCPool mypool
check_command check_nrpe!check_wls_jdbcpool_mypool
}
Verify that the configuration is correct.
Restart the nagios service on the monitoring host and the nrpe service on the remote host.
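Before relying on the web interface, it can help to run the checks by hand and see which state their return code maps to. The sketch below is a small POSIX shell wrapper for that; state_name and run_check are hypothetical helper names. The commands in the comments assume a default /usr/local/nagios layout, which is an assumption beyond the libexec path used in this article.

```shell
#!/bin/sh
# Hand-testing helper (sketch). state_name and run_check are hypothetical names.

# state_name RC
# Maps a plugin return code to its Nagios service state name.
# Anything outside 0-2 is shown as UNKNOWN in this sketch.
state_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# run_check COMMAND [ARGS...]
# Runs a check, prints its output prefixed with the decoded state,
# and returns the check's own return code.
run_check() {
    rc=0
    out=$("$@" 2>&1) || rc=$?
    echo "[$(state_name "$rc")] $out"
    return "$rc"
}

# Typical invocations (paths assume a default /usr/local/nagios install):
#   /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
#   run_check /usr/local/nagios/libexec/check_nrpe -H 172.2.10.2 -c check_wls_server_myserver
#   run_check /usr/local/nagios/libexec/check_nrpe -H 172.2.10.2 -c check_wls_jdbcpool_mypool
```

The nagios -v call validates the configuration files on the monitoring host, while the check_nrpe calls exercise the commands defined in nrpe.cfg on the remote host.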
View the monitoring results in the Nagios web interface using a browser such as IE.
Figure 5.1
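This completes the configuration.
6. Conclusion
This article presented one way to monitor WebLogic with Nagios: a custom shell script written to the Nagios Plugin API rules, together with a brief description of the configuration process and the shell source. Corrections are welcome.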