I. Configuration details:
1.1 Polling interval
Polling interval: one check every 300 s
1.2 Web monitoring page
Web monitoring page URL: http://XXX:2812/
Logging in requires a username and password.
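The details table in section III shows that the web server on port 2812 uses HTTP Basic Authentication. For scripted access instead of a browser, a minimal Python sketch could look like the following. It assumes Monit's XML status report is served at `/_status?format=xml` by the same web server. The function names are hypothetical, and the real host and credentials (elided as XXX above) must be supplied by the caller.

```python
import base64
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic Authentication header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def fetch_monit_status(host: str, username: str, password: str,
                       port: int = 2812, timeout: float = 5.0) -> str:
    """Fetch Monit's status report as XML (assumes the /_status?format=xml endpoint)."""
    url = f"http://{host}:{port}/_status?format=xml"
    request = urllib.request.Request(
        url, headers={"Authorization": basic_auth_header(username, password)}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

For example, `fetch_monit_status(host, user, password)` returns the raw XML report, which can then be parsed with `xml.etree.ElementTree`.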
1.3 System monitoring rules:
check system myhost.mydomain.tld
    if loadavg (1min) > 4 then alert
    if loadavg (5min) > 2 then alert
    if memory usage > 75% then alert
    if swap usage > 25% then alert
    if cpu usage (user) > 70% then alert
    if cpu usage (system) > 30% then alert
    if cpu usage (wait) > 20% then alert
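The rules above are independent threshold tests: each compares one reading against its limit with a strict `>`, and every rule that matches raises its own alert. As a rough illustration only, the Python sketch below encodes the same limits as data and reports which rules a hypothetical sample of readings would trip. The metric names are invented for the sketch; Monit collects and evaluates these values itself.

```python
from typing import Dict, List, Tuple

# Limits copied from the "check system" block; metric names are hypothetical.
# A rule fires only when the reading is strictly greater than its limit.
SYSTEM_RULES: List[Tuple[str, float]] = [
    ("loadavg_1min", 4.0),
    ("loadavg_5min", 2.0),
    ("memory_pct", 75.0),
    ("swap_pct", 25.0),
    ("cpu_user_pct", 70.0),
    ("cpu_system_pct", 30.0),
    ("cpu_wait_pct", 20.0),
]


def fired_alerts(sample: Dict[str, float]) -> List[str]:
    """Return the names of the rules whose limits the sample exceeds."""
    return [name for name, limit in SYSTEM_RULES if sample.get(name, 0.0) > limit]
```

A reading exactly at a limit (for example user CPU at 70%) does not fire, because the comparison is strict.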
1.4 Event server monitoring rules:
# Event Server
check process eventserver
    matching "Console eventserver"
    start program = "/etc/monit/modebug /data/monit/monit_PredictionIO/event_scripts.sh start"
    stop program = "/etc/monit/modebug /data/monit/monit_PredictionIO/event_scripts.sh stop"
    if cpu usage > 95% for 10 cycles then restart
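`for 10 cycles` means the CPU condition has to hold on 10 consecutive polls before the restart action runs. Combined with the 300-second polling interval from section 1.1, a sustained spike is acted on only after roughly 45 to 50 minutes: the first and tenth observations are 9 intervals (2700 s) apart, plus up to one more interval before the spike is first seen (3000 s at most). The same rule appears in section 1.5. The Python sketch below is a simplified model of that counting, not Monit's internal implementation; resetting the counter after triggering is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Tuple

POLL_INTERVAL_S = 300   # polling interval from section 1.1
CPU_LIMIT_PCT = 95.0    # "cpu usage > 95%"
REQUIRED_CYCLES = 10    # "for 10 cycles"


@dataclass
class ConsecutiveCycleCounter:
    """Simplified model of 'if cpu usage > 95% for 10 cycles then restart'."""
    limit: float = CPU_LIMIT_PCT
    required: int = REQUIRED_CYCLES
    consecutive: int = 0

    def observe(self, cpu_pct: float) -> bool:
        """Record one poll; return True on the poll that should trigger a restart."""
        if cpu_pct > self.limit:
            self.consecutive += 1
        else:
            self.consecutive = 0  # a single poll under the limit resets the streak
        if self.consecutive >= self.required:
            self.consecutive = 0  # assumption: start a fresh count after acting
            return True
        return False


def restart_delay_window_s(cycles: int = REQUIRED_CYCLES,
                           interval_s: int = POLL_INTERVAL_S) -> Tuple[int, int]:
    """Bounds on time from spike onset to the triggering poll: ((N - 1) * T, N * T]."""
    return (cycles - 1) * interval_s, cycles * interval_s
```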
1.5 Engine server monitoring rules:
# Engine
check process pioengine
    matching "Console deploy"
    start program = "/etc/monit/modebug /data/monit/monit_PredictionIO/engine_scripts.sh start"
    stop program = "/etc/monit/modebug /data/monit/monit_PredictionIO/engine_scripts.sh stop"
    if cpu usage > 95% for 10 cycles then restart
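Both process checks locate their target by matching a regular expression against each process's command line rather than reading a pid file. The patterns only work if the command line of the running JVM contains the console class name followed by the `pio` subcommand (`eventserver` or `deploy`), which is assumed here. The Python sketch below shows that matching against hypothetical command lines; it returns the first hit, and Monit's own choice when several processes match may differ.

```python
import re
from typing import List, Optional

# Patterns copied from the "matching" lines of sections 1.4 and 1.5.
EVENTSERVER_PATTERN = re.compile(r"Console eventserver")
ENGINE_PATTERN = re.compile(r"Console deploy")


def first_match(pattern: re.Pattern, command_lines: List[str]) -> Optional[str]:
    """Return the first command line in which the pattern occurs, or None."""
    for line in command_lines:
        if pattern.search(line):
            return line
    return None
```

Because the pattern is searched anywhere in the command line, a process started with a different subcommand (for example `train`) is not picked up by either check.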
1.6 Monitoring rules for a pioengine-http crash:
check program pioengine-http with path "/data/monit/monit_PredictionIO/check_engine.sh"
    start program = "/etc/monit/modebug /data/monit/monit_PredictionIO/engine_scripts.sh start"
    stop program = "/etc/monit/modebug /data/monit/monit_PredictionIO/engine_scripts.sh stop"
    if status != 1
        then restart
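The contents of `check_engine.sh` are not shown. Monit runs the configured program on every cycle and tests its exit status, so with `if status != 1 then restart` the script must exit with 1 when the engine answers correctly and with any other code when it does not. A minimal Python equivalent of such a probe is sketched below. The engine URL (`http://localhost:8000/queries.json`) and the sample query body are assumptions that depend on the deployed engine template, and the function names are hypothetical.

```python
#!/usr/bin/env python3
import json
import sys
import urllib.error
import urllib.request

# Hypothetical values: the real URL and query depend on the deployed engine.
ENGINE_URL = "http://localhost:8000/queries.json"
SAMPLE_QUERY = {"user": "1", "num": 4}

HEALTHY_EXIT = 1    # required by "if status != 1 then restart"
UNHEALTHY_EXIT = 0  # any value other than 1 makes Monit restart the engine


def exit_code_for(http_status):
    """Map the probe's HTTP status (None if there was no response) to an exit code."""
    return HEALTHY_EXIT if http_status == 200 else UNHEALTHY_EXIT


def probe(url=ENGINE_URL, query=SAMPLE_QUERY, timeout=5.0):
    """POST a sample query; return the HTTP status, or None if the engine is unreachable."""
    body = json.dumps(query).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:  # the server answered with an error code
        return err.code
    except (urllib.error.URLError, OSError):  # refused, timed out, or DNS failure
        return None


def main():
    sys.exit(exit_code_for(probe()))
```

Used as the check program, the file would be made executable and end with a call to `main()`; the call is omitted so the functions can be imported and tested without sending a request. The 5-second request timeout keeps the probe well inside Monit's check-program timeout (5 m in section III).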
1.7 Email alerts:
set mailserver XXX
username "XXX" password "XXX"
set mail-format { from:webmaster@example.com }
set alert XXX
set mail-format {
from: monit@$HOST
subject: monit alert -- $EVENT $SERVICE
message: $EVENT Service $SERVICE
Date: $DATE
Action: $ACTION
Host: $HOST
Description: $DESCRIPTION#
Your faithful employee,
Monit
}
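Monit fills in `$EVENT`, `$SERVICE`, `$DATE`, `$ACTION`, `$HOST` and `$DESCRIPTION` when it sends each alert. Because it is declared later, the second `set mail-format` block is the one in effect, which matches the "Default mail from" entry in section III. To preview what such a message looks like, the Python sketch below performs the same substitution with `string.Template`; it is only a preview aid, and the sample values in any call are hypothetical.

```python
from string import Template
from typing import Dict, Tuple

# Templates copied from the second "set mail-format" block above.
FROM = Template("monit@$HOST")
SUBJECT = Template("monit alert -- $EVENT $SERVICE")
MESSAGE = Template(
    "$EVENT Service $SERVICE\n"
    "Date: $DATE\n"
    "Action: $ACTION\n"
    "Host: $HOST\n"
    "Description: $DESCRIPTION#\n"
    "Your faithful employee,\n"
    "Monit\n"
)


def render_alert(values: Dict[str, str]) -> Tuple[str, str, str]:
    """Return (sender, subject, body); variables missing from values are left as-is."""
    return (
        FROM.safe_substitute(values),
        SUBJECT.safe_substitute(values),
        MESSAGE.safe_substitute(values),
    )
```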
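II. Monitoring page:
Notes:
Process: monitors the event server and the recommendation engine server. The current threshold is CPU usage above 95%; when it holds for 10 consecutive polling cycles, an alert is raised and the process is restarted.
Program: covers the case where the engine process is still running but the engine itself has stopped serving. If the Akka HTTP REST API used by PredictionIO crashes, the engine process keeps running, but queries to the engine fail. In that case an alert is raised and the service is restarted.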
III. Monitoring details:
Parameter | Value |
Monit ID | ed652ace7517e5334c830b732eb324df |
Host | myhost.mydomain.tld |
Process id | 119966 |
Effective user running Monit | root |
Controlfile | /etc/monitrc |
Logfile | /var/log/monit.log |
Pidfile | /run/monit.pid |
State file | /root/.monit.state |
Debug | False |
Log | True |
Use syslog | False |
Mail server(s) | XXX:25 |
Default mail from | monit@$HOST |
Default mail subject | monit alert -- $EVENT $SERVICE |
Default mail message | $EVENT Service $SERVICE Date: $DATE Action: $ACTION Host: $HOST Description: $DESCRIPTION# Your faithful employee, Monit |
Limit for Send/Expect buffer | 256 B |
Limit for file content buffer | 512 B |
Limit for HTTP content buffer | 1 MB |
Limit for program output | 512 B |
Limit for network timeout | 5 s |
Limit for check program timeout | 5 m |
Limit for service stop timeout | 30 s |
Limit for service start timeout | 30 s |
Limit for service restart timeout | 30 s |
On reboot | start |
Poll time | 300 seconds with start delay 0 seconds |
httpd bind address | Any/All |
httpd portnumber | 2812 |
httpd signature | True |
httpd auth. style | Basic Authentication and Host/Net allow list |
Alert mail to | |
Alert on | All events |