Monitoring Tools for Infrastructure Monitoring (Deployment and Configuration): Deploying Zabbix or Nagios for Infrastructure Monitoring - CSDN Blog

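I. Zabbix Deployment and Configuration

1.1 Zabbix Architecture and Components

Zabbix uses a client/server architecture. Its core components are:

Zabbix Server: receives monitoring data and processes alerts

Zabbix Agent: installed on each monitored host; collects data

Zabbix Proxy: intermediate node for distributed monitoring

Zabbix Web: the web management interface

Database: stores monitoring data (MySQL/PostgreSQL/Oracle)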

1.2 Installing Zabbix Server (CentOS 7)

1.2.1 Preparation

# Disable SELinux
sed -i 's/SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
setenforce 0

# Stop and disable the firewall
systemctl stop firewalld
systemctl disable firewalld

# Install dependencies
yum install -y gcc gcc-c++ make wget vim net-snmp net-snmp-devel curl curl-devel

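1.2.2 Installing the Database

# Install MariaDB (the MySQL-compatible server shipped with CentOS 7)
yum install -y mariadb mariadb-server
systemctl start mariadb
systemctl enable mariadb

# Secure the installation (set the root password, remove anonymous users and the test database)
mysql_secure_installation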

# Create the Zabbix database and user (log in as root, then run the SQL at the mysql> prompt)
mysql -u root -p
CREATE DATABASE zabbix CHARACTER SET utf8 COLLATE utf8_bin;
GRANT ALL PRIVILEGES ON zabbix.* TO zabbix@localhost IDENTIFIED BY 'zabbix_password';
FLUSH PRIVILEGES;
exit
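The interactive session above can also be scripted. A minimal sketch, assuming the local MariaDB 5.5 that CentOS 7 ships (which accepts GRANT ... IDENTIFIED BY): the hypothetical helper zabbix_db_sql prints the same three statements with the database name, user and password as arguments, doubling any single quote in the password so it stays a valid SQL string literal.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the SQL that creates a Zabbix database and a
# local user. Single quotes in the password are doubled ('') so the value
# stays a valid SQL string literal.
# Usage: zabbix_db_sql <db_name> <db_user> <db_password>
zabbix_db_sql() {
  local db="$1" user="$2" pass="$3" q="'"
  local esc_pass="${pass//$q/$q$q}"
  printf "CREATE DATABASE IF NOT EXISTS %s CHARACTER SET utf8 COLLATE utf8_bin;\n" "$db"
  printf "GRANT ALL PRIVILEGES ON %s.* TO '%s'@'localhost' IDENTIFIED BY '%s';\n" "$db" "$user" "$esc_pass"
  printf "FLUSH PRIVILEGES;\n"
}

# Example (mysql prompts for the MariaDB root password):
#   zabbix_db_sql zabbix zabbix 'zabbix_password' | mysql -u root -p
```

Passing the password as an argument leaves it in the shell history, so on a shared machine it may be better to read it from a file readable only by root.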

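1.2.3 Installing Zabbix Server

# Add the Zabbix repository
rpm -Uvh https://repo.zabbix.com/zabbix/5.4/rhel/7/x86_64/zabbix-release-5.4-1.el7.noarch.rpm
yum clean all

# Install the Zabbix components
# Note: the Zabbix 5.x frontend needs PHP 7.2+, newer than CentOS 7's stock PHP. If the packages
# below are not found, install centos-release-scl, enable the [zabbix-frontend] repo in
# /etc/yum.repos.d/zabbix.repo, and install zabbix-web-mysql-scl zabbix-apache-conf-scl instead
yum install -y zabbix-server-mysql zabbix-web-mysql zabbix-apache-conf zabbix-agent

# Import the initial schema and data
# (with Zabbix 5.4 packages the script may instead be under /usr/share/doc/zabbix-sql-scripts/mysql/)
zcat /usr/share/doc/zabbix-server-mysql*/create.sql.gz | mysql -uzabbix -p zabbix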

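1.2.4 Configuring Zabbix Server

# Edit the configuration file and set the database password
# (Zabbix config files do not support a comment after a value, so keep comments on their own lines)
vim /etc/zabbix/zabbix_server.conf
DBPassword=zabbix_password

# Set the PHP time zone
# (with the SCL frontend packages, set php_value[date.timezone] in /etc/opt/rh/rh-php72/php-fpm.d/zabbix.conf instead)
vim /etc/httpd/conf.d/zabbix.conf
php_value date.timezone Asia/Shanghai

# Start the services and enable them at boot
systemctl start zabbix-server httpd zabbix-agent
systemctl enable zabbix-server httpd zabbix-agent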

1.2.5 Web Interface Setup

Open http://server_ip/zabbix
Follow the setup wizard (database connection: user zabbix, password zabbix_password)
Default login: username Admin, password zabbix

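1.3 Deploying Zabbix Agent

1.3.1 Installing on Linux

# Install Zabbix Agent (the repository package is the same one used on the server)
rpm -Uvh https://repo.zabbix.com/zabbix/5.4/rhel/7/x86_64/zabbix-release-5.4-1.el7.noarch.rpm
yum install -y zabbix-agent

# Configure the agent (comments go on their own lines; a trailing "# ..." would become part of the value)
vim /etc/zabbix/zabbix_agentd.conf
# Zabbix Server IP allowed to poll this agent (passive checks)
Server=192.168.1.100
# Zabbix Server IP that receives active checks
ServerActive=192.168.1.100
# Host name of this client; must match the host name configured in the web interface
Hostname=web-server-01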

# Start the agent
systemctl start zabbix-agent
systemctl enable zabbix-agent
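Editing the agent file by hand on many hosts is error-prone. The sketch below is a hypothetical helper (set_zbx_param is not part of Zabbix): it removes any active KEY= line and appends a single KEY=VALUE line, so rerunning it leaves exactly one entry, while the commented defaults that ship in the file are left alone.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set KEY=VALUE in a Zabbix-style config file.
# Active (uncommented) "KEY=" lines are dropped and one "KEY=VALUE" line is
# appended, so running it twice still leaves exactly one active entry.
# Commented lines such as "# Server=" are kept as documentation.
# Usage: set_zbx_param <file> <key> <value>
set_zbx_param() {
  local file="$1" key="$2" value="$3" tmp
  tmp="$(mktemp)" || return 1
  # Keep every line except active assignments of this key
  # ("^Server=" does not match "ServerActive=").
  grep -v -E "^${key}=" "$file" > "$tmp"
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Write back through cat so the original file keeps its owner and mode
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Example on a client:
#   set_zbx_param /etc/zabbix/zabbix_agentd.conf Server 192.168.1.100
#   set_zbx_param /etc/zabbix/zabbix_agentd.conf ServerActive 192.168.1.100
#   set_zbx_param /etc/zabbix/zabbix_agentd.conf Hostname web-server-01
#   systemctl restart zabbix-agent
```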

1.3.2 Installing on Windows

1. Download the Windows build of Zabbix Agent (https://www.zabbix.com/download_agents)
2. Extract it to C:\zabbix-agent
3. Edit zabbix_agentd.win.conf:

Server=192.168.1.100
ServerActive=192.168.1.100
Hostname=win-server-01

4. Install it as a service:

zabbix_agentd.exe --install --config C:\zabbix-agent\zabbix_agentd.win.conf
net start "Zabbix Agent"

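1.4 Zabbix Monitoring Configuration

1.4.1 Adding a Host

1. Log in to the Zabbix web interface
2. Go to Configuration → Hosts → Create host
3. Fill in the host details:

- Host name: the same value as Hostname in the agent configuration

- Groups: select or create a host group (e.g. Linux servers)

- Interfaces: add an Agent interface (IP address and port 10050)

4. On the Templates tab, click Select, search for the Linux OS template (Template OS Linux in older releases; newer releases name it Linux by Zabbix agent), then click Add

5. Click Add to finish adding the host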
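The same host can also be created through the Zabbix JSON-RPC API (endpoint http://server_ip/zabbix/api_jsonrpc.php, method host.create). The hypothetical helper below only builds the request body. The group ID, template ID and auth token are placeholders you would look up first (for example with user.login, hostgroup.get and template.get), and values are inserted without JSON escaping, so they must not contain quotes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a host.create JSON-RPC request body on one line.
# Arguments: host name, agent IP, host group ID, template ID, API auth token.
# Values are inserted verbatim (no JSON escaping), so avoid quotes in them.
host_create_payload() {
  local host="$1" ip="$2" groupid="$3" templateid="$4" auth="$5"
  printf '{"jsonrpc":"2.0","method":"host.create","params":{"host":"%s",' "$host"
  # One agent interface: type 1 = Zabbix agent, connect by IP on port 10050
  printf '"interfaces":[{"type":1,"main":1,"useip":1,"ip":"%s","dns":"","port":"10050"}],' "$ip"
  printf '"groups":[{"groupid":"%s"}],"templates":[{"templateid":"%s"}]},' "$groupid" "$templateid"
  printf '"auth":"%s","id":1}\n' "$auth"
}

# Example (IDs and token are placeholders):
#   host_create_payload web-server-01 192.168.1.101 2 10001 "$ZBX_AUTH" |
#     curl -s -H 'Content-Type: application/json-rpc' -d @- http://server_ip/zabbix/api_jsonrpc.php
```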

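1.4.2 Custom Items

Using the number of Nginx connections as an example:

a. Create a script on the monitored host:

cat > /usr/local/bin/nginx_connections.sh << 'EOF'
#!/bin/bash
# Count ESTABLISHED TCP connections whose local port is exactly 80.
# (A plain "grep :80" would also count ports such as 8080 and connections to a remote port 80.)
netstat -ant | awk '$4 ~ /:80$/ && $6 == "ESTABLISHED"' | wc -l
EOF
chmod +x /usr/local/bin/nginx_connections.sh

b. Add the following to the Zabbix Agent configuration, then restart the agent:

vim /etc/zabbix/zabbix_agentd.conf
UserParameter=nginx.connections,/usr/local/bin/nginx_connections.sh

systemctl restart zabbix-agent
# Optional: evaluate the new key locally
zabbix_agentd -t nginx.connections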

c. Create the item in the Zabbix web interface:

- Configuration → Hosts → select the host → Items → Create item

- Name: Nginx established connections

- Key: nginx.connections

- Type: Zabbix agent

- Type of information: Numeric (unsigned)

- Update interval: 30s

d. Create a trigger:

- Configuration → Hosts → select the host → Triggers → Create trigger

- Name: Nginx connections high

- Expression: last(/web-server-01/nginx.connections)>1000 (Zabbix 5.4 syntax; versions before 5.4 use {web-server-01:nginx.connections.last()}>1000)

- Severity: Warning
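To check the counting logic offline, the awk filter can be wrapped in a function and fed sample netstat -ant output. It counts only lines whose local address (column 4) ends in :80 and whose state (column 6) is ESTABLISHED. In the sample below, the :8080 line and the connection to a remote port 80 are the two lines a plain grep :80 would have counted by mistake. The function name and sample data are illustrative.

```shell
#!/usr/bin/env bash
# Count ESTABLISHED TCP connections whose *local* port is exactly 80,
# reading "netstat -ant" output on stdin.
# netstat columns: Proto Recv-Q Send-Q Local-Address Foreign-Address State
count_established_80() {
  awk '$1 ~ /^tcp/ && $4 ~ /:80$/ && $6 == "ESTABLISHED" { n++ } END { print n + 0 }'
}

# Sample input: one listener, two matching connections (IPv4 and IPv6),
# plus a local :8080 connection and a connection to a remote port 80.
sample='Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN
tcp        0      0 10.0.0.5:80             10.0.0.9:51234          ESTABLISHED
tcp        0      0 10.0.0.5:8080           10.0.0.9:51235          ESTABLISHED
tcp        0      0 10.0.0.5:22             10.0.0.9:80             ESTABLISHED
tcp6       0      0 :::80                   ::1:40000               ESTABLISHED'

printf '%s\n' "$sample" | count_established_80   # prints 2
```

On a live host the same function reads real data with netstat -ant | count_established_80.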

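II. Prometheus + Grafana Deployment and Configuration

2.1 Prometheus Architecture and Components

The Prometheus ecosystem consists of several components:

Prometheus Server: the core component; scrapes and stores metrics

Exporters: collectors that expose metrics for specific services

Pushgateway: accepts metrics pushed by short-lived jobs

Alertmanager: handles alerts

Grafana: data visualization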

2.2 Installing Prometheus (Linux)

# Create a system user without a home directory or login shell
useradd -M -s /sbin/nologin prometheus

# Download and extract
wget https://github.com/prometheus/prometheus/releases/download/v2.37.0/prometheus-2.37.0.linux-amd64.tar.gz
tar xvf prometheus-2.37.0.linux-amd64.tar.gz
mv prometheus-2.37.0.linux-amd64 /usr/local/prometheus

# Create the data directory
mkdir -p /var/lib/prometheus
chown -R prometheus:prometheus /usr/local/prometheus /var/lib/prometheus

# Create a systemd service
cat > /etc/systemd/system/prometheus.service << EOF
[Unit]
Description=Prometheus
After=network.target

[Service]
User=prometheus
Group=prometheus
ExecStart=/usr/local/prometheus/prometheus \
  --config.file=/usr/local/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus \
  --web.console.templates=/usr/local/prometheus/consoles \
  --web.console.libraries=/usr/local/prometheus/console_libraries

[Install]
WantedBy=multi-user.target
EOF

# Reload systemd so it picks up the new unit, then start the service
systemctl daemon-reload
systemctl start prometheus
systemctl enable prometheus

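2.3 Installing Node Exporter (host monitoring)

# Download and install on every monitored host. The service below runs as the
# prometheus user, so create that user on this host too if it does not exist.
id prometheus >/dev/null 2>&1 || useradd -M -s /sbin/nologin prometheus
wget https://github.com/prometheus/node_exporter/releases/download/v1.3.1/node_exporter-1.3.1.linux-amd64.tar.gz
tar xvf node_exporter-1.3.1.linux-amd64.tar.gz
mv node_exporter-1.3.1.linux-amd64/node_exporter /usr/local/bin/

# Create a systemd service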
cat > /etc/systemd/system/node_exporter.service << EOF
[Unit]
Description=Node Exporter
After=network.target

[Service]
User=prometheus
Group=prometheus
ExecStart=/usr/local/bin/node_exporter \
  --collector.systemd \
  --collector.processes \
  --collector.netclass

[Install]
WantedBy=multi-user.target
EOF

# Reload systemd, then start the service
systemctl daemon-reload
systemctl start node_exporter
systemctl enable node_exporter
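Once node_exporter is running, its /metrics endpoint (port 9100) returns plain text in the Prometheus exposition format: # HELP and # TYPE comment lines followed by one sample per line. The hypothetical metric_value helper below pulls a single unlabelled metric out of that text, which is enough for a quick smoke test. It is a sketch: it ignores labelled series and returns a non-zero status when the metric is absent.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of an unlabelled metric from
# Prometheus text exposition format read on stdin.
# Comment lines (# HELP / # TYPE) never match because their first field is "#".
# Exit status is 1 when the metric is not found.
metric_value() {
  awk -v m="$1" '
    $1 == m { print $2; found = 1; exit }
    END { if (!found) exit 1 }
  '
}

# Example against a live exporter:
#   curl -s http://localhost:9100/metrics | metric_value node_load1
```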

2.4 Configuring Prometheus Scrape Targets

Edit the Prometheus configuration file:

vim /usr/local/prometheus/prometheus.yml

Add a scrape job for node_exporter (port 9100) under scrape_configs:

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
    - targets: ['localhost:9090']

  - job_name: 'nodes'
    static_configs:
    - targets: ['localhost:9100', '192.168.1.101:9100', '192.168.1.102:9100']

Restart Prometheus to apply the configuration:

systemctl restart prometheus

Open the Prometheus web interface (http://server_ip:9090) and check the state of each scrape target under Status → Targets.
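With more than a handful of hosts, the targets list can be generated instead of typed. A sketch with a hypothetical helper, node_job_yaml, that prints a job in the same layout as above and appends node_exporter's default port 9100 to each host. Review the output before pasting it under scrape_configs, and validate the file with promtool check config /usr/local/prometheus/prometheus.yml (promtool ships in the Prometheus tarball) before restarting.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a static_configs scrape job for the given hosts,
# appending node_exporter's default port 9100 to each one.
# Usage: node_job_yaml <job_name> <host> [<host> ...]
node_job_yaml() {
  local job="$1" targets="" h
  shift
  for h in "$@"; do
    # Separate entries with ", " (nothing before the first one)
    targets+="${targets:+, }'${h}:9100'"
  done
  printf "  - job_name: '%s'\n    static_configs:\n    - targets: [%s]\n" "$job" "$targets"
}

# Example: reproduce the "nodes" job shown above
node_job_yaml nodes localhost 192.168.1.101 192.168.1.102
```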

2.5 Installing and Configuring Grafana

# Install Grafana
wget https://dl.grafana.com/oss/release/grafana-9.0.6-1.x86_64.rpm
yum install -y grafana-9.0.6-1.x86_64.rpm

# Start the service
systemctl start grafana-server
systemctl enable grafana-server

# Open the firewall port (if firewalld is enabled)
firewall-cmd --add-port=3000/tcp --permanent
firewall-cmd --reload

2.6 Connecting Grafana to Prometheus

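Log in to Grafana (http://server_ip:3000; the default account is admin/admin) and click Add your first data source
Select Prometheus
In the URL field, enter the Prometheus address (e.g. http://localhost:9090)
Click Save & Test; the message Data source is working means the configuration succeeded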
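As an alternative to clicking through the UI, Grafana can create the data source at startup from a provisioning file placed in /etc/grafana/provisioning/datasources/ (the default provisioning path for the RPM install). The helper name write_prom_datasource and the file name prometheus.yaml are illustrative choices; the YAML keys follow Grafana's data source provisioning format.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a Grafana data source provisioning file.
# Usage: write_prom_datasource <provisioning_dir> <prometheus_url>
write_prom_datasource() {
  local dir="$1" url="$2"
  mkdir -p "$dir" || return 1
  # access: proxy means the Grafana server (not the browser) queries Prometheus
  cat > "$dir/prometheus.yaml" <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: ${url}
    isDefault: true
EOF
}

# Example on the Grafana host:
#   write_prom_datasource /etc/grafana/provisioning/datasources http://localhost:9090
#   systemctl restart grafana-server
```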

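2.7 Importing Grafana Dashboards

Browse the Grafana dashboard catalog: Grafana dashboards | Grafana Labs (https://grafana.com/grafana/dashboards/)
Find a suitable dashboard (e.g. Node Exporter Full, ID 1860)
In Grafana, click Create → Import
Enter the dashboard ID and click Load
Select the Prometheus data source and click Import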

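III. Nagios Deployment and Configuration

3.1 Nagios Architecture

Nagios is a long-established open-source monitoring tool. Its main components are:

Nagios Core: the core engine

Nagios Plugins: check plugins

Web Interface: the web management interface

NRPE (Nagios Remote Plugin Executor): agent that runs checks on remote hosts

3.2 Installing Nagios (CentOS 7)

# Install dependencies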
yum install -y httpd php gcc glibc glibc-common make gettext automake autoconf wget openssl-devel

# Create the user and group
useradd nagios
groupadd nagcmd
usermod -a -G nagcmd nagios
usermod -a -G nagcmd apache

# Download and extract Nagios Core
cd /tmp
wget https://github.com/NagiosEnterprises/nagioscore/archive/nagios-4.4.14.tar.gz
tar xvf nagios-4.4.14.tar.gz
cd nagioscore-nagios-4.4.14

# Build and install
./configure --with-command-group=nagcmd
make all
make install
make install-init
make install-config
make install-commandmode
make install-webconf

# Set up web interface authentication (creates the nagiosadmin user and prompts for its password)
htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin

# Install the Nagios plugins
cd /tmp
wget https://nagios-plugins.org/download/nagios-plugins-2.4.6.tar.gz
tar xvf nagios-plugins-2.4.6.tar.gz
cd nagios-plugins-2.4.6

./configure --with-nagios-user=nagios --with-nagios-group=nagios
make
make install

# Start the services
systemctl start httpd
systemctl enable httpd
systemctl start nagios
systemctl enable nagios

Open the Nagios web interface at http://server_ip/nagios and log in as nagiosadmin with the password you set.

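3.3 Monitoring a Remote Host with Nagios

Taking a remote Linux host as an example:

a. Install NRPE and the plugins on the remote host (both packages come from the EPEL repository):

yum install -y epel-release
yum install -y nrpe nagios-plugins-all

# Configure NRPE
vim /etc/nagios/nrpe.cfg
server_address=<remote host IP>
allowed_hosts=127.0.0.1,<Nagios server IP>

# Start NRPE
systemctl start nrpe
systemctl enable nrpe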

b. Install the NRPE client plugin (check_nrpe) on the Nagios server:

cd /tmp
wget https://github.com/NagiosEnterprises/nrpe/releases/download/nrpe-4.1.0/nrpe-4.1.0.tar.gz
tar xvf nrpe-4.1.0.tar.gz
cd nrpe-4.1.0
./configure
make all
make install

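c. Configure Nagios to monitor the remote host. The PING, SSH and HTTP checks below run from the Nagios server itself; NRPE is only needed for checks that must execute on the remote host, such as load or disk usage.

# Create a host configuration file
vim /usr/local/nagios/etc/objects/remote_hosts.cfg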

define host {
    use                     linux-server
    host_name               web-server-01
    alias                   Web Server 01
    address                 192.168.1.101
    max_check_attempts      5
    check_period            24x7
    notification_interval   30
    notification_period     24x7
}

define service {
    use                     generic-service
    host_name               web-server-01
    service_description     PING
    check_command           check_ping!100.0,20%!500.0,60%
}

define service {
    use                     generic-service
    host_name               web-server-01
    service_description     SSH
    check_command           check_ssh
    check_interval          5
    retry_interval          1
}

define service {
    use                     generic-service
    host_name               web-server-01
    service_description     HTTP
    check_command           check_http
    check_interval          5
    retry_interval          1
}

d. Include the new file in the main Nagios configuration by adding a cfg_file line:

vim /usr/local/nagios/etc/nagios.cfg
cfg_file=/usr/local/nagios/etc/objects/remote_hosts.cfg

e. Verify the configuration and restart Nagios:

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
systemctl restart nagios
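The checks above do not exercise NRPE. To run a check on the remote host through it, Nagios needs a check_nrpe command and a service that calls it. A sketch, assuming check_nrpe was installed to $USER1$ (/usr/local/nagios/libexec by default) and relying on the check_load command defined in the stock nrpe.cfg on the remote host. write_nrpe_objects and the file it writes are hypothetical names, and the file still has to be referenced with a cfg_file line, like remote_hosts.cfg.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write Nagios objects that run "check_load" on a remote
# host through NRPE. check_load is one of the commands in the stock nrpe.cfg.
# Usage: write_nrpe_objects <cfg_file> <host_name>
write_nrpe_objects() {
  local cfg="$1" host="$2"
  # \$ keeps Nagios macros such as $USER1$ literal inside the unquoted heredoc
  cat > "$cfg" <<EOF
define command {
    command_name    check_nrpe
    command_line    \$USER1\$/check_nrpe -H \$HOSTADDRESS\$ -c \$ARG1\$
}

define service {
    use                     generic-service
    host_name               ${host}
    service_description     CPU Load
    check_command           check_nrpe!check_load
}
EOF
}

# Example on the Nagios server:
#   write_nrpe_objects /usr/local/nagios/etc/objects/nrpe_checks.cfg web-server-01
#   echo 'cfg_file=/usr/local/nagios/etc/objects/nrpe_checks.cfg' >> /usr/local/nagios/etc/nagios.cfg
#   /usr/local/nagios/libexec/check_nrpe -H 192.168.1.101   # should print the NRPE version
#   /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg && systemctl restart nagios
```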

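IV. Comparison and Selection Guide

4.1 Feature Comparison

Feature	Zabbix	Prometheus + Grafana	Nagios
Architecture	Client/server, depends on a database	Time-series database, pull model	Plugin-based, client/server
Data storage	Relational database (MySQL/PostgreSQL)	Local time-series database, supports remote storage	Files or database
Visualization	Built-in basic graphs, limited	Rich visualization in Grafana, many chart types	Basic web UI, rudimentary graphs
Alerting	Multi-level escalation, many notification channels	Via Alertmanager, with grouping and inhibition	Basic alerting, extended with custom scripts
Extensibility	Good: custom items and templates	Excellent: rich exporter ecosystem	Moderate: relies on plugins
Learning curve	Moderate, configuration is fairly complex	Steep, requires understanding PromQL and the metric model	Steep, configuration files are complex
Community	Active, plenty of Chinese-language resources	Very active, frequent releases	Stable, fewer resources
Typical scale	Small to large enterprises	Medium to large enterprises, cloud-native environments	Small and medium enterprises, traditional environments
Deployment complexity	Moderate	Moderate, more components	Higher, manual configuration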

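4.2 Performance Comparison

Aspect	Zabbix	Prometheus + Grafana	Nagios
Monitored nodes	Thousands	Tens of thousands	Hundreds
Minimum collection interval	1 second	15 seconds in the sample configuration (can be set lower)	1 minute by default (configurable)
Resource usage	Medium to high (mostly the database)	Medium (memory and disk I/O)	Low
High availability	Supported, with extra configuration	Supported, typically by running redundant identical instances (federation is for hierarchical aggregation)	Requires third-party tools (e.g. DRBD)
Distributed monitoring	Proxy nodes	Federation and remote write	Distributed monitoring servers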

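4.3 Selection Recommendations

a. Small and medium businesses with traditional environments: Zabbix first

- Why: comprehensive features, relatively simple deployment, good Chinese-language support, active community

- Suited to: servers, network devices, traditional application monitoring

b. Cloud-native / container environments: Prometheus + Grafana first

- Why: a natural fit for container monitoring, tight integration with the Kubernetes ecosystem, efficient time-series processing

- Suited to: Docker, Kubernetes clusters, microservice architectures

c. Simple monitoring needs / legacy systems: Nagios is an option

- Why: lightweight and stable, low resource usage, long track record

- Suited to: a small number of critical servers, environments sensitive to resource usage

d. Mixed environments: combine tools

- Zabbix + Prometheus: Zabbix for traditional devices, Prometheus for containers

- Exchange data through APIs and present everything in Grafana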

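V. Advanced Configuration and Extensions

5.1 Advanced Zabbix Configuration

5.1.1 Distributed Monitoring (Zabbix Proxy)

When monitored hosts span several sites or are very numerous, deploy a Zabbix Proxy:

# Install Zabbix Proxy
yum install -y zabbix-proxy-mysql

# Create the proxy database (log in as root, then run the SQL at the mysql> prompt)
mysql -u root -p
CREATE DATABASE zabbix_proxy CHARACTER SET utf8 COLLATE utf8_bin;
GRANT ALL PRIVILEGES ON zabbix_proxy.* TO zabbix@localhost IDENTIFIED BY 'proxy_password';
FLUSH PRIVILEGES;
exit

# Import the proxy schema (schema only, no data)
# (with Zabbix 5.4 packages the file may instead be under /usr/share/doc/zabbix-sql-scripts/mysql/)
zcat /usr/share/doc/zabbix-proxy-mysql*/schema.sql.gz | mysql -uzabbix -p zabbix_proxy

# Configure Zabbix Proxy (comments on their own lines)
vim /etc/zabbix/zabbix_proxy.conf
# Zabbix Server IP
Server=192.168.1.100
Hostname=proxy-1
DBPassword=proxy_password
# Configuration sync interval (seconds)
ConfigFrequency=3600

# Start the service
systemctl start zabbix-proxy
systemctl enable zabbix-proxy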

Add the proxy in the Zabbix web interface:

Administration → Proxies → Create proxy

Proxy name: must match Hostname in the proxy configuration file

Proxy mode: Active

Click Add to finish

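5.1.2 Network Discovery and Auto-registration

Adding hosts to monitoring automatically:

# Zabbix Server: number of discoverer processes (default 1); the scan interval
# itself is set per discovery rule in the web interface
vim /etc/zabbix/zabbix_server.conf
StartDiscoverers=1

# Configure a discovery rule in the web interface
Configuration → Discovery → Create discovery rule
- Name: Local network discovery
- IP range: 192.168.1.1-254
- Update interval (Delay): 30m
- Checks: add an ICMP ping check and a TCP check on port 10050

# Configure an auto-registration action
# (agents register through active checks: set ServerActive and HostMetadata=Linux in zabbix_agentd.conf)
Configuration → Actions → Event source: Auto registration
- Create action: name it Auto register Linux hosts
- Conditions: Host metadata contains Linux
- Operations:
  - Add host
  - Add to host group: Linux servers
  - Link to template: Template OS Linux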

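5.2 Advanced Prometheus Configuration

5.2.1 Federation

Hierarchical Prometheus deployment, in which a parent server scrapes selected series from child servers:

# Child Prometheus configuration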
scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['192.168.1.101:9100', '192.168.1.102:9100']

# Parent Prometheus configuration (federation)
scrape_configs:
  - job_name: 'federate'
    scrape_interval: 15s
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job=~"node|prometheus"}'
    static_configs:
      - targets:
        - '192.168.1.110:9090'  # child Prometheus 1
        - '192.168.1.111:9090'  # child Prometheus 2

5.2.2 Alerting (Alertmanager)

# Install Alertmanager
wget https://github.com/prometheus/alertmanager/releases/download/v0.24.0/alertmanager-0.24.0.linux-amd64.tar.gz
tar xvf alertmanager-0.24.0.linux-amd64.tar.gz
mv alertmanager-0.24.0.linux-amd64 /usr/local/alertmanager

# Configure Alertmanager
cat > /usr/local/alertmanager/alertmanager.yml << EOF
global:
  smtp_smarthost: 'smtp.example.com:587'
  smtp_from: 'alert@example.com'
  smtp_auth_username: 'alert@example.com'
  smtp_auth_password: 'password'
  smtp_require_tls: true

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'email-notifications'

receivers:
- name: 'email-notifications'
  email_configs:
  - to: 'admin@example.com'

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
EOF

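# Data directory for silences and notification state (the default
# --storage.path "data/" is relative to the service's working directory)
mkdir -p /var/lib/alertmanager
chown prometheus:prometheus /var/lib/alertmanager

# Create the systemd service
cat > /etc/systemd/system/alertmanager.service << EOF
[Unit]
Description=Alertmanager
After=network.target

[Service]
User=prometheus
Group=prometheus
ExecStart=/usr/local/alertmanager/alertmanager --config.file=/usr/local/alertmanager/alertmanager.yml --storage.path=/var/lib/alertmanager

[Install]
WantedBy=multi-user.target
EOF

# Start the service
systemctl daemon-reload
systemctl start alertmanager
systemctl enable alertmanager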

Point Prometheus at Alertmanager and load the rule file (in prometheus.yml):

# prometheus.yml
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 'localhost:9093'

rule_files:
  - "alert_rules.yml"

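Create the alert rules (rule_files paths are resolved relative to the directory containing prometheus.yml, so here the file is /usr/local/prometheus/alert_rules.yml; restart Prometheus after changing either file):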

# alert_rules.yml
groups:
- name: host_alerts
  rules:
  - alert: HighCpuUsage
    expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High CPU usage on {{ $labels.instance }}"
      description: "CPU usage is above 80% for 5 minutes (current value: {{ $value }})"
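A common companion rule alerts when a scrape target stops responding, using the built-in up series that Prometheus records for every target (1 when the last scrape succeeded, 0 otherwise). The sketch below appends such a rule; append_instance_down_rule is a hypothetical helper, and it assumes host_alerts is the last group in the file and the file ends with a newline, so the new item continues that group's rules: list. Validate the result with promtool check rules before restarting Prometheus.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append an InstanceDown rule to a rules file whose last
# group is host_alerts (the new item is indented to continue its rules: list).
# Usage: append_instance_down_rule <rules_file>
append_instance_down_rule() {
  local file="$1"
  # Quoted 'EOF' keeps {{ $labels.instance }} literal for Prometheus templating
  cat >> "$file" <<'EOF'
  - alert: InstanceDown
    expr: up == 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Instance {{ $labels.instance }} is down"
EOF
}

# Example:
#   append_instance_down_rule /usr/local/prometheus/alert_rules.yml
#   /usr/local/prometheus/promtool check rules /usr/local/prometheus/alert_rules.yml
#   systemctl restart prometheus
```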

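5.3 Advanced Nagios Configuration

5.3.1 Email Alerts

# Install the mail components
yum install -y sendmail mailx

# Configure the notification commands. The sample commands.cfg installed by
# "make install-config" already defines both commands; edit those definitions
# rather than adding duplicates, which would fail configuration verification.
vim /usr/local/nagios/etc/objects/commands.cfg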
define command {
    command_name    notify-host-by-email
    command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /bin/mail -s "Host Alert: $HOSTNAME$ is $HOSTSTATE$" $CONTACTEMAIL$
}

define command {
    command_name    notify-service-by-email
    command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n" | /bin/mail -s "Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$" $CONTACTEMAIL$
}

# Configure the contact (the sample contacts.cfg already defines nagiosadmin; change its email address)
vim /usr/local/nagios/etc/objects/contacts.cfg
define contact {
    contact_name            nagiosadmin
    use                     generic-contact
    alias                   Nagios Admin
    email                   admin@example.com
}