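Pinpoint Cluster Certificate Expiry Monitoring: Prometheus Alerting
[Free download] pinpoint project URL: https://gitcode.com/gh_mirrors/pin/pinpoint
Background
In a distributed system, an expired SSL/TLS certificate makes TLS handshakes fail and takes services down, which breaks Pinpoint's end-to-end tracing. Checking certificates by hand is slow and easy to forget. Monitoring certificate expiry with Prometheus and configuring automatic alerts gives advance warning and keeps the cluster stable.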
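How It Works
A Pinpoint cluster implements certificate monitoring in four steps:
- Metric exposure: the Java Agent or Collector registers certificate expiry metrics through Micrometer
- Collection: Prometheus periodically scrapes (pulls) the exposed metrics
- Alerting rules: Prometheus evaluates predefined rules to decide whether a certificate is about to expire
- Notification: Alertmanager delivers the alerts to email, Slack, and other channels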
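To make the pull model in the first two steps concrete, here is a JDK-only sketch of what the Micrometer Prometheus registry does for you: it renders the gauges in the Prometheus text exposition format at request time and serves them over HTTP for Prometheus to scrape. The class name, the `/metrics` path, and the port are hypothetical; in the real setup Micrometer and Spring Boot Actuator provide this endpoint.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.util.Locale;
import java.util.function.Supplier;

// Hypothetical stand-in for Micrometer's Prometheus registry: renders the two
// certificate gauges in the text exposition format and serves them over HTTP.
final class CertMetricsEndpoint {

    /** Renders both gauges; the remaining time is computed when this is called. */
    static String render(Instant notAfter, Instant now) {
        double seconds = (notAfter.toEpochMilli() - now.toEpochMilli()) / 1000.0;
        return String.format(Locale.ROOT,
                "# TYPE ssl_certificate_expiry_seconds gauge\n"
                        + "ssl_certificate_expiry_seconds %.1f\n"
                        + "# TYPE ssl_certificate_expiry_days gauge\n"
                        + "ssl_certificate_expiry_days %.1f\n",
                seconds, seconds / 86400.0);
    }

    /** Serves the metrics on http://localhost:{port}/metrics; each scrape re-renders them. */
    static HttpServer serve(int port, Supplier<Instant> notAfter) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/metrics", exchange -> {
            byte[] body = render(notAfter.get(), Instant.now()).getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/plain; version=0.0.4");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

Because the value is computed per request rather than stored, every scrape sees the current remaining lifetime; the Micrometer gauge in the Collector code follows the same principle.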
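Prerequisites
Required components
- Pinpoint Collector 2.5.0+ (Micrometer already integrated)
- Prometheus 2.30.0+
- Alertmanager 0.23.0+
Key dependency
The project's pom.xml already includes the Prometheus metrics export dependency: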
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
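Exposing Certificate Metrics
Implementation
Add a certificate monitor to the Collector module. It reads the certificate with the Java certificate API and registers gauges. The gauges use a value function, so the remaining time is recomputed at every scrape: a value computed once in the constructor would never change, and registry.gauge(name, number) keeps only a weak reference to a boxed number, which can leave the gauge reporting NaN. Passing the certificate file path as a constructor argument is an assumption; adapt it to wherever your Collector keeps its certificate.
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.cert.CertificateFactory;
import java.security.cert.X509Certificate;
public class SslCertificateMonitor {
    private final X509Certificate cert;
    public SslCertificateMonitor(MeterRegistry registry, Path certPath) throws Exception {
        // Load the server certificate (PEM or DER); the file location is an assumption
        try (InputStream in = Files.newInputStream(certPath)) {
            cert = (X509Certificate) CertificateFactory.getInstance("X.509").generateCertificate(in);
        }
        // Value functions are evaluated at scrape time, so the remaining time stays current
        Gauge.builder("ssl_certificate_expiry_seconds", this, SslCertificateMonitor::secondsLeft).strongReference(true).register(registry);
        Gauge.builder("ssl_certificate_expiry_days", this, m -> m.secondsLeft() / 86400).strongReference(true).register(registry);
    }
    private double secondsLeft() {
        return (cert.getNotAfter().getTime() - System.currentTimeMillis()) / 1000.0;
    }
}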
collector/src/main/java/com/navercorp/pinpoint/collector/monitor/SslCertificateMonitor.java
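If the Collector's TLS material lives in a keystore rather than a single certificate file, every certificate in it can be checked. The following is a hypothetical JDK-only helper (not part of Pinpoint) that maps each keystore alias holding an X.509 certificate to its remaining lifetime in days, using the same arithmetic as the gauge above.

```java
import java.security.KeyStore;
import java.security.KeyStoreException;
import java.security.cert.Certificate;
import java.security.cert.X509Certificate;
import java.time.Duration;
import java.time.Instant;
import java.util.Collections;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: remaining lifetime of every X.509 certificate in a loaded keystore.
final class KeyStoreExpiryScanner {

    /** Remaining lifetime in fractional days; negative once the certificate has expired. */
    static double daysLeft(Date notAfter, Instant now) {
        return Duration.between(now, notAfter.toInstant()).toMillis() / 86_400_000.0;
    }

    /** Maps each alias that holds an X.509 certificate to its remaining days. */
    static Map<String, Double> scan(KeyStore keyStore, Instant now) throws KeyStoreException {
        Map<String, Double> result = new LinkedHashMap<>();
        for (String alias : Collections.list(keyStore.aliases())) {
            Certificate cert = keyStore.getCertificate(alias);
            if (cert instanceof X509Certificate) {
                result.put(alias, daysLeft(((X509Certificate) cert).getNotAfter(), now));
            }
        }
        return result;
    }
}
```

Each entry could then be registered as its own gauge tagged with the alias, so an alert names the exact certificate that is about to expire.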
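Verifying the Metrics
After starting the Collector, query its Prometheus endpoint:
curl -s http://collector-host:9997/actuator/prometheus | grep '^ssl_certificate'
Expected output (the values depend on the certificate):
ssl_certificate_expiry_seconds 2592000.0
ssl_certificate_expiry_days 30.0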
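The same check can be scripted in Java, for example in a smoke test after deployment. This sketch is a programmatic equivalent of the curl and grep pipeline: it fetches the exposition text with the JDK's HttpClient and keeps only the ssl_certificate samples. The class name is hypothetical, and the parser is deliberately simplified: it assumes sample lines carry no trailing timestamp, which holds for Micrometer's output.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical smoke check: fetch the Prometheus endpoint and extract ssl_certificate_* samples.
final class ScrapeCheck {

    /** Parses "name value" sample lines; # HELP / # TYPE lines and other metrics are skipped. */
    static Map<String, Double> parse(String exposition) {
        Map<String, Double> samples = new LinkedHashMap<>();
        for (String raw : exposition.split("\n")) {
            String line = raw.trim();
            if (!line.startsWith("ssl_certificate")) {
                continue;
            }
            int space = line.lastIndexOf(' '); // the value follows the last space
            samples.put(line.substring(0, space), Double.parseDouble(line.substring(space + 1)));
        }
        return samples;
    }

    /** Fetches the endpoint, e.g. "http://collector-host:9997/actuator/prometheus". */
    static Map<String, Double> fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return parse(response.body());
    }
}
```

An empty map from fetch means the endpoint is reachable but the monitor never registered its gauges, which points to the troubleshooting steps for missing metrics.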
Prometheus Configuration
Scrape configuration
Add the Pinpoint targets to prometheus.yml:
scrape_configs:
  - job_name: 'pinpoint'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['collector-1:9997', 'collector-2:9997']
prometheus/prometheus.yml
Alerting rules
Create the certificate expiry alerting rules:
groups:
  - name: certificate_alerts
    rules:
      - alert: SslCertificateExpiringSoon
        expr: ssl_certificate_expiry_days < 15
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "SSL certificate expiring soon"
          description: "Certificate on {{ $labels.instance }} will expire in {{ $value }} days"
      - alert: SslCertificateExpired
        expr: ssl_certificate_expiry_days <= 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "SSL certificate expired"
          description: "Certificate on {{ $labels.instance }} has expired"
prometheus/rules/certificate.rules.yml
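The two expressions overlap, which is easy to miss when reading the YAML. This small sketch mirrors only the threshold part of each rule (it ignores the `for` hold time and the label matching Prometheus also applies) and shows which rules match for a given number of remaining days. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Mirrors only the `expr` thresholds of the two rules; `for` and labels are ignored.
final class CertAlertRules {

    static List<String> matchingAlerts(double expiryDays) {
        List<String> alerts = new ArrayList<>();
        if (expiryDays < 15) {   // expr: ssl_certificate_expiry_days < 15
            alerts.add("SslCertificateExpiringSoon");
        }
        if (expiryDays <= 0) {   // expr: ssl_certificate_expiry_days <= 0
            alerts.add("SslCertificateExpired");
        }
        return alerts;
    }
}
```

With 20 days left nothing matches, with 10 days only SslCertificateExpiringSoon matches, and once the certificate has expired both rules match, so the warning keeps firing alongside the critical alert. If that is unwanted, one option is an Alertmanager inhibit rule that suppresses the warning while the critical alert for the same instance is active.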
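Dashboard
Create a certificate monitoring dashboard in Grafana with the following queries:
- Remaining-days trend: ssl_certificate_expiry_days
- Certificate status distribution (valid vs. expired): count(ssl_certificate_expiry_days > 0) and count(ssl_certificate_expiry_days <= 0)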
Alert Notifications
Alertmanager configuration
Configure Alertmanager to send email notifications:
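global:
  # Email delivery needs SMTP settings; both values below are placeholders
  smtp_smarthost: 'smtp.example.com:587'
  smtp_from: 'alertmanager@example.com'
route:
  receiver: 'email_notifications'
receivers:
  - name: 'email_notifications'
    email_configs:
      - to: 'admin@example.com'
        send_resolved: true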
alertmanager/config.yml
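Alert handling procedure
- When a warning alert fires, renew the certificate promptly
- Run the certificate rotation script: ./scripts/update-certificate.sh
- Verify that the new certificate is in effect: curl -I https://pinpoint-collector:443
- Confirm that the alert resolves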
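A successful curl only proves that the handshake works; it does not show how long the newly served certificate is valid. This sketch connects with the JVM's default trust store and reads the expiry date of the certificate the server actually presents. The host and port come from the step above; the class name and the 15-day minimum (chosen to match the warning threshold, so the alert can resolve) are assumptions. A self-signed Collector certificate would additionally need a trust store passed via `-Djavax.net.ssl.trustStore=...`.

```java
import java.io.IOException;
import java.security.cert.X509Certificate;
import java.time.Duration;
import java.time.Instant;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

// Hypothetical post-rotation probe: read the notAfter date of the certificate a server presents.
final class TlsExpiryProbe {

    /** Performs a TLS handshake and returns the leaf certificate's expiry time. */
    static Instant servedNotAfter(String host, int port) throws IOException {
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        try (SSLSocket socket = (SSLSocket) factory.createSocket(host, port)) {
            socket.startHandshake();
            X509Certificate leaf = (X509Certificate) socket.getSession().getPeerCertificates()[0];
            return leaf.getNotAfter().toInstant();
        }
    }

    /** True when the served certificate still has at least minDays of validity left. */
    static boolean rotationLooksGood(Instant notAfter, Instant now, long minDays) {
        return !Duration.between(now, notAfter).minusDays(minDays).isNegative();
    }
}
```

Usage: `rotationLooksGood(servedNotAfter("pinpoint-collector", 443), Instant.now(), 15)`. A false result right after rotation usually means the Collector is still serving the old certificate, for example because it was not restarted.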
Best Practices
Automated certificate renewal
Use Certbot to renew Let's Encrypt certificates automatically:
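# Install Certbot
apt-get install -y certbot
# Create the daily renewal script; no ".sh" suffix, because Debian's run-parts skips file names containing dots
cat > /etc/cron.daily/renew-cert << 'EOF'
#!/bin/bash
# --deploy-hook runs only after a certificate was actually renewed, so the Collector is not restarted every day
certbot renew --quiet --deploy-hook "cp /etc/letsencrypt/live/pinpoint.example.com/* /opt/pinpoint/cert/ && systemctl restart pinpoint-collector"
EOF
# cron.daily only runs executable files
chmod +x /etc/cron.daily/renew-cert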
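Multi-environment monitoring
Configure different alert thresholds for the development, test, and production environments:
# Relaxed threshold for dev; assumes dev targets carry an environment="dev" label
# (the default 15-day rule then needs environment!="dev" to avoid duplicate warnings)
- alert: SslCertificateExpiringSoon
  expr: ssl_certificate_expiry_days{environment="dev"} < 7
  labels:
    environment: dev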
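The threshold resolution behind this setup can be summarized in a few lines: dev uses the relaxed 7-day threshold and every other environment falls back to the default 15-day rule. This is a hypothetical illustration of the lookup, assuming test and production keep the default; in practice the thresholds live in the rule files, not in Java.

```java
import java.util.Map;

// Hypothetical per-environment warning thresholds in days; unknown environments use the default.
final class EnvThresholds {

    private static final Map<String, Integer> WARN_DAYS = Map.of("dev", 7);
    private static final int DEFAULT_WARN_DAYS = 15;

    static int warnThreshold(String environment) {
        return WARN_DAYS.getOrDefault(environment, DEFAULT_WARN_DAYS);
    }

    /** Same comparison as the rule expression: remaining days below the environment's threshold. */
    static boolean expiringSoon(String environment, double expiryDays) {
        return expiryDays < warnThreshold(environment);
    }
}
```

A certificate with 10 days left therefore warns in production but not in dev.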
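Troubleshooting
Metrics are not being collected
- Check the Collector configuration: management.endpoints.web.exposure.include: prometheus
- Verify the certificate file permissions: ls -l /opt/pinpoint/cert/
- Check the application log: grep "SslCertificateMonitor" /var/log/pinpoint/collector.log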
Delayed alerts
Shorten the scrape interval in the global section of prometheus.yml:
global:
  scrape_interval: 30s
  scrape_timeout: 10s
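The scrape interval is only one part of the delay. As a rough estimate for a new alert group, the worst case from a threshold crossing to the first notification is one scrape interval, plus one rule evaluation interval, plus the rule's `for` duration, plus Alertmanager's `group_wait`. The sketch below just adds these up; the inputs used in the usage note (Prometheus's default `evaluation_interval` of 1m, Alertmanager's default `group_wait` of 30s, and `for: 5m` from the warning rule) are assumptions about your configuration, and network and delivery time are ignored. If the alert joins an existing group, `group_interval` applies instead of `group_wait`.

```java
import java.time.Duration;

// Rough worst-case notification delay: scrape + evaluation + `for` hold + Alertmanager group_wait.
final class AlertLatency {

    static Duration worstCase(Duration scrapeInterval, Duration evaluationInterval,
                              Duration forDuration, Duration groupWait) {
        return scrapeInterval.plus(evaluationInterval).plus(forDuration).plus(groupWait);
    }
}
```

With 30s, 1m, 5m, and 30s this gives 7 minutes, which is dominated by the `for` duration rather than the scrape interval.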
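Summary
Monitoring certificate expiry in a Pinpoint cluster with Prometheus, combined with automated alerting and a defined handling procedure, prevents outages caused by expired certificates. Rehearse the certificate renewal procedure regularly so that the monitoring stays reliable.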
Disclosure: parts of this article were generated with AI assistance (AIGC) and are provided for reference only.




