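Service Monitoring Dashboards: A Complete Guide to Custom Grafana Dashboards for pig-monitor
Project repository (pig): https://gitcode.com/gh_mirrors/pig/pig
Introduction: Pain Points of Microservice Monitoring and a Solution
In a distributed architecture, service monitoring faces three core challenges. Scattered service nodes leave blind spots. Fragmented metrics are hard to analyze together. Slow alert response hurts system availability. The pig-monitor module is built on Spring Boot Admin and is paired with Prometheus and Grafana for metric storage and visualization. Together they cover the whole chain, from service registration and discovery to metric analysis. This article walks through four steps: setting up the monitoring environment from scratch, building custom business dashboards, defining alert rules, and finally making the microservice architecture observable.
After reading this article you will know how to:
- Build a service monitoring platform on Spring Boot Admin
- Develop custom Grafana dashboards and configure their metrics
- Collect metrics across several dimensions (JVM, business, API)
- Design and implement production-grade alerting strategies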
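Architecture: Core Components of pig-monitor
Monitoring architecture diagram
Core dependencies
| Component | Version | Role | Key features |
|---|---|---|---|
| Spring Boot Admin | 2.6.x | Monitoring server | Service health checks, metadata management, log viewing |
| Prometheus | 2.30.x | Time-series storage | Metric scraping, aggregation, persistence |
| Grafana | 8.2.x | Visualization platform | Custom dashboards, metric queries, alert configuration |
| Spring Security | 5.6.x | Authentication | Access control and user management for the monitoring platform |
| Nacos Discovery | 2021.0.1.0 | Service discovery | Dynamic service registration and health-status awareness |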
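Environment Setup: From Deployment to Basic Configuration
1. Deploying the monitoring service
# 1. Clone the project
git clone https://gitcode.com/gh_mirrors/pig/pig.git
cd pig
# 2. Build the monitoring module (and the modules it depends on)
mvn clean package -pl pig-visual/pig-monitor -am -Dmaven.test.skip=true
# 3. Start the monitoring service
cd pig-visual/pig-monitor
java -jar target/pig-monitor.jar --spring.profiles.active=prod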
2. Core configuration file (application.yml)
# Basic server settings
server:
  port: 5001
  servlet:
    context-path: /monitor

spring:
  # Spring Boot Admin settings
  boot:
    admin:
      ui:
        title: "pig Microservice Monitoring Platform"
        brand: "<span>pig</span> Monitor"
      monitor:
        # values in milliseconds
        status-lifetime: 10000
        connect-timeout: 5000
        read-timeout: 5000
  # Nacos registration and configuration
  cloud:
    nacos:
      discovery:
        server-addr: ${NACOS_SERVER:127.0.0.1:8848}
        namespace: ${NACOS_NAMESPACE:public}
      config:
        server-addr: ${NACOS_SERVER:127.0.0.1:8848}
        file-extension: yaml
        namespace: ${NACOS_NAMESPACE:public}

# Expose monitoring endpoints and metrics
management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus,httptrace
  metrics:
    tags:
      # requires spring.application.name to be set
      application: ${spring.application.name}
    export:
      prometheus:
        enabled: true
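3. Security configuration (SecurityConfig)
import de.codecentric.boot.admin.server.config.AdminServerProperties;
import org.springframework.context.annotation.Configuration;
import org.springframework.security.config.annotation.web.builders.HttpSecurity;
import org.springframework.security.config.annotation.web.configuration.EnableWebSecurity;
import org.springframework.security.config.annotation.web.configuration.WebSecurityConfigurerAdapter;
import org.springframework.security.web.authentication.SavedRequestAwareAuthenticationSuccessHandler;
import org.springframework.security.web.csrf.CookieCsrfTokenRepository;

// Spring Security 5.6.x style; WebSecurityConfigurerAdapter is deprecated from 5.7 onward
@Configuration
@EnableWebSecurity
public class SecurityConfig extends WebSecurityConfigurerAdapter {

    private final String adminContextPath;

    public SecurityConfig(AdminServerProperties adminServerProperties) {
        this.adminContextPath = adminServerProperties.getContextPath();
    }

    @Override
    protected void configure(HttpSecurity http) throws Exception {
        // After login, go back to the originally requested page (or the admin home page)
        SavedRequestAwareAuthenticationSuccessHandler successHandler =
                new SavedRequestAwareAuthenticationSuccessHandler();
        successHandler.setTargetUrlParameter("redirectTo");
        successHandler.setDefaultTargetUrl(adminContextPath + "/");

        http.authorizeRequests()
                // Static assets are public
                .antMatchers(adminContextPath + "/assets/**").permitAll()
                // The login page is public
                .antMatchers(adminContextPath + "/login").permitAll()
                // Everything else requires authentication
                .anyRequest().authenticated()
                .and()
                // Form login
                .formLogin().loginPage(adminContextPath + "/login")
                .successHandler(successHandler).and()
                // Logout
                .logout().logoutUrl(adminContextPath + "/logout")
                .and()
                // CSRF protection; the token cookie must be readable by the admin UI's JavaScript
                .csrf().csrfTokenRepository(CookieCsrfTokenRepository.withHttpOnlyFalse())
                // Allow the UI to be embedded in iframes
                .and().headers().frameOptions().disable();
    }
}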
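Hands-on Grafana Dashboard Development
1. Data collection and metric exposure
Spring Boot applications expose monitoring endpoints through Actuator and collect metrics with Micrometer. With the Prometheus registry on the classpath, metrics are served at /actuator/prometheus for Prometheus to scrape: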
<!-- Add to pom.xml (versions are managed by Spring Boot) -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
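Custom business metrics:
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.springframework.stereotype.Component;

import java.util.function.Supplier;

@Component
public class BusinessMetricsCollector {

    private final MeterRegistry meterRegistry;
    private final Counter orderSuccessCounter;
    private final Counter orderFailedCounter;

    public BusinessMetricsCollector(MeterRegistry meterRegistry) {
        this.meterRegistry = meterRegistry;
        // Successful orders (exported to Prometheus as business_order_success_count_total)
        this.orderSuccessCounter = Counter.builder("business.order.success.count")
                .description("Total number of successful orders")
                .register(meterRegistry);
        // Failed orders (exported as business_order_failed_count_total)
        this.orderFailedCounter = Counter.builder("business.order.failed.count")
                .description("Total number of failed orders")
                .register(meterRegistry);
    }

    // Count a successful order
    public void incrementOrderSuccess() {
        orderSuccessCounter.increment();
    }

    // Count a failed order
    public void incrementOrderFailed() {
        orderFailedCounter.increment();
    }

    // Time an API call. The "api" tag (keep its values to a bounded set of API names)
    // and the published histogram buckets are what a p95 alert per API needs.
    public <T> T recordApiResponseTime(String api, Supplier<T> supplier) {
        return Timer.builder("business.api.response.time")
                .description("API response time")
                .tag("api", api)
                .publishPercentileHistogram()
                // register() returns the existing timer when name and tags match
                .register(meterRegistry)
                .record(supplier);
    }
}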
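The meter names above are Micrometer names, not the series names Prometheus stores. On export, Micrometer's Prometheus registry turns dots into underscores and appends _total to counters. It exports timers in seconds as _count, _sum and _max series, plus _bucket series when a percentile histogram is published. PromQL therefore has to use the converted names, because dotted names are not valid in a plain PromQL selector. The sketch below is a simplified, stdlib-only re-implementation of those rules for illustration. The class PrometheusNames is hypothetical and skips details such as base units on other meter types and character sanitizing. It is not Micrometer's PrometheusNamingConvention itself.

```java
import java.util.ArrayList;
import java.util.List;

public class PrometheusNames {

    // Micrometer names are dot-separated; Prometheus names use underscores
    public static String snakeCase(String micrometerName) {
        return micrometerName.replace('.', '_');
    }

    // Counters are exported with a "_total" suffix
    public static String counter(String micrometerName) {
        String name = snakeCase(micrometerName);
        return name.endsWith("_total") ? name : name + "_total";
    }

    // Timers are exported in seconds as several series sharing one base name
    public static List<String> timerSeries(String micrometerName, boolean percentileHistogram) {
        String base = snakeCase(micrometerName);
        if (!base.endsWith("_seconds")) {
            base += "_seconds";
        }
        List<String> series = new ArrayList<>();
        series.add(base + "_count");
        series.add(base + "_sum");
        series.add(base + "_max");
        if (percentileHistogram) {
            // Only present when the timer publishes histogram buckets
            series.add(base + "_bucket");
        }
        return series;
    }

    public static void main(String[] args) {
        System.out.println(counter("business.order.success.count")); // business_order_success_count_total
        System.out.println(counter("business.order.failed.count"));  // business_order_failed_count_total
        System.out.println(timerSeries("business.api.response.time", true));
    }
}
```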
2. Custom dashboard JSON configuration
The core configuration of the order business dashboard (key parts excerpted):
{
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": "-- Grafana --",
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"id": 1,
"iteration": 1629267345463,
"links": [],
"panels": [
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "Prometheus",
"fieldConfig": {
"defaults": {
"links": []
},
"overrides": []
},
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 0
},
"hiddenSeries": false,
"id": 2,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"nullPointMode": "null",
"options": {
"alertThreshold": true
},
"percentage": false,
"pluginVersion": "8.2.0",
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "increase(business_order_success_count_total[5m])",
"interval": "",
"legendFormat": "Successful orders",
"refId": "A"
},
{
"expr": "increase(business_order_failed_count_total[5m])",
"interval": "",
"legendFormat": "Failed orders",
"refId": "B"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Order volume trend",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "short",
"label": "Orders",
"logBase": 1,
"max": null,
"min": "0",
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
}
],
"refresh": "5s",
"schemaVersion": 27,
"style": "dark",
"tags": [],
"templating": {
"list": []
},
"time": {
"from": "now-6h",
"to": "now"
},
"timepicker": {},
"timezone": "",
"title": "Order Business Monitoring Dashboard",
"uid": "order-business-dashboard",
"version": 1
}
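The panel plots increase(...[5m]) instead of the raw counter for a specific reason. A counter only ever grows, and it drops back to zero when the service restarts. increase() turns it into "orders in the last 5 minutes" and compensates for such resets. The simplified model below (hypothetical class CounterIncrease) shows only the reset handling on a window of raw samples. It deliberately leaves out the extrapolation to the window edges that Prometheus applies, so real results can differ slightly and need not be integers.

```java
public class CounterIncrease {

    // Sum of increases between consecutive samples, treating any drop as a counter reset
    public static double increase(double[] samples) {
        double total = 0;
        for (int i = 1; i < samples.length; i++) {
            double delta = samples[i] - samples[i - 1];
            // After a reset the counter restarts from 0, so the new value itself is the increase
            total += delta >= 0 ? delta : samples[i];
        }
        return total;
    }

    public static void main(String[] args) {
        // Counter goes 10 -> 15 -> 20, the service restarts, then 3 -> 8
        double[] window = {10, 15, 20, 3, 8};
        System.out.println(increase(window)); // 18.0
    }
}
```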
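3. Multi-dimensional dashboard design
JVM monitoring dashboard
Core metrics:
- Heap memory usage (used/committed/max)
- Non-heap memory usage
- Garbage collection count and duration
- Thread state distribution (runnable/waiting/blocked)
- Class loading statistics (loaded/unloaded)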
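The JVM figures listed above come from the JDK's standard management beans. Micrometer's JVM binders (JvmMemoryMetrics, JvmGcMetrics, JvmThreadMetrics, ClassLoaderMetrics), which Spring Boot Actuator registers automatically, build on the same JMX sources. The stdlib-only sketch below (the class name JvmSnapshot is illustrative) reads each figure directly. That is handy when checking a dashboard panel against what the JVM itself reports.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

public class JvmSnapshot {

    // Thread state distribution, the breakdown a "thread states" panel shows
    public static Map<Thread.State, Integer> threadStates() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (ThreadInfo info : threads.getThreadInfo(threads.getAllThreadIds())) {
            if (info != null) { // a thread may exit between the two calls
                counts.merge(info.getThreadState(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        MemoryUsage nonHeap = memory.getNonHeapMemoryUsage();
        // getMax() returns -1 when no maximum is defined
        System.out.printf("heap used=%d committed=%d max=%d%n",
                heap.getUsed(), heap.getCommitted(), heap.getMax());
        System.out.printf("non-heap used=%d committed=%d%n",
                nonHeap.getUsed(), nonHeap.getCommitted());

        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Cumulative collection count and total collection time (ms) since JVM start
            System.out.printf("gc[%s] count=%d time=%dms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }

        System.out.println("thread states: " + threadStates());

        ClassLoadingMXBean classes = ManagementFactory.getClassLoadingMXBean();
        System.out.printf("classes loaded=%d unloaded=%d%n",
                classes.getLoadedClassCount(), classes.getUnloadedClassCount());
    }
}
```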
API performance dashboard
Key metrics and PromQL queries:
| Metric | PromQL query | Unit | Alert threshold |
|---|---|---|---|
| Average API response time | sum(rate(http_server_requests_seconds_sum[5m])) by (uri) / sum(rate(http_server_requests_seconds_count[5m])) by (uri) | seconds | > 0.5 |
| API error rate | sum(rate(http_server_requests_seconds_count{status=~"5.."}[5m])) by (uri) / sum(rate(http_server_requests_seconds_count[5m])) by (uri) | ratio (0 to 1) | > 0.01 (1%) |
| API throughput | sum(rate(http_server_requests_seconds_count[5m])) by (uri) | requests/s | < 10 |
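For the average response time, the ratio of sums (sum of rate(_sum) divided by sum of rate(_count)) gives a request-weighted mean. Averaging per-series ratios with avg() instead lets a nearly idle instance weigh as much as a busy one. The small stdlib sketch below compares the two aggregations. The class LatencyAggregation is hypothetical, and the rates for two instances of one URI are made-up numbers.

```java
public class LatencyAggregation {

    // Per-series rates for one instance of a URI
    public static final class SeriesRate {
        final double sumRate;   // rate(http_server_requests_seconds_sum[5m]): seconds spent per second
        final double countRate; // rate(http_server_requests_seconds_count[5m]): requests per second

        public SeriesRate(double sumRate, double countRate) {
            this.sumRate = sumRate;
            this.countRate = countRate;
        }
    }

    // sum(rate(_sum)) / sum(rate(_count)): request-weighted mean latency
    public static double ratioOfSums(SeriesRate... series) {
        double sum = 0;
        double count = 0;
        for (SeriesRate s : series) {
            sum += s.sumRate;
            count += s.countRate;
        }
        return count == 0 ? Double.NaN : sum / count;
    }

    // avg(rate(_sum) / rate(_count)): every series weighs the same, whatever its traffic
    public static double meanOfRatios(SeriesRate... series) {
        double total = 0;
        for (SeriesRate s : series) {
            total += s.sumRate / s.countRate;
        }
        return total / series.length;
    }

    public static void main(String[] args) {
        SeriesRate busy = new SeriesRate(10.0, 100.0); // 100 req/s at 0.1 s each
        SeriesRate idle = new SeriesRate(2.0, 1.0);    // 1 req/s at 2 s each
        System.out.println(ratioOfSums(busy, idle));   // about 0.119
        System.out.println(meanOfRatios(busy, idle));  // 1.05
    }
}
```

With these numbers the weighted mean is about 0.119 s. The mean of ratios is 1.05 s, dominated by the instance that serves one request per second.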
Alerting Strategy Design and Implementation
1. Prometheus alerting rules
groups:
  - name: business_alerts
    rules:
      - alert: HighOrderFailureRate
        # failed / (successful + failed) over the last 5 minutes
        expr: sum(rate(business_order_failed_count_total[5m])) / (sum(rate(business_order_success_count_total[5m])) + sum(rate(business_order_failed_count_total[5m]))) > 0.05
        for: 2m
        labels:
          severity: critical
          service: order-service
        annotations:
          summary: "Order failure rate too high"
          description: "Order failure rate has stayed above 5% for 2 minutes (current value: {{ $value | humanizePercentage }})"
          value: "{{ $value | humanizePercentage }}"
      - alert: ApiResponseTimeSlow
        # requires the timer to publish histogram buckets and carry an "api" tag
        expr: histogram_quantile(0.95, sum(rate(business_api_response_time_seconds_bucket[5m])) by (le, api)) > 1
        for: 5m
        labels:
          severity: warning
          service: api-service
        annotations:
          summary: "API response time slow"
          description: "p95 response time of API {{ $labels.api }} exceeds 1s (current value: {{ $value | humanizeDuration }})"
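histogram_quantile() never sees individual requests. It estimates the quantile from cumulative bucket counts, assuming observations are spread evenly within each bucket. The ApiResponseTimeSlow rule therefore only works if the timer publishes histogram buckets, for example via publishPercentileHistogram() on Micrometer's Timer builder. A plain timer exports no _bucket series. The sketch below is a simplified Java version of that interpolation. The class HistogramQuantile is illustrative. It assumes positive, ascending bounds ending in +Inf and skips Prometheus's handling of inconsistent buckets.

```java
public class HistogramQuantile {

    // upperBounds: ascending "le" values ending with +Inf; cumulativeCounts: matching *_bucket values
    public static double quantile(double q, double[] upperBounds, double[] cumulativeCounts) {
        int n = upperBounds.length;
        if (n < 2 || n != cumulativeCounts.length || !Double.isInfinite(upperBounds[n - 1])) {
            throw new IllegalArgumentException("need at least 2 buckets ending with +Inf");
        }
        double total = cumulativeCounts[n - 1];
        if (total == 0) {
            return Double.NaN; // no observations in the window
        }
        double rank = q * total;
        int i = 0;
        while (i < n - 1 && cumulativeCounts[i] < rank) {
            i++;
        }
        if (i == n - 1) {
            // Rank falls into the +Inf bucket: return the highest finite bound
            return upperBounds[n - 2];
        }
        double bucketStart = i == 0 ? 0.0 : upperBounds[i - 1];
        double countBefore = i == 0 ? 0.0 : cumulativeCounts[i - 1];
        double countInBucket = cumulativeCounts[i] - countBefore;
        // Linear interpolation inside the bucket that contains the rank
        return bucketStart + (upperBounds[i] - bucketStart) * (rank - countBefore) / countInBucket;
    }

    public static void main(String[] args) {
        double[] le = {0.1, 0.5, 1.0, Double.POSITIVE_INFINITY};
        double[] counts = {50, 90, 98, 100}; // cumulative counts over the rate window
        System.out.println(quantile(0.95, le, counts)); // 0.8125
    }
}
```

For these buckets the 95th percentile lands in the (0.5, 1.0] bucket and interpolates to 0.8125 s. A 99th percentile would fall into the +Inf bucket, so the highest finite bound (1.0) is returned.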
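2. Alert notification channels (Alertmanager)
route:
  receiver: 'default-receiver'
  group_by: ['alertname', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
receivers:
  - name: 'default-receiver'
    email_configs:
      - to: 'admin@example.com'
        send_resolved: true
        smarthost: 'smtp.example.com:587'
        from: 'monitor@example.com'
        auth_username: 'monitor@example.com'
        # placeholder; keep real credentials out of the config file
        auth_password: 'password'
        auth_identity: 'monitor@example.com'
    webhook_configs:
      # DingTalk expects its own payload format, so this is assumed to be a relay service
      # that converts Alertmanager's webhook JSON into DingTalk messages
      - url: 'http://dingtalk-webhook-service:8080/send'
        send_resolved: true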
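With group_by: ['alertname', 'service'], Alertmanager batches alerts that share both label values into one notification. The two ApiResponseTimeSlow alerts for different api labels below would arrive together. The timing settings work as follows:
- group_wait (30s): how long a new group waits for more alerts before the first notification
- group_interval (5m): spacing between updates to an existing group
- repeat_interval (4h): how often a notification is re-sent while alerts keep firing

The sketch models only the grouping step, not the timers. The class AlertGrouping and the sample alerts are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AlertGrouping {

    // Alerts with identical values for every group_by label end up in the same group
    public static Map<List<String>, List<Map<String, String>>> group(
            List<Map<String, String>> alerts, List<String> groupBy) {
        Map<List<String>, List<Map<String, String>>> groups = new LinkedHashMap<>();
        for (Map<String, String> alert : alerts) {
            List<String> key = new ArrayList<>();
            for (String label : groupBy) {
                // A missing label groups like an empty value
                key.add(alert.getOrDefault(label, ""));
            }
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(alert);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Map<String, String>> alerts = List.of(
                Map.of("alertname", "ApiResponseTimeSlow", "service", "api-service", "api", "/order/create"),
                Map.of("alertname", "ApiResponseTimeSlow", "service", "api-service", "api", "/order/pay"),
                Map.of("alertname", "HighOrderFailureRate", "service", "order-service"));
        group(alerts, List.of("alertname", "service"))
                .forEach((key, members) -> System.out.println(key + " -> " + members.size() + " alert(s)"));
    }
}
```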
Advanced Features: Custom Business Metrics
1. Method execution time monitoring
Non-intrusive method timing with AOP:
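import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.reflect.MethodSignature;
import org.springframework.stereotype.Component;

@Aspect
@Component
public class MethodExecutionTimeAspect {

    private final MeterRegistry meterRegistry;

    public MethodExecutionTimeAspect(MeterRegistry meterRegistry) {
        this.meterRegistry = meterRegistry;
    }

    // Binding the annotation as a parameter lets AspectJ resolve its type from the
    // method signature and passes the instance in directly (no reflective lookup)
    @Around("@annotation(monitorMethodTime)")
    public Object monitorMethodExecutionTime(ProceedingJoinPoint joinPoint,
                                             MonitorMethodTime monitorMethodTime) throws Throwable {
        MethodSignature signature = (MethodSignature) joinPoint.getSignature();
        String metricName = monitorMethodTime.metricName();
        if (metricName.isEmpty()) {
            metricName = "method.execution.time." + signature.getDeclaringTypeName() + "." + signature.getName();
        }
        Timer.Sample sample = Timer.start(meterRegistry);
        try {
            return joinPoint.proceed();
        } finally {
            // Stop in finally so that failed invocations are timed as well
            sample.stop(meterRegistry.timer(metricName,
                    "class", signature.getDeclaringTypeName(),
                    "method", signature.getName()));
        }
    }
}

// Custom annotation (MonitorMethodTime.java)
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
public @interface MonitorMethodTime {
    String metricName() default "";
}

// Usage example
@Service
public class OrderServiceImpl implements OrderService {
    @Override
    @MonitorMethodTime(metricName = "business.order.create.time")
    public OrderVO createOrder(OrderDTO orderDTO) {
        // ... order creation logic ...
    }
}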
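The aspect relies on Spring AOP proxies. To see the same idea without Spring, the sketch below uses a plain JDK dynamic proxy (java.lang.reflect.Proxy) in place of AspectJ. It checks the implementation method for a runtime-retained annotation, times the call with System.nanoTime(), and records the result in finally. Everything in it is a stand-in for illustration: the nested OrderService types replace the real service, and the in-memory maps replace a Micrometer Timer. Like Spring AOP, it only intercepts calls made through the proxy. A method calling another annotated method on this is not timed.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class MethodTimingProxyDemo {

    @Target(ElementType.METHOD)
    @Retention(RetentionPolicy.RUNTIME)
    public @interface MonitorMethodTime {
        String metricName() default "";
    }

    public interface OrderService {
        String createOrder(String sku);
    }

    public static class OrderServiceImpl implements OrderService {
        @Override
        @MonitorMethodTime(metricName = "business.order.create.time")
        public String createOrder(String sku) {
            return "order-" + sku; // stand-in for real order creation
        }
    }

    // In-memory stand-in for a Micrometer Timer: call count and total time per metric name
    private static final Map<String, LongAdder> CALLS = new ConcurrentHashMap<>();
    private static final Map<String, LongAdder> TOTAL_NANOS = new ConcurrentHashMap<>();

    public static long callCount(String metricName) {
        LongAdder adder = CALLS.get(metricName);
        return adder == null ? 0 : adder.sum();
    }

    public static long totalNanos(String metricName) {
        LongAdder adder = TOTAL_NANOS.get(metricName);
        return adder == null ? 0 : adder.sum();
    }

    @SuppressWarnings("unchecked")
    public static <T> T timed(T target, Class<T> iface) {
        InvocationHandler handler = (proxy, method, args) -> {
            // The annotation sits on the implementation method, not on the interface method
            Method impl = target.getClass().getMethod(method.getName(), method.getParameterTypes());
            MonitorMethodTime annotation = impl.getAnnotation(MonitorMethodTime.class);
            if (annotation == null) {
                return invoke(method, target, args);
            }
            String metricName = annotation.metricName().isEmpty()
                    ? "method.execution.time." + impl.getDeclaringClass().getName() + "." + impl.getName()
                    : annotation.metricName();
            long start = System.nanoTime();
            try {
                return invoke(method, target, args);
            } finally {
                // Record in finally so failed calls are timed too, as in the aspect
                CALLS.computeIfAbsent(metricName, k -> new LongAdder()).increment();
                TOTAL_NANOS.computeIfAbsent(metricName, k -> new LongAdder()).add(System.nanoTime() - start);
            }
        };
        return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[]{iface}, handler);
    }

    // Unwrap the reflection wrapper so callers see the original exception
    private static Object invoke(Method method, Object target, Object[] args) throws Throwable {
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause();
        }
    }

    public static void main(String[] args) {
        OrderService service = timed(new OrderServiceImpl(), OrderService.class);
        System.out.println(service.createOrder("A1"));
        System.out.println(callCount("business.order.create.time"));
    }
}
```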
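2. Business status dashboard
Taking e-commerce inventory as an example, the same approach can drive a multi-level low-stock warning dashboard.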
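One possible way to drive such a panel, sketched under stated assumptions: map each item's current stock to a small set of levels and expose the numeric level code. Grafana thresholds or value mappings can then color that code. The thresholds (100 and 20 units), the level names and the StockLevel class are all hypothetical. In a real service the code would typically be published per item through a Micrometer gauge, with the set of tagged items kept bounded.

```java
public class StockLevel {

    public enum Level {
        NORMAL(0), WARNING(1), CRITICAL(2), OUT_OF_STOCK(3);

        public final int code; // numeric value a dashboard panel could display and color

        Level(int code) {
            this.code = code;
        }
    }

    // Hypothetical thresholds, in units of stock
    static final long WARNING_THRESHOLD = 100;
    static final long CRITICAL_THRESHOLD = 20;

    public static Level classify(long stock) {
        if (stock <= 0) {
            return Level.OUT_OF_STOCK;
        }
        if (stock <= CRITICAL_THRESHOLD) {
            return Level.CRITICAL;
        }
        if (stock <= WARNING_THRESHOLD) {
            return Level.WARNING;
        }
        return Level.NORMAL;
    }

    public static void main(String[] args) {
        for (long stock : new long[]{500, 80, 15, 0}) {
            Level level = classify(stock);
            System.out.println(stock + " -> " + level + " (" + level.code + ")");
        }
    }
}
```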
Deployment and Operations Best Practices
1. Docker deployment
pig-monitor Dockerfile:
FROM openjdk:8-jdk-alpine
WORKDIR /app
COPY target/pig-monitor.jar app.jar
EXPOSE 5001
ENTRYPOINT ["java", "-jar", "app.jar", "--spring.profiles.active=prod"]
Docker Compose orchestration:
version: '3.8'
services:
  pig-monitor:
    build: ./pig-visual/pig-monitor
    ports:
      - "5001:5001"
    environment:
      - NACOS_SERVER=192.168.1.100:8848
      - NACOS_NAMESPACE=prod
    restart: always
    networks:
      - pig-network
  prometheus:
    image: prom/prometheus:v2.30.3
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
    ports:
      - "9090:9090"
    restart: always
    networks:
      - pig-network
  grafana:
    image: grafana/grafana:8.2.0
    volumes:
      - grafana-data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=secret
    restart: always
    networks:
      - pig-network
networks:
  pig-network:
    driver: bridge
volumes:
  prometheus-data:
  grafana-data:
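2. Performance optimization for the monitoring platform
- Metric collection
  - Choose scrape intervals deliberately: for example 15s for core metrics and up to 60s for non-critical ones (Prometheus's built-in default scrape_interval is 1m)
  - Keep high-cardinality labels (such as userId) out of tags, or bound them by hashing into a fixed number of buckets or by sampling
  - Expose only the metrics you need, using an allowlist
- Storage
  - Set a sensible data retention period in Prometheus (--storage.tsdb.retention.time=15d)
  - Tier storage: keep recent data locally and archive historical data to object storage (for example with Thanos or a similar long-term storage system)
  - Use remote_write/remote_read to connect Prometheus to remote storage; federation is a separate mechanism based on scraping the /federate endpoint
- Query performance
  - Avoid full queries over very large time ranges
  - Precompute frequently used queries with recording rules
  - Load dashboard panels on demand instead of rendering too many at once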
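Hashing alone does not reduce cardinality, because every distinct userId still yields a distinct hash. Folding the hash into a fixed number of buckets does. Below is a minimal sketch, assuming a hypothetical helper LabelBucketing and 16 buckets. It caps the number of distinct tag values, and hence time series per metric, at the bucket count. The cost is that individual users can no longer be told apart.

```java
import java.util.HashSet;
import java.util.Set;

public class LabelBucketing {

    // Map a high-cardinality value onto one of bucketCount stable tag values
    public static String bucket(String highCardinalityValue, int bucketCount) {
        if (bucketCount <= 0) {
            throw new IllegalArgumentException("bucketCount must be positive");
        }
        // floorMod keeps the index non-negative even when hashCode() is negative
        int index = Math.floorMod(highCardinalityValue.hashCode(), bucketCount);
        return "bucket-" + index;
    }

    public static void main(String[] args) {
        Set<String> distinctTagValues = new HashSet<>();
        for (int i = 0; i < 100_000; i++) {
            distinctTagValues.add(bucket("user-" + i, 16));
        }
        System.out.println(distinctTagValues.size()); // never more than 16
    }
}
```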
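Summary and Outlook
pig-monitor combines Spring Boot Admin with Prometheus and Grafana. The methods in this article cover the full workflow from environment setup to custom dashboard development. They also let you design multi-dimensional metrics and alerting strategies for your own business scenarios. As cloud-native technology evolves, monitoring is likely to develop in the following directions:
- Unified observability: combining logging, metrics, and tracing for end-to-end visibility
- Intelligent monitoring: machine-learning-based anomaly detection and root-cause analysis
- Service mesh monitoring: traffic and performance analysis at the service mesh layer
Keep this article as a reference for monitoring platform development, and follow the project repository for updates. The next article will cover the design and implementation of a distributed tracing system.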
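Appendix: Recommended Grafana plugins
| Plugin | Description | Use case |
|---|---|---|
| Pie Chart | Pie chart visualization | Resource share analysis |
| Clock Panel | Clock and time display | Multi-region deployment monitoring |
| Alert List | Alert summary | Operations wall displays |
| Node Graph | Node relationship visualization | Service dependency analysis |
| Status Panel | Status indicator lights | Service health display |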
Disclosure: parts of this article were generated with AI assistance (AIGC) and are provided for reference only.



