Service Monitoring Dashboards: A Complete Guide to Custom Grafana Dashboards for pig-monitor

Project repository (pig): https://gitcode.com/gh_mirrors/pig/pig

Introduction: Pain Points of Microservice Monitoring and a Solution

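In a distributed system, service monitoring faces three core challenges: scattered service nodes leave monitoring blind spots, fragmented metrics are hard to analyze in one place, and slow alert response hurts system availability. The pig-monitor module is built on Spring Boot Admin and, combined with Grafana for visualization, provides end-to-end monitoring from service registration and discovery to metric analysis. This article explains step by step how to set up the monitoring environment from scratch, build custom business dashboards, and define alert rules, so that the microservice architecture becomes observable.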

After reading this article you will know how to:

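  • Set up a service monitoring platform based on Spring Boot Admin
  • Build custom Grafana dashboards and configure their metrics
  • Collect metrics across several dimensions (JVM, business, and API)
  • Design and implement production-grade alerting policies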

Technical Architecture: Core Components of pig-monitor

Monitoring System Architecture

[Figure: monitoring system architecture diagram]

Core Dependency Components

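| Component | Version | Role | Core functions |
|---|---|---|---|
| Spring Boot Admin | 2.6.x | Monitoring server | Service health checks, metadata management, log viewing |
| Prometheus | 2.30.x | Time-series storage | Metric scraping, aggregation, persistence |
| Grafana | 8.2.x | Visualization platform | Custom dashboards, metric queries, alert configuration |
| Spring Security | 5.6.x | Security and authentication | Access control and user management for the monitoring platform |
| Nacos Discovery | 2021.0.1.0 | Service discovery | Dynamic service registration and health-status awareness |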

Environment Setup: From Deployment to Basic Configuration

1. Deploying the monitoring service

# 1. Clone the project
git clone https://gitcode.com/gh_mirrors/pig/pig.git
cd pig

# 2. Build the monitoring module (-am also builds the modules it depends on; tests are skipped)
mvn clean package -pl pig-visual/pig-monitor -am -Dmaven.test.skip=true

# 3. Start the monitoring service
cd pig-visual/pig-monitor
java -jar target/pig-monitor.jar --spring.profiles.active=prod

2. Core configuration file (application.yml)

# Basic server settings
server:
  port: 5001
  servlet:
    context-path: /monitor

# Spring Boot Admin settings
spring:
  boot:
    admin:
      ui:
        title: "pig Microservice Monitoring Platform"
        brand: "<span>pig</span> Monitor"
      monitor:
        status-lifetime: 10000
        connect-timeout: 5000
        read-timeout: 5000
  
  # Nacos service registration and configuration
  cloud:
    nacos:
      discovery:
        server-addr: ${NACOS_SERVER:127.0.0.1:8848}
        namespace: ${NACOS_NAMESPACE:public}
      config:
        server-addr: ${NACOS_SERVER:127.0.0.1:8848}
        file-extension: yaml
        namespace: ${NACOS_NAMESPACE:public}

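# Expose monitoring endpoints and metrics
management:
  endpoints:
    web:
      exposure:
        # httptrace only works if an HttpTraceRepository bean is defined (Spring Boot 2.2+)
        include: health,info,metrics,prometheus,httptrace
  metrics:
    tags:
      # Common tag added to every meter, so series can be filtered per service
      application: ${spring.application.name}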
    export:
      prometheus:
        enabled: true

3. Security configuration (SecurityConfig)

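import de.codecentric.boot.admin.server.config.AdminServerProperties;
import org.springframework.context.annotation.Configuration;
import org.springframework.security.config.annotation.web.builders.HttpSecurity;
import org.springframework.security.config.annotation.web.configuration.EnableWebSecurity;
import org.springframework.security.config.annotation.web.configuration.WebSecurityConfigurerAdapter;
import org.springframework.security.web.authentication.SavedRequestAwareAuthenticationSuccessHandler;
import org.springframework.security.web.csrf.CookieCsrfTokenRepository;

@Configuration
@EnableWebSecurity
public class SecurityConfig extends WebSecurityConfigurerAdapter {

    private final String adminContextPath;

    public SecurityConfig(AdminServerProperties adminServerProperties) {
        this.adminContextPath = adminServerProperties.getContextPath();
    }

    @Override
    protected void configure(HttpSecurity http) throws Exception {
        // After login, redirect to the originally requested page (or the admin home page)
        SavedRequestAwareAuthenticationSuccessHandler successHandler =
            new SavedRequestAwareAuthenticationSuccessHandler();
        successHandler.setTargetUrlParameter("redirectTo");
        successHandler.setDefaultTargetUrl(adminContextPath + "/");

        http.authorizeRequests()
            // Static resources are publicly accessible
            .antMatchers(adminContextPath + "/assets/**").permitAll()
            // The login page is publicly accessible
            .antMatchers(adminContextPath + "/login").permitAll()
            // All other requests require authentication
            .anyRequest().authenticated()
            .and()
            // Form login
            .formLogin().loginPage(adminContextPath + "/login")
            .successHandler(successHandler).and()
            // Logout
            .logout().logoutUrl(adminContextPath + "/logout")
            .and()
            // CSRF protection; the token cookie is not HttpOnly so the
            // Spring Boot Admin UI (JavaScript) can read it and send it back
            .csrf().csrfTokenRepository(CookieCsrfTokenRepository.withHttpOnlyFalse())
            // Allow the UI to be embedded in iframes
            .and().headers().frameOptions().disable();
    }
}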

Grafana Dashboard Development in Practice

1. Metric collection and exposure

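Spring Boot applications expose monitoring endpoints through Actuator, and Micrometer collects the metrics. With micrometer-registry-prometheus on the classpath, the metrics are served in Prometheus text format at /actuator/prometheus: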

<!-- Add to pom.xml (versions are managed by Spring Boot's dependency management) -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>

Implementing custom business metrics:

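import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.springframework.stereotype.Component;

import java.util.function.Supplier;

@Component
public class BusinessMetricsCollector {
    private final Counter orderSuccessCounter;
    private final Counter orderFailedCounter;
    private final Timer apiResponseTimer;

    public BusinessMetricsCollector(MeterRegistry meterRegistry) {
        // Successful orders (exported to Prometheus as business_order_success_count_total)
        this.orderSuccessCounter = Counter.builder("business.order.success.count")
                .description("Total number of successful orders")
                .register(meterRegistry);

        // Failed orders (exported as business_order_failed_count_total)
        this.orderFailedCounter = Counter.builder("business.order.failed.count")
                .description("Total number of failed orders")
                .register(meterRegistry);

        // API response time; publishPercentileHistogram() also exports the
        // _bucket series that histogram_quantile() needs
        this.apiResponseTimer = Timer.builder("business.api.response.time")
                .description("API response time")
                .publishPercentileHistogram()
                .register(meterRegistry);
    }

    // Count a successful order
    public void incrementOrderSuccess() {
        orderSuccessCounter.increment();
    }

    // Count a failed order
    public void incrementOrderFailed() {
        orderFailedCounter.increment();
    }

    // Run the supplier and record its execution time on the API timer
    public <T> T recordApiResponseTime(Supplier<T> supplier) {
        return apiResponseTimer.record(supplier);
    }
}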
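Micrometer meter names use dots, but Prometheus metric names cannot contain them, so the Prometheus registry renames meters on export. The class below is a simplified sketch of that convention for illustration only: PrometheusNames and its MeterType enum are hypothetical, and Micrometer's real PrometheusNamingConvention handles more cases (base units, invalid leading characters, and so on).

```java
import java.util.regex.Pattern;

/**
 * Hypothetical, simplified model of how Micrometer's Prometheus registry
 * renames dotted meter names (not the real PrometheusNamingConvention).
 */
final class PrometheusNames {
    enum MeterType { COUNTER, TIMER, GAUGE }

    // Characters that are not allowed in a Prometheus metric name
    private static final Pattern INVALID = Pattern.compile("[^a-zA-Z0-9_:]");

    private PrometheusNames() {
    }

    static String toPrometheus(String meterName, MeterType type) {
        // Dots (and other invalid characters) become underscores
        String name = INVALID.matcher(meterName).replaceAll("_");
        switch (type) {
            case COUNTER:
                // Counters get a _total suffix
                return name.endsWith("_total") ? name : name + "_total";
            case TIMER:
                // Timers are exported in seconds; Prometheus then shows
                // <name>_count, <name>_sum and <name>_max series
                return name.endsWith("_seconds") ? name : name + "_seconds";
            default:
                return name;
        }
    }
}
```

With this convention, business.order.success.count is queried in PromQL as business_order_success_count_total, and the timer appears as business_api_response_time_seconds_count, _sum and _max (plus _bucket when percentile histograms are enabled).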

2. Custom dashboard JSON configuration

Below is the core configuration of the order business dashboard (key parts excerpted):

{
  "annotations": {
    "list": [
      {
        "builtIn": 1,
        "datasource": "-- Grafana --",
        "enable": true,
        "hide": true,
        "iconColor": "rgba(0, 211, 255, 1)",
        "name": "Annotations & Alerts",
        "type": "dashboard"
      }
    ]
  },
  "editable": true,
  "gnetId": null,
  "graphTooltip": 0,
  "id": 1,
  "iteration": 1629267345463,
  "links": [],
  "panels": [
    {
      "aliasColors": {},
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "Prometheus",
      "fieldConfig": {
        "defaults": {
          "links": []
        },
        "overrides": []
      },
      "fill": 1,
      "fillGradient": 0,
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 0,
        "y": 0
      },
      "hiddenSeries": false,
      "id": 2,
      "legend": {
        "avg": false,
        "current": false,
        "max": false,
        "min": false,
        "show": true,
        "total": false,
        "values": false
      },
      "lines": true,
      "linewidth": 1,
      "nullPointMode": "null",
      "options": {
        "alertThreshold": true
      },
      "percentage": false,
      "pluginVersion": "8.2.0",
      "pointradius": 2,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "expr": "increase(business_order_success_count_total[5m])",
          "interval": "",
          "legendFormat": "Successful orders",
          "refId": "A"
        },
        {
          "expr": "increase(business_order_failed_count_total[5m])",
          "interval": "",
          "legendFormat": "Failed orders",
          "refId": "B"
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "Order Volume Trend",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "format": "short",
          "label": "Orders",
          "logBase": 1,
          "max": null,
          "min": "0",
          "show": true
        },
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    }
  ],
  "refresh": "5s",
  "schemaVersion": 27,
  "style": "dark",
  "tags": [],
  "templating": {
    "list": []
  },
  "time": {
    "from": "now-6h",
    "to": "now"
  },
  "timepicker": {},
  "timezone": "",
  "title": "Order Business Monitoring Dashboard",
  "uid": "order-business-dashboard",
  "version": 1
}
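Each panel target plots increase(...[5m]), that is, how much a counter grew inside a sliding five-minute window. The sketch below shows the core of that calculation over the raw samples of one window, including counter resets (for example after a service restart). It is a simplified model, not Prometheus's code: the real increase() also extrapolates to the window boundaries, so its results are usually not whole numbers.

```java
/**
 * Simplified model of PromQL increase() over the samples of one window.
 * Unlike Prometheus, it does not extrapolate to the window boundaries.
 */
final class CounterIncrease {
    private CounterIncrease() {
    }

    static double increase(double[] samples) {
        double total = 0.0;
        for (int i = 1; i < samples.length; i++) {
            double delta = samples[i] - samples[i - 1];
            if (delta >= 0) {
                total += delta;
            } else {
                // The counter went down, so it was reset (e.g. the service restarted);
                // everything counted since the reset is new increase
                total += samples[i];
            }
        }
        return total;
    }
}
```

For samples 10, 15, 22, 3, 8 the result is 5 + 7 + 3 + 5 = 20: the drop from 22 to 3 is treated as a reset rather than a negative change.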

3. Designing multi-dimensional monitoring dashboards

JVM monitoring dashboard

Core metrics:

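  • Heap memory usage (used/committed/max)
  • Non-heap memory usage
  • Garbage collection count and duration
  • Thread state distribution (runnable/waiting/blocked)
  • Class loading statistics (loaded/unloaded)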
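Similar data is available from the JVM's standard management interfaces. As a plain-JDK illustration of where such numbers come from, the hypothetical JvmSnapshot class below reads them directly from the java.lang.management MXBeans; in a Spring Boot service you would normally rely on Micrometer's JVM binders rather than code like this.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical helper reading JVM metrics straight from the platform MXBeans. */
final class JvmSnapshot {
    private JvmSnapshot() {
    }

    /** Heap usage: getUsed(), getCommitted(), getMax() (max is -1 if undefined). */
    static MemoryUsage heap() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    }

    /** Non-heap usage (e.g. metaspace and code cache on HotSpot). */
    static MemoryUsage nonHeap() {
        return ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage();
    }

    /** {total collection count, total collection time in ms} across all collectors. */
    static long[] gcTotals() {
        long count = 0;
        long timeMs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Each value is -1 if the collector does not report it
            count += Math.max(0, gc.getCollectionCount());
            timeMs += Math.max(0, gc.getCollectionTime());
        }
        return new long[] {count, timeMs};
    }

    /** Number of live threads per state (RUNNABLE, WAITING, BLOCKED, ...). */
    static Map<Thread.State, Integer> threadStates() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> byState = new EnumMap<>(Thread.State.class);
        for (ThreadInfo info : threads.getThreadInfo(threads.getAllThreadIds())) {
            if (info != null) { // null if the thread ended in the meantime
                byState.merge(info.getThreadState(), 1, Integer::sum);
            }
        }
        return byState;
    }

    /** Currently loaded, total loaded and unloaded class counts. */
    static ClassLoadingMXBean classLoading() {
        return ManagementFactory.getClassLoadingMXBean();
    }
}
```

With Actuator and the Prometheus registry on the classpath, Micrometer 1.x exports comparable series automatically, under names such as jvm_memory_used_bytes, jvm_gc_pause_seconds, jvm_threads_states_threads and jvm_classes_loaded_classes.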


API performance monitoring dashboard

Key metrics and their PromQL queries:

| Metric | PromQL query | Unit | Alert threshold |
|---|---|---|---|
| Average API response time | sum(rate(http_server_requests_seconds_sum[5m])) by (uri) / sum(rate(http_server_requests_seconds_count[5m])) by (uri) | s | > 0.5 |
| API error rate | sum(rate(http_server_requests_seconds_count{status=~"5.."}[5m])) by (uri) / sum(rate(http_server_requests_seconds_count[5m])) by (uri) * 100 | % | > 1 |
| API throughput | sum(rate(http_server_requests_seconds_count[5m])) by (uri) | requests/s | < 10 |
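The average-latency and error-rate queries are ratios of counter rates. Because both rates are taken over the same window, rate(sum) / rate(count) reduces to the change in _sum divided by the change in _count between two scrapes. The sketch below makes that arithmetic explicit for a single uri; the RequestStats class and the sample values are hypothetical, and counter resets are ignored for brevity.

```java
/** Hypothetical snapshot of http_server_requests_seconds for one uri at one scrape. */
final class RequestStats {
    final double sumSeconds;  // http_server_requests_seconds_sum
    final double count;       // http_server_requests_seconds_count (all statuses)
    final double errorCount;  // http_server_requests_seconds_count{status=~"5.."}

    RequestStats(double sumSeconds, double count, double errorCount) {
        this.sumSeconds = sumSeconds;
        this.count = count;
        this.errorCount = errorCount;
    }

    /** Average latency between two scrapes: delta(sum) / delta(count), in seconds. */
    static double avgLatencySeconds(RequestStats before, RequestStats after) {
        double requests = after.count - before.count;
        return requests == 0 ? Double.NaN : (after.sumSeconds - before.sumSeconds) / requests;
    }

    /** Share of 5xx responses between two scrapes, in percent. */
    static double errorRatePercent(RequestStats before, RequestStats after) {
        double requests = after.count - before.count;
        return requests == 0 ? Double.NaN : 100.0 * (after.errorCount - before.errorCount) / requests;
    }

    /** Requests per second over the window, like rate(..._count[window]). */
    static double throughputPerSecond(RequestStats before, RequestStats after, double windowSeconds) {
        return (after.count - before.count) / windowSeconds;
    }
}
```

For example, if _sum grows from 10 s to 70 s while _count grows from 100 to 400 over a 5-minute window, the average latency is 60 / 300 = 0.2 s (below the 0.5 s threshold) and the throughput is 300 / 300 = 1 request/s.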

Alerting Strategy Design and Implementation

1. Prometheus alert rule configuration

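groups:
- name: business_alerts
  rules:
  # Metric names are the Prometheus names Micrometer exports:
  # dots become underscores and counters get a _total suffix
  - alert: HighOrderFailureRate
    expr: sum(rate(business_order_failed_count_total[5m])) / (sum(rate(business_order_success_count_total[5m])) + sum(rate(business_order_failed_count_total[5m]))) > 0.05
    for: 2m
    labels:
      severity: critical
      service: order-service
    annotations:
      summary: "Order failure rate is too high"
      description: "Order failure rate has stayed above 5% for 2 minutes (current value: {{ $value | humanizePercentage }})"
      value: "{{ $value | humanizePercentage }}"

  # Requires the timer to publish histogram buckets (_bucket series)
  - alert: ApiResponseTimeSlow
    expr: histogram_quantile(0.95, sum(rate(business_api_response_time_seconds_bucket[5m])) by (le, application)) > 1
    for: 5m
    labels:
      severity: warning
      service: api-service
    annotations:
      summary: "API response time is slow"
      description: "95th-percentile API response time of {{ $labels.application }} exceeds 1 second (current value: {{ $value | humanizeDuration }})"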
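histogram_quantile() estimates a percentile from cumulative bucket counts by linear interpolation inside the bucket that contains the requested rank. The sketch below is a simplified Java re-implementation of that estimation for illustration; it skips edge cases such as quantiles outside [0, 1] and empty histograms. The _bucket series it relies on only exist if the timer publishes a histogram, which in Micrometer is enabled with publishPercentileHistogram() on the timer builder.

```java
/** Simplified re-implementation of PromQL histogram_quantile() for illustration. */
final class HistogramQuantile {
    private HistogramQuantile() {
    }

    /**
     * @param q                quantile, e.g. 0.95
     * @param upperBounds      bucket upper bounds ("le" label), ascending, last = +Inf
     * @param cumulativeCounts cumulative observation count of each bucket
     */
    static double quantile(double q, double[] upperBounds, double[] cumulativeCounts) {
        int n = upperBounds.length;
        if (n < 2 || n != cumulativeCounts.length || upperBounds[n - 1] != Double.POSITIVE_INFINITY) {
            throw new IllegalArgumentException("need matching arrays with at least two buckets, the last being +Inf");
        }
        // Rank of the requested observation; the +Inf bucket holds the total count
        double rank = q * cumulativeCounts[n - 1];
        // First finite bucket whose cumulative count reaches the rank
        int b = 0;
        while (b < n - 1 && cumulativeCounts[b] < rank) {
            b++;
        }
        if (b == n - 1) {
            // Rank lies in the +Inf bucket: return the highest finite upper bound
            return upperBounds[n - 2];
        }
        double bucketStart = b == 0 ? 0.0 : upperBounds[b - 1];
        double countBelow = b == 0 ? 0.0 : cumulativeCounts[b - 1];
        // Assume observations are spread evenly inside the bucket and interpolate linearly
        return bucketStart + (upperBounds[b] - bucketStart)
                * ((rank - countBelow) / (cumulativeCounts[b] - countBelow));
    }
}
```

For example, with bucket bounds 0.1, 0.5, 1, 2, +Inf and cumulative counts 50, 80, 90, 98, 100, the 95th percentile falls in the (1, 2] bucket and is estimated at 1.625 s, which would satisfy the > 1 condition of ApiResponseTimeSlow.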

2. Alert notification channels (Alertmanager configuration)

route:
  receiver: 'default-receiver'
  group_by: ['alertname', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  
receivers:
- name: 'default-receiver'
  email_configs:
  - to: 'admin@example.com'
    send_resolved: true
    smarthost: 'smtp.example.com:587'
    from: 'monitor@example.com'
    auth_username: 'monitor@example.com'
    auth_password: 'password'
    auth_identity: 'monitor@example.com'
  
  webhook_configs:
  - url: 'http://dingtalk-webhook-service:8080/send'
    send_resolved: true
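Alertmanager first groups incoming alerts by the group_by labels, then waits group_wait (30s here) before sending the first notification for a new group, waits group_interval (5m) before notifying about new alerts added to an already-notified group, and re-sends a notification for alerts that are still firing after repeat_interval (4h). The sketch below models only the grouping step; the AlertGrouping class is hypothetical and alerts are represented as plain label maps.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of Alertmanager's group_by step. */
final class AlertGrouping {
    private AlertGrouping() {
    }

    /**
     * Groups alerts (each a map of label name to value) by the values of the
     * group_by labels; every resulting group becomes one notification.
     */
    static Map<List<String>, List<Map<String, String>>> group(
            List<Map<String, String>> alerts, List<String> groupBy) {
        Map<List<String>, List<Map<String, String>>> groups = new LinkedHashMap<>();
        for (Map<String, String> alert : alerts) {
            List<String> key = new ArrayList<>();
            for (String label : groupBy) {
                // An alert without the label is grouped under an empty value
                key.add(alert.getOrDefault(label, ""));
            }
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(alert);
        }
        return groups;
    }
}
```

With group_by: ['alertname', 'service'], two HighOrderFailureRate alerts from different instances of order-service end up in a single notification, while ApiResponseTimeSlow from api-service is sent separately.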

Advanced: Custom Business Monitoring Metrics

1. Monitoring method execution time

Non-intrusive method timing with AOP:

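import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.Signature;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.springframework.stereotype.Component;

@Aspect
@Component
public class MethodExecutionTimeAspect {
    private final MeterRegistry meterRegistry;

    public MethodExecutionTimeAspect(MeterRegistry meterRegistry) {
        this.meterRegistry = meterRegistry;
    }

    // Binding the annotation as an advice parameter avoids a reflective lookup,
    // which can return null when the signature points at an interface method
    @Around("@annotation(monitorMethodTime)")
    public Object monitorMethodExecutionTime(ProceedingJoinPoint joinPoint,
                                             MonitorMethodTime monitorMethodTime) throws Throwable {
        Signature signature = joinPoint.getSignature();

        // Use one metric name by default and tell methods apart by tags, so the
        // number of metric names does not grow with every annotated method
        String metricName = monitorMethodTime.metricName();
        if (metricName.isEmpty()) {
            metricName = "method.execution.time";
        }

        Timer.Sample sample = Timer.start(meterRegistry);
        try {
            return joinPoint.proceed();
        } finally {
            // Record the duration even if the method throws
            sample.stop(meterRegistry.timer(metricName,
                "class", signature.getDeclaringTypeName(),
                "method", signature.getName()));
        }
    }
}

// Custom annotation (MonitorMethodTime.java)
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
public @interface MonitorMethodTime {
    String metricName() default "";
}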

// Usage example
@Service
public class OrderServiceImpl implements OrderService {
    @Override
    @MonitorMethodTime(metricName = "business.order.create.time")
    public OrderVO createOrder(OrderDTO orderDTO) {
        // Order creation logic (omitted)
    }
}
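Spring AOP applies the aspect through a proxy around the annotated bean (a JDK dynamic proxy for interfaces, or a CGLIB subclass). To show the same annotation-driven around-advice idea without Spring or Micrometer, the sketch below uses java.lang.reflect.Proxy and records call counts and total nanoseconds in plain maps. The TimedMethod annotation, TimingProxy, and the OrderCreator example service are all hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical marker annotation, playing the role of @MonitorMethodTime. */
@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
@interface TimedMethod {
    String metricName() default "";
}

/** Wraps an interface implementation; annotated methods get call count and total time recorded. */
final class TimingProxy {
    static final Map<String, LongAdder> CALLS = new ConcurrentHashMap<>();
    static final Map<String, LongAdder> TOTAL_NANOS = new ConcurrentHashMap<>();

    private TimingProxy() {
    }

    static <T> T wrap(T target, Class<T> iface) {
        Object proxy = Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface},
            (p, method, args) -> {
                // Read the annotation from the implementation class, not the interface
                Method impl = target.getClass().getMethod(method.getName(), method.getParameterTypes());
                TimedMethod timed = impl.getAnnotation(TimedMethod.class);
                if (timed == null) {
                    return invoke(target, method, args); // not annotated: call straight through
                }
                String name = timed.metricName().isEmpty() ? "method.execution.time" : timed.metricName();
                long start = System.nanoTime();
                try {
                    return invoke(target, method, args);
                } finally {
                    // Recorded even if the method throws, like the aspect's finally block
                    CALLS.computeIfAbsent(name, k -> new LongAdder()).increment();
                    TOTAL_NANOS.computeIfAbsent(name, k -> new LongAdder()).add(System.nanoTime() - start);
                }
            });
        return iface.cast(proxy);
    }

    private static Object invoke(Object target, Method method, Object[] args) throws Throwable {
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause(); // rethrow the exception thrown by the target method
        }
    }
}

/** Hypothetical service interface and implementation used to try the proxy. */
interface OrderCreator {
    String create(String sku);
}

final class SimpleOrderCreator implements OrderCreator {
    @Override
    @TimedMethod(metricName = "business.order.create.time")
    public String create(String sku) {
        return "order-for-" + sku;
    }
}
```

Calling TimingProxy.wrap(new SimpleOrderCreator(), OrderCreator.class).create("A1") returns the normal result and updates the business.order.create.time entries. Like the Spring aspect, it only sees calls that go through the proxy, so a method that calls another annotated method on this is not timed.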

2. Business status monitoring dashboard

Taking e-commerce inventory monitoring as an example, the dashboard provides multi-level stock warnings.
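A multi-level stock warning boils down to mapping the current stock of each SKU to a severity level, which the dashboard can then show as threshold colors or alert states. The sketch below shows that classification; the level names and the two thresholds are hypothetical and would normally come from business configuration. In a real service, the stock value would typically be exposed as a Micrometer gauge so that Prometheus can scrape it.

```java
/** Hypothetical multi-level stock warning, e.g. for a per-SKU stock gauge. */
final class StockAlertLevel {
    enum Level { NORMAL, LOW, CRITICAL, OUT_OF_STOCK }

    private StockAlertLevel() {
    }

    /**
     * Classifies the current stock; lowThreshold must be greater than
     * criticalThreshold, and both must be positive.
     */
    static Level classify(long stock, long lowThreshold, long criticalThreshold) {
        if (criticalThreshold <= 0 || lowThreshold <= criticalThreshold) {
            throw new IllegalArgumentException("expected lowThreshold > criticalThreshold > 0");
        }
        if (stock <= 0) {
            return Level.OUT_OF_STOCK;
        }
        if (stock <= criticalThreshold) {
            return Level.CRITICAL;
        }
        if (stock <= lowThreshold) {
            return Level.LOW;
        }
        return Level.NORMAL;
    }
}
```

Checking the most severe level first keeps the thresholds non-overlapping: with lowThreshold = 100 and criticalThreshold = 20, a stock of 15 is CRITICAL rather than LOW.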


Deployment and Operations Best Practices

1. Containerized deployment with Docker

pig-monitor Dockerfile:

FROM openjdk:8-jdk-alpine
WORKDIR /app
COPY target/pig-monitor.jar app.jar
EXPOSE 5001
ENTRYPOINT ["java", "-jar", "app.jar", "--spring.profiles.active=prod"]

Docker Compose orchestration:

version: '3.8'
services:
  pig-monitor:
    build: ./pig-visual/pig-monitor
    ports:
      - "5001:5001"
    environment:
      - NACOS_SERVER=192.168.1.100:8848
      - NACOS_NAMESPACE=prod
    restart: always
    networks:
      - pig-network

  prometheus:
    image: prom/prometheus:v2.30.3
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
    ports:
      - "9090:9090"
    restart: always
    networks:
      - pig-network

  grafana:
    image: grafana/grafana:8.2.0
    volumes:
      - grafana-data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=secret
    restart: always
    networks:
      - pig-network

networks:
  pig-network:
    driver: bridge

volumes:
  prometheus-data:
  grafana-data:

2. Performance optimization for the monitoring platform

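  1. Metric collection

    • Set scrape intervals deliberately (15s is a common choice; non-core metrics can be scraped every 60s)
    • Hash or sample high-cardinality labels (such as userId)
    • Expose only the metrics you need, using an allowlist
  2. Storage

    • Configure a sensible retention period in Prometheus (--storage.tsdb.retention.time=15d)
    • Tier the storage (recent data on local disk, historical data archived to object storage)
    • Use remote_write/remote_read for remote long-term storage, and federation to aggregate data from multiple Prometheus servers
  3. Query performance

    • Avoid querying raw data over very large time ranges
    • Precompute frequently used queries with recording rules
    • Keep dashboards lean and load panels on demand instead of all at once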
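Every distinct label value creates a separate time series, so a raw userId label makes the series count grow with the number of users. One way to apply the hashing advice for high-cardinality labels is to map each value into a fixed number of buckets, as in the sketch below. The LabelBucketing class, the bucket-N label format and the bucket count are hypothetical; the trade-off is that per-user detail is lost, and dropping such labels entirely is often the better choice.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Hypothetical helper that caps a high-cardinality label at a fixed number of values. */
final class LabelBucketing {
    private LabelBucketing() {
    }

    /** Maps a value such as a userId to one of {@code buckets} stable labels "bucket-0".."bucket-(n-1)". */
    static String bucket(String value, int buckets) {
        if (buckets <= 0) {
            throw new IllegalArgumentException("buckets must be positive");
        }
        CRC32 crc = new CRC32();
        crc.update(value.getBytes(StandardCharsets.UTF_8));
        // getValue() is a non-negative 32-bit checksum, so the remainder is in [0, buckets)
        return "bucket-" + (crc.getValue() % buckets);
    }
}
```

With 16 buckets, the label contributes at most 16 series per metric instead of one per user, and because CRC32 is deterministic the same user always lands in the same bucket, even across restarts.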

Summary and Outlook

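The pig-monitor platform is built on Spring Boot Admin and Grafana. With the approach described in this article, you have covered the complete workflow from environment setup to custom dashboard development, and can design multi-dimensional metrics and alerting policies for your own business scenarios. As cloud-native technology develops, monitoring systems are expected to evolve in the following directions: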

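  1. Unified observability: combining logs (Logging), metrics (Metrics), and traces (Tracing) for end-to-end observability
  2. Intelligent monitoring: anomaly detection and root-cause analysis based on machine learning
  3. Service mesh monitoring: traffic monitoring and performance analysis at the service-mesh layer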

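Consider bookmarking this article as a reference for monitoring platform development, and follow the project's GitHub repository for the latest updates. The next article will take a closer look at the design and implementation of distributed tracing systems.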

Appendix: Recommended Grafana Plugins

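| Plugin | Description | Typical use |
|---|---|---|
| Pie Chart | Pie chart visualization | Resource share analysis |
| Clock Panel | Clock and time display | Monitoring multi-region deployments |
| Alert List | Summary of alert information | Operations monitoring wall displays |
| Node Graph | Visualization of node relationships | Service dependency analysis |
| Status Panel | Status indicator lights | Displaying service health |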

Disclosure: Parts of this article were generated with AI assistance (AIGC) and are provided for reference only.
