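Monitoring Food-Delivery API Call Volume and SLA Attainment with Micrometer and Prometheus
On a high-concurrency food-delivery platform, API stability and response performance directly determine user experience and business conversion. To track per-endpoint call volume, latency distribution, and SLA attainment (for example, 95% of requests completing in under 500 ms) in real time, an observability stack is needed. Based on the baodanbao.com.cn.* package structure, this article shows how to use Micrometer + Prometheus + Grafana for fine-grained API metrics collection and SLA monitoring.
1. Add dependencies and basic configuration
Add the following to pom.xml: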
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
Enable the endpoints in application.yml:
management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus
  metrics:
    tags:
      application: baodanbao-api
    web:
      server:
        request:
          autotime:
            enabled: true

2. Custom API metric instrumentation
Spring Boot Actuator provides the http.server.requests metric by default, but it lacks business dimensions (for example, "is this a Meituan order query?"). These have to be added manually.
Create an interceptor that records business tags:
package baodanbao.com.cn.interceptor;

import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.springframework.stereotype.Component;
import org.springframework.web.servlet.HandlerInterceptor;

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@Component
public class ApiMetricsInterceptor implements HandlerInterceptor {

    // Request attribute that carries the running Timer.Sample from preHandle to afterCompletion
    private static final String SAMPLE_ATTR = ApiMetricsInterceptor.class.getName() + ".sample";

    private final MeterRegistry meterRegistry;

    public ApiMetricsInterceptor(MeterRegistry meterRegistry) {
        this.meterRegistry = meterRegistry;
    }

    @Override
    public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler) {
        // Start timing and keep the sample on the request itself, so both callbacks see the same object
        request.setAttribute(SAMPLE_ATTR, Timer.start(meterRegistry));
        return true;
    }

    @Override
    public void afterCompletion(HttpServletRequest request, HttpServletResponse response, Object handler, Exception ex) {
        Object attr = request.getAttribute(SAMPLE_ATTR);
        if (!(attr instanceof Timer.Sample)) return;
        Timer.Sample sample = (Timer.Sample) attr;

        String uri = request.getRequestURI();
        String method = request.getMethod();
        String apiName = extractApiName(uri);
        int status = response.getStatus();
        // Business tag: whether the request involves Meituan
        String bizSource = uri.contains("/meituan/") ? "meituan" : "internal";

        sample.stop(Timer.builder("baodanbao.api.duration")
                .tag("api", apiName)
                .tag("method", method)
                .tag("status", String.valueOf(status))
                .tag("biz_source", bizSource)
                .register(meterRegistry));
    }

    // Map URIs to a small, fixed set of names so the "api" tag stays low-cardinality
    private String extractApiName(String uri) {
        if (uri.startsWith("/api/order/")) return "order_query";
        if (uri.startsWith("/api/trial/")) return "trial_apply";
        return "unknown";
    }
}
Register the interceptor:
package baodanbao.com.cn.config;

import baodanbao.com.cn.interceptor.ApiMetricsInterceptor;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.servlet.config.annotation.InterceptorRegistry;
import org.springframework.web.servlet.config.annotation.WebMvcConfigurer;

@Configuration
public class WebConfig implements WebMvcConfigurer {

    private final ApiMetricsInterceptor apiMetricsInterceptor;

    public WebConfig(ApiMetricsInterceptor apiMetricsInterceptor) {
        this.apiMetricsInterceptor = apiMetricsInterceptor;
    }

    @Override
    public void addInterceptors(InterceptorRegistry registry) {
        // Only time requests under /api/**
        registry.addInterceptor(apiMetricsInterceptor).addPathPatterns("/api/**");
    }
}
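As the number of endpoints grows, the chain of startsWith checks in extractApiName can be moved into a small lookup table. The following is a minimal sketch in plain Java, not part of the original code: the class name ApiNameResolver and its register/resolve methods are hypothetical. It keeps the same rule as the interceptor, namely that unmapped URIs collapse to "unknown" rather than leaking raw paths into tag values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: maps request URIs to a fixed set of API names for the "api" tag. */
final class ApiNameResolver {
    // LinkedHashMap keeps registration order, so the first matching prefix wins.
    private final Map<String, String> prefixToName = new LinkedHashMap<>();

    ApiNameResolver register(String uriPrefix, String apiName) {
        prefixToName.put(uriPrefix, apiName);
        return this;
    }

    String resolve(String uri) {
        for (Map.Entry<String, String> entry : prefixToName.entrySet()) {
            if (uri.startsWith(entry.getKey())) {
                return entry.getValue();
            }
        }
        // Never fall back to the raw URI: path variables would create unbounded tag values.
        return "unknown";
    }

    public static void main(String[] args) {
        ApiNameResolver resolver = new ApiNameResolver()
                .register("/api/order/", "order_query")
                .register("/api/trial/", "trial_apply");
        System.out.println(resolver.resolve("/api/order/12345"));
        System.out.println(resolver.resolve("/api/unmapped/1"));
    }
}
```

A single resolver instance could be built once at startup and shared by the interceptor, since it is only read after registration.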
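3. Computing SLA attainment (using the 95th percentile as an example)
Prometheus computes quantiles and threshold ratios at query time from histogram buckets, so the Timer has to publish a histogram from Micrometer:
// Add SLO buckets when building the Timer
sample.stop(Timer.builder("baodanbao.api.duration")
        .tag("api", apiName)
        .tag("method", method)
        .tag("status", String.valueOf(status))
        .tag("biz_source", bizSource)
        .serviceLevelObjectives(
                java.time.Duration.ofMillis(200),
                java.time.Duration.ofMillis(500),
                java.time.Duration.ofMillis(1000)
        )
        .publishPercentiles(0.5, 0.95, 0.99) // client-side percentiles (computed in memory per instance; not recommended for production)
        .register(meterRegistry));
The SLO bucket + Prometheus histogram approach is preferable, because percentiles computed on the client cannot be aggregated across instances. The .serviceLevelObjectives() call above produces cumulative bucket counters carrying an le (less than or equal) label, which can be used to compute SLA attainment.
For example, to query the share of Meituan trial-apply requests that complete within 500 ms: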
sum(rate(baodanbao_api_duration_seconds_bucket{api="trial_apply", biz_source="meituan", le="0.5"}[5m]))
/
sum(rate(baodanbao_api_duration_seconds_count{api="trial_apply", biz_source="meituan"}[5m]))
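The percentile side of the SLA is estimated from the same le buckets: Prometheus's histogram_quantile() finds the bucket that contains the target rank and interpolates linearly inside it. The following plain-Java sketch mirrors that interpolation to show why the result is only as precise as the bucket boundaries. It is an illustration, not Prometheus's actual code; the class name HistogramQuantile is hypothetical, and the bucket counts 900 and 1240 in main are made-up values.

```java
/** Hypothetical sketch of quantile estimation from cumulative "le" buckets. */
final class HistogramQuantile {

    /**
     * @param q                quantile in (0, 1]
     * @param upperBounds      ascending bucket upper bounds in seconds; the last one is +Inf
     * @param cumulativeCounts cumulative observation count per bucket (same length)
     */
    static double estimate(double q, double[] upperBounds, double[] cumulativeCounts) {
        int last = upperBounds.length - 1;
        double total = cumulativeCounts[last];
        if (total == 0) {
            return Double.NaN; // no observations, nothing to estimate
        }
        double rank = q * total;

        // Find the first bucket whose cumulative count reaches the rank.
        int i = 0;
        while (i < last && cumulativeCounts[i] < rank) {
            i++;
        }
        if (i == last) {
            // The rank falls into the +Inf bucket: only the highest finite bound is known.
            return upperBounds[last - 1];
        }

        double start = (i == 0) ? 0.0 : upperBounds[i - 1];
        double end = upperBounds[i];
        double countBelow = (i == 0) ? 0.0 : cumulativeCounts[i - 1];
        double countInBucket = cumulativeCounts[i] - countBelow;

        // Assume observations are spread evenly inside the bucket.
        return start + (end - start) * (rank - countBelow) / countInBucket;
    }

    public static void main(String[] args) {
        double[] bounds = {0.2, 0.5, 1.0, Double.POSITIVE_INFINITY};
        double[] counts = {900, 1180, 1240, 1245}; // illustrative values only
        System.out.printf("estimated p95 = %.4f s%n", estimate(0.95, bounds, counts));
    }
}
```

With only three finite buckets (200 ms, 500 ms, 1000 ms), any p95 between 500 ms and 1 s is an interpolated guess, so a percentile threshold that matters for alerting is best given its own bucket boundary.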
4. Call volume and error-rate monitoring
- Total call volume (requests per second):
rate(baodanbao_api_duration_seconds_count[1m])
- Error rate (5xx):
sum(rate(baodanbao_api_duration_seconds_count{status=~"5.."}[1m]))
/
sum(rate(baodanbao_api_duration_seconds_count[1m]))
Alerts can be configured in Grafana, for example firing when the error rate exceeds 1% or the 95th-percentile latency exceeds 800 ms.
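To make the alert condition concrete, the sketch below reproduces the error-rate arithmetic and the two thresholds in plain Java. It is only a way to sanity-check the numbers, not a replacement for Grafana alerting; the class name ErrorRateAlert and the per-status request counts used in main are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the error-rate and alert-threshold arithmetic. */
final class ErrorRateAlert {

    /** Share of requests whose status matches 5xx, given request counts keyed by the "status" tag. */
    static double errorRate(Map<String, Double> requestsByStatus) {
        double total = 0;
        double errors = 0;
        for (Map.Entry<String, Double> entry : requestsByStatus.entrySet()) {
            total += entry.getValue();
            if (entry.getKey().matches("5..")) { // same pattern as status=~"5.."
                errors += entry.getValue();
            }
        }
        // PromQL would yield NaN for 0/0; this sketch treats "no traffic" as "no errors".
        return total == 0 ? 0.0 : errors / total;
    }

    /** Fires when the error rate exceeds 1% or the 95th percentile exceeds 800 ms. */
    static boolean shouldAlert(double errorRate, double p95Seconds) {
        return errorRate > 0.01 || p95Seconds > 0.8;
    }

    public static void main(String[] args) {
        Map<String, Double> counts = new HashMap<>();
        counts.put("200", 980.0); // illustrative values only
        counts.put("404", 5.0);
        counts.put("500", 15.0);
        double rate = errorRate(counts);
        System.out.println("error rate = " + rate + ", alert = " + shouldAlert(rate, 0.5));
    }
}
```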
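5. Advanced: breakdown by merchant (optional)
To monitor the API experience of key merchants (such as KA customers), pass X-Merchant-Id in a request header and inject it as a tag:
String merchantId = request.getHeader("X-Merchant-Id");
// Only key accounts get their own tag value; every other merchant shares "other".
// This bounds cardinality, still records every request, and keeps the tag keys identical for all series of this meter.
String merchantTag = (merchantId != null && isKeyAccount(merchantId)) ? merchantId : "other";
sample.stop(Timer.builder("baodanbao.api.duration")
        .tag("merchant_id", merchantTag)
        // ... other tags
        .register(meterRegistry));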
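The snippet above leaves isKeyAccount undefined. One possible shape for it is an allowlist lookup, sketched below in plain Java. The class name MerchantTagPolicy, the tagValue method, and the "other" fallback value are assumptions made for illustration; the original does not say where the list of key accounts comes from.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical allowlist that decides which merchants get their own merchant_id tag value. */
final class MerchantTagPolicy {
    private final Set<String> keyAccounts;

    MerchantTagPolicy(String... keyAccountIds) {
        this.keyAccounts = new HashSet<>(Arrays.asList(keyAccountIds));
    }

    boolean isKeyAccount(String merchantId) {
        return merchantId != null && keyAccounts.contains(merchantId);
    }

    /** Returns the merchant id for key accounts and a shared bucket for everyone else. */
    String tagValue(String merchantId) {
        return isKeyAccount(merchantId) ? merchantId : "other";
    }

    public static void main(String[] args) {
        MerchantTagPolicy policy = new MerchantTagPolicy("m-1001", "m-1002"); // made-up ids
        System.out.println(policy.tagValue("m-1001"));
        System.out.println(policy.tagValue("m-9999"));
        System.out.println(policy.tagValue(null));
    }
}
```

Because the header is client-supplied, the allowlist also serves as a guard: an arbitrary X-Merchant-Id value can never create a new time series.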
6. Verify the metric output
Request GET /actuator/prometheus; the output contains metrics such as:
baodanbao_api_duration_seconds_count{api="trial_apply",biz_source="meituan",method="POST",status="200"} 1245.0
baodanbao_api_duration_seconds_sum{api="trial_apply",biz_source="meituan",method="POST",status="200"} 423.8
baodanbao_api_duration_seconds_bucket{api="trial_apply",biz_source="meituan",method="POST",status="200",le="0.5"} 1180.0
baodanbao_api_duration_seconds_bucket{api="trial_apply",biz_source="meituan",method="POST",status="200",le="+Inf"} 1245.0
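The sample lines above are enough for a quick sanity check by hand: the le="0.5" bucket divided by the total count gives the share of requests within 500 ms, and sum divided by count gives the mean latency. The sketch below does that arithmetic in plain Java; the class name SlaSnapshot is hypothetical. Note that these counters are cumulative since process start, whereas the PromQL queries in section 3 look at a 5-minute rate window, so the two figures will generally differ.

```java
/** Hypothetical sketch: SLA arithmetic on one scrape of the sample output. */
final class SlaSnapshot {

    /** Share of requests at or under the threshold bucket. */
    static double attainment(double withinThreshold, double total) {
        return total == 0 ? Double.NaN : withinThreshold / total;
    }

    /** Mean latency in seconds from the _sum and _count series. */
    static double meanSeconds(double sumSeconds, double count) {
        return count == 0 ? Double.NaN : sumSeconds / count;
    }

    public static void main(String[] args) {
        double within500ms = 1180.0; // ..._bucket{le="0.5"}
        double total = 1245.0;       // ..._count
        double sumSeconds = 423.8;   // ..._sum

        double ratio = attainment(within500ms, total);
        System.out.printf("within 500 ms: %.2f%%%n", ratio * 100);
        System.out.printf("mean latency: %.3f s%n", meanSeconds(sumSeconds, total));
        System.out.println("meets 95% target: " + (ratio >= 0.95));
    }
}
```

For these sample values the attainment is roughly 94.8%, slightly under a 95% target, with a mean latency of about 0.34 s.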
Copyright of this article belongs to the 吃喝不愁 app developer team. Please credit the source when reposting!
