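DevCloudFE/MateChat: A Practical Guide to Monitoring and Alerting Integration
Pain point: AI applications without monitoring and alerting
Have you run into any of these situations while developing an AI application?
- Users report that the AI assistant has suddenly gone silent, but the development team has no idea
- Calls to the large-model API fail and interrupt the user experience, yet nobody notices in time
- Key business metrics fluctuate abnormally with no real-time alerting in place
- Performance bottlenecks give no early warning, so the team only reacts after users complain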
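MateChat is a UI library for front-end AI scenarios. This guide walks through a monitoring and alerting integration built around it, to help developers build stable, reliable AI applications.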
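Monitoring and alerting architecture
Core monitoring metrics
| Category | Metric | Alert threshold | Check frequency |
|---|---|---|---|
| Performance | API response time (P95) | > 3 s | Real time |
| Performance | First-screen load time | > 2 s | On page load |
| Performance | Component rendering FPS | < 30 fps | Real time |
| Errors | JS runtime error rate | > 0.1% | Real time |
| Errors | Network request failure rate | > 1% | Real time |
| Errors | Large-model API errors | Any error | Real time |
| Business | Session success rate | < 95% | Every minute |
| Business | Active user sessions | Abnormal fluctuation | Every hour |
| Business | Message processing throughput | Abnormal drop | Every minute |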
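The numeric thresholds in the table can be kept as data rather than scattered through `if` statements. The following is a minimal sketch under that assumption; the `ThresholdRule` shape, the metric keys, and `findViolations` are hypothetical names chosen for illustration and are not part of MateChat.

```typescript
// Hypothetical rule shape; not a MateChat API.
type Comparator = 'above' | 'below';

interface ThresholdRule {
  metric: string;         // illustrative metric key
  comparator: Comparator; // alert when the value is above or below the threshold
  threshold: number;
  unit: string;
}

// Numeric thresholds copied from the table; rows without a number
// ("any error", "abnormal fluctuation") need separate handling and are omitted.
const rules: ThresholdRule[] = [
  { metric: 'api_response_time_p95', comparator: 'above', threshold: 3000, unit: 'ms' },
  { metric: 'first_screen_load', comparator: 'above', threshold: 2000, unit: 'ms' },
  { metric: 'render_fps', comparator: 'below', threshold: 30, unit: 'fps' },
  { metric: 'js_error_rate', comparator: 'above', threshold: 0.1, unit: '%' },
  { metric: 'request_failure_rate', comparator: 'above', threshold: 1, unit: '%' },
  { metric: 'session_success_rate', comparator: 'below', threshold: 95, unit: '%' },
];

// Return every rule that the given sample of metric values violates.
function findViolations(samples: Record<string, number>): ThresholdRule[] {
  return rules.filter((rule) => {
    const value = samples[rule.metric];
    if (value === undefined) return false; // metric not reported in this sample
    return rule.comparator === 'above' ? value > rule.threshold : value < rule.threshold;
  });
}
```

Keeping the rules in one array makes it easier to keep the table, the alert code, and any dashboard thresholds in agreement.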
Integrating monitoring and alerting in practice
1. Error monitoring integration
// src/utils/monitoring.ts
import type { App } from 'vue';
// Global error monitoring
class MonitoringService {
private static instance: MonitoringService;
private errorCount = 0;
private performanceMetrics: Map<string, number> = new Map();
static getInstance(): MonitoringService {
if (!MonitoringService.instance) {
MonitoringService.instance = new MonitoringService();
}
return MonitoringService.instance;
}
// Monitor errors thrown by components, including MateChat components.
// This uses Vue's app-level errorHandler hook instead of patching component
// props, because assigning to a prop definition does not install a handler.
monitorComponentErrors(app: App) {
const previousHandler = app.config.errorHandler;
app.config.errorHandler = (err, instance, info) => {
const componentName = instance?.$options.name ?? 'UnknownComponent';
this.trackError(componentName, err instanceof Error ? err : new Error(String(err)));
previousHandler?.(err, instance, info);
};
// Listen for global errors (event.error can be null, e.g. for resource load failures)
window.addEventListener('error', (event) => {
this.trackError('Global', event.error ?? new Error(event.message));
});
// Listen for unhandled Promise rejections (the reason is not always an Error)
window.addEventListener('unhandledrejection', (event) => {
const reason = event.reason;
this.trackError('Promise', reason instanceof Error ? reason : new Error(String(reason)));
});
}
trackError(component: string, error: Error) {
const errorData = {
component,
message: error.message,
stack: error.stack,
timestamp: Date.now(),
userAgent: navigator.userAgent
};
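// Send to the monitoring platform
this.sendToMonitoringPlatform('error', errorData);
// Alert once the cumulative error count passes the threshold.
// This is a raw count since page load, not a rate over time.
this.errorCount++;
if (this.errorCount > 10) {
this.triggerAlert('ERROR_RATE_HIGH', `Abnormal error count: ${this.errorCount}`);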
}
}
// Performance monitoring
trackPerformance(metricName: string, value: number) {
this.performanceMetrics.set(metricName, value);
// API response time check
if (metricName === 'api_response_time' && value > 3000) {
this.triggerAlert('API_SLOW', `API response time too long: ${value}ms`);
}
// Rendering performance check
if (metricName === 'render_fps' && value < 30) {
this.triggerAlert('LOW_FPS', `Rendering frame rate too low: ${value}fps`);
}
}
// Trigger an alert
triggerAlert(type: string, message: string) {
const alertData = {
type,
message,
timestamp: Date.now(),
metrics: Object.fromEntries(this.performanceMetrics)
};
// Send the alert to every configured channel
this.sendAlertToChannels(alertData);
}
private sendToMonitoringPlatform(type: string, data: any) {
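// Call the monitoring platform API
fetch('/api/monitoring', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ type, data })
}).catch(console.error);
}
private sendAlertToChannels(alertData: any) {
// Email alert
this.sendEmailAlert(alertData);
// Instant-messaging alert
this.sendIMAlert(alertData);
// Webhook callback
this.sendWebhookAlert(alertData);
}
// The three channel senders are not defined in the original listing. These empty
// placeholders only keep the class compiling; replace them with real integrations.
private sendEmailAlert(alertData: any) {}
private sendIMAlert(alertData: any) {}
private sendWebhookAlert(alertData: any) {}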
}
export const monitoring = MonitoringService.getInstance();
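Because `triggerAlert` fires on every call, a burst of errors can produce a burst of identical alerts. A common remedy is a per-type cooldown. The sketch below assumes a hypothetical `AlertThrottle` helper (not part of MateChat or the code above) that a caller would consult before sending; the clock is injectable so the behavior can be checked without waiting.

```typescript
// Hypothetical helper: suppresses repeat alerts of the same type within a cooldown.
class AlertThrottle {
  private readonly lastSent = new Map<string, number>();
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(cooldownMs: number, now: () => number = Date.now) {
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  // Returns true if an alert of this type may be sent now, and records the send time.
  shouldSend(type: string): boolean {
    const current = this.now();
    const previous = this.lastSent.get(type);
    if (previous !== undefined && current - previous < this.cooldownMs) {
      return false; // still inside the cooldown window for this alert type
    }
    this.lastSent.set(type, current);
    return true;
  }
}
```

In `triggerAlert`, this could wrap the send step, for example `if (throttle.shouldSend(type)) this.sendAlertToChannels(alertData);`, where `throttle` is an instance you create yourself.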
2. Large-model API monitoring
// src/services/model-monitor.ts
import { monitoring } from '../utils/monitoring';
export class ModelMonitor {
private apiCalls: number = 0;
private failures: number = 0;
private responseTimes: number[] = [];
wrapModelAPI(apiFunction: Function) {
return async (...args: any[]) => {
const startTime = Date.now();
this.apiCalls++;
try {
const result = await apiFunction(...args);
const duration = Date.now() - startTime;
this.responseTimes.push(duration);
monitoring.trackPerformance('model_api_time', duration);
// Alert on slow individual responses
if (duration > 5000) {
monitoring.triggerAlert('MODEL_SLOW', `Large-model response is slow: ${duration}ms`);
}
return result;
} catch (error) {
this.failures++;
const errorRate = (this.failures / this.apiCalls) * 100;
monitoring.trackError('ModelAPI', error as Error);
// Error-rate alert
if (errorRate > 5) {
monitoring.triggerAlert('MODEL_ERROR_HIGH',
`Large-model API error rate too high: ${errorRate.toFixed(2)}%`);
}
throw error;
}
};
}
getMetrics() {
const avgResponseTime = this.responseTimes.length > 0
? this.responseTimes.reduce((a, b) => a + b, 0) / this.responseTimes.length
: 0;
const p95ResponseTime = this.calculatePercentile(95);
return {
totalCalls: this.apiCalls,
failures: this.failures,
errorRate: this.apiCalls > 0 ? (this.failures / this.apiCalls) * 100 : 0, // avoid NaN before the first call
avgResponseTime,
p95ResponseTime
};
}
private calculatePercentile(percentile: number): number {
if (this.responseTimes.length === 0) return 0;
const sorted = [...this.responseTimes].sort((a, b) => a - b);
const index = Math.ceil(sorted.length * (percentile / 100)) - 1;
return sorted[Math.max(0, index)];
}
}
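`ModelMonitor` keeps every response time it has seen, so the array grows for as long as the page lives. One option is to cap the number of samples while using the same nearest-rank percentile formula as `calculatePercentile`. The following is a sketch of that idea; `LatencyWindow` and its `maxSamples` parameter are hypothetical names, not part of the code above.

```typescript
// Hypothetical bounded sample window using the nearest-rank percentile method.
class LatencyWindow {
  private samples: number[] = [];
  private readonly maxSamples: number;

  constructor(maxSamples: number) {
    this.maxSamples = maxSamples;
  }

  // Record one duration and evict the oldest sample once the cap is exceeded.
  record(durationMs: number): void {
    this.samples.push(durationMs);
    if (this.samples.length > this.maxSamples) {
      this.samples.shift();
    }
  }

  // Nearest-rank percentile: sort ascending, take the value at ceil(n * p / 100) - 1.
  percentile(p: number): number {
    if (this.samples.length === 0) return 0;
    const sorted = [...this.samples].sort((a, b) => a - b);
    const index = Math.ceil(sorted.length * (p / 100)) - 1;
    return sorted[Math.max(0, index)];
  }
}
```

With ten samples, P95 resolves to index `ceil(10 × 0.95) − 1 = 9`, the largest value, so a single slow call dominates P95 until enough samples accumulate. Small windows therefore make a P95 alert noisy.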
3. Business metrics monitoring
// src/services/business-monitor.ts
import { monitoring } from '../utils/monitoring';
export class BusinessMonitor {
private sessions: Map<string, SessionMetrics> = new Map();
private messagesProcessed: number = 0;
trackSessionStart(sessionId: string) {
this.sessions.set(sessionId, {
startTime: Date.now(),
messageCount: 0,
successful: true
});
}
trackMessageProcessed(sessionId: string, success: boolean) {
this.messagesProcessed++;
const session = this.sessions.get(sessionId);
if (session) {
session.messageCount++;
if (!success) session.successful = false;
}
// Check message throughput once every 100 messages
if (this.messagesProcessed % 100 === 0) {
this.checkThroughput();
}
}
trackSessionEnd(sessionId: string) {
const session = this.sessions.get(sessionId);
if (session) {
const duration = Date.now() - session.startTime;
monitoring.trackPerformance('session_duration', duration);
if (!session.successful) {
monitoring.triggerAlert('SESSION_FAILED', `Session failed: ${sessionId}`);
}
}
}
private checkThroughput() {
const now = Date.now();
const recentMessages = Array.from(this.sessions.values())
.filter(session => now - session.startTime < 60000)
.reduce((sum, session) => sum + session.messageCount, 0);
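// Throughput anomaly check. Note: the sum above counts messages from sessions that
// started in the last minute, which only approximates messages per minute.
if (recentMessages < 10) {
monitoring.triggerAlert('LOW_THROUGHPUT',
`Abnormal message throughput: ${recentMessages} msg/min`);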
}
}
getBusinessMetrics() {
const totalSessions = this.sessions.size;
const successfulSessions = Array.from(this.sessions.values())
.filter(session => session.successful).length;
const successRate = totalSessions > 0
? (successfulSessions / totalSessions) * 100
: 100;
return {
totalSessions,
successfulSessions,
successRate: Math.round(successRate),
totalMessages: this.messagesProcessed
};
}
}
interface SessionMetrics {
startTime: number;
messageCount: number;
successful: boolean;
}
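`checkThroughput` estimates messages per minute from session start times. If an exact per-minute count is needed, one approach is to keep a timestamp for each message and count those inside a sliding window. This is a sketch under that assumption; `ThroughputCounter` is a hypothetical helper, and the clock is injectable for testing.

```typescript
// Hypothetical sliding-window counter: how many events happened in the last windowMs.
class ThroughputCounter {
  private timestamps: number[] = [];
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(windowMs: number = 60000, now: () => number = Date.now) {
    this.windowMs = windowMs;
    this.now = now;
  }

  // Call once per processed message.
  record(): void {
    this.timestamps.push(this.now());
  }

  // Drop timestamps that have aged out of the window, then count what is left.
  countInWindow(): number {
    const cutoff = this.now() - this.windowMs;
    while (this.timestamps.length > 0 && this.timestamps[0] <= cutoff) {
      this.timestamps.shift();
    }
    return this.timestamps.length;
  }
}
```

`trackMessageProcessed` could call `record()`, and the low-throughput check could compare `countInWindow()` against the configured threshold.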
Alert channel configuration
// src/config/alert-config.ts
export interface AlertConfig {
enabled: boolean;
channels: AlertChannel[];
thresholds: AlertThresholds;
recipients: string[];
}
export interface AlertChannel {
type: 'email' | 'sms' | 'webhook' | 'im';
config: any;
}
export interface AlertThresholds {
errorRate: number; // error rate threshold (%)
apiResponseTime: number; // API response time threshold (ms)
lowFps: number; // low frame rate threshold (fps)
lowThroughput: number; // low throughput threshold (msg/min)
}
export const defaultAlertConfig: AlertConfig = {
enabled: true,
channels: [
{
type: 'email',
config: {
smtp: {
host: 'smtp.example.com',
port: 587,
secure: false,
auth: {
user: 'alert@example.com',
pass: 'password' // placeholder only; keep real credentials in server-side configuration, never in front-end code
}
},
from: 'alert@example.com',
subject: '[MateChat Alert] {alert_type}'
}
},
{
type: 'webhook',
config: {
url: 'https://api.monitoring.com/alerts',
headers: {
'Authorization': 'Bearer your-token',
'Content-Type': 'application/json'
}
}
}
],
thresholds: {
errorRate: 1,
apiResponseTime: 3000,
lowFps: 30,
lowThroughput: 10
},
recipients: ['dev-team@example.com', 'oncall@example.com']
};
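One way the `channels` list could drive delivery is a small dispatcher that looks up a sender function per channel type. The sketch below is illustrative only: `dispatchAlert`, `Sender`, and the simplified `Channel` and `Alert` shapes are hypothetical, and the actual email, SMS, IM, or webhook calls are left to the sender functions you supply.

```typescript
type ChannelType = 'email' | 'sms' | 'webhook' | 'im';

interface Channel {
  type: ChannelType;
  config: Record<string, unknown>;
}

interface Alert {
  type: string;
  message: string;
  timestamp: number;
}

// A sender performs the real delivery for one channel type (supplied by the caller).
type Sender = (alert: Alert, config: Record<string, unknown>) => void;

// Try every configured channel; one failing channel must not block the others.
// Returns the channel types that were delivered without throwing.
function dispatchAlert(
  alert: Alert,
  channels: Channel[],
  senders: Partial<Record<ChannelType, Sender>>,
  enabled: boolean = true,
): ChannelType[] {
  if (!enabled) return [];
  const delivered: ChannelType[] = [];
  for (const channel of channels) {
    const send = senders[channel.type];
    if (!send) continue; // no sender registered for this channel type
    try {
      send(alert, channel.config);
      delivered.push(channel.type);
    } catch (error) {
      console.error(`Alert delivery failed on channel "${channel.type}"`, error);
    }
  }
  return delivered;
}
```

Isolating each channel in its own `try` block matters here: if the webhook endpoint is down, the email alert should still go out.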
Monitoring dashboard implementation
<!-- src/components/MonitoringDashboard.vue -->
<template>
<McLayout class="monitoring-dashboard">
<McHeader title="MateChat Monitoring Dashboard" />
<McLayoutContent>
<div class="metrics-grid">
<!-- Real-time metric cards -->
<MetricCard
title="API response time"
:value="metrics.apiResponseTime"
unit="ms"
:threshold="3000"
trend="lower"
/>
<MetricCard
title="Error rate"
:value="metrics.errorRate"
unit="%"
:threshold="1"
trend="lower"
/>
<MetricCard
title="Session success rate"
:value="metrics.sessionSuccessRate"
unit="%"
:threshold="95"
trend="higher"
/>
<MetricCard
title="Message throughput"
:value="metrics.throughput"
unit="msg/min"
:threshold="10"
trend="higher"
/>
</div>
<!-- Alert list -->
<div class="alerts-section">
<h3>Recent alerts</h3>
<div v-for="alert in recentAlerts" :key="alert.id" class="alert-item">
<span :class="['alert-level', alert.level]">{{ alert.level }}</span>
<span class="alert-message">{{ alert.message }}</span>
<span class="alert-time">{{ formatTime(alert.timestamp) }}</span>
</div>
</div>
<!-- Performance charts -->
<div class="charts-section">
<PerformanceChart :data="performanceData" />
<ErrorRateChart :data="errorRateData" />
</div>
</McLayoutContent>
</McLayout>
</template>
<script setup lang="ts">
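import { ref, onMounted, onUnmounted } from 'vue';
import { McLayout, McHeader, McLayoutContent } from '@matechat/core';
import MetricCard from './MetricCard.vue';
import PerformanceChart from './PerformanceChart.vue';
import ErrorRateChart from './ErrorRateChart.vue';
// Shape of one alert row as used by the template
interface AlertItem {
id: string;
level: string;
message: string;
timestamp: number;
}
const metrics = ref({
apiResponseTime: 0,
errorRate: 0,
sessionSuccessRate: 0,
throughput: 0
});
const recentAlerts = ref<AlertItem[]>([]);
const performanceData = ref<unknown[]>([]);
const errorRateData = ref<unknown[]>([]);
let refreshTimer: ReturnType<typeof setInterval> | undefined;
onMounted(async () => {
await loadMonitoringData();
refreshTimer = setInterval(loadMonitoringData, 5000); // refresh every 5 seconds
});
onUnmounted(() => {
// Stop polling when the dashboard is destroyed
if (refreshTimer) clearInterval(refreshTimer);
});
async function loadMonitoringData() {
try {
// Fetch data from the monitoring platform
const response = await fetch('/api/monitoring/metrics');
if (!response.ok) throw new Error(`HTTP ${response.status}`);
const data = await response.json();
metrics.value = data.metrics;
recentAlerts.value = data.alerts.slice(0, 10);
performanceData.value = data.performance;
errorRateData.value = data.errorRates;
} catch (error) {
// Keep the last good values on screen if a refresh fails
console.error('Failed to load monitoring data', error);
}
}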
function formatTime(timestamp: number) {
return new Date(timestamp).toLocaleTimeString();
}
</script>
<style scoped>
.monitoring-dashboard {
padding: 20px;
}
.metrics-grid {
display: grid;
grid-template-columns: repeat(auto-fit, minmax(250px, 1fr));
gap: 16px;
margin-bottom: 24px;
}
.alerts-section {
margin-bottom: 24px;
}
.alert-item {
display: flex;
align-items: center;
padding: 8px 12px;
margin-bottom: 8px;
border-left: 4px solid #ccc;
background: #f8f9fa;
}
.alert-level {
padding: 2px 8px;
border-radius: 4px;
font-size: 12px;
font-weight: bold;
margin-right: 12px;
}
.alert-level.critical {
background: #dc3545;
color: white;
}
.alert-level.warning {
background: #ffc107;
color: #212529;
}
.alert-message {
flex: 1;
}
.alert-time {
color: #6c757d;
font-size: 12px;
}
.charts-section {
display: grid;
grid-template-columns: 1fr 1fr;
gap: 24px;
}
</style>
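The dashboard passes a `threshold` and a `trend` to each `MetricCard`, but `MetricCard.vue` itself is not shown. A plausible reading is that `trend="lower"` means lower values are better and `trend="higher"` means higher values are better. Under that assumption, the status logic inside the card might look like this sketch; `metricStatus` and the status names are hypothetical.

```typescript
type Trend = 'lower' | 'higher'; // which direction counts as healthy (assumed meaning)
type MetricStatus = 'ok' | 'breached';

// Decide whether a value has crossed its threshold in the unhealthy direction.
function metricStatus(value: number, threshold: number, trend: Trend): MetricStatus {
  const breached = trend === 'lower' ? value > threshold : value < threshold;
  return breached ? 'breached' : 'ok';
}
```

For example, an API response time of 3500 ms against a 3000 ms threshold with `trend="lower"` would be reported as breached, while a session success rate of 96% against 95% with `trend="higher"` would be fine.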
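Deployment and operations best practices
1. Monitoring data storage
2. High-availability architecture
// High-availability configuration for the monitoring system
export const highAvailabilityConfig = {
// Multi-region deployment
regions: ['cn-east-1', 'cn-north-1', 'cn-south-1'],
// Failover strategy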
failover: {
enabled: true,
timeout: 5000,
retryAttempts: 3
},
// Data backup
backup: {
enabled: true,
interval: '1h',
retention: '30d'
},
// Load balancing
loadBalancing: {
strategy: 'round-robin',
healthCheck: {
interval: '30s',
timeout: '5s'
}
}
};
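The configuration names a `round-robin` strategy with health checks but does not show how a region is chosen. As a rough illustration, a round-robin selector that skips regions currently marked unhealthy could look like the sketch below. `RoundRobinRegions` is a hypothetical class, and the health status is assumed to be set by whatever health-check job you run.

```typescript
// Hypothetical round-robin selector that skips unhealthy regions.
class RoundRobinRegions {
  private readonly regions: string[];
  private readonly unhealthy = new Set<string>();
  private cursor = 0;

  constructor(regions: string[]) {
    this.regions = [...regions];
  }

  // Called by the health check when a region fails or recovers.
  markUnhealthy(region: string): void {
    this.unhealthy.add(region);
  }

  markHealthy(region: string): void {
    this.unhealthy.delete(region);
  }

  // Return the next healthy region in rotation, or undefined if none is healthy.
  next(): string | undefined {
    for (let i = 0; i < this.regions.length; i++) {
      const region = this.regions[this.cursor];
      this.cursor = (this.cursor + 1) % this.regions.length;
      if (!this.unhealthy.has(region)) return region;
    }
    return undefined;
  }
}
```

When every region is unhealthy, `next()` returns `undefined`, which is the point where a caller would fall back to buffering monitoring data locally or dropping it.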
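Summary and outlook
With the monitoring and alerting integration described for MateChat, you can:
✅ Track application health in real time - monitor performance, errors, and business metrics together
✅ Find and fix problems quickly - alerts make sure issues get a timely response
✅ Improve the user experience - performance tuning and error prevention raise user satisfaction
✅ Reduce operations cost - automated monitoring cuts down on manual intervention
Future plans:
- Integrate more monitoring platforms (such as Alibaba Cloud ARMS and Tencent Cloud APM)
- Support custom monitoring metrics and alert rules
- Provide AI-driven anomaly detection and root-cause analysis
- Extend monitoring support to micro-frontends and other frameworks
Integrate monitoring and alerting into your MateChat application now to give your AI application a solid foundation of stability.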



