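Formbricks Cloud Monitoring: AWS CloudWatch Configuration Tutorial
Why do you need CloudWatch monitoring?
Has your production environment ever gone down without a traceable root cause? Have users reported slow responses while you had no data to investigate them? For modern SaaS applications, "no monitoring, no production" has become an industry norm. When Formbricks, the open-source survey toolkit, runs as a cloud-native deployment on AWS, it needs end-to-end monitoring to keep the service stable.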
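AWS CloudWatch provides a complete monitoring solution. This article covers:
- A CloudWatch monitoring stack deployed entirely with Terraform
- Alarm configuration for 15+ key metrics
- Real-time alert notifications in Slack
- Visualization through Grafana integration
- Security monitoring aligned with the CIS benchmark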
Architecture overview: Formbricks monitoring data flow
Prerequisites and environment setup
Environment requirements
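| Component | Required version | Purpose |
|---|---|---|
| Terraform | ≥1.3.0 | Infrastructure as code |
| AWS CLI | ≥2.0 | AWS resource management |
| kubectl | ≥1.24 | Kubernetes cluster management |
| AWS account | Administrator permissions | Creating CloudWatch resources |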
Permission configuration
```bash
# Configure AWS credentials
aws configure
# Verify that the credentials can read CloudWatch
aws cloudwatch describe-alarms --max-items 1
```
Clone the repository
```bash
git clone https://gitcode.com/GitHub_Trending/fo/formbricks
cd formbricks/infra/terraform
```
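Core configuration: deploying CloudWatch resources
1. Infrastructure code structure
```text
terraform/
├── cloudwatch.tf      # Core CloudWatch configuration
├── observability.tf   # Observability-related resources
├── main.tf            # Main configuration file
└── variables.tf       # Variable definitions
```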
2. Log management configuration
Create a CloudWatch log group (cloudwatch.tf):
```hcl
resource "aws_cloudwatch_log_group" "cloudwatch_cis_benchmark" {
  name              = "/aws/cis-benchmark-group"
  retention_in_days = 365 # Keep logs for 365 days

  tags = {
    Project     = "formbricks"
    Environment = "prod"
    ManagedBy   = "Terraform"
  }
}
```
Key log groups:
- EKS cluster (control plane) logs: `/aws/eks/formbricks-prod-eks/cluster`
- Application logs: `/aws/ecs/formbricks-app`
- Database logs: `/aws/rds/instance/formbricks-prod/postgresql`
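3. Alert notification system
Slack notification integration (cloudwatch.tf):
```hcl
# Reads the webhook URL stored in SSM Parameter Store (next step). The
# parameter name is assumed to match the one used with put-parameter.
data "aws_ssm_parameter" "slack_notification_channel" {
  name = "/prod/formbricks/slack-webhook-url"
}

module "notify-slack" {
  source  = "terraform-aws-modules/notify-slack/aws"
  version = "6.6.0"

  slack_channel     = "formbricks-alerts" # Target Slack channel
  slack_username    = "formbricks-cloudwatch"
  slack_webhook_url = data.aws_ssm_parameter.slack_notification_channel.value

  sns_topic_name   = "cloudwatch-alarms" # SNS topic the alarms publish to
  create_sns_topic = true
}
```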
Configure the Slack webhook:
- Create an Incoming Webhook in your Slack workspace.
- Store the webhook URL as an AWS SSM parameter:
```bash
aws ssm put-parameter \
  --name "/prod/formbricks/slack-webhook-url" \
  --type "SecureString" \
  --value "https://hooks.slack.com/services/XXXXX/XXXXX/XXXX"
```
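The `notify-slack` module handles delivery for you. Still, it helps to see what travels along the SNS-to-Slack path. Below is a minimal sketch, not the module's actual code, that turns the CloudWatch alarm JSON carried in an SNS message into an Incoming Webhook payload. The helper names `alarm_to_slack_payload` and `post_to_slack` are hypothetical. The field names (`AlarmName`, `AlarmDescription`, `NewStateValue`, `NewStateReason`) are those of CloudWatch alarm notifications.

```python
import json
import urllib.request

# Hypothetical mapping from alarm state to a Slack emoji.
STATE_EMOJI = {"ALARM": ":red_circle:", "OK": ":large_green_circle:"}


def alarm_to_slack_payload(sns_message, channel, username):
    """Build an Incoming Webhook payload from the JSON alarm notification
    that CloudWatch publishes as the SNS message body."""
    alarm = json.loads(sns_message)
    state = alarm["NewStateValue"]  # "ALARM", "OK" or "INSUFFICIENT_DATA"
    emoji = STATE_EMOJI.get(state, ":grey_question:")
    lines = [f"{emoji} *{alarm['AlarmName']}* is now {state}"]
    if alarm.get("AlarmDescription"):
        lines.append(alarm["AlarmDescription"])
    lines.append(f"Reason: {alarm['NewStateReason']}")
    return {"channel": channel, "username": username, "text": "\n".join(lines)}


def post_to_slack(webhook_url, payload):
    """POST the payload to the webhook URL (e.g. the value stored in SSM)."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Webhooks created through a Slack app post to the channel chosen at install time. They generally ignore `channel` and `username` overrides, so those fields mainly matter for legacy webhooks.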
4. Key metric alarms
ALB (Application Load Balancer) monitoring:
```hcl
locals {
  # ARN suffix of the load balancer; replace with your own ALB's value
  alb_id = "app/k8s-formbricks-21ab9ecd60/342ed65d128ce4cb"

  alarms = {
    ALB_HTTPCode_Target_5XX_Count = {
      alarm_description   = "Too many 5XX responses from API targets"
      comparison_operator = "GreaterThanThreshold"
      evaluation_periods  = 5   # 5 consecutive periods must breach
      threshold           = 5   # more than 5 errors per period
      period              = 600 # 10-minute evaluation period
      unit                = "Count"
      namespace           = "AWS/ApplicationELB"
      metric_name         = "HTTPCode_Target_5XX_Count"
      statistic           = "Sum"
      dimensions = {
        LoadBalancer = local.alb_id
      }
    }
  }
}
```
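The section does not show how the `alarms` map is consumed. Typically each entry is passed, for example with `for_each`, to an alarm resource or module. To show what each field means, the hypothetical helper below maps one entry onto the parameter names of the CloudWatch `PutMetricAlarm` API, which boto3's `put_metric_alarm` accepts as keyword arguments. The SNS topic ARN is a placeholder.

```python
def to_put_metric_alarm_params(name, spec, alarm_actions):
    """Translate one entry of the Terraform `alarms` map into keyword
    arguments for CloudWatch PutMetricAlarm (hypothetical helper)."""
    return {
        "AlarmName": name,
        "AlarmDescription": spec["alarm_description"],
        "Namespace": spec["namespace"],
        "MetricName": spec["metric_name"],
        "Statistic": spec["statistic"],
        "Period": spec["period"],  # seconds per datapoint
        "EvaluationPeriods": spec["evaluation_periods"],
        "Threshold": float(spec["threshold"]),
        "ComparisonOperator": spec["comparison_operator"],
        "Unit": spec["unit"],
        # The API takes dimensions as a list of {"Name", "Value"} pairs.
        "Dimensions": [{"Name": k, "Value": v} for k, v in spec["dimensions"].items()],
        "AlarmActions": list(alarm_actions),
    }


alb_5xx = {
    "alarm_description": "Too many 5XX responses from API targets",
    "comparison_operator": "GreaterThanThreshold",
    "evaluation_periods": 5,
    "threshold": 5,
    "period": 600,
    "unit": "Count",
    "namespace": "AWS/ApplicationELB",
    "metric_name": "HTTPCode_Target_5XX_Count",
    "statistic": "Sum",
    "dimensions": {"LoadBalancer": "app/k8s-formbricks-21ab9ecd60/342ed65d128ce4cb"},
}

params = to_put_metric_alarm_params(
    "ALB_HTTPCode_Target_5XX_Count",
    alb_5xx,
    ["arn:aws:sns:us-west-2:123456789012:cloudwatch-alarms"],  # placeholder ARN
)
# With boto3 installed and credentials configured:
# boto3.client("cloudwatch").put_metric_alarm(**params)
```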
RDS database monitoring (an additional entry in the same `alarms` map):
```hcl
RDS_CPUUtilization = {
  alarm_description   = "RDS CPU utilization above threshold"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 5
  threshold           = 80 # 80% CPU utilization
  period              = 60 # Evaluated every minute
  unit                = "Percent"
  namespace           = "AWS/RDS"
  metric_name         = "CPUUtilization"
  statistic           = "Average"
  dimensions = {
    DBInstanceIdentifier = module.rds-aurora["prod"].cluster_instances["one"].id
  }
}
```
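Alarm summary:
| Alarm name | Monitored resource | Threshold | Period | Description |
|---|---|---|---|---|
| ALB_HTTPCode_Target_5XX_Count | Application Load Balancer | 5 errors | 10 min | Too many 5XX errors from the API target group |
| ALB_TargetResponseTime | Application Load Balancer | 5 s | 1 min | Target response time too long |
| RDS_CPUUtilization | RDS database | 80% | 1 min | Database CPU utilization too high |
| RDS_FreeStorageSpace | RDS database | 5 GB | 1 min | Database storage running low |
| RDS_FreeableMemory | RDS database | 100 MB | 1 min | Database freeable memory too low |
| DynamoDB_ConsumedReadCapacityUnits | DynamoDB | 90% | 1 min | Read capacity unit usage too high |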
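The sketch below shows how the threshold, period and evaluation-period settings interact. It is a simplified model, not CloudWatch's implementation. It assumes one already-aggregated datapoint per period and ignores missing-data treatment. By default, `datapoints_to_alarm` equals `evaluation_periods`, which is CloudWatch's default: every period in the window must breach.

```python
import operator

# CloudWatch comparison operators mapped to Python comparisons.
COMPARATORS = {
    "GreaterThanThreshold": operator.gt,
    "GreaterThanOrEqualToThreshold": operator.ge,
    "LessThanThreshold": operator.lt,
    "LessThanOrEqualToThreshold": operator.le,
}


def evaluate_alarm(datapoints, threshold, comparison_operator,
                   evaluation_periods, datapoints_to_alarm=None):
    """Simplified alarm evaluation over per-period aggregated datapoints.

    Returns "ALARM" if at least `datapoints_to_alarm` of the last
    `evaluation_periods` datapoints breach, "OK" otherwise, and
    "INSUFFICIENT_DATA" when the window is not yet full."""
    if datapoints_to_alarm is None:
        datapoints_to_alarm = evaluation_periods  # "N out of N"
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    breaches = COMPARATORS[comparison_operator]
    window = datapoints[-evaluation_periods:]
    breaching = sum(1 for value in window if breaches(value, threshold))
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


# RDS_CPUUtilization: 1-minute averages, threshold 80, 5 periods.
print(evaluate_alarm([70, 85, 90, 88, 92, 95], 80, "GreaterThanThreshold", 5))
print(evaluate_alarm([85, 90, 79, 88, 92], 80, "GreaterThanThreshold", 5))
```

Under this model, the RDS CPU alarm needs five consecutive one-minute averages above 80%. A single dip to 79% keeps it in OK. The ALB 5XX alarm needs more than 5 errors in each of five 10-minute periods, which means a problem sustained for about 50 minutes.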
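5. CIS benchmark compliance monitoring
Deploy the CIS benchmark alarms:
```hcl
module "cloudwatch_cis-alarms" {
  source  = "terraform-aws-modules/cloudwatch/aws//modules/cis-alarms"
  version = "5.7.1"

  # The CIS alarms are driven by metric filters on CloudTrail events, so
  # CloudTrail must deliver its logs to this log group.
  log_group_name = aws_cloudwatch_log_group.cloudwatch_cis_benchmark.name
  alarm_actions  = [module.notify-slack.slack_topic_arn]

  # By default the module creates the CIS monitoring alarms (unauthorized
  # API calls, root account usage, IAM/security group/VPC changes, ...).
  # Individual controls can be switched off by name, for example:
  # disabled_controls = ["DisableOrDeleteCMK"]
}
```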
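Each CIS control in the module pairs a CloudWatch Logs metric filter over CloudTrail events with an alarm on the resulting metric. As an illustration, the sketch below mimics the filter commonly used for the "unauthorized API calls" control, using shell-style wildcards. The exact pattern the module ships may differ by version, and the function names are hypothetical.

```python
import fnmatch
import json

# Wildcard patterns from the commonly used CIS "unauthorized API calls" filter:
# { ($.errorCode = "*UnauthorizedOperation") || ($.errorCode = "AccessDenied*") }
UNAUTHORIZED_PATTERNS = ("*UnauthorizedOperation", "AccessDenied*")


def is_unauthorized_api_call(cloudtrail_event_json):
    """Return True if a CloudTrail event would match the filter above."""
    error_code = json.loads(cloudtrail_event_json).get("errorCode", "")
    return any(fnmatch.fnmatchcase(error_code, p) for p in UNAUTHORIZED_PATTERNS)


def metric_value(events):
    """The metric filter emits 1 per matching event; the CIS alarm then
    compares the Sum over its period against its threshold."""
    return sum(1 for event in events if is_unauthorized_api_call(event))
```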
Grafana visualization integration
1. IAM permission configuration
Grant Grafana read access to CloudWatch (observability.tf):
```hcl
module "observability_grafana_iam_policy" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-policy"
  version = "5.53.0"

  name_prefix = "grafana-"
  description = "Allow Grafana to read CloudWatch metrics and logs"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "AllowReadingMetricsFromCloudWatch"
        Effect = "Allow"
        Action = [
          "cloudwatch:DescribeAlarms",
          "cloudwatch:DescribeAlarmsForMetric",
          "cloudwatch:ListMetrics",
          "cloudwatch:GetMetricData"
        ]
        Resource = "*"
      },
      {
        Sid    = "AllowReadingLogsFromCloudWatch"
        Effect = "Allow"
        Action = [
          "logs:DescribeLogGroups",
          "logs:GetLogGroupFields",
          # StartQuery/StopQuery are required to run Logs Insights queries;
          # GetQueryResults alone only reads results of an existing query.
          "logs:StartQuery",
          "logs:StopQuery",
          "logs:GetQueryResults",
          "logs:GetLogEvents"
        ]
        Resource = "*"
      }
    ]
  })
}
```
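Missing IAM actions usually surface as confusing errors in Grafana, so a quick offline check can help. For example, Logs Insights queries need `logs:StartQuery` in addition to `logs:GetQueryResults`. The sketch below reports which assumed-required actions a policy document does not grant. `REQUIRED_ACTIONS` is an assumed minimum rather than Grafana's official list, and the check deliberately ignores `Deny`, `NotAction`, `Resource` and `Condition`.

```python
import fnmatch
import json

# Assumed minimum for metric and Logs Insights queries; Grafana's docs list
# further optional permissions (for example ec2:DescribeRegions).
REQUIRED_ACTIONS = {
    "cloudwatch:ListMetrics",
    "cloudwatch:GetMetricData",
    "logs:DescribeLogGroups",
    "logs:StartQuery",
    "logs:GetQueryResults",
}


def allowed_action_patterns(policy):
    """Collect lower-cased Action patterns from all Allow statements."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement is also valid IAM
        statements = [statements]
    patterns = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        patterns.extend(action.lower() for action in actions)
    return patterns


def missing_actions(policy_json, required=REQUIRED_ACTIONS):
    """Return required actions not matched by any allowed pattern.
    IAM action names are case-insensitive and support * wildcards."""
    patterns = allowed_action_patterns(json.loads(policy_json))
    return {
        action
        for action in required
        if not any(fnmatch.fnmatchcase(action.lower(), p) for p in patterns)
    }
```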
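2. Configure the Grafana data source
Add a CloudWatch data source:
- Log in to Grafana.
- Go to Configuration > Data Sources (Connections > Data sources in newer Grafana versions).
- Click Add data source and select CloudWatch.
- Configure AWS authentication:
  - Authentication provider: AWS SDK Default
  - Default region: us-west-2 (adjust to your environment)
- Click Save & Test to verify the connection.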
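You can also create the same data source through Grafana's HTTP API instead of the UI, which is easier to repeat across environments. The sketch below assumes a Grafana API key or service account token with admin rights. The helper names are hypothetical. The `jsonData` keys `authType: "default"` and `defaultRegion` are the ones Grafana's CloudWatch data source uses for SDK-default authentication.

```python
import json
import urllib.request


def cloudwatch_datasource_payload(region, name="CloudWatch"):
    """Request body for Grafana's POST /api/datasources creating a CloudWatch
    data source that authenticates with the AWS SDK default credential chain."""
    return {
        "name": name,
        "type": "cloudwatch",
        "access": "proxy",
        "jsonData": {
            "authType": "default",  # shown as "AWS SDK Default" in the UI
            "defaultRegion": region,
        },
    }


def create_datasource(grafana_url, api_key, payload):
    """POST the payload to Grafana and return the decoded JSON response."""
    request = urllib.request.Request(
        f"{grafana_url.rstrip('/')}/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```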
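3. Import the Formbricks monitoring dashboard
```bash
# Download the Formbricks dashboard
wget https://gitcode.com/GitHub_Trending/fo/formbricks/raw/main/infra/terraform/grafana-dashboard.json

# /api/dashboards/db expects the model wrapped in a "dashboard" field
# (assumes the file is a plain dashboard export)
jq '{dashboard: (. + {id: null}), overwrite: true}' grafana-dashboard.json > dashboard-import.json

# Import through the Grafana HTTP API
curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <grafana-api-key>" \
  -d @dashboard-import.json \
  "http://<grafana-url>/api/dashboards/db"
```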
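Core monitoring panels:
- System overview: CPU, memory, and disk usage
- Application performance: request latency, error rate, throughput
- Database performance: query latency, connection count, cache hit ratio
- User experience: page load time, survey submission success rate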
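Grafana's `/api/dashboards/db` endpoint expects the dashboard JSON wrapped in a `dashboard` field rather than the raw export. The sketch below shows that wrapping step. It assumes `grafana-dashboard.json` is a standard dashboard export. `build_import_payload` is a hypothetical name, while `dashboard`, `overwrite` and `folderUid` are fields of the Grafana dashboard API.

```python
import json


def build_import_payload(dashboard_json, folder_uid=None, overwrite=True):
    """Wrap an exported dashboard model in the body expected by Grafana's
    POST /api/dashboards/db (hypothetical helper)."""
    model = json.loads(dashboard_json)
    # Some exports are already wrapped as {"dashboard": {...}, "meta": {...}}.
    if isinstance(model.get("dashboard"), dict):
        model = model["dashboard"]
    model["id"] = None  # let Grafana assign its own internal id; uid is kept
    payload = {"dashboard": model, "overwrite": overwrite}
    if folder_uid:
        payload["folderUid"] = folder_uid
    return payload
```

The resulting dict can be serialized with `json.dumps` and sent as the request body, just like the file passed to `curl -d`.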
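Deployment and verification
1. Terraform deployment
```bash
# Initialize Terraform
terraform init
# Preview the changes and save the plan
terraform plan -var-file=prod.tfvars -out=tfplan
# Apply exactly the plan that was reviewed
terraform apply tfplan
```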
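2. Verify the monitoring configuration
Check the CloudWatch resources:
```bash
# Verify the log group (log groups are queried with the CloudWatch Logs CLI, "aws logs")
aws logs describe-log-groups --log-group-name-prefix /aws/cis-benchmark-group
# Verify the alarms
aws cloudwatch describe-alarms --alarm-name-prefix ALB_
```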
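Trigger a test alarm:
```bash
# Force the alarm into the ALARM state with the AWS CLI;
# CloudWatch returns it to its actual state at the next evaluation
aws cloudwatch set-alarm-state \
  --alarm-name ALB_HTTPCode_Target_5XX_Count \
  --state-value ALARM \
  --state-reason "Test alarm trigger"
```
Check that the Slack channel receives the test notification to confirm the notification pipeline works.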
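Advanced configuration: custom metrics
1. Custom application metrics
Collect custom metrics with the CloudWatch Agent:
```bash
# Install the CloudWatch Agent (Amazon Linux)
sudo yum install amazon-cloudwatch-agent -y
# Generate an agent configuration interactively
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard
```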
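Example custom metric configuration (/etc/cloudwatch-agent-config.json), which runs a StatsD listener on port 8125:
```json
{
  "metrics": {
    "namespace": "Formbricks/Business",
    "metrics_collected": {
      "statsd": {
        "service_address": ":8125",
        "metrics_collection_interval": 10,
        "metrics_aggregation_interval": 60
      }
    }
  }
}
```
Setting `namespace` makes the agent publish to the `Formbricks/Business` namespace used by the business alarm, instead of the default `CWAgent`. Load the file with `sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:/etc/cloudwatch-agent-config.json -s`.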
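With the StatsD listener enabled, the application only needs to send UDP lines to port 8125. The sketch below assumes the app counts failed and total survey submissions itself for each interval. The helper names are hypothetical. The `|#key:value` suffix is the StatsD tag form that the CloudWatch agent converts into dimensions. Check the published metric in the console to confirm its dimensions match the alarm's `Project` dimension.

```python
import socket


def submission_error_rate(failed, total):
    """Percentage of failed survey submissions in one reporting interval."""
    return 0.0 if total == 0 else 100.0 * failed / total


def format_statsd(name, value, metric_type="g", tags=None):
    """Build a StatsD line such as 'SurveySubmissionErrorRate:5.0|g|#Project:formbricks'.
    'g' is a gauge; tags use the '|#key:value,...' extension."""
    line = f"{name}:{value}|{metric_type}"
    if tags:
        line += "|#" + ",".join(f"{key}:{val}" for key, val in tags.items())
    return line


def send_statsd(line, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send to the agent's StatsD listener."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))


# Example: 3 failures out of 60 submissions in the last interval -> 5.0 %.
rate = submission_error_rate(failed=3, total=60)
metric_line = format_statsd("SurveySubmissionErrorRate", rate, tags={"Project": "formbricks"})
# send_statsd(metric_line)  # requires the CloudWatch agent listening on :8125
```

UDP keeps the request path non-blocking. If the agent is down, the metric is simply lost rather than slowing down survey submissions.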
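2. Custom alarm rules
Add a business metric alarm:
```hcl
locals {
  alarms = {
    # ... existing alarms ...

    # New business alarm. Add it to the existing `alarms` map: Terraform
    # rejects a second definition of `local.alarms` in another locals block.
    Survey_Submission_Error_Rate = {
      alarm_description   = "Survey submission error rate too high"
      comparison_operator = "GreaterThanThreshold"
      evaluation_periods  = 3
      threshold           = 5 # more than 5% of submissions failing
      period              = 60
      unit                = "Percent" # must match the unit the metric is published with
      namespace           = "Formbricks/Business"
      metric_name         = "SurveySubmissionErrorRate"
      statistic           = "Average"
      dimensions = {
        Project = "formbricks"
      }
    }
  }
}
```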
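Maintenance and best practices
1. Cost optimization strategies
| Resource | Optimization | Expected effect (estimate) |
|---|---|---|
| Log retention | Set non-critical logs to 30 days | ~40% lower log storage cost |
| Metric granularity | Use 5-minute granularity for non-real-time metrics | ~60% lower metric storage cost |
| Alarm thresholds | Tune thresholds based on historical data | Up to ~90% fewer false alarms |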
Configure log retention:
```hcl
resource "aws_cloudwatch_log_group" "application_logs" {
  name              = "/aws/formbricks/application"
  retention_in_days = 30 # Keep non-critical logs for 30 days
}
```
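`retention_in_days` only accepts a fixed set of values, so a policy such as "keep 45 days" has to be rounded to an allowed period. The small sketch below picks the nearest allowed value that still meets the requirement. The list comes from the CloudWatch Logs `PutRetentionPolicy` documentation at the time of writing. The Terraform resource additionally accepts `0` to mean "never expire". `nearest_valid_retention` is a hypothetical helper.

```python
# Retention periods (in days) accepted by CloudWatch Logs PutRetentionPolicy,
# as documented at the time of writing; verify against the current API reference.
VALID_RETENTION_DAYS = (
    1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545,
    731, 1096, 1827, 2192, 2557, 2922, 3288, 3653,
)


def nearest_valid_retention(min_days):
    """Return the smallest allowed retention that keeps logs for at least
    `min_days` days."""
    for allowed in VALID_RETENTION_DAYS:
        if allowed >= min_days:
            return allowed
    raise ValueError(
        f"no retention period of at least {min_days} days; "
        "use 0 (never expire) in Terraform instead"
    )
```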
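2. Monitoring maintenance checklist
Daily checks:
- Alarm status (AWS Console or Grafana)
- Trends in key metrics (response time, error rate)
Weekly maintenance:
- Review alarm trigger history
- Tune thresholds and periods
- Clean up log groups that are no longer needed
Monthly optimization:
- Review the scope of metric collection
- Evaluate storage costs
- Update CIS benchmark rules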
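3. Troubleshooting workflow
Summary and next steps
With the configuration in this article, you have deployed AWS CloudWatch monitoring for Formbricks, including:
- Comprehensive infrastructure and application monitoring
- Real-time alert notifications via Slack
- Security monitoring aligned with the CIS benchmark
- Visualization dashboards integrated with Grafana
Next steps:
- Archive monitoring data long-term (S3 + Glacier)
- Configure automated operational responses (AWS Systems Manager Automation)
- Build custom business dashboards
- Integrate cost monitoring and budget alarms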
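Get started:
- Clone the repository and start deploying: `git clone https://gitcode.com/GitHub_Trending/fo/formbricks`
- Read the full documentation in `formbricks/docs/self-hosting/setup/monitoring.mdx`
- Suggest improvements by opening an issue in the project repository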
Disclosure: parts of this article were generated with AI assistance (AIGC) and are provided for reference only.



