How to Integrate Redis Backups with Kubernetes (Complete Guide + Production-Grade Practices)
In cloud-native environments, Kubernetes (K8s) is the de facto standard for container orchestration. Integrating your Redis backup mechanism tightly with Kubernetes gives you:
- ✅ Automated backup and restore
- ✅ Declarative configuration management
- ✅ Seamless integration with the ecosystem (Prometheus, Velero, and more)
- ✅ A highly available, elastically scalable backup pipeline
This article walks through automated Redis backups, integrity monitoring, restore drills, and alerting in a Kubernetes environment, covering four mainstream approaches: CronJob, Operator, Sidecar, and Velero.
1. Core Goals of the Kubernetes Integration
| Goal | Implementation |
|---|---|
| ✅ Automated backups | Scheduled CronJob runs |
| ✅ Durable backup storage | PersistentVolume + S3/NFS |
| ✅ Backup integrity verification | Verification Job + Prometheus exporter |
| ✅ Guaranteed restorability | Restore-drill Job + test namespace |
| ✅ Monitoring and alerting | Prometheus + Alertmanager |
| ✅ Declarative management | Helm / Kustomize |
2. Approach 1: Scheduled Backups with a CronJob (Recommended Starting Point)
2.1 Architecture
[CronJob] → [Backup Pod] → [Redis] → [RDB] → [S3/NFS]
2.2 Create a ConfigMap for the Backup Script
# configmap-backup-script.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-backup-script
data:
  backup.sh: |
    #!/bin/sh
    set -euo pipefail
    REDIS_HOST=${REDIS_HOST:-redis}
    BACKUP_DIR="/backup"
    DATE=$(date +%Y%m%d_%H%M%S)
    RDB_FILE="$BACKUP_DIR/dump-$DATE.rdb"
    echo "🔄 Triggering BGSAVE on $REDIS_HOST"
    redis-cli -h "$REDIS_HOST" BGSAVE
    # Wait for the background save to finish (strip the trailing \r that INFO adds to each line)
    while [ "$(redis-cli -h "$REDIS_HOST" INFO persistence | grep rdb_bgsave_in_progress | cut -d: -f2 | tr -d '\r')" = "1" ]; do
      sleep 2
    done
    # Copy the RDB file (assumes RDB persistence is enabled and the data volume is mounted at /data)
    cp /data/dump.rdb "$RDB_FILE"
    # Compress and upload to S3 (requires the AWS CLI in the backup image)
    gzip "$RDB_FILE"
    aws s3 cp "$RDB_FILE.gz" "s3://my-backup-bucket/redis/"
    # Prune local copies older than 7 days
    find "$BACKUP_DIR" -name "*.rdb.gz" -mtime +7 -delete
    echo "✅ Backup completed: $RDB_FILE.gz"
2.3 Create the CronJob
# cronjob-redis-backup.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: redis-backup
spec:
  schedule: "0 2 * * *"   # daily at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: redis-backup
              image: redis:7-alpine   # the script also needs the AWS CLI; in practice use an image bundling redis-cli and aws
              command: ["/bin/sh", "/scripts/backup.sh"]
              env:
                - name: REDIS_HOST
                  value: "redis.prod.svc.cluster.local"
              volumeMounts:
                - name: script
                  mountPath: /scripts
                - name: redis-data
                  mountPath: /data        # mount the Redis data directory
                - name: aws-creds
                  mountPath: /root/.aws
          restartPolicy: OnFailure
          volumes:
            - name: script
              configMap:
                name: redis-backup-script
            - name: redis-data
              persistentVolumeClaim:
                claimName: redis-data-pvc
            - name: aws-creds
              secret:
                secretName: aws-credentials
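The aws-credentials Secret mounted above is assumed to hold a standard AWS CLI credentials file under the key credentials, so that it shows up as /root/.aws/credentials inside the container. One way to create it (the local file path is an example):
kubectl create secret generic aws-credentials \
  --from-file=credentials=./aws-credentials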
2.4 Deploy
kubectl apply -f configmap-backup-script.yaml
kubectl apply -f cronjob-redis-backup.yaml
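Rather than waiting for the schedule, you can trigger a one-off run from the CronJob and follow its logs; the job name below is arbitrary:
kubectl create job --from=cronjob/redis-backup redis-backup-manual
kubectl logs -f job/redis-backup-manual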
3. Approach 2: Using a Redis Operator (Advanced Production Option)
3.1 Recommended Operators
- Redis Enterprise Operator (official)
- Spotahome Redis Operator (open source)
- Bitnami Redis Cluster Operator
The examples below use the Spotahome Redis Operator.
3.2 Install the Operator
kubectl apply -f https://raw.githubusercontent.com/spotahome/redis-operator/master/manifests/redisfailover/crds.yaml
kubectl create -f https://raw.githubusercontent.com/spotahome/redis-operator/master/manifests/operator/all-redisfailover.yaml
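To confirm the operator is up before creating a RedisFailover, a quick check might look like the following (exact CRD and pod names can vary between operator versions):
kubectl get crd | grep spotahome
kubectl get pods --all-namespaces | grep -i redisoperator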
3.3 Define a Backup-Ready Redis Cluster
# redis-backup-enabled.yaml
apiVersion: storage.spotahome.com/v1alpha2
kind: RedisFailover
metadata:
  name: redis-prod
spec:
  sentinel:
    replicas: 3
    resources:
      requests:
        memory: 100Mi
        cpu: 100m
  redis:
    replicas: 3
    resources:
      requests:
        memory: 1Gi
        cpu: 500m
    # Enable persistence
    storage:
      persistentVolumeClaim:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi
    # Custom configuration
    config:
      save: "900 1 300 10"
      appendonly: "yes"
The operator itself does not perform backups, but it lays the groundwork for them (persistence, PVCs, and a stable topology).
3.4 Pair with Velero for Backups
# Install Velero
velero install \
  --provider aws \
  --bucket my-backup-bucket \
  --backup-location-config region=minio,s3ForcePathStyle=true,s3Url=http://minio:9000 \
  --secret-file ./credentials-velero
# Back up the entire Redis StatefulSet
velero backup create redis-backup --selector app=redis
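For recurring Velero backups instead of one-off runs, a schedule can be created the same way; the cron expression and selector below are examples:
velero schedule create redis-daily \
  --schedule="0 3 * * *" \
  --selector app=redis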
4. Approach 3: Sidecar Pattern (Fine-Grained Control)
4.1 Architecture
[Redis Pod]
 ├── [redis-container]
 └── [backup-sidecar]
        → periodic RDB backup → S3
4.2 Example Pod Spec
# pod-with-sidecar.yaml
apiVersion: v1
kind: Pod
metadata:
  name: redis-with-backup
spec:
  containers:
    - name: redis
      image: redis:7
      volumeMounts:
        - name: data
          mountPath: /data
      ports:
        - containerPort: 6379
      command: ["redis-server", "--save", "900 1", "--appendonly", "yes"]
    - name: backup-sidecar
      image: alpine:latest   # needs redis-cli and the AWS CLI installed for backup.sh to work
      volumeMounts:
        - name: data
          mountPath: /data
        - name: script
          mountPath: /scripts
        - name: aws-creds
          mountPath: /root/.aws
      command: ["/bin/sh", "-c"]
      args:
        - |
          # naive in-container scheduler: run the backup once a day around 02:00
          while true; do
            if [ "$(date +%H%M)" = "0200" ]; then
              echo "Starting backup..."
              /scripts/backup.sh && echo "Backup completed"
            fi
            sleep 60
          done
  volumes:
    - name: data
      emptyDir: {}           # demo only: use a PersistentVolumeClaim in production
    - name: script
      configMap:
        name: redis-backup-script
    - name: aws-creds
      secret:
        secretName: aws-credentials
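Once the Pod is running, a quick sanity check that the sidecar actually sees the shared data volume (paths match the mounts above):
kubectl exec redis-with-backup -c backup-sidecar -- ls -lh /data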
5. Approach 4: Velero (Cluster-Level Backup)
5.1 When to Use It
- Back up the entire Redis Pod + PVC
- Disaster recovery and cross-cluster migration
- Volume snapshots via a CSI driver
5.2 Back Up the PVCs
velero backup create redis-pvc-backup \
--include-namespaces redis-prod \
--include-resources persistentvolumeclaims
5.3 Restore
velero restore create --from-backup redis-pvc-backup
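To rehearse recovery without touching production (see the restore-drill practice in section 8), Velero can restore into a separate namespace; the target namespace here is an example:
velero restore create --from-backup redis-pvc-backup \
  --namespace-mappings redis-prod:redis-restore-test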
6. Monitoring and Alerting Integration
6.1 Monitor Backup Status with Prometheus
# prometheus-rules.yaml
groups:
  - name: redis-backup.rules
    rules:
      - alert: RedisBackupMissing
        expr: absent(last_over_time(redis_backup_last_success[24h]))
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "Redis backup not completed in 24 hours"
          description: "No Redis backup has been recorded in the last day."
      - alert: RedisBackupCorrupted
        expr: redis_backup_integrity == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Redis backup file is corrupted"
6.2 Recording Backup Status in Kubernetes
Kubernetes Events are normally emitted through the API (for example by an operator or a client library); kubectl has no create event subcommand. A simple alternative for a shell-based backup job is to record the result as an annotation on its own Pod:
kubectl annotate pod "$(hostname)" \
  backup.redis/last-success="$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --overwrite
7. Unified Management with a Helm Chart (Recommended)
Create a redis-backup Helm chart:
charts/redis-backup/
├── Chart.yaml
├── values.yaml
├── templates/
│ ├── configmap.yaml
│ ├── cronjob.yaml
│ └── serviceaccount.yaml
values.yaml
redis:
  host: redis.prod.svc.cluster.local
  port: 6379
backup:
  schedule: "0 2 * * *"
  ttlDays: 7
  s3:
    bucket: my-backup-bucket
    region: us-east-1
image:
  repository: redis
  tag: 7-alpine
This gives you one-command deployment with fully parameterized configuration, for example:
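A typical install, overriding the schedule from values.yaml (chart path, namespace, and values are examples):
helm install redis-backup ./charts/redis-backup \
  -n redis-prod --create-namespace \
  --set backup.schedule="0 3 * * *"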
8. Best-Practice Summary
| Item | Recommendation |
|---|---|
| Backup frequency | CronJob, 1–2 runs per day |
| Storage backend | S3, MinIO, NFS |
| Encryption | S3 server-side encryption (SSE) or client-side encryption |
| Permissions | Kubernetes ServiceAccount + IAM (see the IRSA sketch below) |
| Restore drills | Run monthly in a test namespace |
| Monitoring | Prometheus + Alertmanager |
| CI/CD | GitOps with ArgoCD / Flux |
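On EKS, the ServiceAccount + IAM row is typically implemented with IAM Roles for Service Accounts (IRSA); a minimal sketch with a placeholder role ARN:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: redis-backup
  namespace: redis-prod
  annotations:
    # placeholder ARN: replace with an IAM role that grants s3:PutObject on the backup bucket
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/redis-backup-role
The backup CronJob then references it via spec.jobTemplate.spec.template.spec.serviceAccountName: redis-backup, which makes the mounted AWS credentials Secret unnecessary.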
9. Common Issues and Solutions
| Issue | Solution |
|---|---|
| PVC fails to mount | Check the StorageClass and access modes |
| Backup script lacks permissions | Fix ownership/permissions with an init container |
| S3 access fails | Use IAM Roles for Service Accounts (IRSA) |
| Data inconsistent after restore | Make sure Redis persistence was enabled when the backup was taken |
| CronJob never fires | Check the time zone and kube-controller-manager health (see the timeZone example below) |
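The time-zone pitfall in the last row can be avoided explicitly: since Kubernetes 1.27 (beta from 1.25) the CronJob API has a timeZone field, so the schedule no longer depends on the controller's local zone:
spec:
  schedule: "0 2 * * *"
  timeZone: "Asia/Shanghai"   # requires Kubernetes >= 1.25 (beta) / 1.27 (stable)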
10. Closing Thoughts
By integrating Redis backups with Kubernetes, you get 🔑 truly cloud-native data protection:
- Automated: CronJob / Operator
- Observable: Prometheus + logging
- Recoverable: Velero + restore drills
- Manageable: Helm + GitOps