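Velero Installation and Deployment Guide: Building a Production Environment from Scratch
Overview
Velero (formerly Heptio Ark) is a tool for backing up and restoring Kubernetes cluster resources and persistent volumes. Whether you run on a public cloud or on-premises, Velero provides a reliable basis for disaster recovery and cluster migration.
This article covers:
- How Velero's core architecture fits together
- Best practices for production deployments
- Configuration for multiple cloud providers
- Performance tuning and security configuration
- Troubleshooting procedures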
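Velero Core Architecture
Core Components
| Component | Function | Deployed as |
|---|---|---|
| Velero Server | Main controller; runs backup and restore operations | Deployment |
| Node Agent | Per-node data mover for pod volume data | DaemonSet |
| BackupStorageLocation | Where backups are stored (object storage) | Custom Resource |
| VolumeSnapshotLocation | Where volume snapshots are taken | Custom Resource |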
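Environment Preparation and Requirements
System Requirements
# Check the client and server Kubernetes versions (recent kubectl releases no longer accept --short)
kubectl version
# Verify cluster status
kubectl cluster-info
kubectl get nodes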
Version Compatibility Matrix
| Velero Version | Supported Kubernetes Versions | Tested Kubernetes Versions |
|---|---|---|
| 1.17.x | 1.18+ | 1.31.7, 1.32.3, 1.33.1 |
| 1.16.x | 1.18+ | 1.31.4, 1.32.3, 1.33.0 |
| 1.15.x | 1.18+ | 1.28.8, 1.29.8, 1.30.4, 1.31.1 |
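The matrix can be checked against a live cluster with a small script. The sketch below assumes a POSIX shell with awk and sed; `version_at_least` and `cluster_version` are hypothetical helper names, and the 1.18 minimum comes from the table above (raise it if your Velero release needs more).

```shell
# Returns success if version $2 is >= minimum version $1 (dotted numbers, e.g. 1.18 vs 1.30.2).
version_at_least() {
  awk -v min="$1" -v cur="$2" 'BEGIN {
    n = split(min, a, "."); m = split(cur, b, ".")
    len = (n > m) ? n : m
    for (i = 1; i <= len; i++) {
      x = (i <= m) ? b[i] + 0 : 0   # component of the current version
      y = (i <= n) ? a[i] + 0 : 0   # component of the minimum version
      if (x > y) exit 0
      if (x < y) exit 1
    }
    exit 0                          # all components equal
  }'
}

# Prints the API server version without the leading "v", e.g. "1.30.2".
cluster_version() {
  kubectl get --raw /version | sed -n 's/.*"gitVersion": *"v\([0-9][0-9.]*\).*/\1/p'
}
```

Usage: `version_at_least 1.18 "$(cluster_version)" && echo "Kubernetes version OK"`. Comparing components numerically avoids the string-comparison trap where `1.9` would sort after `1.18`.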
Installing the Velero CLI
macOS (Homebrew)
brew install velero
Linux (binary)
# Look up the latest release tag and download it
VERSION=$(curl -s https://api.github.com/repos/vmware-tanzu/velero/releases/latest | grep tag_name | cut -d '"' -f 4)
wget https://github.com/vmware-tanzu/velero/releases/download/${VERSION}/velero-${VERSION}-linux-amd64.tar.gz
# Extract and install
tar -xvf velero-${VERSION}-linux-amd64.tar.gz
sudo mv velero-${VERSION}-linux-amd64/velero /usr/local/bin/
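The commands above hard-code the linux-amd64 archive. The sketch below picks the archive for the current machine instead; it assumes the release publishes `velero-<tag>-<os>-<arch>.tar.gz` archives for amd64 and arm64 (check the release page), and `velero_arch` and `velero_asset_url` are hypothetical helper names.

```shell
# Maps `uname -m` output to the architecture suffix used in Velero release archives.
velero_arch() {
  case "$1" in
    x86_64|amd64) echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Builds the download URL for a release tag ($1), OS ($2) and architecture ($3).
velero_asset_url() {
  printf 'https://github.com/vmware-tanzu/velero/releases/download/%s/velero-%s-%s-%s.tar.gz\n' \
    "$1" "$1" "$2" "$3"
}
```

Usage: `wget "$(velero_asset_url "$VERSION" linux "$(velero_arch "$(uname -m)")")"`. The extracted directory then carries the same architecture suffix, so adjust the `tar` and `mv` paths accordingly.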
Windows (Chocolatey)
choco install velero
Production Deployment Guide
AWS Deployment Example
# Create the IAM policy file (the bucket name must match the --bucket value used below; ec2:CreateVolume is needed to restore volumes from snapshots)
cat > velero-policy.json << EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeVolumes",
        "ec2:DescribeSnapshots",
        "ec2:CreateTags",
        "ec2:CreateVolume",
        "ec2:CreateSnapshot",
        "ec2:DeleteSnapshot"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:DeleteObject",
        "s3:PutObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Resource": [
        "arn:aws:s3:::your-backup-bucket/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::your-backup-bucket"
      ]
    }
  ]
}
EOF
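# Install Velero into the cluster (pick a plugin tag that the plugin's compatibility table lists for your Velero version)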
velero install \
--provider aws \
--plugins velero/velero-plugin-for-aws:v1.7.0 \
--bucket your-backup-bucket \
--backup-location-config region=us-west-2 \
--snapshot-location-config region=us-west-2 \
--secret-file ./credentials-velero \
--use-node-agent \
--default-volumes-to-fs-backup
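The install command reads AWS credentials from `./credentials-velero`, which the steps above do not create. The AWS plugin expects an INI-style file with a `[default]` profile (add further profiles, such as `velero-profile`, if a snapshot location references one). A minimal sketch, using a hypothetical helper `write_aws_credentials`:

```shell
# Writes an AWS credentials file for `velero install --secret-file`.
# $1 = output path, $2 = access key ID, $3 = secret access key.
# The umask is set in a subshell so a newly created file is readable only by its owner.
write_aws_credentials() {
  (
    umask 077
    cat > "$1" <<EOF
[default]
aws_access_key_id=$2
aws_secret_access_key=$3
EOF
  )
}
```

Usage: `write_aws_credentials ./credentials-velero "$AWS_ACCESS_KEY_ID" "$AWS_SECRET_ACCESS_KEY"`. Keep this file out of version control.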
Azure Deployment Example
# Set environment variables (AZURE_STORAGE_ACCOUNT_ID holds the storage account name)
AZURE_BACKUP_RESOURCE_GROUP=VeleroBackups
AZURE_STORAGE_ACCOUNT_ID=velerobackups123
BLOB_CONTAINER=velero
# Install Velero
velero install \
--provider azure \
--plugins velero/velero-plugin-for-microsoft-azure:v1.7.0 \
--bucket $BLOB_CONTAINER \
--secret-file ./credentials-velero \
--backup-location-config resourceGroup=$AZURE_BACKUP_RESOURCE_GROUP,storageAccount=$AZURE_STORAGE_ACCOUNT_ID \
--snapshot-location-config apiTimeout=5m
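The Azure example also needs `./credentials-velero`. The Azure plugin reads it as `KEY=value` lines naming the subscription, tenant, service principal, and the resource group that holds the cluster's disks (for AKS, the node resource group, by default `MC_<resource-group>_<cluster>_<region>`). The sketch below fills those keys from environment variables of the same name and refuses to write the file if any is missing; the helper name is hypothetical, and `AzurePublicCloud` assumes the public Azure cloud.

```shell
# Writes an Azure credentials file for `velero install --secret-file`.
# $1 = output path; values come from the environment variables of the same name.
write_azure_credentials() {
  (
    # ${VAR:?} aborts this subshell with a non-zero status when a variable is unset or empty
    : "${AZURE_SUBSCRIPTION_ID:?}" "${AZURE_TENANT_ID:?}" "${AZURE_CLIENT_ID:?}"
    : "${AZURE_CLIENT_SECRET:?}" "${AZURE_RESOURCE_GROUP:?}"
    umask 077
    cat > "$1" <<EOF
AZURE_SUBSCRIPTION_ID=${AZURE_SUBSCRIPTION_ID}
AZURE_TENANT_ID=${AZURE_TENANT_ID}
AZURE_CLIENT_ID=${AZURE_CLIENT_ID}
AZURE_CLIENT_SECRET=${AZURE_CLIENT_SECRET}
AZURE_RESOURCE_GROUP=${AZURE_RESOURCE_GROUP}
AZURE_CLOUD_NAME=AzurePublicCloud
EOF
  )
}
```

Usage: export the five variables, then run `write_azure_credentials ./credentials-velero` before `velero install`.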
GCP Deployment Example
# The bucket is already set with --bucket, so it is not repeated in --backup-location-config
velero install \
--provider gcp \
--plugins velero/velero-plugin-for-gcp:v1.7.0 \
--bucket your-backup-bucket \
--secret-file ./gcp-service-account.json \
--snapshot-location-config project=your-gcp-project
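For GCP, `--secret-file` must point to a service-account JSON key; passing another kind of credential file only shows up later as authentication errors. The sketch below is a heuristic pre-check that looks for the `"type": "service_account"` and `"private_key"` fields; `is_gcp_sa_key` is a hypothetical helper, and it does not verify that the key itself is valid.

```shell
# Heuristic check that $1 is a GCP service-account key file:
# non-empty, declares "type": "service_account", and contains a "private_key" field.
is_gcp_sa_key() {
  [ -s "$1" ] &&
    grep -Eq '"type"[[:space:]]*:[[:space:]]*"service_account"' "$1" &&
    grep -q '"private_key"' "$1"
}
```

Usage: `is_gcp_sa_key ./gcp-service-account.json || echo "not a service-account key" >&2`.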
Custom Installation Options
Resource Limits
# The node-agent-* settings only take effect when the node agent is deployed (--use-node-agent)
velero install \
--provider aws \
--plugins velero/velero-plugin-for-aws:v1.7.0 \
--bucket your-bucket \
--secret-file ./credentials \
--use-node-agent \
--velero-pod-cpu-request 500m \
--velero-pod-mem-request 512Mi \
--velero-pod-cpu-limit 1000m \
--velero-pod-mem-limit 1Gi \
--node-agent-pod-cpu-request 250m \
--node-agent-pod-mem-request 256Mi \
--node-agent-pod-cpu-limit 500m \
--node-agent-pod-mem-limit 512Mi
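Priority Class Configuration
# priority-class.yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: velero-high-priority
value: 1000000
globalDefault: false
description: "High priority class for Velero components"
kubectl apply -f priority-class.yaml
# The priority-class flags below are only available in recent Velero releases; confirm with velero install --help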
velero install \
--provider aws \
--plugins velero/velero-plugin-for-aws:v1.7.0 \
--bucket your-bucket \
--secret-file ./credentials \
--server-priority-class-name velero-high-priority \
--node-agent-priority-class-name velero-high-priority
Verifying the Installation
# Check the Velero Deployment and Pods
kubectl get deployments -n velero
kubectl get pods -n velero
# Check the configured storage and snapshot locations
velero backup-location get
velero snapshot-location get
# Run a test backup
velero backup create test-backup --include-namespaces default
# Inspect the backup status and logs
velero backup describe test-backup
velero backup logs test-backup
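`velero backup describe` returns immediately, even while the test backup is still running. The sketch below polls the Backup object's `status.phase` through kubectl until it reaches a terminal phase or a timeout expires; `backup_phase` and `wait_for_backup` are hypothetical helpers, and the `velero` namespace is assumed. If you only need to block while creating a backup, `velero backup create ... --wait` is the simpler option.

```shell
# Prints status.phase of a Velero Backup (empty until the server has picked it up).
backup_phase() {
  kubectl get backups.velero.io "$1" -n velero -o jsonpath='{.status.phase}'
}

# Polls backup $1 for up to $2 seconds (every ${POLL_INTERVAL:-5}s).
# Returns 0 on Completed, 1 on a failed terminal phase, 2 on timeout.
wait_for_backup() {
  local name deadline phase
  name=$1
  deadline=$(( $(date +%s) + $2 ))
  while :; do
    phase=$(backup_phase "$name")
    case "$phase" in
      Completed) echo "$name: $phase"; return 0 ;;
      PartiallyFailed|Failed|FailedValidation) echo "$name: $phase" >&2; return 1 ;;
    esac
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "$name: timed out in phase '${phase:-unknown}'" >&2
      return 2
    fi
    sleep "${POLL_INTERVAL:-5}"
  done
}
```

Usage: `wait_for_backup test-backup 600 && velero backup logs test-backup`. Distinct return codes let a CI job tell a failed backup apart from one that is merely slow.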
Backup Storage Location Configuration
AWS S3 Storage
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: aws-primary
  namespace: velero
spec:
  provider: aws
  objectStorage:
    bucket: my-velero-backups
    prefix: prod-cluster
  config:
    region: us-west-2
    s3ForcePathStyle: "false"
    s3Url: https://s3.us-west-2.amazonaws.com
Multiple Storage Locations
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: aws-secondary
  namespace: velero
spec:
  provider: aws
  # ReadOnly: backups can be restored from this location, but Velero does not write new backups to it
  accessMode: ReadOnly
  objectStorage:
    bucket: my-velero-backups-dr
    prefix: prod-cluster-dr
  config:
    region: us-east-1
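After BackupStorageLocation objects are applied, Velero validates each one periodically and records the result in `status.phase` (`Available` or `Unavailable`). The sketch below filters kubectl's jsonpath output to flag locations that are not usable; `unavailable_locations` is a hypothetical helper, and a location that has not been validated yet (empty phase) is reported as well.

```shell
# Reads "<name><TAB><phase>" lines on stdin and prints every location whose phase is not Available.
unavailable_locations() {
  awk -F '\t' 'NF > 0 && $2 != "Available" { print $1 }'
}
```

Usage: `kubectl get backupstoragelocations.velero.io -n velero -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}' | unavailable_locations`. Any name printed points to a bucket, prefix, region, or credential problem worth checking in the Velero server logs.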
Volume Snapshot Location Configuration
AWS Volume Snapshots
apiVersion: velero.io/v1
kind: VolumeSnapshotLocation
metadata:
  name: aws-us-west-2
  namespace: velero
spec:
  provider: aws
  config:
    region: us-west-2
    # Named profile in the credentials file passed with --secret-file
    profile: velero-profile
Multi-Region Snapshots
apiVersion: velero.io/v1
kind: VolumeSnapshotLocation
metadata:
  name: aws-us-east-1
  namespace: velero
spec:
  provider: aws
  config:
    region: us-east-1
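Backup Policy Configuration
Scheduled Backups
# Daily backup of the production namespace at 02:00, kept for 30 days (720h)
velero schedule create daily-backup \
--schedule="0 2 * * *" \
--include-namespaces production \
--ttl 720h
# Weekly backup of all namespaces on Sundays at 03:00, kept for 90 days (2160h)
velero schedule create weekly-full-backup \
--schedule="0 3 * * 0" \
--include-namespaces '*' \
--ttl 2160h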
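`--ttl` takes a Go-style duration, so retention periods in days have to be converted to hours, and `--schedule` takes a cron expression. Two small hypothetical helpers for building these values in scripts are sketched below; the cron check covers only the standard five-field form.

```shell
# Converts a retention period in days to the hour-based duration accepted by --ttl.
ttl_from_days() {
  printf '%dh\n' "$(( $1 * 24 ))"
}

# Returns success if $1 has exactly five fields (minute hour day-of-month month day-of-week).
cron_has_five_fields() {
  local n
  set -f        # disable globbing so "*" is not expanded to file names
  set -- $1     # split the expression on whitespace
  n=$#
  set +f
  [ "$n" -eq 5 ]
}
```

Usage: `velero schedule create daily-backup --schedule="0 2 * * *" --include-namespaces production --ttl "$(ttl_from_days 30)"`.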
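Resource Filtering
# Only the resource types listed in includedResources are backed up; add types such as replicasets, configmaps, or secrets if the restored application needs them
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: selective-backup
  namespace: velero
spec:
  includedNamespaces:
  - production
  excludedNamespaces:
  - kube-system
  includedResources:
  - pods
  - services
  - deployments
  excludedResources:
  - events
  - endpoints
  ttl: 720h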
Deploying a Test Application
apiVersion: v1
kind: Namespace
metadata:
  name: nginx-example
  labels:
    app: nginx
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  namespace: nginx-example
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx:1.25
        name: nginx
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: nginx
  name: my-nginx
  namespace: nginx-example
spec:
  ports:
  - port: 80
    targetPort: 80
  selector:
    app: nginx
  type: LoadBalancer
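Backup and Restore Operations
Creating Backups
# Back up a single namespace
velero backup create nginx-backup --include-namespaces nginx-example
# Back up the whole cluster except the kube-system and kube-public namespaces
velero backup create full-cluster-backup --exclude-namespaces kube-system,kube-public
# Back up only resources that match a label selector
velero backup create app-backup --selector app=nginx
Restore Operations
# List available backups
velero backup get
# Restore from a backup
velero restore create --from-backup nginx-backup
# Restore into a different namespace
velero restore create --from-backup nginx-backup --namespace-mappings nginx-example:nginx-restored
# Check restore status (restore names are listed by velero restore get)
velero restore describe <RESTORE_NAME>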
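`velero restore describe` reports what Velero did, but it is also worth confirming that the expected objects exist in the target namespace. A sketch, assuming the namespace mapping from the example above and a hypothetical helper `missing_after_restore` that lists entries of the first file that are absent from the second:

```shell
# $1 = file listing objects before backup, $2 = file listing objects after restore
# (one "kind/name" per line, e.g. from `kubectl get ... -o name`).
# Prints every line of $1 that does not appear in $2, in the order of $1.
missing_after_restore() {
  awk 'FILENAME == ARGV[1] { seen[$0] = 1; next } !($0 in seen)' "$2" "$1"
}
```

Usage: run `kubectl get deploy,svc,configmap -n nginx-example -o name > before.txt` before the backup, `kubectl get deploy,svc,configmap -n nginx-restored -o name > after.txt` after the restore, then `missing_after_restore before.txt after.txt`. Pods are left out of the comparison because controllers may replace them under new names.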
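Monitoring and Logging
Prometheus Monitoring
# This ServiceMonitor assumes a Service in the velero namespace labeled app: velero that exposes
# Velero's metrics endpoint (port 8085 by default) under the port name "monitoring"; adjust to your installation
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: velero-monitor
  namespace: velero
  labels:
    app: velero
spec:
  selector:
    matchLabels:
      app: velero
  endpoints:
  - port: monitoring
    interval: 30s
    path: /metrics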
Key Metrics
| Metric | Description | Suggested Alert Threshold |
|---|---|---|
| velero_backup_failure_total | Failed backups (counter) | >5 failures/hour |
| velero_restore_failed_total | Failed restores (counter) | >3 failures/hour |
| velero_volume_snapshot_failure_total | Failed volume snapshots (counter) | >10 failures/hour |
| velero_backup_duration_seconds | Backup duration (histogram) | >30 minutes |
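The failure counters carry labels (for example, one series per schedule), so a total has to be summed across series. The sketch below reads Prometheus text-format output, for example from `curl -s http://localhost:8085/metrics` after `kubectl port-forward -n velero deploy/velero 8085:8085` (8085 is Velero's default metrics port; adjust it if `--metrics-address` was changed), and sums all samples of one counter; `sum_metric` is a hypothetical helper and is not meant for histogram series.

```shell
# $1 = metric name; reads Prometheus text exposition on stdin and prints the sum of all
# samples of that metric (labelled or not). Comment lines (# HELP / # TYPE) are ignored.
sum_metric() {
  awk -v m="$1" '$1 == m || index($1, m "{") == 1 { sum += $NF } END { print sum + 0 }'
}
```

Usage: `curl -s http://localhost:8085/metrics | sum_metric velero_backup_failure_total`. For real alerting, a Prometheus rule over `increase(...)` is the better tool; this is for quick manual checks.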
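Troubleshooting Guide
Common Checks
# Velero server logs
kubectl logs -f deployment/velero -n velero
# Node agent logs (kubectl picks one pod of the DaemonSet; pass a pod name to target a specific node)
kubectl logs -f daemonset/node-agent -n velero
# Check installed plugins and storage location status (bad credentials usually show up as an Unavailable location)
velero plugin get
velero backup-location get
# Check that the Velero CRDs are installed
kubectl get crd | grep velero
# Test network access to S3 from a temporary pod (the Velero image may not include curl)
kubectl run net-test -n velero --rm -it --restart=Never --image=curlimages/curl --command -- curl -sI https://s3.amazonaws.com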
Backup Failure Handling Workflow
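A failed backup can be triaged by its `status.phase`. The sketch below maps each phase of Velero's Backup resource to a suggested next command; `triage_backup` is a hypothetical helper, and its suggestions are starting points rather than a complete procedure.

```shell
# $1 = backup name, $2 = its status.phase; prints a suggested next step.
triage_backup() {
  case "$2" in
    Completed)
      echo "OK: $1 completed" ;;
    New|InProgress|WaitingForPluginOperations*|Finalizing*)
      echo "WAIT: $1 is still in progress ($2)" ;;
    Deleting)
      echo "INFO: $1 is being deleted" ;;
    FailedValidation)
      echo "CHECK: velero backup describe $1 (see Validation errors), then velero backup-location get" ;;
    PartiallyFailed|Failed)
      echo "CHECK: velero backup describe $1 --details; velero backup logs $1 | grep -i error" ;;
    *)
      echo "UNKNOWN: $1 has phase '${2:-<empty>}'" ;;
  esac
}
```

Usage: `kubectl get backups.velero.io -n velero -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}' | while IFS="$(printf '\t')" read -r name phase; do triage_backup "$name" "$phase"; done`.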
Performance Tuning Recommendations
Resource Tuning
# values.yaml (Helm chart installation; key names follow the vmware-tanzu Velero chart and can change between chart versions)
resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: 1000m
    memory: 1Gi
deployNodeAgent: true
nodeAgent:
  resources:
    requests:
      cpu: 250m
      memory: 256Mi
    limits:
      cpu: 500m
      memory: 512Mi
configuration:
  backupStorageLocation:
  - name: default
    provider: aws
    bucket: my-velero-backups
    config:
      region: us-west-2
      s3Url: https://s3.us-west-2.amazonaws.com
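Concurrency Settings
# Raise the number of files uploaded in parallel during file-system backup (Kopia uploader, Velero 1.12 and later)
velero backup create nginx-backup \
--include-namespaces nginx-example \
--parallel-files-upload 10
# Node-agent concurrency (how many volume backups or restores one node runs at a time) is not a velero install flag;
# recent releases read it from the node-agent ConfigMap (loadConcurrency). Check the documentation for your Velero version.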
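Security Best Practices
RBAC Configuration
# velero install binds the velero ServiceAccount to cluster-admin by default, because the server must read (and, on restore,
# create) every resource type it handles. A narrower role like the one below limits what Velero can back up and restore;
# extend it to cover every resource type in scope.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: velero-server
rules:
- apiGroups: [""]
  resources: ["namespaces", "pods", "secrets"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["velero.io"]
  resources: ["*"]
  verbs: ["*"]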
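Network Policy
# Allows Velero to reach object storage and cloud APIs over HTTPS/HTTP, plus cluster DNS.
# The podSelector must match the labels on your Velero pods (check with kubectl get pods -n velero --show-labels);
# clusters whose API server listens on a port other than 443 need an additional egress rule for it.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: velero-egress
  namespace: velero
spec:
  podSelector:
    matchLabels:
      app: velero
  policyTypes:
  - Egress
  egress:
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
    ports:
    - protocol: TCP
      port: 443
    - protocol: TCP
      port: 80
  # DNS resolution
  - ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53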
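Upgrade and Maintenance
Version Upgrade Procedure
# Save the current location configuration
velero backup-location get -o yaml > backup-locations.yaml
velero snapshot-location get -o yaml > snapshot-locations.yaml
# Download the new CLI (example target version; use the release you are upgrading to)
VELERO_VERSION=v1.17.0
wget https://github.com/vmware-tanzu/velero/releases/download/${VELERO_VERSION}/velero-${VELERO_VERSION}-linux-amd64.tar.gz
# Re-running velero install does not update an existing installation; update the CRDs with the new CLI,
# then switch the server and node-agent images
velero install --crds-only --dry-run -o yaml | kubectl apply -f -
kubectl set image deployment/velero velero=velero/velero:${VELERO_VERSION} -n velero
kubectl set image daemonset/node-agent node-agent=velero/velero:${VELERO_VERSION} -n velero
# Also update the plugin images (init containers of the velero Deployment) to versions compatible with the new release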
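After an upgrade, the CLI and the server should report the same version. The sketch below parses `velero version` output, which prints a `Client:` and a `Server:` section, each with a `Version:` line; `velero_versions` is a hypothetical helper, and an empty server field indicates that the CLI could not read the server version (for example, because the server is unreachable).

```shell
# Reads `velero version` output on stdin and prints "<client-version> <server-version>".
velero_versions() {
  awk '/^Client:/ { section = "client" }
       /^Server:/ { section = "server" }
       $1 == "Version:" { v[section] = $2 }
       END { print v["client"], v["server"] }'
}
```

Usage: `set -- $(velero version | velero_versions); [ "$1" = "$2" ] && echo "client and server match: $1"`.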
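Routine Maintenance Tasks
# Backups past their TTL are deleted automatically by Velero's garbage collection; to delete a specific backup manually:
velero backup delete <BACKUP_NAME> --confirm
# Check that storage and snapshot locations are available
velero backup-location get
velero snapshot-location get
# Check resource usage (requires metrics-server)
kubectl top pods -n velero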
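`velero backup delete` does not offer an age filter, so old backups without a suitable TTL have to be selected another way. The sketch below lists backups created before a cutoff; RFC 3339 UTC timestamps such as `creationTimestamp` sort lexicographically in time order, so a plain string comparison is enough. `backups_older_than` is a hypothetical helper; review its output before deleting anything.

```shell
# $1 = cutoff as an RFC 3339 UTC timestamp, e.g. 2024-06-01T00:00:00Z.
# Reads "<name><TAB><creationTimestamp>" lines on stdin; prints names created before the cutoff.
backups_older_than() {
  awk -F '\t' -v cutoff="$1" 'NF == 2 && $2 < cutoff { print $1 }'
}
```

Usage: compute the cutoff with `cutoff=$(date -u -d '30 days ago' +%Y-%m-%dT%H:%M:%SZ)` (GNU date; on macOS use `date -u -v-30d +%Y-%m-%dT%H:%M:%SZ`), run `kubectl get backups.velero.io -n velero -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.creationTimestamp}{"\n"}{end}' | backups_older_than "$cutoff"`, and delete the reviewed names with `velero backup delete <BACKUP_NAME> --confirm`.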
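Summary
This guide walked through a complete production deployment of Velero. Key points to keep in mind:
- Plan first: design the backup policy and storage layout around business requirements
- Security first: apply the principle of least privilege when configuring access
- Monitor: set up metrics and alerting for backup and restore failures
- Test regularly: verify that backups are usable by rehearsing restores
- Document: keep runbooks and emergency procedures up to date
Velero is a mature backup solution in the Kubernetes ecosystem and can provide reliable disaster recovery for production environments. Follow the official releases and apply security patches and performance improvements promptly.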
Authorship note: parts of this article were generated with AI assistance (AIGC) and are provided for reference only.



