ai53_19/garbage_datasets and GitLab CI/CD: A Manual-Approval Deployment Workflow

[Free download] Garbage classification dataset. Project URL: https://ai.gitcode.com/ai53_19/garbage_datasets

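Introduction: Why a Manual-Approval Deployment Workflow Is Needed

Have you ever taken down a production service through a mistake made while deploying a garbage classification system? Do you worry that a model update could lower recognition accuracy with no way to roll back? The ai53_19/garbage_datasets project is the core of a YOLOv8-based garbage classification system, and its deployment quality directly affects recognition across 40 garbage classes (such as FastFoodBox, Cigarette, and Powerbank). This article builds a GitLab CI/CD pipeline with a manual approval step, combining environment isolation, automated testing, and human sign-off so that the model reaches production safely and reliably.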

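By the end of this article you will know how to:

Design a multi-environment deployment architecture for the garbage classification system
Write a GitLab CI/CD configuration file and set up a manual approval gate
Develop automated scripts that validate model performance
Implement blue-green deployment on Kubernetes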

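System Architecture and Deployment Workflow Design

1. Multi-Environment Deployment Architecture

The ai53_19/garbage_datasets project uses a three-tier environment strategy. The configuration of each environment is shown in the table below: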

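Environment	Purpose	Resources	Auto-deploy	Approval requirement
Development (Dev)	Model training and functional testing	Single machine, 8 GB GPU	Yes	None
Testing (Test)	Performance validation and integration testing	Cluster, 4 × 16 GB GPUs	Yes	After automated tests pass
Production (Prod)	Live service	Cluster, 8 × 24 GB GPUs	No	Manual approval + test report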
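To make the promotion rules in the table easier to reason about, they can be written down as data. The following is a minimal sketch: the `Environment` class, the `ENVIRONMENTS` list, and `needs_human_approval` are hypothetical names introduced for illustration and are not part of the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    name: str
    auto_deploy: bool
    approval: str  # human-readable description of the gate


# Values mirror the table; the structure itself is an illustrative assumption.
ENVIRONMENTS = [
    Environment("dev", True, "none"),
    Environment("test", True, "automated tests pass"),
    Environment("prod", False, "manual approval + test report"),
]


def needs_human_approval(env_name: str) -> bool:
    """An environment that is not auto-deployed requires a human decision."""
    env = next(e for e in ENVIRONMENTS if e.name == env_name)
    return not env.auto_deploy
```

Keeping the rules in one place like this makes it harder for the pipeline configuration and the documentation to drift apart.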

Promotion between environments is driven by GitLab CI/CD.

[Figure: deployment architecture diagram (Mermaid source not preserved)]

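2. Manual Approval Workflow Design

The manual approval step sits after validation on the testing environment and before production deployment. It has the following key elements:

Trigger condition: performance tests on the testing environment meet the targets (QPS > 100, average response time < 300 ms, accuracy > 92%)
Approvers: the project's technical lead and the operations lead (approval from at least one of them is required)
Time limit: a request not handled within 24 hours is rejected automatically and the relevant people are notified
Materials: test report, model performance comparison table, and change impact assessment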
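The trigger condition above is a conjunction of three thresholds. A minimal sketch of that check follows; `PerfResult` and `ready_for_approval` are hypothetical names, and only the threshold values come from the list above.

```python
from dataclasses import dataclass


@dataclass
class PerfResult:
    qps: float
    avg_response_ms: float
    accuracy: float  # fraction in [0, 1]


def ready_for_approval(r: PerfResult) -> bool:
    """All three targets must hold before an approval request is raised."""
    return r.qps > 100 and r.avg_response_ms < 300 and r.accuracy > 0.92
```

For example, `ready_for_approval(PerfResult(120, 250, 0.95))` is true, while a run with 95 QPS fails the gate no matter how good the other two numbers are.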

GitLab CI/CD Configuration

1. Structure of .gitlab-ci.yml

Building on the project's existing Dockerfile and Helm configuration, the following pipeline adds a manual approval gate:

stages:
  - test
  - train
  - build
  - deploy_dev
  - integration_test
  - deploy_test
  - performance_test
  - manual_approval
  - deploy_prod

variables:
  DOCKER_REGISTRY: registry.example.com
  IMAGE_NAME: garbage-detector
  HELM_CHART: ./helm/garbage-detector

# Unit test stage
unit_test:
  stage: test
  script:
    - pip install -r requirements.txt
    # --cov takes a module name or path; pytest-cov must be installed
    - pytest tests/ --cov=garbage_datasets

# Model training stage
train_model:
  stage: train
  script:
    - python garbage_datasets.py --train --data data.yaml
  artifacts:
    paths:
      - best.pt
      - runs/
  only:
    - main
    - /^release\/v.*/

# Build the Docker image
build_image:
  stage: build
  script:
    - docker build -t $DOCKER_REGISTRY/$IMAGE_NAME:$CI_COMMIT_SHA .
    - docker push $DOCKER_REGISTRY/$IMAGE_NAME:$CI_COMMIT_SHA
  dependencies:
    - train_model

# Deploy to the development environment
deploy_dev:
  stage: deploy_dev
  script:
    - helm upgrade --install garbage-dev $HELM_CHART 
      --set image.repository=$DOCKER_REGISTRY/$IMAGE_NAME 
      --set image.tag=$CI_COMMIT_SHA 
      --set replicaCount=1
  environment:
    name: development
    url: http://dev.garbage-detection.example.com

# Integration tests
integration_test:
  stage: integration_test
  script:
    - python tests/integration_test.py --endpoint http://dev.garbage-detection.example.com

# Deploy to the testing environment
deploy_test:
  stage: deploy_test
  script:
    - helm upgrade --install garbage-test $HELM_CHART 
      --set image.repository=$DOCKER_REGISTRY/$IMAGE_NAME 
      --set image.tag=$CI_COMMIT_SHA 
      --set replicaCount=2
  environment:
    name: testing
    url: http://test.garbage-detection.example.com

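# Performance tests
performance_test:
  stage: performance_test
  script:
    # -l writes the raw results file that the report script reads
    - jmeter -n -t tests/performance.jmx -Jhost=test.garbage-detection.example.com -l jmeter_results.csv
    - python scripts/generate_performance_report.py
  artifacts:
    paths:
      - performance_report.html

# Manual approval stage
manual_approval:
  stage: manual_approval
  script:
    - echo "Production deployment approved"
    - echo "Approval job: $CI_PROJECT_URL/-/jobs/$CI_JOB_ID"
  when: manual
  allow_failure: false  # blocking: later stages wait until this job is run
  artifacts:
    paths:
      - performance_report.html
      - model_evaluation.pdf

# Deploy to production (blue-green)
deploy_prod:
  stage: deploy_prod
  script:
    - ./scripts/blue_green_deploy.sh 
      --chart $HELM_CHART 
      --image $DOCKER_REGISTRY/$IMAGE_NAME:$CI_COMMIT_SHA 
      --namespace prod
  environment:
    name: production
    url: http://garbage-detection.example.com
  # No "when: manual" here: the job starts on its own once manual_approval succeeds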

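2. Key Configuration Notes

2.1 Manual approval gate

The manual approval stage is implemented with the when: manual keyword. By default a manual job is optional and later stages do not wait for it, so allow_failure: false is what turns it into a blocking gate. The key settings are:

manual_approval:
  stage: manual_approval
  script:
    - echo "Approval materials have been generated; please download and review them"
  artifacts:
    paths:
      - performance_report.html  # performance test report
      - model_evaluation.pdf      # model evaluation report
  when: manual  # run only when triggered by a person
  allow_failure: false  # blocking: later stages wait until this job is run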
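Because approvers rely on the two artifacts listed in this job, it can help to confirm they exist and are non-empty before anyone is asked to review. The sketch below is one way to do that; `missing_materials` and `REQUIRED` are hypothetical names, and where such a check would run in the pipeline is left open.

```python
from pathlib import Path

# File names taken from the artifacts list of the manual_approval job.
REQUIRED = ("performance_report.html", "model_evaluation.pdf")


def missing_materials(directory, required=REQUIRED):
    """Return the required files that are absent or empty in `directory`."""
    base = Path(directory)
    missing = []
    for name in required:
        path = base / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

A CI step could fail with a clear message whenever the returned list is non-empty, instead of letting an incomplete package reach the approvers.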

2.2 Blue-green deployment script

The scripts/blue_green_deploy.sh script implements zero-downtime deployment. Its core logic is as follows:

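#!/bin/bash
# Blue-green deployment script
set -euo pipefail  # stop before switching traffic if any step fails

# Parse arguments
while [[ "$#" -gt 0 ]]; do
    case $1 in
        --chart) CHART="$2"; shift ;;
        --image) IMAGE="$2"; shift ;;
        --namespace) NAMESPACE="$2"; shift ;;
        *) echo "Unknown parameter passed: $1"; exit 1 ;;
    esac
    shift
done

# Determine the active version from the host the VirtualService currently routes to
ACTIVE_VERSION=$(kubectl -n "$NAMESPACE" get virtualservice garbage-detection \
  -o jsonpath='{.spec.http[0].route[0].destination.host}' 2>/dev/null || true)

# Alternate between two releases; "garbage-blue" is an assumed name for the second one
if [ "$ACTIVE_VERSION" = "garbage-green" ]; then
  INACTIVE_VERSION="garbage-blue"
else
  INACTIVE_VERSION="garbage-green"
fi

echo "Active version: ${ACTIVE_VERSION:-none}"
echo "Version to deploy: $INACTIVE_VERSION"

# Deploy the new version to the inactive release
# (image.repository and image.tag match the values set in .gitlab-ci.yml)
helm upgrade --install "$INACTIVE_VERSION" "$CHART" \
  --namespace "$NAMESPACE" \
  --set image.repository="${IMAGE%:*}" \
  --set image.tag="${IMAGE##*:}" \
  --set replicaCount=3

# Health check: wait for the rollout to finish
kubectl -n "$NAMESPACE" rollout status "deployment/$INACTIVE_VERSION"

# Switch traffic
kubectl -n "$NAMESPACE" apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: garbage-detection
spec:
  hosts:
  - garbage-detection.example.com
  http:
  - route:
    - destination:
        host: $INACTIVE_VERSION
EOF

# Verify the new version
./scripts/verify_prod_deployment.sh --version "$INACTIVE_VERSION"

# Uninstall the old version (a later rollback then has to redeploy it)
if [ -n "$ACTIVE_VERSION" ] && helm -n "$NAMESPACE" status "$ACTIVE_VERSION" >/dev/null 2>&1; then
  helm uninstall "$ACTIVE_VERSION" --namespace "$NAMESPACE"
fi

echo "Blue-green deployment complete. Active version: $INACTIVE_VERSION"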
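A blue-green rollout depends on always deploying to whichever release is not serving traffic. The selection rule can be sketched in a few lines; the release name `garbage-blue` and the function `pick_inactive` are assumptions made for illustration, since the article only names `garbage-green`.

```python
from typing import Optional

# "garbage-green" appears in the script; "garbage-blue" is an assumed counterpart.
COLORS = ("garbage-blue", "garbage-green")


def pick_inactive(active: Optional[str]) -> str:
    """Return the release that should receive the next deployment."""
    if active == COLORS[1]:
        return COLORS[0]
    # First rollout, blue active, or an unrecognized name: deploy to green.
    return COLORS[1]
```

Calling it repeatedly with the previous result alternates between the two names, which is the property that keeps the live release untouched during a deployment.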

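Automating the Approval Materials

1. Generating the performance test report

Building on the test framework in the project's performance_test.md, the script below generates the performance report that approvers review in GitLab: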

# scripts/generate_performance_report.py
import matplotlib
matplotlib.use("Agg")  # headless backend for CI runners without a display
import matplotlib.pyplot as plt
import pandas as pd
from jinja2 import Template


def generate_report(test_results_path, output_path):
    # Load the test results
    df = pd.read_csv(test_results_path)

    # Compute the key metrics
    metrics = {
        "Average response time (ms)": df["response_time"].mean(),
        "95th percentile response time (ms)": df["response_time"].quantile(0.95),
        "Throughput (QPS)": df["throughput"].max(),
        "Error rate (%)": df["error_rate"].mean() * 100,
        "Average GPU utilization (%)": df["gpu_utilization"].mean(),
    }

    # Plot response time against the number of concurrent users
    plt.figure(figsize=(10, 6))
    plt.plot(df["concurrency"], df["response_time"], marker="o")
    plt.title("Response time vs. concurrent users")
    plt.xlabel("Concurrent users")
    plt.ylabel("Response time (ms)")
    plt.grid(True)
    plt.savefig("response_time_chart.png")
    plt.close()

    # Render the HTML report from a Jinja2 template
    with open("templates/report_template.html", "r", encoding="utf-8") as f:
        template = Template(f.read())

    html = template.render(metrics=metrics, chart_path="response_time_chart.png")

    with open(output_path, "w", encoding="utf-8") as f:
        f.write(html)


if __name__ == "__main__":
    generate_report("jmeter_results.csv", "performance_report.html")
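The report script reads aggregated columns such as `response_time` and `error_rate`, whereas a raw JMeter results file contains one row per request. The following sketch shows one way to summarize such rows using only the standard library. It assumes the per-request file has `elapsed` (milliseconds) and `success` columns, as in JMeter's default CSV output; `SAMPLE` and `summarize` are hypothetical names.

```python
import csv
import io
import statistics

# Tiny inline sample in the assumed per-request format.
SAMPLE = """timeStamp,elapsed,label,success
1700000000000,120,classify,true
1700000000100,180,classify,true
1700000000200,300,classify,false
1700000000300,200,classify,true
"""


def summarize(results_csv: str) -> dict:
    """Aggregate per-request rows into the summary figures used in a report."""
    rows = list(csv.DictReader(io.StringIO(results_csv)))
    elapsed = [int(r["elapsed"]) for r in rows]
    errors = sum(1 for r in rows if r["success"].lower() != "true")
    return {
        "avg_response_ms": statistics.mean(elapsed),
        "max_response_ms": max(elapsed),
        "error_rate_pct": 100 * errors / len(rows),
    }
```

For the four sample rows this gives an average of 200 ms, a maximum of 300 ms, and an error rate of 25%.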

2. Model evaluation report

Using the evaluation features in garbage_datasets.py, the script below generates a model performance comparison report:

# scripts/evaluate_model.py
import os

import pandas as pd
import torch
from fpdf import FPDF
from ultralytics import YOLO

HISTORY_PATH = "model_history.csv"


def evaluate_model(model_path, data_config_path, output_report_path):
    # Load the model and validate it on the dataset described by the YAML config
    model = YOLO(model_path)
    results = model.val(data=data_config_path)

    # Load earlier evaluation results, if there are any
    if os.path.exists(HISTORY_PATH):
        history_df = pd.read_csv(HISTORY_PATH)
    else:
        history_df = pd.DataFrame()
    previous_map = history_df.iloc[-1]["mAP@0.5:0.95"] if len(history_df) else None

    # Build the comparison report
    version = os.path.basename(model_path)
    report = {
        "Model version": version,
        "mAP@0.5": results.box.map50,
        "mAP@0.5:0.95": results.box.map,
        "Inference speed (ms/image)": results.speed["inference"],
    }
    if previous_map is not None:
        report["Change in mAP@0.5:0.95 vs. previous version"] = results.box.map - previous_map
    if torch.cuda.is_available():
        # Peak GPU memory allocated by PyTorch in this process
        report["Peak GPU memory (MB)"] = torch.cuda.max_memory_allocated() / 2**20

    # Append this run to the history file
    new_row = pd.DataFrame([{
        "timestamp": pd.Timestamp.now(),
        "model_version": version,
        "mAP@0.5": results.box.map50,
        "mAP@0.5:0.95": results.box.map,
        "inference_speed(ms)": results.speed["inference"],
    }])
    history_df = pd.concat([history_df, new_row], ignore_index=True)
    history_df.to_csv(HISTORY_PATH, index=False)

    # Write the PDF report (labels are in English: the core Arial font cannot render CJK text)
    pdf = FPDF()
    pdf.add_page()
    pdf.set_font("Arial", size=12)

    pdf.cell(200, 10, txt="Model Evaluation Report", ln=True, align="C")
    for key, value in report.items():
        text = f"{key}: {value}" if isinstance(value, str) else f"{key}: {value:.4f}"
        pdf.cell(200, 10, txt=text, ln=True, align="L")

    # Per-class mAP@0.5:0.95 (results.names maps class index to class name)
    pdf.cell(200, 10, txt="Per-class mAP@0.5:0.95:", ln=True, align="L")
    for idx, name in results.names.items():
        pdf.cell(200, 8, txt=f"{name}: {results.box.maps[idx]:.2f}", ln=True, align="L")

    pdf.output(output_report_path)


if __name__ == "__main__":
    evaluate_model("best.pt", "data.yaml", "model_evaluation.pdf")
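Approvers usually care less about the full per-class listing than about which classes fall short. A minimal sketch of that filter follows; `failing_classes` is a hypothetical helper, and the per-class scores are assumed to be a sequence indexed by class id, with the threshold chosen by the team.

```python
def failing_classes(names: dict, per_class_scores, threshold: float) -> list:
    """Return the names of classes whose score is below `threshold`.

    `names` maps class index to class name; `per_class_scores` is indexed
    by the same class index.
    """
    return [
        names[i]
        for i, score in enumerate(per_class_scores)
        if score < threshold
    ]
```

With `names = {0: "FastFoodBox", 1: "Cigarette", 2: "Powerbank"}` and scores `[0.95, 0.81, 0.90]`, a threshold of 0.88 flags only `Cigarette`.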

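Approval Workflow and Operations Guide

1. Approval workflow steps

Trigger: after performance tests on the testing environment finish, the GitLab CI/CD pipeline pauses at the manual_approval stage
Notification: the system pushes the approval request to the designated approvers by email and WeCom (Enterprise WeChat)
Review: approvers download performance_report.html and model_evaluation.pdf and evaluate them
Decision:
- Approve: click the "Play" button in the GitLab UI to continue the deployment
- Reject: click "Cancel", record the reason for rejection, and trigger the rollback process
Deployment: once approved, the blue-green deployment script runs automatically and updates production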
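The workflow combines an explicit decision with the 24-hour time limit from the approval design, which is something an external scheduler or bot would have to enforce. The sketch below models only the state logic; `approval_state`, `APPROVAL_WINDOW`, and the state names are hypothetical, and how the result is fed back into GitLab is not covered.

```python
from datetime import datetime, timedelta
from typing import Optional

APPROVAL_WINDOW = timedelta(hours=24)  # time limit from the approval design


def approval_state(requested_at: datetime, now: datetime,
                   decision: Optional[str] = None) -> str:
    """Return 'approved', 'rejected', 'expired', or 'pending'.

    An explicit decision always wins; without one, the request expires
    (and is treated as rejected) once the approval window has passed.
    """
    if decision in ("approved", "rejected"):
        return decision
    if now - requested_at >= APPROVAL_WINDOW:
        return "expired"
    return "pending"
```

A periodic job could call this for every open request and send the notification when the state changes to `expired`.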

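2. Approval decision matrix

Approvers can use the following decision matrix to decide whether to approve a deployment:

Dimension	Pass	Evaluate with caution	Reject
Model accuracy	mAP@0.5 > 0.92	0.88 ≤ mAP@0.5 ≤ 0.92	mAP@0.5 < 0.88
Performance	QPS > 100 and response time < 300 ms	QPS 80-100 or response time 300-500 ms	QPS < 80 or response time > 500 ms
Class coverage	All 40 classes meet the target	1-2 non-core classes below target	≥ 3 classes below target, or any core class below target
Resource usage	GPU utilization < 75%	GPU utilization 75-85%	GPU utilization > 85%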
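The matrix can be read as a three-way rule: any reject criterion rejects, all pass criteria approve, and everything else needs a closer look. A minimal sketch under that reading follows; the function `decide`, its parameter names, and the returned labels are hypothetical, and the matrix itself does not say how the dimensions should be combined.

```python
def decide(map50: float, qps: float, response_ms: float,
           failing_classes: int, core_class_failing: bool,
           gpu_util_pct: float) -> str:
    """Map the decision-matrix dimensions to 'approve', 'review', or 'reject'."""
    # Any single reject criterion is enough to reject.
    if (map50 < 0.88 or qps < 80 or response_ms > 500
            or failing_classes >= 3 or core_class_failing
            or gpu_util_pct > 85):
        return "reject"
    # Approve only when every dimension is in its pass band.
    if (map50 > 0.92 and qps > 100 and response_ms < 300
            and failing_classes == 0 and gpu_util_pct < 75):
        return "approve"
    # Everything in between needs human judgment.
    return "review"
```

A suggestion like this can be attached to the approval materials, while the final decision stays with the approver.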

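3. Emergency deployment workflow

When a production problem needs an urgent fix, the emergency deployment workflow can be used:

The developer opens a merge request in GitLab whose title starts with the [hotfix] prefix
The CI/CD pipeline skips part of the test suite and goes straight to the manual approval step
At least two approvers must approve
After deployment, additional smoke tests and monitoring alerts are triggered automatically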
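The hotfix path changes two things: which merge requests qualify and how many approvals are needed. A minimal sketch of that rule follows. `is_hotfix` and `required_approvals` are hypothetical helpers, and the case-insensitive match on the prefix is an assumption.

```python
HOTFIX_PREFIX = "[hotfix]"


def is_hotfix(mr_title: str) -> bool:
    """A merge request counts as a hotfix when its title starts with the prefix."""
    return mr_title.strip().lower().startswith(HOTFIX_PREFIX)


def required_approvals(mr_title: str) -> int:
    """Hotfixes need two approvals; the regular flow needs at least one."""
    return 2 if is_hotfix(mr_title) else 1
```

Requiring more approvers on the fast path compensates for the tests that the hotfix pipeline skips.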

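Post-Deployment Verification and Rollback

1. Automated verification script

After deployment, the following verification steps confirm that the service is running correctly: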

#!/bin/bash
# scripts/verify_prod_deployment.sh

set -e

# Parse arguments
while [[ "$#" -gt 0 ]]; do
    case $1 in
        --version) VERSION="$2"; shift ;;
        *) echo "Unknown parameter passed: $1"; exit 1 ;;
    esac
    shift
done

# Health check: wait for the rollout to finish
kubectl rollout status "deployment/$VERSION" -n prod

# Smoke test
curl -f http://garbage-detection.example.com/health || {
    echo "Health check failed"
    exit 1
}

# Functional check (the quoted delimiter keeps the shell from expanding the Python source)
python - <<'END'
import base64

import requests

# Load the test image
with open("test_samples/glass_bottle.jpg", "rb") as f:
    img_data = base64.b64encode(f.read()).decode()

# Send a prediction request
response = requests.post(
    "http://garbage-detection.example.com/api/classify",
    json={"image_base64": img_data},
    timeout=30,
)
response.raise_for_status()

result = response.json()
assert result["top1"]["class"] == "GlassCup", "Unexpected classification result"
assert result["top1"]["confidence"] > 0.85, "Confidence below threshold"
print("Functional check passed")
END

echo "$VERSION deployment verified"
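The functional check in the script relies on bare assertions, which stop at the first failure and say little about a malformed response. A slightly more defensive variant can be sketched as a pure function. `check_prediction` is a hypothetical helper, and the response shape (`top1.class`, `top1.confidence`) is taken from the script above.

```python
def check_prediction(result: dict, expected_class: str,
                     min_confidence: float = 0.85) -> list:
    """Return a list of problems found in a classify response (empty if fine)."""
    problems = []
    top1 = result.get("top1") or {}
    if top1.get("class") != expected_class:
        problems.append(
            f"expected {expected_class!r}, got {top1.get('class')!r}"
        )
    if float(top1.get("confidence", 0.0)) <= min_confidence:
        problems.append("confidence at or below threshold")
    return problems
```

Collecting every problem before failing gives the on-call engineer a complete picture from a single run.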

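2. Automatic rollback triggers

The system triggers an automatic rollback when any of the thresholds in the script below is breached: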

#!/bin/bash
# scripts/monitor_and_rollback.sh

# Metric thresholds
MAX_ERROR_RATE=0.01      # fraction of requests answered with a 5xx status
MAX_RESPONSE_TIME=500    # average response time in ms
MIN_THROUGHPUT=80        # requests per second

# Active version = the host the VirtualService currently routes to
ACTIVE_VERSION=$(kubectl -n prod get virtualservice garbage-detection \
  -o jsonpath='{.spec.http[0].route[0].destination.host}')

# Scrape the metrics endpoint once
METRICS=$(kubectl exec -n prod "deploy/$ACTIVE_VERSION" -- curl -s http://localhost:8080/metrics)

# Error rate = 5xx requests / all requests (comment lines starting with "#" are skipped)
ERROR_RATE=$(echo "$METRICS" | awk '/^http_requests_total/ {t+=$NF; if ($0 ~ /status="5[0-9][0-9]"/) e+=$NF}
  END {printf "%.6f\n", (t ? e / t : 0)}')

# Average response time in ms = duration sum (s) / request count * 1000, cumulative since start
RESPONSE_TIME=$(echo "$METRICS" | awk '/^http_request_duration_seconds_sum/ {s+=$NF}
  /^http_request_duration_seconds_count/ {c+=$NF}
  END {printf "%.3f\n", (c ? s / c * 1000 : 0)}')

# Throughput gauge; a missing metric is treated as "at the threshold" and does not trigger a rollback
THROUGHPUT=$(echo "$METRICS" | awk '/^http_requests_per_second/ {print $NF; exit}')
THROUGHPUT=${THROUGHPUT:-$MIN_THROUGHPUT}

# Decide whether a rollback is needed
if (( $(echo "$ERROR_RATE > $MAX_ERROR_RATE" | bc -l) )) || \
   (( $(echo "$RESPONSE_TIME > $MAX_RESPONSE_TIME" | bc -l) )) || \
   (( $(echo "$THROUGHPUT < $MIN_THROUGHPUT" | bc -l) )); then

   echo "Abnormal metrics detected; triggering automatic rollback"
   # Previous stable version = another deployment still present in the namespace
   PREVIOUS_VERSION=$(kubectl get deployment -n prod -o jsonpath='{.items[*].metadata.name}' \
     | tr ' ' '\n' | grep -vxF "$ACTIVE_VERSION" | head -n1)
   if [ -z "$PREVIOUS_VERSION" ]; then
     echo "No previous deployment found; cannot roll back automatically"
     exit 1
   fi

   # Switch traffic back (the heredoc terminator must start at column 0)
   kubectl -n prod apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: garbage-detection
spec:
  hosts:
  - garbage-detection.example.com
  http:
  - route:
    - destination:
        host: $PREVIOUS_VERSION
EOF

   # Notify the administrators
   curl -X POST "https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=$WECHAT_WEBHOOK_KEY" \
     -H "Content-Type: application/json" \
     -d "{\"msgtype\":\"text\",\"text\":{\"content\":\"Production anomaly detected; automatically rolled back to version $PREVIOUS_VERSION\"}}"

   exit 1
fi

echo "System metrics are normal"
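The error-rate trigger depends on reading counters from a Prometheus-style text endpoint, where each sample line carries its labels in braces and comment lines start with `#`. The sketch below parses such text and computes the 5xx share. The metric name `http_requests_total` and its `status` label follow the script, while `SAMPLE`, `LINE`, and `error_rate` are hypothetical names.

```python
import re

# Small inline sample in the Prometheus text exposition format.
SAMPLE = """# HELP http_requests_total Total HTTP requests
# TYPE http_requests_total counter
http_requests_total{status="200"} 980
http_requests_total{status="404"} 10
http_requests_total{status="500"} 8
http_requests_total{status="503"} 2
"""

# Matches one sample line and captures the status code and the value.
LINE = re.compile(
    r'^http_requests_total\{[^}]*status="(\d{3})"[^}]*\}\s+([0-9.eE+-]+)$'
)


def error_rate(metrics_text: str) -> float:
    """Share of requests with a 5xx status; 0.0 when there are no requests."""
    total = errors = 0.0
    for line in metrics_text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue  # comment line or a different metric
        value = float(match.group(2))
        total += value
        if match.group(1).startswith("5"):
            errors += value
    return errors / total if total else 0.0
```

For the sample, 10 of 1000 requests are server errors, so the rate sits right at the 0.01 threshold and a strict "greater than" comparison would not trigger a rollback.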

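Summary and Best Practices

1. Workflow optimization suggestions

Automate the approval materials: integrate the model evaluation and performance test scripts so that standardized materials are generated automatically
Tiered approval: configure multi-level approval for core business lines and simplify to single-level approval for non-core services
Approval time limits: set a response-time SLA and escalate notifications automatically when a request times out
Audit trail: record every approval action in the GitLab Audit Log to meet compliance requirements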

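2. Common problems and solutions

Problem	Solution	Prevention
Deployment delayed because the approver is unavailable	Configure delegate approvers	Set up an approver rotation
Incomplete approval materials	Check completeness with an automated script	Verify report generation in the CI stage
Performance drops after a production deployment	Blue-green deployment plus automatic rollback	Pilot a gradual (gray) release strategy
Approval process is cumbersome	Adjust approval steps dynamically by risk level	Automate risk-based tiering of the approval flow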

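3. Future directions

AI-assisted approval: train a decision model on historical approval data to provide recommendations
Progressive delivery: integrate canary releases with proportional traffic shifting
Cross-cloud approval: extend the approval workflow to multi-cloud deployments
Compliance automation: integrate checks for GDPR, China's MLPS 2.0 (等保2.0), and similar requirements into the approval workflow

With the GitLab CI/CD manual-approval deployment workflow described in this article, the ai53_19/garbage_datasets project balances development speed against system stability and delivers the garbage classification model to production safely and reliably. For further optimization, consider combining model quantization with edge deployment to improve system performance.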


Author's note: parts of this article were generated with AI assistance (AIGC) and are provided for reference only.