Building a Kubernetes Cluster and DevOps in Practice (Part 2): Deployment

Kubernetes Deployment and CI/CD in Practice

This article walks through building a production-grade Kubernetes cluster step by step, covering CI/CD pipeline configuration, monitoring deployment, and troubleshooting, with ready-to-run commands and configuration files.

Intended audience: operations engineers, DevOps engineers, and anyone who wants hands-on experience building a K8s cluster.

Prerequisite reading: the companion "Architecture Design" article, which covers the overall architecture and technology choices.


Table of Contents

  • 1. Environment Preparation
  • 2. Middleware Deployment
  • 3. Kubernetes Cluster Setup
  • 4. CI/CD Pipeline
  • 5. Application Deployment in Practice
  • 6. Monitoring and Logging
  • 7. Troubleshooting
  • 8. Summary and Checklists

1. Environment Preparation

1.0 Network Planning

Network planning is the first step of the deployment; a sound network layout provides security isolation and efficient communication.

1.0.1 VPC and Subnet Planning

A three-tier subnet layout provides network isolation:

VPC network: 10.0.0.0/16
├── entry subnet:      10.0.10.0/24  (public entry tier)
├── middleware subnet: 10.0.20.0/24  (middleware tier)
└── k8s subnet:        10.0.30.0/24  (application tier)

| Subnet | CIDR | Purpose | Servers |
|---|---|---|---|
| entry | 10.0.10.0/24 | Public entry, operations management | entry-01, jumpserver |
| middleware | 10.0.20.0/24 | Middleware services | middleware-01 |
| k8s | 10.0.30.0/24 | K8s cluster | master-01~03, node-01~02 |

K8s internal network planning

| Network | CIDR | Purpose |
|---|---|---|
| Pod CIDR | 172.16.0.0/16 | Pod IP allocation (managed by Calico) |
| Service CIDR | 10.96.0.0/12 | Service ClusterIPs |
1.0.2 Security Group Configuration

Entry subnet security group (public entry)

| Direction | Ports | Source/Destination | Purpose |
|---|---|---|---|
| Inbound | 80, 443 | 0.0.0.0/0 | Web access |
| Inbound | 10022 | Ops IP allowlist | SSH management (non-standard port) |
| Outbound | ALL | 0.0.0.0/0 | Allow all outbound |

K8s subnet security group

| Direction | Ports | Source/Destination | Purpose |
|---|---|---|---|
| Inbound | 6443 | entry subnet | K8s API Server |
| Inbound | 30080, 30443 | entry subnet | Ingress NodePorts |
| Inbound | ALL | k8s subnet | Intra-cluster traffic |
| Inbound | ALL | 172.16.0.0/16 | Pod network traffic |
| Outbound | ALL | 0.0.0.0/0 | Allow all outbound |

Middleware subnet security group

| Direction | Ports | Source/Destination | Purpose |
|---|---|---|---|
| Inbound | 3306, 6379, 8848, etc. | k8s subnet | Middleware service ports |
| Inbound | 10022 | entry subnet | SSH management |
| Outbound | ALL | 0.0.0.0/0 | Allow all outbound |
1.0.3 Server Access Rules

graph LR
    subgraph entry_subnet["entry subnet"]
        Entry[Entry node]
        Jump[JumpServer]
    end
    subgraph middleware_subnet["middleware subnet"]
        MW[Middleware]
    end
    subgraph k8s_subnet["k8s subnet"]
        Master[K8s Master]
        Worker[K8s Worker]
    end
    Internet((Internet)) -->|80/443| Entry
    Entry -->|6443| Master
    Entry -->|30080| Worker
    Jump -->|10022| Master
    Jump -->|10022| MW
    Worker -->|3306/6379/8848| MW
    Master -->|3306/6379/8848| MW

1.1 Server Inventory

| Role | Hostname | Example IP | Spec | Notes |
|---|---|---|---|---|
| Entry | entry-01 | 10.0.10.10 | 2C/4G | Nginx + Squid proxy |
| Middleware | middleware-01 | 10.0.20.10 | 8C/32G | MySQL, Redis, etc. |
| K8s Master | k8s-master-01 | 10.0.30.10 | 4C/8G | Control plane |
| K8s Master | k8s-master-02 | 10.0.30.11 | 4C/8G | Control plane |
| K8s Master | k8s-master-03 | 10.0.30.12 | 4C/8G | Control plane |
| K8s Worker | k8s-node-01 | 10.0.30.20 | 8C/32G | Worker node |
| K8s Worker | k8s-node-02 | 10.0.30.21 | 8C/32G | Worker node |
| JumpServer | jumpserver | 10.0.10.20 | 4C/8G | Bastion host |

1.2 Infrastructure Initialization

Before deploying K8s, the base infrastructure needs to be initialized.

1.2.1 Base Server Configuration

Run on all servers:

#!/bin/bash
# Base server configuration script

# 1. Set the hostname (adjust per server role)
HOSTNAME="k8s-master-01"
hostnamectl set-hostname $HOSTNAME
echo "127.0.0.1 $HOSTNAME" >> /etc/hosts

# 2. Time zone
timedatectl set-timezone Asia/Shanghai
timedatectl set-ntp yes

# 3. Kernel parameter tuning
cat > /etc/sysctl.d/local.conf << EOF
# File descriptors
fs.file-max = 512000

# TCP tuning
net.core.rmem_max = 67108864
net.core.wmem_max = 67108864
net.core.somaxconn = 4096
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time = 1200
net.ipv4.ip_local_port_range = 10000 65000
net.ipv4.tcp_max_syn_backlog = 4096

# Enable BBR congestion control
net.ipv4.tcp_congestion_control = bbr

# Disable IPv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
EOF
sysctl -p /etc/sysctl.d/local.conf

# 4. System resource limits
cat > /etc/security/limits.conf << EOF
*         hard    nofile      512000
*         soft    nofile      512000
root      hard    nofile      512000
root      soft    nofile      512000
EOF

# 5. SSH hardening (port 10022 matches the security group and fail2ban rules)
cat > /etc/ssh/sshd_config << EOF
Include /etc/ssh/sshd_config.d/*.conf
Port 10022
PermitRootLogin prohibit-password
PubkeyAuthentication yes
PasswordAuthentication no
ClientAliveInterval 60
ClientAliveCountMax 5
EOF
systemctl restart sshd
1.2.2 Entry Server Deployment

Run on the Entry node:

#!/bin/bash
# Nginx setup on the entry server

# 1. Install Nginx
apt-get update
apt-get install -y nginx

# 2. Configure Nginx (with the stream module for TCP load balancing)
cat > /etc/nginx/nginx.conf << EOF
user www-data;
worker_processes auto;
pid /run/nginx.pid;

events {
    worker_connections 20480;
    multi_accept on;
}

# TCP load balancing (for the K8s API Server)
stream {
    include /data/nginx/stream-sites-enabled/*;
}

http {
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    client_max_body_size 0;

    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    log_format main '[\$time_local] \$remote_addr -> '
                    '"\$request" \$status \$body_bytes_sent '
                    '"\$http_user_agent" \$request_time';

    access_log /data/nginx/logs/access.log main;
    error_log /data/nginx/logs/error.log;

    gzip on;
    gzip_types text/plain text/css application/json application/javascript;

    # conf.d holds shared upstream definitions (e.g. the Ingress NodePort upstream below)
    include /data/nginx/conf.d/*.conf;
    include /data/nginx/sites-enabled/*;
}
EOF

# 3. Create the directory layout
mkdir -p /data/nginx/{stream-sites-enabled,logs,sites-enabled,conf.d}
chown -R www-data:www-data /data/nginx

K8s API Server load balancing configuration

# TCP load balancing for the K8s API Server (port 6443)
cat > /data/nginx/stream-sites-enabled/k8s-apiserver.conf << EOF
upstream k8s-apiserver {
    server 10.0.30.10:6443 max_fails=3 fail_timeout=30s;
    server 10.0.30.11:6443 max_fails=3 fail_timeout=30s;
    server 10.0.30.12:6443 max_fails=3 fail_timeout=30s;
}

server {
    listen 6443;
    proxy_pass k8s-apiserver;
    proxy_timeout 3s;
    proxy_connect_timeout 1s;
}
EOF

K8s Ingress NodePort load balancing configuration

# HTTP load balancing to the Ingress NodePorts
cat > /data/nginx/conf.d/k8s-ingress.conf << EOF
upstream ingress_nodes {
    server 10.0.30.20:30080;
    server 10.0.30.21:30080;
}
EOF

# Example application site configuration
cat > /data/nginx/sites-enabled/app.conf << EOF
server {
    listen 80;
    server_name app.example.com;
    
    location / {
        proxy_pass http://ingress_nodes;
        proxy_set_header Host \$host;
        proxy_set_header X-Real-IP \$remote_addr;
        proxy_set_header X-Forwarded-For \$proxy_add_x_forwarded_for;
    }
}
EOF

systemctl reload nginx
1.2.3 Bastion Host Deployment

Run on the JumpServer node:

#!/bin/bash
# One-step JumpServer deployment

# Quick install with the official script
curl -sSL https://resource.fit2cloud.com/jumpserver/jumpserver/releases/download/v3.10.17/quick_start.sh | bash

# Adjust the configuration (optional)
# vim /opt/jumpserver/config/config.txt
# Common settings:
# - HTTP_PORT=80
# - HTTPS_PORT=443
# - DOMAINS="jumpserver.example.com"

# Restart the service
cd /opt/jumpserver-installer-v3.10.17
./jmsctl.sh restart

JumpServer access

  • Default URL: http://<JumpServer-IP>:80
  • Default username: admin
  • Default password: admin (must be changed on first login)
1.2.4 Docker Engine Installation

Run on the Middleware node (used to run the middleware containers):

#!/bin/bash
# Install and configure the Docker engine

# 1. Install prerequisites
apt-get update
apt-get install -y ca-certificates curl gnupg lsb-release

# 2. Add Docker's official GPG key (via the Aliyun mirror)
curl -fsSL https://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg | \
    gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg

# 3. Add the Docker repository
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] \
    https://mirrors.aliyun.com/docker-ce/linux/ubuntu $(lsb_release -cs) stable" | \
    tee /etc/apt/sources.list.d/docker.list

# 4. Install Docker
apt-get update
apt-get install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin

# 5. Configure Docker: registry mirror and data directory
#    (note: JSON does not allow comments inside daemon.json)
mkdir -p /data/docker
cat > /etc/docker/daemon.json << EOF
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "3"
  },
  "registry-mirrors": [
    "https://docker.m.daocloud.io"
  ],
  "data-root": "/data/docker"
}
EOF

# 6. Start Docker
systemctl enable docker
systemctl restart docker

# 7. Verify the installation
docker info
docker compose version
1.2.5 Security Hardening

Run on all servers:

#!/bin/bash
# Server hardening

# 1. Install fail2ban to block brute-force attempts
apt-get install -y fail2ban

# 2. Configure fail2ban
cat > /etc/fail2ban/jail.local << EOF
[DEFAULT]
ignoreip = 127.0.0.1/8 ::1
bantime = 3600
maxretry = 3
findtime = 600
banaction = iptables-multiport

[sshd]
enabled = true
port = 10022
logpath = /var/log/auth.log
maxretry = 3
bantime = 3600
EOF

# 3. Start fail2ban
systemctl enable fail2ban
systemctl restart fail2ban

# 4. Check status
fail2ban-client status sshd

1.3 Operating System Tuning

All nodes run Ubuntu Server 22.04.

1.3.1 Disable Swap

K8s requires swap to be disabled; otherwise the kubelet will not work properly:

# Disable immediately
swapoff -a

# Disable permanently: remove the swap entry from fstab
sed -i '/swap/d' /etc/fstab
1.3.2 Load Kernel Modules
# Note: modules-load.d files take one module name per line, without inline comments
cat > /etc/modules-load.d/k8s.conf << EOF
overlay
br_netfilter
EOF

modprobe overlay        # OverlayFS
modprobe br_netfilter   # bridge netfilter support

# Verify
lsmod | grep -E "overlay|br_netfilter"
1.3.3 Kernel Parameters
cat > /etc/sysctl.d/k8s.conf << EOF
# Required by K8s
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1

# Connection tracking
net.netfilter.nf_conntrack_max = 524288

# TCP tuning
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_intvl = 30
net.core.somaxconn = 32768

# File descriptors
fs.file-max = 2097152
EOF

sysctl --system
1.3.4 System Resource Limits
cat >> /etc/security/limits.conf << EOF
# Kubernetes resource limits
* soft nofile 655360
* hard nofile 655360
* soft nproc 655360
* hard nproc 655360
EOF

1.4 Install containerd

1.4.1 Installation
# Add the Docker apt repository (the containerd.io package is published there)
mkdir -p /etc/apt/keyrings

curl -fsSL https://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg | \
  gpg --dearmor -o /etc/apt/keyrings/docker.gpg

echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
  https://mirrors.aliyun.com/docker-ce/linux/ubuntu $(lsb_release -cs) stable" | \
  tee /etc/apt/sources.list.d/docker.list

apt-get update
apt-get install -y containerd.io
1.4.2 Configure containerd
mkdir -p /etc/containerd

cat > /etc/containerd/config.toml << 'EOF'
version = 2

[plugins."io.containerd.grpc.v1.cri"]
  # Use a domestic mirror for the pause image
  sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.9"

  [plugins."io.containerd.grpc.v1.cri".containerd]
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
      runtime_type = "io.containerd.runc.v2"
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
        SystemdCgroup = true  # use systemd as the cgroup driver

  [plugins."io.containerd.grpc.v1.cri".registry.mirrors]
    [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
      endpoint = ["https://docker.m.daocloud.io"]
    [plugins."io.containerd.grpc.v1.cri".registry.mirrors."registry.k8s.io"]
      endpoint = ["https://k8s.m.daocloud.io"]
EOF

systemctl daemon-reload
systemctl restart containerd
systemctl enable containerd

1.5 Install Kubernetes Components

Run on all K8s nodes:

# Add the Aliyun Kubernetes apt source
curl -fsSL https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | \
  gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg

echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] \
  https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main' | \
  tee /etc/apt/sources.list.d/kubernetes.list

# Install a pinned version
apt-get update
apt-get install -y kubelet=1.28.6-1.1 kubeadm=1.28.6-1.1 kubectl=1.28.6-1.1

# Hold the packages to prevent accidental upgrades
apt-mark hold kubelet kubeadm kubectl

# Enable kubelet
systemctl enable kubelet

1.6 Time Synchronization

apt-get install -y chrony

cat > /etc/chrony/chrony.conf << 'EOF'
server ntp.aliyun.com iburst
server ntp.tencent.com iburst
driftfile /var/lib/chrony/chrony.drift
makestep 1.0 3
rtcsync
EOF

systemctl restart chrony
systemctl enable chrony

1.7 Hosts File

Add to /etc/hosts on all nodes:

cat >> /etc/hosts << EOF
10.0.30.10  k8s-master-01
10.0.30.11  k8s-master-02
10.0.30.12  k8s-master-03
10.0.30.20  k8s-node-01
10.0.30.21  k8s-node-02
10.0.10.10  k8s-api-lb
EOF

2. Middleware Deployment

2.1 Storage Planning

The Middleware node should mount dedicated data disks:

# Hot data disk (SSD): MySQL, Redis
mkdir -p /data/hot
mount /dev/vdb1 /data/hot

# Cold data disk (HDD): Elasticsearch, MinIO
mkdir -p /data/cold
mount /dev/vdc1 /data/cold

# Add to fstab for automatic mounting
echo '/dev/vdb1 /data/hot ext4 defaults 0 0' >> /etc/fstab
echo '/dev/vdc1 /data/cold ext4 defaults 0 0' >> /etc/fstab

2.2 Docker Compose Configuration

# docker-compose.yml
version: '3'
services: 
  mysql:
    image: mysql:8.0
    restart: always
    ports:
      - 3306:3306
    volumes:
      - /data/hot/mysql:/var/lib/mysql
      - ./config/my.cnf:/etc/mysql/conf.d/my.cnf
    environment:
      MYSQL_ROOT_PASSWORD: ${MYSQL_PASSWORD}
      TZ: Asia/Shanghai
    networks: 
      - middleware

  redis:
    image: redis:7.2
    restart: always
    ports:
      - 6379:6379
    volumes: 
      - /data/hot/redis:/data
    command: redis-server --requirepass ${REDIS_PASSWORD} --appendonly yes
    networks: 
      - middleware

  nacos:
    image: nacos/nacos-server:v2.3.2
    restart: always
    depends_on:
      - mysql
    environment:
      MODE: standalone
      NACOS_AUTH_ENABLE: "true"
      SPRING_DATASOURCE_PLATFORM: mysql
      MYSQL_SERVICE_HOST: mysql
      MYSQL_SERVICE_DB_NAME: nacos
      MYSQL_SERVICE_USER: root
      MYSQL_SERVICE_PASSWORD: ${MYSQL_PASSWORD}
    ports:
      - 8848:8848
      - 9848:9848
    networks: 
      - middleware

  rabbitmq:
    image: rabbitmq:3.12-management
    restart: always
    ports:
      - 5672:5672
      - 15672:15672
    environment:
      RABBITMQ_DEFAULT_USER: admin
      RABBITMQ_DEFAULT_PASS: ${RABBITMQ_PASSWORD}
    volumes:
      - /data/hot/rabbitmq:/var/lib/rabbitmq
    networks: 
      - middleware

  elasticsearch:
    image: elasticsearch:7.17.19
    restart: always
    volumes:
      - /data/cold/elasticsearch:/usr/share/elasticsearch/data
    environment:
      discovery.type: single-node
      ES_JAVA_OPTS: -Xms2g -Xmx2g
    ports: 
      - 9200:9200
    networks: 
      - middleware

networks:
  middleware:
    driver: bridge

2.3 MySQL Tuning

# config/my.cnf
[mysqld]
# Connections
max_connections = 1000

# Buffer pool size (typically 50-70% of physical memory)
innodb_buffer_pool_size = 16G

# Logging
innodb_log_file_size = 512M
innodb_flush_log_at_trx_commit = 2

# Character set
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci

# Time zone
default-time-zone = '+08:00'

2.4 Start the Middleware

# Create the environment file
cat > .env << EOF
MYSQL_PASSWORD=YourStrongPassword123
REDIS_PASSWORD=YourStrongPassword456
RABBITMQ_PASSWORD=YourStrongPassword789
EOF

# Start
docker compose up -d

# Check status
docker compose ps
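
After the stack is up, a few quick connectivity checks confirm each service is reachable (a minimal sketch; the passwords are the example values from the .env file above):

# MySQL: should print the server version
docker compose exec mysql mysql -uroot -p"YourStrongPassword123" -e "SELECT VERSION();"

# Redis: should answer PONG
docker compose exec redis redis-cli -a "YourStrongPassword456" ping

# Elasticsearch: should return cluster info as JSON
curl -s http://127.0.0.1:9200

# RabbitMQ management UI: should return an HTTP 200
curl -sI http://127.0.0.1:15672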

3. Kubernetes Cluster Setup

3.1 Initialize the First Master Node

3.1.1 Create the kubeadm Configuration
# kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.28.6
controlPlaneEndpoint: "10.0.10.10:6443"  # load-balancer address on the Entry node

networking:
  podSubnet: "172.16.0.0/16"      # Pod CIDR
  serviceSubnet: "10.96.0.0/12"   # Service CIDR

imageRepository: registry.aliyuncs.com/google_containers

apiServer:
  certSANs:
    - "10.0.10.10"
    - "10.0.30.10"
    - "10.0.30.11"
    - "10.0.30.12"
    - "k8s-api-lb"
    - "127.0.0.1"

---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
  strictARP: true

---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
3.1.2 Run the Initialization
# Pull the images
kubeadm config images pull --config=kubeadm-config.yaml

# Initialize the cluster
kubeadm init --config=kubeadm-config.yaml --upload-certs | tee kubeadm-init.log

# Configure kubectl
mkdir -p $HOME/.kube
cp /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config

# Verify
kubectl cluster-info
kubectl get nodes

Important: save the join commands printed by kubeadm init; they contain the token and the certificate key.
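
If the join command is lost or the bootstrap token (valid for 24 hours by default) expires, both can be regenerated on an existing master:

# Print a fresh worker join command
kubeadm token create --print-join-command

# Re-upload the control-plane certificates and print a new certificate key
kubeadm init phase upload-certs --upload-certs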

3.2 Join the Remaining Master Nodes

Run on k8s-master-02 and k8s-master-03:

kubeadm join 10.0.10.10:6443 \
  --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --control-plane \
  --certificate-key <certificate-key>

# Configure kubectl
mkdir -p $HOME/.kube
cp /etc/kubernetes/admin.conf $HOME/.kube/config

3.3 Join the Worker Nodes

Run on k8s-node-01 and k8s-node-02:

kubeadm join 10.0.10.10:6443 \
  --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash>

3.4 Install the Calico Network Plugin

# Install the Tigera Operator
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.28.0/manifests/tigera-operator.yaml

# Wait for the operator to become ready
kubectl wait --namespace tigera-operator \
  --for=condition=ready pod \
  --selector=name=tigera-operator \
  --timeout=90s

# Create the custom resources
cat > calico-custom-resources.yaml << EOF
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    ipPools:
    - blockSize: 26
      cidr: 172.16.0.0/16
      encapsulation: VXLANCrossSubnet  # BGP within a subnet, VXLAN across subnets
      natOutgoing: Enabled
      nodeSelector: all()
---
apiVersion: operator.tigera.io/v1
kind: APIServer
metadata:
  name: default
spec: {}
EOF

kubectl create -f calico-custom-resources.yaml

# Wait for all nodes to become Ready
kubectl get nodes -w

3.5 Deploy Traefik Ingress

3.5.1 Install Helm
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
helm version
3.5.2 Deploy Traefik
# Add the chart repository
helm repo add traefik https://traefik.github.io/charts
helm repo update

# Create the values file
cat > traefik-values.yaml << 'EOF'
deployment:
  kind: DaemonSet

image:
  tag: "v3.2"

ingressClass:
  enabled: true
  isDefaultClass: true

ports:
  web:
    port: 8000
    exposedPort: 80
    nodePort: 30080
  websecure:
    port: 8443
    exposedPort: 443
    nodePort: 30443

service:
  type: NodePort

logs:
  general:
    level: INFO
  access:
    enabled: true

ingressRoute:
  dashboard:
    enabled: true
    matchRule: Host(`traefik.example.com`)
    entryPoints: ["web"]

resources:
  requests:
    cpu: "100m"
    memory: "128Mi"
  limits:
    cpu: "1000m"
    memory: "512Mi"
EOF

# Install
helm install traefik traefik/traefik \
  --namespace traefik \
  --create-namespace \
  --values traefik-values.yaml

# Verify
kubectl get pods -n traefik
kubectl get svc -n traefik
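
The dashboard IngressRoute above matches Host(`traefik.example.com`) on the web entrypoint, so the entry Nginx also needs a server block for that hostname pointing at the same ingress_nodes upstream. A sketch mirroring the app.example.com site from section 1.2.2 (the hostname is an example):

# On the Entry node
cat > /data/nginx/sites-enabled/traefik-dashboard.conf << EOF
server {
    listen 80;
    server_name traefik.example.com;

    location / {
        proxy_pass http://ingress_nodes;
        proxy_set_header Host \$host;
        proxy_set_header X-Real-IP \$remote_addr;
        proxy_set_header X-Forwarded-For \$proxy_add_x_forwarded_for;
    }
}
EOF
nginx -s reload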

3.6 Verify the Cluster

# Check node status
kubectl get nodes -o wide

# Check system pods
kubectl get pods -A

# Create a test application
kubectl create deployment nginx --image=nginx
kubectl expose deployment nginx --port=80 --type=NodePort
kubectl get svc nginx

# Test access
curl http://<node-ip>:<node-port>

# Clean up
kubectl delete deployment nginx
kubectl delete svc nginx

4. CI/CD Pipeline

4.1 Overall CI/CD Architecture

graph LR
    A[Developer pushes code] --> B[GitLab triggers CI]
    B --> C["Build stage: compile"]
    C --> D["Package stage: Docker image"]
    D --> E[Push to Harbor]
    E --> F[Deploy to QA]
    F -->|manual trigger| G[Deploy to production]
    G --> H[K8s rolling update]

4.2 Harbor Private Registry

4.2.1 Install Harbor
# Download the offline installer
wget https://github.com/goharbor/harbor/releases/download/v2.12.0/harbor-offline-installer-v2.12.0.tgz
tar xvf harbor-offline-installer-v2.12.0.tgz
cd harbor

# Adjust the configuration
cp harbor.yml.tmpl harbor.yml
# Key harbor.yml settings
hostname: harbor.example.com

https:
  port: 443
  certificate: /data/cert/server.crt
  private_key: /data/cert/server.key

harbor_admin_password: Harbor12345

data_volume: /data/cold/harbor
# Install
./install.sh

# Configure start on boot
cat > /etc/systemd/system/harbor.service << EOF
[Unit]
Description=Harbor
After=docker.service
Requires=docker.service

[Service]
Type=simple
Restart=on-failure
WorkingDirectory=/root/harbor
ExecStart=/usr/bin/docker compose up
ExecStop=/usr/bin/docker compose down

[Install]
WantedBy=multi-user.target
EOF

systemctl enable harbor
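
The harbor.yml above expects a certificate at /data/cert/server.crt. If no CA-issued certificate is available, a self-signed one can be generated before running ./install.sh (a sketch; with a self-signed certificate, the CA must also be trusted by Docker/containerd on every host that pulls from Harbor):

mkdir -p /data/cert
openssl req -x509 -newkey rsa:4096 -sha256 -days 3650 -nodes \
  -keyout /data/cert/server.key -out /data/cert/server.crt \
  -subj "/CN=harbor.example.com" \
  -addext "subjectAltName=DNS:harbor.example.com"
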
4.2.2 Configure the K8s Pull Credential
# The secret must exist in every namespace that pulls from Harbor (e.g. production)
kubectl create secret docker-registry harbor-secret \
  --docker-server=harbor.example.com \
  --docker-username=admin \
  --docker-password=Harbor12345 \
  --namespace=production

4.3 GitLab CI Template Design

4.3.1 Template Project Layout
devops/ci-templates/
├── build/
│   ├── java.build.gitlab-ci.yml
│   └── node.build.gitlab-ci.yml
├── deploy/
│   ├── java.deploy.gitlab-ci.yml
│   └── node.deploy.gitlab-ci.yml
└── rules/
    └── changes.gitlab-ci.yml
4.3.2 Java Build Template
# build/java.build.gitlab-ci.yml
variables:
  REGISTRY_ADDRESS: harbor.example.com
  REGISTRY_SECRET: harbor-secret

stages:
  - build
  - package
  - deploy_qa
  - deploy_prod

.build:
  stage: build
  image: maven:3.9-eclipse-temurin-17
  script:
    - mvn clean package -DskipTests
  artifacts:
    paths:
      - "**/target/*.jar"
    expire_in: 1 hrs
  tags:
    - java-build

.package:
  stage: package
  image: docker:20.10-dind
  services:
    - docker:20.10-dind
  before_script:
    - docker login -u $HARBOR_USER -p $HARBOR_PASSWORD $REGISTRY_ADDRESS
  script:
    - docker build -t ${REGISTRY_ADDRESS}/${CI_PROJECT_PATH}/${MODULE_NAME}:${CI_COMMIT_SHA:0:8} .
    - docker push ${REGISTRY_ADDRESS}/${CI_PROJECT_PATH}/${MODULE_NAME}:${CI_COMMIT_SHA:0:8}
  tags:
    - docker

.deploy:
  stage: deploy_prod
  image: bitnami/kubectl:1.28
  when: manual
  script:
    - |
      kubectl set image deployment/${MODULE_NAME} \
        ${MODULE_NAME}=${REGISTRY_ADDRESS}/${CI_PROJECT_PATH}/${MODULE_NAME}:${CI_COMMIT_SHA:0:8} \
        -n ${NAMESPACE}
    - kubectl rollout status deployment/${MODULE_NAME} -n ${NAMESPACE} --timeout=300s
4.3.3 Microservice CI Configuration
# .gitlab-ci.yml
include:
  - project: 'devops/ci-templates'
    file: '/build/java.build.gitlab-ci.yml'
  - project: 'devops/ci-templates'
    file: '/deploy/java.deploy.gitlab-ci.yml'

variables:
  MODULE_NAME: order-service
  MODULE_PORT: 8080
  NAMESPACE: production

build:
  extends: .build

package:
  extends: .package
  needs:
    - build

deploy_prod:
  extends: .deploy
  variables:
    REPLICAS: "2"
  needs:
    - package

4.4 Change-Based Pipeline Triggering

A service is built only when its own code changes:

# rules/changes.gitlab-ci.yml
.order_service_changes:
  rules:
    - changes:
        - service/order-service/**/*
        - pom.xml
        - .gitlab-ci.yml

.user_service_changes:
  rules:
    - changes:
        - service/user-service/**/*
        - pom.xml
        - .gitlab-ci.yml

Used in a microservice's CI configuration:

build_order:
  extends:
    - .build
    - .order_service_changes
  variables:
    MODULE_NAME: order-service

5. Application Deployment in Practice

5.1 Deployment Manifest for a Java Application

apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
  namespace: production
spec:
  replicas: 2
  selector:
    matchLabels:
      app: order-service
  template:
    metadata:
      labels:
        app: order-service
    spec:
      imagePullSecrets:
        - name: harbor-secret
      containers:
      - name: order-service
        image: harbor.example.com/project/order-service:latest
        ports:
          - containerPort: 8080
        env:
          - name: SPRING_PROFILES_ACTIVE
            value: "prod"
          - name: JAVA_OPTS
            value: >-
              -XX:+UseContainerSupport
              -XX:MaxRAMPercentage=70.0
              -XX:+UseG1GC
        resources:
          requests:
            cpu: "256m"
            memory: "512Mi"
          limits:
            cpu: "1000m"
            memory: "2048Mi"
        livenessProbe:
          httpGet:
            path: /actuator/health/liveness
            port: 8080
          initialDelaySeconds: 60
          periodSeconds: 10
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /actuator/health/readiness
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 5
          failureThreshold: 3
---
apiVersion: v1
kind: Service
metadata:
  name: order-service
  namespace: production
spec:
  selector:
    app: order-service
  ports:
    - port: 8080
      targetPort: 8080
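
The JAVA_OPTS variable above only takes effect if the image's entrypoint actually expands it. A minimal Dockerfile sketch (base image and jar path are illustrative) that lets the Deployment's env value apply without rebuilding the image:

FROM eclipse-temurin:17-jre
WORKDIR /app
COPY target/order-service.jar app.jar
# Run through a shell so that $JAVA_OPTS from the Pod spec is expanded at startup
ENTRYPOINT ["sh", "-c", "exec java $JAVA_OPTS -jar /app/app.jar"]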

5.2 Ingress Routing

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  namespace: production
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: web,websecure
spec:
  ingressClassName: traefik
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /order
            pathType: Prefix
            backend:
              service:
                name: order-service
                port:
                  number: 8080
          - path: /user
            pathType: Prefix
            backend:
              service:
                name: user-service
                port:
                  number: 8080
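
Once the Ingress is applied, routing can be verified through the entry Nginx by passing the Host header explicitly (10.0.10.10 is the entry node; the exact backend path depends on how order-service handles the /order prefix, since Traefik forwards it unchanged by default):

curl -i -H "Host: api.example.com" http://10.0.10.10/order/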

5.3 Using ConfigMaps and Secrets

# ConfigMap - non-sensitive configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
  namespace: production
data:
  NACOS_SERVER: "10.0.20.10:8848"
  REDIS_HOST: "10.0.20.10"
  LOG_LEVEL: "INFO"

---
# Secret - sensitive configuration
apiVersion: v1
kind: Secret
metadata:
  name: app-secret
  namespace: production
type: Opaque
stringData:
  MYSQL_PASSWORD: "YourPassword123"
  REDIS_PASSWORD: "YourPassword456"

Referenced from a Deployment:

containers:
- name: app
  envFrom:
    - configMapRef:
        name: app-config
    - secretRef:
        name: app-secret
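
To avoid committing plaintext passwords in a manifest, the same Secret can instead be created directly with kubectl (values are placeholders):

kubectl create secret generic app-secret \
  --from-literal=MYSQL_PASSWORD='YourPassword123' \
  --from-literal=REDIS_PASSWORD='YourPassword456' \
  -n production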

6. Monitoring and Logging

6.1 Prometheus + Grafana

6.1.1 Deploy Node Exporter
# node-exporter-daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      hostNetwork: true
      hostPID: true
      containers:
      - name: node-exporter
        image: prom/node-exporter:v1.7.0
        args:
          - --path.procfs=/host/proc
          - --path.sysfs=/host/sys
          - --path.rootfs=/host/root
        ports:
          - containerPort: 9100
            hostPort: 9100
        volumeMounts:
          - name: proc
            mountPath: /host/proc
            readOnly: true
          - name: sys
            mountPath: /host/sys
            readOnly: true
          - name: root
            mountPath: /host/root
            readOnly: true
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: sys
          hostPath:
            path: /sys
        - name: root
          hostPath:
            path: /
6.1.2 Prometheus Configuration
# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  # K8s node metrics
  - job_name: 'kubernetes-nodes'
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      - source_labels: [__address__]
        regex: '(.*):10250'
        replacement: '${1}:9100'
        target_label: __address__

  # Spring Boot application metrics
  - job_name: 'spring-boot-apps'
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names: ['production']
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
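
The pod-role scrape job above keeps only Pods carrying the prometheus.io/scrape annotation and reads the metrics path from prometheus.io/path. For a Spring Boot service exposing Micrometer metrics, the Deployment's pod template would carry annotations along these lines (the /actuator/prometheus path assumes the Prometheus actuator endpoint is enabled in the application):

  template:
    metadata:
      labels:
        app: order-service
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/path: "/actuator/prometheus"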

6.2 Example Alerting Rules

groups:
  - name: node_alerts
    rules:
      # High CPU usage
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage"
          description: "CPU usage on {{ $labels.instance }} has reached {{ $value }}%"

      # High memory usage
      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage"
          description: "Memory usage on {{ $labels.instance }} has reached {{ $value }}%"

      # High disk usage
      - alert: HighDiskUsage
        expr: (1 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"})) * 100 > 80
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High disk usage"

  - name: application_alerts
    rules:
      # Application health check failing
      - alert: ApplicationDown
        expr: up{job="spring-boot-apps"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Application is unavailable"

6.3 Log Collection

Filebeat collects container logs and ships them to Elasticsearch:

# filebeat-k8s.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: filebeat
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: filebeat
  template:
    metadata:
      labels:
        app: filebeat
    spec:
      containers:
      - name: filebeat
        image: elastic/filebeat:7.17.19
        args:
          - "-c"
          - "/etc/filebeat/filebeat.yml"
          - "-e"
        volumeMounts:
          - name: config
            mountPath: /etc/filebeat
          - name: varlog
            mountPath: /var/log
          - name: containers
            mountPath: /var/lib/docker/containers
            readOnly: true
      volumes:
        - name: config
          configMap:
            name: filebeat-config
        - name: varlog
          hostPath:
            path: /var/log
        - name: containers
          hostPath:
            path: /var/lib/docker/containers

7. Troubleshooting

7.1 Quick Reference for Common Problems

| Symptom | Likely cause | Diagnostic command | Fix |
|---|---|---|---|
| Pod stuck in Pending | Insufficient resources | kubectl describe pod <name> | Add nodes or adjust resource requests |
| Pod in CrashLoopBackOff | Application fails to start | kubectl logs <pod> | Check application config and dependencies |
| ImagePullBackOff | Image pull failure | kubectl describe pod <name> | Check the image address and credentials |
| Service unreachable | Empty Endpoints | kubectl get endpoints <svc> | Check Pod labels and the selector |
| Ingress returns 502 | Backend Pods not ready | kubectl get pods | Check the readinessProbe |
| OOMKilled | Out of memory | kubectl describe pod <name> | Raise the memory limit |
| Node NotReady | Network or kubelet problem | kubectl describe node <name> | Check the kubelet and the network plugin |
| DNS resolution failure | CoreDNS problem | kubectl logs -n kube-system -l k8s-app=kube-dns | Restart CoreDNS |

7.2 Diagnostic Command Cheat Sheet

# Inspect a Pod
kubectl describe pod <pod-name> -n <namespace>

# Pod logs
kubectl logs <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace> --previous  # previous container instance

# Exec into a Pod for debugging
kubectl exec -it <pod-name> -n <namespace> -- sh

# Services and Endpoints
kubectl get svc,endpoints -n <namespace>

# All Pods that are not Running
kubectl get pods -A --field-selector=status.phase!=Running

# Node and Pod resource usage
kubectl top nodes
kubectl top pods -A

# Events
kubectl get events -A --sort-by='.lastTimestamp'

7.3 Case Studies

Case 1: Pod repeatedly OOMKilled

Symptom: the Pod restarts every few hours with status OOMKilled.

Diagnosis:

kubectl describe pod <pod-name> -n production
# Compare Limits with actual usage

kubectl top pod <pod-name> -n production
# Current memory usage

Cause: the JVM heap settings did not match the K8s memory limit.

Fix:

env:
- name: JAVA_OPTS
  value: "-XX:MaxRAMPercentage=70.0"  # a percentage instead of a fixed heap size
resources:
  limits:
    memory: "2048Mi"  # leave ~30% headroom for off-heap memory
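
To confirm the percentage-based setting yields the expected heap inside the container, the effective max heap can be read from the running Pod (assumes a JDK/JRE with the java binary is present in the image):

kubectl exec <pod-name> -n production -- \
  java -XX:MaxRAMPercentage=70.0 -XX:+PrintFlagsFinal -version | grep -i maxheapsize
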
Case 2: 504s on cross-node Pod traffic

Symptom: Pod-to-Pod traffic works on the same node but times out with 504 across nodes, with roughly a 50% failure rate.

Quick diagnosis:

# 1. Verify cross-node Pod communication
kubectl exec pod-on-node1 -- ping -c 3 <pod-on-node2-ip>

# 2. Compare: access the Pod directly from a node (if this works, suspect the security group)
ssh node1 "curl http://<pod-on-node2-ip>"

# 3. Check whether the security group covers the Pod network
# The Pod CIDR (e.g. 172.16.0.0/16) must be allowed

Root cause: the cloud security group only allowed the node network, not the Pod CIDR.

Fix: add security group rules:

  • Inbound: ANY - Pod CIDR (e.g. 172.16.0.0/16)
  • Outbound: ANY - Pod CIDR (e.g. 172.16.0.0/16)

For the full write-up, see the "Troubleshooting in Practice" article.


8. Summary and Checklists

8.1 Pre-Deployment Checklist

| Check | Command / action | Expected result |
|---|---|---|
| System time synchronized | timedatectl | System clock synchronized: yes |
| Swap disabled | free -h | Swap row shows all zeros |
| Kernel modules loaded | lsmod \| grep br_netfilter | Output present |
| containerd running | systemctl status containerd | active (running) |
| kubelet enabled | systemctl is-enabled kubelet | enabled |
| Network connectivity | ping between nodes | All reachable |
| Image mirror reachable | crictl pull nginx | Pull succeeds |
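
The checks above can be scripted so every node reports its state in one pass (a minimal sketch; it only prints results and does not fix anything; crictl may need /etc/crictl.yaml pointing at the containerd socket):

#!/bin/bash
# Pre-deployment checks, run on each node
echo "== time sync ==";       timedatectl | grep "System clock synchronized"
echo "== swap ==";            free -h | awk '/^Swap/ {print}'
echo "== kernel modules =="; lsmod | grep -E "overlay|br_netfilter"
echo "== containerd ==";      systemctl is-active containerd
echo "== kubelet enabled =="; systemctl is-enabled kubelet
echo "== image pull ==";      crictl pull nginx && echo OK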

8.2 Post-Deployment Verification Checklist

| Check | Command | Expected result |
|---|---|---|
| All nodes Ready | kubectl get nodes | STATUS is Ready for every node |
| System Pods healthy | kubectl get pods -n kube-system | All Running |
| Network plugin healthy | kubectl get pods -n calico-system | All Running |
| DNS resolution works | kubectl run test --rm -it --image=busybox -- nslookup kubernetes | Resolves successfully |
| Cross-node communication | create two Pods and ping between them (see below) | Communication works |
| Ingress works | create a test Ingress and access it | Responds normally |
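
The cross-node check from the table can be done with two throwaway busybox Pods pinned to different nodes (node names are from the inventory above; a sketch):

kubectl run ping-a --image=busybox --overrides='{"spec":{"nodeName":"k8s-node-01"}}' -- sleep 3600
kubectl run ping-b --image=busybox --overrides='{"spec":{"nodeName":"k8s-node-02"}}' -- sleep 3600
kubectl get pod ping-b -o wide                 # note ping-b's Pod IP
kubectl exec ping-a -- ping -c 3 <ping-b-pod-ip>
kubectl delete pod ping-a ping-b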

8.3 Common Command Cheat Sheet

# Cluster management
kubectl cluster-info
kubectl get nodes -o wide
kubectl get pods -A

# Application management
kubectl apply -f deployment.yaml
kubectl rollout status deployment/<name>
kubectl rollout undo deployment/<name>
kubectl scale deployment/<name> --replicas=5

# Logs and debugging
kubectl logs -f <pod>
kubectl exec -it <pod> -- sh
kubectl describe pod <pod>
kubectl top nodes
kubectl top pods

# Cleanup
kubectl delete pod <name> --force --grace-period=0
kubectl delete namespace <name>

8.4 Key Configuration File Locations

| Item | Path |
|---|---|
| kubeadm admin kubeconfig | /etc/kubernetes/admin.conf |
| kubelet configuration | /var/lib/kubelet/config.yaml |
| containerd configuration | /etc/containerd/config.toml |
| Calico configuration | kubectl get installation default -o yaml |
| kubectl configuration | ~/.kube/config |

Keywords: Kubernetes, deployment, CI/CD, monitoring, troubleshooting
