Tiny Cluster (1): Building a Small Raspberry Pi Compute Cluster

Original article: https://blog.youkuaiyun.com/qq_38342510/article/details/146715294

1 Hardware Overview

1.1 Hardware and Software Environment


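Hostname	IP address	OS	Board	RAM / CPU / storage	Role
k8s-master-0	192.168.5.78	Debian 12	Raspberry Pi 5	8 GB / 4 cores / 64 GB TF card + 512 GB SSD	Control-plane node
k8s-worker-0	192.168.5.48	Debian 12	Raspberry Pi 4B	4 GB / 4 cores / 64 GB TF card	Worker node
k8s-worker-1	192.168.5.16	Debian 11	BTT-CB1	1 GB / 4 cores / 64 GB TF card	Worker node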

Goal: a K8s cluster (based on containerd V1.62 and K8s V1.27) with one master and two workers.

2 Setup

2.1 Hardware Connections

The three boards are connected to the same switch. The completed hardware setup is shown below:

[Figure: the completed hardware setup]

2.2 Preparation

2.2.1 Raspberry Pi 5 (k8s-master-0)

Switch the APT mirror

This uses the Tsinghua University (TUNA) mirror. Replace the contents of /etc/apt/sources.list with:

deb https://mirrors.tuna.tsinghua.edu.cn/debian/ bookworm main contrib non-free non-free-firmware
deb https://mirrors.tuna.tsinghua.edu.cn/debian/ bookworm-updates main contrib non-free non-free-firmware
deb https://mirrors.tuna.tsinghua.edu.cn/debian/ bookworm-backports main contrib non-free non-free-firmware
deb https://mirrors.tuna.tsinghua.edu.cn/debian-security bookworm-security main contrib non-free non-free-firmware

Configure the kernel modules to load at boot

sudo tee /etc/modules-load.d/containerd.conf <<EOF
overlay
br_netfilter
EOF

Load the kernel modules now

sudo modprobe overlay && sudo modprobe br_netfilter

Set and apply the kernel parameters

sudo tee /etc/sysctl.d/kubernetes.conf <<EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
 
sudo sysctl --system
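To confirm that `sysctl --system` actually applied these values, they can be read back from /proc/sys. The sketch below is only an illustration (the function names are made up for this example); it also catches the common case where the `net.bridge.*` entries do not exist yet because `br_netfilter` has not been loaded.

```python
# check_sysctl.py: read back the parameters from /etc/sysctl.d/kubernetes.conf
from pathlib import Path

# Values written to /etc/sysctl.d/kubernetes.conf above
REQUIRED = {
    "net.bridge.bridge-nf-call-ip6tables": "1",
    "net.bridge.bridge-nf-call-iptables": "1",
    "net.ipv4.ip_forward": "1",
}


def sysctl_path(key, root="/proc/sys"):
    """Map a dotted sysctl key such as net.ipv4.ip_forward to its /proc/sys file."""
    return Path(root, *key.split("."))


def check_sysctl(required=REQUIRED, root="/proc/sys"):
    """Return {key: problem} for every parameter that is missing or has the wrong value."""
    problems = {}
    for key, want in required.items():
        path = sysctl_path(key, root)
        if not path.exists():
            # net.bridge.* only appears once the br_netfilter module is loaded
            problems[key] = "missing (is the kernel module loaded?)"
            continue
        got = path.read_text().strip()
        if got != want:
            problems[key] = f"expected {want}, got {got}"
    return problems
```

On each node, `check_sysctl()` should return an empty dict; any entry names the parameter to fix.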

2.2.2 Raspberry Pi 4B (k8s-worker-0)

Switch the APT mirror

This uses the Tsinghua University (TUNA) mirror. Replace the contents of /etc/apt/sources.list with:

deb https://mirrors.tuna.tsinghua.edu.cn/debian/ bookworm main contrib non-free non-free-firmware
deb https://mirrors.tuna.tsinghua.edu.cn/debian/ bookworm-updates main contrib non-free non-free-firmware
deb https://mirrors.tuna.tsinghua.edu.cn/debian/ bookworm-backports main contrib non-free non-free-firmware
deb https://mirrors.tuna.tsinghua.edu.cn/debian-security bookworm-security main contrib non-free non-free-firmware

2.2.3 BTT-CB1 (k8s-worker-1)

Switch the APT mirror

This uses the USTC mirror (this board runs Debian 11 "bullseye"). Replace the contents of /etc/apt/sources.list with:

deb https://mirrors.ustc.edu.cn/debian/ bullseye main contrib non-free
deb-src https://mirrors.ustc.edu.cn/debian/ bullseye main contrib non-free

deb https://mirrors.ustc.edu.cn/debian/ bullseye-updates main contrib non-free
deb-src https://mirrors.ustc.edu.cn/debian/ bullseye-updates main contrib non-free

deb https://mirrors.ustc.edu.cn/debian/ bullseye-backports main contrib non-free
deb-src https://mirrors.ustc.edu.cn/debian/ bullseye-backports main contrib non-free

deb https://mirrors.ustc.edu.cn/debian-security/ bullseye-security main contrib non-free
deb-src https://mirrors.ustc.edu.cn/debian-security/ bullseye-security main contrib non-free

2.3 Steps Required on All Nodes

2.3.1 Edit `/etc/hosts`

192.168.5.78 k8s-master-0
192.168.5.48 k8s-worker-0
192.168.5.16 k8s-worker-1
20.205.243.166 raw.githubusercontent.com # lets kubectl apply resolve this host (the IP can change over time)
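Because the same cluster entries go on every node, a small idempotent helper can append only the ones that are missing. This is a hypothetical sketch (the names are illustrative); it leaves the raw.githubusercontent.com workaround to be added by hand and needs root to write the real /etc/hosts.

```python
# Hypothetical helper: add the cluster's host entries to a hosts file only once.
NODES = {
    "k8s-master-0": "192.168.5.78",
    "k8s-worker-0": "192.168.5.48",
    "k8s-worker-1": "192.168.5.16",
}


def missing_host_lines(hosts_text, nodes=NODES):
    """Return the 'IP hostname' lines that hosts_text does not contain yet."""
    known = set()
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) >= 2:
            known.update((fields[0], name) for name in fields[1:])
    return [f"{ip} {name}" for name, ip in nodes.items() if (ip, name) not in known]


def update_hosts(path="/etc/hosts", nodes=NODES):
    """Append the missing entries to `path` and return them (safe to re-run)."""
    with open(path) as f:
        text = f.read()
    lines = missing_host_lines(text, nodes)
    if lines:
        # Start on a fresh line if the file does not end with a newline
        prefix = "" if text == "" or text.endswith("\n") else "\n"
        with open(path, "a") as f:
            f.write(prefix + "\n".join(lines) + "\n")
    return lines
```

Running `update_hosts()` a second time returns an empty list, so it can be part of a per-node setup script.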

2.3.2 Add the `k8s` APT repository

curl -s https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | apt-key add -
cat > /etc/apt/sources.list.d/kubernetes.list <<EOF
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
EOF

2.3.3 Check for and install updates

apt update && apt upgrade -y

2.3.4 Install the required packages

apt install -y curl gnupg2 software-properties-common apt-transport-https ca-certificates

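2.3.5 Install and configure containerd

Enable the Docker repository, which provides the containerd.io package (run only the add-apt-repository line that matches your distribution and CPU architecture):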

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | gpg --dearmour -o /etc/apt/trusted.gpg.d/docker.gpg
 
## ubuntu
# 64-bit x86 CPUs
add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"

# arm64 CPUs
add-apt-repository "deb [arch=arm64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
 
## debian
# 64-bit x86 CPUs
add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/debian $(lsb_release -cs) stable"

# arm64 CPUs
add-apt-repository "deb [arch=arm64] https://download.docker.com/linux/debian $(lsb_release -cs) stable"
 
apt update && apt install -y containerd.io
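Picking among the four repository lines above comes down to two facts about the node: the distribution ID and codename in /etc/os-release, and the CPU architecture. A small sketch with hypothetical function names (not part of any Docker tool) that derives the single line a node needs:

```python
# Sketch: derive the Docker repository line for this node.
import platform

# platform.machine() reports kernel names; Docker's repo uses Debian arch names
ARCH_MAP = {"x86_64": "amd64", "aarch64": "arm64"}


def parse_os_release(text):
    """Parse the KEY=value lines of /etc/os-release into a dict."""
    info = {}
    for line in text.splitlines():
        if "=" in line and not line.startswith("#"):
            key, _, value = line.partition("=")
            info[key.strip()] = value.strip().strip('"')
    return info


def docker_repo_line(os_id, codename, machine):
    """Build the 'deb [arch=...] ...' line to pass to add-apt-repository."""
    if os_id not in ("debian", "ubuntu"):
        raise ValueError(f"unsupported distribution: {os_id}")
    arch = ARCH_MAP.get(machine)
    if arch is None:
        raise ValueError(f"unsupported CPU architecture: {machine}")
    return f"deb [arch={arch}] https://download.docker.com/linux/{os_id} {codename} stable"


def current_repo_line(os_release="/etc/os-release"):
    """Repository line for the machine this runs on."""
    with open(os_release) as f:
        info = parse_os_release(f.read())
    return docker_repo_line(info["ID"], info["VERSION_CODENAME"], platform.machine())
```

A 32-bit ARM system (`armv7l`) is rejected on purpose, since only the amd64 and arm64 variants appear in the commands above.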

Generate the default containerd configuration file

containerd config default | tee /etc/containerd/config.toml >/dev/null 2>&1

Switch the cgroup driver to systemd

sed -i 's/SystemdCgroup \= false/SystemdCgroup \= true/g' /etc/containerd/config.toml

Edit /etc/containerd/config.toml and point the sandbox (pause) image at the Aliyun mirror:

 #sandbox_image = "registry.k8s.io/pause:3.6"    # original line, commented out
 sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.9"    # replacement
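Both config.toml edits (the `SystemdCgroup` switch and the `sandbox_image` change) can also be applied in one idempotent pass. A sketch using regular expressions, assuming the default layout produced by `containerd config default`; `patch_containerd_config` is an illustrative name, not a containerd tool.

```python
# Sketch: apply the SystemdCgroup and sandbox_image edits to config.toml.
import re

PAUSE_IMAGE = "registry.aliyuncs.com/google_containers/pause:3.9"


def patch_containerd_config(text, pause_image=PAUSE_IMAGE):
    """Enable SystemdCgroup and point sandbox_image at the mirror."""
    text = re.sub(r"(SystemdCgroup[ \t]*=[ \t]*)false", r"\1true", text)
    # Replace the active (uncommented) sandbox_image line, keeping its indentation
    text = re.sub(
        r'^([ \t]*)sandbox_image[ \t]*=[ \t]*".*"$',
        lambda m: f'{m.group(1)}sandbox_image = "{pause_image}"',
        text,
        flags=re.MULTILINE,
    )
    return text


def patch_file(path="/etc/containerd/config.toml"):
    """Rewrite the file only if something changed; returns True when it did."""
    with open(path) as f:
        original = f.read()
    patched = patch_containerd_config(original)
    if patched != original:
        with open(path, "w") as f:
            f.write(patched)
    return patched != original
```

Running it twice leaves the file unchanged the second time; containerd still has to be restarted to pick up the new settings.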

systemctl daemon-reload
systemctl restart containerd    # restart rather than start, so the edited config.toml is loaded
systemctl enable containerd.service

Set a proxy for containerd (ctr) so that image pulls do not fail.
Edit /lib/systemd/system/containerd.service and add the following to the [Service] section:

[Service]
Environment="HTTP_PROXY=http://192.168.0.108:1081"
Environment="HTTPS_PROXY=http://192.168.0.108:1081"
Environment="NO_PROXY=aliyun.com,aliyuncs.com,huaweicloud.com,k8s-master-0,k8s-master-1,k8s-worker-0,localhost,127.0.0.1,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16"

systemctl daemon-reload && systemctl restart containerd
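The NO_PROXY list decides which destinations skip the proxy. The sketch below is a simplified model of that matching (exact names, domain suffixes and CIDR blocks only; the Go HTTP proxy logic that containerd uses has more rules), useful for checking that cluster addresses such as kubeadm's default service range 10.96.0.0/12 and flannel's default pod range 10.244.0.0/16 fall under 10.0.0.0/8. It also shows that the list as written names k8s-master-1 but not k8s-worker-1, so adjust the host names to your own nodes.

```python
# Simplified model of NO_PROXY matching (not containerd's exact algorithm).
import ipaddress

# NO_PROXY value from containerd.service above
NO_PROXY = ("aliyun.com,aliyuncs.com,huaweicloud.com,k8s-master-0,k8s-master-1,"
            "k8s-worker-0,localhost,127.0.0.1,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16")


def bypasses_proxy(host, no_proxy=NO_PROXY):
    """Return True if `host` matches an entry of no_proxy (i.e. is contacted directly)."""
    entries = [e.strip() for e in no_proxy.split(",") if e.strip()]
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        addr = None  # host is a name, not an IP literal
    for entry in entries:
        if addr is not None:
            try:
                if addr in ipaddress.ip_network(entry, strict=False):
                    return True
            except ValueError:
                continue  # entry is a name; it cannot match an IP literal here
        elif host == entry or host.endswith("." + entry.lstrip(".")):
            return True
    return False
```

For example, `bypasses_proxy("registry.aliyuncs.com")` is True through the `aliyuncs.com` suffix, while `registry.k8s.io` goes through the proxy.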

pi5@k8s-master-0:~$ kubectl get ingress,services,pods -A
NAMESPACE     NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
default       service/kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP                  57m
kube-system   service/kube-dns     ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   57m
 
NAMESPACE      NAME                                       READY   STATUS    RESTARTS       AGE
kube-flannel   pod/kube-flannel-ds-vxf4q                  1/1     Running   1 (20m ago)    56m
kube-flannel   pod/kube-flannel-ds-wp995                  1/1     Running   1 (22m ago)    55m
kube-flannel   pod/kube-flannel-ds-zq2j7                  1/1     Running   0              39m
kube-system    pod/coredns-7bdc4cb885-8rw4l               1/1     Running   1 (20m ago)    57m
kube-system    pod/coredns-7bdc4cb885-brx7j               1/1     Running   1 (20m ago)    57m
kube-system    pod/etcd-k8s-master-0                      1/1     Running   15 (20m ago)   57m
kube-system    pod/etcd-k8s-master-1                      1/1     Running   2 (22m ago)    55m
kube-system    pod/kube-apiserver-k8s-master-0            1/1     Running   22 (20m ago)   57m
kube-system    pod/kube-apiserver-k8s-master-1            1/1     Running   3 (22m ago)    55m
kube-system    pod/kube-controller-manager-k8s-master-0   1/1     Running   22 (20m ago)   57m
kube-system    pod/kube-controller-manager-k8s-master-1   1/1     Running   2 (22m ago)    55m
kube-system    pod/kube-proxy-9hmj5                       1/1     Running   1 (20m ago)    57m
kube-system    pod/kube-proxy-l2wk2                       1/1     Running   0              39m
kube-system    pod/kube-proxy-sf9xv                       1/1     Running   1 (22m ago)    55m
kube-system    pod/kube-scheduler-k8s-master-0            1/1     Running   19 (20m ago)   57m
kube-system    pod/kube-scheduler-k8s-master-1            1/1     Running   2 (22m ago)    55m

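3 Building the Cluster with K3S

Installing K3S on the master node (Raspberry Pi 5)

Lightweight Kubernetes (K3S) is a Kubernetes distribution aimed at IoT and edge computing, which makes it a good fit for resource-constrained hardware such as the Raspberry Pi.

Install the k3s binary with the install script: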

# install k3s
curl -sfL https://get.k3s.io | sh -

The installation may fail with the following error:

[INFO]  Failed to find memory cgroup, you may need to add "cgroup_memory=1 cgroup_enable=memory" to your linux cmdline (/boot/cmdline.txt on a Raspberry Pi)

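In that case, follow the hint and append the following to /boot/cmdline.txt. The file must remain a single line, so add the parameters to the end of the existing line (on Raspberry Pi OS Bookworm the file is /boot/firmware/cmdline.txt):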

cgroup_memory=1 cgroup_enable=memory

Reboot, then run the installer again.
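Editing cmdline.txt by hand is error-prone because the whole file has to stay on one line. A hedged sketch of an idempotent edit (the function names are hypothetical; pass `/boot/firmware/cmdline.txt` on Raspberry Pi OS Bookworm) that appends only the missing flags to that single line:

```python
# Sketch: append cgroup flags to the single-line kernel command line.
from pathlib import Path

CGROUP_FLAGS = ("cgroup_memory=1", "cgroup_enable=memory")


def add_cgroup_flags(cmdline, flags=CGROUP_FLAGS):
    """Return the command line with any missing flags appended, as one line."""
    params = cmdline.split()
    params += [flag for flag in flags if flag not in params]
    return " ".join(params) + "\n"


def patch_cmdline(path="/boot/cmdline.txt"):
    """Rewrite the file only when a flag was missing (run as root, then reboot)."""
    p = Path(path)
    old = p.read_text()
    new = add_cgroup_flags(old)
    if new != old:
        p.write_text(new)
    return new
```

Keeping a backup copy of cmdline.txt before writing is prudent, since a malformed file can stop the board from booting.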

Once the installation finishes, a single server already works as a single-node cluster:

kubectl get nodes       # do not run this command with sudo, otherwise you get the error described here: https://discuss.kubernetes.io/t/couldnt-get-current-server-api-group-list-get-http-localhost-8080-api-timeout-32s-dial-tcp-127-0-0-1-connect-connection-refused/25471/7

NAME   STATUS   ROLES                  AGE   VERSION
pi5    Ready    control-plane,master   67m   v1.31.6+k3s1

A cluster, however, also needs worker nodes, and they join using the server's node-token. Read it on the master:

sudo cat /var/lib/rancher/k3s/server/node-token
K1064a9edf4298d67e83d35be3bfb7962b445753681618b2a57db49c8b678267ead::server:ccd45d0d4f8b85abecff220a0596b99e
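The node-token uses the K3s secure token format `K10<CA-hash>::<username>:<password>`, and K3S_TOKEN should receive the whole string, not only the part before `::`. The sketch below (the helper names are hypothetical) splits the token to catch truncated copies and builds the agent install command from it:

```python
# Sketch: validate a K3s secure-format token and build the agent join command.
import shlex


def parse_k3s_token(token):
    """Return (ca_hash, username, password) from a K10-format token."""
    token = token.strip()
    if not token.startswith("K10") or "::" not in token:
        raise ValueError("not a K3s secure-format token (K10<hash>::<user>:<password>)")
    ca_hash, _, creds = token[3:].partition("::")
    username, sep, password = creds.partition(":")
    if not sep:
        raise ValueError("token is missing the username:password part")
    return ca_hash, username, password


def join_command(server_ip, token):
    """Agent install command line, with the full token quoted for the shell."""
    parse_k3s_token(token)  # fail early on a truncated token
    return ("curl -sfL https://get.k3s.io | "
            f"K3S_URL=https://{server_ip}:6443 K3S_TOKEN={shlex.quote(token.strip())} sh -")
```

Printing `join_command("192.168.5.78", token)` with the token read from node-token gives a command that can be pasted on each worker.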

Worker nodes

On each worker node, run the install script with the token obtained on the master:

curl -sfL https://rancher-mirror.rancher.cn/k3s/k3s-install.sh | K3S_URL=https://<master-node-ip>:6443 K3S_TOKEN=<token> sh -

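For example, the master node in this test setup is 192.168.5.78, so run the following on the other (worker) nodes, passing the full contents of node-token as K3S_TOKEN:

curl -sfL https://get.k3s.io | K3S_URL=https://192.168.5.78:6443 K3S_TOKEN=K1064a9edf4298d67e83d35be3bfb7962b445753681618b2a57db49c8b678267ead::server:ccd45d0d4f8b85abecff220a0596b99e sh -

Note: the hosts must have distinct host names. If two hosts share a name, add INSTALL_K3S_EXEC="--with-node-id" so that each node gets a unique identifier.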

After installation, check that all nodes are up:


kubectl get nodes
NAME              STATUS   ROLES                  AGE   VERSION
bigtreetech-cb1   Ready    <none>                 54s   v1.31.6+k3s1
pi4b              Ready    <none>                 72s   v1.31.6+k3s1
pi5               Ready    control-plane,master   71m   v1.31.6+k3s1

Assign the worker role to the two worker nodes:

kubectl label node pi4b node-role.kubernetes.io/worker=worker
kubectl label node bigtreetech-cb1 node-role.kubernetes.io/worker=worker

kubectl get nodes
NAME              STATUS   ROLES                  AGE     VERSION
bigtreetech-cb1   Ready    worker                 9m26s   v1.31.6+k3s1
pi4b              Ready    worker                 9m44s   v1.31.6+k3s1
pi5               Ready    control-plane,master   79m     v1.31.6+k3s1

Uninstalling k3s:

# master node:
/usr/local/bin/k3s-uninstall.sh

# worker nodes:
/usr/local/bin/k3s-agent-uninstall.sh

4 Basic K3S Operations

Check the status of the k3s service

# master node:
systemctl status k3s.service

# worker nodes:
systemctl status k3s-agent.service

View the logs; if the installation went wrong, follow the live output:

# master node
sudo journalctl -u k3s.service -f

# worker nodes
sudo journalctl -u k3s-agent.service -f

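K3s ships with kubectl built in, so cluster resources can be managed directly with kubectl commands:

# check cluster status
kubectl get nodes
kubectl get pods -A
kubectl get svc -A

# show cluster details
kubectl cluster-info

# deploy an application
kubectl apply -f <yaml-file>

# delete resources
kubectl delete -f <yaml-file>
kubectl delete pod <pod-name>

# open a shell in a pod for debugging (use /bin/sh if the image has no bash)
kubectl exec -it <pod-name> -- /bin/bash

# show resource details
kubectl describe pod <pod-name>
kubectl describe svc <svc-name>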

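5 Adding MPI Support

To make full use of the cluster's compute power for heavy workloads, the work has to be distributed efficiently through a distributed computing framework.

As an example, the environment is set up to compute π to 10,000,000 digits.

Plan: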

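* Algorithm: use the efficient Chudnovsky algorithm to compute π.
* Distributed framework: use the mpi4py library for parallel computation, distributing the work across the nodes of the K3s cluster.
* Containerized deployment: run the computation in Kubernetes (K3s) pods for resource isolation and management.
* Result collection: combine the partial results from all nodes into the complete value of π.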
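The task split in this plan depends on two numbers: how many Chudnovsky terms the target precision needs (each term adds about 14.18 decimal digits) and which contiguous block of terms each rank handles. A sketch of that arithmetic; the function names are illustrative.

```python
# Sketch: number of Chudnovsky terms for a precision, and one block per MPI rank.
import math

C = 640320
# Each term contributes log10(C**3 / 1728) ≈ 14.18 correct decimal digits
DIGITS_PER_TERM = math.log10(C**3 / 1728)


def terms_needed(digits):
    """Terms required to reach `digits` decimal digits."""
    return int(digits / DIGITS_PER_TERM) + 1


def term_blocks(n_terms, size):
    """Near-equal, contiguous [start, end) ranges of terms, one per rank."""
    return [(r * n_terms // size, (r + 1) * n_terms // size) for r in range(size)]
```

For 10,000,000 digits this gives roughly 705,000 terms, about 235,000 per rank on three nodes; a fixed 100,000 terms per process on three nodes would cover only about 300,000 terms (roughly 4.25 million digits).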

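5.1 Deploying MPI Support in the K3s Cluster

To use MPI in the K3s cluster, every node needs an MPI installation, and the nodes must be able to communicate with each other through MPI: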

sudo apt-get update
sudo apt-get install -y openmpi-bin openmpi-common libopenmpi-dev

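Open MPI starts remote processes over SSH, so make sure every node can log in to the others without a password:

Generate an SSH key pair

On the master node, generate an SSH key pair: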

ssh-keygen -t rsa -b 4096 -C "your_email@example.com"

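Press Enter to accept the default paths (~/.ssh/id_rsa and ~/.ssh/id_rsa.pub) and do not set a passphrase (press Enter twice); otherwise the later automated steps will fail.

Copy the public key to all nodes

Use ssh-copy-id to copy the master node's public key to the other nodes: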

ssh-copy-id -i ~/.ssh/id_rsa.pub user@node2_ip
ssh-copy-id -i ~/.ssh/id_rsa.pub user@node3_ip

Verify passwordless login

Test logging in from the master node to the other nodes:

ssh user@node2_ip
ssh user@node3_ip

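If it works, you get a remote shell without being asked for a password.

Set up passwordless access between all nodes

There are two ways to do this:
* Generate a key pair on every node and copy its public key to all the other nodes.
* Have all nodes authenticate with the master node's key (suitable for simple clusters).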

Method 1: a separate key pair on each node

# on node2, generate a key pair:
ssh-keygen -t rsa -b 4096

# copy node2's public key to node1 and node3:
ssh-copy-id -i ~/.ssh/id_rsa.pub user@node1_ip
ssh-copy-id -i ~/.ssh/id_rsa.pub user@node3_ip

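Method 2: use the master node's key everywhere

From the master node, append its public key to ~/.ssh/authorized_keys on node2 and node3. This gives the master passwordless access to both; for node2 and node3 to log in to each other with the same key, each of them also needs a copy of the master's private key (~/.ssh/id_rsa, mode 600):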

cat ~/.ssh/id_rsa.pub | ssh user@node2_ip "mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys"
cat ~/.ssh/id_rsa.pub | ssh user@node3_ip "mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys"
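Before starting MPI, passwordless login can be checked without any interactive prompts: with `-o BatchMode=yes`, ssh fails instead of asking for a password (or for confirmation of an unknown host key). A sketch; the login name `user` and the IPs taken from the node table are assumptions to replace with your own.

```python
# Sketch: verify key-based SSH to every node, non-interactively.
import subprocess

# Hypothetical login name; replace with the account used on your nodes
NODES = ["user@192.168.5.48", "user@192.168.5.16"]


def ssh_check_command(target, timeout=5):
    """ssh command that runs `true` remotely and never prompts."""
    return ["ssh", "-o", "BatchMode=yes", "-o", f"ConnectTimeout={timeout}", target, "true"]


def check_passwordless(targets=NODES):
    """Return {target: True/False} depending on whether key-based login works."""
    results = {}
    for target in targets:
        proc = subprocess.run(ssh_check_command(target), capture_output=True)
        results[target] = proc.returncode == 0
    return results
```

Run `check_passwordless()` on each node (with that node's list of peers); every value should be True before launching mpirun across hosts.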

5.2 Python Script for Computing π in Parallel

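# pi_mpi.py
# Distributed Chudnovsky computation of π using binary splitting:
# each MPI rank evaluates one contiguous block of series terms as an exact
# (P, Q, T) integer triple; rank 0 merges the triples in rank order and
# performs the final square root and division.
import math
import sys

from mpi4py import MPI

try:
    # GMP-backed integers; strongly recommended for millions of digits
    from gmpy2 import isqrt, mpz
except ImportError:
    # Pure-Python fallback: correct, but far too slow for 10,000,000 digits
    from math import isqrt
    mpz = int

C = 640320
C3_OVER_24 = C**3 // 24
DIGITS_PER_TERM = math.log10(C**3 / 1728)  # about 14.18 digits gained per term


def merge(left, right):
    """Combine the (P, Q, T) of adjacent term ranges [a, m) and [m, b)."""
    p1, q1, t1 = left
    p2, q2, t2 = right
    return p1 * p2, q1 * q2, t1 * q2 + p1 * t2


def bs(a, b):
    """Binary splitting over terms a..b-1; returns (P, Q, T)."""
    if b <= a:
        return mpz(1), mpz(1), mpz(0)  # empty range: identity element for merge()
    if b - a == 1:
        if a == 0:
            p = q = mpz(1)
        else:
            p = mpz((6 * a - 5) * (2 * a - 1) * (6 * a - 1))
            q = mpz(a) ** 3 * C3_OVER_24
        t = p * (13591409 + 545140134 * a)
        return p, q, (-t if a % 2 else t)
    m = (a + b) // 2
    return merge(bs(a, m), bs(m, b))


def main():
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    # Target precision: 10,000,000 digits by default, or the first CLI argument
    digits = int(sys.argv[1]) if len(sys.argv) > 1 else 10_000_000
    n_terms = int(digits / DIGITS_PER_TERM) + 1

    # Near-equal contiguous blocks of terms, one per rank
    start = rank * n_terms // size
    end = (rank + 1) * n_terms // size
    part = bs(start, end)

    # Gather every rank's (P, Q, T) on the root process, in rank order
    parts = comm.gather(part, root=0)
    if rank != 0:
        return

    total = parts[0]
    for nxt in parts[1:]:
        total = merge(total, nxt)
    _, q, t = total

    # π ≈ 426880·√10005·Q / T, scaled by 10**digits to stay in integers
    sqrt_c = isqrt(10005 * mpz(10) ** (2 * digits))
    pi_scaled = q * 426880 * sqrt_c // t

    if hasattr(sys, "set_int_max_str_digits"):
        sys.set_int_max_str_digits(0)  # Python 3.11+ limits int -> str conversion by default
    s = str(pi_scaled)
    with open("pi_10m_digits.txt", "w") as f:
        f.write(s[0] + "." + s[1:])
    print(f"π computed to {digits} digits and saved to pi_10m_digits.txt")


if __name__ == "__main__":
    main()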
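For reference, the series behind this computation is the Chudnovsky formula (its constants match the script: 426880√10005, 13591409, 545140134 and 640320³ = 262537412640768000). Written with binary splitting, each block of terms [a, b) reduces to three integers P, Q, T, and adjacent blocks merge exactly, which is what lets every rank work on its own block independently:

```latex
\begin{aligned}
\frac{1}{\pi} &= 12 \sum_{k=0}^{\infty}
  \frac{(-1)^k\,(6k)!\,(13591409 + 545140134\,k)}{(3k)!\,(k!)^3\,640320^{3k+3/2}}, \qquad
\pi = \frac{426880\sqrt{10005}}{\sum_{k\ge 0} (-1)^k\, a_k\, (13591409 + 545140134\,k)},
  \qquad a_k = \frac{(6k)!}{(3k)!\,(k!)^3\,640320^{3k}} \\[4pt]
\frac{a_k}{a_{k-1}} &= \frac{p_k}{q_k}, \qquad
  p_k = (6k-5)(2k-1)(6k-1), \qquad q_k = \frac{640320^3}{24}\,k^3, \qquad p_0 = q_0 = 1 \\[4pt]
P(a,b) &= \prod_{k=a}^{b-1} p_k, \qquad Q(a,b) = \prod_{k=a}^{b-1} q_k, \qquad
  T(a,b) = Q(a,b) \sum_{k=a}^{b-1} (-1)^k\,(13591409 + 545140134\,k)\,\frac{P(a,k+1)}{Q(a,k+1)} \\[4pt]
P(a,b) &= P(a,m)\,P(m,b), \qquad Q(a,b) = Q(a,m)\,Q(m,b), \qquad
  T(a,b) = T(a,m)\,Q(m,b) + P(a,m)\,T(m,b) \\[4pt]
\pi &\approx \frac{426880\sqrt{10005}\;Q(0,N)}{T(0,N)}, \qquad
  N \approx \frac{\text{digits}}{\log_{10}\!\left(640320^3/1728\right)} \approx \frac{\text{digits}}{14.18}
\end{aligned}
```

The merge rule is associative, so rank 0 only has to fold the gathered triples from left to right in rank order; an empty block acts as the identity (P, Q, T) = (1, 1, 0).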

5.3 Containerizing the Python Script

To run the script in the K3s cluster, package it as a container image. A simple Dockerfile:

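# Dockerfile
FROM python:3.9-slim

# Open MPI runtime and headers, a C compiler for building mpi4py,
# and GMP/MPFR/MPC headers in case gmpy2 has to be built from source
RUN apt-get update \
    && apt-get install -y openmpi-bin openmpi-common libopenmpi-dev gcc \
       libgmp-dev libmpfr-dev libmpc-dev \
    && rm -rf /var/lib/apt/lists/*

# mpi4py is required by pi_mpi.py; gmpy2 makes the big-integer steps fast
RUN pip install --no-cache-dir mpi4py gmpy2

COPY pi_mpi.py /app/pi_mpi.py

WORKDIR /app

# Open MPI refuses to run as root unless this flag is given
CMD ["mpirun", "--allow-run-as-root", "-np", "3", "python3", "pi_mpi.py"]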

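Build the image and push it to your Docker registry. The image must match the CPU architecture of the ARM boards, so build it on one of them or, on an x86 machine, use a multi-platform build such as `docker buildx build --platform linux/arm64`: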

docker build -t your_dockerhub_username/pi-mpi:latest .
docker push your_dockerhub_username/pi-mpi:latest

5.4 Writing the Kubernetes Deployment Manifest

Create a Kubernetes Deployment manifest to run the containerized Python script in the K3s cluster.

# pi-mpi-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pi-mpi-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: pi-mpi
  template:
    metadata:
      labels:
        app: pi-mpi
    spec:
      containers:
      - name: pi-mpi-container
        image: your_dockerhub_username/pi-mpi:latest
        # configure a PersistentVolume here if shared storage is needed
      # restartPolicy is omitted: a Deployment only accepts "Always" (the default)

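Note: MPI needs the pods to communicate with each other, so this simple Deployment may not be enough. A more complete setup may need a StatefulSet or another controller, plus network policies that allow pod-to-pod traffic. A Deployment also restarts containers after they exit, so a run-to-completion computation like this one fits a Job better.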

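5.5 Running the Parallel Computation

Because pod-to-pod communication in Kubernetes is complex, an MPI-aware Kubernetes operator or a dedicated MPI deployment tool, such as the MPI Operator, is recommended. A brief outline using the MPI Operator:

Install the MPI Operator: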

kubectl apply --server-side -f https://raw.githubusercontent.com/kubeflow/mpi-operator/master/deploy/v2beta1/mpi-operator.yaml

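Create the job manifest. The manifest below is a plain Kubernetes `batch/v1` Job rather than the operator's `MPIJob` resource, so all three MPI ranks run inside a single pod; spreading the ranks across nodes requires an `MPIJob` (`kubeflow.org/v2beta1`), whose worker containers must also run an SSH server, as described in the MPI Operator documentation: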

# pi-mpi-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: pi-mpi-job
spec:
  completions: 1
  parallelism: 1   # one pod; mpirun inside it starts the 3 ranks
  template:
    spec:
      containers:
      - name: pi-mpi
        image: your_dockerhub_username/pi-mpi:latest
        command: ["mpirun", "--allow-run-as-root", "-np", "3", "python", "pi_mpi.py"]
        # mount a PersistentVolume here if shared storage is needed
      restartPolicy: Never

Submit the job:

kubectl apply -f pi-mpi-job.yaml

Monitor the job:

kubectl get pods
kubectl logs -l job-name=pi-mpi-job

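5.6 Collecting and Verifying the Results

When the computation finishes, the digits of π are in pi_10m_digits.txt inside the container. The results can be collected in either of these ways:

* Mount persistent storage: mount a PersistentVolume in the Deployment or Job so that the result file is persisted and can be accessed from other pods.
* Copy the file out of the container. kubectl cp runs tar inside the container, so the container must still be running; this does not work once the Job's pod has completed: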

kubectl cp <pod-name>:/app/pi_10m_digits.txt ./pi_10m_digits.txt
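A quick sanity check of the collected file compares its leading digits with a well-known prefix of π and counts the decimals. This sketch (illustrative function names) cannot prove that the tail is correct; for that, compare against an independently computed result.

```python
# Sketch: sanity-check a file of π digits ("3." followed by the decimals).
from pathlib import Path

# First 50 decimals of π
PI_PREFIX = "3.14159265358979323846264338327950288419716939937510"


def check_pi_text(text, expected_digits=None):
    """Return a list of problems found in the text of a π digits file."""
    text = text.strip()
    problems = []
    n = min(len(text), len(PI_PREFIX))
    if text[:n] != PI_PREFIX[:n]:
        problems.append("leading digits do not match π")
    decimals = text[2:] if text.startswith("3.") else ""
    if not decimals.isdigit():
        problems.append("expected '3.' followed only by digits")
    if expected_digits is not None and len(decimals) != expected_digits:
        problems.append(f"found {len(decimals)} decimals, expected {expected_digits}")
    return problems


def check_pi_file(path="pi_10m_digits.txt", expected_digits=10_000_000):
    return check_pi_text(Path(path).read_text(), expected_digits)
```

An empty list from `check_pi_file()` means the prefix matches and the decimal count is as expected.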

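6 Native Distributed Scheduling with MPI

Here MPI runs directly on the cluster nodes, without Kubernetes; installing the MPI environment on each node is all that is needed.

6.1 Installing MPI and mpi4py

Install Open MPI and the Python MPI library on all nodes: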

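# Ubuntu/Debian
sudo apt-get update
sudo apt-get install -y openmpi-bin openmpi-common libopenmpi-dev


pip install mpi4py numpy
# if pip reports an error (Debian 12 blocks system-wide pip installs), install with conda instead,
# or use a virtualenv or the Debian packages python3-mpi4py and python3-numpy
conda install mpi4py numpy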

6.2 MPI Code for Computing π (compute_pi_mpi.py)

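# compute_pi_mpi.py
# Midpoint-rule estimate of π = ∫₀¹ 4 / (1 + x²) dx, split across MPI ranks.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()


def partial_integral(i_start, i_end, n):
    """Sum the midpoint-rule terms for sub-intervals i_start..i_end-1 of [0, 1]."""
    h = 1.0 / n                                  # width of one sub-interval
    x = (np.arange(i_start, i_end) + 0.5) * h    # midpoints handled by this rank
    return float(np.sum(4.0 / (1.0 + x * x)) * h)


n = 10**6  # total number of sub-intervals (sample points)

# Near-equal contiguous blocks that together cover all n sub-intervals
start = rank * n // size
end = (rank + 1) * n // size
local = partial_integral(start, end, n)

# Lower-case allreduce works on Python objects; every rank receives the sum
pi = comm.allreduce(local, op=MPI.SUM)

if rank == 0:
    print(f"Computed PI: {pi:.12f} (error {abs(pi - np.pi):.2e})")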
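To run the script on all three boards, Open MPI needs a hostfile listing each host and its number of slots. A sketch that writes one for the nodes in the table in 1.1 (4 cores each); the helper names are illustrative, and the host names are assumed to resolve through the /etc/hosts entries from 2.3.1.

```python
# Sketch: write an Open MPI hostfile, one "host slots=N" line per node.
from pathlib import Path

NODES = {"k8s-master-0": 4, "k8s-worker-0": 4, "k8s-worker-1": 4}


def hostfile_text(nodes=NODES):
    return "".join(f"{host} slots={slots}\n" for host, slots in nodes.items())


def write_hostfile(path="hostfile", nodes=NODES):
    """Write the hostfile and return the total slot count (a natural -np value)."""
    Path(path).write_text(hostfile_text(nodes))
    return sum(nodes.values())
```

With the file in place, a run from the master would look like `mpirun --hostfile hostfile -np 12 python3 compute_pi_mpi.py`. This assumes compute_pi_mpi.py and the same Python environment (numpy, mpi4py) exist at the same path on every node, and that the passwordless SSH from 5.1 is set up, since mpirun uses it to start the remote processes.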