Welcome! If you have any questions, please open an issue in the source repository so we can learn and discuss together.
https://github.com/JK-97/my_note
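I. Preparation
There are many ways to set up k8s. Most tutorials found on Google target AWS or GCP, while guides for building a local cluster are scattered across the web.
So let's get started!
Example environment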
master 192.168.0.105
node 192.168.0.115
Tested versions:
- kubernetes
kubeadm, kubelet, and kubectl must all be the same version: v1.12.8
- docker:
Client:
Version: 17.12.1-ce
API version: 1.35
Go version: go1.9.4
Git commit: 7390fc6
Built: Tue Feb 27 22:17:40 2018
OS/Arch: linux/amd64
Server:
Engine:
Version: 17.12.1-ce
API version: 1.35 (minimum version 1.12)
Go version: go1.9.4
Git commit: 7390fc6
Built: Tue Feb 27 22:16:13 2018
OS/Arch: linux/amd64
Experimental: false
- nvidia-docker
NVIDIA Docker: 2.0.3
Client:
Version: 17.12.1-ce
API version: 1.35
Go version: go1.9.4
Git commit: 7390fc6
Built: Tue Feb 27 22:17:40 2018
OS/Arch: linux/amd64
Server:
Engine:
Version: 17.12.1-ce
API version: 1.35 (minimum version 1.12)
Go version: go1.9.4
Git commit: 7390fc6
Built: Tue Feb 27 22:16:13 2018
OS/Arch: linux/amd64
Experimental: false
Commands to check the versions
$ kubeadm version
$ docker version
$ nvidia-docker version
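Since kubeadm, kubelet, and kubectl all have to report the same version, a small check can catch a mismatch before the cluster is initialized. The following is only a minimal sketch: `same_version` is a hypothetical helper that is not part of the original notes, and the commented usage assumes the v1.12 tooling used here (the `--short` flag of `kubectl version` has been removed in newer releases).

```shell
# Hypothetical helper (not part of the original notes): succeeds only if
# every version string passed to it is identical.
same_version() {
    [ "$#" -gt 0 ] || return 1
    first=$1
    for v in "$@"; do
        [ "$v" = "$first" ] || return 1
    done
    return 0
}

# Example use on a node where the tools are installed (left commented out):
# same_version \
#     "$(kubeadm version -o short)" \
#     "$(kubelet --version | awk '{print $2}')" \
#     "$(kubectl version --client --short | awk '{print $3}')" \
#     && echo "versions match" || echo "version mismatch"
```

Keeping the comparison in a function that only takes strings means it can be tried out without any of the Kubernetes tools installed.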
Step 1: Install docker
# Install the latest version
$ sudo apt-get update
$ sudo apt-get install \
apt-transport-https \
ca-certificates \
curl \
gnupg-agent \
software-properties-common
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
$ sudo apt-key fingerprint 0EBFCD88
$ sudo add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) \
stable"
# The commands above add docker's apt repository
$ sudo apt-get update
$ sudo apt-get install docker-ce docker-ce-cli containerd.io
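# Installing directly like this gets the latest version. This tutorial needs a specific version, so skip the install command above
# This tutorial uses version 17.12.1~ce-0~ubuntu
# To install a specific version, continue from the second-to-last command of the previous block (sudo apt-get update)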
$ apt-cache madison docker-ce
docker-ce | 5:18.09.1~3-0~ubuntu-xenial | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
docker-ce | 5:18.09.0~3-0~ubuntu-xenial | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
docker-ce | 18.06.1~ce~3-0~ubuntu | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
docker-ce | 18.06.0~ce~3-0~ubuntu | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
·····
# The command above lists the available versions
$ sudo apt-get install docker-ce=<VERSION_STRING> containerd.io
# e.g. sudo apt-get install docker-ce=17.12.1~ce-0~ubuntu containerd.io
# This completes the docker installation
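Picking the version string out of the `apt-cache madison` listing by hand is easy to get wrong, so the lookup can be scripted. This is a sketch under assumptions: `pick_version` is a hypothetical helper, and it simply returns the first listed version that contains the fragment it is given. The `apt-mark hold` line in the usage comment is an addition that keeps apt from later upgrading the pinned package; it is not part of the original steps.

```shell
# Hypothetical helper: read `apt-cache madison` output on stdin and print the
# first version string (second column) that contains the given fragment.
# Exits non-zero if no listed version matches.
pick_version() {
    awk -F'|' -v want="$1" '
        { gsub(/[ \t]/, "", $2) }                 # trim padding around the version column
        index($2, want) { print $2; found = 1; exit }
        END { exit !found }
    '
}

# Example use (left commented out; requires the docker apt repository):
# VERSION=$(apt-cache madison docker-ce | pick_version '17.12.1') || exit 1
# sudo apt-get install docker-ce="$VERSION" containerd.io
# sudo apt-mark hold docker-ce    # assumption: pin so apt does not upgrade it later
```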
Step 2: Install nvidia-docker
$ docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f
$ sudo apt-get purge -y nvidia-docker
# The two commands above remove an old version of nvidia-docker; skip them if it was never installed
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
deb https://nvidia.github.io/libnvidia-container/ubuntu18.04/$(ARCH) /
deb https://nvidia.github.io/nvidia-container-runtime/ubuntu18.04/$(ARCH) /
deb https://nvidia.github.io/nvidia-docker/ubuntu18.04/$(ARCH) /
$ sudo apt-get update
$ sudo apt-get install -y nvidia-docker2
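# Installing directly like this gets the latest version and automatically upgrades docker to the latest version. That is usually not wanted and does not fit this setup, so skip the install command above
# Install a specific version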
$ apt-cache madison nvidia-docker2
nvidia-docker2 | 2.0.3+docker18.09.5-3 | https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64 Packages
nvidia-docker2 | 2.0.3+docker18.09.5-2 | https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64 Packages
nvidia-docker2 | 2.0.3+docker18.09.4-1 | https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64 Packages
nvidia-docker2 | 2.0.3+docker18.09.3-1 | https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64 Packages
nvidia-docker2 | 2.0.3+docker18.09.2-1 | https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64 Packages
nvidia-docker2 | 2.0.3+docker18.09.1-1 | https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64 Packages
nvidia-docker2 | 2.0.3+docker18.09.0-1 | https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64 Packages
nvidia-docker2 | 2.0.3+docker18.06.2-2 | https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64 Packages
nvidia-docker2 | 2.0.3+docker18.06.2-1 | https://nvidia.github.io/
····
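# Even with the version string, installing nvidia-docker2 on its own does not work
# apt reports a new dependency and asks for the latest nvidia-container-runtime, which is not actually needed
# So install nvidia-container-runtime in the same command, pinned to a specific version as well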
$ apt-cache madison nvidia-container-runtime
nvidia-container-runtime | 2.0.0+docker18.09.5-3 | https://nvidia.github.io/nvidia-container-runtime/ubuntu16.04/amd64 Packages
nvidia-container-runtime | 2.0.0+docker18.09.5-1 | https://nvidia.github.io/nvidia-container-runtime/ubuntu16.04/amd64 Packages
nvidia-container-runtime | 2.0.0+docker18.09.4-1 | https://nvidia.github.io/nvidia-container-runtime/ubuntu16.04/amd64 Packages
nvidia-container-runtime | 2.0.0+docker18.09.3-1 | https://nvidia.github.io/nvidia-container-runtime/ubuntu16.04/amd64 Packages
nvidia-container-runtime | 2.0.0+docker18.09.2-1 | https://nvidia.github.io/nvidia-container-runtime/ubuntu16.04/amd64 Packages
nvidia-container-runtime | 2.0.0+docker18.09.1-1 | https://nvidia.github.io/nvidia-container-runtime/ubuntu16.04/amd64 Packages
····
# The output above shows which docker version each package version is built for
$ sudo apt-get install -y nvidia-docker2=2.0.3+docker17.12.1-1 nvidia-container-runtime=2.0.0+docker17.12.1-1
# These are the matching versions finally chosen
# To uninstall docker if needed
$ sudo apt autoremove docker-ce containerd.io
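The pinned nvidia-docker2 and nvidia-container-runtime versions above both carry the docker version they were built for after `+docker`. A quick way to confirm that a package string matches the installed docker is sketched below. Both helper names are hypothetical, and the sketch assumes the `X+dockerY-REV` naming seen in the `apt-cache madison` listings.

```shell
# Hypothetical helper: extract the docker version a package was built for,
# e.g. "2.0.3+docker17.12.1-1" -> "17.12.1".
docker_suffix() {
    v=${1#*+docker}               # drop everything up to and including "+docker"
    printf '%s\n' "${v%%-*}"      # drop the trailing package revision ("-1")
}

# Hypothetical helper: succeeds if the package targets the given docker version.
# A suffix such as "-ce" on the docker version is ignored.
matches_docker() {
    [ "$(docker_suffix "$1")" = "${2%%-*}" ]
}

# Example use (left commented out; requires a running docker daemon):
# installed=$(docker version --format '{{.Server.Version}}')
# matches_docker '2.0.3+docker17.12.1-1' "$installed" || echo "version mismatch"
```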
Step 3: Configure the GPU
# The docker daemon configuration needs to be modified
$ sudo vim /etc/docker/daemon.json
# Write the following content
{
    "registry-mirrors": ["https://registry.docker-cn.com"],
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
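A stray comma or missing brace in daemon.json can keep the daemon from picking up the new configuration, so it may be worth checking the file before reloading docker. The sketch below assumes `python3` is available on the host; `default_runtime` is a hypothetical helper that prints the configured default runtime and fails if the file is not valid JSON.

```shell
# Hypothetical helper: print the "default-runtime" value of a daemon.json file.
# Exits non-zero if the file cannot be parsed as JSON. Assumes python3 is installed.
default_runtime() {
    python3 -c '
import json, sys
with open(sys.argv[1]) as f:
    print(json.load(f).get("default-runtime", ""))
' "$1"
}

# Example use (left commented out):
# [ "$(default_runtime /etc/docker/daemon.json)" = "nvidia" ] || echo "check daemon.json"
```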
$ sudo pkill -SIGHUP dockerd
$ docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.39 Driver Version: 418.39 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile