Installing Kubernetes 1.12.0 with kubeadm


Introduction:

Kubernetes, the container orchestration platform open-sourced by Google, has attracted enormous interest. Standing up a complete Kubernetes cluster is the first hurdle anyone trying out the platform has to clear. Up to and including Kubernetes 1.5, installation was fairly straightforward; the official docs even covered installing Kubernetes on CentOS 7 from a yum repository. From 1.6 onward, however, installation became much more involved, with certificates required for all kinds of authentication, which is unfriendly to people new to Kubernetes.

Architecture:

Two hosts:

18.16.202.35 master
18.16.202.36 slave

System configuration:

1.1 Disable the firewall

systemctl stop firewalld
systemctl disable firewalld

1.2 Disable SELinux

setenforce 0

Then edit /etc/selinux/config and set SELINUX to disabled so the change persists across reboots:

sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config

# SELINUX=disabled

1.3 Disable swap

Since version 1.8, Kubernetes requires the system swap to be disabled; with the default configuration, kubelet will not start otherwise. Method one: relax the check with the kubelet start-up flag --fail-swap-on=false. Method two: turn off swap entirely.
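Method one is never shown in full below, so here is a minimal sketch. It reuses the /etc/sysconfig/kubelet drop-in file that this guide also writes later for the cgroup settings; note this guide itself follows method two instead.

```shell
# Method one sketch: keep swap enabled and relax kubelet's check instead.
# /etc/sysconfig/kubelet is the CentOS drop-in file read by the kubelet unit.
cat >/etc/sysconfig/kubelet <<EOF
KUBELET_EXTRA_ARGS="--fail-swap-on=false"
EOF
```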

swapoff -a

Edit /etc/fstab to comment out the automatic swap mount, then confirm with free -m that swap is off.

# Comment out the swap partition
[root@localhost /]# sed -i 's/.*swap.*/#&/' /etc/fstab

#/dev/mapper/centos-swap swap                    swap    defaults        0 0
                                                                
[root@localhost /]# free -m
              total        used        free      shared  buff/cache   available
Mem:            962         154         446           6         361         612
Swap:             0           0           0
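The free -m check above can also be scripted; a small sketch (swap_total_kb is an illustrative helper reading /proc/meminfo):

```shell
# Print total swap in kB as reported by /proc/meminfo.
# After `swapoff -a` this should print 0.
swap_total_kb() {
  awk '/^SwapTotal:/ { print $2 }' "${1:-/proc/meminfo}"
}
# Usage on a host:
# swap_total_kb    # expect: 0
```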

1.4 Install Docker

sudo yum install -y yum-utils device-mapper-persistent-data lvm2
sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
sudo yum makecache fast

sudo yum -y install docker-ce
systemctl enable docker.service
systemctl restart docker

I installed docker-ce 18.06 here.

Deploying Kubernetes with kubeadm:

2.1 Install kubeadm and kubelet

Install kubeadm and kubelet on all nodes:

# Configure the yum repository
$ cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF

# Install
$ yum makecache fast
$ yum install -y kubelet kubeadm kubectl ipvsadm

Configuration:

# Configure forwarding-related kernel parameters, otherwise later steps may fail
$ cat <<EOF >  /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
vm.swappiness=0
EOF

# Apply the settings
$ sysctl --system

# If net.bridge.bridge-nf-call-iptables errors out, load the br_netfilter module
$ modprobe br_netfilter
$ sysctl -p /etc/sysctl.d/k8s.conf

# Load the IPVS kernel modules
# They must be reloaded after every reboot (you can add them to /etc/rc.local to load at boot)
$ modprobe ip_vs
$ modprobe ip_vs_rr
$ modprobe ip_vs_wrr
$ modprobe ip_vs_sh
$ modprobe nf_conntrack_ipv4
# Verify the modules loaded
$ lsmod | grep ip_vs
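Instead of /etc/rc.local, the modules can also be persisted with systemd-modules-load; a sketch (the file name ipvs.conf is arbitrary, the directory is standard on CentOS 7):

```shell
# Persist the IPVS module list across reboots via systemd-modules-load.
mkdir -p /etc/modules-load.d
cat >/etc/modules-load.d/ipvs.conf <<EOF
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack_ipv4
EOF
```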

Configure and start kubelet (all nodes)

# Point kubelet at a domestic mirror for the pause image
# Configure kubelet's cgroup driver to match Docker's
# Read Docker's cgroup driver
DOCKER_CGROUPS=$(docker info | grep 'Cgroup' | cut -d' ' -f3)
echo $DOCKER_CGROUPS
cat >/etc/sysconfig/kubelet<<EOF
KUBELET_EXTRA_ARGS="--cgroup-driver=$DOCKER_CGROUPS --pod-infra-container-image=registry.cn-hangzhou.aliyuncs.com/google_containers/pause-amd64:3.1"
EOF

# Start
$ systemctl daemon-reload
$ systemctl enable kubelet && systemctl restart kubelet

Running systemctl status kubelet at this point shows an error:

10月 11 00:26:43 node1 systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
10月 11 00:26:43 node1 systemd[1]: Unit kubelet.service entered failed state.
10月 11 00:26:43 node1 systemd[1]: kubelet.service failed.

Only by inspecting the systemd journal with journalctl -xefu kubelet does the real error appear:

unable to load client CA file /etc/kubernetes/pki/ca.crt: open /etc/kubernetes/pki/ca.crt: no such file or directory

This error resolves itself once kubeadm init has generated the CA certificate, so it can be ignored for now.

In short, kubelet will keep restarting until kubeadm init is run.


2.2 Configure the master node

Either run the command directly:

kubeadm init \
   --kubernetes-version=v1.12.0 \
   --pod-network-cidr=10.244.0.0/16 \
   --apiserver-advertise-address=18.16.202.35 \
   --ignore-preflight-errors=Swap

Or use a kubeadm-master.config configuration file, working in the /etc/kubernetes/ directory:

# kubeadm 1.11 on CentOS has problems with ipvs mode
# see https://github.com/kubernetes/kubernetes/issues/65461

# Generate the configuration file
cat >/etc/kubernetes/kubeadm-master.config<<EOF
apiVersion: kubeadm.k8s.io/v1alpha2
kind: MasterConfiguration
kubernetesVersion: v1.12.0
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers
api:
  advertiseAddress: 18.16.202.35 

controllerManagerExtraArgs:
  node-monitor-grace-period: 10s
  pod-eviction-timeout: 10s

networking:
  podSubnet: 10.244.0.0/16
  
kubeProxy:
  config:
    mode: ipvs
    # mode: iptables
EOF


# Pre-pull the images
# If this fails it can safely be re-run
kubeadm config images pull --config /etc/kubernetes/kubeadm-master.config

# Initialize
kubeadm init --config /etc/kubernetes/kubeadm-master.config
# or
# kubeadm init --config /etc/kubernetes/kubeadm-master.config --ignore-preflight-errors=all

An error hit during installation:

[preflight] Some fatal errors occurred:
  [ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty

Fix: delete the /var/lib/etcd directory.


If initialization fails partway, reset with:

kubeadm reset

rm -rf /var/lib/cni/ $HOME/.kube/config

2.3 Initialize the master node:

[root@localhost kubernetes]#  kubeadm init --config kubeadm-master.config
[init] using Kubernetes version: v1.12.0
[preflight] running pre-flight checks
    [WARNING RequiredIPVSKernelModulesAvailable]: the IPVS proxier will not be used, because the following required kernel modules are not loaded: [ip_vs_rr ip_vs_wrr ip_vs_sh ip_vs] or no builtin kernel ipvs support: map[ip_vs_sh:{} nf_conntrack_ipv4:{} ip_vs:{} ip_vs_rr:{} ip_vs_wrr:{}]
you can solve this problem with following methods:
 1. Run 'modprobe -- ' to load missing kernel modules;
2. Provide the missing builtin kernel ipvs support

[preflight/images] Pulling images required for setting up a Kubernetes cluster
[preflight/images] This might take a minute or two, depending on the speed of your internet connection
[preflight/images] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[preflight] Activating the kubelet service
[certificates] Generated front-proxy-ca certificate and key.
[certificates] Generated front-proxy-client certificate and key.
[certificates] Generated etcd/ca certificate and key.
[certificates] Generated etcd/server certificate and key.
[certificates] etcd/server serving cert is signed for DNS names [localhost.localdomain localhost] and IPs [127.0.0.1 ::1]
[certificates] Generated etcd/peer certificate and key.
[certificates] etcd/peer serving cert is signed for DNS names [localhost.localdomain localhost] and IPs [18.16.202.35 127.0.0.1 ::1]
[certificates] Generated etcd/healthcheck-client certificate and key.
[certificates] Generated apiserver-etcd-client certificate and key.
[certificates] Generated ca certificate and key.
[certificates] Generated apiserver certificate and key.
[certificates] apiserver serving cert is signed for DNS names [localhost.localdomain kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 18.16.202.35]
[certificates] Generated apiserver-kubelet-client certificate and key.
[certificates] valid certificates and keys now exist in "/etc/kubernetes/pki"
[certificates] Generated sa key and public key.
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/scheduler.conf"
[controlplane] wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
[controlplane] wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
[controlplane] wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml"
[init] waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests" 
[init] this might take a minute or longer if the control plane images have to be pulled
[apiclient] All control plane components are healthy after 40.510372 seconds
[uploadconfig] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.12" in namespace kube-system with the configuration for the kubelets in the cluster
[markmaster] Marking the node localhost.localdomain as master by adding the label "node-role.kubernetes.io/master=''"
[markmaster] Marking the node localhost.localdomain as master by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "localhost.localdomain" as an annotation
[bootstraptoken] using token: xc9gpo.mmv1mmsjhq6tzhdc
[bootstraptoken] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstraptoken] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstraptoken] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstraptoken] creating the "cluster-info" ConfigMap in the "kube-public" namespace
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes master has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of machines by running the following on each node
as root:

    kubeadm join 18.16.202.35:6443 --token ccxrk8.myui0xu4syp99gxu --discovery-token-ca-cert-hash sha256:e3c90ace969aa4d62143e7da6202f548662866dfe33c140095b020031bff2986

The complete initialization output is recorded above; from it you can see essentially every key step needed to install a Kubernetes cluster by hand.

The key items are:

  • [kubelet] generates kubelet's configuration file "/var/lib/kubelet/config.yaml"

  • [certificates] generates all the required certificates

  • [kubeconfig] generates the kubeconfig files

  • [bootstraptoken] generates the bootstrap token; record it, since kubeadm join needs it later to add nodes to the cluster

  • the following commands set up kubectl access for a regular user:

      mkdir -p $HOME/.kube
      sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
      sudo chown $(id -u):$(id -g) $HOME/.kube/config
  • finally, the command for joining nodes to the cluster:

      kubeadm join 18.16.202.35:6443 --token ccxrk8.myui0xu4syp99gxu --discovery-token-ca-cert-hash sha256:e3c90ace969aa4d62143e7da6202f548662866dfe33c140095b020031bff2986
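If the hash printed above is ever lost, it can be recomputed from the master's CA certificate; a sketch (ca_hash is an illustrative helper, the openssl pipeline is the standard one):

```shell
# Recompute the --discovery-token-ca-cert-hash value from the cluster CA cert.
ca_hash() {
  openssl x509 -pubkey -in "$1" \
    | openssl rsa -pubin -outform der 2>/dev/null \
    | openssl dgst -sha256 -hex \
    | sed 's/^.* //'
}
# Usage on the master:
# ca_hash /etc/kubernetes/pki/ca.crt   # the hex digest after "sha256:"
```

The token itself expires after 24 hours; `kubeadm token create --print-join-command` on the master prints a fresh, complete join command.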

2.4 Configure kubectl

Run the following on the master node:

$ rm -rf $HOME/.kube
$ mkdir -p $HOME/.kube
$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config

# List the nodes
$ kubectl get nodes
NAME    STATUS     ROLES    AGE     VERSION
node1   NotReady   master   6m19s   v1.12.0

2.5 Install the network plugin

Run the following on the master node:

# Download the manifest
$ cd ~ && mkdir flannel && cd flannel
$ wget https://raw.githubusercontent.com/coreos/flannel/v0.10.0/Documentation/kube-flannel.yml

Edit kube-flannel.yml:

# Edit the settings in kube-flannel.yml
# The subnet here must match the pod-network-cidr passed to kubeadm above
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }

# The default image is quay.io/coreos/flannel:v0.10.0-amd64; if you can pull it, keep it, otherwise change the image in the yml to the Aliyun mirror
image: registry.cn-shanghai.aliyuncs.com/gcr-k8s/flannel:v0.10.0-amd64

# If a node has multiple NICs, see flannel issue 39701:
# https://github.com/kubernetes/kubernetes/issues/39701
# You currently need to add an --iface argument in kube-flannel.yml naming the host's internal NIC,
# otherwise DNS may fail to resolve and containers may be unable to communicate. Download
# kube-flannel.yml locally and add --iface=<iface-name> to flanneld's start-up arguments
    containers:
      - name: kube-flannel
        image: registry.cn-shanghai.aliyuncs.com/gcr-k8s/flannel:v0.10.0-amd64
        command:
        - /opt/bin/flanneld
        args:
        - --ip-masq
        - --kube-subnet-mgr
        - --iface=ens33
        - --iface=eth0
⚠️⚠️⚠️ The value of --iface=ens33 must be your host's actual NIC; multiple --iface flags may be specified.
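When unsure which NIC name to use, the interface carrying the default route is usually the right one; a sketch (parse_iface is an illustrative helper, not part of flannel):

```shell
# Print the outgoing interface of an `ip route` default line fed on stdin.
parse_iface() {
  awk '{ for (i = 1; i <= NF; i++) if ($i == "dev") { print $(i+1); exit } }'
}
# Usage on a live host:
# ip route show default | parse_iface    # e.g. prints ens33
```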

# kubeadm 1.12 puts an extra taint on the node: node.kubernetes.io/not-ready:NoSchedule.
# This makes sense: a node should not accept scheduling before it is Ready. But the node will never
# become Ready until the network plugin is deployed, so add a toleration for this taint to kube-flannel.yml:
    tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      - key: node.kubernetes.io/not-ready
        operator: Exists
        effect: NoSchedule

Deploy:

# Deploy
$ kubectl apply -f ~/flannel/kube-flannel.yml

# Check
$ kubectl get pods --namespace kube-system
# kubectl get service
$ kubectl get svc --namespace kube-system

# Nodes only become Ready once the network plugin is installed and configured
# Allow the master to schedule application pods and share the workload; other components
# such as dashboard, heapster, efk, etc. can then be deployed
$ kubectl taint nodes --all node-role.kubernetes.io/master-
# or: kubectl taint nodes node1 node-role.kubernetes.io/master-
  node/node1 untainted

# To stop the master from running pods again:
# kubectl taint nodes node1 node-role.kubernetes.io/master=:NoSchedule

2.6 Join the worker nodes to the cluster

Run the following on every worker node:

# This command was returned by the successful master initialization
$   kubeadm join 18.16.202.35:6443 --token ccxrk8.myui0xu4syp99gxu --discovery-token-ca-cert-hash sha256:e3c90ace969aa4d62143e7da6202f548662866dfe33c140095b020031bff2986

Error encountered:
[preflight] running pre-flight checks
[discovery] Trying to connect to API Server "18.16.202.35:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://18.16.202.35:6443"
[discovery] Requesting info from "https://18.16.202.35:6443" again to validate TLS against the pinned public key
[discovery] Failed to request cluster info, will try again: [Get https://18.16.202.35:6443/api/v1/namespaces/kube-public/configmaps/cluster-info: x509: certificate has expired or is not yet valid]
[discovery] Failed to request cluster info, will try again: [Get https://18.16.202.35:6443/api/v1/namespaces/kube-public/configmaps/cluster-info: x509: certificate has expired or is not yet valid]

In my case this was caused by a wrong server clock; after correcting the time it worked.
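Since the "certificate has expired or is not yet valid" failure above usually comes from clock skew, a quick check is to compare the epoch time on master and node; keeping an NTP daemon running prevents it recurring (the chrony package name is the CentOS 7 default):

```shell
# Print the current UTC epoch second; run on both master and node and compare.
date -u +%s
# Keep clocks synchronized from now on (CentOS 7):
# yum install -y chrony && systemctl enable --now chronyd
```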


Check the pods:

[root@node1 flannel]# kubectl get pods -n kube-system
NAME                            READY   STATUS             RESTARTS   AGE
coredns-6c66ffc55b-l76bq        1/1     Running            0          16m
coredns-6c66ffc55b-zlsvh        1/1     Running            0          16m
etcd-node1                      1/1     Running            0          16m
kube-apiserver-node1            1/1     Running            0          16m
kube-controller-manager-node1   1/1     Running            0          15m
kube-flannel-ds-sr6tq           0/1     CrashLoopBackOff   6          7m12s
kube-flannel-ds-ttzhv           1/1     Running            0          9m24s
kube-proxy-nfbg2                1/1     Running            0          7m12s
kube-proxy-r4g7b                1/1     Running            0          16m
kube-scheduler-node1            1/1     Running            0          16m

Inspect the failing pod:

[root@node1 flannel]# kubectl  describe pods kube-flannel-ds-sr6tq -n  kube-system
Name:               kube-flannel-ds-sr6tq
Namespace:          kube-system
Priority:           0
PriorityClassName:  <none>
...
Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Pulling    12m                  kubelet, node2     pulling image "registry.cn-shanghai.aliyuncs.com/gcr-k8s/flannel:v0.10.0-amd64"
  Normal   Pulled     11m                  kubelet, node2     Successfully pulled image "registry.cn-shanghai.aliyuncs.com/gcr-k8s/flannel:v0.10.0-amd64"
  Normal   Created    11m                  kubelet, node2     Created container
  Normal   Started    11m                  kubelet, node2     Started container
  Normal   Created    11m (x4 over 11m)    kubelet, node2     Created container
  Normal   Started    11m (x4 over 11m)    kubelet, node2     Started container
  Normal   Pulled     10m (x5 over 11m)    kubelet, node2     Container image "registry.cn-shanghai.aliyuncs.com/gcr-k8s/flannel:v0.10.0-amd64" already present on machine
  Normal   Scheduled  7m15s                default-scheduler  Successfully assigned kube-system/kube-flannel-ds-sr6tq to node2
  Warning  BackOff    7m6s (x23 over 11m)  kubelet, node2     Back-off restarting failed container

When this happens, simply delete the failing pod (the DaemonSet recreates it):

[root@node1 flannel]# kubectl delete pod kube-flannel-ds-sr6tq -n kube-system
pod "kube-flannel-ds-sr6tq" deleted
[root@node1 flannel]# kubectl get pods -n kube-system
NAME                            READY   STATUS    RESTARTS   AGE
coredns-6c66ffc55b-l76bq        1/1     Running   0          17m
coredns-6c66ffc55b-zlsvh        1/1     Running   0          17m
etcd-node1                      1/1     Running   0          16m
kube-apiserver-node1            1/1     Running   0          16m
kube-controller-manager-node1   1/1     Running   0          16m
kube-flannel-ds-7lfrh           1/1     Running   1          6s
kube-flannel-ds-ttzhv           1/1     Running   0          10m
kube-proxy-nfbg2                1/1     Running   0          7m55s
kube-proxy-r4g7b                1/1     Running   0          17m
kube-scheduler-node1            1/1     Running   0          16m

Check the nodes:

[root@node1 flannel]# kubectl get nodes
NAME    STATUS   ROLES    AGE     VERSION
node1   Ready    master   17m     v1.12.1
node2   Ready    <none>   8m14s   v1.12.1

