windows 11 搭建 TensorFlow GPU 开发环境【RTX 3060】:2 -- 基于WSL2 docker 方式的使用



简介

目前我看官网主要推荐docker 方式了,那我们就用docker 方式试试。而且网上的安装教程也是docker 的居多【官方给出了一个教程】,我们也要与时俱进。

下面是我机器wsl kernel的版本:可见是没有最新,只有更新哈!

season@season:~$ uname -r
5.10.16.3-microsoft-standard-WSL2

在这里插入图片描述

官方文档:

  • https://docs.nvidia.com/cuda/wsl-user-guide/index.html

使用 wsl 的docker 进行深度学习与 原生方式的对比

PyTorch MNIST 测试,这是一个有目的的小型玩具机器学习示例,它强调了保持 GPU 忙碌以达到满意的 WSL2性能的重要性。与原生 Linux 一样,工作负载越小,就越有可能由于启动 GPU 进程的开销而导致性能下降。这种退化在 WSL2上更为明显,并且与原生 Linux 的规模不同。
在这里插入图片描述

从图中可以看出如果batch size小的话,很多时间会消耗在CUDA调用上,batch size=8的时候,时间消耗会是native CUDA的138%。如果提高batch size,让CUDA充分忙碌,性能可以接近native!
以下是原文链接。

https://developer.nvidia.com/blog/leveling-up-cuda-performance-on-wsl2-with-new-enhancements/

主要步骤

1.安装 wsl-2 版本的windows NVIDIA驱动

cuda 驱动 on wsl
https://developer.nvidia.com/cuda/wsl/download
在这里插入图片描述
不知道为啥,这个没有特别说明和wsl 有啥关系。。。我已经有驱动了,这个不知道装了个啥。做如图所示的选择。

特别注意,在wsl-2 中安装 cuda toolkit 要使用如下脚本:
红框处是单独的选项

在这里插入图片描述

wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.5.1/local_installers/cuda-repo-wsl-ubuntu-11-5-local_11.5.1-1_amd64.deb
sudo dpkg -i cuda-repo-wsl-ubuntu-11-5-local_11.5.1-1_amd64.deb
sudo apt-key add /var/cuda-repo-wsl-ubuntu-11-5-local/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda

安装好以后进行测试:Black-Scholes模型,简称B-S模型,是一种对金融产品估价的数学模型。

cd /usr/local/cuda-11.5/samples/4_Finance/BlackScholes

season@season:/usr/local/cuda-11.5/samples/4_Finance/BlackScholes$ ll
total 60
drwxr-xr-x  4 root root  4096 Nov 24 23:07 ./
drwxr-xr-x 10 root root  4096 Nov 24 23:07 ../
drwxr-xr-x  2 root root  4096 Nov 24 23:07 .vscode/
-rw-r--r--  1 root root  8382 Sep 21 01:38 BlackScholes.cu
-rw-r--r--  1 root root  2787 Sep 21 01:38 BlackScholes_gold.cpp
-rw-r--r--  1 root root  3646 Sep 21 01:38 BlackScholes_kernel.cuh
-rw-r--r--  1 root root 13454 Sep 21 01:38 Makefile
-rw-r--r--  1 root root  1859 Sep 21 01:38 NsightEclipse.xml
drwxr-xr-x  2 root root  4096 Nov 24 23:07 doc/
-rw-r--r--  1 root root   189 Sep 21 01:38 readme.txt
season@season:/usr/local/cuda-11.5/samples/4_Finance/BlackScholes$ sudo make BlackScholes
>>> GCC Version is greater or equal to 5.1.0 <<<
/usr/local/cuda-11.5/bin/nvcc -ccbin g++ -I../../common/inc  -m64    -maxrregcount=16 --threads 0 --std=c++11 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_86,code=compute_86 -o BlackScholes.o -c BlackScholes.cu
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
ptxas warning : For profile sm_86 adjusting per thread register count of 16 to lower bound of 24
ptxas warning : For profile sm_75 adjusting per thread register count of 16 to lower bound of 24
ptxas warning : For profile sm_70 adjusting per thread register count of 16 to lower bound of 24
ptxas warning : For profile sm_80 adjusting per thread register count of 16 to lower bound of 24
/usr/local/cuda-11.5/bin/nvcc -ccbin g++ -I../../common/inc  -m64    -maxrregcount=16 --threads 0 --std=c++11 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_86,code=compute_86 -o BlackScholes_gold.o -c BlackScholes_gold.cpp
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
/usr/local/cuda-11.5/bin/nvcc -ccbin g++   -m64      -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_86,code=compute_86 -o BlackScholes BlackScholes.o BlackScholes_gold.o
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
mkdir -p ../../bin/x86_64/linux/release
cp BlackScholes ../../bin/x86_64/linux/release
season@season:/usr/local/cuda-11.5/samples/4_Finance/BlackScholes$ ./BlackScholes
[./BlackScholes] - Starting...
GPU Device 0: "Ampere" with compute capability 8.6

Initializing data...
...allocating CPU memory for options.
...allocating GPU memory for options.
...generating input data in CPU mem.
...copying input data to GPU mem.
Data init done.

Executing Black-Scholes GPU kernel (512 iterations)...
Options count             : 8000000
BlackScholesGPU() time    : 0.261508 msec
Effective memory bandwidth: 305.918207 GB/s
Gigaoptions per second    : 30.591821

BlackScholes, Throughput = 30.5918 GOptions/s, Time = 0.00026 s, Size = 8000000 options, NumDevsUsed = 1, Workgroup = 128

Reading back GPU results...
Checking the results...
...running CPU calculations.

Comparing the results...
L1 norm: 1.741792E-07
Max absolute error: 1.192093E-05

Shutting down...
...releasing GPU memory.
...releasing CPU memory.
Shutdown done.

[BlackScholes] - Test Summary

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

Test passed

2. 在wsl-2 中安装 docker 及 NVIDIA 容器

安装标准docker

curl https://get.docker.com | sh

之后输出:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 18617  100 18617    0     0  25294      0 --:--:-- --:--:-- --:--:-- 25294
# Executing docker install script, commit: 93d2499759296ac1f9c510605fef85052a2c32be

WSL DETECTED: We recommend using Docker Desktop for Windows.
Please get Docker Desktop from https://www.docker.com/products/docker-desktop


You may press Ctrl+C now to abort this script.
+ sleep 20
+ sudo -E sh -c apt-get update -qq >/dev/null
+ sudo -E sh -c DEBIAN_FRONTEND=noninteractive apt-get install -y -qq apt-transport-https ca-certificates curl >/dev/null
+ sudo -E sh -c curl -fsSL "https://download.docker.com/linux/ubuntu/gpg" | gpg --dearmor --yes -o /usr/share/keyrings/docker-archive-keyring.gpg
+ sudo -E sh -c echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu focal stable" > /etc/apt/sources.list.d/docker.list
+ sudo -E sh -c apt-get update -qq >/dev/null
+ sudo -E sh -c DEBIAN_FRONTEND=noninteractive apt-get install -y -qq --no-install-recommends  docker-ce-cli docker-scan-plugin docker-ce >/dev/null
+ version_gte 20.10
+ [ -z  ]
+ return 0
+ sudo -E sh -c DEBIAN_FRONTEND=noninteractive apt-get install -y -qq docker-ce-rootless-extras >/dev/null

================================================================================

To run Docker as a non-privileged user, consider setting up the
Docker daemon in rootless mode for your user:

    dockerd-rootless-setuptool.sh install

Visit https://docs.docker.com/go/rootless/ to learn about rootless mode.


To run the Docker daemon as a fully privileged service, but granting non-root
users access, refer to https://docs.docker.com/go/daemon-access/

WARNING: Access to the remote API on a privileged Docker daemon is equivalent
         to root access on the host. Refer to the 'Docker daemon attack surface'
         documentation for details: https://docs.docker.com/go/attack-surface/

================================================================================

装完之后发现,这个突然出现的玩意也是挺占用内存的
在这里插入图片描述

安装 NVIDIA Container 以及 Toolkit

$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)

$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -

$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

$ curl -s -L https://nvidia.github.io/libnvidia-container/experimental/$distribution/libnvidia-container-experimental.list | sudo tee /etc/apt/sources.list.d/libnvidia-container-experimental.list

$ sudo apt-get update

$ sudo apt-get install -y nvidia-docker2

下面我们继续安装官网给出的测试docker 进行一下测试。
https://docs.nvidia.com/cuda/wsl-user-guide/index.html


测试1,simple container

season@season:~$ sudo service docker stop
[sudo] password for season:
 * Docker already stopped - file /var/run/docker-ssd.pid not found.
season@season:~$ sudo service docker start
 * Starting Docker: docker                                                                                       [ OK ]

season@season:~$ sudo docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
Unable to find image 'nvcr.io/nvidia/k8s/cuda-sample:nbody' locally
nbody: Pulling from nvidia/k8s/cuda-sample
d519e2592276: Pull complete
d22d2dfcfa9c: Pull complete
b3afe92c540b: Pull complete
b25f8d7adb24: Pull complete
ddb025f124b9: Pull complete
fe72fda9c19e: Pull complete
c6a265e4ffa3: Pull complete
c931a9542ebf: Pull complete
f7eb321dd245: Pull complete
d67fd954fbd5: Pull complete
Digest: sha256:a2117f5b8eb3012076448968fd1790c6b63975c6b094a8bd51411dee0c08440d
Status: Downloaded newer image for nvcr.io/nvidia/k8s/cuda-sample:nbody
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
        -fullscreen       (run n-body simulation in fullscreen mode)
        -fp64             (use double precision floating point values for simulation)
        -hostmem          (stores simulation data in host memory)
        -benchmark        (run benchmark to measure performance)
        -numbodies=<N>    (number of bodies (>= 1) to run in simulation)
        -device=<d>       (where d=0,1,2.... for the CUDA device to use)
        -numdevices=<i>   (where i=(number of CUDA devices > 0) to use for simulation)
        -compare          (compares simulation results running once on the default GPU and once on the CPU)
        -cpu              (run n-body simulation on the CPU)
        -tipsy=<file.bin> (load a tipsy model file for simulation)

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
GPU Device 0: "Ampere" with compute capability 8.6

> Compute 8.6 CUDA device: [NVIDIA GeForce RTX 3060 Laptop GPU]
30720 bodies, total time for 10 iterations: 25.952 ms
= 363.636 billion interactions per second
= 7272.727 single-precision GFLOP/s at 20 flops per interaction
season@season:~$

测试2:Jupyter Notebooks

启动脚本:

docker run -it --gpus all -p 8888:8888 tensorflow/tensorflow:latest-gpu-py3-jupyter

在这里插入图片描述

测试效果,感觉跑结果交互起来非常缓慢,下面这段程序大概跑了5-10分钟

import tensorflow as tf
version = tf.__version__
gpu_ok = tf.test.is_gpu_available()
WARNING:tensorflow:From <ipython-input-3-11c8f8f7a7c6>:1: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
tf.config.list_physical_devices('GPU')
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
 print("tf version:",version,"\nuse GPU",gpu_ok)
tf version: 2.1.0 
use GPU True
tf.reduce_sum(tf.random.normal([1000, 1000]))
<tf.Tensor: shape=(), dtype=float32, numpy=1321.2869>

问题:为啥 jupyter notebook 的这个docker 调用巨慢无比???

姑且把日志贴出来,大佬分析分析啥原因,会不是时间时区不一样?造成反映慢。。。

[I 16:29:45.604 NotebookApp] 302 GET /?token=99b4290f4d03e9fe537a7fe14872d20f2a50f8a23c81c247 (172.17.0.1) 0.55ms
[I 16:29:49.983 NotebookApp] Creating new notebook in
[I 16:29:50.024 NotebookApp] Writing notebook-signing key to /root/.local/share/jupyter/notebook_secret
[I 16:29:50.496 NotebookApp] Kernel started: 24d0d681-bfac-4723-9e73-541c5c4f2443
2021-11-25 16:30:05.794652: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.6
2021-11-25 16:30:05.824203: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer_plugin.so.6
2021-11-25 16:30:29.686851: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2021-11-25 16:30:29.700683: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2303995000 Hz
2021-11-25 16:30:29.702117: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x484f3f0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-11-25 16:30:29.702147: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-11-25 16:30:29.704570: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-11-25 16:30:30.076279: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-11-25 16:30:30.076510: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x48c1ac0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-11-25 16:30:30.076562: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): NVIDIA GeForce RTX 3060 Laptop GPU, Compute Capability 8.6
2021-11-25 16:30:30.077105: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-11-25 16:30:30.077137: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3060 Laptop GPU computeCapability: 8.6
coreClock: 1.702GHz coreCount: 30 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 312.97GiB/s
2021-11-25 16:30:30.077168: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-11-25 16:30:30.077183: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-11-25 16:30:30.111039: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2021-11-25 16:30:30.116390: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2021-11-25 16:30:30.162423: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2021-11-25 16:30:30.167253: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2021-11-25 16:30:30.167342: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-11-25 16:30:30.167733: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-11-25 16:30:30.167962: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-11-25 16:30:30.168006: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2021-11-25 16:30:30.168473: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
[I 16:31:50.494 NotebookApp] Saving file at /Untitled.ipynb
2021-11-25 16:33:20.133916: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-11-25 16:33:20.133958: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0
2021-11-25 16:33:20.133980: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N
2021-11-25 16:33:20.135139: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-11-25 16:33:20.135166: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1324] Could not identify NUMA node of platform GPU id 0, defaulting to 0.  Your kernel may not have been built with NUMA support.
2021-11-25 16:33:20.135374: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-11-25 16:33:20.135442: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/device:GPU:0 with 4788 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3060 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6)
2021-11-25 16:33:20.152265: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-11-25 16:33:20.152312: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3060 Laptop GPU computeCapability: 8.6
coreClock: 1.702GHz coreCount: 30 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 312.97GiB/s
2021-11-25 16:33:20.152357: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-11-25 16:33:20.152377: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-11-25 16:33:20.152406: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2021-11-25 16:33:20.152427: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2021-11-25 16:33:20.152446: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2021-11-25 16:33:20.152484: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2021-11-25 16:33:20.152507: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-11-25 16:33:20.152726: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-11-25 16:33:20.152927: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-11-25 16:33:20.152951: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
[I 16:33:51.283 NotebookApp] Saving file at /Untitled.ipynb
2021-11-25 16:35:09.708118: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-11-25 16:35:09.708166: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3060 Laptop GPU computeCapability: 8.6
coreClock: 1.702GHz coreCount: 30 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 312.97GiB/s
2021-11-25 16:35:09.708200: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-11-25 16:35:09.708219: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-11-25 16:35:09.708245: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2021-11-25 16:35:09.708253: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2021-11-25 16:35:09.708289: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolveeerer.eeeeer.so.10
2021-11-25 16:35:09.708310: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2021-11-25 16:35:09.708317: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-11-25 16:35:09.708564: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-11-25 16:35:09.708926: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-11-25 16:35:09.708955: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2021-11-25 16:35:09.709019: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:

2021-11-25 16:35:09.709023: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0
2021-11-25 16:35:09.709027: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N
2021-11-25 16:35:09.709380: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-11-25 16:35:09.709410: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1324] Could not identify NUMA node of platform GPU id 0, defaulting to 0.  Your kernel may not have been built with NUMA support.
2021-11-25 16:35:09.709641: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-11-25 16:35:09.709682: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4788 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3060 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6)

参考文献

安装WSL2,官方文档说的比较清楚了

  • https://docs.microsoft.com/zh-cn/windows/wsl/install

5步搭建wsl2+cuda+docker解决windows深度学习开发问题

  • https://zhuanlan.zhihu.com/p/408403790

Windows+WSL2+CUDA+Docker

  • https://blog.youkuaiyun.com/fleaxin/article/details/108911522

tensor flow 官方gpu 支持文档

  • https://tensorflow.google.cn/install/gpu

cuda 官方指导

  • https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/

本文主要参考:CUDA on WSL User Guide

  • https://docs.nvidia.com/cuda/wsl-user-guide/index.html

11.5 版本的文档有docker 说明

  • https://docs.nvidia.com/cuda/archive/11.5.0/wsl-user-guide/index.html#ch05-running-containers
<think>嗯,用户的问题是关于在虚拟机无法安装NVIDIA驱动的情况下,是否可以通过WSL来安装。我需要先理解这两个环境的不同。 首先,虚拟机安装NVIDIA驱动的问题。虚拟机通常使用虚拟化显卡,比如VMware的SVGA或者VirtualBox的显卡,这些可能不被NVIDIA官方驱动支持,导致无法直接安装。不过有些情况下可以通过PCI直通来使用物理GPU,但这需要宿主机支持,并且配置复杂[^3]。用户可能遇到了无法在虚拟机中安装驱动的情况,所以想看看WSL是否有替代方案。 接下来是WSL的情况。WSL2虽然运行在Hyper-V虚拟化层上,但它可以访问宿主机的GPU资源。微软和NVIDIA合作,为WSL提供了专用的驱动,用户需要在Windows主机上安装特定版本的NVIDIA驱动,这样WSL内的CUDA等工具就可以直接调用宿主机的GPU,而不需要在WSL内部安装驱动[^1]。这可能是用户的解决方案。 但用户提到的是在虚拟机无法安装驱动的情况下使用WSL,但WSL本身并不需要安装驱动,而是依赖宿主机的驱动。所以需要确认用户是否混淆了虚拟机和WSL的环境。另外,用户可能的问题在于如何在WSL使用GPU加速,比如运行CUDA程序,这时候确实不需要在WSL内部安装驱动,只要宿主机有正确的驱动即可。 还需要注意引用中的信息,比如WSL的内核问题[^2]。如果用户需要其他USB设备支持,可能需要自定义内核,但GPU驱动方面不需要,因为NVIDIA的驱动已经处理好了。此外,用户提到的Docker镜像可能涉及到CUDA环境的配置,这可能需要特定的镜像版本,比如sylvanding/my-cuda中的修复版本,这也和WSL使用CUDA有关联。 总结一下,用户的问题核心是当虚拟机无法安装NVIDIA驱动时,是否可以通过WSL使用GPU。答案应该是肯定的,因为WSL利用宿主机的驱动,不需要在内部安装。但需要指导用户正确配置宿主机驱动和WSL环境,确保CUDA工具链正确安装。同时,需要提醒用户,如果他们的虚拟机环境无法直通GPU,转而使用WSL可能是一个可行的替代方案。</think>### 在虚拟机无法安装NVIDIA驱动时,能否通过WSL安装? #### 1. **虚拟机与NVIDIA驱动的限制** - 虚拟机(如VMware、VirtualBox)默认使用虚拟化显卡,无法直接安装NVIDIA官方驱动。即使通过PCI直通技术访问物理GPU,也需要宿主机和虚拟化平台的支持,且配置复杂[^3]。 - 如果虚拟机环境不支持GPU直通,则无法直接使用NVIDIA GPU进行加速计算。 #### 2. **WSLGPU支持机制** - **WSL2**通过Hyper-V虚拟化层运行,但可以直接访问宿主机的GPU资源,无需在WSL内部安装NVIDIA驱动。需要满足以下条件: 1. **宿主机(Windows)安装专用驱动**:需安装NVIDIA为WSL设计的驱动(版本≥465.00)。 2. **WSL内配置CUDA工具链**:通过`apt`安装CUDA Toolkit,例如: ```bash sudo apt install nvidia-cuda-toolkit ``` 3. **验证GPU识别**:在WSL中运行`nvidia-smi`应显示与宿主机相同的GPU信息。 #### 3. **关键区别** - **虚拟机**:依赖虚拟化层和硬件兼容性,驱动安装受限。 - **WSL2**:通过宿主机驱动直接调用GPU,无需额外驱动安装,更适合深度学习/CUDA开发。 #### 4. **注意事项** - WSL2的USB设备支持有限(如摄像头需自定义内核编译),但GPU调用不受此限制。 - 若宿主机为NVIDIA GPU(如GeForce 3060Ti),推荐使用专用Docker镜像(如`sylvanding/my-cuda:11.3.1-cudnn8-devel-ubuntu20.04`)简化环境配置[^1]。
评论 2
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

shiter

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值