Setting Up CUDA 10.x and TensorFlow 1.x on Ubuntu 18.04 (July 2019 Revised Edition)

See also: Changing the pip Source and Installing TensorFlow (CPU) on Mac and Ubuntu

Foreword

The tricky part is really just the CUDA installation; everything else is simple.


Development Environment Overview

  • CPU: Intel® Xeon® CPU E5-2696 v3
  • GPU: NVIDIA GTX 1080 Ti
  • OS: Ubuntu 18.04.2 LTS, 64-bit

Check for the NVIDIA graphics card with this command:

lspci | grep -i nvidia
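On my setup the output should look something like the line below (the exact device string varies by card):

01:00.0 VGA compatible controller: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] (rev a1)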

Once your environment is fully set up, you can use the official deviceQuery sample to inspect the hardware. My GPU information is shown in the figure below; the parameters in this table are very useful later on when designing parallel algorithms:

[Figure: deviceQuery output]
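For reference, a minimal sketch of building and running that sample, assuming the CUDA samples were installed to the default /usr/local/cuda/samples location:

cd /usr/local/cuda/samples/1_Utilities/deviceQuery
sudo make
./deviceQuery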


Installing the Graphics Driver

This step can actually be skipped, since the CUDA package ships with a driver, but I'll give the procedure anyway. Note that the step of disabling nouveau is still required.

It's best not to use the graphics driver offered in Ubuntu's "Additional Drivers" tool.
You may run into some strange problems, though the lucky ones never will.
This is the first pitfall, and it generally shows up in one of three ways:

  • After installing and rebooting, you can't get into the system; it hangs at the Ubuntu loading screen;
  • An endless login loop;
  • The install completes and you get into the system, but running nvidia-smi produces no response. Normally it should print a table like the one below:

[Figure: nvidia-smi output]
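If you do hit the hung boot or the login loop, one common recovery path (a sketch, not the only way) is to drop to a text console and purge the broken driver:

# At the login screen, press Ctrl+Alt+F3 (any of F1..F6 works) to reach a tty, log in, then:
sudo apt-get remove --purge 'nvidia*'
sudo reboot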


Downloading the Driver

My actual procedure:
First, go to NVIDIA's official site and download the graphics driver. For example, my card is a GTX 1080 Ti and my OS is 64-bit Linux, so I pick the matching version to download.

[Figure: NVIDIA driver download page]

Remove any previous drivers. Note that even if you haven't installed anything yet, this step is harmless.

sudo apt-get remove --purge 'nvidia*'

Install the libraries required by the driver:

sudo apt-get update
sudo apt-get install dkms build-essential linux-headers-generic

Libraries the driver installation may also need (32-bit compatibility):

sudo dpkg --add-architecture i386
sudo apt update
sudo apt install libc6:i386

Libraries required for installing CUDA:

sudo apt-get install freeglut3-dev libx11-dev libxmu-dev libxi-dev libgl1-mesa-glx libglu1-mesa libglu1-mesa-dev

Disabling nouveau

Open /etc/modprobe.d/blacklist.conf and add the settings that disable nouveau (the open-source NVIDIA driver) at the end, like so:
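blacklist nouveau
options nouveau modeset=0

Then rebuild the initramfs and reboot so the blacklist takes effect:

sudo update-initramfs -u
sudo reboot

After rebooting, verify that nouveau is no longer loaded; this command should print nothing:

lsmod | grep nouveau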

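With nouveau disabled, you can run the installer downloaded earlier. A minimal sketch, assuming the .run file is in ~/Downloads (the exact filename depends on the version you picked):

# Leave the graphical session so the installer can take over the GPU
sudo systemctl isolate multi-user.target
cd ~/Downloads
chmod +x NVIDIA-Linux-x86_64-*.run
sudo ./NVIDIA-Linux-x86_64-*.run
sudo reboot

If everything went well, nvidia-smi should now print the table shown earlier.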