[root@gpu27 nvidia]# nvidia-smi  

Unable to determine the device handle for GPU 0000:02:00.0: GPU is lost.  Reboot the system to recover this GPU
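Before reinstalling anything, it can help to confirm which device is being reported as lost. The following is a minimal sketch, not part of the original procedure: the function name `lost_gpu_bus_ids` is hypothetical, and it only parses the error text shown above, assuming the message format stays the same across driver versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the PCI bus ID of every GPU that nvidia-smi
# reports as lost. It reads nvidia-smi output on stdin, so it can be
# exercised without a GPU present.
lost_gpu_bus_ids() {
    # Keep only the "GPU is lost" lines, then extract IDs such as
    # 0000:02:00.0 (domain:bus:device.function) and de-duplicate them.
    grep -F 'GPU is lost' \
        | grep -oE '[0-9A-Fa-f]{4,8}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}\.[0-7]' \
        | sort -u
}

# Possible use on the affected host (assumes the NVIDIA driver is installed):
#   nvidia-smi 2>&1 | lost_gpu_bus_ids
#   lspci | grep -i nvidia      # is the card still visible on the PCI bus?
#   dmesg | grep -i xid         # any NVIDIA Xid errors in the kernel log?
```

If the card still appears in `lspci` but `nvidia-smi` reports it as lost, that would be consistent with a driver-side problem. The error message itself states that a reboot is needed to recover the GPU, so these commands are only diagnostic.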


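Troubleshooting: the NVIDIA service was most likely damaged when I reinstalled other services, so I decided to reinstall the NVIDIA service and prepared the relevant installation scripts.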
