Magenta: Virtualization Support

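This article explains how Magenta implements virtualization. It walks through how Intel VT is used to create a virtual environment, covering the setup of the hypervisor and guest contexts, guest memory management, and virtual interrupt handling.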


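Magenta implements something similar to the KVM + QEMU stack, but much simpler. At present it can only run another Magenta instance as a guest on top of Magenta running on physical hardware.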

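Magenta's virtualization currently supports only x86. Like KVM, it relies on Intel VT (Virtualization Technology) and switches between VMX root operation and VMX non-root operation. Like QEMU, it also has a user-space service, which currently handles only I/O operations and memory-access traps.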

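Magenta provides the guest command as the entry point for running a virtual machine. Its usage is:

usage: guest kernel.bin [ramdisk.bin]
kernel.bin:   the magenta.bin file
ramdisk.bin:  the ramdisk image file

Next, let's walk through what the guest command does.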


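Creating the hypervisor object

Magenta creates a separate context for the hypervisor and for the guest, which makes it easier to manage each side's state and to switch between the two.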

using HypervisorContext = VmxonContext;
using GuestContext = VmcsContext;

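The hypervisor object is simply a VmxonContext, which is a singleton.

  • VmxonContext allocates one VmxonPerCpu instance per CPU;
  • Each VmxonPerCpu is initialized as follows:
    • It allocates one page for its CPU, to be passed later as the operand of the vmxon instruction;
    • Each CPU executes vmxon to enable VMX operation.

At this point, the CPUs are in VMX root operation.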
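
The per-CPU page can be pictured with a small user-space model. This is only a sketch, not Magenta's code: `VmxonRegion`, `init_vmxon_region` and `vmxon_revision` are hypothetical names, and the revision identifier is passed in as a plain argument. A real hypervisor would read it from the IA32_VMX_BASIC MSR and then execute vmxon with the region's physical address, both of which require ring 0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kPageSize = 4096;

// Hypothetical model of the page each CPU hands to vmxon.
// The region must be 4 KiB aligned; bits 30:0 of its first 32-bit word
// hold the VMCS revision identifier and bit 31 must be clear.
struct alignas(kPageSize) VmxonRegion {
    std::uint8_t bytes[kPageSize];
};

// Zero the page and store the revision identifier in the first word.
inline void init_vmxon_region(VmxonRegion& region, std::uint32_t revision_id) {
    std::memset(region.bytes, 0, sizeof(region.bytes));
    const std::uint32_t word = revision_id & 0x7fffffffu;  // force bit 31 to 0
    std::memcpy(region.bytes, &word, sizeof(word));
}

// Read back the first word of the region.
inline std::uint32_t vmxon_revision(const VmxonRegion& region) {
    std::uint32_t word = 0;
    std::memcpy(&word, region.bytes, sizeof(word));
    return word;
}
```

The `alignas` on the type is what guarantees the alignment vmxon expects; in a kernel the same guarantee would come from allocating a whole physical page.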


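Creating guest physical memory

The guest's physical memory, 1GB in size, is created in the address space of the current process.

  • Create a vmo of size 1GB;
  • Map this vmo into the current process. During mapping, physical pages are allocated to back the entire 1GB range.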
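
Because the vmo is mapped into the current process, a guest physical address is just an offset from the base of that mapping. A minimal sketch of the translation follows; `GuestMemory` is a hypothetical helper, and a `std::vector` stands in for the mapped vmo so that the example runs anywhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the guest memory vmo mapped into this process.
class GuestMemory {
public:
    explicit GuestMemory(std::size_t size) : backing_(size, 0) {}

    std::size_t size() const { return backing_.size(); }

    // Translate a guest physical range to a host pointer.
    // Returns nullptr if the range does not fit inside guest memory.
    std::uint8_t* at(std::uint64_t gpa, std::size_t len) {
        if (gpa > backing_.size() || len > backing_.size() - gpa) {
            return nullptr;
        }
        return backing_.data() + gpa;
    }

    // Copy bytes into guest memory; returns false if the range is out of bounds.
    bool write(std::uint64_t gpa, const void* src, std::size_t len) {
        std::uint8_t* dst = at(gpa, len);
        if (dst == nullptr) {
            return false;
        }
        std::memcpy(dst, src, len);
        return true;
    }

private:
    std::vector<std::uint8_t> backing_;
};
```

With a real 1GB mapping the size would be `1ull << 30`; a smaller size keeps a demonstration cheap. The later steps that place the page tables, boot data and kernel image at fixed offsets all reduce to writes of this kind.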

Creating the guest object

The guest's context is VmcsContext, so the guest object refers to this context.

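  • Create a Fifo, which later serves as the channel between the kernel and user space;
  • Create the VmcsContext:
    • Allocate a VmcsPerCpu instance for each CPU;
    • Create a GuestPhysicalAddressSpace object on top of the guest physical memory created earlier:
      • Allocate one page as the guest's page directory, giving the guest a 256GB virtual address space;
      • Map the guest physical memory into the guest address space, using Extended Page Tables (EPT);
    • Allocate one page and map it into the guest address space at APIC_PHYS_BASE=0xfee00000. This page is needed later when setting APIC_ACCESS_ADDRESS in the vmcs;
    • Remove the mapping for the IO-APIC physical base address 0xfec00000 from the guest address space. When the guest later accesses the IO-APIC it traps into the kernel, which passes a message over the Fifo to user space, where the event is handled;
    • Initialize the VmcsPerCpu for each CPU;
    • Configure the vmcs. This part is very detailed; readers who want the details can consult the manual 《64-ia-32-architectures-software-developer-vol-3c-part-3-manual.pdf》. A few of the settings are:
      • Set EPT_POINTER to the guest page directory created above;
      • Set HOST_RIP to vmx_exit_entry, the entry point used when the guest leaves non-root mode and the CPU returns to root mode;
  • Create page tables at the start of guest physical memory, covering an address range of 1GB;
  • Create the ACPI table immediately after the page tables;
  • Create the bootdata parameters at offset 0x800000 in physical memory, for use by the guest OS at boot. The boot parameters include:
    • The ACPI table;
    • The E820 table, which describes the physical memory layout;
  • Read kernel.bin into physical memory at offset 0x100000, then parse its header to obtain the address of its entry function;
  • If a ramdisk was provided, load it right after the bootdata;
  • Configure the general purpose registers (gprs) of the VmcsPerCpu, that is, the guest's gprs. Point register rsi at 0x800000, where the bootdata lives, because Magenta assumes at boot that rsi holds the address of the bootdata;
  • Set the guest's IP register to the entry address of kernel.bin;
  • Set the guest cr3 to 0, because the page tables sit at the start of guest physical memory, and guest physical memory starts at 0 in the guest address space;
  • Create a one-page vmo and point the guest's Local APIC, i.e. VIRTUAL_APIC_ADDRESS, at the physical address of its page. Also map that page into the current process so that user space can later handle virtual interrupts.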
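
The page-table step in the list above can be sketched as follows. The source only says that the tables start at guest physical address 0 and cover 1GB; the use of 2MB pages (one PML4 page, one PDPT page and one PD page) and the helper names `set_entry`, `get_entry` and `build_identity_map` are assumptions made for illustration. The flag bits follow the standard x86-64 page-table entry format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::uint64_t kPageSize    = 4096;
constexpr std::uint64_t kLargePage   = 2ull << 20;  // 2 MiB (assumed page size)
constexpr std::uint64_t kPresent     = 1ull << 0;
constexpr std::uint64_t kWritable    = 1ull << 1;
constexpr std::uint64_t kPageSizeBit = 1ull << 7;   // PS: entry maps a large page
constexpr std::size_t   kEntries     = 512;

// Write one 64-bit entry into the table located at table_gpa.
inline void set_entry(std::vector<std::uint8_t>& mem, std::uint64_t table_gpa,
                      std::size_t index, std::uint64_t value) {
    std::memcpy(mem.data() + table_gpa + index * sizeof(value), &value, sizeof(value));
}

// Read one 64-bit entry back from the table located at table_gpa.
inline std::uint64_t get_entry(const std::vector<std::uint8_t>& mem,
                               std::uint64_t table_gpa, std::size_t index) {
    std::uint64_t value = 0;
    std::memcpy(&value, mem.data() + table_gpa + index * sizeof(value), sizeof(value));
    return value;
}

// Build an identity map of the first 1 GiB, with the PML4 at guest physical 0.
// Returns the guest physical address of the first byte after the tables.
inline std::uint64_t build_identity_map(std::vector<std::uint8_t>& mem) {
    const std::uint64_t pml4 = 0;
    const std::uint64_t pdpt = kPageSize;
    const std::uint64_t pd   = 2 * kPageSize;
    assert(mem.size() >= 3 * kPageSize);
    std::memset(mem.data(), 0, 3 * kPageSize);

    set_entry(mem, pml4, 0, pdpt | kPresent | kWritable);
    set_entry(mem, pdpt, 0, pd | kPresent | kWritable);
    // 512 entries x 2 MiB = 1 GiB, each mapping an address to itself.
    for (std::size_t i = 0; i < kEntries; ++i) {
        set_entry(mem, pd, i, i * kLargePage | kPresent | kWritable | kPageSizeBit);
    }
    return 3 * kPageSize;
}
```

Under these assumptions, a guest cr3 of 0 points straight at the PML4, and the returned address is where the next structure (the ACPI table in the steps above) could be placed.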

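Creating the service thread

This thread listens on the Fifo. When a message arrives from the kernel, the thread parses and handles it. Only three message types are currently supported: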

PORT_IN  
PORT_OUT
MEM_TRAP

MEM_TRAP currently handles only three kinds of trap:

Local APIC access trap
IO-APIC access trap
TPM access trap
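
One way the service thread might tell these traps apart is by the faulting guest physical address. The sketch below assumes exactly that. The Local APIC and IO-APIC bases are the ones given earlier in this article; the TPM base of 0xfed40000, the one-page window size, and the names `TrapKind` and `classify_mem_trap` are assumptions for illustration and are not taken from Magenta.

```cpp
#include <cassert>
#include <cstdint>

enum class TrapKind { LocalApic, IoApic, Tpm, Unknown };

constexpr std::uint64_t kLocalApicBase = 0xfee00000;  // APIC_PHYS_BASE from the article
constexpr std::uint64_t kIoApicBase    = 0xfec00000;  // IO-APIC base from the article
constexpr std::uint64_t kTpmBase       = 0xfed40000;  // assumed TPM MMIO base
constexpr std::uint64_t kWindowSize    = 0x1000;      // assumed: one page per device

// True if addr falls inside the one-page window starting at base.
inline bool in_window(std::uint64_t addr, std::uint64_t base) {
    return addr >= base && addr - base < kWindowSize;
}

// Classify a MEM_TRAP by the guest physical address that faulted.
inline TrapKind classify_mem_trap(std::uint64_t guest_paddr) {
    if (in_window(guest_paddr, kLocalApicBase)) return TrapKind::LocalApic;
    if (in_window(guest_paddr, kIoApicBase))    return TrapKind::IoApic;
    if (in_window(guest_paddr, kTpmBase))       return TrapKind::Tpm;
    return TrapKind::Unknown;
}
```

An address outside all three windows is reported as `Unknown`, which matches the statement that only these three kinds of trap are handled.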

vm enter

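Finally, execution reaches the Enter function of VmcsPerCpu:

  • Save the host state, including CR3, the FS base address, and so on;
  • Write the guest's CR3 and IP into the vmcs;
  • Call the function vmx_enter:
    • Save the host gprs and load the guest gprs (remember the rsi set earlier?); save the return address of vmx_enter as the host RIP so that execution can return normally later;
    • Execute vmlaunch to enter non-root mode.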

vmx exit handler

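A VM exit does not return directly to the instruction after vmlaunch. It lands in vmx_exit_entry instead:

  • Save the guest gprs and load the host gprs;
  • Push the return address (the host RIP) onto the stack so that ret can return to it;
  • Reload the TSS and the IDT. Both must be reloaded because the default limit values that VMX sets in their descriptors are unsuitable.

Once this is done, executing ret returns to the point just after vmx_enter. The exit event is then handled. The events currently handled are: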

EXTERNAL_INTERRUPT          = 1u,
INTERRUPT_WINDOW            = 7u,
CPUID                       = 10u,
HLT                         = 12u,
VMCALL                      = 18u,
IO_INSTRUCTION              = 30u,
RDMSR                       = 31u,
WRMSR                       = 32u,
ENTRY_FAILURE_GUEST_STATE   = 33u,
ENTRY_FAILURE_MSR_LOADING   = 34u,
APIC_ACCESS                 = 44u,
EPT_VIOLATION               = 48u,
XSETBV                      = 55u,
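
Dispatching on these values can be sketched as below. According to the Intel SDM, the exit-reason field keeps the basic reason in bits 15:0 and sets bit 31 when a VM entry failed. The enum mirrors a subset of the list above, while `DecodedExit`, `decode_exit_reason` and `exit_reason_name` are hypothetical helpers rather than Magenta functions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Subset of the exit reasons listed above.
enum class ExitReason : std::uint32_t {
    EXTERNAL_INTERRUPT        = 1u,
    HLT                       = 12u,
    IO_INSTRUCTION            = 30u,
    ENTRY_FAILURE_GUEST_STATE = 33u,
    APIC_ACCESS               = 44u,
    EPT_VIOLATION             = 48u,
};

struct DecodedExit {
    ExitReason basic;    // bits 15:0 of the exit-reason field
    bool entry_failure;  // bit 31: the exit was caused by a failed VM entry
};

// Split the raw 32-bit exit-reason field into its parts.
inline DecodedExit decode_exit_reason(std::uint32_t raw) {
    return DecodedExit{static_cast<ExitReason>(raw & 0xffffu), (raw >> 31) != 0};
}

// Map a basic exit reason to a printable name for logging.
inline const char* exit_reason_name(ExitReason reason) {
    switch (reason) {
        case ExitReason::EXTERNAL_INTERRUPT:        return "EXTERNAL_INTERRUPT";
        case ExitReason::HLT:                       return "HLT";
        case ExitReason::IO_INSTRUCTION:            return "IO_INSTRUCTION";
        case ExitReason::ENTRY_FAILURE_GUEST_STATE: return "ENTRY_FAILURE_GUEST_STATE";
        case ExitReason::APIC_ACCESS:               return "APIC_ACCESS";
        case ExitReason::EPT_VIOLATION:             return "EPT_VIOLATION";
        default:                                    return "UNHANDLED";
    }
}
```

A real handler would branch to a per-reason routine instead of returning a name; the string form is used here only to keep the sketch self-contained.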

VM exits caused by accesses to the Local APIC and to the IO-APIC use different mechanisms. The Local APIC relies on a built-in VMX feature, which raises APIC_ACCESS:

44: APIC access. Guest software attempted to access memory at a physical address on the APIC-access page and the "virtualize APIC accesses" VM-execution control was 1 (see Section 29.4).

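The IO-APIC, by contrast, works like a page fault: its address is not mapped in the guest address space, so an access raises EPT_VIOLATION.

Both kinds of event are packed into a packet and sent over the Fifo to the user-space service thread, and the handler waits for the service's reply. Once the service has handled the event, it sends the result back to the handler over the Fifo.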
