A Bug in OpenStack Queens

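This post records a 500 error encountered when accessing the dashboard web interface after installation and configuration, together with the detailed error log. The errors reported by Apache are mainly "Script timed out before returning headers" and "End of script output before headers", both for django.wsgi.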

After installing and configuring the dashboard, accessing the web interface returns a 500 error. The error log is as follows:

[Mon Mar 26 07:58:09.813161 2018] [suexec:notice] [pid 2156] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)

[Mon Mar 26 07:58:09.844072 2018] [auth_digest:notice] [pid 2156] AH01757: generating secret for digest authentication ...

[Mon Mar 26 07:58:09.845305 2018] [lbmethod_heartbeat:notice] [pid 2156] AH02282: No slotmem from mod_heartmonitor

[Mon Mar 26 07:58:09.872830 2018] [mpm_prefork:notice] [pid 2156] AH00163: Apache/2.4.6 (CentOS) mod_wsgi/3.4 Python/2.7.5 configured -- resuming normal operations

[Mon Mar 26 07:58:09.872886 2018] [core:notice] [pid 2156] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'

[Mon Mar 26 10:04:38.871007 2018] [mpm_prefork:notice] [pid 2156] AH00170: caught SIGWINCH, shutting down gracefully

[Mon Mar 26 16:42:21.155182 2018] [suexec:notice] [pid 1052] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)

[Mon Mar 26 16:42:21.272545 2018] [auth_digest:notice] [pid 1052] AH01757: generating secret for digest authentication ...

[Mon Mar 26 16:42:21.273444 2018] [lbmethod_heartbeat:notice] [pid 1052] AH02282: No slotmem from mod_heartmonitor

[Mon Mar 26 16:42:21.413161 2018] [mpm_prefork:notice] [pid 1052] AH00163: Apache/2.4.6 (CentOS) mod_wsgi/3.4 Python/2.7.5 configured -- resuming normal operations

[Mon Mar 26 16:42:21.413267 2018] [core:notice] [pid 1052] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'

[Mon Mar 26 19:55:53.623490 2018] [mpm_prefork:notice] [pid 1052] AH00170: caught SIGWINCH, shutting down gracefully

[Mon Mar 26 19:55:58.726570 2018] [suexec:notice] [pid 8110] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)

[Mon Mar 26 19:55:58.772799 2018] [auth_digest:notice] [pid 8110] AH01757: generating secret for digest authentication ...

[Mon Mar 26 19:55:58.773802 2018] [lbmethod_heartbeat:notice] [pid 8110] AH02282: No slotmem from mod_heartmonitor

[Mon Mar 26 19:55:59.601627 2018] [mpm_prefork:notice] [pid 8110] AH00163: Apache/2.4.6 (CentOS) mod_wsgi/3.4 Python/2.7.5 configured -- resuming normal operations

[Mon Mar 26 19:55:59.601751 2018] [core:notice] [pid 8110] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'

[Mon Mar 26 22:24:19.745584 2018] [mpm_prefork:notice] [pid 8110] AH00170: caught SIGWINCH, shutting down gracefully

[Tue Mar 27 03:29:03.813358 2018] [suexec:notice] [pid 1082] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)

[Tue Mar 27 03:29:03.907961 2018] [auth_digest:notice] [pid 1082] AH01757: generating secret for digest authentication ...

[Tue Mar 27 03:29:03.908987 2018] [lbmethod_heartbeat:notice] [pid 1082] AH02282: No slotmem from mod_heartmonitor

[Tue Mar 27 03:29:04.577999 2018] [mpm_prefork:notice] [pid 1082] AH00163: Apache/2.4.6 (CentOS) mod_wsgi/3.4 Python/2.7.5 configured -- resuming normal operations

[Tue Mar 27 03:29:04.578171 2018] [core:notice] [pid 1082] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'

[Tue Mar 27 09:24:33.184118 2018] [mpm_prefork:notice] [pid 1082] AH00170: caught SIGWINCH, shutting down gracefully

[Tue Mar 27 19:15:06.074442 2018] [suexec:notice] [pid 1065] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)

[Tue Mar 27 19:15:06.319451 2018] [auth_digest:notice] [pid 1065] AH01757: generating secret for digest authentication ...

[Tue Mar 27 19:15:06.324340 2018] [lbmethod_heartbeat:notice] [pid 1065] AH02282: No slotmem from mod_heartmonitor

[Tue Mar 27 19:15:07.163376 2018] [mpm_prefork:notice] [pid 1065] AH00163: Apache/2.4.6 (CentOS) mod_wsgi/3.4 Python/2.7.5 configured -- resuming normal operations

[Tue Mar 27 19:15:07.163544 2018] [core:notice] [pid 1065] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'

[Tue Mar 27 20:06:33.893276 2018] [mpm_prefork:notice] [pid 1065] AH00170: caught SIGWINCH, shutting down gracefully

[Tue Mar 27 20:08:20.503008 2018] [suexec:notice] [pid 25300] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)

[Tue Mar 27 20:08:20.548789 2018] [auth_digest:notice] [pid 25300] AH01757: generating secret for digest authentication ...

[Tue Mar 27 20:08:20.549667 2018] [lbmethod_heartbeat:notice] [pid 25300] AH02282: No slotmem from mod_heartmonitor

[Tue Mar 27 20:08:20.679821 2018] [mpm_prefork:notice] [pid 25300] AH00163: Apache/2.4.6 (CentOS) mod_wsgi/3.4 Python/2.7.5 configured -- resuming normal operations

[Tue Mar 27 20:08:20.679895 2018] [core:notice] [pid 25300] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'

[Tue Mar 27 20:09:32.159134 2018] [core:error] [pid 25319] [client 192.168.226.1:50607] Script timed out before returning headers: django.wsgi

[Tue Mar 27 20:10:38.790037 2018] [core:error] [pid 25321] [client 192.168.226.139:39644] Script timed out before returning headers: django.wsgi

[Tue Mar 27 20:12:12.289476 2018] [core:error] [pid 25673] [client 192.168.226.139:39648] Script timed out before returning headers: django.wsgi

[Tue Mar 27 20:13:37.148148 2018] [core:error] [pid 25317] [client 192.168.226.1:50606] End of script output before headers: django.wsgi

[Tue Mar 27 20:28:06.429334 2018] [core:error] [pid 25319] [client 192.168.226.1:51134] Script timed out before returning headers: django.wsgi

[Tue Mar 27 20:29:55.759120 2018] [core:error] [pid 25318] [client 192.168.226.139:39672] End of script output before headers: django.wsgi

[Tue Mar 27 20:33:14.950326 2018] [mpm_prefork:notice] [pid 25300] AH00170: caught SIGWINCH, shutting down gracefully

[Tue Mar 27 22:16:43.180800 2018] [suexec:notice] [pid 2631] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)

[Tue Mar 27 22:16:43.237206 2018] [auth_digest:notice] [pid 2631] AH01757: generating secret for digest authentication ...

[Tue Mar 27 22:16:43.238053 2018] [lbmethod_heartbeat:notice] [pid 2631] AH02282: No slotmem from mod_heartmonitor

[Tue Mar 27 22:16:43.363413 2018] [mpm_prefork:notice] [pid 2631] AH00163: Apache/2.4.6 (CentOS) mod_wsgi/3.4 Python/2.7.5 configured -- resuming normal operations

[Tue Mar 27 22:16:43.363465 2018] [core:notice] [pid 2631] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'

[Tue Mar 27 22:46:02.588669 2018] [core:error] [pid 2663] [client 192.168.226.1:49610] End of script output before headers: django.wsgi

[Tue Mar 27 22:57:18.213102 2018] [autoindex:error] [pid 13192] [client 192.168.226.1:49979] AH01276: Cannot serve directory /var/www/html/: No matching DirectoryIndex (index.html) found, and server-generated directory index forbidden by Options directive

[Tue Mar 27 23:00:46.673185 2018] [core:error] [pid 2665] [client 192.168.226.1:49923] End of script output before headers: django.wsgi

[Wed Mar 28 03:30:43.583558 2018] [mpm_prefork:notice] [pid 2631] AH00170: caught SIGWINCH, shutting down gracefully

[Wed Mar 28 08:06:29.409699 2018] [suexec:notice] [pid 2564] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)

[Wed Mar 28 08:06:29.453250 2018] [auth_digest:notice] [pid 2564] AH01757: generating secret for digest authentication ...

[Wed Mar 28 08:06:29.465607 2018] [lbmethod_heartbeat:notice] [pid 2564] AH02282: No slotmem from mod_heartmonitor

[Wed Mar 28 08:06:29.634720 2018] [mpm_prefork:notice] [pid 2564] AH00163: Apache/2.4.6 (CentOS) mod_wsgi/3.4 Python/2.7.5 configured -- resuming normal operations

[Wed Mar 28 08:06:29.634771 2018] [core:notice] [pid 2564] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'

[Wed Mar 28 08:16:04.107223 2018] [core:error] [pid 2600] [client 192.168.226.1:51750] End of script output before headers: django.wsgi
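
Most of the log above is routine start and stop notices; the entries that matter are the few at error level. As a rough aid for picking them out, here is a minimal Python sketch that counts error-level messages in lines of this shape. The function name summarize_errors and the regular expression are illustrative assumptions, not part of Apache or OpenStack, and the sample lines are copied from the log above.

```python
import re
from collections import Counter

# Assumed line shape (Apache 2.4 default error log, as seen in this post):
# [time] [module:level] [pid N] [client ip:port] message
# The [client ...] field is optional; notices do not carry it.
LINE_RE = re.compile(
    r"^\[(?P<time>[^\]]+)\]\s+"
    r"\[(?P<module>[^:\]]+):(?P<level>[^\]]+)\]\s+"
    r"\[pid (?P<pid>\d+)\]\s+"
    r"(?:\[client (?P<client>[^\]]+)\]\s*)?"
    r"(?P<message>.*)$"
)


def summarize_errors(lines):
    """Return a Counter of error-level messages, skipping notices and unparsable lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group("level") == "error":
            counts[match.group("message")] += 1
    return counts


if __name__ == "__main__":
    # A few lines taken from the log in this post.
    sample = [
        "[Tue Mar 27 20:08:20.679895 2018] [core:notice] [pid 25300] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'",
        "[Tue Mar 27 20:09:32.159134 2018] [core:error] [pid 25319] [client 192.168.226.1:50607] Script timed out before returning headers: django.wsgi",
        "[Tue Mar 27 20:13:37.148148 2018] [core:error] [pid 25317] [client 192.168.226.1:50606] End of script output before headers: django.wsgi",
        "[Tue Mar 27 20:28:06.429334 2018] [core:error] [pid 25319] [client 192.168.226.1:51134] Script timed out before returning headers: django.wsgi",
    ]
    for message, count in summarize_errors(sample).most_common():
        print(count, message)
```

To run it against a real log, pass the open file object to summarize_errors in place of the sample list; the log path depends on the installation.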


This is a bug. It was already present in the previous release, Pike, and it is still here in this release.


