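I. Problem Description
After starting Xorg and then running glmark2, nothing is rendered on screen, and the kernel crashes shortly afterwards.
II. Problem Analysis
2.1 Log Analysis
The crash log under /var/crash/ reads as follows: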
[ 203.831153] BUG: kernel NULL pointer dereference, address: 0000000000000120
[ 203.831154] #PF: supervisor read access in kernel mode
[ 203.831154] #PF: error_code(0x0000) - not-present page
[ 203.831155] PGD 0 P4D 0
[ 203.831157] pvr 0000:05:00.0: tgid=2607, tgid_connection=2607, bridge_id=130, func_id=2
[ 203.831158] Oops: 0000 [#1] SMP NOPTI
[ 203.831159] CPU: 0 PID: 2511 Comm: Xorg Kdump: loaded Tainted: G W OE 5.4.0-42-generic #46~18.04.1-Ubuntu
[ 203.831159] Hardware name: ASUS System Product Name/PRIME Z590-P, BIOS 1017 07/12/2021
[ 203.831172] RIP: 0010:pvr_query_pmr_info+0x31/0x210 [xdxgpu]
[ 203.831176] [drm:drm_ioctl [drm]] pid=2607, dev=0xe280, auth=1, PVR_SRVKM_CMD
[ 203.831177] pvr 0000:05:00.0: tgid=2607, tgid_connection=2607, bridge_id=19, func_id=0
[ 203.831178] Code: 89 e5 41 56 41 55 41 54 53 49 89 fc 49 89 d5 48 89 f3 48 83 ec 18 65 48 8b 04 25 28 00 00 00 48 89 45 d8 31 c0 e8 2f 63 f8 ff <48> 8b 80 20 01 00 00 48 8d 75 d0 4c 89 e7 4c 8b 30 e8 09 66 f8 ff
[ 203.831179] RSP: 0018:ffffb3ac80b5bbe0 EFLAGS: 00010286
[ 203.831180] RAX: 0000000000000000 RBX: ffffb3ac80b5bc48 RCX: 0000000000000006
[ 203.831180] RDX: ffffb3ac80b5bc34 RSI: ffffb3ac80b5bc48 RDI: ffff994bf505dac0
[ 203.831181] RBP: ffffb3ac80b5bc18 R08: 0000000000001aa1 R09: 0000000000000004
[ 203.831181] R10: 0000000000000005 R11: 0000000000000001 R12: ffff994beeeefdb0
[ 203.831181] R13: ffffb3ac80b5bc34 R14: ffffb3ac80b5bd68 R15: ffff994bf31fc000
[ 203.831182] FS: 00007fe0ded188c0(0000) GS:ffff994bff200000(0000) knlGS:0000000000000000
[ 203.831183] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 203.831183] CR2: 0000000000000120 CR3: 0000000830cf0005 CR4: 0000000000760ef0
[ 203.831184] PKRU: 55555554
[ 203.831184] Call Trace:
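(call trace omitted)
.......
The log shows that the Xorg process (PID 2511) faulted on a kernel-mode read of address 0000000000000120. An address this small is the signature of a NULL pointer dereference: the code read a field at offset 0x120 from a base pointer that was NULL, which matches RAX = 0 and CR2 = 0x120 in the register dump.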
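To make the "NULL base plus member offset" reading of the fault address concrete, here is a minimal sketch. The struct is a hypothetical stand-in, not the driver's real type: only the offset 0x120 is taken from the log, and the padding and names are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: only the 0x120 offset comes from the crash log;
 * the padding array and the member name are invented for illustration. */
struct demo_node {
    unsigned char other_fields[0x120];
    void *member_at_0x120;
};

/* Address the CPU touches when reading node->member_at_0x120,
 * given the numeric value of the node pointer. */
uintptr_t member_address(uintptr_t node_base)
{
    return node_base + offsetof(struct demo_node, member_at_0x120);
}
```

With a base of 0, member_address returns 0x120, the value reported in CR2. The function only does arithmetic and never dereferences the pointer, so it is safe to call with any base.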
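2.2 Coredump Analysis
Loading the debug vmlinux and the coredump into the crash tool gives the call stack at the time of the crash, shown below. The instruction at pvr_query_pmr_info+49 (the same location the oops prints as pvr_query_pmr_info+0x31) raised the page fault that ultimately brought the kernel down: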
crash> bt
PID: 2511 TASK: ffff994bf505dac0 CPU: 0 COMMAND: "Xorg"
#0 [ffffb3ac80b5b858] machine_kexec at ffffffff9dc6f773
#1 [ffffb3ac80b5b8b8] __crash_kexec at ffffffff9dd573a2
#2 [ffffb3ac80b5b988] crash_kexec at ffffffff9dd58241
#3 [ffffb3ac80b5b9a8] oops_end at ffffffff9dc3557d
#4 [ffffb3ac80b5b9d0] no_context at ffffffff9dc7f619
#5 [ffffb3ac80b5ba40] __bad_area_nosemaphore at ffffffff9dc7fa10
#6 [ffffb3ac80b5ba88] bad_area_nosemaphore at ffffffff9dc7fbc6
#7 [ffffb3ac80b5ba98] __do_page_fault at ffffffff9dc8058d
#8 [ffffb3ac80b5bb00] do_page_fault at ffffffff9dc8087c
#9 [ffffb3ac80b5bb30] page_fault at ffffffff9e801284
[exception RIP: pvr_query_pmr_info+49]
RIP: ffffffffc09ba6f1 RSP: ffffb3ac80b5bbe0 RFLAGS: 00010286
RAX: 0000000000000000 RBX: ffffb3ac80b5bc48 RCX: 0000000000000006
RDX: ffffb3ac80b5bc34 RSI: ffffb3ac80b5bc48 RDI: ffff994bf505dac0
RBP: ffffb3ac80b5bc18 R8: 0000000000001aa1 R9: 0000000000000004
R10: 0000000000000005 R11: 0000000000000001 R12: ffff994beeeefdb0
R13: ffffb3ac80b5bc34 R14: ffffb3ac80b5bd68 R15: ffff994bf31fc000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#10 [ffffb3ac80b5bc20] xdx_gem_prime_import at ffffffffc09d0280 [xdxgpu]
#11 [ffffb3ac80b5bc78] drm_gem_prime_fd_to_handle at ffffffffc075f34c [drm]
#12 [ffffb3ac80b5bcb8] drm_prime_fd_to_handle_ioctl at ffffffffc075fa57 [drm]
#13 [ffffb3ac80b5bcc8] drm_ioctl_kernel at ffffffffc0751870 [drm]
#14 [ffffb3ac80b5bd18] drm_ioctl at ffffffffc0751c49 [drm]
#15 [ffffb3ac80b5be20] xdx_drm_ioctl at ffffffffc09ced1f [xdxgpu]
#16 [ffffb3ac80b5be58] do_vfs_ioctl at ffffffff9def2669
#17 [ffffb3ac80b5bee0] ksys_ioctl at ffffffff9def2c75
#18 [ffffb3ac80b5bf20] __x64_sys_ioctl at ffffffff9def2c9a
#19 [ffffb3ac80b5bf30] do_syscall_64 at ffffffff9dc04417
#20 [ffffb3ac80b5bf50] entry_SYSCALL_64_after_hwframe at ffffffff9e80008c
RIP: 00007fe0dc8f6217 RSP: 00007ffd311d65b8 RFLAGS: 00000206
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fe0dc8f6217
RDX: 00007ffd311d661c RSI: 00000000c00c642e RDI: 000000000000000d
RBP: 00007ffd311d65f0 R8: 00005558633ab8e8 R9: 0000000000000c80
R10: 0000000000000c80 R11: 0000000000000206 R12: 0000555862646500
R13: 00007ffd311d7050 R14: 0000000000000000 R15: 0000000000000000
ORIG_RAX: 0000000000000010 CS: 0033 SS: 002b
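The oops log and crash name the same faulting location in two notations: the oops prints pvr_query_pmr_info+0x31/0x210 (hexadecimal offset and function size), while crash prints pvr_query_pmr_info+49 (decimal offset) together with the absolute RIP. A small sketch of the arithmetic, using the function start address 0xffffffffc09ba6c0 that crash reports in its disassembly of pvr_query_pmr_info; the helper name is made up for this example:

```c
#include <stdint.h>

/* Absolute address of "symbol+offset". The offset may come from the oops
 * (hex, e.g. 0x31) or from crash (decimal, e.g. 49); it is the same number. */
uint64_t symbol_plus_offset(uint64_t symbol_base, uint64_t offset)
{
    return symbol_base + offset;
}
```

symbol_plus_offset(0xffffffffc09ba6c0, 49) gives 0xffffffffc09ba6f1, which is the exception RIP shown in the backtrace.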
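Disassembling pvr_query_pmr_info gives the listing below. The instruction at pvr_query_pmr_info+49 loads the 8 bytes at address rax+0x120 into rax. Since rax held 0000000000000000 at that point, the load faulted and the kernel crashed. The annotation on r12 is explained in the analysis further down: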
crash> dis pvr_query_pmr_info
0xffffffffc09ba6c0 <pvr_query_pmr_info>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffffc09ba6c5 <pvr_query_pmr_info+5>: push %rbp
0xffffffffc09ba6c6 <pvr_query_pmr_info+6>: mov %rsp,%rbp
0xffffffffc09ba6c9 <pvr_query_pmr_info+9>: push %r14
0xffffffffc09ba6cb <pvr_query_pmr_info+11>: push %r13
0xffffffffc09ba6cd <pvr_query_pmr_info+13>: push %r12
0xffffffffc09ba6cf <pvr_query_pmr_info+15>: push %rbx
0xffffffffc09ba6d0 <pvr_query_pmr_info+16>: mov %rdi,%r12
0xffffffffc09ba6d3 <pvr_query_pmr_info+19>: mov %rdx,%r13
0xffffffffc09ba6d6 <pvr_query_pmr_info+22>: mov %rsi,%rbx
0xffffffffc09ba6d9 <pvr_query_pmr_info+25>: sub $0x18,%rsp
0xffffffffc09ba6dd <pvr_query_pmr_info+29>: mov %gs:0x28,%rax
0xffffffffc09ba6e6 <pvr_query_pmr_info+38>: mov %rax,-0x28(%rbp)
0xffffffffc09ba6ea <pvr_query_pmr_info+42>: xor %eax,%eax
0xffffffffc09ba6ec <pvr_query_pmr_info+44>: callq 0xffffffffc0940a20 <PMR_DeviceNode>
0xffffffffc09ba6f1 <pvr_query_pmr_info+49>: mov 0x120(%rax),%rax //faulting instruction; RAX is 0000000000000000 at the time of the crash
0xffffffffc09ba6f8 <pvr_query_pmr_info+56>: lea -0x30(%rbp),%rsi
0xffffffffc09ba6fc <pvr_query_pmr_info+60>: mov %r12,%rdi //r12 holds psPMR (see the PMR_PhysicalSize disassembly analysis); r12 is not modified between pvr_query_pmr_info+44 and this instruction
0xffffffffc09ba6ff <pvr_query_pmr_info+63>: mov (%rax),%r14
0xffffffffc09ba702 <pvr_query_pmr_info+66>: callq 0xffffffffc0940d10 <PMR_PhysicalSize>
0xffffffffc09ba707 <pvr_query_pmr_info+71>: mov %r12,%rdi
0xffffffffc09ba70a <pvr_query_pmr_info+74>: callq 0xffffffffc0940b20 <PMR_Flags>
0xffffffffc09ba70f <pvr_query_pmr_info+79>: mov %rax,%rcx
0xffffffffc09ba712 <pvr_query_pmr_info+82>: shr $0x3b,%rcx
0xffffffffc09ba716 <pvr_query_pmr_info+86>: cmp $0x2,%ecx
0xffffffffc09ba719 <pvr_query_pmr_info+89>: je 0xffffffffc09ba875 <pvr_query_pmr_info+437>
0xffffffffc09ba71f <pvr_query_pmr_info+95>: jae 0xffffffffc09ba824 <pvr_query_pmr_info+356>
0xffffffffc09ba725 <pvr_query_pmr_info+101>: movl $0x0,(%rbx)
0xffffffffc09ba72b <pvr_query_pmr_info+107>: mov %rax,%rcx
0xffffffffc09ba72e <pvr_query_pmr_info+110>: and $0x3800,%ecx
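The faulting instruction can also be cross-checked against the Code: line of the oops, where the byte at RIP is marked with angle brackets: <48> 8b 80 20 01 00 00. Here 48 is the REX.W prefix, 8b is the MOV opcode (load a 64-bit register from memory), 80 is the ModRM byte (base register rax plus a 32-bit displacement, destination rax), and the last four bytes are the little-endian displacement. The following is a sketch of that decoding for this one instruction shape only; it is not a general x86 decoder and the function name is invented:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Extract the 32-bit displacement from a 7-byte instruction of the form
 * REX.W (0x48) + opcode 0x8b + ModRM(mod=10b) + disp32.
 * Returns 0 on success, -1 if the bytes do not have that shape. */
int decode_mov_disp32(const uint8_t *insn, size_t len, uint32_t *disp)
{
    assert(insn != NULL && disp != NULL);

    if (len < 7 || insn[0] != 0x48 || insn[1] != 0x8b)
        return -1;
    if ((insn[2] >> 6) != 2)   /* ModRM.mod must be 10b: base + disp32 */
        return -1;
    if ((insn[2] & 7) == 4)    /* rm = 100b means a SIB byte follows; not handled */
        return -1;

    /* The displacement is stored little-endian. */
    *disp = (uint32_t)insn[3]
          | ((uint32_t)insn[4] << 8)
          | ((uint32_t)insn[5] << 16)
          | ((uint32_t)insn[6] << 24);
    return 0;
}
```

Feeding it the seven bytes from the log yields a displacement of 0x120, in agreement with the mov 0x120(%rax),%rax that crash disassembles at pvr_query_pmr_info+49.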
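The source of pvr_query_pmr_info is shown below. psDevConfig sits at offset 0x120 in PVRSRV_DEVICE_NODE (the struct that the pointer type PPVRSRV_DEVICE_NODE refers to), and the instruction immediately before pvr_query_pmr_info+49 is the call to PMR_DeviceNode, so the faulting statement is the read of psDevNode->psDevConfig. It also follows that psDevNode is the value in rax, the return value of PMR_DeviceNode, which is 0. The key question is therefore why psDevNode is 0. psDevNode is obtained from PMR_DeviceNode(psPMR), which simply returns psPMR->psPhysHeap->psDevNode.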
int pvr_query_pmr_info(void *psPMRPtr, struct xdx_bo_property *pro,
                       uint32_t *psize)
{
    PMR *psPMR = psPMRPtr;
    PMR_FLAGS_T uiPMRFlags;
    IMG_DEVMEM_SIZE_T uiPhysicalSize;
    PVRSRV_PHYS_HEAP ePhysHeap;
    PPVRSRV_DEVICE_NODE psDevNode = PMR_DeviceNode(psPMR);
    PVRSRV_DEVICE_CONFIG *psDevConfig = psDevNode->psDevConfig; //psDevConfig is at offset 0x120 in PVRSRV_DEVICE_NODE, so this is the faulting statement
    struct device *dev = psDevConfig->pvOSDevice;
    uint64_t uiGPUCacheMode, uiCPUCacheMode;
    PMR_PhysicalSize(psPMR, &uiPhysicalSize);
    uiPMRFlags = PMR_Flags(psPMR);
    ......
}
PVRSRV_DEVICE_NODE *
PMR_DeviceNode(const PMR *psPMR)
{
    PVR_ASSERT(psPMR != NULL);
    return PhysHeapDeviceNode(psPMR->psPhysHeap);
}
PPVRSRV_DEVICE_NODE PhysHeapDeviceNode(PHYS_HEAP *psPhysHeap)
{
    PVR_ASSERT(psPhysHeap != NULL);
    return psPhysHeap->psDevNode;
}
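Note that the two PVR_ASSERT checks above cannot catch this case: psPMR and psPhysHeap are both non-NULL, and the value that turns out to be NULL (psDevNode) is never checked before it is dereferenced. The following sketch shows what checking every link of the chain looks like. The three structs are heavily reduced, hypothetical stand-ins for the driver types, and such checks would only have turned this particular oops into an error return; they would not have fixed the bad PMR pointer behind it.

```c
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for the driver types. */
struct demo_dev_node  { void *psDevConfig; };
struct demo_phys_heap { struct demo_dev_node *psDevNode; };
struct demo_pmr       { struct demo_phys_heap *psPhysHeap; };

/* Walk pmr->psPhysHeap->psDevNode->psDevConfig, returning NULL instead of
 * faulting when any link in the chain is NULL. */
void *demo_dev_config(const struct demo_pmr *pmr)
{
    if (pmr == NULL || pmr->psPhysHeap == NULL ||
        pmr->psPhysHeap->psDevNode == NULL)
        return NULL;
    return pmr->psPhysHeap->psDevNode->psDevConfig;
}
```

A caller would treat a NULL result as an error and bail out rather than continue to psDevConfig->pvOSDevice.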
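From the analysis above, we need to recover the value of psPMR, follow it to psPhysHeap, and inspect the contents of the structures they point to.
Because psPMR is also passed to PMR_PhysicalSize, we look at that function's disassembly; its assembly and C code are listed below. iLockCount is at offset 0xc in PMR, so the instruction at PMR_PhysicalSize+23 most likely corresponds to OSAtomicRead(&psPMR->iLockCount). PMR_PhysicalSize+26 tests eax, and the jle that follows jumps to PMR_PhysicalSize+40 if eax is less than or equal to 0. PMR_PhysicalSize+41 then stores 0 to the memory r12 points to, which matches the C statement *puiPhysicalSize = 0.
So rbx at PMR_PhysicalSize+23 holds the psPMR argument. PMR_PhysicalSize+18 copies rdi into rbx, and rdi is not written anywhere in PMR_PhysicalSize, so rdi carries psPMR on entry (as expected for the first argument under the x86-64 System V calling convention). Back in pvr_query_pmr_info, PMR_PhysicalSize is called at pvr_query_pmr_info+66, and pvr_query_pmr_info+60 loads rdi from r12. Looking backwards from there to the faulting instruction at pvr_query_pmr_info+49, r12 is not modified, and because r12 is a callee-saved register the call to PMR_DeviceNode at +44 preserves it as well (r12 was in fact loaded from rdi, the first argument, at pvr_query_pmr_info+16). The value of r12 at the time of the crash is therefore psPMR.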
crash> dis PMR_PhysicalSize
0xffffffffc0940d10 <PMR_PhysicalSize>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffffc0940d15 <PMR_PhysicalSize+5>: push %rbp
0xffffffffc0940d16 <PMR_PhysicalSize+6>: test %rdi,%rdi
0xffffffffc0940d19 <PMR_PhysicalSize+9>: mov %rsp,%rbp
0xffffffffc0940d1c <PMR_PhysicalSize+12>: push %r12
0xffffffffc0940d1e <PMR_PhysicalSize+14>: push %rbx
0xffffffffc0940d1f <PMR_PhysicalSize+15>: mov %rsi,%r12
0xffffffffc0940d22 <PMR_PhysicalSize+18>: mov %rdi,%rbx //per the analysis of the instructions below, rbx holds the psPMR argument, i.e. rdi is psPMR on entry
0xffffffffc0940d25 <PMR_PhysicalSize+21>: je 0xffffffffc0940d7c <PMR_PhysicalSize+108>
0xffffffffc0940d27 <PMR_PhysicalSize+23>: mov 0xc(%rbx),%eax //iLockCount is at offset 12 in PMR
0xffffffffc0940d2a <PMR_PhysicalSize+26>: test %eax,%eax //test whether iLockCount is greater than 0
0xffffffffc0940d2c <PMR_PhysicalSize+28>: jle 0xffffffffc0940d38 <PMR_PhysicalSize+40> //jump to PMR_PhysicalSize+40 if iLockCount is less than or equal to 0
0xffffffffc0940d2e <PMR_PhysicalSize+30>: mov 0x9c(%rbx),%edx
0xffffffffc0940d34 <PMR_PhysicalSize+36>: test %edx,%edx
0xffffffffc0940d36 <PMR_PhysicalSize+38>: je 0xffffffffc0940d47 <PMR_PhysicalSize+55>
0xffffffffc0940d38 <PMR_PhysicalSize+40>: pop %rbx
0xffffffffc0940d39 <PMR_PhysicalSize+41>: movq $0x0,(%r12) //store 0 to the address held in the out parameter puiPhysicalSize
0xffffffffc0940d41 <PMR_PhysicalSize+49>: xor %eax,%eax
0xffffffffc0940d43 <PMR_PhysicalSize+51>: pop %r12
0xffffffffc0940d45 <PMR_PhysicalSize+53>: pop %rbp
0xffffffffc0940d46 <PMR_PhysicalSize+54>: retq
0xffffffffc0940d47 <PMR_PhysicalSize+55>: mov 0x98(%rbx),%eax
0xffffffffc0940d4d <PMR_PhysicalSize+61>: test %eax,%eax
0xffffffffc0940d4f <PMR_PhysicalSize+63>: je 0xffffffffc0940d6a <PMR_PhysicalSize+90>
0xffffffffc0940d51 <PMR_PhysicalSize+65>: mov 0x90(%rbx),%rdx
0xffffffffc0940d58 <PMR_PhysicalSize+72>: pop %rbx
0xffffffffc0940d59 <PMR_PhysicalSize+73>: mov 0x8(%rdx),%eax
0xffffffffc0940d5c <PMR_PhysicalSize+76>: imul (%rdx),%rax
0xffffffffc0940d60 <PMR_PhysicalSize+80>: mov %rax,(%r12)
0xffffffffc0940d64 <PMR_PhysicalSize+84>: xor %eax,%eax
0xffffffffc0940d66 <PMR_PhysicalSize+86>: pop %r12
0xffffffffc0940d68 <PMR_PhysicalSize+88>: pop %rbp
0xffffffffc0940d69 <PMR_PhysicalSize+89>: retq
......
PVRSRV_ERROR
PMR_PhysicalSize(const PMR *psPMR,
                 IMG_DEVMEM_SIZE_T *puiPhysicalSize)
{
    PVR_ASSERT(psPMR != NULL);
    /* iLockCount will be > 0 for any backed PMR (backed on demand or not) */
    if ((OSAtomicRead(&psPMR->iLockCount) > 0) && !psPMR->bIsUnpinned)
    {
        if (psPMR->bSparseAlloc)
        {
            *puiPhysicalSize = psPMR->psMappingTable->uiChunkSize * psPMR->psMappingTable->ui32NumPhysChunks;
        }
        else
        {
            *puiPhysicalSize = psPMR->uiLogicalSize;
        }
    }
    else
    {
        *puiPhysicalSize = 0;
    }
    return PVRSRV_OK;
}
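The matching between the disassembly and the C source can be summarized as a partial struct layout. The sketch below is a reconstruction from the offsets visible in the listing (0xc, 0x90, 0x98, 0x9c, and 0x0/0x8 inside the mapping table) plus the offsets 0 and 8 that this analysis attributes to psPhysHeap and iRefCount. It is not the driver's real definition: field types, the padding array, and the position of uiLogicalSize are assumptions, and it presumes an LP64 target such as x86-64 Linux.

```c
#include <stddef.h>
#include <stdint.h>

/* Inferred from "mov 0x8(%rdx),%eax; imul (%rdx),%rax". */
struct demo_mapping_table {
    uint64_t uiChunkSize;        /* 0x0: 64-bit operand of imul */
    uint32_t ui32NumPhysChunks;  /* 0x8: 32-bit load           */
};

/* Partial, hypothetical reconstruction of PMR (LP64 assumed). */
struct demo_pmr_layout {
    void    *psPhysHeap;                        /* 0x00 */
    int32_t  iRefCount;                         /* 0x08 */
    int32_t  iLockCount;                        /* 0x0c, read at +23 */
    unsigned char unknown[0x90 - 0x10];         /* fields not seen in the listing */
    struct demo_mapping_table *psMappingTable;  /* 0x90, read at +65 */
    uint32_t bSparseAlloc;                      /* 0x98, read at +55 */
    uint32_t bIsUnpinned;                       /* 0x9c, read at +30 */
    uint64_t uiLogicalSize;                     /* real offset unknown; placed here arbitrarily */
};

_Static_assert(offsetof(struct demo_pmr_layout, iLockCount) == 0x0c, "iLockCount");
_Static_assert(offsetof(struct demo_pmr_layout, psMappingTable) == 0x90, "psMappingTable");
_Static_assert(offsetof(struct demo_pmr_layout, bSparseAlloc) == 0x98, "bSparseAlloc");
_Static_assert(offsetof(struct demo_pmr_layout, bIsUnpinned) == 0x9c, "bIsUnpinned");

/* Same decision structure as PMR_PhysicalSize, on the reconstructed layout. */
uint64_t demo_physical_size(const struct demo_pmr_layout *pmr)
{
    if (pmr->iLockCount > 0 && !pmr->bIsUnpinned) {
        if (pmr->bSparseAlloc)
            return pmr->psMappingTable->uiChunkSize *
                   pmr->psMappingTable->ui32NumPhysChunks;
        return pmr->uiLogicalSize;
    }
    return 0;
}
```

The static assertions only confirm that the sketch reproduces the offsets read off the disassembly; they say nothing about the fields hidden in the padding.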
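Next we inspect the PMR structure that psPMR points to. Its psPhysHeap member holds a valid-looking kernel virtual address, and reading the psDevNode member of the PHYS_HEAP it supposedly points to gives 0. That matches RAX = 0 and confirms that r12 held psPMR when the kernel crashed. A physical heap (struct PHYS_HEAP) is created when the GPU driver probes the device and normally does not change afterwards; the member that triggered the fault, psDevNode, is set during initialization and is not modified later. Since psPhysHeap is a member of PMR, the PMR itself is the more likely culprit: the guess is that its psPhysHeap member holds a wrong kernel virtual address that does not point to a real physical heap. Indeed, the value read, ffff994bf505dac0, is the address that the bt output reports as the TASK (task_struct) of the Xorg process, not a physical heap.
crash> rd ffff994beeeefdb0 //member psPhysHeap is at offset 0 in struct PMR
ffff994beeeefdb0: ffff994bf505dac0 ....K...
crash> rd ffff994bf505dac8 //ffff994bf505dac0+8 is the address of psPhysHeap's member psDevNode; it reads 0, matching RAX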
ffff994bf505dac8: 0000000000000000 ........
crash>
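The iRefCount member at offset 8 in struct PMR reads as follows. It is the PMR's reference count: a value of 0 means the PMR should already have been freed, and a freed PMR must not be accessed any more, so the PMR data is indeed bad.
crash>
crash> rd ffff994beeeefdb8 //member iRefCount is at offset 8 in PMR; 0 means the PMR should already have been freed and must no longer be accessed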
ffff994beeeefdb8: 0000000000000000 ........
crash>
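One detail of this dump is easy to miss: rd prints an 8-byte word here, so with iRefCount at offset 8 and iLockCount at offset 0xc (the offsets used in this analysis), a single read at PMR+8 covers both counters. On little-endian x86-64 the low 32 bits are iRefCount and the high 32 bits are iLockCount. A small sketch of that split, assuming both fields are 32-bit integers; the helper names are invented:

```c
#include <stdint.h>

/* Low half of the 8-byte word at PMR+8: the field at offset 0x8. */
int32_t word_low_half(uint64_t word)
{
    return (int32_t)(uint32_t)(word & 0xffffffffu);
}

/* High half of the same word: the field at offset 0xc. */
int32_t word_high_half(uint64_t word)
{
    return (int32_t)(uint32_t)(word >> 32);
}
```

For the value 0000000000000000 shown in the dump, both halves are 0, so under these assumptions the lock count is zero as well as the reference count.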
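2.3 Source Code Analysis
The memory contents above suggest that the PMR was overwritten, or used after being freed, or broken in some other way. Whatever the cause, the PMR is bad, so the next step is to trace where it comes from.
Following the call stack at the time of the crash leads to the code below. Before calling pvr_query_pmr_info, xdx_gem_prime_import calls pvr_import_dmabuf to obtain psPMR, passing &psPMR as an out parameter, and pvr_import_dmabuf in turn calls PhysmemImportDmaBufToPMR. That function declares psPMR (initialized to NULL) at function scope, then declares a second psPMR inside the first if block and assigns priv->psPMR to it, so the inner declaration shadows the outer one. The second if statement tests the outer psPMR, which is still NULL, so the condition is false and the out parameter ppsPMRPtr is never written. pvr_import_dmabuf evidently still returns 0 (the crash happens after the ret check), and xdx_gem_prime_import passes its uninitialized local psPMR on to pvr_query_pmr_info.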
struct drm_gem_object *xdx_gem_prime_import(struct drm_device *ddev,
                                            struct dma_buf *dma_buf)
{
    struct xdx_device *xdev = drm_to_xdev(ddev);
    struct xdx_bo *bo;
    struct drm_gem_object *gobj;
    int ret;
    void *psPMR; /* not initialized; only valid if pvr_import_dmabuf writes it */
    struct xdx_bo_property pro;
    uint32_t size;
    ret = pvr_import_dmabuf(xdev->pvrdev, dma_buf, &psPMR);
    if (ret) {
        dev_err(xdev->dev, "PVR driver import dmabuf failed\n");
        return ERR_PTR(ret);
    }
    pvr_query_pmr_info(psPMR, &pro, &size);
    ......
}
int pvr_import_dmabuf(void *pvrdev, struct dma_buf *dmabuf, void **ppsPMRPtr)
{
    PVRSRV_DEVICE_NODE *psDevNode = pvrdev;
    PVRSRV_ERROR eError;
    eError = PhysmemImportDmaBufToPMR(psDevNode, dmabuf, (PMR **)ppsPMRPtr);
    if (eError != PVRSRV_OK)
        return -ENOMEM;
    return 0;
}
PVRSRV_ERROR
PhysmemImportDmaBufToPMR(PVRSRV_DEVICE_NODE *psDevNode,
                         struct dma_buf *psDmaBuf,
                         PMR **ppsPMRPtr)
{
    PMR *psPMR = NULL;
    ......
    if (psDmaBuf->ops == &sPVRDmaBufOps)
    {
        PVRSRV_DEVICE_NODE *psPMRDevNode;
        /* We exported this dma_buf, so we can just get its PMR */
        PVRDmaBufPrivData *priv = psDmaBuf->priv;
        PMR *psPMR = priv->psPMR; /* BUG: this declaration shadows the outer psPMR */
        ......
    }
    else
    .......
    if (psPMR) /* tests the outer psPMR, which is still NULL */
    {
        /* Reuse the PMR we already created */
        PMRRefPMR(psPMR);
        *ppsPMRPtr = psPMR;
    }
    ......
}
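The shadowing pattern is easy to reproduce outside the driver. The following is a minimal sketch with invented names (demo_pmr, demo_import_buggy); it mirrors only the scoping of the code above, not its real logic, and it simply returns success at the end because the elided parts of the real function are not known.

```c
#include <stddef.h>

struct demo_pmr { int id; };

/* Buggy shape: the inner declaration shadows the function-scope psPMR,
 * so the assignment inside the block never reaches the outer variable. */
int demo_import_buggy(struct demo_pmr *exported, struct demo_pmr **out)
{
    struct demo_pmr *psPMR = NULL;

    if (exported != NULL) {
        struct demo_pmr *psPMR = exported;  /* shadows the outer psPMR */
        (void)psPMR;                        /* inner copy dies at the closing brace */
    }

    if (psPMR) {        /* always the outer variable: still NULL */
        *out = psPMR;
    }
    return 0;           /* reports success although *out was never written */
}
```

If the caller sets its pointer to a known sentinel before the call, the pointer still holds the sentinel afterwards even though the function returned 0. In the driver the caller's pointer was an uninitialized stack variable, so it held whatever happened to be on the stack. GCC and Clang can flag this pattern at compile time with the -Wshadow warning option.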
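2.4 Fix
Stop declaring psPMR inside the if block in PhysmemImportDmaBufToPMR and assign to the function-scope psPMR instead, as shown below. With this change applied, the problem no longer reproduces in testing.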
if (psDmaBuf->ops == &sPVRDmaBufOps)
{
    PVRSRV_DEVICE_NODE *psPMRDevNode;
    /* We exported this dma_buf, so we can just get its PMR */
    PVRDmaBufPrivData *priv = psDmaBuf->priv;
    PMR *psPMR = priv->psPMR;
    ......
}
is changed to
if (psDmaBuf->ops == &sPVRDmaBufOps)
{
    PVRSRV_DEVICE_NODE *psPMRDevNode;
    /* We exported this dma_buf, so we can just get its PMR */
    PVRDmaBufPrivData *priv = psDmaBuf->priv;
    psPMR = priv->psPMR;
    ......
}
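For comparison, here is the same kind of reduced sketch with the shadowing removed. The names are again invented for illustration, and the explicit error return when no PMR was found is an extra defensive choice of this sketch, not something taken from the driver.

```c
#include <stddef.h>

struct demo_pmr { int id; };

/* Fixed shape: the block assigns to the function-scope psPMR. */
int demo_import_fixed(struct demo_pmr *exported, struct demo_pmr **out)
{
    struct demo_pmr *psPMR = NULL;

    if (exported != NULL) {
        psPMR = exported;   /* no new declaration: updates the outer variable */
    }

    if (psPMR == NULL)
        return -1;          /* hypothetical error path: never succeed without writing *out */

    *out = psPMR;
    return 0;
}
```

On the caller side, initializing the pointer to NULL before the call and checking it afterwards would make a missed assignment fail in a predictable way instead of using stack garbage.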
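III. Summary
This problem was caused by a wild pointer, that is, a pointer that was never given a valid value but was used as if it had one. With a good coding standard, problems of this kind are entirely avoidable. At the first company I joined after graduating, new hires had to study the coding standard and follow it strictly when writing code. One of its rules was that all local variables of a function must be declared at the top of the function body and never inside a later block, so I have not made this kind of mistake in my later work. Good coding standards really do prevent a lot of bugs.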