AMD rocr-libhsakmt Analysis Series 4-2: Implementation Analysis of fmm_allocate_memory_object

Overview

The function fmm_allocate_memory_object is a critical part of the ROCm (Radeon Open Compute) stack, specifically within the HSAKMT (HSA Kernel Mode Thunk) memory management subsystem. Its primary responsibility is to allocate a memory object for a given GPU, managing both the virtual address space and the underlying device memory. This function is invoked during device memory allocation requests and is essential for tracking, managing, and releasing GPU memory resources.

This document provides a comprehensive analysis of the implementation, workflow, and technical details of fmm_allocate_memory_object, including its interactions with other components, error handling, and design considerations.

Function Signature

static vm_object_t *fmm_allocate_memory_object(
    uint32_t gpu_id,
    void *mem,
    uint64_t MemorySizeInBytes,
    manageable_aperture_t *aperture,
    uint64_t *mmap_offset,
    uint32_t ioc_flags
)
  • gpu_id: The target GPU identifier for the allocation.

  • mem: The base virtual address for the allocation.

  • MemorySizeInBytes: The total size of memory to allocate.

  • aperture: The aperture (address space region) in which the allocation is made.

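  • mmap_offset: In/out pointer. For userptr allocations it supplies the starting offset passed to the kernel; when the tracking object is created, it receives the mmap offset reported by the kernel.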

  • ioc_flags: Flags controlling the allocation behavior (e.g., VRAM, userptr, etc.).

High-Level Workflow

The function follows these major steps:

  • Input Validation: Returns NULL immediately if the mem pointer is null.

  • Preparation of IOCTL Arguments: Sets up the arguments for the kernel ioctl call to allocate GPU memory.

  • Handling Address Calculation: Adjusts the virtual address based on aperture and allocation flags.

  • Device Memory Allocation Loop: Handles large allocations by splitting them into multiple buffers if necessary.

  • Object Creation and Registration: Creates a vm_object_t to track the allocation and registers it in the aperture's red-black tree.

  • Error Handling and Cleanup: Ensures proper cleanup in case of allocation failures.

  • Return Value: Returns the created vm_object_t on success, or NULL on failure.

Detailed Implementation Analysis

1. Input Validation

The function begins by validating the input parameters:

if (!mem)
    return NULL;

If the memory pointer is null, the function immediately returns NULL, indicating failure.

2. Preparation of IOCTL Arguments

The function prepares the arguments for the kernel ioctl call (AMDKFD_IOC_ALLOC_MEMORY_OF_GPU), which is responsible for allocating memory on the GPU:

struct kfd_ioctl_alloc_memory_of_gpu_args args = {0};
args.gpu_id = gpu_id;
args.flags = ioc_flags | KFD_IOC_ALLOC_MEM_FLAGS_NO_SUBSTITUTE;
args.va_addr = (uint64_t)mem;
  • gpu_id: Specifies the target GPU.
  • flags: Combines the input flags with NO_SUBSTITUTE, ensuring strict allocation semantics.
  • va_addr: The virtual address for the allocation.

Address Adjustment

For non-dGPU (APU) systems and VRAM allocations, the address is adjusted relative to the aperture base:

if (!hsakmt_is_dgpu && (ioc_flags & KFD_IOC_ALLOC_MEM_FLAGS_VRAM))
    args.va_addr = VOID_PTRS_SUB(mem, aperture->base);

For allocations in the mem_handle_aperture, which are used for buffer handles without valid virtual addresses, the address is set to zero:

if (aperture == &mem_handle_aperture)
    args.va_addr = 0;

3. Device Memory Allocation Loop

The function supports large allocations by splitting them into multiple buffers, each no larger than a predefined maximum (BIGGEST_SINGLE_BUF_SIZE), because the kernel driver may not support very large single allocations. In the code shown here, only userptr allocations are split; every other allocation type is requested in a single ioctl call.

Initialization

total_size = 0;
if (ioc_flags & KFD_IOC_ALLOC_MEM_FLAGS_USERPTR) {
    size = MemorySizeInBytes < BIGGEST_SINGLE_BUF_SIZE ?
        MemorySizeInBytes : BIGGEST_SINGLE_BUF_SIZE;
    offset = *mmap_offset;
    args.mmap_offset = *mmap_offset;
} else {
    size = MemorySizeInBytes;
}
  • For userptr allocations, the first chunk is capped at BIGGEST_SINGLE_BUF_SIZE and the starting offset is read from *mmap_offset.
  • For other allocations, the entire size is requested at once, so the loop body runs only once.

Allocation Loop

do {
    args.size = size;

    if (hsakmt_ioctl(hsakmt_kfd_fd, AMDKFD_IOC_ALLOC_MEMORY_OF_GPU, &args))
        goto err_hsakmt_ioctl_failed;

    // Object creation and registration
    ...
    args.va_addr += size;
    offset += size;

    if (ioc_flags & KFD_IOC_ALLOC_MEM_FLAGS_USERPTR)
        args.mmap_offset = offset;

    total_size += size;
    if (total_size + BIGGEST_SINGLE_BUF_SIZE > MemorySizeInBytes)
        size = MemorySizeInBytes - total_size;
    else
        size = BIGGEST_SINGLE_BUF_SIZE;
} while (total_size < MemorySizeInBytes);
  • The loop continues until the total allocated size matches the requested size.
  • Each iteration allocates a chunk of memory and updates the address and offset for the next chunk.
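
To make the size bookkeeping concrete, here is a minimal sketch of the same update rule with the ioctl call removed. The function name sketch_split_chunks and its parameters are hypothetical, and the split flag stands in for the userptr check; the real function issues one ioctl per chunk instead of recording sizes in an array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Split total_bytes into chunks of at most max_chunk bytes using the same
 * update rule as the loop above. Writes up to cap chunk sizes into out and
 * returns the number of chunks needed (which may exceed cap).
 * Both total_bytes and max_chunk are assumed to be non-zero.
 */
size_t sketch_split_chunks(uint64_t total_bytes, uint64_t max_chunk,
                           bool split, uint64_t *out, size_t cap)
{
    assert(total_bytes > 0 && max_chunk > 0);

    /* Only "split" requests cap the first chunk; others go out whole. */
    uint64_t size = (split && total_bytes > max_chunk) ? max_chunk : total_bytes;
    uint64_t done = 0;
    size_t n = 0;

    do {
        if (n < cap)
            out[n] = size;          /* one ioctl would be issued here */
        n++;
        done += size;

        /* The last chunk takes the remainder; otherwise use a full chunk. */
        if (done + max_chunk > total_bytes)
            size = total_bytes - done;
        else
            size = max_chunk;
    } while (done < total_bytes);

    return n;
}
```

With a total of 10 and a maximum chunk of 4, a split request yields chunks of 4, 4, and 2, while a non-split request yields a single chunk of 10.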

4. Object Creation and Registration

After each successful ioctl call, the function records the returned handle in a vm_object_t that tracks the allocation:

if (!vm_obj) {
    pthread_mutex_lock(&aperture->fmm_mutex);
    vm_obj = aperture_allocate_object(aperture, mem, args.handle,
            MemorySizeInBytes, mflags);

    pthread_mutex_unlock(&aperture->fmm_mutex);
    if (!vm_obj)
        goto err_object_allocation_failed;

    if (mmap_offset)
        *mmap_offset = args.mmap_offset;
} else {
    vm_obj->handles[vm_obj->handle_num++] = args.handle;
}
  • The first chunk creates the object and registers it in the aperture's red-black tree.
  • Subsequent chunks add their handles to the object's handle array.
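
The create-then-append pattern can be sketched as follows. This is a simplified illustration, not the library's code: sketch_vm_object is a hypothetical, reduced stand-in for vm_object_t, the red-black tree insertion and the fmm_mutex locking are omitted, and the fixed-capacity handle array is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, reduced tracking object; the real vm_object_t has more fields. */
struct sketch_vm_object {
    void *start;
    uint64_t size;
    uint64_t *handles;      /* one kernel handle per chunk */
    uint32_t handle_num;
    uint32_t handle_cap;
};

/*
 * Record one chunk's handle. If obj is NULL, this is the first chunk: create
 * the object and store the handle in slot 0. Otherwise append the handle.
 * Returns the object, or NULL on failure (an existing obj is left untouched).
 */
struct sketch_vm_object *sketch_track_chunk(struct sketch_vm_object *obj,
                                            void *start, uint64_t total_size,
                                            uint32_t max_chunks, uint64_t handle)
{
    if (!obj) {
        if (max_chunks == 0)
            return NULL;
        obj = calloc(1, sizeof(*obj));
        if (!obj)
            return NULL;
        obj->handles = calloc(max_chunks, sizeof(*obj->handles));
        if (!obj->handles) {
            free(obj);
            return NULL;
        }
        obj->start = start;
        obj->size = total_size;
        obj->handle_cap = max_chunks;
    } else if (obj->handle_num == obj->handle_cap) {
        return NULL;        /* no room left; caller still owns obj */
    }

    obj->handles[obj->handle_num++] = handle;
    return obj;
}

void sketch_free_object(struct sketch_vm_object *obj)
{
    if (obj) {
        free(obj->handles);
        free(obj);
    }
}
```

Note that the object records the full requested size when it is created, even though only the first chunk has been allocated at that point, mirroring how MemorySizeInBytes is passed to aperture_allocate_object above.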

Example Workflow

  1. User requests a userptr allocation on GPU 0 that is larger than BIGGEST_SINGLE_BUF_SIZE.
  2. fmm_allocate_memory_object is called with appropriate parameters.
  3. The function splits the request into chunks of at most BIGGEST_SINGLE_BUF_SIZE bytes; the final chunk holds the remainder.
  4. For each chunk:
    • Prepares ioctl arguments.
    • Calls the kernel to allocate memory.
    • On success, the first chunk creates the vm_object_t and registers it in the aperture's tree; later chunks add their handles to the object's handle array.
  5. After all chunks are allocated, the function returns the object for further use.

A non-userptr request, such as 10 GB of VRAM, follows the same steps but is issued as a single chunk.

Conclusion

fmm_allocate_memory_object manages the allocation of GPU memory objects in the ROCm HSAKMT subsystem, taking the aperture's fmm_mutex while it creates and registers the tracking object. It hides the details of chunked allocations, aperture address handling, and kernel ioctl calls, giving higher-level memory management operations a single entry point for resource tracking and error handling.

This function is central to the ROCm memory management strategy: it takes a per-GPU identifier and supports userptr allocations and handle-based memory management through the same code path.

