Overview
The function fmm_allocate_memory_object is a critical part of the ROCm (Radeon Open Compute) stack, specifically within the HSAKMT (HSA Kernel Mode Thunk) memory management subsystem. Its primary responsibility is to allocate a memory object for a given GPU, managing both the virtual address space and the underlying device memory. This function is invoked during device memory allocation requests and is essential for tracking, managing, and releasing GPU memory resources.
This document walks through the implementation of fmm_allocate_memory_object: its overall workflow, the kernel ioctl it issues, how it chunks userptr allocations, how it registers the resulting object, and how it handles errors.
Function Signature
static vm_object_t *fmm_allocate_memory_object(
uint32_t gpu_id,
void *mem,
uint64_t MemorySizeInBytes,
manageable_aperture_t *aperture,
uint64_t *mmap_offset,
uint32_t ioc_flags
)
- gpu_id: The target GPU identifier for the allocation.
- mem: The base virtual address for the allocation.
- MemorySizeInBytes: The total size of memory to allocate.
- aperture: The aperture (address space region) in which the allocation is made.
- mmap_offset: In/out pointer. For userptr allocations, *mmap_offset supplies the starting offset passed to the kernel; on success it is overwritten with the mmap_offset returned by the kernel.
- ioc_flags: KFD_IOC_ALLOC_MEM_FLAGS_* bits controlling the allocation (e.g., VRAM or USERPTR).
High-Level Workflow
The function follows these major steps:
- Input Validation: Rejects a NULL mem pointer.
- Preparation of IOCTL Arguments: Sets up the arguments for the kernel ioctl that allocates GPU memory.
- Address Calculation: Adjusts the virtual address passed to the kernel based on the aperture and the allocation flags.
- Device Memory Allocation Loop: Issues one ioctl per chunk. Userptr allocations are split into chunks of at most BIGGEST_SINGLE_BUF_SIZE; all other allocations are issued as a single request.
- Object Creation and Registration: Creates a vm_object_t after the first successful chunk, registers it in the aperture's red-black tree, and appends the handles of any later chunks.
- Error Handling and Cleanup: Jumps to cleanup labels if an ioctl or the object allocation fails.
- Return Value: Returns the created vm_object_t on success, or NULL on failure.
Detailed Implementation Analysis
1. Input Validation
The function begins by validating the input parameters:
if (!mem)
return NULL;
If the memory pointer is null, the function immediately returns NULL, indicating failure.
2. Preparation of IOCTL Arguments
The function prepares the arguments for the kernel ioctl call (AMDKFD_IOC_ALLOC_MEMORY_OF_GPU), which is responsible for allocating memory on the GPU:
struct kfd_ioctl_alloc_memory_of_gpu_args args = {0};
args.gpu_id = gpu_id;
args.flags = ioc_flags | KFD_IOC_ALLOC_MEM_FLAGS_NO_SUBSTITUTE;
args.va_addr = (uint64_t)mem;
- gpu_id: Specifies the target GPU.
- flags: The caller's flags OR'ed with KFD_IOC_ALLOC_MEM_FLAGS_NO_SUBSTITUTE, which, as the name indicates, asks the driver not to substitute a different memory type for the one requested.
- va_addr: The virtual address for the allocation.
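As a sketch of the argument setup, the struct and flag values below are hypothetical stand-ins that reuse the field names from the excerpt (va_addr, size, handle, mmap_offset, gpu_id, flags). The real definitions live in the kernel's kfd_ioctl.h and may differ in layout and values.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the KFD_IOC_ALLOC_MEM_FLAGS_* bits; take the
 * real values from the kernel's kfd_ioctl.h. */
#define DEMO_FLAG_VRAM          (1u << 0)
#define DEMO_FLAG_USERPTR       (1u << 2)
#define DEMO_FLAG_NO_SUBSTITUTE (1u << 28)

/* Hypothetical mirror of kfd_ioctl_alloc_memory_of_gpu_args (field names
 * as used in the excerpt; layout not guaranteed to match the kernel's). */
struct demo_alloc_args {
    uint64_t va_addr;     /* virtual address handed to the kernel */
    uint64_t size;        /* bytes for this ioctl call */
    uint64_t handle;      /* written by the kernel on success */
    uint64_t mmap_offset; /* in/out offset, used for userptr */
    uint32_t gpu_id;
    uint32_t flags;
};

/* Mirrors the three assignments in the excerpt. */
static struct demo_alloc_args demo_prepare_args(uint32_t gpu_id, void *mem,
                                                uint32_t ioc_flags)
{
    struct demo_alloc_args args;

    assert(mem != NULL);                 /* step 1 already rejected NULL */
    memset(&args, 0, sizeof(args));      /* same effect as "= {0}" */
    args.gpu_id = gpu_id;
    args.flags = ioc_flags | DEMO_FLAG_NO_SUBSTITUTE; /* always added */
    args.va_addr = (uint64_t)(uintptr_t)mem;
    return args;
}
```

Zero-initializing the whole struct first means size, handle and mmap_offset start at 0 until later steps fill them in.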
Address Adjustment
On APUs (when hsakmt_is_dgpu is false), VRAM allocations pass the kernel an address relative to the aperture base rather than the absolute virtual address:
if (!hsakmt_is_dgpu && (ioc_flags & KFD_IOC_ALLOC_MEM_FLAGS_VRAM))
args.va_addr = VOID_PTRS_SUB(mem, aperture->base);
For allocations in the mem_handle_aperture, which are used for buffer handles without valid virtual addresses, the address is set to zero:
if (aperture == &mem_handle_aperture)
args.va_addr = 0;
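The two address rules can be modeled as a small pure function. demo_aperture_t, demo_mem_handle_aperture, DEMO_FLAG_VRAM and demo_ioctl_va are hypothetical names; the sketch assumes mem lies at or above the aperture base on the APU path, and plain integer subtraction replaces VOID_PTRS_SUB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_FLAG_VRAM (1u << 0)  /* hypothetical stand-in for KFD_IOC_ALLOC_MEM_FLAGS_VRAM */

/* Hypothetical minimal aperture: only the base address matters here. */
typedef struct {
    uintptr_t base;
} demo_aperture_t;

static demo_aperture_t demo_mem_handle_aperture; /* stands in for mem_handle_aperture */

/* Returns the va_addr the ioctl would receive under the two rules above. */
static uint64_t demo_ioctl_va(void *mem, const demo_aperture_t *aperture,
                              bool is_dgpu, uint32_t ioc_flags)
{
    uint64_t va = (uint64_t)(uintptr_t)mem;

    if (!is_dgpu && (ioc_flags & DEMO_FLAG_VRAM)) {
        /* APU VRAM: pass an offset relative to the aperture base. */
        assert((uintptr_t)mem >= aperture->base);
        va = (uint64_t)((uintptr_t)mem - aperture->base);
    }
    if (aperture == &demo_mem_handle_aperture)
        va = 0; /* handle-only allocations carry no virtual address */
    return va;
}
```

Because the handle-aperture check runs last, it overrides the APU adjustment, matching the order of the two if statements in the excerpt.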
3. Device Memory Allocation Loop
The loop can split an allocation into multiple buffers, each no larger than a predefined maximum (BIGGEST_SINGLE_BUF_SIZE), because the kernel driver may not support very large single allocations. As the initialization below shows, only userptr allocations are actually split; every other allocation is issued as a single request.
Initialization
total_size = 0;
if (ioc_flags & KFD_IOC_ALLOC_MEM_FLAGS_USERPTR) {
size = MemorySizeInBytes < BIGGEST_SINGLE_BUF_SIZE ?
MemorySizeInBytes : BIGGEST_SINGLE_BUF_SIZE;
offset = *mmap_offset;
args.mmap_offset = *mmap_offset;
} else {
size = MemorySizeInBytes;
}
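- For userptr allocations, the first chunk is capped at BIGGEST_SINGLE_BUF_SIZE, and both offset and args.mmap_offset start from *mmap_offset.
- For all other allocations, size is the full MemorySizeInBytes, so the loop below runs exactly once.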
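The initialization boils down to choosing the size of the first request. In this sketch, demo_first_chunk is a hypothetical helper and max_chunk is a hypothetical parameter standing in for BIGGEST_SINGLE_BUF_SIZE, whose actual value the excerpt does not give.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Size of the first ioctl request; max_chunk plays the role of
 * BIGGEST_SINGLE_BUF_SIZE (hypothetical parameter). */
static uint64_t demo_first_chunk(uint64_t total, uint64_t max_chunk, bool userptr)
{
    assert(max_chunk > 0);
    if (userptr)
        return total < max_chunk ? total : max_chunk;  /* capped chunk */
    return total;  /* non-userptr: whole request in one call */
}
```

The non-userptr branch returning the full size is what turns every non-userptr allocation into a single ioctl.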
Allocation Loop
do {
args.size = size;
if (hsakmt_ioctl(hsakmt_kfd_fd, AMDKFD_IOC_ALLOC_MEMORY_OF_GPU, &args))
goto err_hsakmt_ioctl_failed;
// Object creation and registration
...
args.va_addr += size;
offset += size;
if (ioc_flags & KFD_IOC_ALLOC_MEM_FLAGS_USERPTR)
args.mmap_offset = offset;
total_size += size;
if (total_size + BIGGEST_SINGLE_BUF_SIZE > MemorySizeInBytes)
size = MemorySizeInBytes - total_size;
else
size = BIGGEST_SINGLE_BUF_SIZE;
} while (total_size < MemorySizeInBytes);
- The loop continues until the total allocated size reaches MemorySizeInBytes.
- Each iteration allocates one chunk, then advances args.va_addr (and, for userptr, args.mmap_offset) by the chunk size. The final chunk covers whatever remains.
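To check the loop's arithmetic, the sketch below replays only its size bookkeeping, with no ioctl and no object handling. demo_plan_chunks and its max_chunk parameter are hypothetical; max_chunk plays the role of BIGGEST_SINGLE_BUF_SIZE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Replays the size updates of the do/while loop. Writes each chunk size
 * into out[] (capacity cap) and returns the number of chunks. */
static size_t demo_plan_chunks(uint64_t total, uint64_t max_chunk, bool userptr,
                               uint64_t *out, size_t cap)
{
    uint64_t size, total_size = 0;
    size_t n = 0;

    assert(max_chunk > 0);
    size = userptr ? (total < max_chunk ? total : max_chunk) : total;
    do {
        assert(n < cap);
        out[n++] = size;                 /* one ioctl per chunk */
        total_size += size;
        if (total_size + max_chunk > total)
            size = total - total_size;   /* last, possibly shorter, chunk */
        else
            size = max_chunk;
    } while (total_size < total);
    return n;
}
```

For example, demo_plan_chunks(10, 4, true, out, 8) yields chunks of 4, 4 and 2, while the same call with userptr set to false yields a single chunk of 10.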
4. Object Creation and Registration
After each successful ioctl, the function records the chunk in a vm_object_t (mflags, passed to aperture_allocate_object, is not defined in this excerpt):
if (!vm_obj) {
pthread_mutex_lock(&aperture->fmm_mutex);
vm_obj = aperture_allocate_object(aperture, mem, args.handle,
MemorySizeInBytes, mflags);
pthread_mutex_unlock(&aperture->fmm_mutex);
if (!vm_obj)
goto err_object_allocation_failed;
if (mmap_offset)
*mmap_offset = args.mmap_offset;
} else {
vm_obj->handles[vm_obj->handle_num++] = args.handle;
}
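- The first chunk creates the object, sized for the full MemorySizeInBytes, and registers it in the aperture's red-black tree while holding aperture->fmm_mutex. It also writes the mmap_offset returned by the kernel back to *mmap_offset.
- Subsequent chunks append their handles to the object's handles array.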
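The create-or-append logic can be modeled with a reduced object type. demo_vm_object_t, demo_record_chunk and DEMO_MAX_HANDLES are hypothetical; the sketch uses calloc in place of aperture_allocate_object and omits both the fmm_mutex locking and the red-black tree insertion.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_MAX_HANDLES 8   /* hypothetical capacity; the real limit is not shown */

/* Hypothetical, heavily reduced stand-in for vm_object_t. */
typedef struct {
    void *start;
    uint64_t size;                     /* full MemorySizeInBytes, not one chunk */
    uint64_t handles[DEMO_MAX_HANDLES];
    uint32_t handle_num;
} demo_vm_object_t;

/* First chunk: create the object (the real code does this under
 * aperture->fmm_mutex and inserts it into the aperture's tree).
 * Later chunks: append the new handle. Returns NULL if creation fails. */
static demo_vm_object_t *demo_record_chunk(demo_vm_object_t *vm_obj, void *mem,
                                           uint64_t total, uint64_t handle)
{
    if (!vm_obj) {
        vm_obj = calloc(1, sizeof(*vm_obj));
        if (!vm_obj)
            return NULL;
        vm_obj->start = mem;
        vm_obj->size = total;
        vm_obj->handles[0] = handle;   /* first chunk's handle */
        vm_obj->handle_num = 1;
    } else {
        assert(vm_obj->handle_num < DEMO_MAX_HANDLES);
        vm_obj->handles[vm_obj->handle_num++] = handle;
    }
    return vm_obj;
}
```

Keeping one object per logical allocation, rather than one per chunk, lets callers free or look up the whole range through a single tree entry.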
Example Workflow
- User requests a 10 GB userptr allocation on GPU 0.
- fmm_allocate_memory_object is called with KFD_IOC_ALLOC_MEM_FLAGS_USERPTR set in ioc_flags.
- The function splits the request into chunks of BIGGEST_SINGLE_BUF_SIZE, with a smaller final chunk if 10 GB is not an exact multiple. (A 10 GB VRAM request, by contrast, is sent to the kernel in a single ioctl.)
- For each chunk:
  - Sets args.size (and, for userptr, args.mmap_offset).
  - Calls the kernel to allocate memory.
  - On the first success, creates the vm_object_t and registers it in the aperture's tree; on later successes, appends the handle to the object's handle array.
- Once all chunks are allocated, the function returns the object for further use.
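The example can be run end to end against a fake kernel, scaled down to small integers and including a failure path. Everything here is hypothetical: demo_ioctl_alloc and demo_ioctl_free stand in for the real ioctls, and releasing already-allocated chunks on failure is an assumed cleanup policy, since the excerpt does not show what the err_hsakmt_ioctl_failed and err_object_allocation_failed labels actually do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_MAX_HANDLES 16  /* hypothetical capacity */

typedef struct {
    uint64_t size;
    uint64_t handles[DEMO_MAX_HANDLES];
    uint32_t handle_num;
} demo_obj_t;

/* Fake kernel: call number fail_at (0-based) fails; -1 means never. */
typedef struct {
    int calls;
    int fail_at;
    int live;    /* handles currently allocated */
} demo_kernel_t;

static int demo_ioctl_alloc(demo_kernel_t *k, uint64_t size, uint64_t *handle)
{
    int call = k->calls++;

    (void)size;
    if (call == k->fail_at)
        return -1;
    *handle = (uint64_t)call + 1;
    k->live++;
    return 0;
}

static void demo_ioctl_free(demo_kernel_t *k, uint64_t handle)
{
    (void)handle;
    k->live--;
}

/* Chunked allocation with a goto-based unwind on ioctl failure. */
static demo_obj_t *demo_allocate(demo_kernel_t *k, uint64_t total,
                                 uint64_t max_chunk, bool userptr)
{
    demo_obj_t *obj = NULL;
    uint64_t size, total_size = 0, handle;

    assert(max_chunk > 0);
    size = userptr ? (total < max_chunk ? total : max_chunk) : total;
    do {
        if (demo_ioctl_alloc(k, size, &handle))
            goto err_ioctl_failed;
        if (!obj) {                      /* first chunk creates the object */
            obj = calloc(1, sizeof(*obj));
            if (!obj) {
                demo_ioctl_free(k, handle);
                return NULL;
            }
            obj->size = total;
        }
        assert(obj->handle_num < DEMO_MAX_HANDLES);
        obj->handles[obj->handle_num++] = handle;
        total_size += size;
        size = (total_size + max_chunk > total) ? total - total_size : max_chunk;
    } while (total_size < total);
    return obj;

err_ioctl_failed:
    if (obj) {                           /* release chunks already obtained */
        for (uint32_t i = 0; i < obj->handle_num; i++)
            demo_ioctl_free(k, obj->handles[i]);
        free(obj);
    }
    return NULL;
}
```

The goto keeps a single exit for the failure path, which is the pattern the excerpt's goto labels suggest; after a failed run the fake kernel's live count returns to zero.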
Conclusion
fmm_allocate_memory_object allocates and tracks GPU memory objects in the ROCm HSAKMT subsystem. It hides chunked userptr allocations, aperture-relative addressing, and the AMDKFD_IOC_ALLOC_MEMORY_OF_GPU ioctl behind a single call, and it holds aperture->fmm_mutex while inserting the new object into the aperture's tree, so concurrent allocations in the same aperture are serialized at that point. Failed ioctls and failed object allocations jump to dedicated cleanup labels.
Because every allocation ends up as a tracked vm_object_t, the function underpins higher-level features such as multi-GPU support, userptr allocations, and handle-based memory management.