6.15.19. Enqueuing Kernels
The functionality described in this section requires support for OpenCL C 2.0, or OpenCL C 3.0 or newer and the __opencl_c_device_enqueue feature.
This section describes built-in functions that allow a kernel to enqueue additional work to the same device, without host interaction. A kernel may enqueue code represented by Block syntax, and control execution order with event dependencies including user events and markers. There are several advantages to using the Block syntax: it is more compact; it does not require a cl_kernel object; and enqueuing can be done as a single semantic step.
The following table describes the list of built-in functions that can be used to enqueue a kernel(s).
When the cl_khr_device_enqueue_local_arg_types extension macro is supported, the Built-in Kernel Enqueue Functions and Built-in Kernel Query Functions described in this section can use any of the built-in OpenCL C scalar or vector integer or floating-point data types, or any user defined type built from these scalar and vector data types, as the pointee type of their arguments. This is indicated by the generic type name gentype in those function signatures.
When the cl_khr_device_enqueue_local_arg_types extension macro is not supported, the pointee type of these functions must be void.
The macro CLK_NULL_EVENT refers to an invalid device event. The macro CLK_NULL_QUEUE refers to an invalid device queue.
6.15.19.1. Built-in Functions - Enqueuing a Kernel
Built-in Function | Description |
---|---|
int enqueue_kernel(queue_t queue, kernel_enqueue_flags_t flags, const ndrange_t ndrange, void (^block)(void)) | Enqueue the block for execution to queue. If an event is returned, enqueue_kernel performs an implicit retain on the returned event. |
The enqueue_kernel built-in function allows a work-item to enqueue a block. Work-items can enqueue multiple blocks to a device queue(s).
The enqueue_kernel built-in function returns CLK_SUCCESS if the block is enqueued successfully and returns CLK_ENQUEUE_FAILURE otherwise. If the -g compile option is specified in compiler options passed to clCompileProgram or clBuildProgram when compiling or building the parent program, the following errors may be returned instead of CLK_ENQUEUE_FAILURE to indicate why enqueue_kernel failed to enqueue the block:
- CLK_INVALID_QUEUE if queue is not a valid device queue.
- CLK_INVALID_NDRANGE if ndrange is not a valid ND-range descriptor or if the program was compiled with -cl-uniform-work-group-size and the local_work_size is specified in ndrange but the global_work_size specified in ndrange is not a multiple of the local_work_size.
- CLK_INVALID_EVENT_WAIT_LIST if event_wait_list is NULL and num_events_in_wait_list > 0, or if event_wait_list is not NULL and num_events_in_wait_list is 0, or if event objects in event_wait_list are not valid events.
- CLK_DEVICE_QUEUE_FULL if queue is full.
- CLK_INVALID_ARG_SIZE if size of local memory arguments is 0.
- CLK_EVENT_ALLOCATION_FAILURE if event_ret is not NULL and an event could not be allocated.
- CLK_OUT_OF_RESOURCES if there is a failure to queue the block in queue because of insufficient resources needed to execute the kernel.
Below are some examples of how to enqueue a block.
kernel void
my_func_A(global int *a, global int *b, global int *c)
{
    ...
}

kernel void
my_func_B(global int *a, global int *b, global int *c)
{
    ndrange_t ndrange;
    // build ndrange information
    ...
    // example - enqueue a kernel as a block
    enqueue_kernel(get_default_queue(),
                   CLK_ENQUEUE_FLAGS_WAIT_KERNEL,
                   ndrange,
                   ^{my_func_A(a, b, c);});
    ...
}

kernel void
my_func_C(global int *a, global int *b, global int *c)
{
    ndrange_t ndrange;
    // build ndrange information
    ...
    // note that a, b and c are variables in scope of
    // the block
    void (^my_block_A)(void) = ^{my_func_A(a, b, c);};
    // enqueue the block variable
    enqueue_kernel(get_default_queue(),
                   CLK_ENQUEUE_FLAGS_WAIT_KERNEL,
                   ndrange,
                   my_block_A);
    ...
}
The example below shows how to declare a block literal and enqueue it.
kernel void
my_func(global int *a, global int *b)
{
    ndrange_t ndrange;
    // build ndrange information
    ...
    // note that a and b are variables in scope of
    // the block
    void (^my_block_A)(void) =
        ^{
            size_t id = get_global_id(0);
            b[id] += a[id];
        };
    // enqueue the block variable
    enqueue_kernel(get_default_queue(),
                   CLK_ENQUEUE_FLAGS_WAIT_KERNEL,
                   ndrange,
                   my_block_A);
    // or we could have done the following
    enqueue_kernel(get_default_queue(),
                   CLK_ENQUEUE_FLAGS_WAIT_KERNEL,
                   ndrange,
                   ^{
                       size_t id = get_global_id(0);
                       b[id] += a[id];
                   });
}
Blocks passed to enqueue_kernel cannot use global variables or stack variables local to the enclosing lexical scope that are a pointer type in the local or private address space.
Example:
kernel void
foo(global int *a, local int *lptr, ...)
{
    enqueue_kernel(get_default_queue(),
                   CLK_ENQUEUE_FLAGS_WAIT_KERNEL,
                   ndrange,
                   ^{
                       size_t id = get_global_id(0);
                       local int *p = lptr; // undefined behavior
                   } );
}
6.15.19.2. Arguments That are a Pointer Type to Local Address Space
A block passed to enqueue_kernel can have arguments declared to be a pointer to local memory. The enqueue_kernel built-in function variants allow blocks to be enqueued with a variable number of arguments. Each argument must be declared to be a void pointer to local memory. These enqueue_kernel built-in function variants also have a corresponding number of arguments each of type uint that follow the block argument. These arguments specify the size of each local memory pointer argument of the enqueued block.
Some examples follow:
kernel void
my_func_A_local_arg1(global int *a, local int *lptr, ...)
{
    ...
}

kernel void
my_func_A_local_arg2(global int *a,
                     local int *lptr1, local float4 *lptr2, ...)
{
    ...
}

kernel void
my_func_B(global int *a, ...)
{
    ...
    ndrange_t ndrange = ndrange_1D(...);
    uint local_mem_size = compute_local_mem_size();
    enqueue_kernel(get_default_queue(),
                   CLK_ENQUEUE_FLAGS_WAIT_KERNEL,
                   ndrange,
                   ^(local void *p){
                       my_func_A_local_arg1(a, (local int *)p, ...);},
                   local_mem_size);
}

kernel void
my_func_C(global int *a, ...)
{
    ...
    ndrange_t ndrange = ndrange_1D(...);
    void (^my_blk_A)(local void *, local void *) =
        ^(local void *lptr1, local void *lptr2){
            my_func_A_local_arg2(
                a,
                (local int *)lptr1,
                (local float4 *)lptr2, ...);};
    // calculate local memory size for lptr
    // argument in local address space for my_blk_A
    uint local_mem_size = compute_local_mem_size();
    enqueue_kernel(get_default_queue(),
                   CLK_ENQUEUE_FLAGS_WAIT_KERNEL,
                   ndrange,
                   my_blk_A,
                   local_mem_size, local_mem_size*4);
}
6.15.19.3. A Complete Example
The example below shows how to implement an iterative algorithm where the host enqueues the first instance of the nd-range kernel (dp_func_A). The kernel dp_func_A will launch a kernel (evaluate_dp_work_A) that will determine if new nd-range work needs to be performed. If new nd-range work does need to be performed, then evaluate_dp_work_A will enqueue a new instance of dp_func_A. This process is repeated until all the work is completed.
kernel void
dp_func_A(queue_t q, ...)
{
    ...
    // queue a single instance of evaluate_dp_work_A to
    // device queue q. queued kernel begins execution after
    // kernel dp_func_A finishes
    if (get_global_id(0) == 0)
    {
        enqueue_kernel(q,
                       CLK_ENQUEUE_FLAGS_WAIT_KERNEL,
                       ndrange_1D(1),
                       ^{evaluate_dp_work_A(q, ...);});
    }
}

kernel void
evaluate_dp_work_A(queue_t q,...)
{
    // check if more work needs to be performed
    bool more_work = check_new_work(...);
    if (more_work)
    {
        size_t global_work_size = compute_global_size(...);
        void (^dp_func_A_blk)(void) =
            ^{dp_func_A(q, ...);};
        // get local WG-size for kernel dp_func_A
        size_t local_work_size =
            get_kernel_work_group_size(dp_func_A_blk);
        // build nd-range descriptor
        ndrange_t ndrange = ndrange_1D(global_work_size,
                                       local_work_size);
        // enqueue dp_func_A
        enqueue_kernel(q,
                       CLK_ENQUEUE_FLAGS_WAIT_KERNEL,
                       ndrange,
                       dp_func_A_blk);
    }
    ...
}
6.15.19.4. Determining when a Child Kernel Begins Execution
The kernel_enqueue_flags_t [88] argument to the enqueue_kernel built-in functions can be used to specify when the child kernel begins execution. Supported values are described in the following table:
kernel_enqueue_flags_t enum | Description |
---|---|
CLK_ENQUEUE_FLAGS_NO_WAIT | Indicates that the enqueued kernels do not need to wait for the parent kernel to finish execution before they begin execution. |
CLK_ENQUEUE_FLAGS_WAIT_KERNEL | Indicates that all work-items of the parent kernel must finish executing and all immediate [89] side effects committed before the enqueued child kernel may begin execution. |
CLK_ENQUEUE_FLAGS_WAIT_WORK_GROUP | Indicates that the enqueued kernels wait only for the work-group that enqueued the kernels to finish before they begin execution. [90] |
The kernel_enqueue_flags_t flags are useful when a kernel enqueued from the host and executing on a device enqueues kernels on the device. The kernel enqueued from the host may not have an event associated with it. The kernel_enqueue_flags_t flags allow the developer to indicate when the child kernels can begin execution.
6.15.19.5. Determining When a Parent Kernel has Finished Execution
A parent kernel's execution status is considered to be complete when it and all its child kernels have finished execution. The execution status of a parent kernel will be CL_COMPLETE if this kernel and all its child kernels finish execution successfully. The execution status of the kernel will be an error code (given by a negative integer value) if it or any of its child kernels encounter an error, or are abnormally terminated.
For example, assume that the host enqueues a kernel k for execution on a device. Kernel k when executing on the device enqueues kernels A and B to a device queue(s). The enqueue_kernel call to enqueue kernel B specifies the event associated with kernel A in the event_wait_list argument, i.e. wait for kernel A to finish execution before kernel B can begin execution. Let's assume kernel A enqueues kernels X, Y and Z. Kernel A is considered to have finished execution, i.e. its execution status is CL_COMPLETE, only after A and the kernels A enqueued (and any kernels these enqueued kernels enqueue and so on) have finished execution.
6.15.19.6. Built-in Functions - Kernel Query Functions
Built-in Function | Description |
---|---|
uint get_kernel_work_group_size(void (^block)(void)) | This provides a mechanism to query the maximum work-group size that can be used to execute a block on a specific device given by device. block specifies the block to be enqueued. |
uint get_kernel_preferred_work_group_size_multiple(void (^block)(void)) | Returns the preferred multiple of work-group size for launch. This is a performance hint. Specifying a work-group size that is not a multiple of the value returned by this query as the value of the local work size argument to enqueue_kernel will not fail to enqueue the block for execution unless the work-group size specified is larger than the device maximum. |
6.15.19.7. Built-in Functions - Queuing Other Commands
The following table describes the list of built-in functions that can be used to enqueue commands such as a marker.
Built-in Function | Description |
---|---|
int enqueue_marker(queue_t queue, uint num_events_in_wait_list, const clk_event_t *event_wait_list, clk_event_t *event_ret) | Enqueue a marker command to queue. The marker command waits for a list of events specified by event_wait_list to complete before the marker completes. event_ret must not be NULL as otherwise this is a no-op. If an event is returned, enqueue_marker performs an implicit retain on the returned event. |
The enqueue_marker built-in function returns CLK_SUCCESS if the marker command is enqueued successfully and returns CLK_ENQUEUE_FAILURE otherwise. If the -g compile option is specified in compiler options passed to clCompileProgram or clBuildProgram, the following errors may be returned instead of CLK_ENQUEUE_FAILURE to indicate why enqueue_marker failed to enqueue the marker command:
- CLK_INVALID_QUEUE if queue is not a valid device queue.
- CLK_INVALID_EVENT_WAIT_LIST if event_wait_list is NULL, or if event_wait_list is not NULL and num_events_in_wait_list is 0, or if event objects in event_wait_list are not valid events.
- CLK_DEVICE_QUEUE_FULL if queue is full.
- CLK_EVENT_ALLOCATION_FAILURE if event_ret is not NULL and an event could not be allocated.
- CLK_OUT_OF_RESOURCES if there is a failure to queue the block in queue because of insufficient resources needed to execute the kernel.
6.15.19.8. Built-in Functions - Event Functions
The following table describes the list of built-in functions that work on events.
Built-in Function | Description |
---|---|
void retain_event(clk_event_t event) | Increments the event reference count. Behavior is undefined if event is not a valid event. |
void release_event(clk_event_t event) | Decrements the event reference count. The event object is deleted once the event reference count is zero, the specific command identified by this event has completed (or terminated), and there are no commands in any device command-queue that require a wait for this event to complete. Behavior is undefined if event is not a valid event. |
clk_event_t create_user_event() | Create a user event. Returns the user event. The execution status of the user event created is set to CL_SUBMITTED. |
bool is_valid_event(clk_event_t event) | Returns true if event is a valid event. Otherwise returns false. |
void set_user_event_status(clk_event_t event, int status) | Sets the execution status of a user event. Behavior is undefined if event is not a valid event returned by create_user_event. status can be either CL_COMPLETE or a negative integer value indicating an error. |
void capture_event_profiling_info(clk_event_t event, clk_profiling_info name, global void *value) | Captures the profiling information for functions that are enqueued as commands. These enqueued commands are identified by unique event objects. The profiling information will be available in value once the command identified by event has completed. Behavior is undefined if event is not a valid event returned by enqueue_kernel. name identifies which profiling information is to be queried and can be: CLK_PROFILING_COMMAND_EXEC_TIME. value is a pointer to two 64-bit values. The first 64-bit value describes the elapsed time CL_PROFILING_COMMAND_END - CL_PROFILING_COMMAND_START for the command identified by event in nanoseconds. The second 64-bit value describes the elapsed time CL_PROFILING_COMMAND_COMPLETE - CL_PROFILING_COMMAND_START for the command identified by event in nanoseconds. |
Events can be used to identify commands enqueued to a command-queue from the host. These events created by the OpenCL runtime can only be used on the host, i.e. as events passed in the event_wait_list argument to various enqueue APIs or runtime APIs that take events as arguments, such as clRetainEvent, clReleaseEvent, and clGetEventProfilingInfo.
Similarly, events can be used to identify commands enqueued to a device queue (from a kernel). These event objects cannot be passed to the host or used by OpenCL runtime APIs such as the enqueue APIs or runtime APIs that take event arguments.
clRetainEvent and clReleaseEvent will return CL_INVALID_OPERATION if event specified is an event that refers to any kernel enqueued to a device queue using enqueue_kernel or enqueue_marker, or is a user event created by create_user_event.
Similarly, clSetUserEventStatus can only be used to set the execution status of events created using clCreateUserEvent. User events created on the device can be set using the set_user_event_status built-in function.
The example below shows how events can be used with kernels enqueued to multiple device queues.
extern void barA_kernel(...);
extern void barB_kernel(...);

kernel void
foo(queue_t q0, queue_t q1, ...)
{
    ...
    clk_event_t evt0;
    // enqueue kernel to queue q0
    enqueue_kernel(q0,
                   CLK_ENQUEUE_FLAGS_NO_WAIT,
                   ndrange_A,
                   0, NULL, &evt0,
                   ^{barA_kernel(...);} );
    // enqueue kernel to queue q1
    enqueue_kernel(q1,
                   CLK_ENQUEUE_FLAGS_NO_WAIT,
                   ndrange_B,
                   1, &evt0, NULL,
                   ^{barB_kernel(...);} );
    // release event evt0. The event object is deleted only
    // after barA_kernel enqueued in queue q0 has finished
    // execution and barB_kernel, enqueued in queue q1 and
    // waiting on evt0, has been submitted for execution,
    // i.e. its wait for evt0 is satisfied.
    release_event(evt0);
}
The example below shows how the marker command can be used with kernels enqueued to a device queue.
kernel void
foo(queue_t q, ...)
{
    ...
    clk_event_t marker_event;
    clk_event_t events[2];
    enqueue_kernel(q,
                   CLK_ENQUEUE_FLAGS_NO_WAIT,
                   ndrange,
                   0, NULL, &events[0],
                   ^{barA_kernel(...);} );
    enqueue_kernel(q,
                   CLK_ENQUEUE_FLAGS_NO_WAIT,
                   ndrange,
                   0, NULL, &events[1],
                   ^{barB_kernel(...);} );
    // barA_kernel and barB_kernel can be executed
    // out-of-order. We need to wait for both these
    // kernels to finish execution before barC_kernel
    // starts execution so we enqueue a marker command and
    // then enqueue barC_kernel that waits on the event
    // associated with the marker.
    enqueue_marker(q, 2, events, &marker_event);
    enqueue_kernel(q,
                   CLK_ENQUEUE_FLAGS_NO_WAIT,
                   ndrange,
                   1, &marker_event, NULL,
                   ^{barC_kernel(...);} );
    release_event(events[0]);
    release_event(events[1]);
    release_event(marker_event);
}
6.15.19.9. Built-in Functions - Helper Functions
Built-in Function | Description |
---|---|
queue_t get_default_queue(void) | Returns the default device queue. If a default device queue has not been created, CLK_NULL_QUEUE is returned. |
ndrange_t ndrange_1D(size_t global_work_size) | Builds a 1D, 2D or 3D ND-range descriptor. |