7. OpenCL Embedded Profile
The OpenCL specification describes the feature requirements for desktop platforms. This section describes the OpenCL embedded profile that allows us to target a subset of the OpenCL specification for handheld and embedded platforms. The optional extensions defined in the OpenCL Extension Specification apply to both profiles.
The OpenCL embedded profile has the following restrictions until version 2.0 (i.e. the optionality described below was deprecated by version 2.0):
1. Support for 3D images is optional.
If CL_DEVICE_IMAGE3D_MAX_WIDTH, CL_DEVICE_IMAGE3D_MAX_HEIGHT and CL_DEVICE_IMAGE3D_MAX_DEPTH are zero, calls to clCreateImage or clCreateImageWithProperties will fail to create the 3D image, and the errcode_ret argument will return CL_INVALID_OPERATION. Declaring arguments of type image3d_t in a kernel will result in a compilation error.
If CL_DEVICE_IMAGE3D_MAX_WIDTH, CL_DEVICE_IMAGE3D_MAX_HEIGHT and CL_DEVICE_IMAGE3D_MAX_DEPTH are greater than zero, calls to clCreateImage and clCreateImageWithProperties will behave as described for full profile implementations, and the image3d_t data type can be used in a kernel.
2. Support for 2D image array writes is optional. If the cles_khr_2d_image_array_writes extension is supported by the embedded profile, writes to 2D image arrays are supported.
3. Images and image arrays created with an image_channel_data_type value of CL_FLOAT or CL_HALF_FLOAT can only be used with samplers that use a filter mode of CL_FILTER_NEAREST. The values returned by read_imagef [34] for 2D and 3D images are undefined if the image_channel_data_type value is CL_FLOAT or CL_HALF_FLOAT and the sampler uses filter_mode = CL_FILTER_LINEAR.
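Restriction 1 above amounts to a simple test on three device limits. The following is a minimal sketch of that test in plain C. The `image3d_limits` struct and the `image3d_supported` function are hypothetical names introduced only for illustration; in a real program the three values would come from querying CL_DEVICE_IMAGE3D_MAX_WIDTH, CL_DEVICE_IMAGE3D_MAX_HEIGHT and CL_DEVICE_IMAGE3D_MAX_DEPTH. The text does not say what happens when only some of the limits are zero, so the sketch conservatively treats that case as unsupported.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical holder for the three 3D image limits of a device. */
typedef struct {
    size_t max_width;   /* value of CL_DEVICE_IMAGE3D_MAX_WIDTH  */
    size_t max_height;  /* value of CL_DEVICE_IMAGE3D_MAX_HEIGHT */
    size_t max_depth;   /* value of CL_DEVICE_IMAGE3D_MAX_DEPTH  */
} image3d_limits;

/* Returns true only when all three limits are greater than zero.
   Assumption: a mix of zero and non-zero limits counts as "no 3D images". */
bool image3d_supported(const image3d_limits *lim)
{
    return lim->max_width > 0 && lim->max_height > 0 && lim->max_depth > 0;
}
```

An application could run such a check before creating a 3D image or building a kernel that declares an image3d_t argument, and fall back to 2D images otherwise.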
Furthermore, the OpenCL embedded profile has the following restrictions for all versions:
1. 64-bit integers, i.e. long and ulong, including the appropriate vector data types and operations on 64-bit integers, are optional. The cles_khr_int64 [35] extension string will be reported if the embedded profile implementation supports 64-bit integers. If double precision is supported, i.e. CL_DEVICE_DOUBLE_FP_CONFIG is not zero, then cles_khr_int64 must also be supported.
2. The mandated minimum single precision floating-point capability given by CL_DEVICE_SINGLE_FP_CONFIG is CL_FP_ROUND_TO_ZERO or CL_FP_ROUND_TO_NEAREST. If CL_FP_ROUND_TO_NEAREST is supported, the default rounding mode will be round to nearest even; otherwise the default rounding mode will be round to zero.
3. The single precision floating-point operations (addition, subtraction and multiplication) shall be correctly rounded. Zero results may always be positive 0.0. The accuracy of division and sqrt is given in the OpenCL C and OpenCL SPIR-V Environment specifications.
If CL_FP_INF_NAN is not set in CL_DEVICE_SINGLE_FP_CONFIG, and one of the operands or the result of addition, subtraction, multiplication or division would signal the overflow or invalid exception (see IEEE 754 specification), the value of the result is implementation-defined. Likewise, single precision comparison operators (<, >, <=, >=, ==, !=) return implementation-defined values when one or more operands is a NaN.
In all cases, conversions (see the OpenCL C and OpenCL SPIR-V Environment specifications) shall be correctly rounded as described for the FULL_PROFILE, including those that consume or produce an INF or NaN. The built-in math functions shall behave as described for the FULL_PROFILE, including edge case behavior, but with slightly different accuracy rules. Edge case behavior and accuracy rules are described in the OpenCL C and OpenCL SPIR-V Environment specifications.
If addition, subtraction and multiplication have default round to zero rounding mode, then fract, fma and fdim shall produce the correctly rounded result for round to zero rounding mode.
This relaxation of the requirement to adhere to IEEE 754 requirements for basic floating-point operations, though extremely undesirable, is to provide flexibility for embedded devices that have much stricter requirements on hardware area budgets.
4. Denormalized numbers for the half data type, which may be generated when converting a float to a half using variants of the vstore_half function or when converting from a half to a float using variants of the vload_half function, can be flushed to zero. See the OpenCL SPIR-V Environment Specification for details.
5. The precision of conversions from CL_UNORM_INT8, CL_SNORM_INT8, CL_UNORM_INT16, CL_SNORM_INT16, CL_UNORM_INT_101010, and CL_UNORM_INT_101010_2 to float is ≤ 2 ulp for the embedded profile instead of ≤ 1.5 ulp as defined in the full profile. The exception cases described in the full profile and given below apply to the embedded profile.
For CL_UNORM_INT8
- 0 must convert to 0.0f and
- 255 must convert to 1.0f
For CL_UNORM_INT16
- 0 must convert to 0.0f and
- 65535 must convert to 1.0f
For CL_SNORM_INT8
- -128 and -127 must convert to -1.0f,
- 0 must convert to 0.0f and
- 127 must convert to 1.0f
For CL_SNORM_INT16
- -32768 and -32767 must convert to -1.0f,
- 0 must convert to 0.0f and
- 32767 must convert to 1.0f
For CL_UNORM_INT_101010
- 0 must convert to 0.0f and
- 1023 must convert to 1.0f
For CL_UNORM_INT_101010_2
- 0 must convert to 0.0f and
- 1023 must convert to 1.0f (for RGB)
- 3 must convert to 1.0f (for A)
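The exception cases for the 8-bit and 16-bit formats can be illustrated with a small host-side sketch. The function names below are hypothetical, and the formulas (divide by the largest positive code, clamp signed results at -1.0f) are one straightforward way to satisfy the listed cases, assumed here for illustration; the normative conversion rules are the ones in the OpenCL C and OpenCL SPIR-V Environment specifications.

```c
#include <stdint.h>

/* CL_UNORM_INT8: 0 maps to 0.0f and 255 maps to 1.0f. */
float unorm8_to_float(uint8_t v)
{
    return (float)v / 255.0f;
}

/* CL_UNORM_INT16: 0 maps to 0.0f and 65535 maps to 1.0f. */
float unorm16_to_float(uint16_t v)
{
    return (float)v / 65535.0f;
}

/* CL_SNORM_INT8: -128 and -127 both map to -1.0f, 127 maps to 1.0f.
   Clamping handles -128, which would otherwise fall below -1.0f. */
float snorm8_to_float(int8_t v)
{
    float f = (float)v / 127.0f;
    return f < -1.0f ? -1.0f : f;
}

/* CL_SNORM_INT16: -32768 and -32767 both map to -1.0f, 32767 maps to 1.0f. */
float snorm16_to_float(int16_t v)
{
    float f = (float)v / 32767.0f;
    return f < -1.0f ? -1.0f : f;
}
```

The end points are exact in this sketch because each one divides a value by itself or divides zero; only intermediate codes are subject to the ulp bound discussed above.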
CL_PLATFORM_PROFILE defined in the OpenCL Platform Queries table will return the string EMBEDDED_PROFILE if the OpenCL implementation supports the embedded profile only.
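As a small illustration of the profile query, the sketch below tests a profile string for the embedded profile. The helper name `is_embedded_profile` is hypothetical; the string itself would normally be obtained by querying CL_PLATFORM_PROFILE, which is not shown here so that the example stays free of OpenCL headers.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true when the given profile string names the embedded profile.
   A NULL pointer is treated as "not embedded". */
bool is_embedded_profile(const char *profile)
{
    return profile != NULL && strcmp(profile, "EMBEDDED_PROFILE") == 0;
}
```

Code that relies on full profile guarantees, such as 64-bit integer support, could use a check like this to decide whether to query for the corresponding extensions first.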
The minimum maximum values specified in the OpenCL Device Queries table that have been modified for the OpenCL embedded profile are listed in the OpenCL Embedded Device Queries table.
Device Info | Return Type | Description |
---|---|---|
| Max number of image objects arguments of a kernel declared with the | |
| Max number of image objects arguments of a kernel declared with the | |
| Max number of image objects arguments of a kernel declared with the | |
| Max width of 2D image in pixels. The minimum value is 2048 if CL_DEVICE_IMAGE_SUPPORT is CL_TRUE, the value is 0 otherwise. | |
| Max height of 2D image in pixels. The minimum value is 2048 if CL_DEVICE_IMAGE_SUPPORT is CL_TRUE, the value is 0 otherwise. | |
| Max width of 3D image in pixels. The minimum value is 2048 if CL_DEVICE_IMAGE_SUPPORT is CL_TRUE, the value is 0 otherwise. | |
| Max height of 3D image in pixels. The minimum value is 2048 if CL_DEVICE_IMAGE_SUPPORT is CL_TRUE, the value is 0 otherwise. | |
| Max depth of 3D image in pixels. The minimum value is 2048 if CL_DEVICE_IMAGE_SUPPORT is CL_TRUE, the value is 0 otherwise. | |
| Max number of pixels for a 1D image created from a buffer object. The minimum value is 2048 if CL_DEVICE_IMAGE_SUPPORT is CL_TRUE, the value is 0 otherwise. | |
| Max number of images in a 1D or 2D image array. The minimum value is 256 if CL_DEVICE_IMAGE_SUPPORT is CL_TRUE, the value is 0 otherwise. | |
| Maximum number of samplers that can be used in a kernel. The minimum value is 8 if CL_DEVICE_IMAGE_SUPPORT is CL_TRUE, the value is 0 otherwise. | |
| Max size in bytes of all arguments that can be passed to a kernel. The minimum value is 256 bytes for devices that are not of type CL_DEVICE_TYPE_CUSTOM. A maximum of 255 arguments can be passed to a kernel. | |
| Describes single precision floating-point capability of the device. This is a bit-field that describes one or more of the following values: CL_FP_DENORM - denorms are supported; CL_FP_INF_NAN - INF and quiet NaNs are supported; CL_FP_ROUND_TO_NEAREST - round to nearest even rounding mode supported; CL_FP_ROUND_TO_ZERO - round to zero rounding mode supported; CL_FP_ROUND_TO_INF - round to positive and negative infinity rounding modes supported; CL_FP_FMA - IEEE754-2008 fused multiply-add is supported; CL_FP_CORRECTLY_ROUNDED_DIVIDE_SQRT - divide and sqrt are correctly rounded as defined by the IEEE754 specification; CL_FP_SOFT_FLOAT - basic floating-point operations (such as addition, subtraction, multiplication) are implemented in software. The mandated minimum floating-point capability is: CL_FP_ROUND_TO_ZERO or CL_FP_ROUND_TO_NEAREST for devices that are not of type CL_DEVICE_TYPE_CUSTOM. | |
| Max size in bytes of a constant buffer allocation. The minimum value is 1 KB for devices that are not of type CL_DEVICE_TYPE_CUSTOM. | |
| Max number of arguments declared with the | |
| Size of local memory arena in bytes. The minimum value is 1 KB for devices that are not of type CL_DEVICE_TYPE_CUSTOM. | |
| Is CL_FALSE if the implementation does not have a compiler available to compile the program source. Is CL_TRUE if the compiler is available. This can be CL_FALSE for the embedded platform profile only. | |
| Is CL_FALSE if the implementation does not have a linker available. Is CL_TRUE if the linker is available. This can be CL_FALSE for the embedded platform profile only. This must be CL_TRUE if CL_DEVICE_COMPILER_AVAILABLE is CL_TRUE | |
| The max. size of the device queue in bytes. The minimum value is 64 KB for the embedded profile. | |
| Maximum size in bytes of the internal buffer that holds the output of printf calls from a kernel. The minimum value for the EMBEDDED profile is 1 KB. |
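The single precision capability bit-field described in the table determines the default rounding mode, as stated earlier in this section. The sketch below shows that decision in plain C. The `FP_*` macros are local stand-ins assumed to mirror the bit positions of CL_FP_ROUND_TO_NEAREST and CL_FP_ROUND_TO_ZERO in the Khronos cl.h header, and the enum and function names are hypothetical; a real program would include the OpenCL headers and pass in the value queried for CL_DEVICE_SINGLE_FP_CONFIG.

```c
#include <stdint.h>

/* Local stand-ins (assumption: same bit positions as the cl.h flags). */
#define FP_ROUND_TO_NEAREST (1u << 2)  /* mirrors CL_FP_ROUND_TO_NEAREST */
#define FP_ROUND_TO_ZERO    (1u << 3)  /* mirrors CL_FP_ROUND_TO_ZERO    */

typedef enum {
    ROUND_UNSUPPORTED,   /* neither mandated mode is reported */
    ROUND_NEAREST_EVEN,  /* default when round to nearest is supported */
    ROUND_TOWARD_ZERO    /* default otherwise */
} default_rounding;

/* Picks the default rounding mode of an embedded profile device from its
   single precision floating-point configuration bit-field. */
default_rounding embedded_default_rounding(uint64_t single_fp_config)
{
    if (single_fp_config & FP_ROUND_TO_NEAREST)
        return ROUND_NEAREST_EVEN;
    if (single_fp_config & FP_ROUND_TO_ZERO)
        return ROUND_TOWARD_ZERO;
    return ROUND_UNSUPPORTED;
}
```

The `ROUND_UNSUPPORTED` result should not occur for a conforming device that is not of type CL_DEVICE_TYPE_CUSTOM, since one of the two modes is mandated; it is kept so that the function is total.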
If CL_DEVICE_IMAGE_SUPPORT specified in the OpenCL Device Queries table is CL_TRUE, the values assigned to CL_DEVICE_MAX_READ_IMAGE_ARGS, CL_DEVICE_MAX_WRITE_IMAGE_ARGS, CL_DEVICE_IMAGE2D_MAX_WIDTH, CL_DEVICE_IMAGE2D_MAX_HEIGHT, CL_DEVICE_IMAGE3D_MAX_WIDTH, CL_DEVICE_IMAGE3D_MAX_HEIGHT, CL_DEVICE_IMAGE3D_MAX_DEPTH, and CL_DEVICE_MAX_SAMPLERS by the implementation must be greater than or equal to the minimum values specified in the OpenCL Embedded Device Queries table.
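The requirement above can be read as a validation rule over the values a device reports. The following sketch checks only the limits whose embedded profile minimums are stated in the table in this section (2048 for the 2D and 3D image dimensions, 8 for samplers); the minimums for CL_DEVICE_MAX_READ_IMAGE_ARGS and CL_DEVICE_MAX_WRITE_IMAGE_ARGS are not given in the text here and are therefore left out. The struct and function names are hypothetical, and the sketch ignores the pre-2.0 case in which 3D image support is optional.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the image-related limits reported by a device. */
typedef struct {
    bool     image_support;       /* CL_DEVICE_IMAGE_SUPPORT       */
    size_t   image2d_max_width;   /* CL_DEVICE_IMAGE2D_MAX_WIDTH   */
    size_t   image2d_max_height;  /* CL_DEVICE_IMAGE2D_MAX_HEIGHT  */
    size_t   image3d_max_width;   /* CL_DEVICE_IMAGE3D_MAX_WIDTH   */
    size_t   image3d_max_height;  /* CL_DEVICE_IMAGE3D_MAX_HEIGHT  */
    size_t   image3d_max_depth;   /* CL_DEVICE_IMAGE3D_MAX_DEPTH   */
    unsigned max_samplers;        /* CL_DEVICE_MAX_SAMPLERS        */
} device_image_limits;

/* Returns true when the reported limits are at least the embedded profile
   minimums listed in the table. Without image support the minimums do not
   apply, so the check passes trivially. */
bool meets_embedded_image_minimums(const device_image_limits *d)
{
    if (!d->image_support)
        return true;
    return d->image2d_max_width  >= 2048 &&
           d->image2d_max_height >= 2048 &&
           d->image3d_max_width  >= 2048 &&
           d->image3d_max_height >= 2048 &&
           d->image3d_max_depth  >= 2048 &&
           d->max_samplers       >= 8;
}
```

A conformance-style test harness could fill the struct from device queries and report any device for which the function returns false.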
If CL_DEVICE_IMAGE_SUPPORT specified in the OpenCL Device Queries table is CL_TRUE, the minimum list of supported image formats for either reading or writing in a kernel for embedded profile devices is:
num_channels | channel_order | channel_data_type |
---|---|---|
4 | CL_UNORM_INT8 |
For embedded profile devices that support reading from and writing to the same image object from the same kernel instance (see CL_DEVICE_MAX_READ_WRITE_IMAGE_ARGS) there is no required minimum list of supported image formats.