An Analysis of the Neon Approximate Reciprocal Instruction vrecpe_u32

Latest recommended article published on 2023-11-23 15:39:20

weixin_30411819



License: CC 4.0 BY-SA

Tags: embedded

Original link: http://www.cnblogs.com/pepetang/p/7777243.html

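This article describes how to use ARM Neon's vrecpe_u32 instruction to compute approximate reciprocals of integers, and what to watch out for when doing so. It shows how to normalize the input with the vclz instruction and examines the output precision and performance of vrecpe_u32.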

When reposting, please credit the source: http://www.cnblogs.com/pepetang/p/7777243.html

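Continuing from the previous article: while learning to use the Neon Intrinsics interface provided by ARM, I could solve most implementation problems with a Google search, but integer division kept me stuck for a long time. On one hand, I tried a lookup table to obtain the reciprocal of the divisor, but the Neon table-lookup instructions have so many restrictions that I gave up on that route. On the other hand, I could never figure out how to use the approximate reciprocal instruction for unsigned integers, so to finish the code quickly I settled for the floating-point version of the instruction plus type conversions. When I came back to optimize performance, I still wanted to get to the bottom of it. I remembered a discussion about implementing division that I had seen on the ARM forum and decided to try one of its answers, excerpted below: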

vrecpe.u32 takes normalized inputs, similar to how floating point significant data is usually stored. What that means is that the input has no leading zeroes past the first bit that's always 0. So the top two bits will always be 01.
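The normalization idea can be sketched in portable scalar C, without any Neon intrinsics. This is only an illustration: the helper names `count_leading_zeros_u32` and `normalize_u32` are hypothetical. It assumes the convention in the unsigned reciprocal estimate pseudocode of the Arm Architecture Reference Manual, where the operand is read as a fixed-point fraction in [0.5, 1.0) and so has its most significant bit set. If the "01" reading quoted above is the one intended, the shift would be one bit smaller.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of leading zero bits in a nonzero 32-bit value.
 * A scalar stand-in for what a count-leading-zeros instruction computes per lane. */
static int count_leading_zeros_u32(uint32_t x)
{
    assert(x != 0);              /* zero has no set bit to normalize around */
    int n = 0;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Hypothetical helper: shift x left until bit 31 is set (assumed convention).
 * The shift amount is returned through *shift so that a later result can be
 * rescaled by the same power of two. */
static uint32_t normalize_u32(uint32_t x, int *shift)
{
    *shift = count_leading_zeros_u32(x);
    return x << *shift;
}
```

For example, calling `normalize_u32(5u, &shift)` sets `shift` to 29 and returns `0xA0000000`, which reads as 0.625 when interpreted as a fraction of 2^32. Keeping the shift amount is the point of the exercise, because the reciprocal estimate of the normalized value has to be scaled back by it.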

Another way to look at it is that vrecpe.u32 works on values between 0.5 and 1.0 (non-inclusive), where the f