Please credit the source when reposting: http://www.cnblogs.com/pepetang/p/7777243.html
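Continuing from the previous post: while learning to use the Neon Intrinsics interface provided by ARM, I was able to solve nearly every implementation problem by searching Google. The one exception was integer division, which I struggled with for a long time. On the one hand, I tried a lookup table to obtain the reciprocal of the divisor, but at one point gave up because the table-lookup instructions Neon provides are too restrictive. On the other hand, I could never work out how to use the approximate-reciprocal instruction for unsigned integers, so to finish the coding task quickly I settled for the floating-point version of that instruction plus type conversions. Coming back to the code for performance optimization, I still wanted to get to the bottom of it. I recalled a discussion about implementing division that I had seen earlier on the ARM forum, and decided to try one of its answers. That answer is excerpted below: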
vrecpe.u32 takes normalized inputs, similar to how floating point significant data is usually stored. What that means is that the input has no leading zeroes past the first bit that's always 0. So the top two bits will always be 01.
Another way to look at it is that vrecpe.u32 works on values between 0.5 and 1.0 (non-inclusive), where the f