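1. Original code
This code is very simple: it reverses the pixels within each row, which flips the image horizontally. Note that the pixels are RGBA, so each pixel occupies 32 bits and fits in one uint32_t.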
uint32_t line;
// 1. For an image of width w and height h, for all lines in the image, do the following.
for (line = 0; line < h; line++) {
    uint32_t *left = raster + (line * w);
    uint32_t *right = left + w - 1;
    // 2. Swap the pixel at the beginning of the line with the pixel at the end,
    //    then work inwards, swapping as we go.
    while (left < right) {
        // 3. Swap two pixels on the same line
        uint32_t temp = *left;
        *left = *right;
        *right = temp;
        left++;
        right--;
    }
}
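The snippet above relies on raster, w and h being declared elsewhere. As a minimal sketch, assuming the image is stored row by row with one uint32_t per pixel, the same loop can be wrapped in a self-contained function. The name flip_rows_scalar is hypothetical and does not come from the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper name, used here for illustration only.
// Flips an RGBA image horizontally in place. raster holds w * h pixels,
// one uint32_t per pixel, stored row by row (an assumption of this sketch).
static void flip_rows_scalar(uint32_t *raster, uint32_t w, uint32_t h) {
    assert(raster != NULL);
    if (w < 2) {
        return;  // nothing to swap in rows of zero or one pixel
    }
    for (uint32_t line = 0; line < h; line++) {
        uint32_t *left = raster + (size_t)line * w;
        uint32_t *right = left + w - 1;
        // Swap the outermost pair, then work inwards.
        while (left < right) {
            uint32_t temp = *left;
            *left++ = *right;
            *right-- = temp;
        }
    }
}
```

A function like this is also useful as a reference when checking that an optimized version produces the same output.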
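2. Neon optimization
We use Neon to optimize this. First we load four pixels from the left end of the row and four pixels from the right end, then we use the table lookup intrinsic vqtbl1q_u8 to select bytes. Because we are only reversing the order of the pixels, the order of the four RGBA channels inside each pixel stays the same. We create a lookup table (reverseIndices) that reverses the four pixels within a block: the four left-hand pixels are reordered from right to left, so the leftmost pixel ends up on the right, and the four right-hand pixels are reordered the same way, so the rightmost pixel ends up on the left. The two reversed blocks are then stored in each other's positions.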
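To make the byte selection concrete, here is a small portable C model of the lookup. It is a sketch, not Neon code: table_lookup_16 and make_reverse_indices are hypothetical helper names, and the model only mirrors the documented behavior of a 16-byte table lookup, where an index above 15 yields 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: portable model of a 16-byte table lookup.
// out[i] = table[indices[i]], or 0 when the index is out of range.
// out must not alias table.
static void table_lookup_16(const uint8_t table[16], const uint8_t indices[16],
                            uint8_t out[16]) {
    assert(table != NULL && indices != NULL && out != NULL);
    for (int i = 0; i < 16; i++) {
        out[i] = indices[i] < 16 ? table[indices[i]] : 0;
    }
}

// Hypothetical helper: fills indices with
// 12,13,14,15, 8,9,10,11, 4,5,6,7, 0,1,2,3.
static void make_reverse_indices(uint8_t indices[16]) {
    for (int i = 0; i < 16; i++) {
        int pixel = 3 - i / 4;  // output pixel 0 reads input pixel 3, and so on
        int channel = i % 4;    // the channel position inside the pixel is kept
        indices[i] = (uint8_t)(pixel * 4 + channel);
    }
}
```

With input pixels W, X, Y and Z, the output of table_lookup_16 holds Z, Y, X and W, each with its R, G, B and A bytes in the original order. The Neon loop performs this selection with a single vqtbl1q_u8 per block: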
for (line = 0; line < h; line++) {
    uint32_t *left = raster + (line * w);
    // right points at the first of the last four pixels in the row
    // (this assumes w is at least 4).
    uint32_t *right = left + w;
    right -= 4;
    // Create an index table that selects the bytes of four pixels in
    // reversed pixel order, relative to the left and right base addresses.
    //
    // This index table is used later by the table lookup intrinsic vqtbl1q_u8.
    //
    // The indices swap the pixel order while preserving the channel order
    // within each pixel, remembering that each pixel is 4 bytes.
    //
    // For example, if we have four RGBA pixels W, X, Y and Z, we swap
    // them to Z, Y, X and W.
    //
    // This example uses decimal values to aid comprehension.
    //
    // uint8_t reverseIndices[16] = {
    //     Fetch pixel Z's RGBA components (indices R=12, G=13, B=14 and A=15)
    //     and place them in indices 0 to 3:
    //     [ 0] 0x0C = 12, // Z.Red   = byte 12 of the input
    //     [ 1] 0x0D = 13, // Z.Green = byte 13 of the input
    //     [ 2] 0x0E = 14, // Z.Blue  = byte 14 of the input
    //     [ 3] 0x0F = 15, // Z.Alpha = byte 15 of the input
    //     Fetch pixel Y's RGBA components:
    //     [ 4] 0x08 = 8,
    //     [ 5] 0x09 = 9,
    //     [ 6] 0x0A = 10,
    //     [ 7] 0x0B = 11,
    //     Fetch pixel X's RGBA components:
    //     [ 8] 0x04 = 4,
    //     [ 9] 0x05 = 5,
    //     [10] 0x06 = 6,
    //     [11] 0x07 = 7,
    //     Fetch pixel W's RGBA components:
    //     [12] 0x00 = 0,
    //     [13] 0x01 = 1,
    //     [14] 0x02 = 2,
    //     [15] 0x03 = 3 };
    //
    // vcreate_u8 places the least significant byte of its 64-bit argument
    // in lane 0, so the constants below read from right to left.
    uint8x8_t reverse1 = vcreate_u8(0x0B0A09080F0E0D0Cull);
    uint8x8_t reverse2 = vcreate_u8(0x0302010007060504ull);
    uint8x16_t reverseIndices = vcombine_u8(reverse1, reverse2);
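    // Each loop iteration swaps four pixels from the left with
    // four pixels from the right, reversing the order within each
    // batch of four pixels. The two batches must not overlap, so the
    // loop only runs while at least eight unprocessed pixels remain.
    while (right - left >= 4) {
        // Load pixels from the left and reverse their order
        uint8x16_t leftPixels = vld1q_u8((uint8_t*)left);
        uint8x16_t reversedLeftPixels = vqtbl1q_u8(leftPixels, reverseIndices);
        // Load pixels from the right and reverse their order
        uint8x16_t rightPixels = vld1q_u8((uint8_t*)right);
        uint8x16_t reversedRightPixels = vqtbl1q_u8(rightPixels, reverseIndices);
        // Copy the right-hand pixels to the left and the left-hand pixels
        // to the right
        vst1q_u8((uint8_t*)left, reversedRightPixels);
        vst1q_u8((uint8_t*)right, reversedLeftPixels);
        left += 4;
        right -= 4;
    }
    // Swap the remaining middle pixels (at most seven) one at a time.
    // Without this step, a width that is not a multiple of eight would
    // leave the middle of the row unreversed.
    right += 3;  // right now points at the last unprocessed pixel
    while (left < right) {
        uint32_t temp = *left;
        *left++ = *right;
        *right-- = temp;
    }
}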
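The following is a sketch of how the Neon loop could be packaged as a complete function that accepts any width. It rests on several assumptions that are not in the original: the function name flip_rows_neon is hypothetical, the row is tracked with indices instead of pointers so that narrow rows never form an out-of-range pointer, the index table is loaded from a constant array with vld1q_u8 instead of being built with vcreate_u8, and the vector path is guarded by __aarch64__ because vqtbl1q_u8 is an AArch64 intrinsic. On other targets only the scalar loop is compiled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

// Hypothetical helper name. Flips a w x h RGBA image horizontally in place.
static void flip_rows_neon(uint32_t *raster, size_t w, size_t h) {
    assert(raster != NULL);
#if defined(__aarch64__)
    // Byte indices that reverse four 4-byte pixels while keeping RGBA order.
    static const uint8_t kReverse[16] = {12, 13, 14, 15, 8, 9, 10, 11,
                                         4,  5,  6,  7,  0, 1, 2,  3};
    const uint8x16_t reverseIndices = vld1q_u8(kReverse);
#endif
    for (size_t line = 0; line < h; line++) {
        uint32_t *row = raster + line * w;
        size_t lo = 0;  // index of the first unprocessed pixel on the left
        size_t hi = w;  // one past the last unprocessed pixel on the right
#if defined(__aarch64__)
        // Vector path: needs two non-overlapping blocks of four pixels.
        while (hi - lo >= 8) {
            uint8x16_t l = vld1q_u8((const uint8_t *)(row + lo));
            uint8x16_t r = vld1q_u8((const uint8_t *)(row + hi - 4));
            vst1q_u8((uint8_t *)(row + lo), vqtbl1q_u8(r, reverseIndices));
            vst1q_u8((uint8_t *)(row + hi - 4), vqtbl1q_u8(l, reverseIndices));
            lo += 4;
            hi -= 4;
        }
#endif
        // Scalar path: the remaining middle pixels, or the whole row when
        // the vector path is not compiled in.
        while (hi - lo >= 2) {
            uint32_t temp = row[lo];
            row[lo] = row[hi - 1];
            row[hi - 1] = temp;
            lo++;
            hi--;
        }
    }
}
```

For example, with w = 11 the vector path swaps pixels 0 to 3 with pixels 7 to 10, and the scalar path then swaps pixels 4 and 6, leaving pixel 5 in place.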