Came across a thread discussing alpha blending

Article link: https://blog.youkuaiyun.com/xueyong1203/article/details/712637

One poster described the fastest way to loop over an array:

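try {
    // No loop condition: count down until the array access itself fails.
    for (int i = limit; ; --i) {
        // do my stuff
    }
}
catch (ArrayIndexOutOfBoundsException e) { /* expected: marks the end of the loop */ }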

// Explanation

Woah, turns out he's right! At first I thought it was going to be slower, but then I got to thinking... array bounds are always checked by the JVM, so in essence, it's already doing the comparison for us, probably at a lower level than the loop could. The expense of throwing and catching the exception is minuscule in comparison to doing 387189 useless index tests. Making this change shaved off another 20ms, setting the new record at 600ms for 10 consecutive full screen blends of two 176x220 textures into a third texture on my Samsung A920.
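A minimal sketch of the two loop styles being compared, for readers who want to try it themselves. The class and method names are hypothetical, not from the thread. The speedup quoted above was measured on one phone's JVM; whether it carries over elsewhere is an open question, since other JVMs may optimize bounds checks differently or make exceptions far more expensive.

```java
// Hypothetical demo class; names are illustrative, not from the original post.
class ExceptionLoopSketch {

    // Conventional loop: the index is tested explicitly on every iteration.
    static long sumWithBoundsTest(int[] data) {
        long sum = 0;
        for (int i = data.length - 1; i >= 0; --i) {
            sum += data[i];
        }
        return sum;
    }

    // Exception-terminated loop: there is no loop condition, so the JVM's own
    // bounds check ends the loop when i reaches -1.
    static long sumUntilException(int[] data) {
        long sum = 0;
        try {
            for (int i = data.length - 1; ; --i) {
                sum += data[i];
            }
        } catch (ArrayIndexOutOfBoundsException e) {
            // Expected: i ran past the start of the array.
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] pixels = new int[176 * 220]; // one full screen, as in the post
        java.util.Arrays.fill(pixels, 1);
        System.out.println(sumWithBoundsTest(pixels));  // prints 38720
        System.out.println(sumUntilException(pixels));  // prints 38720
    }
}
```

Both methods return the same result; only the way the loop ends differs. Timing them on the target device is the only reliable way to tell which is faster there.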

The fastest blended pixel award now goes to... some guy whose name I can't spell! "This pixel was rendered in 1.5495867768595041322314049586777 microseconds and is being widely hailed as a remarkable achievement in the mobile programming world. The critics are raving! Do you have anything you'd like to add?"

His suggestion in the alpha blending discussion:

Holy Jesus, I'm running out of awards!

Using the bit masks was horrible: 10 blends of two 176x220 images into a third went depressingly slow, at around a second. Then I decided to try using bit shifts, just to see if it'd be faster. And woah! Best time I got is 520ms. EARTH-SHATTERING!

My best guess is that when you use the bit shifts and mask them, the compiler concludes that you're after the individual bytes, and addresses them directly. When you use an outright channel mask, it has to load the mask and run it for the entire int. But again, I know next to nothing about how the JVM works, so I could be way off. In any case, this code is fast.

Code:

(( ((dest >> 16) & 0xFF) + ( ((((src >> 16) & 0xFF)-((dest >> 16) & 0xFF))*alphasrc) >>8) ) << 16) |
(( ((dest >> 8) & 0xFF) + ( ((((src >> 8) & 0xFF)-((dest >> 8) & 0xFF))*alphasrc) >>8) ) << 8) |
(( ((dest ) & 0xFF) + ( ((((src ) & 0xFF)-((dest ) & 0xFF))*alphasrc) >>8) ) )

I tried some other mutations of the above, but none were any faster.
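To make the expression above easier to experiment with, here is a sketch that wraps it in a method and applies it across whole buffers. The names `blend` and `blendInto` are hypothetical, and the sketch assumes `alphasrc` is in the range 0..255 and that pixels are packed as 0xRRGGBB. As in the original expression, the top byte of the result is left at zero.

```java
// Hypothetical wrapper around the shift-and-mask blend; names are illustrative.
class AlphaBlendSketch {

    // Per channel: result = dest + ((src - dest) * alpha) >> 8.
    // Assumes alpha is in 0..255; the shift by 8 divides by 256, so
    // alpha = 255 comes out very close to, but not exactly, the source colour.
    static int blend(int src, int dest, int alpha) {
        int dr = (dest >> 16) & 0xFF, sr = (src >> 16) & 0xFF;
        int dg = (dest >> 8) & 0xFF,  sg = (src >> 8) & 0xFF;
        int db = dest & 0xFF,         sb = src & 0xFF;

        int r = dr + (((sr - dr) * alpha) >> 8);
        int g = dg + (((sg - dg) * alpha) >> 8);
        int b = db + (((sb - db) * alpha) >> 8);

        return (r << 16) | (g << 8) | b;
    }

    // Blends two equally sized pixel buffers into a third.
    static void blendInto(int[] src, int[] dest, int[] out, int alpha) {
        for (int i = 0; i < out.length; i++) {
            out[i] = blend(src[i], dest[i], alpha);
        }
    }

    public static void main(String[] args) {
        // White over black at roughly half opacity.
        System.out.println(Integer.toHexString(blend(0xFFFFFF, 0x000000, 128))); // prints 7f7f7f
    }
}
```

Pulling the channels into locals is for readability only; the post's timings apply to the single inlined expression, and this sketch makes no claim about matching them.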

Best time per pixel: 1.3429752066115702479338842975207 microseconds
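The per-pixel figures quoted in these posts are consistent with dividing the total time by 10 × 176 × 220 = 387,200 blended pixels. A small sketch of that arithmetic, with a hypothetical helper name:

```java
// Hypothetical helper reproducing the per-pixel arithmetic; not from the thread.
class PixelTimingSketch {

    // Total time in milliseconds divided by the number of blended pixels,
    // expressed in microseconds.
    static double microsPerPixel(double totalMillis, int blends, int width, int height) {
        long pixels = (long) blends * width * height;
        return totalMillis * 1000.0 / pixels;
    }

    public static void main(String[] args) {
        System.out.println(microsPerPixel(600, 10, 176, 220)); // about 1.5496 for the 600ms run
        System.out.println(microsPerPixel(520, 10, 176, 220)); // about 1.3430 for the 520ms run
    }
}
```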