SSE instruction optimization challenges in C/C++

Many suggest that the core of a ray tracer should be implemented with SSE/SSE2, and several SIMD-based ray tracers have even been published. People say the performance gain is amazing…

Very attractive, isn’t it? But SSE/SSE2 puts a critical constraint on the data its instructions can handle: memory accessed by the aligned load/store instructions must be 16-byte aligned, or the program faults at runtime. Unaligned loads and stores exist (MOVUPS, or _mm_loadu_ps in intrinsics), but they have traditionally been slower. It is a typical trade-off between performance and convenience. (Another, more common case is the trade-off between performance and memory footprint.)
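To make the constraint concrete, here is a minimal sketch of the two kinds of load. The helper names (is_aligned16, load4_aligned, load4_any, hsum) are made up for illustration; only the _mm_* intrinsics come from <xmmintrin.h>.

```cpp
#include <cassert>
#include <cstdint>
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical helper: true if p sits on a 16-byte boundary.
inline bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Aligned load. The assert catches a misaligned pointer in debug builds;
// without it, a misaligned address may fault at runtime.
inline __m128 load4_aligned(const float* p) {
    assert(is_aligned16(p));
    return _mm_load_ps(p);
}

// Unaligned load: legal for any address.
inline __m128 load4_any(const float* p) {
    return _mm_loadu_ps(p);
}

// Adds the four lanes of v together.
inline float hsum(__m128 v) {
    alignas(16) float out[4];
    _mm_store_ps(out, v);
    return out[0] + out[1] + out[2] + out[3];
}
```

With an alignas(16) float buffer buf, load4_aligned(buf) is fine, while buf + 1 is only safe to pass to load4_any.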

Because of this awkward constraint, data intended for SSE/SSE2 has to be placed deliberately:
    -- heap vars: _aligned_malloc
    -- global vars: __declspec(align(16))
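The two placements listed above might look like the sketch below. It swaps in portable spellings, which is my assumption rather than anything the sources above use: alignas(16) from C++11 instead of __declspec(align(16)), and _mm_malloc/_mm_free instead of _aligned_malloc. The names g_scale, alloc_floats16 and scale_in_place are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <xmmintrin.h>  // SSE intrinsics; also brings in _mm_malloc/_mm_free

// Global: alignas(16) plays the role of __declspec(align(16)).
alignas(16) float g_scale[4] = {2.0f, 2.0f, 2.0f, 2.0f};

// Heap: _mm_malloc(size, alignment) returns storage with the requested
// alignment. Release it with _mm_free, not free() or delete.
inline float* alloc_floats16(std::size_t count) {
    return static_cast<float*>(_mm_malloc(count * sizeof(float), 16));
}

// Multiplies each group of four floats by g_scale, in place.
// Assumes data is 16-byte aligned and count is a multiple of 4.
inline void scale_in_place(float* data, std::size_t count) {
    assert(reinterpret_cast<std::uintptr_t>(data) % 16 == 0);
    const __m128 s = _mm_load_ps(g_scale);
    for (std::size_t i = 0; i < count; i += 4) {
        _mm_store_ps(data + i, _mm_mul_ps(_mm_load_ps(data + i), s));
    }
}
```

A caller would allocate with alloc_floats16(n), fill the buffer, call scale_in_place, and finally hand the pointer to _mm_free.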

An Intel engineer has implemented an SSE-based ray tracer successfully: http://software.intel.com/en-us/articles/architecture-of-a-real-time-ray-tracer/ I didn’t see any alignment decorators in it, so I guess he used the Intel compiler… Another paper on the subject:

http://www.computer.org/portal/web/csdl/doi/10.1109/TVCG.2009.73

 

Another constraint: with Visual C++, std::vector cannot contain __declspec(align(16)) data.
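One common workaround is to give std::vector an allocator that hands out 16-byte-aligned storage. The sketch below is only that, a sketch: Vec4, Aligned16Allocator, Vec4Array and sum_all are hypothetical names, it needs a C++11 compiler, and I have not checked it against the older Visual C++ versions that show the problem. (Since C++17 the default allocator is supposed to handle over-aligned types by itself.)

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>
#include <xmmintrin.h>  // SSE intrinsics, _mm_malloc/_mm_free

// Hypothetical 16-byte-aligned element type.
struct alignas(16) Vec4 { float f[4]; };

// Minimal C++11 allocator that returns 16-byte-aligned blocks.
template <class T>
struct Aligned16Allocator {
    using value_type = T;
    Aligned16Allocator() = default;
    template <class U>
    Aligned16Allocator(const Aligned16Allocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        void* p = _mm_malloc(n * sizeof(T), 16);
        if (!p) throw std::bad_alloc();
        return static_cast<T*>(p);
    }
    void deallocate(T* p, std::size_t) noexcept { _mm_free(p); }
};

// The allocator is stateless, so any two instances are interchangeable.
template <class T, class U>
bool operator==(const Aligned16Allocator<T>&, const Aligned16Allocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const Aligned16Allocator<T>&, const Aligned16Allocator<U>&) { return false; }

using Vec4Array = std::vector<Vec4, Aligned16Allocator<Vec4>>;

// Lane-wise sum of all elements, using aligned loads on the vector's storage.
inline Vec4 sum_all(const Vec4Array& a) {
    assert(reinterpret_cast<std::uintptr_t>(a.data()) % 16 == 0);
    __m128 acc = _mm_setzero_ps();
    for (const Vec4& v : a) acc = _mm_add_ps(acc, _mm_load_ps(v.f));
    Vec4 out;
    _mm_store_ps(out.f, acc);
    return out;
}
```

Vec4Array then behaves like an ordinary std::vector (push_back, resize, iteration), while every element stays safe for aligned loads.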

 

At the same time, some complain that hand-written SSE code is no faster than code optimized by VC. PBRT doesn’t use SSE either, although its makefile does pass an SSE option to the compiler. Maybe modern compilers are smart enough to generate SSE/SSE2 code on their own, who knows… Anyway, it is still worth further investigation.
