A Technique for Eliminating Redundant GPU Fragment Shading on Mobile Platforms

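This article introduces a method that uses hardware memoization to reduce redundant fragment shader executions on mobile GPUs. The method preserves rendering quality and eliminates about 60% of the redundant fragment computations, which saves energy and extends battery life. To handle redundancy across frames, it proposes a task-level memoization scheme combined with Parallel Frame Rendering to improve computational efficiency.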

(Summary generated by C知道, powered by the full version of DeepSeek-R1.)

A brief report on the paper "Eliminating Redundant Fragment Shader Executions on a Mobile GPU via Hardware Memoization".


GPUs are built for rendering, whether for movies, for video games, or simply for models. During rendering, some fragments are shaded more than once with exactly the same inputs, in other words redundant rendering, and this is especially common in video games. This paper aims to reduce or eliminate these redundant fragment shader executions. Less computation means longer battery life, so the paper is also about energy saving on the GPU.

The main difficulty of this work is that the redundancy exists across frames, which means it is a temporal problem rather than a spatial one. The paper's scheme removes about 60% of the redundant fragment computations on mobile devices. To remove redundant fragments in the temporal domain, a task-level memoization scheme is added on top of PFR (Parallel Frame Rendering).

However, the programmer cannot access graphics memory directly, so the authors use a hardware structure that computes a signature of all the inputs. When a computation is executed, its input signature and result are cached in a Look-Up Table (LUT). Subsequent executions probe the LUT with their input signature before computing, to find out whether they hit. The concept is quite straightforward.
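
The probe-then-compute flow described above can be sketched in Python as a minimal software model, not the paper's hardware. The names `MemoLUT` and `toy_shader` are hypothetical, the table is unbounded, and it is keyed on the full input tuple, whereas the hardware keys on a compact signature of the inputs.

```python
from typing import Callable, Dict, Tuple

# Hypothetical fragment inputs: interpolated texture coordinates (u, v)
# plus the id of the bound texture. Real shaders have many more inputs.
FragmentInputs = Tuple[float, float, int]
Color = Tuple[int, int, int, int]  # RGBA8 output color


class MemoLUT:
    """Minimal memoization table mapping fragment inputs to a shaded color.

    This sketch keys on the full input tuple; the paper's hardware keys
    on a compact signature of the inputs instead.
    """

    def __init__(self) -> None:
        self.table: Dict[FragmentInputs, Color] = {}
        self.hits = 0
        self.misses = 0

    def shade(self, inputs: FragmentInputs,
              shader: Callable[[FragmentInputs], Color]) -> Color:
        # Probe the LUT before running the shader.
        cached = self.table.get(inputs)
        if cached is not None:
            self.hits += 1
            return cached
        # Miss: execute the shader and remember its result.
        self.misses += 1
        color = shader(inputs)
        self.table[inputs] = color
        return color


def toy_shader(inputs: FragmentInputs) -> Color:
    """Hypothetical referentially transparent shader: output depends only on inputs."""
    u, v, tex_id = inputs
    return (int(u * 255), int(v * 255), tex_id % 256, 255)


if __name__ == "__main__":
    lut = MemoLUT()
    frame = [(0.25, 0.5, 1), (0.75, 0.5, 1), (0.25, 0.5, 1)]
    for _ in range(2):  # two identical consecutive frames
        for frag in frame:
            lut.shade(frag, toy_shader)
    print(lut.hits, lut.misses)  # 4 2
```

Because the toy shader's output depends only on its inputs, returning the cached color on a hit produces exactly the same image as recomputing it.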

The Parallel Frame Rendering (PFR) mentioned above renders two consecutive frames in parallel: the baseline GPU is split into two clusters, with even frames rendered by cluster 0 and odd frames by cluster 1. 50% of the redundant fragments have a reuse distance smaller than 64 fragments, and 61.3% smaller than 2000.
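
To see why PFR helps, here is a toy Python model (an illustration, not the paper's simulator). It assumes a static scene, so consecutive frames contain identical fragment inputs, and that both clusters advance in lockstep and share one LUT, so their accesses alternate fragment by fragment. `sequential_order`, `pfr_order` and `reuse_distances` are hypothetical helper names.

```python
from typing import Dict, Hashable, List, Sequence


def sequential_order(frames: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """One GPU renders frame 0 completely, then frame 1, and so on."""
    return [frag for frame in frames for frag in frame]


def pfr_order(frames: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """PFR-style interleaving: even frames on cluster 0, odd frames on cluster 1.

    Assumes both clusters run in lockstep and share one LUT, so their
    accesses alternate fragment by fragment. Frames are paired (0,1), (2,3), ...
    """
    order: List[Hashable] = []
    for even in range(0, len(frames), 2):
        cluster0 = frames[even]
        cluster1 = frames[even + 1] if even + 1 < len(frames) else []
        for i in range(max(len(cluster0), len(cluster1))):
            if i < len(cluster0):
                order.append(cluster0[i])
            if i < len(cluster1):
                order.append(cluster1[i])
    return order


def reuse_distances(order: Sequence[Hashable]) -> List[int]:
    """Distance (in fragments) from each repeated input to its previous occurrence."""
    last_seen: Dict[Hashable, int] = {}
    distances: List[int] = []
    for pos, frag in enumerate(order):
        if frag in last_seen:
            distances.append(pos - last_seen[frag])
        last_seen[frag] = pos
    return distances


if __name__ == "__main__":
    frame = [("texel", i) for i in range(1000)]  # static scene: identical frames
    frames = [frame, frame]
    print(max(reuse_distances(sequential_order(frames))))  # 1000
    print(max(reuse_distances(pfr_order(frames))))         # 1
```

In the sequential order, a fragment's previous identical execution is a whole frame earlier; with the two frames interleaved it is only one access earlier, which is what makes a small LUT sufficient.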

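There is a trade-off between the cost of the reuse machinery (the memoization structures) and the rendering computation it saves, which is why memoization is applied at task level, to a whole fragment shader execution. The reuse distance must be kept short. A fragment shader produces just a single output color, so each cached result is small. Memoization is only safe if the shader is referentially transparent (the same inputs always produce the same output), and this can be guaranteed by monitoring the API calls.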
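
The need for a short reuse distance can be illustrated with a bounded table. The sketch below assumes an LRU replacement policy and a capacity counted in entries; both are assumptions for illustration, and the paper's LUT organization may differ. The same stream of inputs gets no hits when the repeats are a whole frame apart, but hits on every second access when they are interleaved as under PFR.

```python
from collections import OrderedDict
from typing import Hashable, Iterable


def hit_rate(accesses: Iterable[Hashable], capacity: int) -> float:
    """Fraction of accesses that hit in an LRU table holding `capacity` entries."""
    table: "OrderedDict[Hashable, None]" = OrderedDict()
    hits = total = 0
    for key in accesses:
        total += 1
        if key in table:
            hits += 1
            table.move_to_end(key)  # mark as most recently used
        else:
            table[key] = None
            if len(table) > capacity:
                table.popitem(last=False)  # evict the least recently used entry
    return hits / total if total else 0.0


if __name__ == "__main__":
    frame = list(range(1000))   # 1000 distinct fragment inputs per frame
    sequential = frame + frame  # frame 1 rendered after frame 0
    interleaved = [f for pair in zip(frame, frame) for f in pair]  # PFR-like
    print(hit_rate(sequential, capacity=64))   # 0.0
    print(hit_rate(interleaved, capacity=64))  # 0.5
```

A 64-entry table is far smaller than a frame, so it only pays off when identical executions arrive close together.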

The memoization system detects candidate fragments and looks up information about prior fragments, so that redundant computations can be replaced by cached results. Fragments whose inputs involve many registers and texture samplers are not used as candidates, and not all input bits are used to generate the signature. To select the proper inputs, a hash function generator is implemented.
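
A sketch of candidate filtering and signature generation, under several assumptions: the limits `MAX_INPUT_REGISTERS` and `MAX_TEXTURE_SAMPLERS` are hypothetical values, only the listed fields are hashed, and `zlib.crc32` stands in for the paper's hardware hash function.

```python
import struct
import zlib
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical candidate limits; the paper's actual thresholds may differ.
MAX_INPUT_REGISTERS = 4
MAX_TEXTURE_SAMPLERS = 2


@dataclass(frozen=True)
class FragmentInputs:
    registers: Tuple[float, ...]  # interpolated input attributes
    samplers: Tuple[int, ...]     # ids of the texture samplers the shader reads


def is_candidate(frag: FragmentInputs) -> bool:
    """Fragments with too many inputs are not treated as memoization candidates."""
    return (len(frag.registers) <= MAX_INPUT_REGISTERS
            and len(frag.samplers) <= MAX_TEXTURE_SAMPLERS)


def signature(frag: FragmentInputs) -> Optional[int]:
    """32-bit signature of the fragment's inputs, or None if it is not a candidate."""
    if not is_candidate(frag):
        return None
    # Prefix the counts so different register/sampler splits cannot pack to the same bytes.
    data = struct.pack("<BB", len(frag.registers), len(frag.samplers))
    # Registers are packed as 32-bit floats, the precision a GPU would use.
    data += struct.pack(f"<{len(frag.registers)}f", *frag.registers)
    data += struct.pack(f"<{len(frag.samplers)}I", *frag.samplers)
    # CRC-32 stands in for the paper's hardware hash function.
    return zlib.crc32(data)


if __name__ == "__main__":
    frag = FragmentInputs(registers=(0.25, 0.5), samplers=(1,))
    print(hex(signature(frag)))
    too_big = FragmentInputs(registers=(0.0,) * 8, samplers=())
    print(signature(too_big))  # None: not a candidate
```

A 32-bit signature can collide for different inputs; this sketch does not handle that case.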

To evaluate the scheme, the authors used a mobile GPU simulation infrastructure that runs unmodified Android applications. The OpenGL commands are redirected to a GPU driver that provides hardware-accelerated graphics, and the resulting GPU instruction and memory traces are used to drive the simulator.

2D games with static backgrounds fit the memoization technique perfectly, which is easy to understand. For scrolling 2D games and for 3D games the results are still not bad, because 3D games retain some degree of redundancy, especially in the background while the camera is not moving. When the camera moves, the optimization works somewhat worse.
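
The effect of camera motion can be reproduced with a toy model that is not from the paper: each fragment's only input is the texture coordinate it samples from a full-screen background, and `WIDTH`, `HEIGHT`, `frame_inputs` and `redundancy` are hypothetical. A whole-pixel scroll keeps almost all inputs from the previous frame, while sub-pixel motion (a crude stand-in for a moving 3D camera) changes every interpolated coordinate.

```python
from typing import List, Set, Tuple

# Hypothetical tiny framebuffer; each fragment's only input is the
# texture coordinate (u, v) it samples from a full-screen background.
WIDTH, HEIGHT = 64, 32

Inputs = Tuple[float, float]


def frame_inputs(scroll_x: float) -> List[Inputs]:
    """Inputs of every fragment when the background is scrolled by scroll_x pixels."""
    return [((x + scroll_x) / WIDTH, y / HEIGHT)
            for y in range(HEIGHT) for x in range(WIDTH)]


def redundancy(scroll_per_frame: float, frames: int = 4) -> float:
    """Fraction of fragments in frames 1..frames-1 whose inputs appeared in the previous frame."""
    prev: Set[Inputs] = set(frame_inputs(0.0))
    repeated = total = 0
    for f in range(1, frames):
        cur = frame_inputs(f * scroll_per_frame)
        repeated += sum(1 for inp in cur if inp in prev)
        total += len(cur)
        prev = set(cur)
    return repeated / total


if __name__ == "__main__":
    print(redundancy(0.0))  # static background: 1.0
    print(redundancy(1.0))  # whole-pixel scroll: 0.984375 (63 of 64 columns reappear)
    print(redundancy(0.5))  # sub-pixel camera motion: 0.0
```

Real 3D camera motion also changes perspective and lighting, so this only shows the direction of the effect, not its size.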

I had thought about the problem this paper addresses before. Reducing redundant rendering is a great way to speed up computation, but the cost of finding the relation between fragments must be controlled in both time and space, and the authors put a lot of work into optimizing this process.
