A Brief Introduction to Mali Tile-Based Rendering

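This article compares traditional GPUs with Mali's tile-based GPU architecture and looks at the strengths and weaknesses of each. Traditional GPUs run into high bandwidth and energy costs when rendering large primitives, while Mali renders tile by tile to reduce DDR reads and writes, which lowers both bandwidth demand and energy use.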

Preface

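A slightly deeper look at the Mali architecture: this post compares the basic pipeline of existing GPUs with Mali's and points out the advantages and disadvantages of each. Original source: https://developer.arm.com/graphics/developer-guides/tile-based-rendering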

Traditional GPUs

The traditional GPU architecture is usually called an immediate-mode GPU. For each primitive, the vertex shader and the fragment shader run one after the other. In pseudocode:

for draw in renderPass:
    for primitive in draw:
        for vertex in primitive:
            execute_vertex_shader(vertex)
        for fragment in primitive:
            execute_fragment_shader(fragment)

The data flow looks like this:

[Figure: data flow in a traditional (immediate-mode) GPU]

Advantages

The main advantage is that the vertex shader output can stay on chip, where the next stage can read it directly and quickly.

Disadvantages

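The main drawback is the size of the working set. When large primitives (mostly triangles) have to be rendered, the framebuffer region they touch is large: color or depth data for a full screen takes far more storage than is available on chip, so the GPU has to access DDR frequently. Many operations tied to the current frame (such as blending, depth testing, or stencil testing) need to read this working set, so the required bandwidth is high and so is the energy cost. For mobile devices this is a poor fit.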
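To make the size gap concrete, here is a small back-of-the-envelope sketch. It assumes 4 bytes of color and 4 bytes of depth/stencil per pixel; those formats are illustrative assumptions, not figures from the original article. For comparison it also computes the same quantity for a 16x16 pixel block, the tile size discussed in the next section.

```python
def working_set_bytes(width, height, color_bytes=4, depth_stencil_bytes=4):
    """Bytes of color + depth/stencil data for a width x height pixel region.

    The per-pixel sizes are assumptions chosen for illustration.
    """
    return width * height * (color_bytes + depth_stencil_bytes)


full_frame = working_set_bytes(1920, 1080)  # whole 1080p framebuffer
one_tile = working_set_bytes(16, 16)        # a single 16x16 block

print(f"1080p working set: {full_frame / 2**20:.1f} MiB")
print(f"16x16 working set: {one_tile} bytes")
```

Under these assumptions the full-frame working set is several orders of magnitude larger than a single small block, which is why it cannot be kept in on-chip memory.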

Tile-based GPU

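Mali GPUs therefore adopt a tile-based approach: the image is split into small 16x16 tiles. Each tile is rendered separately and written to DDR only at the end, which reduces how often DDR is read and written and so addresses the problem above. Splitting the image into tiles requires knowing the geometry of the whole image, however, so the work is divided into two steps:

  1. The first step runs all geometry-related operations and produces the tile list.
  2. The second step runs the fragment operations for each tile and writes the tile to memory when it is complete.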

In pseudocode:

# Pass one
for draw in renderPass:
    for primitive in draw:
        for vertex in primitive:
            execute_vertex_shader(vertex)
        append_tile_list(primitive)

# Pass two
for tile in renderPass:
    for primitive in tile:
        for fragment in primitive:
            execute_fragment_shader(fragment)
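
As a rough illustration of what `append_tile_list` in pass one has to do, the sketch below bins 2D triangles into 16x16 tiles using a conservative bounding-box test. The function name `bin_triangles`, the data layout, and the bounding-box method are assumptions made for this example; the real hardware tiler is not described in the original article.

```python
import math

TILE = 16  # tile edge in pixels, matching the 16x16 tiles in the text


def bin_triangles(triangles, width, height, tile=TILE):
    """Return {(tile_x, tile_y): [triangle indices]} by bounding-box overlap.

    Each triangle is a list of three (x, y) pixel coordinates. The test is
    conservative: a tile may list a triangle that only touches its bounding box.
    """
    tiles_x = math.ceil(width / tile)
    tiles_y = math.ceil(height / tile)
    tile_lists = {}
    for index, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Clamp the bounding box to the screen and convert to tile coordinates.
        tx0 = max(0, math.floor(min(xs)) // tile)
        ty0 = max(0, math.floor(min(ys)) // tile)
        tx1 = min(tiles_x - 1, math.floor(max(xs)) // tile)
        ty1 = min(tiles_y - 1, math.floor(max(ys)) // tile)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                tile_lists.setdefault((tx, ty), []).append(index)
    return tile_lists


triangles = [
    [(2, 2), (10, 3), (5, 12)],       # fits inside tile (0, 0)
    [(10, 10), (40, 12), (20, 30)],   # spans several tiles
]
tile_lists = bin_triangles(triangles, width=64, height=32)
print(tile_lists[(0, 0)])
```

Pass two then only has to walk `tile_lists[(tx, ty)]` for the tile it is shading, which is what the `for primitive in tile` loop above expresses.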

The data flow is as follows:

[Figure: data flow in a Mali (tile-based) GPU]

Advantages

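This clearly solves the bandwidth problem of the traditional model: the fragment shader works on one small tile held on chip at a time, so it does not need to access memory frequently, and the tile is written to memory only once all operations on it are complete. Tiles can also be compressed to cut memory reads and writes further. In addition, when some regions of the image stay unchanged, checking whether a tile is identical to the previous one avoids redundant work for that tile.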
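
The "is this tile unchanged?" check can be sketched in software as a per-tile signature comparison. This is only an illustration: the use of `zlib.crc32`, the function name `write_changed_tiles`, and the dictionary layout are assumptions for the example, not a description of how the hardware implements it.

```python
import zlib


def write_changed_tiles(tiles, previous_signatures, write_tile):
    """Write only tiles whose signature differs from the previous frame's.

    tiles: {tile_id: bytes of pixel data}
    previous_signatures: {tile_id: int} from the last frame (may be empty)
    write_tile: callback standing in for the write to external memory
    Returns the signature table to use for the next frame.
    """
    signatures = {}
    for tile_id, pixels in tiles.items():
        signature = zlib.crc32(pixels)
        signatures[tile_id] = signature
        if previous_signatures.get(tile_id) != signature:
            write_tile(tile_id, pixels)
    return signatures


written = []
frame1 = {(0, 0): bytes(1024), (1, 0): bytes(1024)}
sigs = write_changed_tiles(frame1, {}, lambda tid, px: written.append(tid))

# In the second frame only tile (1, 0) changes, so only it is written again.
frame2 = {(0, 0): bytes(1024), (1, 0): b"\x01" + bytes(1023)}
sigs = write_changed_tiles(frame2, sigs, lambda tid, px: written.append(tid))
print(written)
```

A signature comparison like this trades a small amount of computation per tile for a skipped memory write whenever a region of the screen is static.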

Disadvantages

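After the vertex stage, the geometry output has to be written to DDR before the fragment shader can read it back. The design is therefore a balance between the extra cost of writing and re-reading this geometry data and the DDR traffic saved during fragment shading. Some operations (such as tessellation) are also a poor fit for tile-based GPUs.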
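
The balance can be illustrated with a deliberately crude traffic model. Everything in it is an assumption made for illustration: there are no caches, every immediate-mode fragment performs one depth read, one depth write and one color write to DRAM, geometry is written once and read once in the tile-based case, and the byte sizes and vertex counts are made up rather than measured.

```python
def immediate_mode_traffic(pixels, overdraw, bytes_per_access=4):
    """DRAM bytes if every fragment does depth read + depth write + color write."""
    return pixels * overdraw * 3 * bytes_per_access


def tile_based_traffic(pixels, vertices, bytes_per_vertex=32, bytes_per_pixel=4):
    """DRAM bytes if geometry is written once and read once, color written once."""
    geometry = 2 * vertices * bytes_per_vertex
    framebuffer = pixels * bytes_per_pixel
    return geometry + framebuffer


pixels = 1920 * 1080
immediate = immediate_mode_traffic(pixels, overdraw=2)
light_scene = tile_based_traffic(pixels, vertices=100_000)
heavy_scene = tile_based_traffic(pixels, vertices=2_000_000)

print(f"immediate mode:         {immediate:>12,} bytes")
print(f"tile-based, 100k verts: {light_scene:>12,} bytes")
print(f"tile-based, 2M verts:   {heavy_scene:>12,} bytes")
```

In this toy model the tile-based scheme wins comfortably for the geometry-light scene and loses for the geometry-heavy one, which is the trade-off the paragraph above describes. Real numbers depend on caches, compression, and the actual workload.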

Summary

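Display resolutions keep growing, from 1080p to 1440p and on to 4K, so it is foreseeable that an architecture like Mali's will see large-scale use in the future.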

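There are some pitfalls that developers need to avoid, though. First, render passes should be set up sensibly to take full advantage of this architecture; second, developers should understand what benefits this kind of geometry binning can actually deliver.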

