how much does a tex2D/texCUBE cost? - CSDN Blog

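This post looks at the cost of texture sampling on mobile platforms and how to optimize it, with a closer look at how PowerVR GPUs behave. Non-dependent texture reads are relatively cheap, dependent reads are expensive, and the post lists ways to avoid the latter.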

how much does a tex2D/texCUBE cost?
https://forum.unity.com/threads/how-much-does-a-tex2d-texcube-cost.182469/

Texture sampling is relatively expensive on mobile platforms.

1. Efficiency

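On mobile, one texture sample costs about as much as a few to a few dozen simple math operations. On PC the ratio is higher: one sample is worth anywhere from a few to over a hundred math operations.
So replacing cheap math with texture lookups is not recommended.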

2. On some GPUs, such as PowerVR, non-dependent texture reads are almost free, while dependent reads cost considerably more.

The GPU cache works on 2x2 blocks, so samples taken at adjacent pixels are more efficient.
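A toy model of why neighboring samples are cheaper. It assumes, hypothetically, that the texture cache stores texels in 4x4 tiles (the thread gives no tile size, and real cache layouts differ between GPUs) and counts how many distinct tiles the four pixels of one 2x2 quad touch, once when neighboring pixels read neighboring texels and once when each pixel jumps to a distant texel.

```python
# Toy model of texture-cache locality. TILE is a hypothetical value,
# not the tile size of any real GPU.
TILE = 4

def tiles_touched(texels):
    """Number of distinct TILE x TILE cache tiles hit by a list of (u, v) texels."""
    return len({(u // TILE, v // TILE) for u, v in texels})

def quad_texels(x, y, mapping):
    """Texels read by the 2x2 pixel quad whose top-left pixel is (x, y)."""
    return [mapping(x + dx, y + dy) for dy in (0, 1) for dx in (0, 1)]

# Coherent: pixel (px, py) reads texel (px, py), a 1:1 mapping.
coherent = quad_texels(8, 8, lambda px, py: (px, py))
# Scattered: each pixel lands far away in a 256x256 texture (e.g. a noisy lookup).
scattered = quad_texels(8, 8, lambda px, py: ((px * 37) % 256, (py * 53) % 256))

print(tiles_touched(coherent))   # 1
print(tiles_touched(scattered))  # 4
```

In this model the coherent quad is served by a single tile, while the scattered quad needs four separate tiles, i.e. four times as much data pulled into the cache for the same four samples.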
A dependent read is the following case:
A dependent texture read is a texture read in which the texture coordinates depend on some calculation within the (fragment) shader instead of on a varying.
As the values of this calculation cannot be known ahead of time it is not possible to pre-fetch texture data and so stalls in shader processing occur.
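If the UV is computed in the pixel shader (ps), the sample position is only known once the ps runs, so the texture cannot be prefetched before the ps (during rasterization).
For example: tex2D(_MainTex, uv + float2(0.1, 0.1)) and tex2D(_MainTex, uv.zw)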

If your shader requires more than one dependent texture read:
When a dependent texture read occurs the current thread running on the USSE within the SGX is suspended until the comparatively slow read is completed, and another thread is swapped in. By re-arranging the code within the shader so that the two dependent texture reads were back-to-back, another performance improvement was gained. This works because the compiler can now batch these dependent texture reads together, whereas previously the shader looked as follows:
• Mathematics
• Texture Read
• Wait
• Mathematics
• Texture Read
• Wait

It now looks as follows:
• Mathematics
• Texture Read
• Texture Read
• Wait

This effectively hides some of the wait time of the first texture read behind the execution time of the second texture read. Unfortunately a wait of some description is inevitable due to the limitations of dependent texture reads. Nonetheless, the strong parallelism of the PowerVR SGX hardware ensures that these waits can be used effectively by other threads.
Caveat from the thread: don't forget to check the code in the compiled shader and/or use #pragma glsl_no_optimize when you use more than one dependent texture read.
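In other words, if the pixel shader does not compute the texture coordinates, then at the first texture read all textures sampled with the same UV can be fetched back-to-back.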
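The two schedules in the quote can be compared with a toy cycle count. All numbers are hypothetical: each ALU block and each read issue costs a fixed number of cycles, a read returns `latency` cycles after its issue, the two reads do not depend on each other, and no other thread fills the stall (on SGX, swapping in other threads hides even more of it).

```python
# Toy cycle model of the two shader schedules quoted above.
# Hypothetical assumptions: fixed ALU and issue costs, a read completes
# `latency` cycles after it is issued, the two reads are independent,
# and no other thread runs during a stall.

def serial_schedule(math_a: int, math_b: int, latency: int, issue: int = 1) -> int:
    """Mathematics, Texture Read, Wait, Mathematics, Texture Read, Wait."""
    t = math_a + issue   # compute UV #1, issue read #1
    t += latency         # stall until read #1 returns
    t += math_b + issue  # compute UV #2, issue read #2
    t += latency         # stall until read #2 returns
    return t

def batched_schedule(math_a: int, math_b: int, latency: int, issue: int = 1) -> int:
    """Mathematics, Texture Read, Texture Read, Wait."""
    t = math_a + math_b  # compute both UVs up front
    t += 2 * issue       # issue both reads back-to-back
    # Read #1 is already in flight while read #2 issues, so a single wait
    # (until the later read #2 returns) covers both.
    t += latency
    return t

print(serial_schedule(4, 4, 20))   # 50
print(batched_schedule(4, 4, 20))  # 30
```

Under these assumptions the batched order saves about one full read latency, but one wait remains, matching the quote. The model does not apply when the second read's UV depends on the first read's result, because then the reads cannot be issued together.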

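Therefore:
Don't do math on UVs in the pixel shader; move it to the vertex shader.
Don't sample textures with the .zw components (this applied to PowerVR GPUs around 2012; it is unclear whether it is still a performance issue today).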
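Moving math like uv + float2(0.1, 0.1) into the vertex shader gives the same UVs because the rasterizer interpolates varyings as a weighted average whose weights sum to 1 (this holds for perspective-correct interpolation too), so a constant offset, or any affine scale-and-offset, commutes with interpolation. Nonlinear math such as sin() does not commute and has to stay per pixel. A minimal sketch with hypothetical triangle UVs and barycentric weights for one pixel:

```python
import math

def interpolate(vertex_uvs, weights):
    """Weighted average of per-vertex UVs; barycentric weights sum to 1."""
    u = sum(w * uv[0] for w, uv in zip(weights, vertex_uvs))
    v = sum(w * uv[1] for w, uv in zip(weights, vertex_uvs))
    return (u, v)

def offset(uv):
    """Same math as uv + float2(0.1, 0.1) in the shader (affine)."""
    return (uv[0] + 0.1, uv[1] + 0.1)

def wobble(uv):
    """A hypothetical nonlinear UV distortion."""
    return (math.sin(uv[0] * 10.0), uv[1])

def same(a, b):
    return all(math.isclose(x, y) for x, y in zip(a, b))

tri_uvs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]  # hypothetical triangle
weights = (0.2, 0.3, 0.5)                       # one pixel inside it

# "Pixel shader": interpolate first, then apply the math per pixel.
# "Vertex shader": apply the math per vertex, then interpolate.
same_offset = same(offset(interpolate(tri_uvs, weights)),
                   interpolate([offset(uv) for uv in tri_uvs], weights))
same_wobble = same(wobble(interpolate(tri_uvs, weights)),
                   interpolate([wobble(uv) for uv in tri_uvs], weights))

print(same_offset)  # True
print(same_wobble)  # False
```

The offset version is safe to move out of the pixel shader; the wobble version would produce different UVs if computed per vertex.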