Using Dynamic Vertex and Index Buffers
2009-12-31 22:51:21 | Category: dx from msdn
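Locking a static vertex buffer while the graphics processor is using the buffer can have a significant performance penalty. The lock call must wait until the graphics processor is finished reading vertex or index data from the buffer before it can return to the calling application, a significant delay. Locking and then rendering from a static buffer several times per frame also prevents the graphics processor from buffering rendering commands, since it must finish those commands before returning the lock pointer. Without buffered commands, the graphics processor remains idle until the application finishes filling the vertex or index buffer and issues a rendering command.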
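Ideally the vertex or index data would never change, but this is not always possible. There are many situations where the application needs to change vertex or index data every frame, perhaps even multiple times per frame. For these situations, the vertex or index buffer should be created with D3DUSAGE_DYNAMIC. This usage flag causes Direct3D to optimize for frequent lock operations. D3DUSAGE_DYNAMIC is only useful when the buffer is locked frequently; data that remains constant should be placed in a static vertex or index buffer.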
To receive a performance improvement when using dynamic vertex buffers, the application must call IDirect3DVertexBuffer9::Lock or IDirect3DIndexBuffer9::Lock with the appropriate flags. D3DLOCK_DISCARD indicates that the application does not need to keep the old vertex or index data in the buffer. If the graphics processor is still using the buffer when Lock is called with D3DLOCK_DISCARD, a pointer to a new region of memory is returned instead of the old buffer data. This allows the graphics processor to continue using the old data while the application places data in the new buffer. No additional memory management is required in the application; the old buffer is reused or destroyed automatically when the graphics processor is finished with it. Note that locking a buffer with D3DLOCK_DISCARD always discards the entire buffer; specifying a nonzero offset or a limited size does not preserve the information in the unlocked areas of the buffer.
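The renaming behaviour described above can be illustrated with a small CPU-side model. This is a sketch under stated assumptions, not how Direct3D or any driver actually implements D3DLOCK_DISCARD: the class `DiscardModel` and its `draw()` and `gpuIdle()` methods are hypothetical stand-ins for submitting a draw call and for the GPU finishing its queued work.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU-side model of the "renaming" that D3DLOCK_DISCARD permits.
// Not the Direct3D or driver implementation; it only shows why a discard
// lock never has to wait for the GPU.
class DiscardModel {
public:
    explicit DiscardModel(std::size_t bytes) : bytes_(bytes) {
        assert(bytes > 0);
        regions_.push_back(Region{std::vector<unsigned char>(bytes), false});
    }

    // Discard lock: if the GPU still reads the current region, switch to a
    // free one (or allocate one) instead of stalling. Old contents are not kept.
    unsigned char* lockDiscard() {
        if (regions_[current_].gpuBusy) current_ = freeRegion();
        return regions_[current_].data.data();
    }

    // A draw call that reads the current region marks it busy.
    void draw() { regions_[current_].gpuBusy = true; }

    // The GPU finished all queued work: every old region can be reused.
    void gpuIdle() {
        for (Region& r : regions_) r.gpuBusy = false;
    }

    std::size_t currentRegion() const { return current_; }
    std::size_t regionCount() const { return regions_.size(); }

private:
    struct Region {
        std::vector<unsigned char> data;
        bool gpuBusy;
    };

    // Returns the index of a region the GPU is not reading, creating one if needed.
    std::size_t freeRegion() {
        for (std::size_t i = 0; i < regions_.size(); ++i)
            if (!regions_[i].gpuBusy) return i;
        regions_.push_back(Region{std::vector<unsigned char>(bytes_), false});
        return regions_.size() - 1;
    }

    std::size_t bytes_;
    std::vector<Region> regions_;
    std::size_t current_ = 0;
};
```

In the model, a discard lock on a busy region switches to a spare region instead of waiting, and spare regions are recycled once the GPU is idle, which mirrors the "no additional memory management" property described in the paragraph.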
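There are cases where the amount of data the application needs to store per lock is small, such as adding four vertices to render a sprite. D3DLOCK_NOOVERWRITE indicates that the application will not overwrite data already in use in the dynamic buffer. The lock call returns a pointer to the old data, allowing the application to add new data in unused regions of the vertex or index buffer. The application should not modify vertices or indices used in a draw operation, as they might still be in use by the graphics processor. Once the dynamic buffer is full, the application should lock it with D3DLOCK_DISCARD to receive a new region of memory; the old vertex or index data is discarded after the graphics processor is finished with it.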
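The NOOVERWRITE/DISCARD rule for small appends such as sprites can be written as a pure planning function. This is a hedged sketch: `LockMode`, `SpriteLock` and `planSpriteLocks` are hypothetical names, and real code would pass D3DLOCK_NOOVERWRITE or D3DLOCK_DISCARD together with the computed offset to IDirect3DVertexBuffer9::Lock.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for D3DLOCK_NOOVERWRITE / D3DLOCK_DISCARD; real code
// passes the flags from d3d9.h to IDirect3DVertexBuffer9::Lock.
enum class LockMode { NoOverwrite, Discard };

struct SpriteLock {
    std::size_t offsetBytes; // byte offset that would be passed to Lock
    LockMode mode;           // flag that would be passed to Lock
};

// Plans one lock per sprite (4 vertices each) in a dynamic buffer of
// `bufferBytes`: append with NOOVERWRITE while the sprite still fits,
// otherwise DISCARD and start again at offset 0.
std::vector<SpriteLock> planSpriteLocks(std::size_t bufferBytes, std::size_t stride,
                                        std::size_t spriteCount) {
    const std::size_t spriteBytes = 4 * stride;
    assert(spriteBytes > 0 && spriteBytes <= bufferBytes);
    std::vector<SpriteLock> plan;
    std::size_t next = 0; // first byte not yet written since the last discard
    for (std::size_t i = 0; i < spriteCount; ++i) {
        LockMode mode = LockMode::NoOverwrite;
        if (next > bufferBytes - spriteBytes) { // buffer full: ask for fresh memory
            mode = LockMode::Discard;
            next = 0;
        }
        plan.push_back(SpriteLock{next, mode});
        next += spriteBytes; // these bytes are not touched again until the next discard
    }
    return plan;
}
```

With a 192-byte buffer and a 16-byte vertex stride, three 4-vertex sprites fit; the fourth sprite triggers a DISCARD and is written at offset 0 again.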
The asynchronous query mechanism is useful to determine whether vertices are still in use by the graphics processor. Issue a query of type D3DQUERYTYPE_EVENT after the last DrawPrimitive call that uses the vertices. The vertices are no longer in use when IDirect3DQuery9::GetData returns S_OK. Locking a buffer with D3DLOCK_DISCARD or with no flags always guarantees that the vertices are synchronized properly with the graphics processor; however, locking without flags incurs the performance penalty described earlier. Other API calls such as IDirect3DDevice9::BeginScene, IDirect3DDevice9::EndScene, and IDirect3DDevice9::Present do not guarantee that the graphics processor is finished using the vertices.
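In real Direct3D 9 code the event is created with IDirect3DDevice9::CreateQuery(D3DQUERYTYPE_EVENT, ...), issued with IDirect3DQuery9::Issue(D3DISSUE_END) after the last draw, and polled with GetData. The portable sketch below only models the bookkeeping behind that pattern, assuming (as a simplification) that events complete in the order they were issued; `EventTimeline` and its methods are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of polling event queries: because events are assumed to
// complete in issue order, one counter answers "has the GPU passed the event
// issued after my last draw?"
class EventTimeline {
public:
    // Like issuing an event query right after the last DrawPrimitive call
    // that reads some vertices; returns the event's id.
    std::uint64_t issueEvent() { return ++lastIssued_; }

    // Simulates the GPU reaching an event.
    void gpuReached(std::uint64_t id) {
        assert(id <= lastIssued_);
        if (id > completed_) completed_ = id;
    }

    // Analogous to GetData returning S_OK for that event: vertices drawn
    // before it are no longer in use and may be overwritten.
    bool isComplete(std::uint64_t id) const { return id <= completed_; }

private:
    std::uint64_t lastIssued_ = 0;
    std::uint64_t completed_ = 0;
};
```

An application would tag each region of a dynamic buffer with the id returned by `issueEvent()` and only rewrite that region once `isComplete()` reports true.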
Below are ways to use dynamic buffers and the proper lock flags.
// USAGE STYLE 1
// Discard the entire vertex buffer and refill with thousands of vertices.
// Might contain multiple objects and/or require multiple DrawPrimitive
// calls separated by state changes, etc.
// Determine the size of data to be moved into the vertex buffer.
UINT nSizeOfData = nNumberOfVertices * m_nVertexStride;
// Discard and refill the used portion of the vertex buffer.
CONST DWORD dwLockFlags = D3DLOCK_DISCARD;
// Lock the vertex buffer (offset 0 and size 0 lock the entire buffer).
// Lock takes a void**, so the BYTE** must be cast explicitly.
BYTE* pBytes;
if( FAILED( m_pVertexBuffer->Lock( 0, 0, reinterpret_cast<void**>( &pBytes ), dwLockFlags ) ) )
    return false;
// Copy the vertices into the vertex buffer.
memcpy( pBytes, pVertices, nSizeOfData );
m_pVertexBuffer->Unlock();
// Render the primitives.
m_pDevice->DrawPrimitive( D3DPT_TRIANGLELIST, 0, nNumberOfVertices/3 );
// USAGE STYLE 2
// Reusing one vertex buffer for multiple objects
// Determine the size of data to be moved into the vertex buffer.
UINT nSizeOfData = nNumberOfVertices * m_nVertexStride;
// No overwrite will be used if the vertices can fit into
// the space remaining in the vertex buffer.
DWORD dwLockFlags = D3DLOCK_NOOVERWRITE;
// Check to see if the entire vertex buffer has been used up yet.
if( m_nNextVertexData > m_nSizeOfVB - nSizeOfData )
{
    // No space remains. Start over from the beginning
    // of the vertex buffer.
    dwLockFlags = D3DLOCK_DISCARD;
    m_nNextVertexData = 0;
}
// Lock only the range about to be written.
BYTE* pBytes;
if( FAILED( m_pVertexBuffer->Lock( (UINT)m_nNextVertexData, nSizeOfData,
                                   reinterpret_cast<void**>( &pBytes ), dwLockFlags ) ) )
    return false;
// Copy the vertices into the vertex buffer.
memcpy( pBytes, pVertices, nSizeOfData );
m_pVertexBuffer->Unlock();
// Render the primitives, starting at the first vertex just written.
m_pDevice->DrawPrimitive( D3DPT_TRIANGLELIST,
                          m_nNextVertexData/m_nVertexStride, nNumberOfVertices/3 );
// Advance to the next position in the vertex buffer.
m_nNextVertexData += nSizeOfData;
"Static" and "dynamic" are just hints to the Direct3D driver so that it can place buffers in appropriate memory for maximum performance. Static buffers are typically uploaded to VRAM (for maximum speed) and rarely, if ever, changed. Dynamic buffers can be uploaded into VRAM (slow to read from, but fast to render/write) or AGP RAM (fast to read/write but slow to render). Direct3D can also keep two copies of a dynamic vertex buffer (one in VRAM, one in AGP memory). Unlike static buffers, dynamic buffers are designed to be locked, unlocked, and changed every frame.
You are right: you should sort the vertex buffers by texture to minimize texture switches, because texture switches are among the most expensive state changes for the GPU.
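Sorting by texture can be checked by counting how often the bound texture changes between consecutive draws. A minimal sketch, assuming a hypothetical `DrawItem` record that stores a texture id and a vertex buffer id for each draw; real code would call IDirect3DDevice9::SetTexture only when the id changes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical draw record: which texture and which vertex buffer a draw uses.
struct DrawItem {
    unsigned textureId;
    unsigned vertexBufferId;
};

// Counts how many times the bound texture changes when items are drawn in order.
std::size_t countTextureSwitches(const std::vector<DrawItem>& items) {
    std::size_t switches = 0;
    for (std::size_t i = 1; i < items.size(); ++i)
        if (items[i].textureId != items[i - 1].textureId) ++switches;
    return switches;
}

// Sorts draws by texture so each texture is bound once; stable_sort keeps the
// original submission order among draws that share a texture.
void sortByTexture(std::vector<DrawItem>& items) {
    auto byTexture = [](const DrawItem& a, const DrawItem& b) {
        return a.textureId < b.textureId;
    };
    std::stable_sort(items.begin(), items.end(), byTexture);
    assert(std::is_sorted(items.begin(), items.end(), byTexture));
}
```

For draws using textures 1, 2, 1, 2, 1 the unsorted order needs four texture switches; after sorting it needs one.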
If by instantiating a VB you mean dynamically allocating a vertex buffer every frame, then yes, that sounds like a big performance hit. Instead, you should allocate a dynamic buffer with enough storage once, and then use D3D's locking mechanism to update it every frame.
As far as I know, batching is used in multi-threaded applications. In a multi-threaded setup there is usually on
EDIT: I almost forgot, there is also another definition for batching. Batching is a method of optimizing the GPU's triangle throughput. It's done by splitting up large vertex buffers into smaller batches or organizing smaller vertex buffers into on
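The second meaning of batching, splitting a large vertex buffer into smaller batches, can be sketched for a non-indexed triangle list. The `Batch` struct and `splitTriangleList` function are hypothetical; each batch corresponds to the start vertex and primitive count one would pass to DrawPrimitive, and the per-batch vertex limit is an assumed parameter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One DrawPrimitive-style call: first vertex and number of triangles.
struct Batch {
    std::size_t startVertex;
    std::size_t primitiveCount;
};

// Splits a non-indexed triangle list of `vertexCount` vertices into batches of
// at most `maxVertices` vertices, never cutting a triangle in half.
std::vector<Batch> splitTriangleList(std::size_t vertexCount, std::size_t maxVertices) {
    assert(vertexCount % 3 == 0);
    const std::size_t perBatch = maxVertices - maxVertices % 3; // whole triangles only
    assert(perBatch > 0);
    std::vector<Batch> batches;
    for (std::size_t start = 0; start < vertexCount; start += perBatch) {
        const std::size_t n = std::min(perBatch, vertexCount - start);
        batches.push_back(Batch{start, n / 3});
    }
    return batches;
}
```

Splitting 30 vertices (10 triangles) with a limit of 10 vertices per batch yields batches of 3, 3, 3 and 1 triangles, because the limit is rounded down to 9 so that no triangle is split.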
Index buffers are used for drawing indexed primitives. An index buffer stores integer values that point to individual vertices in the current vertex buffer. It's a way to separate geometry from topology: the vertex buffer contains the vertices and the index buffer contains the polygons.
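To make "the index buffer contains the polygons" concrete, the sketch below expands an indexed triangle list back into the plain vertex stream it describes. The `Vertex` struct and `expandIndexed` are illustrative assumptions; 16-bit indices correspond to an index buffer created with D3DFMT_INDEX16.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative position-only vertex.
struct Vertex {
    float x, y, z;
};

// Expands an indexed triangle list into the non-indexed vertex stream it
// describes: each index selects one vertex from the vertex buffer.
std::vector<Vertex> expandIndexed(const std::vector<Vertex>& vertices,
                                  const std::vector<std::uint16_t>& indices) {
    assert(indices.size() % 3 == 0); // triangle list: three indices per triangle
    std::vector<Vertex> out;
    out.reserve(indices.size());
    for (std::uint16_t i : indices) {
        assert(static_cast<std::size_t>(i) < vertices.size());
        out.push_back(vertices[i]);
    }
    return out;
}
```

A quad needs only four vertices plus six indices (two triangles), instead of six full vertices; the savings grow as more triangles share each vertex.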
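The disadvantages of an index buffer show up with lighting and texture mapping. When you use only one vertex for several triangles, that vertex also has only one normal vector and one pair of texture coordinates. On a cube, for example, you will see the texture mapped incorrectly on three of its sides. Likewise, when you use lighting, you will see only two sides of the cube lit correctly, because each vertex's normal is correct for only one of the triangles that share it.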
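The cube problem can be quantified: with fully shared vertices a cube needs 8 vertices, but each corner belongs to three faces with three different normals, so correct per-face lighting needs one vertex per face corner, 24 in total. The helper functions below are a hypothetical illustration that uses integer coordinates so comparisons are exact.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <set>
#include <utility>
#include <vector>

using Vec3 = std::array<int, 3>; // integer coordinates keep comparisons exact

// One corner of one face: its position and the face's outward normal.
struct FaceCorner {
    Vec3 position;
    Vec3 normal;
};

// Emits the 4 corners of each of the 6 faces of a cube with corners at +/-1.
std::vector<FaceCorner> cubeFaceCorners() {
    std::vector<FaceCorner> corners;
    for (int axis = 0; axis < 3; ++axis) {
        for (int sign : {-1, 1}) {
            Vec3 normal{0, 0, 0};
            normal[axis] = sign;
            const int u = (axis + 1) % 3, v = (axis + 2) % 3; // the two in-plane axes
            for (int a : {-1, 1}) {
                for (int b : {-1, 1}) {
                    Vec3 p{0, 0, 0};
                    p[axis] = sign;
                    p[u] = a;
                    p[v] = b;
                    corners.push_back(FaceCorner{p, normal});
                }
            }
        }
    }
    assert(corners.size() == 24); // 6 faces x 4 corners
    return corners;
}

// Distinct positions: what an indexed cube with shared vertices stores.
std::size_t distinctPositions(const std::vector<FaceCorner>& c) {
    std::set<Vec3> s;
    for (const auto& fc : c) s.insert(fc.position);
    return s.size();
}

// Distinct (position, normal) pairs: what correct per-face lighting needs.
std::size_t distinctPositionNormalPairs(const std::vector<FaceCorner>& c) {
    std::set<std::pair<Vec3, Vec3>> s;
    for (const auto& fc : c) s.insert(std::make_pair(fc.position, fc.normal));
    return s.size();
}

// How many different normals meet at one corner position.
std::size_t normalsAtCorner(const std::vector<FaceCorner>& c, const Vec3& p) {
    std::set<Vec3> s;
    for (const auto& fc : c)
        if (fc.position == p) s.insert(fc.normal);
    return s.size();
}
```

This is why cubes meant to look faceted are commonly built from 24 vertices (still drawn with an index buffer, 4 vertices and 6 indices per face) rather than 8 shared ones; the same duplication fixes per-face texture coordinates.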