Using Dynamic Vertex and Index Buffers

Reposted

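This article explains how to use dynamic vertex and index buffers effectively in Direct3D to avoid the performance cost of locking static buffers. It discusses how to choose the right buffer type and lock flags for different scenarios, and how to use the asynchronous query mechanism to determine whether vertices are still in use. Two ways of using dynamic buffers are shown with example code. The article also compares static and dynamic buffers and explains how sorting vertex buffers by texture minimizes the cost of texture switches.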

2009-12-31 22:51:21 | Category: dx from msdn

Locking a static vertex buffer while the graphics processor is using the buffer can have a significant performance penalty. The lock call must wait until the graphics processor is finished reading vertex or index data from the buffer before it can return to the calling application, a significant delay. Locking and then rendering from a static buffer several times per frame also prevents the graphics processor from buffering rendering commands, since it must finish commands before returning the lock pointer. Without buffered commands, the graphics processor remains idle until after the application is finished filling the vertex buffer or index buffer and issues a rendering command.

Ideally, vertex or index data would never change; however, this is not always possible. There are many situations where the application needs to change vertex or index data every frame, perhaps even multiple times per frame. For these situations, the vertex or index buffer should be created with D3DUSAGE_DYNAMIC. This usage flag causes Direct3D to optimize for frequent lock operations. D3DUSAGE_DYNAMIC is only useful when the buffer is locked frequently; data that remains constant should be placed in a static vertex or index buffer.

To get a performance improvement from dynamic vertex buffers, the application must call IDirect3DVertexBuffer9::Lock or IDirect3DIndexBuffer9::Lock with the appropriate flags. D3DLOCK_DISCARD indicates that the application does not need to keep the old vertex or index data in the buffer. If the graphics processor is still using the buffer when Lock is called with D3DLOCK_DISCARD, a pointer to a new region of memory is returned instead of the old buffer data. This allows the graphics processor to continue using the old data while the application places data in the new buffer. No additional memory management is required in the application; the old buffer is reused or destroyed automatically when the graphics processor is finished with it. Note that locking a buffer with D3DLOCK_DISCARD always discards the entire buffer: specifying a nonzero offset or a limited size does not preserve the information in the unlocked areas of the buffer.
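
The renaming behavior described above can be pictured with a small CPU-side model. This is a toy sketch, not how any particular driver works: the class name `RenamingBuffer`, the fence counter standing in for GPU progress, and the reuse policy are all assumptions made for illustration. It shows the key property of D3DLOCK_DISCARD: the lock never waits; it hands back a different region while the GPU still reads the old one, and the old region is recycled once the GPU is done with it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of D3DLOCK_DISCARD "renaming" (hypothetical, not a real driver).
// GPU progress is modeled as a fence counter: a draw tagged with fence F is
// finished once completedFence >= F.
class RenamingBuffer {
public:
    explicit RenamingBuffer(std::size_t bytes) : bytes_(bytes) {
        assert(bytes > 0);
        regions_.push_back(Region{std::vector<std::uint8_t>(bytes), 0});
    }

    // Like Lock(..., D3DLOCK_DISCARD): never waits for the GPU. If pending
    // draws still read the current region, switch to a free one instead.
    std::uint8_t* LockDiscard(std::uint64_t completedFence) {
        if (regions_[current_].lastUseFence > completedFence)
            current_ = FindOrAllocateFree(completedFence);
        return regions_[current_].data.data();
    }

    // Record that a draw tagged with `fence` reads the current region.
    void MarkUsedByDraw(std::uint64_t fence) { regions_[current_].lastUseFence = fence; }

    std::size_t CurrentRegion() const { return current_; }
    std::size_t RegionCount() const { return regions_.size(); }

private:
    struct Region {
        std::vector<std::uint8_t> data;
        std::uint64_t lastUseFence;  // fence of the last draw that read this region
    };

    // Reuse any region the GPU has finished with; otherwise allocate a new one.
    std::size_t FindOrAllocateFree(std::uint64_t completedFence) {
        for (std::size_t i = 0; i < regions_.size(); ++i)
            if (regions_[i].lastUseFence <= completedFence) return i;
        regions_.push_back(Region{std::vector<std::uint8_t>(bytes_), 0});
        return regions_.size() - 1;
    }

    std::size_t bytes_;
    std::vector<Region> regions_;
    std::size_t current_ = 0;
};
```

In this model, a discard lock made while fence 1 is still pending receives a second region, and the first region is handed out again once fence 1 has completed, so the application never allocates or frees anything itself.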

There are cases where the amount of data the application needs to store per lock is small, such as adding four vertices to render a sprite. D3DLOCK_NOOVERWRITE indicates that the application will not overwrite data already in use in the dynamic buffer. The lock call returns a pointer to the old data, allowing the application to add new data in unused regions of the vertex or index buffer. The application should not modify vertices or indices used in a previous draw operation, because they might still be in use by the graphics processor. Once the dynamic buffer is full, the application should lock it with D3DLOCK_DISCARD to receive a new region of memory; the old vertex or index data is discarded after the graphics processor is finished with it.
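
The choice between D3DLOCK_NOOVERWRITE and D3DLOCK_DISCARD comes down to a little bookkeeping on a write cursor. The sketch below isolates that decision as a pure function so it can be checked without a device; `LockMode`, `AppendLock`, and `NextAppend` are hypothetical names, and the flags actually passed to Lock() would be the D3DLOCK_* constants from d3d9.h. It assumes no single batch is larger than the whole buffer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the two lock flags (real code uses D3DLOCK_*).
enum class LockMode { NoOverwrite, Discard };

struct AppendLock {
    std::uint32_t offset;  // byte offset to pass to Lock()
    LockMode mode;         // which flag to pass to Lock()
};

// Decide where the next `bytes` go in a dynamic buffer of `capacity` bytes,
// given the write cursor `cursor` (always <= capacity). Appends with
// NOOVERWRITE while the data fits; otherwise restarts at offset 0 with
// DISCARD. Advances the cursor past the new data.
AppendLock NextAppend(std::uint32_t& cursor, std::uint32_t bytes, std::uint32_t capacity) {
    assert(bytes <= capacity);  // a batch larger than the buffer can never fit
    AppendLock lock{cursor, LockMode::NoOverwrite};
    // Compare against the remaining space so the unsigned subtraction cannot wrap.
    if (bytes > capacity - cursor) {
        lock = {0, LockMode::Discard};
    }
    cursor = lock.offset + bytes;
    return lock;
}
```

A caller would pass `lock.offset` and `bytes` to Lock() with the matching flag, copy its vertices in, and draw starting at vertex `lock.offset / stride`.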

The asynchronous query mechanism is useful for determining whether vertices are still in use by the graphics processor. Issue a query of type D3DQUERYTYPE_EVENT after the last DrawPrimitive call that uses the vertices; the vertices are no longer in use when IDirect3DQuery9::GetData returns S_OK. Locking a buffer with D3DLOCK_DISCARD or with no flags always guarantees that the vertices are properly synchronized with the graphics processor; however, locking without flags incurs the performance penalty described earlier. Other API calls, such as IDirect3DDevice9::BeginScene, IDirect3DDevice9::EndScene, and IDirect3DDevice9::Present, do not guarantee that the graphics processor has finished using the vertices.
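
Event queries let the application tell which parts of a dynamic buffer are safe to reuse without locking anything. In real code the query would be created with IDirect3DDevice9::CreateQuery(D3DQUERYTYPE_EVENT, ...), issued with Issue(D3DISSUE_END) after the last draw that reads a range, and polled with GetData until it returns S_OK. So that the bookkeeping can be tested without a device, the sketch below replaces the query with a hypothetical `FakeEventQuery` flag; `InFlightRanges` is likewise a made-up helper, and it assumes events complete in the order they were issued.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical stand-in for an event query. In D3D9 the poll would be
// GetData(NULL, 0, D3DGETDATA_FLUSH): S_OK means done, S_FALSE means pending.
struct FakeEventQuery {
    bool signaled = false;  // set once the GPU has passed this point
};

// Tracks byte ranges of a dynamic buffer that pending draws may still read.
class InFlightRanges {
public:
    // Record [begin, end) right after the last draw that reads it, together
    // with the event issued after that draw.
    void Add(std::uint32_t begin, std::uint32_t end, const FakeEventQuery* query) {
        assert(begin < end && query != nullptr);
        pending_.push_back({begin, end, query});
    }

    // Forget ranges whose event has signaled. Assumes events signal in the
    // order they were issued, so we can stop at the first pending one.
    void Retire() {
        while (!pending_.empty() && pending_.front().query->signaled)
            pending_.pop_front();
    }

    // True if [begin, end) overlaps no range the GPU might still be reading.
    bool IsFree(std::uint32_t begin, std::uint32_t end) const {
        for (const Range& r : pending_)
            if (begin < r.end && r.begin < end) return false;
        return true;
    }

private:
    struct Range {
        std::uint32_t begin, end;
        const FakeEventQuery* query;
    };
    std::deque<Range> pending_;
};
```

Polling like this only tells the application when it may overwrite a range; unlike a lock with no flags, it never forces the CPU to wait for the GPU.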

Below are ways to use dynamic buffers and the proper lock flags.

// USAGE STYLE 1
// Discard the entire vertex buffer and refill with thousands of vertices.
// Might contain multiple objects and/or require multiple DrawPrimitive
// calls separated by state changes, etc.

// Determine the size of data to be moved into the vertex buffer.
UINT nSizeOfData = nNumberOfVertices * m_nVertexStride;

// Discard and refill the used portion of the vertex buffer.
CONST DWORD dwLockFlags = D3DLOCK_DISCARD;

// Lock the vertex buffer (offset 0, size 0 locks the whole buffer).
// In Direct3D 9, Lock() takes a void**, so the BYTE** must be cast.
BYTE* pBytes;
if( FAILED( m_pVertexBuffer->Lock( 0, 0, reinterpret_cast<void**>( &pBytes ), dwLockFlags ) ) )
    return false;

// Copy the vertices into the vertex buffer.
memcpy( pBytes, pVertices, nSizeOfData );
m_pVertexBuffer->Unlock();

// Render the primitives (assumes the buffer is already bound with SetStreamSource).
m_pDevice->DrawPrimitive( D3DPT_TRIANGLELIST, 0, nNumberOfVertices/3 );

// USAGE STYLE 2
// Reusing one vertex buffer for multiple objects

// Determine the size of data to be moved into the vertex buffer.
UINT nSizeOfData = nNumberOfVertices * m_nVertexStride;

// No overwrite will be used if the vertices can fit into
// the space remaining in the vertex buffer.
DWORD dwLockFlags = D3DLOCK_NOOVERWRITE;

// Check to see if the entire vertex buffer has been used up yet.
// Comparing against the remaining space avoids unsigned wrap-around
// (m_nNextVertexData never exceeds m_nSizeOfVB).
if( nSizeOfData > m_nSizeOfVB - m_nNextVertexData )
{
    // No space remains. Start over from the beginning
    // of the vertex buffer.
    dwLockFlags = D3DLOCK_DISCARD;
    m_nNextVertexData = 0;
}

// Lock only the region about to be written.
BYTE* pBytes;
if( FAILED( m_pVertexBuffer->Lock( (UINT)m_nNextVertexData, nSizeOfData,
                                   reinterpret_cast<void**>( &pBytes ), dwLockFlags ) ) )
    return false;

// Copy the vertices into the vertex buffer.
memcpy( pBytes, pVertices, nSizeOfData );
m_pVertexBuffer->Unlock();

// Render the primitives. StartVertex is the byte offset converted to a vertex index.
m_pDevice->DrawPrimitive( D3DPT_TRIANGLELIST,
                          m_nNextVertexData/m_nVertexStride, nNumberOfVertices/3 );

// Advance to the next position in the vertex buffer.
m_nNextVertexData += nSizeOfData;

"Static" and "dynamic" are just hints to the Direct3D driver so that it can place buffers in the most appropriate memory for performance. Static buffers are typically placed in VRAM (fastest for rendering) and are expected to change rarely, if ever. Dynamic buffers may be placed in VRAM (fast for the GPU to render from, but slow for the CPU to read back) or in AGP memory (fast for the CPU to write, but slower for the GPU to render from). Direct3D can also keep two copies of a dynamic buffer, one in VRAM and one in AGP memory. Unlike static buffers, dynamic buffers are designed to be locked, rewritten, and unlocked every frame.

You are right: you should sort your vertex buffers (that is, the draw calls that use them) by texture to minimize texture switches, because texture switches are among the most expensive state changes for the GPU.
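
To see why grouping by texture helps, the sketch below counts how many texture changes a list of draws needs before and after a stable sort on texture ID. `DrawItem`, `CountTextureSwitches`, and `SortByTexture` are hypothetical; a real renderer would bind each texture with IDirect3DDevice9::SetTexture and usually has other state and draw-order constraints (transparent geometry, for example) to respect as well.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical draw record: the texture it binds and the vertex range it draws.
struct DrawItem {
    unsigned textureId;
    unsigned startVertex;
    unsigned primitiveCount;
};

// Number of texture binds needed to submit `items` in their current order.
std::size_t CountTextureSwitches(const std::vector<DrawItem>& items) {
    std::size_t switches = 0;
    for (std::size_t i = 0; i < items.size(); ++i)
        if (i == 0 || items[i].textureId != items[i - 1].textureId) ++switches;
    return switches;
}

// Group draws that share a texture so each texture is bound once.
// stable_sort keeps the original order among draws with the same texture.
void SortByTexture(std::vector<DrawItem>& items) {
    std::stable_sort(items.begin(), items.end(),
                     [](const DrawItem& a, const DrawItem& b) { return a.textureId < b.textureId; });
}
```

With four draws alternating between two textures, the unsorted order needs four binds and the sorted order needs two.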

If by instantiating a vb you mean dynamically allocating a vertex buffer every frame, then yes, it sounds like a big performance hit. Instead, you should allocate a dynamic buffer with enough storage once and then use D3D's locking mechanism to update it every frame.

As far as I know, batching is used in multi-threaded applications. In a multi-threaded setup there is usually one render thread and multiple batch threads. The batch threads build up D3D command buffers and when it's time to draw everything the render thread executes all the recorded batch commands from each thread.

EDIT: I almost forgot: there is also another definition of batching. Batching is a method of optimizing the GPU's triangle throughput. It is done by splitting large vertex buffers into smaller batches, or by combining smaller vertex buffers into one large batch so that fewer draw calls are needed.
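
The second meaning of batching, combining small vertex buffers into one, amounts to concatenating vertex arrays and remembering where each one starts. The sketch below does that on the CPU; `Vertex`, `BatchEntry`, and `MergeIntoBatch` are hypothetical names, and the merged array is assumed to be copied into a single vertex buffer afterwards. If all the meshes are triangle lists drawn with the same state, the whole batch can then be submitted with one DrawPrimitive call instead of one call per mesh.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal vertex: position only.
struct Vertex { float x, y, z; };

// Where one source mesh landed inside the merged array.
struct BatchEntry {
    std::size_t startVertex;
    std::size_t vertexCount;
};

// Append several small vertex arrays into one large array so they can share a
// single vertex buffer; entries[i] records where meshes[i] starts.
std::vector<Vertex> MergeIntoBatch(const std::vector<std::vector<Vertex>>& meshes,
                                   std::vector<BatchEntry>& entries) {
    std::vector<Vertex> merged;
    entries.clear();
    for (const auto& mesh : meshes) {
        entries.push_back({merged.size(), mesh.size()});
        merged.insert(merged.end(), mesh.begin(), mesh.end());
    }
    return merged;
}
```

For example, a 3-vertex mesh and a 6-vertex mesh merge into 9 vertices (3 triangles), with the second mesh starting at vertex 3.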

Index buffers are used for drawing indexed primitives. An index buffer stores integer values that point to the individual vertices in the current vertex buffer. It is a way to separate geometry from topology: the vertex buffer contains the vertices, and the index buffer defines the polygons by referring to them.
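
As an illustration of the geometry/topology split, here is a unit cube stored as 8 shared corner positions plus 36 16-bit indices (12 triangles). The names are hypothetical, and the triangle winding is not tuned for any particular cull mode. Drawing it would correspond to a call such as DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0, 8, 0, 12), with the data copied into a vertex buffer and a D3DFMT_INDEX16 index buffer.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Position { float x, y, z; };

// Geometry: the 8 corners of a unit cube, each stored exactly once.
constexpr std::array<Position, 8> kCorners = {{
    {0, 0, 0}, {1, 0, 0}, {1, 1, 0}, {0, 1, 0},  // corners 0-3 lie in the z = 0 plane
    {0, 0, 1}, {1, 0, 1}, {1, 1, 1}, {0, 1, 1},  // corners 4-7 lie in the z = 1 plane
}};

// Topology: 12 triangles (two per face) that refer to corners by index.
constexpr std::array<std::uint16_t, 36> kIndices = {
    0, 2, 1,  0, 3, 2,   // z = 0 face
    4, 5, 6,  4, 6, 7,   // z = 1 face
    0, 1, 5,  0, 5, 4,   // y = 0 face
    3, 7, 6,  3, 6, 2,   // y = 1 face
    0, 4, 7,  0, 7, 3,   // x = 0 face
    1, 2, 6,  1, 6, 5,   // x = 1 face
};

// What an indexed draw does conceptually: the n-th index picks which vertex to fetch.
constexpr Position FetchVertex(std::size_t n) { return kCorners[kIndices[n]]; }
```

Without an index buffer, the same triangle list would need 36 full vertices instead of 8 vertices plus 36 small indices.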

The disadvantages of an index buffer show up with lighting and texture mapping. When several triangles share one vertex, they also share that vertex's single normal vector and single pair of texture coordinates. On a cube built this way, you will see that the texture is mapped incorrectly on three of its sides. Likewise, with lighting enabled, only two sides of the cube are lit correctly, because each vertex's normal can be correct for only one of the faces that share it.
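
A common workaround (an assumption here, not something the original post states) is to stop sharing corners between faces: each face gets its own four vertices, so each vertex can carry that face's normal and its own texture coordinates. The sketch below builds such a cube with 24 vertices and 36 indices; `LitVertex` and `BuildFlatShadedCube` are hypothetical, texture coordinates are left out for brevity, and the winding is not tuned for any particular cull mode.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <vector>

// Hypothetical vertex layout: position plus normal (texture coordinates omitted).
struct LitVertex {
    float px, py, pz;  // position
    float nx, ny, nz;  // normal
};

// Builds a cube in which no vertex is shared between faces:
// 6 faces x 4 corners = 24 vertices, 2 triangles per face = 36 indices.
void BuildFlatShadedCube(std::vector<LitVertex>& vertices,
                         std::vector<std::uint16_t>& indices) {
    // Each face: its outward normal and its four corners, listed around the face.
    struct Face { float n[3]; float c[4][3]; };
    const std::array<Face, 6> faces = {{
        {{ 0,  0, -1}, {{0, 0, 0}, {0, 1, 0}, {1, 1, 0}, {1, 0, 0}}},  // z = 0
        {{ 0,  0,  1}, {{0, 0, 1}, {1, 0, 1}, {1, 1, 1}, {0, 1, 1}}},  // z = 1
        {{ 0, -1,  0}, {{0, 0, 0}, {1, 0, 0}, {1, 0, 1}, {0, 0, 1}}},  // y = 0
        {{ 0,  1,  0}, {{0, 1, 0}, {0, 1, 1}, {1, 1, 1}, {1, 1, 0}}},  // y = 1
        {{-1,  0,  0}, {{0, 0, 0}, {0, 0, 1}, {0, 1, 1}, {0, 1, 0}}},  // x = 0
        {{ 1,  0,  0}, {{1, 0, 0}, {1, 1, 0}, {1, 1, 1}, {1, 0, 1}}},  // x = 1
    }};
    vertices.clear();
    indices.clear();
    for (const Face& f : faces) {
        const auto base = static_cast<std::uint16_t>(vertices.size());
        // Copy each corner, pairing it with this face's normal.
        for (const auto& c : f.c)
            vertices.push_back({c[0], c[1], c[2], f.n[0], f.n[1], f.n[2]});
        // Two triangles per quad: corners (0, 1, 2) and (0, 2, 3).
        for (int k : {0, 1, 2, 0, 2, 3})
            indices.push_back(static_cast<std::uint16_t>(base + k));
    }
}
```

The trade-off is memory: the corner at the origin now exists three times, once per adjacent face, each copy with a different normal.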
