Image Processing
prepare
- loadAssets: load the texture
- generateQuad: generate a quad mesh (the "canvas") with UV coordinates
- setupVertexDescriptions: set up the vertex input descriptions for the quad
- prepareUniformBuffers
- createBuffer
- updateUniformBuffers
- prepareTextureTarget: prepare a texture that serves as the target for the compute shader's output
- vkCreateImage
- vkGetImageMemoryRequirements
- vkAllocateMemory
- vkBindImageMemory
- createCommandBuffer
- setImageLayout
- flushCommandBuffer
- vkCreateSampler
- vkCreateImageView
- setupDescriptorSetLayout
- vkCreateDescriptorSetLayout
- vkCreatePipelineLayout
- preparePipelines
- vkCreateGraphicsPipelines: rendering pipeline
- setupDescriptorPool
- setupDescriptorSet:
- Input image (before compute post processing)
- vkUpdateDescriptorSets
- Final image (after compute shader processing)
- vkUpdateDescriptorSets
- prepareGraphics
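- vkCreateSemaphore: create a semaphore used to synchronize the compute and graphics work
- prepareCompute: prepare the compute shader resources
- vkGetDeviceQueue: get the compute queue (queues are created together with the logical device; this call only retrieves a handle to one)
- Create compute pipeline: note that the compute pipeline has to be created separately from the graphics pipelines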
- VkDescriptorSetLayoutBinding
- Binding 0: Input image (read-only)
- Binding 1: Output image (write)
- vkCreateDescriptorSetLayout
- vkCreatePipelineLayout
- vkAllocateDescriptorSets
- vkUpdateDescriptorSets
- Create compute shader pipelines
- vkCreateComputePipelines: note that this is the dedicated creation function for compute pipelines (graphics pipelines use vkCreateGraphicsPipelines)
- vkCreateCommandPool
- vkAllocateCommandBuffers
- vkCreateSemaphore
- vkQueueSubmit
- vkQueueWaitIdle
- buildComputeCommandBuffer
- vkQueueWaitIdle
- vkBeginCommandBuffer
- vkCmdBindPipeline
- vkCmdBindDescriptorSets
- vkCmdDispatch: dispatch the work groups to the compute pipeline
- vkEndCommandBuffer
- buildCommandBuffers:
- loop
- vkBeginCommandBuffer
- vkCmdPipelineBarrier
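- the image written by the compute shader needs a barrier before the render pass begins, so that the compute shader's writes are visible to the fragment shader that reads it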
- vkCmdBeginRenderPass
- vkCmdSetViewport
- vkCmdSetScissor
- vkCmdBindVertexBuffers
- vkCmdBindIndexBuffer
- Left (pre compute): pipeline drawn on the left half of the screen
- vkCmdBindDescriptorSets
- vkCmdBindPipeline
- vkCmdDrawIndexed
- Right (post compute): pipeline drawn on the right half, showing the compute shader result
- vkCmdBindDescriptorSets
- vkCmdBindPipeline
- vkCmdSetViewport
- vkCmdDrawIndexed
- drawUI
- vkCmdEndRenderPass
- vkEndCommandBuffer
- loop
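The two draws in buildCommandBuffers reuse the same quad and differ only in descriptor set and viewport. A minimal sketch of the viewport math, assuming both halves use a viewport of half the framebuffer width and the second vkCmdSetViewport only moves x to the middle of the screen. `Viewport`, `SplitViewports` and `splitScreen` are hypothetical names; `Viewport` copies the field list of VkViewport so the sketch compiles without the Vulkan headers.

```cpp
#include <cassert>

// Hypothetical plain-C++ mirror of VkViewport's fields, so the sketch
// compiles without the Vulkan headers.
struct Viewport {
    float x, y, width, height, minDepth, maxDepth;
};

struct SplitViewports {
    Viewport left;   // input image (pre compute)
    Viewport right;  // compute shader output (post compute)
};

// Both halves use a half-width viewport; for the second draw only x is
// moved to the middle of the framebuffer (then set again with vkCmdSetViewport).
SplitViewports splitScreen(float framebufferWidth, float framebufferHeight) {
    assert(framebufferWidth > 0.0f && framebufferHeight > 0.0f);
    const Viewport left{0.0f, 0.0f, framebufferWidth * 0.5f, framebufferHeight, 0.0f, 1.0f};
    Viewport right = left;
    right.x = framebufferWidth * 0.5f;
    return {left, right};
}
```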
render
- draw
- Submit graphics commands
- submitInfo.pCommandBuffers = &drawCmdBuffers[currentBuffer];
- this submits the graphics command buffer only; no compute shader work runs here
- vkQueueSubmit
- submitFrame
- Wait for rendering finished
- Submit compute commands
- computeSubmitInfo.pCommandBuffers = &compute.commandBuffer;
- this submits the compute command buffer, so the work runs through the compute shader pipeline
- vkQueueSubmit
- updateUniformBuffers
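The submission order above only works because each queue waits on a semaphore that the other one signals. The toy model below is not the Vulkan API: it assumes binary-semaphore semantics (a wait consumes the signal) and that the compute semaphore is signaled once at setup, which is what the vkCreateSemaphore / vkQueueSubmit / vkQueueWaitIdle sequence in prepareCompute appears to do. The present and render-complete semaphores are left out. `ToySemaphore`, `submit` and `runFrames` are hypothetical names.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Toy binary semaphore (not the Vulkan API): a wait consumes the signal.
struct ToySemaphore {
    bool signaled = false;
};

// Toy queue submission: wait on one semaphore, "execute", signal another.
// Throws where a real GPU would simply wait forever.
void submit(std::vector<std::string>& order, const std::string& name,
            ToySemaphore& waitOn, ToySemaphore& signalWhenDone) {
    if (!waitOn.signaled) {
        throw std::runtime_error(name + " would wait forever");
    }
    waitOn.signaled = false;         // waiting consumes the signal
    order.push_back(name);           // the command buffer "executes"
    signalWhenDone.signaled = true;  // completion unblocks the other queue
}

// Per frame: graphics submit (waits on compute), then compute submit
// (waits on graphics), in the same order as draw() above.
std::vector<std::string> runFrames(int frames, bool computeSignaledAtSetup) {
    ToySemaphore graphicsSem;  // signaled by graphics, waited on by compute
    ToySemaphore computeSem;   // signaled by compute, waited on by graphics
    computeSem.signaled = computeSignaledAtSetup;
    std::vector<std::string> order;
    for (int i = 0; i < frames; ++i) {
        submit(order, "graphics", computeSem, graphicsSem);
        submit(order, "compute", graphicsSem, computeSem);
    }
    return order;
}
```

Calling `runFrames(1, false)` throws: without the one-time signal at setup, the very first graphics submission would never start.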
shader
emboss
- layout (local_size_x = 16, local_size_y = 16) in;
- local work group size: 16 × 16 invocations
- vkCmdDispatch(compute.commandBuffer, textureComputeTarget.width / 16, textureComputeTarget.height / 16, 1);
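- the number of work groups in each dimension is the image width (or height) divided by the local work group size in that dimension (this assumes the width and height are multiples of 16)
- so each texel of the texture maps to exactly one invocation, identified by gl_GlobalInvocationID.xy
- formula: gl_GlobalInvocationID = gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID
- so the result is written with the following code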
- imageStore(resultImage, ivec2(gl_GlobalInvocationID.xy), res);
- read-only input: note that it is declared as a storage image (uniform readonly image2D) rather than a sampler
- layout (binding = 0, rgba8) uniform readonly image2D inputImage;
- pixels are fetched with imageLoad using integer texel coordinates ivec2(gl_GlobalInvocationID.xy), not normalized UV coordinates
- output image
- layout (binding = 1, rgba8) uniform image2D resultImage;
- written with imageStore()
The three compute shaders in this example share the same code structure and differ only in the filter algorithm they apply.
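Since the three shaders differ only in the filter, their shared structure can be emulated on the CPU: a tiled loop plays the role of vkCmdDispatch(width / 16, height / 16, 1), and each inner call is one invocation that reads its 3 × 3 neighborhood with imageLoad-style integer coordinates. This is a sketch under stated assumptions: a single float channel instead of rgba8, clamped border reads (how the real shaders handle borders is not covered in these notes), and a caller-supplied 3 × 3 kernel rather than the actual emboss weights. `Image`, `Kernel3x3`, `invocation` and `dispatch` are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical single-channel stand-in for the rgba8 storage images.
struct Image {
    int width = 0;
    int height = 0;
    std::vector<float> texels;

    // imageLoad-style fetch with integer texel coordinates.
    // Border reads are clamped here (an assumption of this sketch).
    float load(int x, int y) const {
        x = std::clamp(x, 0, width - 1);
        y = std::clamp(y, 0, height - 1);
        return texels[static_cast<std::size_t>(y) * width + x];
    }

    // imageStore-style write.
    void store(int x, int y, float value) {
        texels[static_cast<std::size_t>(y) * width + x] = value;
    }
};

using Kernel3x3 = std::array<float, 9>;

// Body of one invocation; (gx, gy) plays the role of gl_GlobalInvocationID.xy.
void invocation(const Image& in, Image& out, const Kernel3x3& k, int gx, int gy) {
    float sum = 0.0f;
    for (int j = -1; j <= 1; ++j)
        for (int i = -1; i <= 1; ++i)
            sum += k[(j + 1) * 3 + (i + 1)] * in.load(gx + i, gy + j);
    out.store(gx, gy, sum);
}

// Emulates vkCmdDispatch(width / 16, height / 16, 1) with a 16 x 16 local size.
Image dispatch(const Image& in, const Kernel3x3& k) {
    constexpr int kLocal = 16;
    assert(in.width % kLocal == 0 && in.height % kLocal == 0);
    Image out{in.width, in.height, std::vector<float>(in.texels.size(), 0.0f)};
    for (int wgY = 0; wgY < in.height / kLocal; ++wgY)      // gl_WorkGroupID.y
        for (int wgX = 0; wgX < in.width / kLocal; ++wgX)   // gl_WorkGroupID.x
            for (int ly = 0; ly < kLocal; ++ly)             // gl_LocalInvocationID.y
                for (int lx = 0; lx < kLocal; ++lx)         // gl_LocalInvocationID.x
                    invocation(in, out, k, wgX * kLocal + lx, wgY * kLocal + ly);
    return out;
}
```

Swapping the kernel is the only change needed to go from one filter to another, which matches the observation that the three shaders share one structure.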
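VS: my understanding is that this block explicitly declares the vertex shader's output (the built-in gl_PerVertex block)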
out gl_PerVertex
{
vec4 gl_Position;
};
Compute Shader
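Note: quoted from https://blog.youkuaiyun.com/panda1234lee/article/details/51777980
- glDispatchCompute: dispatches work groups to the compute pipeline
- glDispatchComputeIndirect: dispatches compute work using parameters stored in a buffer object
- GL_MAX_COMPUTE_WORK_GROUP_SIZE: maximum local work group size in each dimension
- the total number of invocations is the size of the N-dimensional dispatch (the number of work groups) multiplied by the local work group size declared in the shader
- a compute shader can access all the resources that other shader stages can access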
- uvec3 gl_WorkGroupSize: constant holding the local work group size (local_size_x, local_size_y, local_size_z)
- uvec3 gl_NumWorkGroups: the number of work groups in the dispatch (num_groups_x, num_groups_y, num_groups_z)
- gl_LocalInvocationID: position of the current invocation within its local work group
- gl_WorkGroupID: position of the local work group within the global work group
- gl_GlobalInvocationID: derived from gl_LocalInvocationID, gl_WorkGroupSize and gl_WorkGroupID
- gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID
- it is a valid 3D index of the current invocation's position in the global work group
- gl_LocalInvocationIndex:
- gl_LocalInvocationID.z * gl_WorkGroupSize.x * gl_WorkGroupSize.y + gl_LocalInvocationID.y * gl_WorkGroupSize.x + gl_LocalInvocationID.x
- a 1D index that represents the 2D or 3D local position
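- shared keyword: declares shared variables, visible to all invocations in the same local work group
- accessing shared variables is usually much faster than accessing images or shader storage buffers (i.e. main memory)
- GL_MAX_COMPUTE_SHARED_MEMORY_SIZE: maximum total size of shared variables
- Synchronization
- execution barrier: barrier()
- memory barrier
- groupMemoryBarrier()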
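The built-in variable formulas listed above can be checked with plain integer arithmetic. This sketch uses hypothetical helper names and a `UVec3` stand-in for GLSL's uvec3; it computes gl_GlobalInvocationID, gl_LocalInvocationIndex and the total invocation count, and confirms that the local index enumerates 0 .. N-1 without gaps or collisions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for GLSL's uvec3.
struct UVec3 {
    std::uint32_t x, y, z;
};

// gl_GlobalInvocationID = gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID
UVec3 globalInvocationID(UVec3 workGroupID, UVec3 workGroupSize, UVec3 localID) {
    return {workGroupID.x * workGroupSize.x + localID.x,
            workGroupID.y * workGroupSize.y + localID.y,
            workGroupID.z * workGroupSize.z + localID.z};
}

// gl_LocalInvocationIndex: flattens the 3D local ID into one index.
std::uint32_t localInvocationIndex(UVec3 localID, UVec3 workGroupSize) {
    return localID.z * workGroupSize.x * workGroupSize.y +
           localID.y * workGroupSize.x + localID.x;
}

// Total invocations = number of work groups * local work group size.
std::uint64_t totalInvocations(UVec3 numWorkGroups, UVec3 workGroupSize) {
    const std::uint64_t groups =
        std::uint64_t{numWorkGroups.x} * numWorkGroups.y * numWorkGroups.z;
    const std::uint64_t local =
        std::uint64_t{workGroupSize.x} * workGroupSize.y * workGroupSize.z;
    return groups * local;
}

// True if every local ID maps to a distinct index in [0, N).
bool localIndicesAreDense(UVec3 size) {
    const std::uint32_t n = size.x * size.y * size.z;
    std::vector<bool> seen(n, false);
    for (std::uint32_t z = 0; z < size.z; ++z)
        for (std::uint32_t y = 0; y < size.y; ++y)
            for (std::uint32_t x = 0; x < size.x; ++x) {
                const std::uint32_t i = localInvocationIndex({x, y, z}, size);
                if (i >= n || seen[i]) return false;
                seen[i] = true;
            }
    return true;
}
```

For a 16 × 16 local size, the invocation with local ID (5, 7) in work group (2, 3) gets global ID (37, 55) and local index 7 * 16 + 5 = 117.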
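Summary
This section covers compute shader basics and introduces the concept of work groups. Although this was my first time working with compute shaders, setting up the compute pipeline feels a lot like writing a CUDA program, so with a CUDA background it was easy to pick up.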
GPU Particle system
prepare
- graphics.queueFamilyIndex
- compute.queueFamilyIndex
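- these two indices are stored to tell the compute queue apart from the graphics queue (the two may belong to different queue families)
- loadAssets: load the textures used by the particles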
- setupDescriptorPool
- one additional pool size is added here: VK_DESCRIPTOR_TYPE_STORAGE_BUFFER
- vkCreateDescriptorPool
- prepareGraphics
- prepareStorageBuffers
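- std::vector<Particle> particleBuffer(PARTICLE_COUNT);
- holds the randomly generated particles and their attributes
- createBuffer: create a staging buffer holding the particle data
- createBuffer: note the flags this buffer is created with: it serves both as the compute shader's storage buffer and as the vertex shader's VBO, and it is the destination of the copy from the staging buffer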
- createCommandBuffer
- vkCmdCopyBuffer
- graphics.queueFamilyIndex != compute.queueFamilyIndex
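- vkCmdPipelineBarrier: when the two families differ, release ownership of the buffer from the graphics queue family to the compute queue family (an ownership transfer, no data is copied)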
- flushCommandBuffer
- destroy
- Binding description
- vertex input descriptions needed to bind the initial particle data as vertices
- prepareUniformBuffers: mainly the uniform buffer used by the compute pipeline
- createBuffer
- updateUniformBuffers
- setupDescriptorSetLayout
- vkCreateDescriptorSetLayout
- pushConstantRange
- vkCreatePipelineLayout
- preparePipelines
- vkCreateGraphicsPipelines
- setupDescriptorSet
- vkAllocateDescriptorSets
- vkUpdateDescriptorSets
- vkCreateSemaphore
- prepareStorageBuffers
- prepareCompute
- vkGetDeviceQueue
- VkDescriptorSetLayoutBinding
- descriptorSetLayoutBinding: Particle position storage buffer
- descriptorSetLayoutBinding: Uniform buffer
- vkCreateDescriptorSetLayout
- vkCreatePipelineLayout
- vkAllocateDescriptorSets
- VkWriteDescriptorSet: there is one additional descriptor write here
- writeDescriptorSet: Particle position storage buffer
- writeDescriptorSet: Uniform buffer
- vkUpdateDescriptorSets
- vkCreateComputePipelines
- vkCreateCommandPool
- createCommandBuffer
- vkCreateSemaphore
- vkQueueSubmit
- vkQueueWaitIdle
- buildComputeCommandBuffer
- vkBeginCommandBuffer
- graphics.queueFamilyIndex != compute.queueFamilyIndex
- vkCmdPipelineBarrier: acquire the storage buffer on the compute queue (ownership handed over from the graphics queue family)
- vkCmdBindPipeline
- vkCmdBindDescriptorSets
- vkCmdDispatch(PARTICLE_COUNT / 256): one-dimensional dispatch (work groups along x only)
- graphics.queueFamilyIndex != compute.queueFamilyIndex
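- release the storage buffer from the compute queue family back to the graphics queue family, so the VS can read the results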
- vkEndCommandBuffer
- graphics.queueFamilyIndex != compute.queueFamilyIndex
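- vkCmdPipelineBarrier: acquire the buffer on the compute queue (handed over from the VS / graphics side)
- vkCmdPipelineBarrier: release the buffer from the compute queue back to the VS (graphics side)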
- flushCommandBuffer
- buildCommandBuffers
- loop
- vkBeginCommandBuffer
- graphics.queueFamilyIndex != compute.queueFamilyIndex
- vkCmdPipelineBarrier: ownership transfer from CS to VS (acquire on the graphics queue; no data is copied)
- vkCmdBeginRenderPass
- vkCmdSetViewport
- vkCmdSetScissor
- vkCmdBindPipeline
- vkCmdBindDescriptorSets
- vkCmdPushConstants
- vkCmdBindVertexBuffers
- vkCmdDraw
- drawUI
- vkCmdEndRenderPass
- graphics.queueFamilyIndex != compute.queueFamilyIndex
- vkCmdPipelineBarrier: ownership transfer from VS back to CS (release on the graphics queue)
- vkEndCommandBuffer
- loop
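buildComputeCommandBuffer dispatches PARTICLE_COUNT / 256 work groups of 256 invocations each. Integer division is exact only when PARTICLE_COUNT is a multiple of 256; otherwise the last particles are never updated. A CPU sketch of the 1D case, assuming the common alternative of rounding the group count up and guarding the index in the shader (not necessarily what the example does). `SimParticle`, `groupCountX`, `invocation`, `dispatch` and the Euler update rule are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, reduced particle: only the fields this update touches.
struct SimParticle {
    float px, py;  // position
    float vx, vy;  // velocity
};

// Matches `layout (local_size_x = 256) in;` in the compute shader.
constexpr std::uint32_t kLocalSizeX = 256;

// Work groups for a 1D dispatch, rounded up so no particle is skipped.
std::uint32_t groupCountX(std::uint32_t particleCount) {
    return (particleCount + kLocalSizeX - 1) / kLocalSizeX;
}

// One invocation; `index` plays the role of gl_GlobalInvocationID.x.
// The update rule (a plain Euler step) is an assumption for illustration.
void invocation(std::vector<SimParticle>& particles, std::uint32_t index, float dt) {
    if (index >= particles.size()) return;  // padded tail of the last group
    SimParticle& p = particles[index];
    p.px += p.vx * dt;
    p.py += p.vy * dt;
}

// CPU stand-in for vkCmdDispatch(cmd, groupCountX(n), 1, 1).
void dispatch(std::vector<SimParticle>& particles, float dt) {
    const auto n = static_cast<std::uint32_t>(particles.size());
    const std::uint32_t groups = groupCountX(n);
    for (std::uint32_t wg = 0; wg < groups; ++wg)
        for (std::uint32_t local = 0; local < kLocalSizeX; ++local)
            invocation(particles, wg * kLocalSizeX + local, dt);
}
```

With 1000 particles, truncating division gives 3 groups (768 invocations), while rounding up gives 4 groups and the index guard discards the 24 surplus invocations.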
render
- draw
- prepareFrame
- vkQueueSubmit: Submit graphics commands
- submitFrame
- vkQueueSubmit: Submit compute commands
- updateUniformBuffers
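The `graphics.queueFamilyIndex != compute.queueFamilyIndex` barriers recorded in both command buffers come in pairs: a release on the queue that gives the buffer up, and an acquire with the same queue family indices on the queue that takes it. A toy model of that bookkeeping, assuming an exclusive-sharing buffer and the per-frame order above (graphics, then compute); it tracks ownership only and does not call Vulkan. `OwnedBuffer`, `setupTransfers` and `runFrame` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <stdexcept>

// Toy model of queue family ownership for an exclusive-sharing buffer.
class OwnedBuffer {
public:
    explicit OwnedBuffer(std::uint32_t owner) : owner_(owner) {}

    // Release half, recorded on the source queue.
    void release(std::uint32_t src, std::uint32_t dst) {
        if (src == dst) return;  // same family: no transfer needed
        if (owner_ != src || pending_) throw std::logic_error("invalid release");
        pending_ = dst;
    }

    // Acquire half, recorded on the destination queue with the same indices.
    void acquire(std::uint32_t src, std::uint32_t dst) {
        if (src == dst) return;
        if (owner_ != src || pending_ != dst) throw std::logic_error("invalid acquire");
        owner_ = dst;
        pending_.reset();
    }

    // Using the buffer requires ownership and no transfer in flight.
    void use(std::uint32_t family) const {
        if (owner_ != family || pending_) throw std::logic_error("used without ownership");
    }

    std::uint32_t owner() const { return owner_; }

private:
    std::uint32_t owner_;
    std::optional<std::uint32_t> pending_;
};

// One-time setup: copy on the graphics queue and release to compute, then a
// one-time compute command buffer acquires and releases it back to graphics.
void setupTransfers(OwnedBuffer& buf, std::uint32_t graphics, std::uint32_t compute) {
    buf.use(graphics);               // vkCmdCopyBuffer from the staging buffer
    buf.release(graphics, compute);
    buf.acquire(graphics, compute);
    buf.release(compute, graphics);
}

// Per frame: graphics command buffer first, then the compute command buffer.
void runFrame(OwnedBuffer& buf, std::uint32_t graphics, std::uint32_t compute) {
    buf.acquire(compute, graphics);  // start of graphics command buffer
    buf.use(graphics);               // vkCmdDraw reads it as a vertex buffer
    buf.release(graphics, compute);  // end of graphics command buffer
    buf.acquire(graphics, compute);  // start of compute command buffer
    buf.use(compute);                // vkCmdDispatch updates the particles
    buf.release(compute, graphics);  // end of compute command buffer
}
```

When both indices are equal every barrier is skipped, which is why each pair in the notes sits behind the same `!=` check.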
shader
- CS:
// Bind a buffer that stores the current particle data
// Binding 0 : Position storage buffer
layout(std140, binding = 0) buffer Pos
{
   Particle particles[ ];
};
// 256 invocations per local work group
layout (local_size_x = 256) in;
- gl_GlobalInvocationID.x is the index of the particle being processed
- particles[index].vel.xy = vVel: the shader just writes the new values directly into the storage buffer
- VS:
- layout (location = 0) in vec2 inPos;
- layout (location = 1) in vec4 inGradientPos;
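- the storage buffer content that the VS reads is bound as vertex input through the descriptions set up in prepareStorageBuffers
- gl_PerVertex: gl_PointSize is added
- FS: regular rendering, nothing special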
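The same buffer is read as a std140 storage buffer by the compute shader and as vertex input by the vertex shader, so the host-side struct, the std140 layout and the vertex attribute offsets must all agree. A sketch assuming the particle struct holds pos (vec2), vel (vec2) and gradientPos (vec4), which is consistent with the shader fields mentioned above but not spelled out in these notes; plain float arrays replace any math-library types so it compiles on its own, and the constant names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical host-side mirror of the shader's Particle struct.
struct Particle {
    float pos[2];          // std140 offset 0  -> VS location 0 (inPos)
    float vel[2];          // std140 offset 8  (only used by the compute shader)
    float gradientPos[4];  // std140 offset 16 -> VS location 1 (inGradientPos)
};

// std140: vec2 aligns to 8 bytes, vec4 to 16, and the stride of an array of
// structs is rounded up to a multiple of 16, giving 32 bytes here.
static_assert(offsetof(Particle, pos) == 0, "pos must start the struct");
static_assert(offsetof(Particle, vel) == 8, "vel must follow pos");
static_assert(offsetof(Particle, gradientPos) == 16, "gradientPos must be 16-aligned");
static_assert(sizeof(Particle) == 32, "stride must match the std140 array stride");

// Offsets and stride to use in the vertex input descriptions.
constexpr std::size_t kInPosOffset = offsetof(Particle, pos);
constexpr std::size_t kInGradientPosOffset = offsetof(Particle, gradientPos);
constexpr std::size_t kVertexStride = sizeof(Particle);
```

If a field were added or reordered on one side only, the static_asserts would fail at compile time instead of producing garbage vertices.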
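Summary
This section mainly covers using storage buffers with compute shaders. The part worth studying closely is how the storage buffer is handed over between the graphics pipeline and the compute pipeline.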