[Computer Vision] Gaussian Splatting Source Code Walkthrough: Supplement (Part 3)

Original article

Last modified 2024-05-23 18:26:05


License: CC 4.0 BY-SA

Tags:

#ComputerVision #ArtificialIntelligence

First published 2024-03-23 17:19:43

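This article supplements and corrects an earlier walkthrough of the 3D Gaussian Splatting source code, with a focus on backpropagation. Starting from the derivative of the loss with respect to each pixel's color, backpropagation computes the derivatives with respect to several Gaussian parameters. The article gives the relevant derivative formulas, relates them to the variables in the code, and describes the four things that one function accomplishes. With that, the source code has been largely read through.

This article supplements the earlier study notes on the 3D Gaussian Splatting source code ("Study Notes: 3D Gaussian Splatting Source Code Walkthrough") and corrects some errors in them.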

V. Backpropagation

Section 6 of the original paper states:

During the backward pass, we must therefore recover the full sequence of blended points per-pixel in the forward pass. One solution would be to store arbitrarily long lists of blended points per-pixel in global memory [Kopanas et al. 2021]. To avoid the implied dynamic memory management overhead, we instead choose to traverse the per-tile lists again; we can reuse the sorted array of Gaussians and tile ranges from the forward pass. To facilitate gradient computation, we now traverse them back-to-front.

The traversal starts from the last point that affected any pixel in the tile, and loading of points into shared memory again happens collaboratively. Additionally, each pixel will only start (expensive) overlap testing and processing of points if their depth is lower than or equal to the depth of the last point that contributed to its color during the forward pass. Computation of the gradients described in Sec. 4 requires the accumulated opacity values at each step during the original blending process. Rather than traversing an explicit list of progressively shrinking opacities in the backward pass, we can recover these intermediate opacities by storing only the total accumulated opacity at the end of the forward pass. Specifically, each point stores the final accumulated opacity 𝛼 in the forward process; we divide this by each point’s 𝛼 in our back-to-front traversal to obtain the required coefficients for gradient computation.
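The recovery trick at the end of the quoted passage can be sketched in plain C++. This is a minimal illustration, not the repository's kernel: the function names `forward_final_T` and `recover_T_before` are hypothetical, and it assumes the stored quantity is the final transmittance $\prod_j(1-\alpha_j)$ and that every $\alpha_j$ is strictly below 1, so the division is safe.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Forward pass (front-to-back): multiply the transmittance by (1 - alpha_i)
// for each Gaussian. Optionally records the transmittance *before* each
// Gaussian so the backward recovery can be checked against it.
float forward_final_T(const std::vector<float>& alpha,
                      std::vector<float>* T_before = nullptr) {
    float T = 1.0f;
    for (std::size_t i = 0; i < alpha.size(); ++i) {
        if (T_before) T_before->push_back(T);
        T *= (1.0f - alpha[i]);
    }
    return T;  // the only value that has to be kept for the backward pass
}

// Backward pass (back-to-front): starting from the final transmittance,
// divide by (1 - alpha_i) to recover the transmittance before Gaussian i.
// Assumes alpha_i < 1 for all i.
std::vector<float> recover_T_before(const std::vector<float>& alpha,
                                    float T_final) {
    std::vector<float> T_before(alpha.size());
    float T = T_final;
    for (std::size_t k = alpha.size(); k-- > 0;) {
        T = T / (1.0f - alpha[k]);
        T_before[k] = T;
    }
    return T_before;
}
```

Only one float per pixel has to survive the forward pass; the per-Gaussian values are rebuilt on the fly while walking the sorted list in reverse.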

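Backpropagation starts from the derivative of the loss with respect to each pixel's color and computes the derivatives of the loss with respect to each Gaussian's 2D center, 3D center, conic matrix (the matrix of the ellipse's quadratic form), opacity, color as seen from the camera, spherical harmonics coefficients, 3D covariance matrix, scale, and rotation.

In the code, a variable named like dL_dx holds the derivative of the loss with respect to parameter x, that is, $\frac{\partial L}{\partial x}$.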
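As a reminder of what such a variable accumulates, the chain rule behind every dL_dx can be written as follows. The symbols $C_{p,c}$ (channel $c$ of the rendered color of pixel $p$) are notation introduced here for illustration and are not names from the code.

$$\frac{\partial L}{\partial x}=\sum_{p}\sum_{c\in\{R,G,B\}}\frac{\partial L}{\partial C_{p,c}}\,\frac{\partial C_{p,c}}{\partial x}$$

The first factor is the per-pixel input gradient (dL_dpix in the function below), and the sum over $p$ runs over the pixels that the Gaussian contributes to.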

1. `rasterizer_impl.cu`: `CudaRasterizer::Rasterizer::backward`

// Produce necessary gradients for optimization, corresponding
// to forward render pass
void CudaRasterizer::Rasterizer::backward(
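	const int P, // number of Gaussians
	int D, // spherical harmonics degree
	int M, // number of SH coefficients per Gaussian (each has three channels)
	int R,
		// R corresponds to num_rendered in CudaRasterizer::Rasterizer::forward,
		// i.e. the length of the sorted array (the sum, over all Gaussians, of the number of tiles each one covers)
	const float* background, // background color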
	const int width, int height,
	const float* means3D,
	const float* shs,
	const float* colors_precomp,
	const float* scales,
	const float scale_modifier,
	const float* rotations,
	const float* cov3D_precomp,
	const float* viewmatrix,
	const float* projmatrix,
	const float* campos,
	const float tan_fovx, float tan_fovy,
	const int* radii,
	char* geom_buffer,
	char* binning_buffer,
	char* img_buffer,
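	// The parameters above are not explained here
	const float* dL_dpix, // derivative of the loss w.r.t. each pixel's color
	float* dL_dmean2D, // derivative of the loss w.r.t. the Gaussian's 2D center
	float* dL_dconic, // derivative of the loss w.r.t. the conic (ellipse quadratic-form) matrix
	float* dL_dopacity, // derivative of the loss w.r.t. the opacity
	float* dL_dcolor, // derivative of the loss w.r.t. the Gaussian's color (the color seen when looking from the camera center toward the Gaussian)
	float* dL_dmean3D, // derivative of the loss w.r.t. the Gaussian's 3D center
	float* dL_dcov3D, // derivative of the loss w.r.t. the Gaussian's 3D covariance matrix
	float* dL_dsh, // derivative of the loss w.r.t. the Gaussian's SH coefficients
	float* dL_dscale, // derivative of the loss w.r.t. the Gaussian's scale parameters
	float* dL_drot, // derivative of the loss w.r.t. the Gaussian's rotation quaternion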
	bool debug)
{
	GeometryState geomState = GeometryState::fromChunk(geom_buffer, P);
	BinningState binningState = BinningState::fromChunk(binning_buffer, R);
	ImageState imgState = ImageState::fromChunk(img_buffer, width * height);
	// These buffers were all saved during the forward pass; they are retrieved here for reuse

	if (radii == nullptr)
	{
		radii = geomState.internal_radii;
	}

	const float focal_y = height / (2.0f * tan_fovy);
	const float focal_x = width / (2.0f * tan_fovx);

	const dim3 tile_grid((width + BLOCK_X - 1) / BLOCK_X, (height + BLOCK_Y - 1) / BLOCK_Y, 1);
	const dim3 block(BLOCK_X, BLOCK_Y, 1);

	// Compute loss gradients w.r.t. 2D mean position, conic matrix,
	// opacity and RGB of Gaussians from per-pixel loss gradients.
	// If we were given precomputed colors and not SHs, use them.
	const float* color_ptr = (colors_precomp != nullptr) ? colors_precomp : geomState.rgb;
	CHECK_CUDA(BACKWARD::render(
		tile_grid,
		block,
		imgState.ranges,
		binningState.point_list,
		width, height,
		background,
		geomState.means2D,
		geomState.conic_opacity,
		color_ptr,
		imgState.accum_alpha,
		imgState.n_contrib,
		dL_dpix,
		(float3*)dL_dmean2D,
		(float4*)dL_dconic,
		dL_dopacity,
		dL_dcolor), debug)

	// Take care of the rest of preprocessing. Was the precomputed covariance
	// given to us or a scales/rot pair? If precomputed, pass that. If not,
	// use the one we computed ourselves.
	const float* cov3D_ptr = (cov3D_precomp != nullptr) ? cov3D_precomp : geomState.cov3D;
	// Because this is the backward pass, preprocess comes after render (❁´◡`❁)
	CHECK_CUDA(BACKWARD::preprocess(P, D, M,
		(float3*)means3D,
		radii,
		shs,
		geomState.clamped,
		(glm::vec3*)scales,
		(glm::vec4*)rotations,
		scale_modifier,
		cov3D_ptr,
		viewmatrix,
		projmatrix,
		focal_x, focal_y,
		tan_fovx, tan_fovy,
		(glm::vec3*)campos,
		(float3*)dL_dmean2D,
		dL_dconic,
		(glm::vec3*)dL_dmean3D,
		dL_dcolor,
		dL_dcov3D,
		dL_dsh,
		(glm::vec3*)dL_dscale,
		(glm::vec4*)dL_drot), debug)
}
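The two small computations near the top of this function, the focal lengths and the tile grid, can be checked in isolation. The sketch below is only an illustration: the helper names and the `Grid` struct are hypothetical, a tile size of 16×16 is assumed for `BLOCK_X` and `BLOCK_Y`, and `tan_fovx`/`tan_fovy` are assumed to be the tangents of half the field of view.

```cpp
#include <cassert>
#include <cmath>

// Assumed tile size; the real values come from the project's configuration.
constexpr int BLOCK_X = 16;
constexpr int BLOCK_Y = 16;

struct Grid { int x, y; };

// Pinhole relation: half the image size equals focal * tan(half FoV),
// so focal = size / (2 * tan(half FoV)).
inline float focal_from_tan_half_fov(int size_px, float tan_half_fov) {
    return size_px / (2.0f * tan_half_fov);
}

// Ceiling division: enough tiles to cover the image even when the image
// size is not a multiple of the tile size.
inline Grid tile_grid(int width, int height) {
    return { (width + BLOCK_X - 1) / BLOCK_X,
             (height + BLOCK_Y - 1) / BLOCK_Y };
}
```

For a 1920×1080 image this gives a 120×68 grid: 1080 is not a multiple of 16, so the last row of tiles is only partly inside the image.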

2. `backward.cu`: `renderCUDA`

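In this function, each thread handles one pixel and computes the derivatives of the loss with respect to each Gaussian's 2D center, conic matrix, opacity, and color.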
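The one-thread-per-pixel mapping can be sketched on the host side as below. This is an assumption-laden illustration rather than the kernel itself: `pixel_for_thread` and `PixelInfo` are hypothetical names, the block and thread indices are passed in as plain integers instead of being read from CUDA built-ins, and a 16×16 tile is assumed.

```cpp
#include <cassert>

// Assumed tile size (one CUDA block per tile, one thread per pixel).
constexpr int TILE_W = 16;
constexpr int TILE_H = 16;

struct PixelInfo {
    int x, y;     // pixel coordinates in the image
    int id;       // row-major index into per-pixel buffers
    bool inside;  // false for threads of edge tiles that fall outside the image
};

// Map (block index, thread index) to the pixel that the thread is responsible for.
inline PixelInfo pixel_for_thread(int block_x, int block_y,
                                  int thread_x, int thread_y,
                                  int W, int H) {
    PixelInfo p;
    p.x = block_x * TILE_W + thread_x;
    p.y = block_y * TILE_H + thread_y;
    p.id = W * p.y + p.x;
    p.inside = (p.x < W) && (p.y < H);
    return p;
}
```

Threads whose pixel lies outside the image still belong to the block, so they have to be marked as outside and excluded from any per-pixel reads or writes.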

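The derivation here uses a recurrence for the color, which differs from the formulation used in the forward pass. Let $\boldsymbol{b}_i$ be the color (three channels) rendered by the $i$-th Gaussian together with all Gaussians after it, and let $\boldsymbol{g}_i$ be the color of the $i$-th Gaussian. Then the recurrence is $\boldsymbol{b}_i=\alpha_i \boldsymbol{g}_i+(1-\alpha_i)\boldsymbol{b}_{i+1}$. Let $T_i=(1-\alpha_1)(1-\alpha_2)\cdots(1-\alpha_{i-1})$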