[Repost] Reading Papers Slowly: PU-Net, the Groundbreaking Point Cloud Upsampling Network

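The paper introduces PU-Net, an upsampling network that addresses the sparsity of point clouds. It generates a dense point cloud in four steps: patch extraction, feature embedding, feature expansion and coordinate reconstruction. The method borrows the PointNet++ architecture, tackles the challenges of point uniformity and surface accuracy, and is useful for data augmentation and downstream tasks.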

Paper title: PU-Net: Point Cloud Upsampling Network

Tags: supervised | point cloud upsampling

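Let's start with a question:

What is the point cloud upsampling task?

Put simply, point cloud upsampling takes a sparse point cloud as input and outputs a dense one, while preserving properties such as the underlying shape and the uniformity of the point distribution.

In the example figure, an upsampling algorithm takes a sparse point cloud of a camel as input and outputs a dense point cloud of the same camel.

The main application of upsampling is data augmentation: it provides higher-quality data for downstream tasks such as classification and segmentation.

Related tasks: point cloud completion, image super-resolution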

1 motivation

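Point cloud processing is very challenging, largely because point cloud data is sparse and irregular.

The upsampling task studied in this paper targets exactly that sparsity, so that downstream feature-learning tasks get "higher-quality" data.

Put simply, point cloud upsampling takes a point cloud as input and generates a denser point cloud that keeps the underlying shape.

Upsampling looks somewhat like image super-resolution, but point cloud upsampling faces additional challenges:

First, the basic unit of image space, the pixel, lies on a regular grid, whereas a point cloud has no spatial order or regular structure.
Second, the generated point cloud should describe the underlying shape of the target object, which means the generated points should lie on the surface of the target object.
Third, the generated points should stay uniformly distributed rather than clump together.

Given these considerations, we cannot simply apply a traditional method such as plain interpolation, the way image super-resolution can. Why not deep learning?!

So how is it done? Let's read on!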

2 solution

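The architecture figure from the paper gives a good summary of the complete PU-Net network.

The network is a single pipeline built by chaining four modules end to end: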

Patch Extraction
Point Feature Embedding
Feature Expansion
Coordinate Reconstruction

Patch Extraction

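Randomly select M center points on the surface of the shape object. For each center point, grow a patch such that every point in the patch is within geodesic distance d of the center. Then use Poisson disk sampling to sample $\hat{N}$ points on each patch; these serve as the ground truth for training.

The geodesic distance between two points is the length of the shortest path between them measured along the surface of the 3D object.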
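To make the idea of a geodesic-bounded patch concrete, here is a minimal sketch. It approximates geodesic distance by the shortest path along a graph of surface edges (Dijkstra) and keeps every vertex within distance d of a seed. The function name `grow_patch`, the edge-list input and the toy data are assumptions for illustration; the post does not show how the authors implement this preprocessing step.

```python
import heapq
import math

def grow_patch(vertices, edges, seed, d):
    """Return {vertex index: approximate geodesic distance} for vertices within d of seed.

    Geodesic distance is approximated by the shortest path along surface edges.
    """
    # Build an adjacency list weighted by Euclidean edge length.
    adj = {i: [] for i in range(len(vertices))}
    for a, b in edges:
        w = math.dist(vertices[a], vertices[b])
        adj[a].append((b, w))
        adj[b].append((a, w))

    dist = {seed: 0.0}
    heap = [(0.0, seed)]
    while heap:
        cur, u = heapq.heappop(heap)
        if cur > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = cur + w
            # Only keep vertices whose path length stays within the patch radius d.
            if nd <= d and nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Toy "surface": a U-shaped strip. The two tips (vertices 0 and 3) are close in
# straight-line distance but far apart along the surface.
vertices = [(0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
edges = [(0, 1), (1, 2), (2, 3)]
patch = grow_patch(vertices, edges, seed=0, d=1.5)
```

In this toy case vertex 3 is only 1.0 away from the seed in a straight line, yet it is left out of the patch because the path along the surface has length 3.0. That is the difference between a geodesic bound and a plain Euclidean ball.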

Point Feature Embedding

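This module is a feature extractor: it computes a point-wise representation of the point cloud in preparation for the upsampling that follows.

Here PU-Net follows the PointNet++ architecture. Set abstraction layers encode the points at several hierarchy levels, and feature propagation layers bring each level back to the original points. The later feature expansion module concatenates these point-wise features from the different levels.

For a walkthrough of PointNet++, see the earlier article:

Understanding PointNet++: This One Article Is Enough!

As the code shows, PointNet++ is used essentially unchanged as the point-wise feature extractor. The tensors up_l2_points, up_l3_points and up_l4_points below are the point-wise features used later.

The function pointnet_sa_module is the PointNet Set Abstraction (SA) module from PointNet++.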

# point_cloud: B x N x C; the first three channels are xyz,
# the remaining channels are used as input features only if use_normal is set
batch_size = point_cloud.get_shape()[0].value
num_point = point_cloud.get_shape()[1].value
l0_xyz = point_cloud[:,:,0:3]
if use_normal:
    l0_points = point_cloud[:,:,3:]
else:
    l0_points = None
# Set Abstraction layers: N, N/2, N/4 and N/8 points, with growing radius and feature width.
# Integer division (//) keeps npoint an int under Python 3.
# Layer 1
l1_xyz, l1_points, l1_indices = pointnet_sa_module(l0_xyz, l0_points, npoint=num_point, radius=bradius*0.05,bn=use_bn,ibn = use_ibn,
                                                    nsample=32, mlp=[32, 32, 64], mlp2=None, group_all=False,
                                                    is_training=is_training, bn_decay=bn_decay, scope='layer1')

# Layer 2
l2_xyz, l2_points, l2_indices = pointnet_sa_module(l1_xyz, l1_points, npoint=num_point//2, radius=bradius*0.1,bn=use_bn,ibn = use_ibn,
                                                    nsample=32, mlp=[64, 64, 128], mlp2=None, group_all=False,
                                                    is_training=is_training, bn_decay=bn_decay, scope='layer2')

# Layer 3
l3_xyz, l3_points, l3_indices = pointnet_sa_module(l2_xyz, l2_points, npoint=num_point//4, radius=bradius*0.2,bn=use_bn,ibn = use_ibn,
                                                    nsample=32, mlp=[128, 128, 256], mlp2=None, group_all=False,
                                                    is_training=is_training, bn_decay=bn_decay, scope='layer3')

# Layer 4
l4_xyz, l4_points, l4_indices = pointnet_sa_module(l3_xyz, l3_points, npoint=num_point//8, radius=bradius*0.3,bn=use_bn,ibn = use_ibn,
                                                    nsample=32, mlp=[256, 256, 512], mlp2=None, group_all=False,
                                                    is_training=is_training, bn_decay=bn_decay, scope='layer4')

# Feature Propagation layers: each level is interpolated directly back to the
# N input points (l0_xyz) and reduced to 64 channels
up_l4_points = pointnet_fp_module(l0_xyz, l4_xyz, None, l4_points, [64], is_training, bn_decay,
                                scope='fa_layer1',bn=use_bn,ibn = use_ibn)

up_l3_points = pointnet_fp_module(l0_xyz, l3_xyz, None, l3_points, [64], is_training, bn_decay,
                                scope='fa_layer2',bn=use_bn,ibn = use_ibn)

up_l2_points = pointnet_fp_module(l0_xyz, l2_xyz, None, l2_points, [64], is_training, bn_decay,
                                scope='fa_layer3',bn=use_bn,ibn = use_ibn)
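The feature propagation calls bring features computed on the subsampled levels back to all N input points. In PointNet++ this is done by inverse-distance-weighted interpolation over the k nearest coarse points (the PointNet++ paper uses k = 3 and power p = 2 by default), followed by a small MLP. Below is a minimal pure-Python sketch of the interpolation step only. The name `interpolate_features` and the toy data are hypothetical; this is not the repository's `pointnet_fp_module`.

```python
import math

def interpolate_features(dense_xyz, coarse_xyz, coarse_feat, k=3, p=2, eps=1e-10):
    """Inverse-distance-weighted interpolation of coarse features onto dense points."""
    channels = len(coarse_feat[0])
    out = []
    for q in dense_xyz:
        # Indices of the k coarse points nearest to the query point q.
        nearest = sorted(range(len(coarse_xyz)),
                         key=lambda j: math.dist(q, coarse_xyz[j]))[:k]
        # Weight 1 / d^p, clamped so a zero distance does not divide by zero.
        weights = [1.0 / max(math.dist(q, coarse_xyz[j]) ** p, eps) for j in nearest]
        total = sum(weights)
        out.append([
            sum(w * coarse_feat[j][c] for w, j in zip(weights, nearest)) / total
            for c in range(channels)
        ])
    return out

coarse_xyz = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
coarse_feat = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
dense_xyz = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0)]
dense_feat = interpolate_features(dense_xyz, coarse_xyz, coarse_feat)
```

A dense point that coincides with a coarse point essentially copies that point's feature, while a point halfway between two coarse points receives an equal mix of both.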

Feature Expansion

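We now have features for every point, but this is an upsampling task, so the key is to increase the number of points.

The basic idea is simple: expand the point-wise features in feature space, which is equivalent to increasing the number of points.

Let the input to the Feature Expansion module be $f$ with dimension $N\times \tilde{C}$, where N is the number of input points and $\tilde{C}$ is the channel dimension of the concatenated point-wise features, that is, the concatenation of up_l2_points, up_l3_points and up_l4_points from the code above (the code additionally concatenates l1_points and l0_xyz). The corresponding code is: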

concat_feat = tf.concat([up_l4_points, up_l3_points, up_l2_points, l1_points, l0_xyz], axis=-1)
concat_feat = tf.expand_dims(concat_feat, axis=2)
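As a quick sanity check on $\tilde{C}$, the channel widths can simply be added up from the snippets above. The small sketch below does that arithmetic; the helper `concat_shape` is a hypothetical name used only for illustration.

```python
# Channel widths read off the code snippets in this post.
channels = {
    "up_l4_points": 64,  # pointnet_fp_module(..., [64], ...)
    "up_l3_points": 64,
    "up_l2_points": 64,
    "l1_points": 64,     # last entry of mlp=[32, 32, 64] in layer 1
    "l0_xyz": 3,         # raw coordinates
}
c_tilde = sum(channels.values())

def concat_shape(batch_size, num_point):
    # Shape after tf.concat(..., axis=-1) followed by tf.expand_dims(..., axis=2).
    return (batch_size, num_point, 1, c_tilde)
```

With these widths the concatenated feature has 259 channels per point, and the extra singleton axis lets the following 1×1 `conv2d` calls treat the point set like an N×1 image.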

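The output of the Feature Expansion module is ${f}'$ with dimension $rN\times \tilde{C_{2}}$, where r is the upsampling rate and $\tilde{C_{2}}$ is the dimension of the new point-wise features.

In the image domain, deconvolution or interpolation is the usual way to get an "upsampling" effect. Applying these directly to point clouds is not appropriate, because points are irregular and unordered.

The paper adopts the following solution:

$$ {f}' = \mathcal{RS}\left(\left[\, C_{1}^{2}(C_{1}^{1}(f)),\ \ldots,\ C_{r}^{2}(C_{r}^{1}(f)) \,\right]\right) $$

$C_{i}^{1}(\cdot )$ and $C_{i}^{2}(\cdot )$ are two sets of 1×1 convolutions that operate in feature space.

$\mathcal{RS}(\cdot)$ is a reshape operation that converts $N\times r\tilde{C_{2}}$ into $rN\times \tilde{C_{2}}$.

In other words, the $N\times \tilde{C}$ feature is copied r times, each copy is fed to its own convolution $C_{i}^{1}(\cdot )$, and each result is then fed to the corresponding convolution of the second set, $C_{i}^{2}(\cdot )$.

The r features produced by the first convolutions $C_{i}^{1}(\cdot )$ are highly correlated, that is, very similar to one another, which would place the final reconstructed 3D points very close together. The authors therefore add a second set of convolutions $C_{i}^{2}(\cdot )$ so that the generated features are more diverse.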

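with tf.variable_scope('up_layer',reuse=reuse):
    new_points_list = []
    # One branch per upsampling factor; each branch has its own pair of 1x1 convolutions
    for i in range(up_ratio):
        # Every branch receives the same concatenated multi-level feature
        concat_feat = tf.concat([up_l4_points, up_l3_points, up_l2_points, l1_points, l0_xyz], axis=-1)
        concat_feat = tf.expand_dims(concat_feat, axis=2)
        # First 1x1 convolution of branch i (256 channels)
        concat_feat = tf_util2.conv2d(concat_feat, 256, [1, 1],
                                        padding='VALID', stride=[1, 1],
                                        bn=False, is_training=is_training,
                                        scope='fc_layer0_%d'%(i), bn_decay=bn_decay)

        # Second 1x1 convolution of branch i (128 channels)
        new_points = tf_util2.conv2d(concat_feat, 128, [1, 1],
                                        padding='VALID', stride=[1, 1],
                                        bn=use_bn, is_training=is_training,
                                        scope='conv_%d' % (i),
                                        bn_decay=bn_decay)
        new_points_list.append(new_points)
    # Stack the branch outputs along the point axis: up_ratio * N points in total
    net = tf.concat(new_points_list,axis=1)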
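To see the shape bookkeeping without TensorFlow, here is a small framework-free sketch of the same idea: r branches, each applying two point-wise linear layers (which is what a 1×1 convolution does to each point), with the branch outputs stacked along the point axis. All names, the toy sizes, the random weights and the ReLU after each layer are assumptions for illustration, not the repository's code.

```python
import random

def pointwise_linear(feats, weight, bias):
    """Apply the same linear map plus ReLU to every point (the effect of a 1x1 convolution)."""
    out = []
    for f in feats:
        row = [sum(fi * w for fi, w in zip(f, w_row)) + b
               for w_row, b in zip(weight, bias)]
        out.append([max(0.0, v) for v in row])
    return out

def random_layer(rng, c_in, c_out):
    # weight is stored as c_out rows of length c_in
    weight = [[rng.uniform(-1, 1) for _ in range(c_in)] for _ in range(c_out)]
    bias = [0.0] * c_out
    return weight, bias

def expand_features(feats, branches):
    """Run every branch on the same N x C input and stack the results into rN x C2."""
    expanded = []
    for (w1, b1), (w2, b2) in branches:
        hidden = pointwise_linear(feats, w1, b1)           # plays the role of C^1_i
        expanded.extend(pointwise_linear(hidden, w2, b2))  # plays the role of C^2_i
    return expanded

rng = random.Random(0)
N, C, C1, C2, r = 5, 7, 6, 4, 3   # toy sizes, not the paper's
feats = [[rng.uniform(-1, 1) for _ in range(C)] for _ in range(N)]
branches = [(random_layer(rng, C, C1), random_layer(rng, C1, C2)) for _ in range(r)]
expanded = expand_features(feats, branches)
```

Because each branch has its own weights, the same input point produces r different feature vectors, and these later become r different output points.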

Coordinate Reconstruction

This part is straightforward: a series of $1\times 1$ convolutions reduces the point-wise feature dimension from $\tilde{C_{2}}$ to 3, which gives the reconstructed points.

# get the xyz: reduce the expanded features to 64 channels, then to 3 coordinates
coord = tf_util2.conv2d(net, 64, [1, 1],
                        padding='VALID', stride=[1, 1],
                        bn=False, is_training=is_training,
                        scope='fc_layer1', bn_decay=bn_decay)

# Last layer has no activation so the coordinates are unconstrained
coord = tf_util2.conv2d(coord, 3, [1, 1],
                        padding='VALID', stride=[1, 1],
                        bn=False, is_training=is_training,
                        scope='fc_layer2', bn_decay=bn_decay,
                        activation_fn=None, weight_decay=0.0)  # B*(rN)*1*3, r = up_ratio
coord = tf.squeeze(coord, [2])  # B*(rN)*3

Loss

The paper combines several loss terms.

loss 1. reconstruction loss

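The candidate losses here are the EMD loss and the CD loss.

For an introduction to the EMD loss and the CD loss, see this article:

Liu Xinchen: "Point Cloud Distance Metrics: A Complete Analysis of EMD (Earth Mover's Distance)", zhuanlan.zhihu.com

The paper chooses the EMD loss because, following the paper A Point Set Generation Network for 3D Object Reconstruction from a Single Image, the EMD loss brings the output points closer to the underlying object surfaces.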
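For readers who have not met the two distances, here is a minimal pure-Python sketch under one common convention: unsquared Euclidean distances, averaged over points. The function names and the toy point sets are my own, and the brute-force EMD tries every permutation, so it only works for a handful of points. Real implementations, including the one used for training, rely on approximate solvers.

```python
import itertools
import math

def chamfer_distance(a, b):
    """Symmetric Chamfer distance: mean nearest-neighbour distance in both directions."""
    a_to_b = sum(min(math.dist(p, q) for q in b) for p in a) / len(a)
    b_to_a = sum(min(math.dist(q, p) for p in a) for q in b) / len(b)
    return a_to_b + b_to_a

def emd_bruteforce(a, b):
    """Earth Mover's Distance for equal-size sets: cost of the best one-to-one matching."""
    assert len(a) == len(b)
    best = min(
        sum(math.dist(p, b[j]) for p, j in zip(a, perm))
        for perm in itertools.permutations(range(len(b)))
    )
    return best / len(a)

# The prediction duplicates one point and misses the last target point.
target = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
pred = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]

cd = chamfer_distance(pred, target)
emd = emd_bruteforce(pred, target)
```

On this toy pair the Chamfer distance is 0.25 and the EMD is 0.75. The one-to-one matching in EMD forces the duplicated point to travel, so EMD penalizes the uneven distribution more strongly than CD does.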

loss 2. repulsion loss

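Generated points still tend to stay close to the original input points, so several points can pile up in the same place, which makes the upsampled point cloud poorly distributed.

The authors therefore design a repulsion loss that pushes points apart from one another to keep the generated point cloud uniform.

The repulsion loss is computed as follows: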
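The formula image did not survive in this copy of the post. As far as I can reconstruct it from the paper and from the code, the repulsion loss has the following form, where $\hat{N} = rN$ is the number of output points, $K(i)$ is the index set of the k nearest neighbours of point $x_i$, and $h$ is a bandwidth parameter (`h = 0.03` in the code):

```latex
L_{rep} = \sum_{i=0}^{\hat{N}} \sum_{i' \in K(i)}
          \eta\left(\lVert x_{i'} - x_{i} \rVert\right)\,
          w\left(\lVert x_{i'} - x_{i} \rVert\right),
\qquad
\eta(r) = -r,
\qquad
w(r) = e^{-r^{2}/h^{2}}
```

The code takes the mean of `radius - dist*weight` rather than the sum of $-r\,w(r)$. The added constant `radius` and the averaging shift and rescale the value but leave the direction of the gradient unchanged.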

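def get_repulsion_loss4(pred, nsample=20, radius=0.07):
    # pred: (batch_size, npoint,3)
    # For every predicted point, gather up to nsample neighbours within radius
    idx, pts_cnt = query_ball_point(radius, nsample, pred, pred)
    tf.summary.histogram('smooth/unque_index', pts_cnt)

    grouped_pred = group_point(pred, idx)  # (batch_size, npoint, nsample, 3)
    grouped_pred -= tf.expand_dims(pred, 2)  # offsets from each point to its neighbours

    ##get the uniform loss
    h = 0.03  # bandwidth of the Gaussian weight
    dist_square = tf.reduce_sum(grouped_pred ** 2, axis=-1)
    # top_k on the negated values selects the 5 smallest squared distances
    dist_square, idx = tf.nn.top_k(-dist_square, 5)
    dist_square = -dist_square[:, :, 1:]  # remove the first one (normally the point itself, distance 0)
    dist_square = tf.maximum(1e-12,dist_square)  # avoid sqrt(0), whose gradient is infinite
    dist = tf.sqrt(dist_square)
    weight = tf.exp(-dist_square/h**2)
    # -dist * weight is the repulsion term; radius only adds a constant offset
    uniform_loss = tf.reduce_mean(radius-dist*weight)
    return uniform_loss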
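To get a feel for what this loss rewards, here is a simplified pure-Python sketch of the same per-neighbour term. It is an approximation under stated assumptions: it uses a plain k-nearest-neighbour search instead of `query_ball_point`, and the function names and toy point sets are hypothetical.

```python
import math

def repulsion_loss(points, k=4, radius=0.07, h=0.03):
    """Mean of (radius - d * exp(-d^2 / h^2)) over each point's k nearest neighbours."""
    terms = []
    for i, p in enumerate(points):
        # Distances to the k nearest other points (the point itself is excluded).
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)[:k]
        for d in dists:
            d = max(d, 1e-6)  # same role as the 1e-12 clamp on the squared distance
            terms.append(radius - d * math.exp(-d * d / (h * h)))
    return sum(terms) / len(terms)

def line_of_points(n, spacing):
    return [(i * spacing, 0.0, 0.0) for i in range(n)]

clustered = line_of_points(8, 0.001)   # points almost on top of each other
spread = line_of_points(8, 0.02)       # neighbours roughly h / sqrt(2) apart

loss_clustered = repulsion_loss(clustered)
loss_spread = repulsion_loss(spread)
```

The clustered set gets the higher loss, so minimizing the loss pushes nearby points apart. Because the term $r\,e^{-r^2/h^2}$ peaks at $r = h/\sqrt{2}$ and decays quickly beyond that, the loss mostly acts on close neighbours and barely affects points that are already far apart.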

Joint loss

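The combined loss is therefore:

$$ L(\theta) = L_{rec} + \alpha L_{rep} + \beta \lVert \theta \rVert^{2} $$

Here $L_{rec}$ is the reconstruction (EMD) loss, $L_{rep}$ is the repulsion loss, the last term is weight decay on the network parameters $\theta$, and $\alpha$ and $\beta$ balance the terms. In the code, the same balance is expressed by scaling the EMD term by 100 and leaving the repulsion term with weight 1.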

# get emd loss (reconstruction term)
gen_loss_emd,matchl_out = model_utils.get_emd_loss(pred, pointclouds_gt, pointclouds_radius)

# get repulsion loss (optional, controlled by USE_REPULSION_LOSS)
if USE_REPULSION_LOSS:
    gen_repulsion_loss = model_utils.get_repulsion_loss4(pred)
    tf.summary.scalar('loss/gen_repulsion_loss', gen_repulsion_loss)
else:
    gen_repulsion_loss = 0.0

# get total loss function: scaled EMD + repulsion + weight regularization
pre_gen_loss = 100 * gen_loss_emd + gen_repulsion_loss + tf.losses.get_regularization_loss()