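RandLA-Net mainly addresses two key problems: the time cost of FPS (farthest point sampling) and the information loss caused by downsampling.
Solution: random sampling (probability-based selection of training samples plus random downsampling) + local feature aggregation (LFA).
RandLA-Net highlight 2, local feature aggregation (LFA): https://blog.youkuaiyun.com/qq_24505417/article/details/108982154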
RandLA-Net Decoder: https://blog.youkuaiyun.com/qq_24505417/article/details/108988259
Training RandLA-Net on Semantic3D data: https://blog.youkuaiyun.com/qq_24505417/article/details/108987171
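Strategy 1. Probability-based training sample selection and random downsampling
Point cloud preprocessing has always been heavy, and many methods rely on sliding windows plus downsampling. Sliding windows inevitably cut the point cloud, splitting objects into several pieces and destroying their geometry. For large-scale scenes the sliding-window workload is also very large. RandLA-Net instead proposes a random sampling strategy that is fast and still samples the whole point set evenly, which avoids splitting one object into multiple parts.
Implementation details: each point is first assigned a random probability value (called possibility in the code). The scene whose minimum possibility is lowest is selected, and its lowest-possibility point is taken as the center; the N points nearest to it (including the center itself) form one input sample. The possibility of these N points is then increased so that this region is unlikely to be sampled again, which spreads the samples evenly over the whole point set, as shown in the figure below.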
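1. Preprocessing (data_prepare_semantic3D.py)
The raw data (points and labels) are grid-subsampled, and a ply file and a KDTree file are generated. The projection information is saved as well.
Each scene is preprocessed separately.
Sampling:
Two passes of grid subsampling are applied, with grid sizes 0.01 and 0.06. The points inside each cube are averaged, the number of points of each class inside the cube is counted, and the most frequent class becomes the label of the subsampled point.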
# Subsample to save space
sub_points, sub_colors, sub_labels = DP.grid_sub_sampling(pc[:,:3].astype(np.float32),\
pc[:, 4:7].astype(np.uint8), labels, 0.01)
grid_size = 0.06
sub_xyz, sub_colors, sub_labels = DP.grid_sub_sampling(sub_points, sub_colors, sub_labels, grid_size)
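The averaging and majority vote described above can be illustrated with a small NumPy sketch. This is not the implementation behind DP.grid_sub_sampling; the function name grid_subsample and the toy data are hypothetical, and colors are left out to keep the example short.

```python
import numpy as np

def grid_subsample(points, labels, grid_size):
    """Average the points in each voxel and keep the majority label (illustrative only)."""
    # Integer voxel coordinates of every point
    voxels = np.floor(points / grid_size).astype(np.int64)
    # Map each distinct voxel to a compact id in [0, n_voxels)
    _, inverse = np.unique(voxels, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)
    n_voxels = int(inverse.max()) + 1

    # Mean coordinate per voxel
    sums = np.zeros((n_voxels, 3), dtype=np.float64)
    np.add.at(sums, inverse, points)
    counts = np.bincount(inverse, minlength=n_voxels)
    sub_points = sums / counts[:, None]

    # Per-voxel label histogram, then majority vote
    n_classes = int(labels.max()) + 1
    hist = np.zeros((n_voxels, n_classes), dtype=np.int64)
    np.add.at(hist, (inverse, labels), 1)
    sub_labels = hist.argmax(axis=1)
    return sub_points, sub_labels

# Hypothetical toy data: three points share one 0.06 cube, one point sits far away
points = np.array([[0.01, 0.01, 0.0],
                   [0.02, 0.03, 0.0],
                   [0.03, 0.02, 0.0],
                   [0.50, 0.50, 0.0]])
labels = np.array([1, 1, 2, 3])
sub_points, sub_labels = grid_subsample(points, labels, grid_size=0.06)
```

In this toy case the first three points collapse into one averaged point that keeps label 1, the most frequent class in its cube.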
Color normalization:
sub_colors = sub_colors / 255.0
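Save as a ply file: the raw input point cloud carries 7 values per point (x,y,z,i,r,g,b). The network uses only the coordinates and colors; these are saved together with the class label as (x,y,z,r,g,b,cls) in ply format.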
write_ply(sub_ply_file, [sub_xyz, sub_colors, sub_labels], ['x', 'y', 'z', 'red', 'green', 'blue', 'class'])
Save the KDTree: a KDTree is built on the point cloud subsampled at 0.06 and saved as a pkl file.
search_tree = KDTree(sub_xyz, leaf_size=50)
kd_tree_file = join(sub_pc_folder, file_name + '_KDTree.pkl')
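Save the projection information: for each point of the 0.01-subsampled cloud (sub_points), the index of its nearest point in the 0.06-subsampled cloud (sub_xyz) is stored.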
proj_idx = np.squeeze(search_tree.query(sub_points, return_distance=False))
proj_idx = proj_idx.astype(np.int32)
proj_save = join(sub_pc_folder, file_name + '_proj.pkl')
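A projection index of this kind is presumably what lets labels predicted on the coarse cloud be copied back to the denser cloud. The sketch below shows the idea with brute-force nearest neighbors in place of the KDTree; nearest_index and the toy arrays are hypothetical names introduced for illustration.

```python
import numpy as np

def nearest_index(support, queries):
    """For each query point, return the index of its nearest support point (brute force)."""
    # Pairwise squared distances, shape (n_queries, n_support)
    d2 = np.sum((queries[:, None, :] - support[None, :, :]) ** 2, axis=2)
    return np.argmin(d2, axis=1).astype(np.int32)

# Hypothetical toy data: a dense cloud and its coarse subsample
dense_xyz = np.array([[0.0, 0.0, 0.0],
                      [0.1, 0.0, 0.0],
                      [1.0, 1.0, 0.0],
                      [0.9, 1.0, 0.0]])
coarse_xyz = np.array([[0.05, 0.0, 0.0],
                       [0.95, 1.0, 0.0]])

# One index into the coarse cloud per dense point
proj_idx = nearest_index(coarse_xyz, dense_xyz)

# Labels predicted on the coarse cloud are copied to the dense cloud by indexing
coarse_pred = np.array([3, 7])
dense_pred = coarse_pred[proj_idx]
```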
2. Sample generation (main_Semantic3D.py ---> def get_batch_gen())
Random initialization of the possibilities:
# Random initialize
for i, tree in enumerate(self.input_trees[split]):
    self.possibility[split] += [np.random.rand(tree.data.shape[0]) * 1e-3]
    self.min_possibility[split] += [float(np.min(self.possibility[split][-1]))]
Select the scene with the lowest possibility and the lowest-possibility point in that scene:
# Choose the cloud with the lowest probability
cloud_idx = int(np.argmin(self.min_possibility[split]))
# choose the point with the minimum of possibility in the cloud as query point
point_ind = np.argmin(self.possibility[split][cloud_idx])
# Get all points within the cloud from tree structure
points = np.array(self.input_trees[split][cloud_idx].data, copy=False)
# Center point of input region
center_point = points[point_ind, :].reshape(1, -1)
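Search for the N nearest neighbors: noise is first added to the center point's coordinates, then the N nearest points (including the center point itself) are queried and shuffled, and this patch is extracted from the whole scene. For Semantic3D, N = 65536. The shuffle lets the later random downsampling simply take the leading points.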
# Add noise to the center point
noise = np.random.normal(scale=cfg.noise_init / 10, size=center_point.shape)
pick_point = center_point + noise.astype(center_point.dtype)
query_idx = self.input_trees[split][cloud_idx].query(pick_point, k=cfg.num_points)[1][0]
# Shuffle index
query_idx = DP.shuffle_idx(query_idx)
# Get corresponding points and colors based on the index
queried_pc_xyz = points[query_idx]
queried_pc_xyz[:, 0:2] = queried_pc_xyz[:, 0:2] - pick_point[:, 0:2]
queried_pc_colors = self.input_colors[split][cloud_idx][query_idx]
if split == 'test':
    queried_pc_labels = np.zeros(queried_pc_xyz.shape[0])
    queried_pt_weight = 1
else:
    queried_pc_labels = self.input_labels[split][cloud_idx][query_idx]
    queried_pc_labels = np.array([self.label_to_idx[l] for l in queried_pc_labels])
    queried_pt_weight = np.array([self.class_weight[split][0][n] for n in queried_pc_labels])
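Update the possibilities: every point that has been drawn receives an increment based on its distance to the pick point, so that this region is less likely to be drawn again.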
# Update the possibility of the selected points
dists = np.sum(np.square((points[query_idx] - pick_point).astype(np.float32)), axis=1)
delta = np.square(1 - dists / np.max(dists)) * queried_pt_weight
self.possibility[split][cloud_idx][query_idx] += delta
self.min_possibility[split][cloud_idx] = float(np.min(self.possibility[split][cloud_idx]))
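A small worked example may make the update rule concrete. The squared distances below are hypothetical values chosen for easy arithmetic, and the weight is set to 1 as in the 'test' branch.

```python
import numpy as np

# Hypothetical squared distances of four queried points to the pick point
dists = np.array([0.0, 1.0, 2.0, 4.0], dtype=np.float32)
queried_pt_weight = 1  # as in the 'test' branch of the source

# Same formula as in the update step above
delta = np.square(1 - dists / np.max(dists)) * queried_pt_weight
print(delta)
```

The point at the center receives the full increment of 1 and the farthest point receives 0, so the edge of a sampled patch stays almost as likely to be picked as an unvisited point.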
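The steps above collect one training sample. Searching again for the lowest-possibility scene and point and repeating num_per_epoch = train_steps * batch_size times yields the samples needed for one epoch.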
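The repeated selection can be sketched end to end on a single toy cloud. This is a simplified illustration, not the generator from main_Semantic3D.py: it assumes one scene, no center noise, no class weights, and brute-force neighbor search instead of the KDTree. The sizes and the names visited and coverage are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy cloud: 2000 points in the unit square, 200 points per sample
points = rng.random((2000, 2)).astype(np.float32)
num_points = 200
num_samples = 30

possibility = rng.random(points.shape[0]) * 1e-3
visited = np.zeros(points.shape[0], dtype=bool)

for _ in range(num_samples):
    # Query point: the point with the lowest possibility
    point_ind = int(np.argmin(possibility))
    pick_point = points[point_ind]

    # Brute-force k nearest neighbours (the source queries a KDTree instead)
    dists = np.sum(np.square(points - pick_point), axis=1)
    query_idx = np.argpartition(dists, num_points - 1)[:num_points]

    # Raise the possibility of the selected points, most strongly near the centre
    d = dists[query_idx]
    possibility[query_idx] += np.square(1 - d / np.max(d))
    visited[query_idx] = True

# Fraction of the cloud that appeared in at least one sample
coverage = visited.mean()
```

Because each new center is the least-visited point, successive patches tend to land in regions that earlier patches missed, which is what spreads the samples over the scene.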
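3. Precomputing the downsampled point sets and interpolation indices for every layer
As the network gets deeper, the number of points shrinks and the receptive field grows, so that global information is captured. Recording each layer's input point coordinates and their neighbor points (cfg.k_n per point) before training therefore speeds up training.
- For every point in the input set batch_xyz, search its cfg.k_n nearest neighbors and store the indices as neigh_idx.
- Because batch_xyz has already been shuffled, taking the first 1/4 or 1/2 of the points is enough; the retained coordinates are sub_points. The sampling ratios of the five layers are (4,4,4,4,2). FPS is not used.
- up_i stores, for every point of the pre-sampling set batch_xyz, the index of its nearest point in the downsampled set sub_points. During interpolation the feature of that nearest point is used directly as the new point's feature, unlike PointNet++, which interpolates with inverse-distance weights over several nearest neighbors.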
for i in range(cfg.num_layers):
    neigh_idx = tf.py_func(DP.knn_search, [batch_xyz, batch_xyz, cfg.k_n], tf.int32)
    sub_points = batch_xyz[:, :tf.shape(batch_xyz)[1] // cfg.sub_sampling_ratio[i], :]
    pool_i = neigh_idx[:, :tf.shape(batch_xyz)[1] // cfg.sub_sampling_ratio[i], :]
    up_i = tf.py_func(DP.knn_search, [sub_points, batch_xyz, 1], tf.int32)
    input_points.append(batch_xyz)
    input_neighbors.append(neigh_idx)
    input_pools.append(pool_i)
    input_up_samples.append(up_i)
    batch_xyz = sub_points
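The same bookkeeping can be reproduced in plain NumPy to check the resulting shapes. The knn_search below is a brute-force stand-in written for this sketch, assuming the argument order (support points, query points, k) used in the calls above; the batch size, point count, k_n and the two-layer ratio list are hypothetical small values.

```python
import numpy as np

def knn_search(support_pts, query_pts, k):
    """Brute-force batched kNN stand-in.

    support_pts: (B, N1, 3), query_pts: (B, N2, 3) -> indices of shape (B, N2, k)
    """
    d2 = np.sum((query_pts[:, :, None, :] - support_pts[:, None, :, :]) ** 2, axis=-1)
    return np.argsort(d2, axis=-1)[:, :, :k].astype(np.int32)

# Hypothetical small settings
rng = np.random.default_rng(0)
batch_xyz = rng.random((2, 64, 3)).astype(np.float32)
k_n = 4
sub_sampling_ratio = [4, 4]

input_points, input_neighbors, input_pools, input_up_samples = [], [], [], []
for ratio in sub_sampling_ratio:
    neigh_idx = knn_search(batch_xyz, batch_xyz, k_n)
    n_sub = batch_xyz.shape[1] // ratio
    sub_points = batch_xyz[:, :n_sub, :]   # a prefix is a random subset of shuffled points
    pool_i = neigh_idx[:, :n_sub, :]       # neighbours of the retained points
    up_i = knn_search(sub_points, batch_xyz, 1)
    input_points.append(batch_xyz)
    input_neighbors.append(neigh_idx)
    input_pools.append(pool_i)
    input_up_samples.append(up_i)
    batch_xyz = sub_points

# Nearest-neighbour upsampling: each dense point copies the feature of its nearest sparse point
sparse_feat = rng.random((2, 16, 8)).astype(np.float32)  # features on the 16 points kept by the first layer
up_idx = input_up_samples[0][:, :, 0]                    # shape (2, 64)
dense_feat = sparse_feat[np.arange(2)[:, None], up_idx]  # shape (2, 64, 8)
```

Since the retained points are a prefix of the input, the first 16 dense points map back to themselves and simply recover their own features.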
The overall network structure is roughly as follows:
'''
# batch_xyz (B,N,3)   sub_sampling_ratio = [4, 4, 4, 4, 2]   d_out = [16, 64, 128, 256, 512]
A fully connected layer first lifts the features to 8 dimensions, followed by 5 downsampling and 5 upsampling steps.
Point counts and feature sizes:
Encoder    input          output          downsampled
Layer 1:   (65536,8)   -> (65536,32)   -> (16384,32)
Layer 2:   (16384,32)  -> (16384,128)  -> (4096,128)
Layer 3:   (4096,128)  -> (4096,256)   -> (1024,256)
Layer 4:   (1024,256)  -> (1024,512)   -> (256,512)
Layer 5:   (256,512)   -> (256,1024)   -> (128,1024) -> MLP -> (128,1024)
Decoder (starting from the layer-5 output)
(128,1024)  -> interpolate -> (256,1024)   -> concat layer-4 output features -> (256,1024+512)  -> (256,512)
(256,512)   -> interpolate -> (1024,512)   -> concat layer-3 output features -> (1024,512+256)  -> (1024,256)
(1024,256)  -> interpolate -> (4096,256)   -> concat layer-2 output features -> (4096,256+128)  -> (4096,128)
(4096,128)  -> interpolate -> (16384,128)  -> concat layer-1 output features -> (16384,128+32)  -> (16384,32)
(16384,32)  -> interpolate -> (65536,32)   -> concat layer-1 output features -> (65536,32+32)   -> (65536,32)   note: layer-1 features are used twice
After decoding:
(65536,32) -> (65536,64) -> (65536,num_classes)
'''
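The shapes in the table follow mechanically from sub_sampling_ratio and d_out, which the short script below recomputes. It is a bookkeeping sketch only: the rule that each encoder layer outputs 2 * d_out channels is read off the table rather than taken from the network code, and the variable names are hypothetical.

```python
num_points = 65536
sub_sampling_ratio = [4, 4, 4, 4, 2]
d_out = [16, 64, 128, 256, 512]

n, c = num_points, 8  # shape after the initial fully connected layer
encoder = []          # per layer: (input, output, downsampled)
for ratio, d in zip(sub_sampling_ratio, d_out):
    output = (n, 2 * d)               # assumption: 2 * d_out channels, as in the table
    downsampled = (n // ratio, 2 * d)
    encoder.append(((n, c), output, downsampled))
    n, c = downsampled

# Skip connections: layer-1 output at full resolution, then each layer's downsampled output
skips = [encoder[0][1]] + [layer[2] for layer in encoder[:-1]]

decoder = []          # per step: (after concat, after MLP)
feat = (n, c)         # (128, 1024) at the bottleneck
for skip in reversed(skips):
    concat = (skip[0], feat[1] + skip[1])  # interpolate to skip resolution, then concatenate
    feat = skip                            # the MLP maps back to the skip's channel count
    decoder.append((concat, feat))

for row in encoder:
    print("encoder", row)
for row in decoder:
    print("decoder", row)
```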