3D (Input) Sparse Convolution

This post reviews 2D weight sparsity in deep learning and contrasts it with the memory-efficiency problems caused by the high dimensionality and irregular structure of 3D point clouds. It focuses on the Submanifold Sparse Convolutional Networks (SSCNs) approach, which optimizes 3D sparse convolution by reducing memory consumption and structuring the sparsity, covering both the SC and SSC operations. It also touches on Minkowski CNN and the Minkowski Engine for processing high-dimensional sparse data, and on the TorchSparse library for accelerating point-cloud workloads.

Review:

2D Weight sparsity in DNNs

(Weight) Sparsity in Deep Learning — EverNoob's CSDN blog

==> the above-mentioned 2D sparsity is decidedly different from the 3D input-sparsity scenario: there we manually created structured sparse weights to cut down the memory footprint, whereas here the sparsity is inherent, caused by the unstructured nature of point clouds or simply by high dimensionality;

====> the long and short of the unstructured sparse point cloud is: if we keep using dense operations, which general-purpose hardware is poised to perform, we have to waste a LOT of time and energy moving and computing zeros.
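To see how much of a dense grid is actually zeros, here is a small sketch that voxelizes a toy point cloud and measures occupancy; the scene size, point count, and voxel resolution are invented for illustration, not taken from any benchmark:

```python
import numpy as np

# A toy LiDAR-like point cloud: 10k points scattered in a 100 m cube,
# voxelized at 0.5 m resolution (all numbers are illustrative).
rng = np.random.default_rng(0)
points = rng.uniform(0.0, 100.0, size=(10_000, 3))

voxel_size = 0.5
grid_shape = (200, 200, 200)
coords = np.clip(np.floor(points / voxel_size).astype(np.int64), 0, 199)

dense = np.zeros(grid_shape, dtype=np.float32)
dense[coords[:, 0], coords[:, 1], coords[:, 2]] = 1.0  # mark occupied voxels

occupancy = dense.sum() / dense.size
print(f"occupied voxels: {occupancy:.6%}")  # well under 1% of the grid
```

Even with generous point counts, occupancy stays far below 1%, so a dense 3D convolution spends almost all of its memory traffic and FLOPs on zeros.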

==> just recall how we want data to be structured for parallel hardware, and how we exploited structured sparsity for performance gains;

==> the main goal or method for 3D sparse convolution acceleration is hence to:

reduce memory footprint;

structurally format the sparsity (for efficient parallel processing)
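Both goals point at the same storage scheme used by SSCN-style libraries: keep only the active sites, as a coordinate list plus a feature matrix, with a hash map for neighbor lookup. A minimal sketch of that layout (the sizes and values are invented):

```python
import numpy as np

# COO-style sparse tensor: only active sites are stored.
# `coords` holds the d-dimensional integer sites, `feats` their feature vectors.
coords = np.array([[0, 1, 2],
                   [5, 5, 5],
                   [9, 0, 3]], dtype=np.int64)          # (N, d) active sites
feats = np.zeros((3, 4), dtype=np.float32)              # (N, C) features

# A hash map from site -> row index lets a convolution kernel look up
# neighbors in O(1); this is the kind of rulebook structure sparse-conv
# libraries build once per layer.
index = {tuple(c): i for i, c in enumerate(coords)}
print(index[(5, 5, 5)])  # row 1 of `feats`
```

Memory now scales with the number of active sites N rather than with the full grid volume, and the contiguous `(N, C)` feature matrix is a dense, parallel-friendly operand.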

Submanifold Sparse Convolutional Networks

the seminal paper that introduced the base method for currently popular sparse 3D convolution acceleration solutions:

https://arxiv.org/abs/1711.10275

[Submitted on 28 Nov 2017]

3D Semantic Segmentation with Submanifold Sparse Convolutional Networks

Benjamin Graham, Martin Engelcke, Laurens van der Maaten

Convolutional networks are the de-facto standard for analyzing spatio-temporal data such as images, videos, and 3D shapes. Whilst some of this data is naturally dense (e.g., photos), many other data sources are inherently sparse. Examples include 3D point clouds that were obtained using a LiDAR scanner or RGB-D camera. Standard "dense" implementations of convolutional networks are very inefficient when applied on such sparse data. We introduce new sparse convolutional operations that are designed to process spatially-sparse data more efficiently, and use them to develop spatially-sparse convolutional networks. We demonstrate the strong performance of the resulting models, called submanifold sparse convolutional networks (SSCNs), on two tasks involving semantic segmentation of 3D point clouds. In particular, our models outperform all prior state-of-the-art on the test set of a recent semantic segmentation competition.

key idea/purpose

One of the downsides of prior sparse implementations of convolutional networks is that they "dilate" the sparse data in every layer by applying "full" convolutions. In this work, we show that it is possible to create convolutional networks that keep the same level of sparsity throughout the network. To this end, we develop a new implementation for performing sparse convolutions (SCs) and introduce a novel convolution operator termed submanifold sparse convolution (SSC). We use these operators as the basis for submanifold sparse convolutional networks (SSCNs) that are optimized for efficient semantic segmentation of 3D point clouds.
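The SC/SSC distinction can be illustrated on the active sets alone, ignoring feature values. The two helper functions below are my own sketch of the activity rules, not the paper's implementation: SC activates every output site whose receptive field touches an active input, while SSC computes outputs only at sites that are already active.

```python
import itertools

def sc_active(active, d=2, k=3):
    """Regular sparse convolution (SC): an output site is active if ANY
    input site in its k^d receptive field is active -> the set dilates."""
    r = k // 2
    offsets = list(itertools.product(range(-r, r + 1), repeat=d))
    return {tuple(a + o for a, o in zip(site, off))
            for site in active for off in offsets}

def ssc_active(active, d=2, k=3):
    """Submanifold sparse convolution (SSC): outputs are computed only at
    already-active sites -> the sparsity pattern is preserved exactly."""
    return set(active)

active = {(0, 0)}               # a single active site
print(len(sc_active(active)))   # 9: dilated to the full 3x3 neighborhood
print(len(ssc_active(active)))  # 1: unchanged
```

Stacking SC layers compounds the dilation (9, then 25, then 49 sites in 2D), whereas an SSCN can be arbitrarily deep without losing sparsity.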

Definitions and Spatial Sparsity

We define a d-dimensional convolutional network as a network that takes as input a (d + 1)-dimensional tensor: the input tensor contains d spatio-temporal dimensions (such as length, width, height, time, etc.) and one additional feature-space dimension (e.g., RGB color channels or surface normal vectors).

The input corresponds to a d-dimensional grid of sites, each of which is associated with a feature vector.

We define a site in the input to be active if any element in the feature vector is not in its ground state, e.g., if it is non-zero.

In many problems, thresholding may be used to eliminate input sites at which the feature vector is within a small distance from the ground state.

Note that even though the input tensor is (d + 1)-dimensional, activity is a d-dimensional phenomenon: entire lines along the feature dimension are either active or inactive ==> e.g. a point either exists in the point cloud or it does not, and so, naturally, does its feature vector.
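A tiny NumPy sketch of that point, with invented grid size and values: the (d + 1)-dimensional tensor has d = 2 spatial dimensions plus a channel dimension, but the activity mask is only d-dimensional.

```python
import numpy as np

# (d + 1)-dimensional input: d = 2 spatial dims, C = 3 feature channels.
x = np.zeros((4, 4, 3), dtype=np.float32)
x[1, 2] = [0.3, -0.7, 0.1]   # one active site; its whole feature vector is set

# Activity is d-dimensional: a site is active iff ANY channel deviates
# from the ground state (here, zero), so the mask has shape (4, 4).
active_mask = np.any(x != 0.0, axis=-1)
print(active_mask.shape, active_mask.sum())  # (4, 4) 1
```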

Similarly, the hidden layers of a d-dimensional convolutional network are represented by d-dimensional grids of feature-space vectors. A site in a hidden layer is active if any of the sites it takes as input is active.

The value of the ground state only needs to be calculated once per forward pass at training time, and only once for all forward passes at test time. This allows for substantial savings in computational and memory requirements; the exact savings depend on data sparsity and network depth.

"we argue that the framework described above is unduly restrictive"

Submanifold Dilation

If the input data contains a single active site, then after applying a 3^d convolution there will be 3^d active sites; a second convolution of the same size yields 5^d active sites, and so on. Each "full" convolution dilates the active set, so the sparsity that made the data cheap to process is quickly lost with depth; this is the restriction SSC is designed to remove.
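This dilation is easy to quantify: starting from a single active site, n stacked 3^d "full" convolutions activate every site within Chebyshev distance n, a cube of side (2n + 1). A quick arithmetic sketch:

```python
def active_after(n_layers: int, d: int) -> int:
    """Active sites after n stacked 3^d convolutions, starting from one
    active site: a (2n + 1)-sided cube in d dimensions."""
    return (2 * n_layers + 1) ** d

for n in range(4):
    print(n, active_after(n, d=3))  # 1, 27, 125, 343 -> cubic blow-up
```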

### Efficient 3D Architecture Search with Sparse Point-Voxel Convolution

#### Method overview

Sparse Point-Voxel Convolution (SPVConv) is an efficient 3D module designed to address two problems: conventional sparse convolution cannot maintain high-resolution representations, and point-voxel convolution scales poorly to large 3D scenes. Combining SPVConv with automated Neural Architecture Search (NAS) yields 3D models that are both lightweight and high-performing.

#### Automated neural architecture search

3D-NAS is a model-search framework tailored to 3D scene-understanding tasks. It uses an evolutionary algorithm to discover the best network structure:

- **Initialize the population**: start from a randomly generated set of candidate networks.
- **Evaluate and select**: in each iteration, evaluate every candidate's performance and keep the top \( k \) models as parents.
- **Breed the next generation**:
  - mutation picks a sample from the top candidates and perturbs some of its structural parameters (channel counts, network depth, etc.) with a preset probability;
  - crossover picks two blueprints from the top \( k \) and randomly recombines their features into a new design.
- **Enforce resource constraints**: every offspring must satisfy the preset budget; violators are re-sampled until they comply.

After several rounds of evolution, the best candidate of the final generation is selected as the output.

#### Experimental setup

For practical development and testing, some preparation is unavoidable, including installing the required dependencies and provisioning suitable hardware. The following script (from the original post; the import path and checkpoint filename are placeholders for the SPVNAS codebase, and input preparation is omitted) loads pretrained weights and runs a basic prediction:

```python
import torch
from spvnas.models.spvcnn import SPVCNN

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = SPVCNN(num_classes=20).to(device)

checkpoint = torch.load('path_to_checkpoint.pth', map_location=device)
model.load_state_dict(checkpoint['state_dict'])
model.eval()

# Example input data preparation omitted here.
with torch.no_grad():
    outputs = model(inputs.to(device))
```

Several public datasets, such as SemanticKITTI, are also available for further training or fine-tuning of existing models to improve performance in specific application scenarios.