[Paper Reading][3D Object Detection] Embracing Single Stride 3D Object Detector with Sparse Transformer

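This post summarizes the paper "Embracing Single Stride 3D Object Detector with Sparse Transformer". The authors ask whether downsampling can be removed from 3D object detection and propose the Single-stride Sparse Transformer (SST) network. Comparative experiments show that a small stride matters for detection performance, and a Transformer is used to overcome the limited receptive field that a small stride brings. The network combines voxelization, SST blocks and region-wise self-attention, which handles the sparsity of point clouds effectively. SST performs well on the Waymo dataset, especially on small objects. The post also discusses how the Transformer gives large objects a sufficient receptive field and how SST compares with other methods.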


Paper title: Embracing Single Stride 3D Object Detector with Sparse Transformer
Code: https://github.com/TuSimple/SST
Venue: CVPR 2022
The paper is very well written!
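The paper starts from the difference in object sizes between 2D and 3D detection. In 2D detection, perspective projection makes near objects appear large and distant ones small, so object sizes in images follow a long-tailed distribution:
[Figure: object-size distribution on COCO]
In COCO, for example, object sizes follow a long-tailed distribution, which is why many methods predict objects of different sizes from feature maps at different scales. In 3D detection, by contrast, object sizes are essentially constant because there is no perspective projection; distant objects simply have sparser points. This leads the authors to ask: is downsampling actually necessary for 3D object detection?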
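The authors therefore consider a detector without downsampling. Designing such a detector, however, raises two problems:
first, a detector that operates on a full-resolution feature map is computationally very expensive; second, at full resolution a convolution of a given size covers a smaller receptive field.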
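To make the receptive-field argument concrete, here is a minimal sketch that applies the standard receptive-field recursion to a hypothetical 3-stage backbone with four 3×3 convolutions per stage. The stage strides, layer counts, helper names and the 0.32 m pillar size are illustrative assumptions, not the PointPillars or SST configuration.

```python
def receptive_field(layers):
    """Receptive field, in input cells, of a stack of (kernel, stride) conv layers.

    Standard recursion: each layer adds (kernel - 1) * jump cells, where jump is
    the product of the strides of all earlier layers.
    """
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


def toy_backbone(stage_strides, convs_per_stage=4):
    """Hypothetical backbone: the first 3x3 conv of each stage carries its stride."""
    layers = []
    for stride in stage_strides:
        layers.append((3, stride))
        layers.extend([(3, 1)] * (convs_per_stage - 1))
    return layers


PILLAR_SIZE_M = 0.32  # assumed pillar size in metres

for name, strides in [("downsampling (1, 2, 2)", (1, 2, 2)),
                      ("single stride (1, 1, 1)", (1, 1, 1))]:
    rf = receptive_field(toy_backbone(strides))
    print(f"{name}: {rf} cells = {rf * PILLAR_SIZE_M:.2f} m")
# downsampling (1, 2, 2): 51 cells = 16.32 m
# single stride (1, 1, 1): 25 cells = 8.00 m
```

At the same depth, dropping the downsampling roughly halves the receptive field in this toy setting, so it has to be recovered some other way.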
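The authors first ran a series of experiments on PointPillars.
They modified the strides of the PointPillars backbone, denoting the original version D3 and progressively reducing the strides. The strides of the four backbone stages for D3 to D0 are:
[Figure: backbone strides of variants D3 to D0]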

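Because the output of each stage is eventually upsampled back to the original resolution, these changes only affect the size of the intermediate feature maps.
[Figure]

[Figure: AP of variants D3 to D0]
Reducing the stride from D3 to D1 clearly improves AP, which shows that a small stride helps 3D object detection. The authors attribute the AP drop at D0 to the overly small stride limiting the receptive field: each feature location then covers only a small part of the scene, which is not enough to capture the whole object.
Building on this, the authors modified the convolutions of D0, experimenting with dilated convolutions and larger (5×5) kernels, and found that AP improved as well.
[Figure: AP of D0 with dilated and 5×5 convolutions]
This shows that, even when parameter count and runtime are taken into account, a sufficiently large receptive field is essential for object detection!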
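A dilated k×k kernel with dilation d spans d·(k−1)+1 cells along each axis, so at stride 1 both fixes tried on D0 grow the receptive field per layer. The sketch below compares them on a hypothetical 12-layer stride-1 stack; the depth and the helper names are assumptions chosen only for illustration.

```python
def effective_kernel(kernel, dilation=1):
    """Number of input cells spanned by a (possibly dilated) kernel along one axis."""
    return dilation * (kernel - 1) + 1


def stride1_receptive_field(kernel, dilation, num_layers):
    """With stride 1 everywhere, every layer adds (effective kernel - 1) cells."""
    return 1 + num_layers * (effective_kernel(kernel, dilation) - 1)


NUM_LAYERS = 12  # hypothetical depth
for label, kernel, dilation in [("3x3", 3, 1), ("5x5", 5, 1), ("3x3, dilation 2", 3, 2)]:
    weights = kernel * kernel  # weights per input/output channel pair
    rf = stride1_receptive_field(kernel, dilation, NUM_LAYERS)
    print(f"{label}: receptive field {rf} cells, {weights} weights per kernel")
# 3x3: receptive field 25 cells, 9 weights per kernel
# 5x5: receptive field 49 cells, 25 weights per kernel
# 3x3, dilation 2: receptive field 49 cells, 9 weights per kernel
```

The dilated 3×3 reaches the same span as the 5×5 with fewer weights, but either way the receptive field grows only linearly with depth.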
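So, starting from PointPillars, the authors design a single-stride network:
[Figure: SST architecture]
The architecture, shown in the figure above, is quite simple.
First, as in PointPillars, the point cloud is voxelized (in fact, pillarized), and each voxel is treated as a token and fed into the backbone. Unlike PointPillars, the encoded voxels are not downsampled; instead they pass through SST (Single-stride Sparse Transformer) blocks for feature extraction.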
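A minimal NumPy sketch of the pillarization step: points are bucketed into a 2D BEV grid and every non-empty pillar becomes one token. The function name, grid range, 0.32 m pillar size and mean-pooling encoder are assumptions; PointPillars itself encodes each pillar with a small PointNet rather than a plain mean.

```python
import numpy as np


def pillarize(points, voxel_size=0.32, pc_range=(-51.2, -51.2, 51.2, 51.2)):
    """Bucket points (N, >=2; columns x, y, ...) into BEV pillars.

    Returns the (row, col) coordinates of the non-empty pillars and one token per
    pillar, here simply the mean of its points' features (assumed encoder).
    """
    x_min, y_min, x_max, y_max = pc_range
    keep = ((points[:, 0] >= x_min) & (points[:, 0] < x_max) &
            (points[:, 1] >= y_min) & (points[:, 1] < y_max))
    pts = points[keep]
    cols = ((pts[:, 0] - x_min) // voxel_size).astype(np.int64)
    rows = ((pts[:, 1] - y_min) // voxel_size).astype(np.int64)
    coords = np.stack([rows, cols], axis=1)
    # Unique pillars, plus the pillar index of every point.
    uniq, inverse = np.unique(coords, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)
    sums = np.zeros((len(uniq), pts.shape[1]))
    np.add.at(sums, inverse, pts)                       # per-pillar feature sums
    counts = np.bincount(inverse, minlength=len(uniq))[:, None]
    return uniq, sums / counts                          # only non-empty pillars


pts = np.array([[0.05, 0.05, 1.0],
                [0.10, 0.10, 3.0],   # falls into the same pillar as the first point
                [10.0, 5.0, 2.0]])
coords, tokens = pillarize(pts)
print(len(coords), "non-empty pillars")  # 2 non-empty pillars
```

Only occupied pillars are kept, which is what makes the later attention sparse.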
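The earlier question remains: with such a small stride, how can each voxel's receptive field be enlarged?
The answer: Transformers!
Transformers are unmatched at capturing global context. However, running a Transformer over the whole scene is clearly impractical, because the cost of self-attention grows quadratically with the number of tokens.
So the authors first group the voxel map: the voxels are partitioned into several regions (groups), and self-attention is computed only among the non-empty voxels inside each group:
[Figure: regional grouping and self-attention within each group]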
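The grouping step can be sketched as follows: each non-empty voxel's BEV coordinate is integer-divided by the region size to get its region, and softmax attention is computed only among tokens of the same region. This is a simplification under stated assumptions: a single head, no learned query/key/value projections, no positional encoding, and hypothetical helper names; the region size is a free parameter here, not the paper's setting.

```python
import numpy as np
from collections import defaultdict


def group_voxels(coords, region_size):
    """Map each non-empty voxel (row, col) to a non-overlapping square region.

    Returns {(region_row, region_col): [voxel indices]}; empty regions never appear.
    """
    groups = defaultdict(list)
    for idx, (r, c) in enumerate(coords // region_size):
        groups[(int(r), int(c))].append(idx)
    return dict(groups)


def regional_self_attention(tokens, groups):
    """Single-head softmax self-attention restricted to tokens of the same region."""
    out = np.zeros_like(tokens)
    scale = np.sqrt(tokens.shape[1])
    for idx in groups.values():
        x = tokens[idx]                                    # (n, d) tokens of one region
        scores = x @ x.T / scale                           # (n, n) pairwise similarities
        weights = np.exp(scores - scores.max(axis=1, keepdims=True))
        weights /= weights.sum(axis=1, keepdims=True)      # softmax over this region only
        out[idx] = weights @ x
    return out


coords = np.array([[0, 0], [1, 2], [5, 5], [13, 1]])
tokens = np.random.default_rng(0).normal(size=(4, 8))
groups = group_voxels(coords, region_size=4)
print(groups)  # {(0, 0): [0, 1], (1, 1): [2], (3, 0): [3]}
features = regional_self_attention(tokens, groups)
```

Voxels 0 and 1 attend to each other, while voxels 2 and 3 are alone in their regions and only attend to themselves.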
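There is one more problem: because point clouds are sparse, the number of non-empty voxels differs from group to group, so how can all groups be processed in parallel?
The authors pad each group; the padded voxels are masked out and have no effect on the real voxels.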
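To run all groups in one batched operation, each group's token list can be padded to a common length and the padded keys masked with −∞ before the softmax, so they receive zero attention weight. A minimal sketch, assuming single-head attention without learned projections; the helper names are hypothetical.

```python
import numpy as np


def pad_regions(tokens, regions, max_len):
    """Stack variable-size regions into a (G, max_len, d) batch plus a validity mask."""
    batch = np.zeros((len(regions), max_len, tokens.shape[1]))
    mask = np.zeros((len(regions), max_len), dtype=bool)   # True = real voxel
    for g, idx in enumerate(regions):
        batch[g, :len(idx)] = tokens[idx]
        mask[g, :len(idx)] = True
    return batch, mask


def masked_attention(batch, mask):
    """Batched single-head attention; padded keys get -inf scores, hence zero weight."""
    scores = batch @ batch.transpose(0, 2, 1) / np.sqrt(batch.shape[-1])  # (G, L, L)
    scores = np.where(mask[:, None, :], scores, -np.inf)
    scores -= scores.max(axis=-1, keepdims=True)   # every region has >= 1 real key
    weights = np.exp(scores)                       # exp(-inf) == 0 for padded keys
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ batch                         # rows at padded positions are ignored


tokens = np.random.default_rng(0).normal(size=(5, 4))
regions = [[0, 1, 2], [3, 4]]                      # unequal numbers of non-empty voxels
batch, mask = pad_regions(tokens, regions, max_len=3)
out = masked_attention(batch, mask)
```

The real voxels of the second region get exactly the result they would get without padding; the padded row `out[1, 2]` is simply discarded when features are scattered back to their voxels.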
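Fine, so we compute attention within each group. But could an object fall right on a border and be split between two groups?
It certainly can. And with self-attention only inside each group, no matter how many SST blocks are stacked, the voxels of such an object would never exchange information, so objects that span groups would be detected very poorly.
This problem has already been solved in 2D: Swin Transformer uses shifted windows to let neighboring groups exchange information, so after several SST blocks are stacked, voxels in different groups can exchange information as well. I won't go into the details here; the method is essentially the same as in 2D. You can refer to a blog post written by an expert on the topic; I have not read the original Swin Transformer paper (a best paper winner) myself either.
The shift is half of the group's length and width.
[Figure: shifted grouping]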
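Shifting the partition by half a group changes which voxels share a group, so an object cut by one partition is usually whole in the other. A minimal sketch for sparse voxels, assuming the shift is applied by offsetting coordinates before the integer division (an implementation assumption; Swin Transformer itself uses a cyclic shift of the dense feature map). The function name is hypothetical.

```python
import numpy as np


def region_index(coords, region_size, shifted=False):
    """Region (row, col) of each voxel; the shifted partition is offset by half a region."""
    offset = region_size // 2 if shifted else 0
    return (coords + offset) // region_size


# Two neighboring voxels of the same object, on either side of a region border.
object_voxels = np.array([[0, 3], [0, 4]])
regular = region_index(object_voxels, region_size=4)
shifted = region_index(object_voxels, region_size=4, shifted=True)
print(regular.tolist())  # [[0, 0], [0, 1]]  -> split across two regions
print(shifted.tolist())  # [[0, 1], [0, 1]]  -> same region after the shift
```

Alternating regular and shifted blocks therefore lets information cross region borders as more SST blocks are stacked.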
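The network then has a dense-map completion step, which is easy to understand: points usually lie on object surfaces, so object centers are often empty, which is unfriendly to the detection head. The authors therefore use two 3×3 convolutions to fill in these holes. The detection head is simply SSD.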
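Why two 3×3 convolutions help can be seen on a toy BEV map: sparse voxel features are scattered onto a dense grid (empty cells are zero), and a 3×3 convolution writes a value into an empty cell whenever any of its 8 neighbors is occupied. The averaging weights, the ReLU between the two layers and the helper names are assumptions for illustration; in the network the weights are learned.

```python
import numpy as np


def scatter_to_bev(coords, feats, grid_hw):
    """Place sparse voxel features on a dense (H, W, C) BEV map; empty cells stay 0."""
    h, w = grid_hw
    bev = np.zeros((h, w, feats.shape[1]))
    bev[coords[:, 0], coords[:, 1]] = feats
    return bev


def conv3x3(x, weight):
    """'Same'-padded 3x3 conv (cross-correlation, as in DL frameworks).

    x: (H, W, Cin), weight: (3, 3, Cin, Cout).
    """
    h, w, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((h, w, weight.shape[-1]))
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w] @ weight[dy, dx]
    return out


def densify(bev, w1, w2):
    """Two 3x3 conv layers with a ReLU in between (activation is an assumption)."""
    return conv3x3(np.maximum(conv3x3(bev, w1), 0), w2)


coords = np.array([[1, 2], [2, 1], [2, 3], [3, 2]])   # points on a ring, empty center (2, 2)
bev = scatter_to_bev(coords, np.ones((4, 1)), (5, 5))
avg = np.full((3, 3, 1, 1), 1.0 / 9)                  # hypothetical averaging kernel
dense = densify(bev, avg, avg)
print(bev[2, 2, 0], round(conv3x3(bev, avg)[2, 2, 0], 3))  # 0.0 0.444
```

After the first convolution the empty center already receives a non-zero feature, and the second layer spreads features one cell further.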
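The loss is as follows:
[Figure: loss function]
Of course, the network can also be extended into a two-stage detector simply by appending LiDAR R-CNN; the authors do this to make comparisons with other methods easier.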
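The experiments are also very thorough:
[Figure: Waymo results]
[Figure: Waymo results]
The detection results on Waymo are shown above.
[Figure: results with three stacked frames]
With three stacked frames as input, the authors find that the method benefits from denser features.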
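One more question about single-stride detectors: is the receptive field large enough for large objects? The answer: the Transformer takes care of that concern!
[Figure]
[Figure]
Evaluating at higher IoU thresholds, the paper finds that the method also predicts more precise boxes.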
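[Figure]
Comparison of the Transformer with other single-stride methods that use convolutions.
The ablations compare group size and number of voxels:
[Figure: group size and voxel number ablation]
Number of SRA (Sparse Regional Attention) blocks, i.e. network depth:
[Figure: SRA depth ablation]
Overall, apart from the higher memory requirement (compared with PointPillars), everything is excellent!
[Figure]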

My thoughts:

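The paper is excellent! It enlarges the receptive field with a Transformer, and this is not the crude approach of a traditional convolution, which linearly combines every voxel inside a fixed cube; instead, the weight of each voxel relative to the query is computed. As the figure above shows, this does enable accurate recognition of the current object without interference from background points or other instances!
The small-object results are very good, which comes from not downsampling!
Compared with the voxel transformer work, also at CVPR, this paper actually operates at the pillar level, which has the advantage of directly reusing a 2D detection result, Swin Transformer. The other difference is the single-stride design.
The paper prompts us to rethink network architecture: for 3D object detection, is downsampling really necessary?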
