  • Blog (27)
  • Resources (1)
  • Favorites
  • Following

[Original] Predicting Semantic Map Representations from Images using Pyramid Occupancy Networks

In this work we present a simple, unified approach for estimating maps directly from monocular images using a single end-to-end deep learning architecture. For the maps themselves we adopt a semantic Bayesian occupancy grid framework, allowing us to tri…

2022-03-20 20:52:10 5434 3
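The excerpt mentions a semantic Bayesian occupancy grid. The standard Bayesian way to fuse several independent occupancy estimates for the same cell is to sum their log-odds. The sketch below is a generic illustration, not the paper's code. The function names, the (T, C, H, W) layout, and the 0.5 prior are assumptions.

```python
import numpy as np

def logit(p):
    """Convert probabilities to log-odds."""
    return np.log(p) - np.log1p(-p)

def fuse_occupancy(frame_probs, prior=0.5):
    """Fuse per-frame occupancy probabilities for every class and cell.

    frame_probs: array of shape (T, C, H, W) with values in (0, 1), assumed to be
    conditionally independent observations of the same grid (hypothetical layout).
    Returns the fused posterior probability with shape (C, H, W).
    """
    frame_probs = np.clip(frame_probs, 1e-6, 1 - 1e-6)
    # Each observation adds its log-odds relative to the prior; the prior is counted once.
    l = np.sum(logit(frame_probs) - logit(prior), axis=0) + logit(prior)
    return 1.0 / (1.0 + np.exp(-l))
```

Consider two frames that each report 0.8 for a cell. Their odds multiply to 4 · 4 = 16, which gives a fused probability of 16/17 ≈ 0.94. Contradictory reports of 0.8 and 0.2 cancel out to 0.5.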

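[Original] DeepLab Series

Introduction: The DeepLab series is a classic family of semantic segmentation models. The system evolved from v1 to v3+. Following the improvements the authors introduced in each version shows how the individual sub-problems of semantic segmentation were tackled and how the model design evolved. DeepLabV1: SEMANTIC IMAGE SEGMENTATION WITH DEEP CONVOLUTIONAL NETS AND FULLY CONNECTED CRFS (2015). Problem & approach: There are two technical hurdles in the app…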

2022-02-20 18:42:19 2100

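[Original] ORB SLAM2 Reading Notes (1)

Motivation: The previous few posts laid out the overall pipeline following Gao Xiang's 《SLAM十四讲》 (14 Lectures on Visual SLAM). The goal of reading the source code is to find the problems that come up in real engineering practice and the engineering tricks used to solve them. It is also a chance to pay tribute to Raúl Mur Artal's classic work. References: "[ORB-SLAM2] ORB-SLAM中的ORB特征(提取)" (ORB features in ORB-SLAM: extraction), Zhihu; "一步步带你看懂orbslam2源码--orb特征点提取(二)" (Understanding the orbslam2 source step by step: ORB feature extraction, part 2), Mr.Sliver's blog, CSDN; GitHub - electech6/ORB_SLAM2_detailed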

2022-02-06 19:20:04 1038

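[Original] Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

Motivation: Why this paper? Because the results are stunning. Its variants top the leaderboards on many datasets for semantic segmentation, classification, and object detection, and it appears in the top 10 of each. "Swin Transformer, that capably serves as a general-purpose backbone for computer vision." 【CC】This picks up the open question left by the ViT paper: can a transformer serve as a backbone for computer vision? ViT only tried classification and left detection and semantic segmentation open. This paper answers directly that Swin Transformer can. Transformer from…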

2022-02-06 16:35:45 2577
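Swin Transformer restricts self-attention to local M × M windows. In alternating layers it shifts the window grid by ⌊M/2⌋ so that information can flow between neighboring windows, and the paper implements this shift efficiently as a cyclic shift. The sketch below shows only the partitioning and the cyclic shift, assuming a channels-last (H, W, C) layout. The attention computation itself is omitted, as is the mask that keeps wrapped-around regions apart.

```python
import numpy as np

def window_partition(x, M):
    """Split an (H, W, C) feature map into non-overlapping M x M windows.

    Returns an array of shape (num_windows, M, M, C), ordered row by row.
    H and W must be divisible by M.
    """
    H, W, C = x.shape
    x = x.reshape(H // M, M, W // M, M, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, M, M, C)

def shifted_windows(x, M):
    """Cyclically shift the map up-left by M // 2, then partition into windows."""
    s = M // 2
    shifted = np.roll(x, shift=(-s, -s), axis=(0, 1))
    return window_partition(shifted, M)
```

For example, with an 8 × 8 map and M = 4, the regular partition yields four 4 × 4 windows. After the shift, the first window starts at position (2, 2) of the original map.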

[Original] Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics

Motivation: In this paper we make the observation that the performance of such systems is strongly dependent on the relative weighting between each task's loss. We propose a principled approach to multi-task deep learning which weighs multiple loss functions by co…

2022-02-04 18:50:33 2783 4
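The excerpt breaks off at "weighs multiple loss functions by co…". The paper weighs each task by its homoscedastic (task-level) uncertainty σ_i. The sketch below computes the regression form of that combined loss, Σ_i L_i / (2σ_i²) + log σ_i, written with s_i = log σ_i² for numerical stability. The paper scales classification terms slightly differently. The function name is hypothetical. In real training, s_i would be learnable parameters updated alongside the network weights.

```python
import numpy as np

def uncertainty_weighted_loss(task_losses, log_vars):
    """Combine per-task losses using learned homoscedastic uncertainty.

    task_losses: per-task losses L_i
    log_vars: s_i = log(sigma_i^2), one per task
    Returns sum_i 0.5 * exp(-s_i) * L_i + 0.5 * s_i,
    which equals sum_i L_i / (2 sigma_i^2) + log(sigma_i).
    """
    L = np.asarray(task_losses, dtype=float)
    s = np.asarray(log_vars, dtype=float)
    return float(np.sum(0.5 * np.exp(-s) * L + 0.5 * s))
```

Setting the derivative with respect to s_i to zero gives σ_i² = L_i. A task with a larger loss is therefore automatically down-weighted, while the + s_i term stops every σ_i from growing without bound.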

[Original] AN IMAGE IS WORTH 16X16 WORDS: TRANSFORMERS FOR IMAGE RECOGNITION AT SCALE

Motivation: While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. We show that this reliance on CNNs is not necessary and a pure transformer applied directly t…

2022-02-03 16:32:44 3333
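The title refers to cutting the image into 16 × 16 patches, each of which is flattened and linearly projected into a token. The sketch below illustrates only the shapes: a 224 × 224 RGB image yields (224/16)² = 196 patches of 16 · 16 · 3 = 768 values each. The projection matrix here is random and purely illustrative. The learned embedding, the class token, and the position embeddings are omitted.

```python
import numpy as np

def patchify(img, patch=16):
    """Split an (H, W, C) image into flattened, non-overlapping patches.

    Returns an array of shape (N, patch * patch * C) with N = (H / patch) * (W / patch).
    """
    H, W, C = img.shape
    assert H % patch == 0 and W % patch == 0
    x = img.reshape(H // patch, patch, W // patch, patch, C)
    x = x.transpose(0, 2, 1, 3, 4)  # (patch_rows, patch_cols, patch, patch, C)
    return x.reshape(-1, patch * patch * C)

rng = np.random.default_rng(0)
tokens = patchify(rng.random((224, 224, 3)))
# Hypothetical random linear projection standing in for the learned patch embedding.
W_embed = rng.standard_normal((16 * 16 * 3, 768)) * 0.02
embedded = tokens @ W_embed
print(tokens.shape, embedded.shape)  # (196, 768) (196, 768)
```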

[Original] EfficientDet: Scalable and Efficient Object Detection

Motivation: Is it possible to build a scalable detection architecture with both higher accuracy and better efficiency across a wide spectrum of resource constraints (e.g., from 3B to 300B FLOPs)? 【CC】Straight to the point: build a family of networks for different compute budgets. We systematically study neural network archit…

2022-02-02 14:52:04 1680

[Original] VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation

Motivation: Can we learn a meaningful context representation directly from the structured HD maps? 【CC】The paper opens with this question: can we learn an information-rich context (including the dynamic object list) directly from structured HD map data? This paper focuses on behavior prediction in complex multi-agent systems, such as self-driving vehicles. The core i…

2022-02-01 17:12:24 1174

[Original] SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation

Motivation: in this paper that predicts a 3D bounding box for each detected object by combining a single keypoint estimate with regressed 3D variables. As a second contribution, we propose a multi-step disentangling approach for constructing the 3D bounding box, w…

2022-01-16 19:29:20 2580
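SMOKE represents each object by a single keypoint, the projected 3D center on the image, plus regressed 3D variables such as depth, dimensions, and orientation. Given a keypoint (u, v) and a depth z, the 3D center follows from inverting the pinhole projection z · [u, v, 1]ᵀ = K · [x, y, z]ᵀ. The sketch below shows that back-projection step, assuming known camera intrinsics K. The paper's decoding adds details that are left out here, such as a sub-pixel keypoint offset and a depth decoded from a normalized offset.

```python
import numpy as np

def backproject(u, v, z, K):
    """Recover a 3D point in camera coordinates from its pixel (u, v) and depth z.

    Inverts the pinhole model z * [u, v, 1]^T = K @ [x, y, z]^T.
    """
    return z * np.linalg.inv(K) @ np.array([u, v, 1.0])
```

With the illustrative intrinsics fx = fy = 700, cx = 600, cy = 180, the point (2, 1, 20) projects to pixel (670, 215). Back-projecting that pixel with depth 20 returns (2, 1, 20).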

[Original] MoCo: Momentum Contrast for Unsupervised Visual Representation Learning

Motivation: Unsupervised representation learning is highly successful in natural language processing, but supervised pre-training is still dominant in computer vision. The reason may stem from differences in their respective signal spaces. Language tasks have discr…

2022-01-16 15:28:38 1715

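[Original] NetVLAD: CNN architecture for weakly supervised place recognition

Background: Vector of Locally Aggregated Descriptors (VLAD) image retrieval. 【CC】VLAD is a widely used image representation for retrieval, and this paper improves on it; details follow below. Weakly supervised ranking loss. 【CC】The paper's other contribution is the design of a weakly supervised loss, also described later. Place recognition as an instance retrieval task: the query image location is estim…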

2021-12-26 19:12:14 1104 1
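Classic VLAD hard-assigns each local descriptor to its nearest visual word c_k and sums the residuals x_i − c_k per cluster. NetVLAD keeps this structure but replaces the hard assignment with a differentiable soft assignment, so the layer can be trained end to end. The sketch below implements the hard-assignment version with intra-normalization and a final L2 normalization. The cluster centers are assumed to be given, for example from k-means.

```python
import numpy as np

def vlad(descriptors, centers):
    """Aggregate local descriptors (N, D) into a VLAD vector with K centers (K, D).

    Each descriptor is hard-assigned to its nearest center. Residuals x_i - c_k are summed
    per cluster, each cluster block is L2-normalized (intra-normalization), and the
    flattened K * D vector is L2-normalized.
    """
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)  # (N, K)
    assign = d2.argmin(axis=1)
    K, D = centers.shape
    V = np.zeros((K, D))
    for k in range(K):
        members = descriptors[assign == k]
        if len(members):
            V[k] = (members - centers[k]).sum(axis=0)
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    V = np.divide(V, norms, out=np.zeros_like(V), where=norms > 0)  # intra-normalization
    v = V.ravel()
    n = np.linalg.norm(v)
    return v / n if n > 0 else v
```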

[Original] An Auto-tuning Framework for Autonomous Vehicles

Motivation: As the scenario becomes more complicated, tuning to improve the motion planner performance becomes increasingly difficult. To systematically solve this issue, we develop a data-driven auto-tuning framework based on the Apollo autonomous driving fram…

2021-12-19 14:46:53 424

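[Original] TNT: Target-driveN Trajectory Prediction

Summary: Our key insight is that for prediction within a moderate time horizon, the future modes can be effectively captured by a set of target states. 【CC】As I understand it, the core idea is that the future has many possibilities, but they can be characterized by a set of predicted states. Concretely, the method first predicts a set of target points and then predicts trajectories conditioned on those points, as described later. The benefit is that the target points compress the search space. Thinking about it intuitively, first plan a few tar…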

2021-12-18 20:27:50 1059

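[Original] MultiPath: Multiple Probabilistic Anchor Trajectory Hypotheses for Behavior Prediction

Background assumption: A fundamental aspect of future state prediction is that it is inherently stochastic, as agents cannot know each other's motivations. 【CC】The assumption here is that we cannot truly know the other agents' behavior. This differs from the RL approach of computing a Nash equilibrium, which assumes that every agent is rational and pursues its own optimal solution. We seek a model of the future that can provide…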

2021-12-05 18:03:56 1223 2

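[Repost] Overview of Distributed Computing for Machine Learning

The advantage of distributed training lies in parallelization. Depending on how the work is parallelized, model training can be divided into data parallelism and model parallelism:
• Model parallelism: every machine receives the same input data and runs a different part of the network.
• Data parallelism: every machine receives different input data and runs the same network, and the results from all machines are then merged in some way.
With data parallelism the parts are independent of each other and the cluster scales better, so it is currently the mainstream approach. Every machine holds its own copy of the model gradients, so the gradients from all machines in the cluster must be summed and averaged (there is a problem here). The resulting global average gradients are then used to update the weights. This raises two questions:
• When…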

2021-11-14 18:37:25 580
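The sketch below is a toy single-process simulation of one synchronous data-parallel SGD step. It assumes a linear model with mean-squared-error loss, both chosen only for illustration. Each simulated worker holds identical weights and computes a gradient on its own data shard. The gradients are then averaged (the role an all-reduce plays on a real cluster), and every replica applies the same update.

```python
import numpy as np

def grad_mse(w, X, y):
    """Gradient of 0.5 * mean((X @ w - y) ** 2) with respect to w."""
    return X.T @ (X @ w - y) / len(y)

def data_parallel_step(w, shards, lr=0.1):
    """One synchronous data-parallel SGD step over simulated workers.

    shards: list of (X_i, y_i) pairs, one per worker; every worker starts from the same w.
    """
    grads = [grad_mse(w, X, y) for X, y in shards]  # computed independently per worker
    g = np.mean(grads, axis=0)                      # all-reduce: average the gradients
    return w - lr * g                               # identical update on every replica
```

When all shards have the same size, the averaged gradient equals the full-batch gradient. With unequal shard sizes, a sample-count-weighted average would be needed to match the full-batch gradient.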

[Original] Multi-View 3D Object Detection Network for Autonomous Driving

Motivation: We propose Multi-View 3D networks (MV3D), a sensory-fusion framework that takes both LIDAR point cloud and RGB images as input and predicts oriented 3D bounding boxes. 【CC】A LIDAR and camera fusion scheme for detection. The main idea for utilizing multimodal information i…

2021-11-14 17:21:50 2429

[Original] 3D Packing for Self-Supervised Monocular Depth Estimation

Motivation: propose a novel self-supervised monocular depth estimation method combining geometry with a new deep network. 【CC】A self-supervised monocular depth estimation network that incorporates geometric properties (namely, the frustum model discussed later). Contribution: Our architecture leverages novel symmetrical packing and unpacking blocks to jointly learn to compress…

2021-11-07 19:19:04 834
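PackNet's packing block folds spatial detail into channels (a Space2Depth operation) and then applies learned 3D convolutions. Downsampling therefore does not throw information away as pooling would, and the unpacking block reverses the fold with Depth2Space. The sketch below shows only the lossless fold and unfold for a (C, H, W) tensor. The learned convolutions are omitted, and the function names are illustrative.

```python
import numpy as np

def space_to_depth(x, r=2):
    """Fold each r x r spatial block of a (C, H, W) tensor into channels: (C*r*r, H/r, W/r)."""
    C, H, W = x.shape
    x = x.reshape(C, H // r, r, W // r, r)
    x = x.transpose(0, 2, 4, 1, 3)  # (C, r, r, H/r, W/r)
    return x.reshape(C * r * r, H // r, W // r)

def depth_to_space(x, r=2):
    """Inverse of space_to_depth: (C*r*r, H, W) -> (C, H*r, W*r)."""
    Crr, H, W = x.shape
    C = Crr // (r * r)
    x = x.reshape(C, r, r, H, W).transpose(0, 3, 1, 4, 2)  # (C, H, r, W, r)
    return x.reshape(C, H * r, W * r)
```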

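[Original] 3D-LaneNet+: Anchor Free Lane Detection using a Semi-Local Representation

Motivation: an improvement on 3D-LaneNet. Key feature: semi-local tile representation: breaks down lanes into simple lane segments whose parameters can be learnt. 【CC】The image is divided into a grid of tiles, and lane features are learned per tile before the NN combines the pieces. This makes the method naturally anchor-free and lets it represent irregular or open (non-closed) curves. Technical points: 3D-LaneNet: The first is a CNN architecture wi…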

2021-11-07 13:19:37 968

[Original] IntentNet: Learning to Predict Intention from Raw Sensor Data

Motivation: In this paper we develop a one-stage detector and forecaster that exploits both 3D point clouds produced by a LiDAR sensor as well as dynamic maps of the environment. We exploit 3D LiDAR point clouds and dynamic HD maps containing semantic elements such…

2021-10-24 18:36:52 724

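[Original] LCDNet: Deep Loop Closure Detection and Point Cloud Registration for LiDAR SLAM

Motivation: To address loop closure detection in SLAM, the paper proposes a network with LiDAR as the main sensor. It drops the traditional bag-of-words model and extracts descriptors directly with a neural network, then judges similarity between scans using optimal transport (OT) theory. In addition, a network estimates the R/t transformation between the two frames. Overall architecture: LCDNet is composed of a shared encoder, a place recognition head that extracts global descriptors, and a relative pose head that estimates the transf…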

2021-10-24 14:49:15 1074
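The excerpt says LCDNet uses optimal transport to judge similarity and match points. Entropy-regularized OT is commonly solved with Sinkhorn iterations, which alternately rescale the rows and columns of exp(−cost/ε) until the marginals match. The sketch below is generic, with uniform masses on both point sets. It is not LCDNet's exact formulation, which may differ (for example, in how unmatched points are handled). The values of ε and the iteration count are illustrative.

```python
import numpy as np

def sinkhorn(cost, eps=0.1, iters=200):
    """Entropy-regularized optimal transport between two uniformly weighted point sets.

    cost: (N, M) pairwise cost matrix.
    Returns a transport plan P of shape (N, M) whose rows sum to about 1/N and columns to 1/M.
    """
    N, M = cost.shape
    a, b = np.full(N, 1.0 / N), np.full(M, 1.0 / M)
    Kmat = np.exp(-cost / eps)
    u, v = np.ones(N), np.ones(M)
    for _ in range(iters):
        u = a / (Kmat @ v)    # rescale rows to match a
        v = b / (Kmat.T @ u)  # rescale columns to match b
    return u[:, None] * Kmat * v[None, :]
```

If the second point set is a permutation of the first, the row-wise argmax of the plan recovers that permutation.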

[Original] CenterFusion: Center-based Radar and Camera Fusion for 3D Object Detection

Motivation: To exploit the radar information in this setting, radar-based features need to be mapped to the center of their corresponding object on the image, which requires an accurate association between the radar detections and objects in the scene. A fairly naive fusion idea.

2021-10-17 21:09:52 467

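[Original] Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net

Motivation: This paper handles 3D object detection, tracking, and prediction with a single, fairly shallow network, using a bird's-eye-view (BEV) representation. My guess is that this paper is the prototype of the perception part of the MP3 paper. Input: a 4D tensor (X, Y, Z, T). Output: Notes: This can result in catastrophic failures as downstream processes cannot recover from errors that appear at the beginning of the pipeline. In a cascaded design, downstream modules cannot correct the upstream mod…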

2021-09-25 21:24:30 255

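[Original] MP3: A Unified Model to Map, Perceive, Predict and Plan

Overview. Motivation: HD maps are not updated promptly, and localization problems can also make the HD map unreliable, so a LiDAR-based solution that needs no HD map is required. Input: raw LiDAR data (a time series of point clouds) plus routing commands (e.g., turn right). Key property: interpretability, achieved through interpretable intermediate layers. Backbone: Since the LiDAR input is voxelized at 0.2 meters per pixel, C has a resolution of 0.8 meters per pixel. Concepts: Drivabl…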

2021-09-21 18:38:49 581
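The backbone note says the LiDAR input is voxelized at 0.2 m per pixel and the feature map C ends up at 0.8 m per pixel, a 4× spatial reduction (0.8 / 0.2 = 4). The sketch below builds a binary bird's-eye-view occupancy grid at 0.2 m and then applies 4× max pooling. The region bounds are assumptions. MP3's real input also keeps height bins and past sweeps, and in the network the reduction comes from learned strided layers rather than max pooling.

```python
import numpy as np

def bev_occupancy(points, x_range=(0.0, 40.0), y_range=(-20.0, 20.0), res=0.2):
    """Rasterize LiDAR points (N, 3) into a binary BEV grid with `res` meters per cell.

    The x/y ranges are illustrative assumptions; points outside them are dropped.
    """
    nx = int(round((x_range[1] - x_range[0]) / res))
    ny = int(round((y_range[1] - y_range[0]) / res))
    grid = np.zeros((nx, ny), dtype=np.uint8)
    ix = np.floor((points[:, 0] - x_range[0]) / res).astype(int)
    iy = np.floor((points[:, 1] - y_range[0]) / res).astype(int)
    keep = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    grid[ix[keep], iy[keep]] = 1
    return grid

def downsample_max(grid, factor=4):
    """Max-pool by `factor`; with factor 4, 0.2 m cells become 0.8 m cells."""
    nx, ny = grid.shape
    return grid.reshape(nx // factor, factor, ny // factor, factor).max(axis=(1, 3))
```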

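[Original] SLAM (3)

Mind map. Projection model. Build the objective function from the observation equation. Solving BA: following the idea of nonlinear optimization, start from an initial value and repeatedly search for a descent direction ∆x to find the optimum of the objective function. We take the variables to be all quantities being optimized (poses and landmarks) and solve incrementally; the objective function becomes: where Fij is the partial derivative with respect to the camera pose and Eij is the partial derivative with respect to the landmark. In matrix form: where Xc denotes all pose terms and Xp all landmark terms. From the earlier discussion of nonlinear optimization, both the Gauss-Newton (G-N) and Levenberg-Marquardt (L-M) methods lead to the following linear system: Whichever method is used, the Jacobian is: Taking G-N as an example, H is taken as H…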

2021-08-22 16:28:53 208
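The excerpt stops just as the Gauss-Newton H matrix is introduced. In the standard BA formulation (as in 《SLAM十四讲》), the Jacobian splits into a pose part F and a landmark part E. H = JᵀJ then has the block form [[B, W], [Wᵀ, C]] with B = FᵀF, W = FᵀE, and C = EᵀE. The off-diagonal block is named W here to avoid a clash with E. C is block diagonal because each landmark's block only involves that landmark. Eliminating the landmarks with the Schur complement (marginalization) leaves a much smaller system in the poses. The sketch below is dense and minimal. A real solver would exploit the block-diagonal sparsity of C instead of inverting it as a dense matrix.

```python
import numpy as np

def solve_schur(B, W, C, v, w):
    """Solve [[B, W], [W^T, C]] @ [dxc; dxp] = [v; w] by marginalizing the landmarks.

    B: pose block, C: landmark block, W: pose-landmark coupling block.
    Returns (dxc, dxp): pose and landmark increments.
    """
    C_inv = np.linalg.inv(C)
    S = B - W @ C_inv @ W.T                     # Schur complement (reduced camera system)
    dxc = np.linalg.solve(S, v - W @ C_inv @ w) # solve for the poses first
    dxp = C_inv @ (w - W.T @ dxc)               # back-substitute for the landmarks
    return dxc, dxp
```

The pose system S is only as large as the number of pose parameters. This is why BA solvers first solve for the poses and then recover each landmark cheaply.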

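[Original] Visual SLAM (2)

Mind map. Math background, SVD: the goal of the SVD is to decompose an m×n matrix A into the form A = UΣVᵀ. Here U is an m×m matrix, V is an n×n matrix, U and V are both unitary (orthogonal, for real matrices), and Σ is an m×n diagonal matrix. A unitary matrix satisfies UᵀU = I. Intuitive interpretation: Solution procedure: 2D-2D: recovering R/t from the essential matrix E. Math background: Lie groups and Lie algebras. Math background: nonlinear optimization. 3D-2D: PnP. 3D-3D: ICP. References: https://zhuanlan.zhihu.com/p/29846048...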

2021-07-28 23:27:44 403
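For 3D-3D matching with known correspondences, the closed-form step of ICP is an SVD. First center both point sets. Then form the 3 × 3 matrix W = Σ q'_i p'_iᵀ and take its SVD W = UΣVᵀ. The rotation is R = UVᵀ, and the translation is t = q̄ − R p̄. The sketch below assumes the pairs (p_i, q_i) are already matched. A full ICP loop would re-estimate the matches by nearest-neighbor search and repeat. The determinant check guards against returning a reflection.

```python
import numpy as np

def align_svd(P, Q):
    """Find R, t minimizing sum ||q_i - (R p_i + t)||^2 for matched 3D points P, Q of shape (N, 3)."""
    p0, q0 = P.mean(axis=0), Q.mean(axis=0)
    W = (Q - q0).T @ (P - p0)  # 3x3 matrix sum_i q'_i p'_i^T
    U, S, Vt = np.linalg.svd(W)
    R = U @ Vt
    if np.linalg.det(R) < 0:   # reflection: flip the direction of the smallest singular value
        U[:, -1] *= -1
        R = U @ Vt
    t = q0 - R @ p0
    return R, t
```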

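[Original] Visual SLAM (1)

Mind map. The problem visual SLAM solves: let x denote the vehicle's position, so the positions at successive time steps are x1, x2, …, xk, which form the vehicle's trajectory. The map consists of many landmarks. At each time step the sensor observes some of them and produces observation data z. Suppose there are N landmarks in total, denoted y1, y2, …, yN. Using the motion measurements u and the sensor measurements z, we localize (estimate x) and build the map (estimate y). Formal statement (similar to the Kalman filter formulation): here u_k is the motion sensor reading (sometimes also treated as an input) and w_k is noise. We use a general function f to describe the motion process without specifying what f is. When the vehicle at position x_k sees…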

2021-07-11 22:36:49 392
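In the standard form used in 《SLAM十四讲》, the formal statement the excerpt refers to is a pair of equations. The motion equation is driven by u_k with noise w_k. In the observation equation, the vehicle at x_k observes landmark y_j and produces z_{k,j} with noise v_{k,j}. The observation function h and the noise v_{k,j} are the textbook's notation; the excerpt is cut off before introducing them.

```latex
\begin{cases}
x_k = f\left(x_{k-1}, u_k, w_k\right) \\
z_{k,j} = h\left(y_j, x_k, v_{k,j}\right)
\end{cases}
```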

[Original] Classic Paper: FCN

Source: Significance & problem addressed: Related concepts: Contributions: Network architecture:

2021-06-28 10:06:19 760

ORB SLAM2 Feature Point Extraction

2022-02-06
