Vision-based Robotic Grasping: Papers and Codes
Vision-based robotic grasping methods can be roughly divided by grasp type into two kinds: 2D planar grasp and 6DoF grasp. This repository summarizes the methods of recent years, most of which use deep learning. Earlier review papers are listed first.
0. Review Papers
[arXiv] 2019-Deep Learning for 3D Point Clouds: A Survey, [paper]
[arXiv] 2019-A Review of Robot Learning for Manipulation: Challenges, Representations, and Algorithms, [paper]
[arXiv] 2019-Vision-based Robotic Grasping from Object Localization, Pose Estimation, Grasp Detection to Motion Planning: A Review, [paper]
[MTI] 2018-Review of Deep Learning Methods in Robotic Grasp Detection, [paper]
[ToR] 2016-Data-Driven Grasp Synthesis - A Survey, [paper]
[RAS] 2012-An overview of 3D object grasp synthesis algorithms - A Survey, [paper]
1. 2D Planar Grasp
Grasp Representation: The grasp is represented as an oriented 2D box, and the grasp is constrained to a single approach direction.
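As an illustration of the oriented 2D box, the sketch below stores a grasp as a center, an angle, and two side lengths, and converts it to four corner points. The five-parameter form follows the rectangle convention common in the planar grasp literature; the class and field names are assumptions made for this example and are not taken from any listed paper's code.

```python
import math
from dataclasses import dataclass


@dataclass
class GraspRectangle:
    """Oriented 2D grasp box in image coordinates (hypothetical names)."""
    x: float       # box center, horizontal pixel coordinate
    y: float       # box center, vertical pixel coordinate
    theta: float   # rotation of the box relative to the image x axis, radians
    width: float   # side length along the box's own x axis (gripper opening)
    height: float  # side length along the box's own y axis (jaw size)

    def corners(self):
        """Return the four corners as (x, y) tuples, in order around the box."""
        c, s = math.cos(self.theta), math.sin(self.theta)
        hw, hh = self.width / 2.0, self.height / 2.0
        local = [(-hw, -hh), (hw, -hh), (hw, hh), (-hw, hh)]
        # Rotate each local offset by theta, then translate to the center.
        return [(self.x + dx * c - dy * s, self.y + dx * s + dy * c)
                for dx, dy in local]


grasp = GraspRectangle(x=10.0, y=20.0, theta=0.0, width=4.0, height=2.0)
box = grasp.corners()
```

A network that regresses these five numbers per image, or per pixel, yields a grasp that can be drawn or compared against ground-truth rectangles through its corners.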
1.1 RGB or RGB-D based methods
These methods directly regress the oriented 2D box from RGB or RGB-D images. With RGB-D input, the depth image is treated as an additional channel, so the approach is otherwise the same as for RGB-based methods.
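A minimal sketch of treating depth as an extra channel: each pixel's (R, G, B) values are joined with its depth value to form a four-channel input. The function name, the scaling to [0, 1], and the `max_depth` clipping value are assumptions chosen for illustration; individual papers normalize and fuse depth in their own ways.

```python
def stack_rgbd(rgb, depth, max_depth):
    """Combine an H x W RGB image and an H x W depth map into H x W x 4.

    rgb:       nested lists of (r, g, b) tuples with values in 0..255
    depth:     nested lists of depth values in metres
    max_depth: assumed clipping distance used to scale depth into [0, 1]
    """
    if len(rgb) != len(depth) or any(
            len(r) != len(d) for r, d in zip(rgb, depth)):
        raise ValueError("rgb and depth must have the same height and width")

    stacked = []
    for rgb_row, depth_row in zip(rgb, depth):
        row = []
        for (r, g, b), d in zip(rgb_row, depth_row):
            # Clip depth to [0, max_depth], then scale like the color channels.
            d_norm = min(max(d / max_depth, 0.0), 1.0)
            row.append((r / 255.0, g / 255.0, b / 255.0, d_norm))
        stacked.append(row)
    return stacked


rgbd = stack_rgbd([[(255, 0, 0), (0, 255, 0)]], [[0.5, 3.0]], max_depth=2.0)
```

With this layout, the only change to an RGB-based network is the number of input channels in its first layer.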
2020:
[arXiv] Online Self-Supervised Learning for Object Picking: Detecting Optimum Grasping Position using a Metric Learning Approach, [paper]
[arXiv] A Multi-task Learning Framework for Grasping-Position Detection and Few-Shot Classification, [paper]
[arXiv] Rigid-Soft Interactive Learning for Robust Grasping, [paper]
[arXiv] Optimizing Correlated Graspability Score and Grasp Regression for Better Grasp Prediction, [paper]
[arXiv] Domain Independent Unsupervised Learning to grasp the Novel Objects, [paper]
[arXiv] Real-time Grasp Pose Estimation for Novel Objects in Densely Cluttered Environment, [paper]
[arXiv] Semi-supervised Grasp Detection by Representation Learning in a Vector Quantized Latent Space, [paper]
2019:
[arXiv] Antipodal Robotic Grasping using Generative Residual Convolutional Neural Network, [paper]
[IROS] Domain Independent Unsupervised Learning to grasp the Novel Objects, [paper]
[Sensors] Vision for Robust Robot Manipulation, [paper]
[arXiv] Form2Fit: Learning Shape Priors for Generalizable Assembly from Disassembly, [paper] [code]
[IROS] GRIP: Generative Robust Inference and Perception for Semantic Robot Manipulation in Adversarial Environments, [paper]
[arXiv] Efficient Fully Convolution Neural Network for Generating Pixel Wise Robotic Grasps With High Resolution Images, [paper]
[arXiv] A Single Multi-Task Deep Neural Network with Post-Processing for Object Detection with Reasoning and Robotic Grasp Detection, [paper]
[IROS] ROI-based Robotic Grasp Detection for Object Overlapping Scenes, [paper]
[IROS] SilhoNet: An RGB Method for 6D Object Pose Estimation, [paper]
[ICRA] Multi-View Picking: Next-best-view Reaching for Improved Grasping in Clutter, [paper] [code]
2018:
[arXiv] Real-Time, Highly Accurate Robotic Grasp Detection using Fully Convolutional Neural Networks with High-Resolution Images, [paper]
[arXiv] Real-world Multi-object, Multi-grasp Detection, [paper]
[ICRA] Robotic Pick-and-Place of Novel Objects in Clutter with Multi-Affordance Grasping and Cross-Domain Image Matching, [paper] [code]
2017:
[IROS] Robotic Grasp Detection using Deep Convolutional Neural Networks, [paper]
2016:
[ICRA] Supersizing self-supervision: Learning to grasp from 50k tries and 700 robot hours, [paper]
2015:
[ICRA] Real-time grasp detection using convolutional neural networks, [paper] [code]
2014:
[IJRR] Deep Learning for Detecting Robotic Grasps, [paper]
Datasets:
Cornell dataset, the dataset consists of 1035 images of 280 different objects.
1.2 Depth-based methods
These methods obtain the grasp pose indirectly, in two steps: grasp candidate generation and grasp quality evaluation. The candidate with the highest score is selected as the final grasp.
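The two-step pipeline described above can be sketched as sample, score, and pick the maximum. Everything here is a simplified assumption: `sample_candidates` draws random pixel positions and angles, and `toy_quality` is a placeholder that merely prefers pixels closer to the camera. In the listed methods, the scoring step is a learned or analytic quality model.

```python
import math
import random


def sample_candidates(depth, n, rng):
    """Generate n grasp candidates as (row, col, angle) over a depth image."""
    h, w = len(depth), len(depth[0])
    return [(rng.randrange(h), rng.randrange(w), rng.uniform(0.0, math.pi))
            for _ in range(n)]


def toy_quality(depth, candidate):
    """Placeholder score: smaller depth (closer surface) scores higher."""
    row, col, _angle = candidate
    return -depth[row][col]


def select_best_grasp(depth, candidates, quality_fn=toy_quality):
    """Evaluate every candidate and return the one with the highest score."""
    return max(candidates, key=lambda cand: quality_fn(depth, cand))


depth_image = [[1.0, 1.0, 1.0],
               [1.0, 0.5, 1.0],
               [1.0, 1.0, 1.0]]
candidates = sample_candidates(depth_image, n=20, rng=random.Random(0))
best = select_best_grasp(depth_image, candidates)
```

Swapping `toy_quality` for a trained network leaves the selection logic unchanged, which is why the candidate generator and the evaluator can be developed separately.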
2019:
[IROS] GQ-STN: Optimizing One-Shot Grasp Detection based on Robustness Classifier, [paper]
[ICRA] Mechanical Search: Multi-Step Retrieval of a Target Object Occluded by Clutter, [paper]
[ICRA] MetaGrasp: Data Efficient Grasping by Affordance Interpreter Network, [paper]
[IROS] GlassLoc: Plenoptic Grasp Pose Detection in Transparent Clutter, [paper]
2018:
[RSS] Closing the Loop for Robotic Grasping: A Real-time, Generative Grasp Synthesis Approach, [paper]
[BMVC] EnsembleNet: Improving Grasp Detection using an Ensemble of Convolutional Neural Networks, [paper]
2017:
[RSS] Dex-Net 2.0: Deep Learning to Plan Robust Grasps with Synthetic Point Clouds and Analytic Grasp Metrics, [paper] [code]
Datasets:
Dex-Net, a synthetic dataset of 6.7 million point clouds, grasps, and robust analytic grasp metrics generated from thousands of 3D models.
Jacquard Dataset, "Jacquard: A Large Scale Dataset for Robotic Grasp Detection", in IEEE International Conference on Intelligent Robots and Systems, 2018, [paper]
1.3 Target object localization in 2D
To provide better input for computing the oriented 2D box or generating grasp candidates, the target object's mask should be computed first. Current deep learning-based 2D detection and 2D segmentation methods can help.
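One simple way a mask could narrow the input is to crop the image to the mask's bounding box before grasp detection. The sketch below assumes a binary mask given as nested lists; the function names and the `pad` margin are hypothetical and only illustrate the idea.

```python
def mask_bbox(mask, pad=0):
    """Return (top, left, bottom, right) of the nonzero region, inclusive.

    mask: H x W nested lists of 0/1 values for the target object.
    pad:  assumed extra margin in pixels, clipped to the image border.
    Returns None when the mask is empty.
    """
    h, w = len(mask), len(mask[0])
    rows = [r for r in range(h) if any(mask[r])]
    if not rows:
        return None
    cols = [c for c in range(w) if any(mask[r][c] for r in range(h))]
    top, bottom = max(rows[0] - pad, 0), min(rows[-1] + pad, h - 1)
    left, right = max(cols[0] - pad, 0), min(cols[-1] + pad, w - 1)
    return top, left, bottom, right


def crop(image, box):
    """Crop an H x W nested-list image to an inclusive bounding box."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]


mask = [[0, 0, 0, 0, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 1, 1, 0],
        [0, 0, 0, 0, 0]]
image = [[r * 10 + c for c in range(5)] for r in range(4)]
patch = crop(image, mask_bbox(mask))
```

The cropped patch, or the mask itself as an extra input channel, can then be passed to a grasp regressor or a candidate generator.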
1.3.1 2D detection:
For detailed paper lists, refer to hoya012 or amusi.
Survey papers
2020:
[arXiv] Deep Domain Adaptive Object Detection: a Survey, [paper]
[IJCV] Deep Learning for Generic Object Detection: A Survey, [paper]
2019:
[arXiv] Object Detection in 20 Years: A Survey, [paper]
[arXiv] Object Detection with Deep Learning: A Review, [paper]
[arXiv] A Review of Object Detection Models based on Convolutional Neural Network, [paper]
[arXiv] A Review of methods for Textureless Object Recognition, [paper]
a. Two-stage methods
2020:
[arXiv] Any-Shot Object Detection, [paper]
[arXiv] Frustratingly Simple Few-Shot Object Detection, [paper]
[arXiv] Rethinking the Route Towards Weakly Supervised Object Localization, [paper]
[arXiv] Universal-RCNN: Universal Object Detector via Transferable Graph R-CNN, [paper]
[arXiv] Unsupervised Image-generation Enhanced Adaptation for Object Detection in Thermal images, [paper]
[arXiv] PCSGAN: Perceptual Cyclic-Synthesized Generative Adversarial Networks for Thermal and NIR to Visible Image Transformation, [paper]
[arXiv] SpotNet: Self-Attention Multi-Task Network for Object Detection, [paper]
[arXiv] Real-Time Object Detection and Recognition on Low-Compute Humanoid Robots using Deep Learning, [paper]
[arXiv] FedVision: An Online Visual Object Detection Platform Powered by Federated Learning, [paper]
2019:
[arXiv] Combining Deep Learning and Verification for Precise Object Instance Detection, [paper]
[arXiv] cmSalGAN: RGB-D Salient Object Detection with Cross-View Generative Adversarial Networks, [paper]
[arXiv] OpenLORIS-Object: A Dataset and Benchmark towards Lifelong Object Recognition, [paper] [project]
[IROS] Look Further to Recognize Better: Learning Shared Topics and Category-Specific Dictionaries for Open-Ended 3D Object Recognition, [paper]
[IROS] Recurrent Convolutional Fusion for RGB-D Object Recognition, [paper] [code]
[ICCVW] An Annotation Saved is an Annotation Earned: Using Fully Synthetic Training for Object Detection, [paper]
2017:
[arXiv] Light-Head R-CNN: In Defense of Two-Stage Object Detector, [paper] [code]
2016:
[NeurIPS] R-FCN: Object Detection via Region-based Fully Convolutional Networks, [paper] [code]
[TPAMI] Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, [paper] [code]
[ECCV] Visual relationship detection with language priors, [paper]
2015:
[ICCV] Fast R-CNN, [paper] [code]
2014:
[ECCV] SPPNet: Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition, [paper] [code]
[CVPR] R-CNN: Rich feature hierarchies for accurate object detection and semantic segmentation, [paper] [code]
b. Single-stage methods
2020:
[arXiv] CentripetalNet: Pursuing High-quality Keypoint Pairs for Object Detection, [paper]
[arXiv] Extended Feature Pyramid Network for Small Object Detection, [paper]
[arXiv] Real Time Detection of Small Objects, [paper]
[arXiv] OS2D: One-Stage One-Shot Object Detection by Matching Anchor Features, [paper]
2019:
[arXiv] CenterNet: Objects as Points, [paper]
[arXiv] CenterNet: Keypoint Triplets for Object Detection, [paper]
[ECCV] CornerNet: Detecting Objects as Paired Keypoints, [paper]
[arXiv] FCOS: Fully Convolutional One-Stage Object Detection, [paper]
[arXiv] Bottom-up Object Detection by Grouping Extreme and Center Points, [paper]
2018:
[arXiv] YOLOv3: An Incremental Improvement, [paper] [code]
2017:
[CVPR] YOLO9000: Better, Faster, Stronger, [paper] [code]
2016:
[CVPR] YOLO: You only look once: Unified, real-time object detection, [paper] [code]
[ECCV] SSD: Single Shot MultiBox Detector, [paper] [code]
[ECCV] LIFT: Learned Invariant Feature Transform, [paper]
2015:
[CVPR] MatchNet: Unifying Feature and Metric Learning for Patch-Based Matching, [paper]
2014:
[ICLR] OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks, [paper] [code]
Datasets:
PASCAL VOC: The PASCAL Visual Object Classes (VOC) Challenge, [paper]
ILSVRC: ImageNet large scale visual recognition challenge, [paper]
Microsoft COCO: Common Objects in Context, is a large-scale object detection, segmentation, and captioning dataset, [paper]
Open Images: a collaborative release of ~9 million images annotated with labels spanning thousands of object categories, [paper]
1.3.2 2D instance segmentation:
2020:
[arXiv] Depth Potentiality-Aware Gated Attention Network for RGB-D Salient Object Detection, [paper]
[arXiv] Weakly-Supervised Salient Object Detection via Scribble Annotations, [paper]
[arXiv] 1st Place Solutions for OpenImage2019 - Object Detection and Instance Segmentation, [paper]
[arXiv] Deep Affinity Net: Instance Segmentation via Affinity, [paper]
[arXiv] PointINS: Point-based Instance Segmentation, [paper]
[arXiv] Adaptive Graph Convolutional Network with Attention Graph Clustering for Co-saliency Detection, [paper]
[arXiv] Highly Efficient Salient Object Detection with 100K Parameters, [paper]
[arXiv] Conditional Convolutions for Instance Segmentation, [paper]
[arXiv] Global Context-Aware Progressive Aggregation Network for Salient Object Detection, [paper]
[arXiv] Fully Convolutional Networks for Automatically Generating Image Masks to Train Mask R-CNN, [paper]
[arXiv] Cross-layer Feature Pyramid Network for Salient Object Detection, [paper]
[arXiv] Towards Bounding-Box Free Panoptic Segmentation, [paper]
[arXiv] Self-Supervised Object-in-Gripper Segmentation from Robotic Motions, [paper]
[arXiv] Real-time Semantic Background Subtraction, [paper]
[arXiv] Evolution of Image Segmentation using Deep Convolutional Neural Network: A Survey, [paper]
[arXiv] FourierNet: Compact mask representation for instance segmentation using differentiable shape decoders, [paper]
[arXiv] Segmenting unseen industrial components in a heavy clutter using rgb-d fusion and synthetic data, [paper]
[arXiv] Instance Segmentation of Visible and Occluded Regions for Finding and Picking Target from a Pile of Objects, [paper]
[arXiv] Joint Learning of Instance and Semantic Segmentation for Robotic Pick-and-Place with Heavy Occlusions in Clutter, [paper]
[arXiv] PointRend: Image Segmentation as Rendering, [paper]
[arXiv] Image Segmentation Using Deep Learning: A Survey, [paper]
2019:
[arXiv] CenterMask: Real-Time Anchor-Free Instance Segmentation, [paper] [code]
[arXiv] SAIS: Single-stage Anchor-free Instance Segmentation, [paper]
[arXiv] YOLACT++: Better Real-time Instance Segmentation, [paper] [code]
[ICCV] YOLACT: Real-time Instance Segmentation, [paper] [code]
[ICCV] TensorMask: A Foundation for Dense Object Segmentation, [paper] [code]
[CASE] Deep Workpiece Region Segmentation for Bin Picking, [paper]
2018:
[CVPR] PANet: Path Aggregation Network for Instance Segmentation, [paper] [code]
[CVPR] MaskLab: Instance Segmentation by Refining Object Detection with Semantic and Direction Features, [paper]
2017:
[ICCV] Mask R-CNN, [paper] [code]
[IROS] SegICP: Integrated Deep Semantic Segmentation and Pose Estimation, [paper]
[CVPR] Fully Convolutional Instance-aware Semantic Segmentation, [paper]
2016:
[ECCV] SharpMask: Learning to Refine Object Segments, [paper] [code]
[BMVC]

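This article surveys vision-based robotic grasping techniques from recent years, covering 2D planar grasp and 6DoF grasp methods, including recent progress in using deep learning for object localization, pose estimation, grasp detection, and motion planning.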





