
TARGO and TARGO-Net: Benchmarking Target-Driven Object Grasping Under Occlusions

MCML Authors


Daniel Cremers

Prof. Dr.

Director

Abstract

Predicting 6-DoF grasp poses from a single RGB-D frame has recently achieved impressive accuracy, yet performance collapses when the target object is heavily occluded by clutter. In this paper, we establish the first benchmark dataset for TARget-driven Grasping under Occlusions, named TARGO, together with TARGO-Net, a model that remains robust under occlusion. Our main contributions are: 1) We are the first to recognize the visual occlusion challenge in 6-DoF grasping from single RGB-D images, and we find that even current SOTA models suffer under high occlusion. 2) We propose the TARGO dataset, which can be used to train and test 6-DoF grasp models under different visual occlusion severities and to evaluate model robustness in real-world scenarios. 3) We further devise TARGO-Net, a transformer-based grasping model with a target completion module and target-scene cross-attention, which performs most robustly across all visual occlusion levels. 4) We discover that, beyond visual occlusion, the number of occluders and the target's minimum dimension also contribute to grasp success.
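The abstract mentions target-scene cross-attention, where features of the (possibly occluded) target query the full scene representation. The paper's exact architecture is not given here, so the following is only a minimal single-head NumPy sketch of that general mechanism; the function names and tensor shapes are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def target_scene_cross_attention(target_feats, scene_feats):
    """Target tokens (queries) attend over scene tokens (keys/values).

    target_feats: (T, d) features of the target object
    scene_feats:  (S, d) features of the cluttered scene
    Returns (T, d) target features enriched with scene context.
    """
    d = target_feats.shape[-1]
    scores = target_feats @ scene_feats.T / np.sqrt(d)   # (T, S)
    weights = softmax(scores, axis=-1)                    # rows sum to 1
    return weights @ scene_feats                          # (T, d)

rng = np.random.default_rng(0)
tgt = rng.standard_normal((4, 32))    # 4 target tokens, 32-dim
scn = rng.standard_normal((16, 32))   # 16 scene tokens, 32-dim
out = target_scene_cross_attention(tgt, scn)
print(out.shape)  # (4, 32)
```

In practice such a module would use learned query/key/value projections and multiple heads; the sketch keeps only the attention pattern itself.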

article XDQ+26


International Journal of Computer Vision

134.170. Mar. 2026.
Top Journal

Authors

Y. Xia • R. Ding • Z. Qin • G. Zhan • K. Zhou • L. Yang • H. Dong • D. Cremers

Links

DOI GitHub

Research Area

 B1 | Computer Vision

BibTeX Key: XDQ+26
