A Modern Take on Visual Relationship Reasoning for Grasp Planning

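Summary

Interacting with cluttered real-world scenes poses several challenges for robotic agents, which must understand the complex spatial dependencies among observed objects in order to determine optimal pick sequences or efficient object retrieval strategies.
Existing solutions typically handle simplified scenarios and focus on predicting pairwise object relationships after an initial object detection phase, but they often overlook the global context or struggle to handle redundant and missing object relations.
In this work, we present a modern take on visual relational reasoning for grasp planning.
We introduce D3GD, a new testbed of bin-picking scenes containing up to 35 objects from 97 distinct categories.
In addition, we propose D3G, a new end-to-end transformer-based dependency graph generation model that simultaneously detects objects and produces an adjacency matrix representing their spatial relationships.
Recognizing the limitations of standard metrics, we employ the Average Precision of Relationships for the first time to evaluate model performance and conduct an extensive experimental benchmark.
The results establish our approach as the new state of the art for this task and lay a foundation for future research in robotic manipulation.
The code and dataset are publicly available at https://paolotron.github.io/d3g.github.io.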

Abstract (original)

Interacting with real-world cluttered scenes poses several challenges to robotic agents that need to understand complex spatial dependencies among the observed objects to determine optimal pick sequences or efficient object retrieval strategies. Existing solutions typically manage simplified scenarios and focus on predicting pairwise object relationships following an initial object detection phase, but often overlook the global context or struggle with handling redundant and missing object relations. In this work, we present a modern take on visual relational reasoning for grasp planning. We introduce D3GD, a novel testbed that includes bin picking scenes with up to 35 objects from 97 distinct categories. Additionally, we propose D3G, a new end-to-end transformer-based dependency graph generation model that simultaneously detects objects and produces an adjacency matrix representing their spatial relationships. Recognizing the limitations of standard metrics, we employ the Average Precision of Relationships for the first time to evaluate model performance, conducting an extensive experimental benchmark. The obtained results establish our approach as the new state-of-the-art for this task, laying the foundation for future research in robotic manipulation. We publicly release the code and dataset at https://paolotron.github.io/d3g.github.io.
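
To make the link between an adjacency matrix and a pick sequence concrete, here is a minimal sketch that derives a pick order from a dependency graph using Kahn's topological sort. This is not the authors' code: the function name pick_order, the example scene, and the convention that adj[i][j] == 1 means "object i must be removed before object j" are all assumptions made for illustration, and the abstract does not specify how D3G's output is turned into a plan.

```python
from collections import deque


def pick_order(adj):
    """Return a pick order from a dependency adjacency matrix (Kahn's algorithm).

    Assumed convention (not taken from the paper): adj[i][j] == 1 means
    object i must be removed before object j can be picked.
    """
    n = len(adj)
    # For each object, count how many other objects still block it.
    blockers = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    ready = deque(j for j in range(n) if blockers[j] == 0)
    order = []
    while ready:
        i = ready.popleft()
        order.append(i)
        # Removing object i frees every object it was blocking.
        for j in range(n):
            if adj[i][j]:
                blockers[j] -= 1
                if blockers[j] == 0:
                    ready.append(j)
    if len(order) != n:
        raise ValueError("dependency graph contains a cycle")
    return order


# Hypothetical scene: object 0 rests on 1, and objects 1 and 2 both rest on 3.
adj = [
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
print(pick_order(adj))
```

A predicted graph may contain cycles when relations are redundant or wrong, which is why the sketch raises an error instead of silently returning a partial order.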

arXiv information

Authors	Paolo Rabino, Tatiana Tommasi
Published	2024-12-20 16:19:40+00:00
arXiv page	arxiv_id (pdf)

Sources and services used

arxiv.jp, Google
