Spatial Reasoning via Deep Vision Models for Robotic Sequential Manipulation

Summary

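This paper proposes using deep neural architectures (namely, vision transformers and ResNet) as heuristics for sequential decision-making in robotic manipulation problems.
With this formulation, the model can predict which subset of objects is relevant to completing a task.
Such problems are commonly handled with task and motion planning (TAMP) formulations, which combine symbolic reasoning with continuous motion planning.
In essence, action-object relationships are resolved into discrete symbolic decisions, which are then used to solve for the manipulation motions (for example, via nonlinear trajectory optimization).
Solving long-horizon tasks, however, requires considering every possible action-object combination, which limits the scalability of TAMP approaches.
To overcome this combinatorial complexity, the authors introduce a visual perception module integrated with a TAMP solver.
Given a task and an initial image of the scene, the learned model outputs how relevant each object is to accomplishing the task.
Incorporating the model's predictions into the TAMP formulation as a heuristic significantly reduces the size of the search space.
The results show that the framework finds feasible solutions more efficiently than a state-of-the-art TAMP solver.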
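
To make the search-space argument concrete, here is a minimal Python sketch of relevance-based pruning. It is not the authors' implementation: the object names, relevance scores, threshold, action set, and horizon are all hypothetical values chosen for illustration, and the counting function assumes the simplest case in which every step freely picks one (action, object) pair.

```python
from itertools import product

def select_relevant(scores, threshold=0.5):
    """Keep objects whose (hypothetical) predicted relevance meets the threshold."""
    return [name for name, score in scores.items() if score >= threshold]

def count_sequences(actions, objects, horizon):
    """Count length-`horizon` sequences of (action, object) decisions,
    assuming each step chooses any pair independently."""
    return (len(actions) * len(objects)) ** horizon

# Hypothetical relevance scores standing in for a learned vision model's output.
scores = {
    "red_block": 0.93,
    "blue_block": 0.81,
    "green_block": 0.07,
    "yellow_block": 0.04,
    "tray": 0.12,
}
actions = ["pick", "place"]  # hypothetical symbolic actions
horizon = 4                  # hypothetical number of sequential decisions

relevant = select_relevant(scores)
first_step = list(product(actions, relevant))  # candidate first decisions after pruning

full = count_sequences(actions, list(scores), horizon)
pruned = count_sequences(actions, relevant, horizon)

print(f"relevant objects: {relevant}")
print(f"first-step decisions after pruning: {first_step}")
print(f"sequences without pruning: {full}, with pruning: {pruned}")
```

In this toy setting, restricting the symbolic search to the two objects scored as relevant shrinks the number of candidate decision sequences from 10000 to 256. A real TAMP solver would additionally check each sequence for geometric feasibility, which this sketch does not attempt.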

Abstract (original)

In this paper, we propose using deep neural architectures (i.e., vision transformers and ResNet) as heuristics for sequential decision-making in robotic manipulation problems. This formulation enables predicting the subset of objects that are relevant for completing a task. Such problems are often addressed by task and motion planning (TAMP) formulations combining symbolic reasoning and continuous motion planning. In essence, the action-object relationships are resolved for discrete, symbolic decisions that are used to solve manipulation motions (e.g., via nonlinear trajectory optimization). However, solving long-horizon tasks requires consideration of all possible action-object combinations which limits the scalability of TAMP approaches. To overcome this combinatorial complexity, we introduce a visual perception module integrated with a TAMP-solver. Given a task and an initial image of the scene, the learned model outputs the relevancy of objects to accomplish the task. By incorporating the predictions of the model into a TAMP formulation as a heuristic, the size of the search space is significantly reduced. Results show that our framework finds feasible solutions more efficiently when compared to a state-of-the-art TAMP solver.

arXiv information

Authors	Hongyou Zhou, Ingmar Schubert, Marc Toussaint, Ozgur S. Oguz
Published	2023-08-01 09:28:07+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
