Robo-MUTUAL: Robotic Multimodal Task Specification via Unimodal Learning

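Summary

Multimodal task specification is essential for improving robot performance, and \textit{cross-modality alignment} lets a robot understand complex task instructions holistically.
Directly annotating multimodal instructions for model training is impractical because paired multimodal data is sparse.
This study demonstrates that unimodal instructions, which are abundant in real-world data, can be leveraged to teach robots multimodal task specifications effectively.
First, a robotic multimodal encoder is pretrained on extensive out-of-domain data, giving the robot strong \textit{cross-modality alignment} capabilities.
Next, two operations, Collapse and Corrupt, further bridge the modality gap that remains in the learned multimodal representation.
This approach projects different modalities of the same task goal as interchangeable representations, enabling accurate robot operation within a well-aligned multimodal latent space.
Evaluation on more than 130 tasks and 4000 evaluations, on both the simulated LIBERO benchmark and real robot platforms, demonstrates the strong capabilities of the proposed framework and its significant advantage in overcoming data constraints in robot learning.
Website: zh1hao.wang/Robo_MUTUAL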

Summary (original)

Multimodal task specification is essential for enhanced robotic performance, where \textit{Cross-modality Alignment} enables the robot to holistically understand complex task instructions. Directly annotating multimodal instructions for model training proves impractical, due to the sparsity of paired multimodal data. In this study, we demonstrate that by leveraging unimodal instructions abundant in real data, we can effectively teach robots to learn multimodal task specifications. First, we endow the robot with strong \textit{Cross-modality Alignment} capabilities, by pretraining a robotic multimodal encoder using extensive out-of-domain data. Then, we employ two Collapse and Corrupt operations to further bridge the remaining modality gap in the learned multimodal representation. This approach projects different modalities of identical task goal as interchangeable representations, thus enabling accurate robotic operations within a well-aligned multimodal latent space. Evaluation across more than 130 tasks and 4000 evaluations on both simulated LIBERO benchmark and real robot platforms showcases the superior capabilities of our proposed framework, demonstrating significant advantage in overcoming data constraints in robotic learning. Website: zh1hao.wang/Robo_MUTUAL
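
The abstract names the Collapse and Corrupt operations without defining them. The sketch below illustrates one common reading, and it is an assumption rather than the authors' confirmed implementation: "collapse" subtracts each modality's mean embedding so that both modalities are centered at the origin, and "corrupt" adds Gaussian noise to the embedding during training. The function names, the toy embeddings, and the noise scale are all hypothetical.

```python
import random


def mean_vector(vectors):
    """Per-dimension mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def collapse(embedding, modality_mean):
    """Assumed 'collapse': remove the modality-specific mean offset."""
    return [x - m for x, m in zip(embedding, modality_mean)]


def corrupt(embedding, noise_std, rng):
    """Assumed 'corrupt': add zero-mean Gaussian noise to each dimension."""
    return [x + rng.gauss(0.0, noise_std) for x in embedding]


# Made-up toy embeddings: the image embeddings are the text embeddings
# shifted by a constant offset, which stands in for the modality gap.
text_embeddings = [[1.0, 2.0], [3.0, 4.0]]
image_embeddings = [[11.0, -8.0], [13.0, -6.0]]

text_mean = mean_vector(text_embeddings)
image_mean = mean_vector(image_embeddings)

# After collapsing, the constant offset between the modalities is gone.
collapsed_text = [collapse(e, text_mean) for e in text_embeddings]
collapsed_image = [collapse(e, image_mean) for e in image_embeddings]

# Noise is applied to the collapsed embeddings (hypothetical scale of 0.1).
rng = random.Random(0)
noisy_text = [corrupt(e, 0.1, rng) for e in collapsed_text]
```

In this toy setup the gap is a pure constant shift, so centering removes it exactly. A learned representation would generally retain some residual mismatch, which is the part the added noise is meant to make a downstream policy tolerant of.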

arXiv information

Authors	Jianxiong Li, Zhihao Wang, Jinliang Zheng, Xiaoai Zhou, Guanming Wang, Guanglu Song, Yu Liu, Jingjing Liu, Ya-Qin Zhang, Junzhi Yu, Xianyuan Zhan
Published	2024-10-02 13:23:02+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
