Modality Selection and Skill Segmentation via Cross-Modality Attention

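Summary

Incorporating additional sensory modalities, such as tactile and audio, into foundational robotic models is difficult because of the curse of dimensionality.
This work addresses the problem through modality selection.
It proposes a cross-modality attention (CMA) mechanism that identifies and selectively uses the modalities most informative for action generation at each timestep.
It further extends CMA to segment primitive skills from expert demonstrations, and uses that segmentation to train a hierarchical policy capable of solving long-horizon, contact-rich manipulation tasks.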

Abstract (original)

Incorporating additional sensory modalities such as tactile and audio into foundational robotic models poses significant challenges due to the curse of dimensionality. This work addresses this issue through modality selection. We propose a cross-modality attention (CMA) mechanism to identify and selectively utilize the modalities that are most informative for action generation at each timestep. Furthermore, we extend the application of CMA to segment primitive skills from expert demonstrations and leverage this segmentation to train a hierarchical policy capable of solving long-horizon, contact-rich manipulation tasks.
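The abstract does not give the CMA architecture, so the following is only a minimal sketch of the general idea, assuming a single scaled dot-product attention step over one embedding per modality. The function names (`modality_attention`, `segment_by_dominant_modality`), the modality names, and the rule of cutting a demonstration wherever the highest-weighted modality changes are all hypothetical illustrations, not the authors' method.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def modality_attention(query, modality_embeddings):
    """One timestep of attention over modalities (illustrative assumption).

    query: list[float] of length d (e.g. a policy state embedding).
    modality_embeddings: dict mapping modality name -> list[float] of length d.
    Returns (weights, fused): per-modality attention weights and the
    weighted sum of the modality embeddings.
    """
    d = len(query)
    names = list(modality_embeddings)
    # Scaled dot-product score between the query and each modality embedding.
    scores = [
        sum(q * k for q, k in zip(query, modality_embeddings[n])) / math.sqrt(d)
        for n in names
    ]
    w = softmax(scores)
    fused = [
        sum(w[i] * modality_embeddings[n][j] for i, n in enumerate(names))
        for j in range(d)
    ]
    return dict(zip(names, w)), fused


def segment_by_dominant_modality(weights_per_step):
    """Split a trajectory wherever the highest-weighted modality changes.

    This segmentation rule is an assumption made for illustration.
    Returns a list of (start_step, end_step, modality) tuples, inclusive.
    """
    segments = []
    start = 0
    prev = None
    for t, w in enumerate(weights_per_step):
        dominant = max(w, key=w.get)
        if prev is not None and dominant != prev:
            segments.append((start, t - 1, prev))
            start = t
        prev = dominant
    if prev is not None:
        segments.append((start, len(weights_per_step) - 1, prev))
    return segments
```

For example, calling `segment_by_dominant_modality` on a sequence of per-step weight dictionaries returns contiguous step ranges labeled by the modality that dominated them, which could then serve as candidate skill boundaries for a higher-level policy.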

arXiv information

Authors	Jiawei Jiang, Kei Ota, Devesh K. Jha, Asako Kanezaki
Published	2025-04-20 11:32:43+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
