Grasping Deformable Objects via Reinforcement Learning with Cross-Modal Attention to Visuo-Tactile Inputs

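Abstract

We consider the problem of grasping deformable objects with soft shells using a robotic gripper.
The center of mass of such objects changes dynamically, and the objects are fragile and prone to bursting.
This makes it difficult for a robot to generate control inputs that neither drop nor break the object during a manipulation task.
Multimodal sensing data can help the robot understand the grasping state: visual data provide global information (e.g., shape and pose), and tactile data provide local information around the contact (e.g., pressure).
The two modalities carry complementary information that is useful in combination, but their different properties make them difficult to fuse.
We propose a deep reinforcement learning (DRL) method that generates control inputs for a simple gripper from visuo-tactile sensing information.
Our method uses a cross-modal attention module in the encoder network and trains it in a self-supervised manner with the loss function of the RL agent.
Through this multimodal fusion, the method learns the DRL agent's representation from visuo-tactile sensory data.
Experimental results show that cross-modal attention outperforms other early and late data fusion methods across different environments, including unseen robot motions and objects.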
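The abstract does not give the attention module's dimensions, projection matrices, or number of heads, so the following is only a rough illustration of what "cross-modal attention" between visual and tactile features means. It assumes both modalities have already been embedded into token vectors of a common size d and uses plain single-head scaled dot-product attention with no learned projections. The functions `cross_attention`, `mean_pool`, and `fuse_visuo_tactile` are hypothetical names, not the authors' code, and the learning step (training the encoder through the RL agent's loss) is not modeled here.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: List[Vector], keys: List[Vector],
                    values: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention: queries come from one modality,
    keys/values from the other (no learned projections in this sketch)."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Each output token is a weighted average of the other modality's values.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def mean_pool(tokens: List[Vector]) -> Vector:
    n = len(tokens)
    return [sum(t[j] for t in tokens) / n for j in range(len(tokens[0]))]


def fuse_visuo_tactile(visual: List[Vector], tactile: List[Vector]) -> Vector:
    """Hypothetical fusion: each modality attends to the other, the attended
    tokens are mean-pooled, and the two summaries are concatenated into a
    state vector that a DRL policy/value network could consume."""
    vis_to_tac = cross_attention(visual, tactile, tactile)  # visual queries tactile
    tac_to_vis = cross_attention(tactile, visual, visual)   # tactile queries visual
    return mean_pool(vis_to_tac) + mean_pool(tac_to_vis)


if __name__ == "__main__":
    visual = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # e.g. 3 image-patch embeddings
    tactile = [[0.2, 0.8], [0.9, 0.1]]              # e.g. 2 contact-region embeddings
    state = fuse_visuo_tactile(visual, tactile)
    print(len(state))  # 4
```

The design point the sketch tries to show is that, unlike simple concatenation, each visual token's contribution is reweighted according to how strongly it matches the tactile tokens, and vice versa.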

Abstract (original)

We consider the problem of grasping deformable objects with soft shells using a robotic gripper. Such objects have a center of mass that changes dynamically and are fragile and thus prone to bursting. Thus, it is difficult for robots to generate appropriate control inputs that neither drop nor break the object while performing manipulation tasks. Multi-modal sensing data could help understand the grasping state through global information (e.g., shapes, pose) from visual data and local information around the contact (e.g., pressure) from tactile data. Although they have complementary information that can be beneficial to use together, fusing them is difficult owing to their different properties. We propose a method based on deep reinforcement learning (DRL) that generates control inputs of a simple gripper from visuo-tactile sensing information. Our method employs a cross-modal attention module in the encoder network and trains it in a self-supervised manner using the loss function of the RL agent. With the multi-modal fusion, the proposed method can learn the representation for the DRL agent from the visuo-tactile sensory data. The experimental results show that cross-modal attention outperforms other early and late data fusion methods across different environments, including unseen robot motions and objects.
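The abstract compares against "early" and "late" data fusion baselines without defining them. Under the common reading of these terms (an assumption here, not the paper's exact baselines), early fusion concatenates the modalities before a single shared encoder, while late fusion encodes each modality separately and combines the embeddings only at the end. The toy `linear` encoder and the functions `early_fusion` and `late_fusion` are hypothetical and only illustrate where the two streams meet.

```python
from typing import Callable, List

Vector = List[float]
Encoder = Callable[[Vector], Vector]


def linear(weights: List[Vector]) -> Encoder:
    """Toy encoder (hypothetical): a bias-free linear layer given as weight rows."""
    def apply(x: Vector) -> Vector:
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return apply


def early_fusion(visual: Vector, tactile: Vector, encoder: Encoder) -> Vector:
    # Concatenate the modality features first, then encode them jointly.
    return encoder(visual + tactile)


def late_fusion(visual: Vector, tactile: Vector,
                visual_encoder: Encoder, tactile_encoder: Encoder) -> Vector:
    # Encode each modality on its own; the embeddings meet only at the end.
    return visual_encoder(visual) + tactile_encoder(tactile)


if __name__ == "__main__":
    visual, tactile = [1.0, 2.0], [3.0]
    joint = linear([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])  # sees all 3 inputs at once
    print(early_fusion(visual, tactile, joint))          # [1.0, 3.0]
    print(late_fusion(visual, tactile, linear([[1.0, 1.0]]), linear([[2.0]])))  # [3.0, 6.0]
```

A cross-modal attention encoder sits between these two extremes: the modalities are encoded as separate token sets, but each set is reweighted by the other inside the encoder rather than only at the input or the output.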

arXiv information

Authors	Yonghyun Lee, Sungeun Hong, Min-gu Kim, Gyeonghwan Kim, Changjoo Nam
Published	2025-04-22 05:22:31+00:00
arXiv site	arxiv_id (pdf)

Source, services used

arxiv.jp, Google
