Learning Generalizable Vision-Tactile Robotic Grasping Strategy for Deformable Objects via Transformer

Summary

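Reliably grasping deformable objects such as fruits with a robot remains challenging because of underactuated contact interactions with the gripper and unknown object dynamics and geometries.
In this study, we propose a Transformer-based robotic grasping framework for rigid grippers that leverages tactile and visual information to grasp objects safely.
Specifically, the Transformer models learn physical feature embeddings from sensor feedback gathered while performing two predefined exploratory actions (pinching and sliding), and a multilayer perceptron (MLP) then predicts the grasping outcome for a given grasping strength.
Using these predictions, the gripper infers a safe grasping strength.
Compared with convolutional recurrent networks, the Transformer models can capture long-term dependencies across image sequences and process spatial and temporal features simultaneously.
We first benchmark the Transformer models on a public dataset for slip detection.
We then show that the Transformer models outperform a CNN+LSTM model in grasping accuracy and computational efficiency.
We also collect a new fruit grasping dataset and run online grasping experiments with the proposed framework on both seen and unseen fruits.
Our code and dataset are publicly available on GitHub.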
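
The step in which the gripper infers a safe grasping strength from outcome predictions could look roughly like the sketch below. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the inference is carried out, so the sweep over candidate strengths, the threshold, and the names `toy_safe_probability` and `select_safe_strength` are all hypothetical, and a simple closed-form function stands in for the trained Transformer and MLP head.

```python
import math
from typing import Callable, Optional, Sequence


def toy_safe_probability(embedding: Sequence[float], strength: float) -> float:
    """Hypothetical stand-in for the Transformer + MLP outcome predictor.

    Returns a made-up probability that a grasp with `strength` is safe.
    A real predictor would be a trained network; this one is illustrative only.
    """
    firmness = sum(embedding) / len(embedding)  # crude scalar summary of the embedding
    # Too weak a grip slips and too strong a grip damages, so the score peaks near `firmness`.
    return math.exp(-((strength - firmness) ** 2) / 0.02)


def select_safe_strength(
    predict_safe: Callable[[Sequence[float], float], float],
    embedding: Sequence[float],
    candidates: Sequence[float],
    threshold: float = 0.5,
) -> Optional[float]:
    """Assumed selection rule: take the candidate with the highest predicted
    safe probability, or return None if even that one is below `threshold`."""
    best = max(candidates, key=lambda s: predict_safe(embedding, s))
    return best if predict_safe(embedding, best) >= threshold else None


# Example: sweep five normalized strengths for one (made-up) object embedding.
embedding = [0.4, 0.6]
candidates = [0.1, 0.3, 0.5, 0.7, 0.9]
chosen = select_safe_strength(toy_safe_probability, embedding, candidates)
print(chosen)
```

Returning `None` when no candidate clears the threshold lets a caller decline to grasp instead of applying a strength the predictor considers unsafe; that fallback is a design choice of this sketch, not something the abstract states.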

Abstract (original)

Reliable robotic grasping, especially with deformable objects such as fruits, remains a challenging task due to underactuated contact interactions with a gripper, unknown object dynamics and geometries. In this study, we propose a Transformer-based robotic grasping framework for rigid grippers that leverage tactile and visual information for safe object grasping. Specifically, the Transformer models learn physical feature embeddings with sensor feedback through performing two pre-defined explorative actions (pinching and sliding) and predict a grasping outcome through a multilayer perceptron (MLP) with a given grasping strength. Using these predictions, the gripper predicts a safe grasping strength via inference. Compared with convolutional recurrent networks, the Transformer models can capture the long-term dependencies across the image sequences and process spatial-temporal features simultaneously. We first benchmark the Transformer models on a public dataset for slip detection. Following that, we show that the Transformer models outperform a CNN+LSTM model in terms of grasping accuracy and computational efficiency. We also collect a new fruit grasping dataset and conduct online grasping experiments using the proposed framework for both seen and unseen fruits. Our codes and dataset are public on GitHub.

arXiv information

Authors	Yunhai Han, Kelin Yu, Rahul Batra, Nathan Boyd, Tuo Zhao, Yu She, Seth Hutchinson, Ye Zhao
Published	2023-05-17 02:54:45+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
