End-to-End Learning of Deep Visuomotor Policy for Needle Picking

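Summary

Needle picking is a challenging manipulation task in robot-assisted surgery because needles are small and slender, vary in shape and size, and demand millimeter-level control.
Prior work relies heavily on prior knowledge of the needle (e.g., geometric models) and therefore scales poorly to unseen needle variations.
In this paper, we present the first end-to-end learning method for training a deep visuomotor policy for needle picking.
Specifically, we propose DreamerfD, which makes maximal use of demonstrations to improve the learning efficiency of DreamerV2, a state-of-the-art model-based reinforcement learning method.
Because the Variational Auto-Encoder (VAE) in DreamerV2 is difficult to scale to high-resolution images, we propose Dynamic Spotlight Adaptation, which represents control-related visual signals in a low-resolution image space.
We also propose Virtual Clutch to reduce the performance degradation caused by significant error between the prior and posterior encoded states at the beginning of a rollout.
We conducted extensive experiments in simulation to evaluate the method's performance, robustness, adaptation to in-domain variations, and the effectiveness of its individual components.
Trained with 8k demonstration timesteps and 140k online policy timesteps, our method achieves a success rate of 80%.
Our method also demonstrates superior generalization to unseen in-domain variations, including needle variations and image disturbance, which highlights its robustness and versatility.
Code and videos are available at https://sites.google.com/view/DreamerfD.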

Abstract (original)

Needle picking is a challenging manipulation task in robot-assisted surgery due to the characteristics of small slender shapes of needles, needles’ variations in shapes and sizes, and demands for millimeter-level control. Prior works, heavily relying on the prior of needles (e.g., geometric models), are hard to scale to unseen needles’ variations. In this paper, we present the first end-to-end learning method to train deep visuomotor policy for needle picking. Concretely, we propose DreamerfD to maximally leverage demonstrations to improve the learning efficiency of a state-of-the-art model-based reinforcement learning method, DreamerV2; Since Variational Auto-Encoder (VAE) in DreamerV2 is difficult to scale to high-resolution images, we propose Dynamic Spotlight Adaptation to represent control-related visual signals in a low-resolution image space; Virtual Clutch is also proposed to reduce performance degradation due to significant error between prior and posterior encoded states at the beginning of a rollout. We conducted extensive experiments in simulation to evaluate the performance, robustness, in-domain variation adaptation, and effectiveness of individual components of our method. Our method, trained by 8k demonstration timesteps and 140k online policy timesteps, can achieve a remarkable success rate of 80%. Furthermore, our method effectively demonstrated its superiority in generalization to unseen in-domain variations including needle variations and image disturbance, highlighting its robustness and versatility. Codes and videos are available at https://sites.google.com/view/DreamerfD.

arXiv information

Authors	Hongbin Lin, Bin Li, Xiangyu Chu, Qi Dou, Yunhui Liu, Kwok Wai Samuel Au
Published	2023-07-26 09:14:25+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
