SAM2Act: Integrating Visual Foundation Model with A Memory Architecture for Robotic Manipulation

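Abstract

Robotic manipulation systems operating in diverse, dynamic environments must exhibit three critical abilities: multitask interaction, generalization to unseen scenarios, and spatial memory.
Although robotic manipulation has advanced significantly, existing approaches often fall short in generalizing to complex environmental variations and in handling memory-dependent tasks.
To bridge this gap, we introduce SAM2Act, a multi-view robotic transformer-based policy that leverages multi-resolution upsampling with visual representations from a large-scale foundation model.
SAM2Act achieves a state-of-the-art average success rate of 86.8% across 18 tasks in the RLBench benchmark and demonstrates robust generalization on The Colosseum benchmark, with only a 4.3% performance gap under diverse environmental perturbations.
Building on this foundation, we propose SAM2Act+, a memory-based architecture inspired by SAM2 that incorporates a memory bank, an encoder, and an attention mechanism to enhance spatial memory.
To address the need for evaluating memory-dependent tasks, we introduce MemoryBench, a new benchmark designed to assess spatial memory and action recall in robotic manipulation.
SAM2Act+ achieves competitive performance on MemoryBench, significantly outperforming existing approaches and pushing the boundaries of memory-enabled robotic systems.
Project page: https://sam2act.github.io/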

Abstract (original)

Robotic manipulation systems operating in diverse, dynamic environments must exhibit three critical abilities: multitask interaction, generalization to unseen scenarios, and spatial memory. While significant progress has been made in robotic manipulation, existing approaches often fall short in generalization to complex environmental variations and addressing memory-dependent tasks. To bridge this gap, we introduce SAM2Act, a multi-view robotic transformer-based policy that leverages multi-resolution upsampling with visual representations from large-scale foundation model. SAM2Act achieves a state-of-the-art average success rate of 86.8% across 18 tasks in the RLBench benchmark, and demonstrates robust generalization on The Colosseum benchmark, with only a 4.3% performance gap under diverse environmental perturbations. Building on this foundation, we propose SAM2Act+, a memory-based architecture inspired by SAM2, which incorporates a memory bank, an encoder, and an attention mechanism to enhance spatial memory. To address the need for evaluating memory-dependent tasks, we introduce MemoryBench, a novel benchmark designed to assess spatial memory and action recall in robotic manipulation. SAM2Act+ achieves competitive performance on MemoryBench, significantly outperforming existing approaches and pushing the boundaries of memory-enabled robotic systems. Project page: https://sam2act.github.io/

arXiv information

Authors	Haoquan Fang, Markus Grotz, Wilbert Pumacay, Yi Ru Wang, Dieter Fox, Ranjay Krishna, Jiafei Duan
Published	2025-01-30 18:37:16+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
