GRAPE: Generalizing Robot Policy via Preference Alignment

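Summary

Vision-language-action (VLA) models have recently advanced on a variety of robotics tasks, but they still generalize poorly to unseen tasks because they rely on behavior cloning from successful rollouts only.
They are also typically fine-tuned to replicate demonstrations that experts collected under different settings, which introduces distribution bias and limits their adaptability to diverse manipulation objectives such as efficiency, safety, and task completion.
To bridge this gap, we introduce GRAPE: Generalizing Robot Policy via Preference Alignment.
Specifically, GRAPE aligns VLAs at the trajectory level and implicitly models reward from both successful and failed trials, which improves generalizability to diverse tasks.
GRAPE also breaks complex manipulation tasks into independent stages and automatically guides preference modeling through customized spatiotemporal constraints, using keypoints proposed by a large vision-language model.
Notably, these constraints are flexible and can be customized to align the model with different objectives, such as safety, efficiency, or task success.
We evaluate GRAPE across a diverse set of tasks in both real-world and simulated environments.
The experimental results show that GRAPE improves the performance of state-of-the-art VLA models, increasing success rates on in-domain and unseen manipulation tasks by 51.79% and 60.36%, respectively.
GRAPE can also be aligned with objectives such as safety and efficiency, reducing collision rates by 44.31% and rollout step length by 11.15%, respectively.
All code, models, and data are available at https://grape-vla.github.io/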

Abstract (original)

Despite the recent advancements of vision-language-action (VLA) models on a variety of robotics tasks, they suffer from critical issues such as poor generalizability to unseen tasks, due to their reliance on behavior cloning exclusively from successful rollouts. Furthermore, they are typically fine-tuned to replicate demonstrations collected by experts under different settings, thus introducing distribution bias and limiting their adaptability to diverse manipulation objectives, such as efficiency, safety, and task completion. To bridge this gap, we introduce GRAPE: Generalizing Robot Policy via Preference Alignment. Specifically, GRAPE aligns VLAs on a trajectory level and implicitly models reward from both successful and failure trials to boost generalizability to diverse tasks. Moreover, GRAPE breaks down complex manipulation tasks to independent stages and automatically guides preference modeling through customized spatiotemporal constraints with keypoints proposed by a large vision-language model. Notably, these constraints are flexible and can be customized to align the model with varying objectives, such as safety, efficiency, or task success. We evaluate GRAPE across a diverse array of tasks in both real-world and simulated environments. Experimental results demonstrate that GRAPE enhances the performance of state-of-the-art VLA models, increasing success rates on in-domain and unseen manipulation tasks by 51.79% and 60.36%, respectively. Additionally, GRAPE can be aligned with various objectives, such as safety and efficiency, reducing collision rates by 44.31% and rollout step-length by 11.15%, respectively. All code, models, and data are available at https://grape-vla.github.io/
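The abstract names two ingredients: stage-wise constraints that decide which rollouts are preferred, and a trajectory-level alignment step that learns from both successful and failed trials. The sketch below shows one way these could fit together. It is not the authors' implementation. The DPO-style loss, the `Trajectory` fields, the scoring rule, and the `beta` and `success_bonus` parameters are all assumptions made for illustration, since the abstract does not specify the exact objective.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Trajectory:
    """One rollout. All fields are hypothetical, chosen for this sketch."""
    policy_logps: List[float]  # per-step log-prob of the taken action, policy being tuned
    ref_logps: List[float]     # per-step log-prob of the same action, frozen reference policy
    stage_costs: List[float]   # per-stage constraint cost (e.g. collision or path-length penalty)
    success: bool              # whether the task was completed


def trajectory_score(traj: Trajectory, success_bonus: float = 1.0) -> float:
    """Assumed scoring rule: reward success, penalize summed stage costs."""
    outcome = success_bonus if traj.success else -success_bonus
    return outcome - sum(traj.stage_costs)


def pick_preference_pair(trajs: Sequence[Trajectory]) -> Tuple[Trajectory, Trajectory]:
    """Return (chosen, rejected): the best- and worst-scoring rollouts."""
    ranked = sorted(trajs, key=trajectory_score, reverse=True)
    return ranked[0], ranked[-1]


def log_ratio(traj: Trajectory) -> float:
    """Trajectory-level log-ratio: per-step log-probs summed over the whole rollout."""
    return sum(traj.policy_logps) - sum(traj.ref_logps)


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def trajectory_preference_loss(chosen: Trajectory, rejected: Trajectory,
                               beta: float = 0.1) -> float:
    """DPO-style loss on whole trajectories (an assumption, not the paper's stated loss).

    Computes -log(sigmoid(margin)), where margin is beta times the difference
    between the chosen and rejected trajectory log-ratios.
    """
    margin = beta * (log_ratio(chosen) - log_ratio(rejected))
    return softplus(-margin)


if __name__ == "__main__":
    good = Trajectory([-0.1, -0.2], [-0.3, -0.4], [0.0, 0.1], True)
    bad = Trajectory([-0.5, -0.6], [-0.4, -0.5], [0.5, 0.9], False)
    chosen, rejected = pick_preference_pair([bad, good])
    print(trajectory_preference_loss(chosen, rejected))
```

In this sketch, changing what goes into `stage_costs` (collision penalties for safety, step counts for efficiency) changes which rollout is preferred, which mirrors the abstract's claim that the constraints can be customized per objective. The loss falls below log 2 once the policy favors the chosen rollout over the rejected one more strongly than the reference policy does.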

arXiv information

Authors	Zijian Zhang, Kaiyuan Zheng, Zhaorun Chen, Joel Jang, Yi Li, Chaoqi Wang, Mingyu Ding, Dieter Fox, Huaxiu Yao
Published	2024-11-28 18:30:10+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
