Training Strategies for Efficient Embodied Reasoning

Summary

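Robot chain-of-thought (CoT) reasoning, in which a model predicts helpful intermediate representations before choosing actions, is an effective way to improve the generalization and performance of robot policies, especially vision-language-action models (VLAs).
Although such approaches have been shown to improve performance and generalization, they suffer from core limitations, such as requiring specialized robot reasoning data and slow inference.
Designing new robot reasoning approaches that address these issues requires a more complete characterization of why reasoning helps policy performance.
We hypothesize three mechanisms by which robot reasoning improves policies: (1) better representation learning, (2) improved learning curricularization, and (3) increased expressivity. We then devise simple variants of robot CoT reasoning to isolate and test each one.
We find that learning to generate reasoning leads to better VLA representations, while attending to the reasoning helps the policy actually leverage these features for improved action prediction.
These results give a better understanding of why CoT reasoning helps VLAs, which we use to introduce two simple, lightweight alternative recipes for robot reasoning.
The proposed approaches achieve significant performance gains over non-reasoning policies, state-of-the-art results on the LIBERO-90 benchmark, and a 3x inference speedup compared with standard robot reasoning.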

Summary (original)

Robot chain-of-thought reasoning (CoT) — wherein a model predicts helpful intermediate representations before choosing actions — provides an effective method for improving the generalization and performance of robot policies, especially vision-language-action models (VLAs). While such approaches have been shown to improve performance and generalization, they suffer from core limitations, like needing specialized robot reasoning data and slow inference speeds. To design new robot reasoning approaches that address these issues, a more complete characterization of why reasoning helps policy performance is critical. We hypothesize several mechanisms by which robot reasoning improves policies — (1) better representation learning, (2) improved learning curricularization, and (3) increased expressivity — then devise simple variants of robot CoT reasoning to isolate and test each one. We find that learning to generate reasonings does lead to better VLA representations, while attending to the reasonings aids in actually leveraging these features for improved action prediction. Our results provide us with a better understanding of why CoT reasoning helps VLAs, which we use to introduce two simple and lightweight alternative recipes for robot reasoning. Our proposed approaches achieve significant performance gains over non-reasoning policies, state-of-the-art results on the LIBERO-90 benchmark, and a 3x inference speedup compared to standard robot reasoning.

arXiv information

Authors	William Chen, Suneel Belkhale, Suvir Mirchandani, Oier Mees, Danny Driess, Karl Pertsch, Sergey Levine
Published	2025-05-13 05:35:00+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
