Interactive Post-Training for Vision-Language-Action Models

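Summary

The paper introduces RIPT-VLA, a simple and scalable interactive post-training paradigm, based on reinforcement learning, that fine-tunes pretrained Vision-Language-Action (VLA) models using only sparse binary success rewards.
Existing VLA training pipelines rely heavily on offline expert demonstration data and supervised imitation, which limits their ability to adapt to new tasks and environments in low-data regimes.
RIPT-VLA addresses this by enabling interactive post-training with a stable policy optimization algorithm built on dynamic rollout sampling and leave-one-out advantage estimation.
RIPT-VLA has the following characteristics.
First, it applies to a range of VLA models: it improves the lightweight QueST model by 21.2% and brings the 7B OpenVLA-OFT model to an unprecedented 97.5% success rate.
Second, it is both computationally efficient and data-efficient: with only one demonstration, RIPT-VLA takes an unworkable SFT model (4% success rate) to a 97% success rate within 15 iterations.
In addition, the authors demonstrate that the policy learned by RIPT-VLA generalizes across different tasks and scenarios and is robust to the initial state context.
These results highlight RIPT-VLA as a practical and effective paradigm for post-training VLA models with minimal supervision.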

Abstract (original)

We introduce RIPT-VLA, a simple and scalable reinforcement-learning-based interactive post-training paradigm that fine-tunes pretrained Vision-Language-Action (VLA) models using only sparse binary success rewards. Existing VLA training pipelines rely heavily on offline expert demonstration data and supervised imitation, limiting their ability to adapt to new tasks and environments under low-data regimes. RIPT-VLA addresses this by enabling interactive post-training with a stable policy optimization algorithm based on dynamic rollout sampling and leave-one-out advantage estimation. RIPT-VLA has the following characteristics. First, it applies to various VLA models, resulting in an improvement on the lightweight QueST model by 21.2%, and the 7B OpenVLA-OFT model to an unprecedented 97.5% success rate. Second, it is computationally efficient and data-efficient: with only one demonstration, RIPT-VLA enables an unworkable SFT model (4%) to succeed with a 97% success rate within 15 iterations. Furthermore, we demonstrate that the policy learned by RIPT-VLA generalizes across different tasks and scenarios and is robust to the initial state context. These results highlight RIPT-VLA as a practical and effective paradigm for post-training VLA models through minimal supervision.
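
The abstract names leave-one-out advantage estimation and dynamic rollout sampling but does not spell either out. The sketch below shows one plausible reading under stated assumptions: each rollout's advantage is its binary reward minus the mean reward of the other rollouts from the same context, and a group whose rollouts all share the same outcome is skipped because it carries no learning signal. The function names `leave_one_out_advantages` and `advantages_or_skip` are hypothetical, and the skip rule is an assumption for illustration, not the authors' confirmed implementation.

```python
from typing import List, Optional


def leave_one_out_advantages(rewards: List[float]) -> List[float]:
    """Advantage of each rollout: its reward minus the mean reward of the others."""
    k = len(rewards)
    if k < 2:
        raise ValueError("leave-one-out needs at least two rollouts")
    total = sum(rewards)
    # (total - r) / (k - 1) is the mean reward of the k - 1 other rollouts.
    return [r - (total - r) / (k - 1) for r in rewards]


def advantages_or_skip(rewards: List[float]) -> Optional[List[float]]:
    """Assumed dynamic-sampling rule (illustrative, not from the source):
    skip a rollout group whose rewards are all identical, since every
    leave-one-out advantage would then be zero."""
    if len(set(rewards)) == 1:
        return None
    return leave_one_out_advantages(rewards)


if __name__ == "__main__":
    # Four rollouts from one task context with sparse binary success rewards.
    print(advantages_or_skip([1.0, 0.0, 0.0, 1.0]))
    # All rollouts succeeded: no signal, so the group is skipped.
    print(advantages_or_skip([1.0, 1.0, 1.0, 1.0]))
```

With binary rewards, the leave-one-out baseline needs no learned value function, and the advantages within a group always sum to zero, so successes are pushed up exactly as much as failures are pushed down.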

arXiv information

Authors	Shuhan Tan, Kairan Dou, Yue Zhao, Philipp Krähenbühl
Published	2025-05-22 17:59:45+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
