Latent Action Learning Requires Supervision in the Presence of Distractors

Abstract

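Latent action learning, pioneered by Latent Action Policies (LAPO), has recently shown remarkable pre-training efficiency on observation-only data, opening the possibility of leveraging the vast amount of video available on the web for embodied AI.
However, prior work has focused on distractor-free data, where changes between observations are explained primarily by the ground-truth actions.
Unfortunately, real-world videos contain action-correlated distractors that can hinder latent action learning.
Using the Distracting Control Suite (DCS), we empirically investigate the effect of distractors on latent action learning and demonstrate that LAPO struggles in such scenarios.
We propose LAOM, a simple modification of LAPO that improves the quality of latent actions by 8x, as measured by linear probing.
Importantly, we show that providing supervision with ground-truth actions during latent action learning, for as little as 2.5% of the full dataset, improves downstream performance by 4.2x on average.
Our findings suggest that integrating supervision during Latent Action Model (LAM) training is critical in the presence of distractors, challenging the conventional pipeline of first learning a LAM and only then decoding from latent to ground-truth actions.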

Abstract (original)

Recently, latent action learning, pioneered by Latent Action Policies (LAPO), have shown remarkable pre-training efficiency on observation-only data, offering potential for leveraging vast amounts of video available on the web for embodied AI. However, prior work has focused on distractor-free data, where changes between observations are primarily explained by ground-truth actions. Unfortunately, real-world videos contain action-correlated distractors that may hinder latent action learning. Using Distracting Control Suite (DCS) we empirically investigate the effect of distractors on latent action learning and demonstrate that LAPO struggle in such scenario. We propose LAOM, a simple LAPO modification that improves the quality of latent actions by 8x, as measured by linear probing. Importantly, we show that providing supervision with ground-truth actions, as few as 2.5% of the full dataset, during latent action learning improves downstream performance by 4.2x on average. Our findings suggest that integrating supervision during Latent Action Models (LAM) training is critical in the presence of distractors, challenging the conventional pipeline of first learning LAM and only then decoding from latent to ground-truth actions.
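The abstract evaluates latent-action quality "as measured by linear probing". As a rough illustration of what such a probe can look like, the sketch below fits a linear map from latent actions to ground-truth actions and reports the error on a held-out split. It is a minimal sketch under stated assumptions, not the authors' evaluation code: the function name `linear_probe_mse`, the use of held-out mean squared error as the score, the 80/20 split, and the synthetic data are all hypothetical choices made for this example.

```python
import numpy as np


def linear_probe_mse(latents, actions, train_frac=0.8, seed=0):
    """Fit a linear map (with bias) from latent actions to true actions
    and return the mean squared error on a held-out split.

    Hypothetical helper for illustration; the metric and split are assumptions.
    """
    rng = np.random.default_rng(seed)
    n = latents.shape[0]
    perm = rng.permutation(n)
    n_train = int(train_frac * n)
    train_idx, test_idx = perm[:n_train], perm[n_train:]

    # Append a constant column so the probe can learn a bias term.
    features = np.hstack([latents, np.ones((n, 1))])

    # Ordinary least squares on the training split only.
    weights, *_ = np.linalg.lstsq(
        features[train_idx], actions[train_idx], rcond=None
    )

    # Score the frozen probe on data it has not seen.
    predictions = features[test_idx] @ weights
    return float(np.mean((predictions - actions[test_idx]) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_actions = rng.normal(size=(2000, 4))

    # Synthetic latents that linearly encode the actions, plus small noise.
    mixing = rng.normal(size=(4, 16))
    informative = true_actions @ mixing + 0.01 * rng.normal(size=(2000, 16))

    # Synthetic latents that carry no information about the actions.
    uninformative = rng.normal(size=(2000, 16))

    print("informative latents:  ", linear_probe_mse(informative, true_actions))
    print("uninformative latents:", linear_probe_mse(uninformative, true_actions))
```

A lower held-out error means the ground-truth actions are more linearly recoverable from the latents. Because the probe is deliberately restricted to a linear map, the score reflects how the latent space is organized and not how powerful the decoder is.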

arXiv information

Authors	Alexander Nikulin, Ilya Zisman, Denis Tarasov, Nikita Lyubaykin, Andrei Polubarov, Igor Kiselev, Vladislav Kurenkov
Published	2025-06-12 16:28:52+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
