Extremum Flow Matching for Offline Goal Conditioned Reinforcement Learning

Abstract (original)

Imitation learning is a promising approach for enabling generalist capabilities in humanoid robots, but its scaling is fundamentally constrained by the scarcity of high-quality expert demonstrations. This limitation can be mitigated by leveraging suboptimal, open-ended play data, often easier to collect and offering greater diversity. This work builds upon recent advances in generative modeling, specifically Flow Matching, an alternative to Diffusion models. We introduce a method for estimating the extremum of the learned distribution by leveraging the unique properties of Flow Matching, namely, deterministic transport and support for arbitrary source distributions. We apply this method to develop several goal-conditioned imitation and reinforcement learning algorithms based on Flow Matching, where policies are conditioned on both current and goal observations. We explore and compare different architectural configurations by combining core components, such as critic, planner, actor, or world model, in various ways. We evaluated our agents on the OGBench benchmark and analyzed how different demonstration behaviors during data collection affect performance in a 2D non-prehensile pushing task. Furthermore, we validated our approach on real hardware by deploying it on the Talos humanoid robot to perform complex manipulation tasks based on high-dimensional image observations, featuring a sequence of pick-and-place and articulated object manipulation in a realistic kitchen environment. Experimental videos and code are available at: https://hucebot.github.io/extremum_flow_matching_website/
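The abstract credits the extremum estimator to two properties of Flow Matching: deterministic transport and support for arbitrary source distributions. The sketch below illustrates only the first property and one consequence of it in one dimension. It is not the authors' method. It assumes a 1D Gaussian source N(m0, s0²) and a 1D Gaussian target N(mu, s²) joined by the linear path x_t = (1 - t) x0 + t x1, and it replaces a trained velocity network with the exact closed-form velocity for that case. The function names `velocity` and `transport` and all numbers are hypothetical. Trajectories of a 1D ODE cannot cross, so the deterministic flow preserves ordering, and the largest source point is carried to the largest generated point.

```python
import numpy as np


def velocity(x, t, m0, s0, mu, s):
    """Closed-form marginal velocity E[x1 - x0 | x_t = x] for the linear path
    x_t = (1 - t) * x0 + t * x1, with independent x0 ~ N(m0, s0**2) (source)
    and x1 ~ N(mu, s**2) (target). Hypothetical stand-in for a trained v_theta."""
    mean_t = (1.0 - t) * m0 + t * mu
    var_t = (1.0 - t) ** 2 * s0**2 + t**2 * s**2
    cov_t = t * s**2 - (1.0 - t) * s0**2  # Cov(x1 - x0, x_t)
    return (mu - m0) + cov_t / var_t * (x - mean_t)


def transport(x0, m0, s0, mu, s, n_steps=1000):
    """Deterministic Euler integration of dx/dt = v(x, t) from t = 0 to t = 1.
    No noise is injected, so the same x0 always maps to the same output."""
    x = np.array(x0, dtype=float)  # copy so the caller's array is untouched
    dt = 1.0 / n_steps
    for k in range(n_steps):
        x = x + dt * velocity(x, k * dt, m0, s0, mu, s)
    return x


# Hypothetical example: standard normal source, target N(2, 0.5**2).
m0, s0, mu, s = 0.0, 1.0, 2.0, 0.5
rng = np.random.default_rng(0)
x0 = rng.normal(m0, s0, size=500)
x1 = transport(x0, m0, s0, mu, s)

# Trajectories of a 1D ODE cannot cross (and the small Euler step keeps this
# true here), so ordering is preserved: the largest source sample is carried
# to the largest generated sample.
order_kept = np.array_equal(np.argsort(x0), np.argsort(x1))
top_from_source = transport(x0.max(), m0, s0, mu, s)

# Pushing a chosen upper point of the source (here m0 + 3 * s0) gives the
# matching upper point of the learned distribution, close to mu + 3 * s.
upper = transport(m0 + 3.0 * s0, m0, s0, mu, s)
print(order_kept, np.isclose(top_from_source, x1.max()), round(float(upper), 2))
```

By the same ordering argument, a bounded source would send its upper bound to the upper bound of the learned distribution, which hints at why a deterministic flow with a freely chosen source can be queried for an extremum. The abstract does not say how the paper's estimator handles higher-dimensional or goal-conditioned outputs, so this 1D picture should be read as intuition only.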

arXiv information

Authors	Quentin Rouxel, Clemente Donoso, Fei Chen, Serena Ivaldi, Jean-Baptiste Mouret
Published	2025-05-26 09:06:34+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
