Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning

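Abstract

Learning from multiple domains is a key factor in how well a single unified robot system generalizes.
This paper aims to learn a trajectory prediction model from broad out-of-domain data in order to improve its performance and generalization.
The trajectory model is designed to predict the trajectories of arbitrary points in the current frame given an instruction, and it can provide detailed control guidance for robot policy learning.
To handle diverse out-of-domain data distributions, we propose a sparsely gated mixture-of-experts (MoE) architecture for the trajectory model with a Top-1 gating strategy, named Tra-MoE.
The sparse activation design strikes a good balance between parameter cooperation and specialization, so the model benefits effectively from large-scale out-of-domain data while keeping FLOPs per token constant.
In addition, we introduce an adaptive policy conditioning technique that learns 2D mask representations of the predicted trajectories; these are explicitly aligned with image observations and guide action prediction more flexibly.
We run extensive experiments in both simulation and real-world scenarios to verify the effectiveness of Tra-MoE and the adaptive policy conditioning technique.
We also conduct a comprehensive empirical study of training Tra-MoE, showing that it consistently outperforms the dense baseline model even when the baseline is scaled to match Tra-MoE's parameter count.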
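
The Top-1 gating idea mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the class name `Top1MoE`, the single-matrix experts, the random initialisation, and the `expert_calls` counter are all hypothetical choices made for illustration. It shows only the general mechanism of a sparsely gated layer, in which a router scores the experts and each token is processed by the single highest-scoring one.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class Top1MoE:
    """Toy sparsely gated MoE layer (illustrative, not the paper's model)."""

    def __init__(self, dim, num_experts, seed=0):
        rng = random.Random(seed)
        # Router: one weight row per expert (hypothetical initialisation).
        self.router = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(num_experts)]
        # Each expert is a single dim x dim linear map, for brevity.
        self.experts = [[[rng.gauss(0.0, 0.1) for _ in range(dim)]
                         for _ in range(dim)]
                        for _ in range(num_experts)]
        self.expert_calls = 0  # number of expert evaluations performed

    def forward(self, tokens):
        outputs, chosen = [], []
        for token in tokens:
            probs = softmax(matvec(self.router, token))
            # Top-1 gating: keep only the highest-probability expert.
            best = max(range(len(probs)), key=probs.__getitem__)
            expert_out = matvec(self.experts[best], token)
            self.expert_calls += 1
            # Scaling by the gate probability is a common convention in
            # Top-1 MoE layers; assumed here, not taken from the paper.
            outputs.append([probs[best] * v for v in expert_out])
            chosen.append(best)
        return outputs, chosen


if __name__ == "__main__":
    rng = random.Random(1)
    tokens = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(6)]
    for num_experts in (2, 8):
        layer = Top1MoE(dim=4, num_experts=num_experts)
        _, chosen = layer.forward(tokens)
        print(num_experts, "experts:", layer.expert_calls, "expert calls")
```

In this sketch the number of expert evaluations equals the number of tokens regardless of how many experts exist, which is the sense in which expert compute per token stays constant as parameters grow. The router's own cost still scales with the number of experts, although it is small next to an expert's.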

Abstract (original)

Learning from multiple domains is a primary factor that influences the generalization of a single unified robot system. In this paper, we aim to learn the trajectory prediction model by using broad out-of-domain data to improve its performance and generalization ability. The trajectory model is designed to predict any-point trajectories in the current frame given an instruction and can provide detailed control guidance for robotic policy learning. To handle the diverse out-of-domain data distribution, we propose a sparsely-gated MoE (Top-1 gating strategy) architecture for the trajectory model, coined as Tra-MoE. The sparse activation design enables a good balance between parameter cooperation and specialization, effectively benefiting from large-scale out-of-domain data while maintaining constant FLOPs per token. In addition, we further introduce an adaptive policy conditioning technique by learning 2D mask representations for predicted trajectories, which is explicitly aligned with image observations to guide action prediction more flexibly. We perform extensive experiments on both simulation and real-world scenarios to verify the effectiveness of Tra-MoE and the adaptive policy conditioning technique. We also conduct a comprehensive empirical study to train Tra-MoE, demonstrating that our Tra-MoE consistently exhibits superior performance compared to the dense baseline model, even when the latter is scaled to match Tra-MoE's parameter count.
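
To make the phrase "2D mask representations ... aligned with image observations" more concrete, here is a minimal geometric sketch. The abstract says the mask representation is learned, so this is not the paper's method: the function `rasterize_trajectories`, its arguments, and the linear interpolation between time steps are all assumptions. It shows only how predicted point tracks, given as pixel coordinates, could be drawn into a binary mask of the same height and width as the image.

```python
import math


def rasterize_trajectories(trajectories, height, width):
    """Draw point tracks into a binary height x width mask.

    trajectories: list of tracks, each a list of (x, y) pixel coordinates
    over successive time steps (hypothetical input format).
    """
    mask = [[0] * width for _ in range(height)]
    for track in trajectories:
        if not track:
            continue
        # A single-point track is treated as a zero-length segment.
        segments = list(zip(track, track[1:])) or [(track[0], track[0])]
        for (x0, y0), (x1, y1) in segments:
            # Sample densely enough to touch every pixel along the segment.
            steps = max(1, math.ceil(max(abs(x1 - x0), abs(y1 - y0))))
            for i in range(steps + 1):
                t = i / steps
                col = int(round(x0 + t * (x1 - x0)))
                row = int(round(y0 + t * (y1 - y0)))
                # Points that fall outside the image are skipped.
                if 0 <= row < height and 0 <= col < width:
                    mask[row][col] = 1
    return mask


if __name__ == "__main__":
    demo = rasterize_trajectories([[(0, 0), (3, 0)], [(0, 3), (3, 3)]], 4, 4)
    for row in demo:
        print("".join("#" if v else "." for v in row))
```

Because the mask shares the image's pixel grid, it can be stacked with or compared against the observation directly, which is one plausible reading of "explicitly aligned".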

arXiv information

Authors	Jiange Yang, Haoyi Zhu, Yating Wang, Gangshan Wu, Tong He, Limin Wang
Published	2025-04-01 17:59:17+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
