xTED: Cross-Domain Adaptation via Diffusion-Based Trajectory Editing

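Summary

Reusing data already collected in other domains is an attractive option for decision-making tasks where target-domain data is scarce but related domains have relatively plentiful data.
Existing cross-domain policy transfer methods mostly learn domain correspondences or corrections to aid policy learning, for example domain/task-specific discriminators, representations, or policies.
This design philosophy often leads to heavy model architectures or task/domain-specific modeling that lacks flexibility.
That raises a question: instead of relying on complex downstream cross-domain policy transfer models, can the domain gap be bridged directly and universally at the data level?
This work proposes the Cross-Domain Trajectory EDiting (xTED) framework, which uses a diffusion model designed specifically for cross-domain trajectory adaptation.
The proposed model architecture captures the intricate dependencies among states, actions, and rewards, as well as the dynamics patterns in the target data.
Using the pre-trained diffusion model as a prior, source-domain trajectories are transformed to match target-domain properties while preserving their original semantic information.
This process implicitly corrects the underlying domain gaps, improves the state realism and dynamics reliability of the source data, and can be combined flexibly with a variety of downstream policy learning methods.
Despite its simplicity, xTED shows superior performance in extensive simulation and real-robot experiments.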
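
The idea of editing source data with a pre-trained diffusion prior can be illustrated with a toy sketch. It assumes an SDEdit-style procedure (partially noise the source trajectory, then denoise it with a model of the target data); the abstract does not confirm that this is the authors' exact procedure. The learned denoising network is replaced here by the closed-form denoiser for a one-dimensional Gaussian target, and every name (`alpha_bar`, `predict_clean`, `edit_trajectory`), the linear noise schedule, and the flat list-of-floats trajectory are hypothetical choices made for illustration.

```python
import math
import random


def alpha_bar(k, num_steps):
    """Fraction of clean signal kept after k noising steps (toy linear schedule, an assumption)."""
    return 1.0 - k / (num_steps + 1)


def predict_clean(x_k, k, num_steps, mu, sigma):
    """Posterior mean E[x_0 | x_k] when target data follows N(mu, sigma^2).

    Stands in for a denoising network trained on target-domain trajectories.
    """
    a = alpha_bar(k, num_steps)
    var_k = a * sigma ** 2 + (1.0 - a)  # variance of the noised target data
    gain = math.sqrt(a) * sigma ** 2 / var_k
    return mu + gain * (x_k - math.sqrt(a) * mu)


def edit_trajectory(source, edit_steps, num_steps, mu, sigma, rng):
    """Noise `source` for `edit_steps` steps, then denoise under the target prior."""
    if not 0 <= edit_steps <= num_steps:
        raise ValueError("edit_steps must lie in [0, num_steps]")

    # Forward process: jump directly to noise level `edit_steps`.
    a = alpha_bar(edit_steps, num_steps)
    x = [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in source]

    # Reverse process: deterministic DDIM-style updates back to step 0.
    for k in range(edit_steps, 0, -1):
        a_k = alpha_bar(k, num_steps)
        a_prev = alpha_bar(k - 1, num_steps)
        updated = []
        for v in x:
            x0_hat = predict_clean(v, k, num_steps, mu, sigma)
            eps_hat = (v - math.sqrt(a_k) * x0_hat) / math.sqrt(1.0 - a_k)
            updated.append(math.sqrt(a_prev) * x0_hat + math.sqrt(1.0 - a_prev) * eps_hat)
        x = updated
    return x


if __name__ == "__main__":
    source = [5.0] * 8  # source-domain values, far from the target mean of 0
    light = edit_trajectory(source, 5, 50, 0.0, 1.0, random.Random(0))
    heavy = edit_trajectory(source, 50, 50, 0.0, 1.0, random.Random(0))
    print("light edit:", [round(v, 2) for v in light])
    print("heavy edit:", [round(v, 2) for v in heavy])
```

In this sketch, a small `edit_steps` keeps the result close to the source values, while a large `edit_steps` pulls it toward the target distribution. That trade-off between preserving source content and matching the target is the usual knob in noise-then-denoise editing.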

Summary (original)

Reusing pre-collected data from different domains is an appealing solution for decision-making tasks that have insufficient data in the target domain but are relatively abundant in other related domains. Existing cross-domain policy transfer methods mostly aim at learning domain correspondences or corrections to facilitate policy learning, such as learning domain/task-specific discriminators, representations, or policies. This design philosophy often results in heavy model architectures or task/domain-specific modeling, lacking flexibility. This reality makes us wonder: can we directly bridge the domain gaps universally at the data level, instead of relying on complex downstream cross-domain policy transfer models? In this study, we propose the Cross-Domain Trajectory EDiting (xTED) framework that employs a specially designed diffusion model for cross-domain trajectory adaptation. Our proposed model architecture effectively captures the intricate dependencies among states, actions, and rewards, as well as the dynamics patterns within target data. By utilizing the pre-trained diffusion as a prior, source domain trajectories can be transformed to match with target domain properties while preserving original semantic information. This process implicitly corrects underlying domain gaps, enhancing state realism and dynamics reliability in the source data, and allowing flexible incorporation with various downstream policy learning methods. Despite its simplicity, xTED demonstrates superior performance in extensive simulation and real-robot experiments.

arXiv information

Authors	Haoyi Niu, Qimao Chen, Tenglong Liu, Jianxiong Li, Guyue Zhou, Yi Zhang, Jianming Hu, Xianyuan Zhan
Published	2024-10-11 17:15:39+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
