Decision Transformer as a Foundation Model for Partially Observable Continuous Control

Abstract (original)

Closed-loop control of nonlinear dynamical systems with partial-state observability demands expert knowledge of a diverse, less standardized set of theoretical tools. Moreover, it requires a delicate integration of controller and estimator designs to achieve the desired system behavior. To establish a general controller synthesis framework, we explore the Decision Transformer (DT) architecture. Specifically, we first frame the control task as predicting the current optimal action based on past observations, actions, and rewards, eliminating the need for a separate estimator design. Then, we leverage the pre-trained language models, i.e., the Generative Pre-trained Transformer (GPT) series, to initialize DT and subsequently train it for control tasks using low-rank adaptation (LoRA). Our comprehensive experiments across five distinct control tasks, ranging from maneuvering aerospace systems to controlling partial differential equations (PDEs), demonstrate DT’s capability to capture the parameter-agnostic structures intrinsic to control tasks. DT exhibits remarkable zero-shot generalization abilities for completely new tasks and rapidly surpasses expert performance levels with a minimal amount of demonstration data. These findings highlight the potential of DT as a foundational controller for general control applications.
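The abstract names two mechanisms: framing control as predicting the current action from past observations, actions, and rewards, and adapting a pre-trained GPT with low-rank adaptation (LoRA). Both can be sketched in plain Python under stated assumptions. The sketch assumes the reward signal enters as undiscounted returns-to-go, as in the original Decision Transformer. It also assumes the standard LoRA update y = Wx + (alpha/r)·B(Ax), with W frozen. The abstract does not say which layers are adapted or what r and alpha are, and every function name here is hypothetical. This is an illustration, not the authors' implementation.

```python
from typing import Any, List, Sequence, Tuple


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """Hindsight returns-to-go R_t = sum_{t' >= t} r_{t'} (undiscounted).

    Assumption: follows the original Decision Transformer; used at training time.
    """
    rtg, running = [], 0.0
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    return rtg[::-1]


def build_context(observations: Sequence[Any], actions: Sequence[Any],
                  rewards: Sequence[float], t: int, K: int) -> List[Tuple[str, Any]]:
    """Interleave (R, o, a) tokens for the last K steps ending at time t.

    The action at step t is what the model predicts, so it is omitted.
    Conditioning on past observations replaces a separate state estimator.
    """
    rtg = returns_to_go(rewards)
    start = max(0, t - K + 1)
    tokens: List[Tuple[str, Any]] = []
    for i in range(start, t + 1):
        tokens.append(("R", rtg[i]))
        tokens.append(("o", observations[i]))
        if i < t:
            tokens.append(("a", actions[i]))
    return tokens


def matvec(M: Sequence[Sequence[float]], v: Sequence[float]) -> List[float]:
    """Dense matrix-vector product on nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lora_forward(x: Sequence[float], W: Sequence[Sequence[float]],
                 A: Sequence[Sequence[float]], B: Sequence[Sequence[float]],
                 alpha: float) -> List[float]:
    """y = W x + (alpha / r) * B (A x).

    W (d_out x d_in) is the frozen pre-trained weight. Only A (r x d_in) and
    B (d_out x r) are trained, so the update has rank at most r.
    """
    r = len(A)  # rank = number of rows of A
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]
```

For example, with rewards `[1, 2, 3]` the returns-to-go are `[6.0, 5.0, 3.0]`. `build_context` with `t=2, K=2` yields the tokens R₁, o₁, a₁, R₂, o₂, from which the model would predict a₂. In `lora_forward`, initializing B to zeros leaves the output identical to the frozen layer, which is the usual LoRA starting point.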

arXiv information

Authors	Xiangyuan Zhang, Weichao Mao, Haoran Qiu, Tamer Başar
Published	2024-04-03 02:17:34+00:00
arXiv page	arxiv_id (pdf)

Source, services used

arxiv.jp, DeepL
