Regret Analysis of Learning-Based Linear Quadratic Gaussian Control with Additive Exploration

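Summary

This paper analyzes the regret incurred by naive exploration, a computationally efficient exploration strategy, when controlling unknown, partially observable systems within the Linear Quadratic Gaussian (LQG) framework.
We introduce a two-phase control algorithm called LQG-NAIVE. An initial phase injects Gaussian input signals to obtain a system model; a second phase then interleaves naive exploration and control in an episodic fashion.
We show that, after $T$ time steps, LQG-NAIVE achieves a regret growth rate of $\tilde{\mathcal{O}}(\sqrt{T})$, i.e., $\mathcal{O}(\sqrt{T})$ up to logarithmic factors, and we validate its performance through numerical simulations.
We also propose LQG-IF2E, which extends the exploration signal to a "closed-loop" setting by incorporating the Fisher Information Matrix (FIM).
Numerical evidence indicates that LQG-IF2E performs competitively with LQG-NAIVE.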
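
For readers unfamiliar with the notation, the regret rate can be unpacked as follows. This is the standard meaning of $\tilde{\mathcal{O}}$; the symbols $C$ and $k$ are illustrative and do not come from the paper, and the abstract does not say whether the bound holds in expectation or with high probability.

$$
\mathrm{Regret}(T) = \tilde{\mathcal{O}}(\sqrt{T})
\quad\Longleftrightarrow\quad
\exists\, C > 0,\ k \ge 0 :\ \mathrm{Regret}(T) \le C\,\sqrt{T}\,(\log T)^{k}
\ \text{ for all sufficiently large } T .
$$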

Abstract (original)

In this paper, we analyze the regret incurred by a computationally efficient exploration strategy, known as naive exploration, for controlling unknown partially observable systems within the Linear Quadratic Gaussian (LQG) framework. We introduce a two-phase control algorithm called LQG-NAIVE, which involves an initial phase of injecting Gaussian input signals to obtain a system model, followed by a second phase of an interplay between naive exploration and control in an episodic fashion. We show that LQG-NAIVE achieves a regret growth rate of $\tilde{\mathcal{O}}(\sqrt{T})$, i.e., $\mathcal{O}(\sqrt{T})$ up to logarithmic factors after $T$ time steps, and we validate its performance through numerical simulations. Additionally, we propose LQG-IF2E, which extends the exploration signal to a 'closed-loop' setting by incorporating the Fisher Information Matrix (FIM). We provide compelling numerical evidence of the competitive performance of LQG-IF2E compared to LQG-NAIVE.
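
The two-phase structure described in the abstract can be illustrated with a rough sketch. It is not the paper's algorithm: the function names, the doubling episode lengths, and the noise schedule `sigma0 / episode_start ** 0.25` (variance decaying as one over the square root of elapsed time) are all assumptions made here for illustration, since the abstract gives none of these details. Model estimation and the controller itself are left out.

```python
import numpy as np


def episode_schedule(warmup_length, horizon):
    """Return (start, end) index pairs: a warm-up phase, then growing episodes.

    Assumption (not from the source): episode lengths double each time.
    """
    episodes = [(0, warmup_length)]
    start, length = warmup_length, warmup_length
    while start < horizon:
        end = min(start + length, horizon)
        episodes.append((start, end))
        start, length = end, 2 * length
    return episodes


def additive_exploration_inputs(control_inputs, episode_start, sigma0=1.0, rng=None):
    """Add zero-mean Gaussian exploration noise to a block of control inputs.

    Assumption (not from the source): the noise standard deviation is
    sigma0 / episode_start**0.25, i.e. variance sigma0**2 / sqrt(episode_start).
    """
    rng = np.random.default_rng() if rng is None else rng
    control_inputs = np.asarray(control_inputs, dtype=float)
    sigma = sigma0 / episode_start ** 0.25
    noise = sigma * rng.standard_normal(control_inputs.shape)
    return control_inputs + noise, sigma


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for start, end in episode_schedule(warmup_length=10, horizon=100)[1:]:
        # Placeholder for the controller's inputs over this episode (hypothetical).
        planned = np.zeros((end - start, 1))
        _, sigma = additive_exploration_inputs(planned, start, rng=rng)
        print(f"episode [{start}, {end}): exploration std = {sigma:.3f}")
```

In this sketch the exploration noise is added to the inputs independently of the current state estimate; the abstract contrasts this with the 'closed-loop' exploration signal of LQG-IF2E.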

arXiv information

Authors	Archith Athrey, Othmane Mazhar, Meichen Guo, Bart De Schutter, Shengling Shi
Published	2023-11-24 14:25:58+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
