Gaussian-Mixture-Model Q-Functions for Reinforcement Learning by Riemannian Optimization

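Summary

This paper establishes a new role for Gaussian-mixture models (GMMs) in reinforcement learning (RL): functional approximators of Q-function losses.
In the existing RL literature, GMMs typically serve as estimates of probability density functions; here they approximate Q-function losses instead.
The new Q-function approximators, called GMM-QFs, are incorporated into Bellman residuals, which yields a Riemannian-optimization task that serves as a novel policy-evaluation step in standard policy-iteration schemes.
The paper shows how the hyperparameters of the Gaussian kernels (means and covariance matrices) are learned from the data, opening RL to the powerful toolbox of Riemannian optimization.
Numerical tests show that the proposed design, which uses no training data, outperforms state-of-the-art methods on benchmark RL tasks, including deep Q-networks that do use training data.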
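
To make the idea of a Q-function built from Gaussian kernels more concrete, the following is a minimal sketch and not the paper's implementation. It assumes the Q-function is a weighted sum of kernels of the form exp(-(z - m)^T C^{-1} (z - m)) evaluated at a state-action vector z, and it assumes a plain mean-squared one-step Bellman residual. The names `gmm_q` and `bellman_residual`, the discount `gamma`, and the sample layout `(z, reward, z_next)` are all hypothetical. The Riemannian optimization over the covariance matrices, which must stay symmetric positive definite, is not shown.

```python
import numpy as np

def gmm_q(z, weights, means, covs):
    """Evaluate an assumed GMM-style Q-function at state-action vector z.

    Assumption: Q(z) = sum_k weights[k] * exp(-(z - m_k)^T C_k^{-1} (z - m_k)).
    """
    q = 0.0
    for xi, m, C in zip(weights, means, covs):
        d = z - m
        # Solve C x = d instead of forming the inverse explicitly.
        q += xi * np.exp(-d @ np.linalg.solve(C, d))
    return q

def bellman_residual(samples, weights, means, covs, gamma=0.9):
    """Mean squared one-step Bellman residual (assumed loss form).

    Each sample is a hypothetical tuple (z, reward, z_next).
    """
    total = 0.0
    for z, reward, z_next in samples:
        target = reward + gamma * gmm_q(z_next, weights, means, covs)
        total += (gmm_q(z, weights, means, covs) - target) ** 2
    return total / len(samples)

# Usage: one kernel centered at the origin with identity covariance.
weights = [1.0]
means = [np.zeros(2)]
covs = [np.eye(2)]
samples = [(np.zeros(2), 0.1, np.zeros(2))]
print(gmm_q(np.zeros(2), weights, means, covs))
print(bellman_residual(samples, weights, means, covs))
```

In this sketch the learnable quantities are the weights, the means, and the covariance matrices. A policy-evaluation step would minimize the residual over all three while keeping each covariance matrix on the manifold of positive-definite matrices, which is where a Riemannian optimizer would come in.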

Abstract (original)

This paper establishes a novel role for Gaussian-mixture models (GMMs) as functional approximators of Q-function losses in reinforcement learning (RL). Unlike the existing RL literature, where GMMs play their typical role as estimates of probability density functions, GMMs approximate here Q-function losses. The new Q-function approximators, coined GMM-QFs, are incorporated in Bellman residuals to promote a Riemannian-optimization task as a novel policy-evaluation step in standard policy-iteration schemes. The paper demonstrates how the hyperparameters (means and covariance matrices) of the Gaussian kernels are learned from the data, opening thus the door of RL to the powerful toolbox of Riemannian optimization. Numerical tests show that with no use of training data, the proposed design outperforms state-of-the-art methods, even deep Q-networks which use training data, on benchmark RL tasks.

arXiv information

Authors	Minh Vu, Konstantinos Slavakis
Published	2024-09-06 16:13:04+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
