Quantum Policy Gradient in Reproducing Kernel Hilbert Space

Abstract

Parametrised quantum circuits offer expressive and data-efficient representations for machine learning. Due to quantum states residing in a high-dimensional Hilbert space, parametrised quantum circuits have a natural interpretation in terms of kernel methods. The representation of quantum circuits in terms of quantum kernels has been studied widely in quantum supervised learning, but has been overlooked in the context of quantum reinforcement learning. This paper proposes parametric and non-parametric policy gradient and actor-critic algorithms with quantum kernel policies in quantum environments. This approach, implemented with both numerical and analytical quantum policy gradient techniques, allows exploiting the many advantages of kernel methods, including available analytic forms for the gradient of the policy and tunable expressiveness. The proposed approach is suitable for vector-valued action spaces and each of the formulations demonstrates a quadratic reduction in query complexity compared to their classical counterparts. Two actor-critic algorithms, one based on stochastic policy gradient and one based on deterministic policy gradient (comparable to the popular DDPG algorithm), demonstrate additional query complexity reductions compared to quantum policy gradient algorithms under favourable conditions.
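The "available analytic forms for the gradient of the policy" mentioned above can be illustrated with a purely classical analogue. The following is a minimal sketch, not the paper's quantum algorithm: it assumes a Gaussian policy whose mean is a kernel expansion mu(s) = sum_i k(c_i, s) beta_i with vector-valued weights beta_i, and an RBF kernel whose bandwidth plays the role of a tunable expressiveness knob. All names (`rbf_kernel`, `policy_mean`, `log_policy`, `grad_log_policy`, `centres`, `weights`) are hypothetical and chosen for illustration only.

```python
import math


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two state vectors (assumed kernel choice)."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def policy_mean(state, centres, weights, bandwidth=1.0):
    """Mean action mu(s) = sum_i k(c_i, s) * beta_i, with vector-valued beta_i."""
    action_dim = len(weights[0])
    mean = [0.0] * action_dim
    for centre, beta in zip(centres, weights):
        k = rbf_kernel(centre, state, bandwidth)
        for d in range(action_dim):
            mean[d] += k * beta[d]
    return mean


def log_policy(state, action, centres, weights, sigma=1.0, bandwidth=1.0):
    """Log-density of the Gaussian policy N(action | mu(state), sigma^2 I)."""
    mean = policy_mean(state, centres, weights, bandwidth)
    sq_err = sum((a - m) ** 2 for a, m in zip(action, mean))
    log_norm = len(action) * math.log(sigma * math.sqrt(2.0 * math.pi))
    return -sq_err / (2.0 * sigma ** 2) - log_norm


def grad_log_policy(state, action, centres, weights, sigma=1.0, bandwidth=1.0):
    """Analytic gradient of log_policy with respect to every weight beta_i.

    d/d beta_i log pi(a | s) = k(c_i, s) * (a - mu(s)) / sigma^2
    Returns one gradient vector (of action dimension) per kernel centre.
    """
    mean = policy_mean(state, centres, weights, bandwidth)
    residual = [(a - m) / sigma ** 2 for a, m in zip(action, mean)]
    return [
        [rbf_kernel(centre, state, bandwidth) * r for r in residual]
        for centre in centres
    ]
```

In this sketch the gradient needs no automatic differentiation: each centre contributes its kernel value times the scaled residual. A smaller `bandwidth` makes each centre more local, which is one simple way a kernel policy's expressiveness can be tuned.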

arXiv information

Authors	David M. Bossens, Kishor Bharti, Jayne Thompson
Published	2024-11-21 18:09:56+00:00
