Universal Approximation Theorem for Deep Q-Learning via FBSDE System

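Summary

The approximation capability of Deep Q-Networks (DQNs) is usually justified by general Universal Approximation Theorems (UATs) that do not exploit the intrinsic structural properties of the optimal Q-function, which is the solution of a Bellman equation.
This paper establishes a UAT for a class of DQNs whose architecture is designed to emulate the iterative refinement process inherent in Bellman updates.
A central element of the analysis is the propagation of regularity. The transformation induced by a single application of the Bellman operator exhibits regularity, for which Backward Stochastic Differential Equation (BSDE) theory provides analytical tools.
The uniform regularity of the whole sequence of value iteration iterates (specifically, their uniform Lipschitz continuity on compact domains under standard Lipschitz assumptions on the problem data) is derived from finite-horizon dynamic programming principles.
The paper demonstrates that the layers of a deep residual network, conceived as neural operators acting on function spaces, can approximate the action of the Bellman operator.
The resulting approximation theorem is therefore intrinsically linked to the structure of the control problem, and it offers a proof technique in which network depth corresponds directly to iterations of value function refinement, with controlled error propagation.
This perspective reveals a dynamical systems view of the network's operation on a space of value functions.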

Abstract (original)

The approximation capabilities of Deep Q-Networks (DQNs) are commonly justified by general Universal Approximation Theorems (UATs) that do not leverage the intrinsic structural properties of the optimal Q-function, the solution to a Bellman equation. This paper establishes a UAT for a class of DQNs whose architecture is designed to emulate the iterative refinement process inherent in Bellman updates. A central element of our analysis is the propagation of regularity: while the transformation induced by a single Bellman operator application exhibits regularity, for which Backward Stochastic Differential Equations (BSDEs) theory provides analytical tools, the uniform regularity of the entire sequence of value iteration iterates–specifically, their uniform Lipschitz continuity on compact domains under standard Lipschitz assumptions on the problem data–is derived from finite-horizon dynamic programming principles. We demonstrate that layers of a deep residual network, conceived as neural operators acting on function spaces, can approximate the action of the Bellman operator. The resulting approximation theorem is thus intrinsically linked to the control problem’s structure, offering a proof technique wherein network depth directly corresponds to iterations of value function refinement, accompanied by controlled error propagation. This perspective reveals a dynamic systems view of the network’s operation on a space of value functions.
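The correspondence between network depth and Bellman iterations can be illustrated with a minimal sketch. It is not the paper's construction: it assumes a small discounted tabular MDP (the paper works with a continuous FBSDE setting and finite-horizon arguments), and the names `bellman_operator`, `residual_value_iteration`, and `make_random_mdp` are hypothetical. Each "layer" applies the exact residual update Q + (TQ - Q), which a learned residual block would only approximate.

```python
import numpy as np


def bellman_operator(Q, P, R, gamma):
    """Bellman optimality operator T on a tabular Q of shape (S, A).

    Assumed shapes: P is (S, A, S) with rows summing to 1, R is (S, A).
    """
    V = Q.max(axis=1)          # greedy state value, shape (S,)
    return R + gamma * P @ V   # expected next-state value, shape (S, A)


def residual_value_iteration(P, R, gamma, depth):
    """Stack `depth` residual updates Q <- Q + (TQ - Q), one per 'layer'."""
    Q = np.zeros_like(R)
    history = [Q]
    for _ in range(depth):
        # Exact residual step; a trained residual block would approximate it.
        Q = Q + (bellman_operator(Q, P, R, gamma) - Q)
        history.append(Q)
    return history


def make_random_mdp(n_states=5, n_actions=3, seed=0):
    """Hypothetical toy MDP with random transitions and rewards in [0, 1)."""
    rng = np.random.default_rng(seed)
    P = rng.random((n_states, n_actions, n_states))
    P /= P.sum(axis=2, keepdims=True)  # normalise into probabilities
    R = rng.random((n_states, n_actions))
    return P, R


if __name__ == "__main__":
    gamma = 0.9
    P, R = make_random_mdp()
    history = residual_value_iteration(P, R, gamma, depth=50)
    for k in (1, 10, 25, 50):
        gap = np.abs(history[k] - history[k - 1]).max()
        print(f"layer {k:2d}: sup-norm change = {gap:.3e}")
```

Because T is a gamma-contraction in the sup norm in this toy setting, the change between consecutive layers shrinks by at least a factor of gamma per layer, which is the tabular analogue of the controlled error propagation the abstract describes.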

arXiv information

Author	Qian Qi
Published	2025-05-09 13:11:55+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
