An Empirical Risk Minimization Approach for Offline Inverse RL and Dynamic Discrete Choice Model

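Abstract

We study the problem of estimating dynamic discrete choice (DDC) models, known in machine learning as offline maximum entropy-regularized inverse reinforcement learning (offline MaxEnt-IRL).
The goal is to recover the reward or $Q^*$ function that governs agent behavior from offline behavior data.
In this paper, we propose a globally convergent gradient-based method that solves these problems without the restrictive assumption of linearly parameterized rewards.
The novelty of our approach lies in introducing an empirical risk minimization (ERM) based IRL/DDC framework, which avoids the need to explicitly estimate state transition probabilities in the Bellman equation.
Furthermore, our method is compatible with non-parametric estimation techniques such as neural networks.
The proposed method therefore has the potential to scale to high-dimensional, infinite state spaces.
The key theoretical insight underlying our approach is that the Bellman residual satisfies the Polyak-Lojasiewicz (PL) condition, a property that is weaker than strong convexity but sufficient to ensure fast global convergence guarantees.
Through a series of synthetic experiments, we demonstrate that our approach consistently outperforms benchmark methods and state-of-the-art alternatives.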

Abstract (original)

We study the problem of estimating Dynamic Discrete Choice (DDC) models, also known as offline Maximum Entropy-Regularized Inverse Reinforcement Learning (offline MaxEnt-IRL) in machine learning. The objective is to recover reward or $Q^*$ functions that govern agent behavior from offline behavior data. In this paper, we propose a globally convergent gradient-based method for solving these problems without the restrictive assumption of linearly parameterized rewards. The novelty of our approach lies in introducing the Empirical Risk Minimization (ERM) based IRL/DDC framework, which circumvents the need for explicit state transition probability estimation in the Bellman equation. Furthermore, our method is compatible with non-parametric estimation techniques such as neural networks. Therefore, the proposed method has the potential to be scaled to high-dimensional, infinite state spaces. A key theoretical insight underlying our approach is that the Bellman residual satisfies the Polyak-Lojasiewicz (PL) condition — a property that, while weaker than strong convexity, is sufficient to ensure fast global convergence guarantees. Through a series of synthetic experiments, we demonstrate that our approach consistently outperforms benchmark methods and state-of-the-art alternatives.
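
For readers unfamiliar with the Polyak-Lojasiewicz (PL) condition mentioned above, the following is its standard textbook form, not a statement taken from the paper. The symbols $f$, $\theta$, $\mu$, $L$, and $f^*$ are assumed here for illustration: $f$ is a differentiable objective with minimum value $f^*$ and an $L$-Lipschitz gradient, and $\mu > 0$ is the PL constant.

$$
\frac{1}{2}\,\lVert \nabla f(\theta) \rVert^{2} \;\ge\; \mu\,\bigl(f(\theta) - f^{*}\bigr) \quad \text{for all } \theta .
$$

Under these assumptions, gradient descent with step size $1/L$, that is $\theta_{k+1} = \theta_k - \tfrac{1}{L}\nabla f(\theta_k)$, converges linearly in objective value:

$$
f(\theta_k) - f^{*} \;\le\; \Bigl(1 - \frac{\mu}{L}\Bigr)^{k}\,\bigl(f(\theta_0) - f^{*}\bigr).
$$

The condition does not require convexity, which is why it can give a global guarantee in settings where strong convexity fails. How the paper's Bellman residual objective maps onto $f$ is not specified in the abstract.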

arXiv information

Authors	Enoch H. Kang, Hema Yoganarasimhan, Lalit Jain
Published	2025-05-06 17:12:37+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
