Contractual Reinforcement Learning: Pulling Arms with Invisible Hands

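Summary

Agency problems arise in today's large-scale machine learning tasks, where the learner cannot direct content creation or enforce data collection.
This work proposes a theoretical framework that uses contract design to align the economic interests of the different stakeholders in online learning problems.
The problem, termed \emph{contractual reinforcement learning}, arises naturally from the classic Markov decision process model: a learning principal seeks to optimally influence the agent's action policy, in their common interest, through a set of payment rules contingent on the realization of the next state.
For the planning problem, the authors design an efficient dynamic programming algorithm that determines the optimal contracts against a far-sighted agent.
For the learning problem, they introduce a generic design of no-regret learning algorithms that separates the challenge of robust contract design from the balance between exploration and exploitation, reducing the complexity analysis to the construction of efficient search algorithms.
For several natural classes of problems, they design tailored search algorithms that provably achieve $\tilde{O}(\sqrt{T})$ regret.
They also present an algorithm with $\tilde{O}(T^{2/3})$ regret for the general problem, which improves on existing analyses of online contract design under mild technical assumptions.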

Abstract (original)

The agency problem emerges in today’s large scale machine learning tasks, where the learners are unable to direct content creation or enforce data collection. In this work, we propose a theoretical framework for aligning economic interests of different stakeholders in the online learning problems through contract design. The problem, termed \emph{contractual reinforcement learning}, naturally arises from the classic model of Markov decision processes, where a learning principal seeks to optimally influence the agent’s action policy for their common interests through a set of payment rules contingent on the realization of next state. For the planning problem, we design an efficient dynamic programming algorithm to determine the optimal contracts against the far-sighted agent. For the learning problem, we introduce a generic design of no-regret learning algorithms to untangle the challenges from robust design of contracts to the balance of exploration and exploitation, reducing the complexity analysis to the construction of efficient search algorithms. For several natural classes of problems, we design tailored search algorithms that provably achieve $\tilde{O}(\sqrt{T})$ regret. We also present an algorithm with $\tilde{O}(T^{2/3})$ for the general problem that improves the existing analysis in online contract design with mild technical assumptions.

arXiv information

Authors	Jibang Wu, Siyu Chen, Mengdi Wang, Huazheng Wang, Haifeng Xu
Published	2024-07-02 15:17:50+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
