STARC: A General Framework For Quantifying Differences Between Reward Functions




In order to solve a task using reinforcement learning, it is necessary to first formalise the goal of that task as a reward function. However, for many real-world tasks, it is very difficult to manually specify a reward function that never incentivises undesirable behaviour. As a result, it is increasingly popular to use reward learning algorithms, which attempt to learn a reward function from data. However, the theoretical foundations of reward learning are not yet well-developed. In particular, it is typically not known when a given reward learning algorithm will, with high probability, learn a reward function that is safe to optimise. This means that reward learning algorithms generally must be evaluated empirically, which is expensive, and that their failure modes are difficult to anticipate in advance. One of the roadblocks to deriving better theoretical guarantees is the lack of good methods for quantifying the difference between reward functions. In this paper, we provide a solution to this problem, in the form of a class of pseudometrics on the space of all reward functions that we call STARC (STAndardised Reward Comparison) metrics. We show that STARC metrics induce both an upper and a lower bound on worst-case regret, which implies that our metrics are tight, and that any metric with the same properties must be bilipschitz equivalent to ours. We also identify a number of issues with reward metrics proposed by earlier works. Finally, we evaluate our metrics empirically to demonstrate their practical efficacy. STARC metrics can be used to make both the theoretical and the empirical analysis of reward learning algorithms easier and more principled.
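As a reading aid, the two technical terms the abstract relies on, "pseudometric" and "bilipschitz equivalent", can be written out as follows. These are the standard textbook definitions, not the paper's own theorem statements. The symbols $d$, $d_1$, $d_2$, $\ell$, $u$, and $\mathcal{R}$ are notation assumed here for illustration, and the display environments assume the amsmath package.

```latex
% A pseudometric d on the space \mathcal{R} of reward functions satisfies
% the metric axioms, except that two distinct reward functions are allowed
% to be at distance zero from each other.
\begin{align*}
d(R_1, R_1) &= 0, \\
d(R_1, R_2) &= d(R_2, R_1), \\
d(R_1, R_3) &\le d(R_1, R_2) + d(R_2, R_3).
\end{align*}

% Two pseudometrics d_1 and d_2 on \mathcal{R} are bilipschitz equivalent
% if there exist constants 0 < \ell \le u such that, for all R_1, R_2:
\[
\ell \, d_1(R_1, R_2) \;\le\; d_2(R_1, R_2) \;\le\; u \, d_1(R_1, R_2).
\]
```

Read this way, the abstract's claim is that any reward metric with the same two-sided regret bounds can differ from a STARC metric by at most constant factors.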


Authors: Joar Skalse, Lucy Farnik, Sumeet Ramesh Motwani, Erik Jenner, Adam Gleave, Alessandro Abate
Published: 2024-03-11 16:29:17+00:00


Categories: cs.AI, cs.LG