$\pi2\text{vec}$: Policy Representations with Successor Features

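Summary

This paper describes $\pi2\text{vec}$, a method for representing the behavior of black-box policies as feature vectors.
The policy representations capture, in a task-agnostic way, how the statistics of foundation model features change in response to the policy's behavior. Because they can be trained from offline data, they can be used for offline policy selection.
The work provides a key part of a recipe for combining three recent lines of research: offline policy evaluation as the counterpart to offline RL, foundation models as generic and powerful state representations, and efficient policy selection in resource-constrained environments.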

Summary (original)

This paper describes $\pi2\text{vec}$, a method for representing behaviors of black box policies as feature vectors. The policy representations capture how the statistics of foundation model features change in response to the policy behavior in a task agnostic way, and can be trained from offline data, allowing them to be used in offline policy selection. This work provides a key piece of a recipe for fusing together three modern lines of research: Offline policy evaluation as a counterpart to offline RL, foundation models as generic and powerful state representations, and efficient policy selection in resource constrained environments.
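The idea of summarizing a policy by the feature statistics it induces can be illustrated with a small sketch. This is not the paper's training procedure: it swaps in a plain Monte Carlo average over recorded feature sequences, whereas the abstract only states that the representations are trained from offline data. The function names `discounted_feature_sum` and `policy_embedding`, the discount `gamma`, and the toy feature vectors are all hypothetical, chosen for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def discounted_feature_sum(features: Sequence[Vector], gamma: float) -> Vector:
    """Return sum_t gamma**t * features[t] for a single trajectory.

    `features` stands in for per-step state features (for example, outputs
    of a foundation model); here they are plain lists of floats.
    """
    if not features:
        raise ValueError("trajectory must contain at least one feature vector")
    total = [0.0] * len(features[0])
    discount = 1.0
    for phi in features:
        for i, value in enumerate(phi):
            total[i] += discount * value
        discount *= gamma
    return total


def policy_embedding(trajectories: Sequence[Sequence[Vector]], gamma: float = 0.9) -> Vector:
    """Average the discounted feature sums over trajectories of one policy.

    Assumption: a simple Monte Carlo average is used as the policy vector.
    """
    sums = [discounted_feature_sum(t, gamma) for t in trajectories]
    dim = len(sums[0])
    return [sum(s[i] for s in sums) / len(sums) for i in range(dim)]


# Toy data: two short trajectories of 2-dimensional features.
toy_trajectories = [
    [[1.0, 0.0], [0.0, 2.0]],
    [[3.0, 0.0]],
]
embedding = policy_embedding(toy_trajectories, gamma=0.5)
print(embedding)
```

Two policies that visit different regions of feature space would yield different vectors under this sketch, which is the property that makes such representations usable for comparing or ranking policies.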

arXiv information

Authors	Gianluca Scarpellini, Ksenia Konyushkova, Claudio Fantacci, Tom Le Paine, Yutian Chen, Misha Denil
Published	2024-01-24 10:33:18+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
