Diverse Offline Imitation via Fenchel Duality

Summary

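Unsupervised skill discovery has seen significant recent progress, with various works proposing mutual-information-based objectives as a source of intrinsic motivation.
Prior work has focused mainly on designing algorithms that require online access to the environment.
In contrast, we develop an \textit{offline} skill discovery algorithm.
Our problem formulation considers maximizing a mutual information objective subject to a KL-divergence constraint.
More precisely, the constraint ensures that the state occupancy of each skill stays close to the state occupancy of an expert, within the support of an offline dataset with good state-action coverage.
Our main contribution is to connect Fenchel duality, reinforcement learning, and unsupervised skill discovery, and to give a simple offline algorithm for learning diverse skills that are aligned with an expert.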

Abstract (original)

There has been significant recent progress in the area of unsupervised skill discovery, with various works proposing mutual information based objectives, as a source of intrinsic motivation. Prior works predominantly focused on designing algorithms that require online access to the environment. In contrast, we develop an \textit{offline} skill discovery algorithm. Our problem formulation considers the maximization of a mutual information objective constrained by a KL-divergence. More precisely, the constraints ensure that the state occupancy of each skill remains close to the state occupancy of an expert, within the support of an offline dataset with good state-action coverage. Our main contribution is to connect Fenchel duality, reinforcement learning and unsupervised skill discovery, and to give a simple offline algorithm for learning diverse skills that are aligned with an expert.
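
The formulation described in the abstract can be sketched as a constrained optimization problem. The notation here is assumed for illustration and is not taken from the paper: $z$ is a skill drawn from a prior $p(z)$, $\pi_z$ is the policy for skill $z$, $d_z$ is its state occupancy, $d_E$ is the expert's state occupancy, and $\epsilon > 0$ is a tolerance. The direction of the KL divergence is also an assumption, since the abstract does not specify it.

\[
\max_{\{\pi_z\}} \; I(S; Z)
\quad \text{subject to} \quad
D_{\mathrm{KL}}\!\left(d_z \,\|\, d_E\right) \le \epsilon
\quad \text{for every skill } z .
\]

One standard way duality enters problems of this kind is the Donsker-Varadhan variational representation of the KL divergence, which replaces the divergence with an optimization over a function $f$ whose expectations can be estimated from samples. Whether the paper uses exactly this form is not stated in the abstract.

\[
D_{\mathrm{KL}}\!\left(p \,\|\, q\right)
= \sup_{f} \; \mathbb{E}_{s \sim p}\!\left[f(s)\right]
- \log \mathbb{E}_{s \sim q}\!\left[e^{f(s)}\right] .
\]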

arXiv information

Authors	Marin Vlastelica, Pavel Kolev, Jin Cheng, Georg Martius
Published	2023-07-21 06:12:39+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
