EnQuery: Ensemble Policies for Diverse Query-Generation in Preference Alignment of Robot Navigation

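Abstract

Aligning mobile robot navigation policies with user preferences through reinforcement learning from human feedback (RLHF) requires user queries that are reliable and behaviorally diverse.
However, a deterministic policy cannot generate a variety of trajectory suggestions for a given navigation task.
This paper introduces EnQuery, a query generation approach that uses an ensemble of policies whose behavioral diversity is achieved through a regularization term.
For a given navigation task, EnQuery generates multiple navigation trajectory suggestions, which makes preference data collection more efficient by requiring fewer queries.
The method shows superior performance in aligning navigation policies with user preferences in low-query regimes and improves policy convergence from sparse preference queries.
The evaluation is complemented by a novel explainability representation that captures the mobile robot's full scene navigation behavior in a single plot.
The code is available online at https://github.com/hrl-bonn/EnQuery.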

Abstract (original)

To align mobile robot navigation policies with user preferences through reinforcement learning from human feedback (RLHF), reliable and behavior-diverse user queries are required. However, deterministic policies fail to generate a variety of navigation trajectory suggestions for a given navigation task. In this paper, we introduce EnQuery, a query generation approach using an ensemble of policies that achieve behavioral diversity through a regularization term. For a given navigation task, EnQuery produces multiple navigation trajectory suggestions, thereby optimizing the efficiency of preference data collection with fewer queries. Our methodology demonstrates superior performance in aligning navigation policies with user preferences in low-query regimes, offering enhanced policy convergence from sparse preference queries. The evaluation is complemented with a novel explainability representation, capturing full scene navigation behavior of the mobile robot in a single plot. Our code is available online at https://github.com/hrl-bonn/EnQuery.
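
The abstract states that the ensemble achieves behavioral diversity through a regularization term but does not give its form. As a rough illustration only, the sketch below assumes one common choice: a bonus equal to the mean pairwise distance between the actions that the ensemble members propose for the same state, subtracted from the average task loss. The names `diversity_bonus`, `regularized_loss`, and `weight` are hypothetical and are not taken from the paper or its repository.

```python
import numpy as np


def diversity_bonus(actions: np.ndarray) -> float:
    """Mean pairwise Euclidean distance between ensemble members' actions.

    `actions` has shape (n_policies, action_dim): one action per policy,
    all computed for the same state. This is an assumed, illustrative
    diversity measure, not the regularizer defined in the paper.
    """
    n = actions.shape[0]
    if n < 2:
        return 0.0
    # Broadcasting yields every pairwise difference: (n, n, action_dim).
    diffs = actions[:, None, :] - actions[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)  # (n, n) with a zero diagonal
    # Average over the n * (n - 1) ordered pairs of distinct policies.
    return float(dists.sum() / (n * (n - 1)))


def regularized_loss(task_losses: np.ndarray, actions: np.ndarray,
                     weight: float = 0.1) -> float:
    """Average task loss minus a weighted diversity bonus (hypothetical form)."""
    return float(task_losses.mean() - weight * diversity_bonus(actions))
```

Under this assumed form, a larger `weight` pushes the policies apart more strongly at the cost of task performance, so the trajectories offered in a single preference query differ more from one another.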

arXiv information

Authors	Jorge de Heuvel, Florian Seiler, Maren Bennewitz
Published	2024-06-11 10:15:27+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
