Concept-Based Interpretable Reinforcement Learning with Limited to No Human Labels

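Summary

Recent advances in reinforcement learning (RL) rely mainly on neural network-based policies for decision-making, but these models often lack interpretability, which makes it hard for stakeholders to understand and trust them.
Concept bottleneck models offer an interpretable alternative by integrating human-understandable concepts into neural networks.
However, a significant limitation of prior work is the assumption that human annotations for these concepts are readily available during training, which requires continuous real-time input from human annotators.
To overcome this limitation, we introduce a new training scheme that lets RL algorithms efficiently learn a concept-based policy by querying humans to label only a small set of data or, in the extreme case, without any human labels.
Our algorithm, LICORICE, makes three main contributions: interleaving concept learning and RL training, using concept ensembles to actively select informative data points for labeling, and decorrelating the concept data with a simple strategy.
We show how LICORICE reduces manual labeling effort to 500 or fewer concept labels in three environments.
Finally, we present an initial study exploring how powerful vision-language models can be used to infer concepts from raw visual inputs without explicit labels, at minimal cost to performance.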

Summary (original)

Recent advances in reinforcement learning (RL) have predominantly leveraged neural network-based policies for decision-making, yet these models often lack interpretability, posing challenges for stakeholder comprehension and trust. Concept bottleneck models offer an interpretable alternative by integrating human-understandable concepts into neural networks. However, a significant limitation in prior work is the assumption that human annotations for these concepts are readily available during training, necessitating continuous real-time input from human annotators. To overcome this limitation, we introduce a novel training scheme that enables RL algorithms to efficiently learn a concept-based policy by only querying humans to label a small set of data, or in the extreme case, without any human labels. Our algorithm, LICORICE, involves three main contributions: interleaving concept learning and RL training, using concept ensembles to actively select informative data points for labeling, and decorrelating the concept data with a simple strategy. We show how LICORICE reduces manual labeling efforts to 500 or fewer concept labels in three environments. Finally, we present an initial study to explore how we can use powerful vision-language models to infer concepts from raw visual inputs without explicit labels at minimal cost to performance.
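
The abstract mentions using concept ensembles to actively select informative data points for labeling, but it does not say which acquisition rule is used. As a rough illustration only, the sketch below assumes a standard query-by-committee rule that ranks samples by vote entropy (how strongly the ensemble members disagree) and sends the most contested ones to a labeler. The names `vote_entropy`, `select_for_labeling`, and `budget`, as well as the toy predictions, are hypothetical and not taken from the paper.

```python
from collections import Counter
from math import log


def vote_entropy(votes):
    """Entropy of the labels voted by the ensemble members for one sample."""
    counts = Counter(votes)
    total = len(votes)
    return -sum((c / total) * log(c / total) for c in counts.values())


def select_for_labeling(ensemble_predictions, budget):
    """Return indices of the `budget` samples with the highest disagreement.

    ensemble_predictions[m][i] is the discrete concept label that
    ensemble member m predicts for sample i (assumed layout).
    """
    num_samples = len(ensemble_predictions[0])
    scores = [
        vote_entropy([member[i] for member in ensemble_predictions])
        for i in range(num_samples)
    ]
    # Sort sample indices from most to least contested.
    ranked = sorted(range(num_samples), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]


# Three hypothetical ensemble members predicting one concept for four samples.
predictions = [
    [0, 1, 2, 1],
    [0, 1, 0, 1],
    [0, 2, 1, 1],
]
query_indices = select_for_labeling(predictions, budget=2)
```

Here the members fully agree on samples 0 and 3, split two to one on sample 1, and all differ on sample 2, so samples 2 and 1 would be queried first. Spending the label budget on contested samples is one plausible way to keep the number of human labels small; the method actually used by LICORICE may differ.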

arXiv information

Authors	Zhuorui Ye, Stephanie Milani, Geoffrey J. Gordon, Fei Fang
Published	2024-07-22 16:46:33+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
