Rationality based Innate-Values-driven Reinforcement Learning

Summary

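Innate values describe an agent's intrinsic motivations: they reflect its inherent interests and preferences in pursuing goals, and they drive it to develop diverse skills that satisfy its various needs.
The essence of reinforcement learning (RL) is learning from interaction through reward-driven behavior, much as natural agents do.
This makes RL a good model for describing the innate-values-driven (IV) behaviors of AI agents.
In particular, developing an AI agent's awareness by balancing internal and external utilities according to its needs in different tasks is a crucial problem for individuals learning how to support AI agents that integrate into human society safely and harmoniously over the long term.
This paper proposes a hierarchical compound intrinsic value reinforcement learning model, called innate-values-driven reinforcement learning (IVRL), to describe the complex behaviors that arise in AI agents' interactions.
We formulate the IVRL model and propose two IVRL models, one based on DQN and one on A2C.
By comparing them with benchmark algorithms such as DQN, DDQN, A2C, and PPO on VIZDoom, a role-playing game (RPG) reinforcement learning test platform, we demonstrate that rationally organizing an individual's various needs can effectively achieve better performance.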

Summary (original)

Innate values describe agents’ intrinsic motivations, which reflect their inherent interests and preferences to pursue goals and drive them to develop diverse skills satisfying their various needs. The essence of reinforcement learning (RL) is learning from interaction based on reward-driven behaviors, much like natural agents. It is an excellent model to describe the innate-values-driven (IV) behaviors of AI agents. Especially developing the awareness of the AI agent through balancing internal and external utilities based on its needs in different tasks is a crucial problem for individuals learning to support AI agents integrating human society with safety and harmony in the long term. This paper proposes a hierarchical compound intrinsic value reinforcement learning model — innate-values-driven reinforcement learning termed IVRL to describe the complex behaviors of AI agents’ interaction. We formulated the IVRL model and proposed two IVRL models: DQN and A2C. By comparing them with benchmark algorithms such as DQN, DDQN, A2C, and PPO in the Role-Playing Game (RPG) reinforcement learning test platform VIZDoom, we demonstrated that rationally organizing various individual needs can effectively achieve better performance.

arXiv information

Author	Qin Yang
Published	2024-11-14 03:28:02+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
