Preemptive Detection and Correction of Misaligned Actions in LLM Agents

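Summary

When LLM-based agents are deployed in real applications, they often face a critical challenge: misalignment between the agent's behavior and the user's intent.
Such misalignment can cause an agent to unintentionally execute critical actions with negative outcomes (e.g., accidentally triggering "Buy Now" in web shopping), leading to undesirable or even irreversible consequences.
Although addressing these issues is crucial, preemptively detecting and correcting misaligned actions has received relatively little research attention.
To fill this gap, we introduce InferAct, a novel approach that leverages the belief reasoning ability of LLMs, grounded in Theory of Mind, to detect misaligned actions before they are executed.
Once a misalignment is detected, InferAct alerts the user so that it can be corrected in time, preventing adverse outcomes and improving the reliability of LLM agents' decision-making.
Experiments on three widely used tasks show that InferAct achieves up to a 20% improvement in Macro-F1 over baselines for misaligned action detection.
An in-depth evaluation of misalignment correction further highlights InferAct's effectiveness in improving agent alignment.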
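The detect-then-alert pattern described above can be illustrated with a minimal Python sketch that assumes a simple agent loop. Everything in it is hypothetical. `Action`, `GatedAgent`, `toy_detector` and the `ask_user` callback are illustrative names and are not part of InferAct. The keyword-matching detector only stands in for the LLM-based belief reasoning the paper uses. The sketch shows just the control flow: a critical action is checked before it runs, and if it is flagged, the user decides whether it executes.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    name: str       # e.g. "click[Buy Now]"
    critical: bool  # True for hard-to-reverse actions such as a purchase


@dataclass
class GatedAgent:
    # Hypothetical detector: returns True if the pending action looks misaligned
    # with the user's instruction, given the trajectory so far. In InferAct this
    # role is played by LLM belief reasoning; here it is any callable.
    detector: Callable[[str, List[str], Action], bool]
    # Hypothetical user prompt: returns True if the user approves a flagged action.
    ask_user: Callable[[str, Action], bool]
    trajectory: List[str] = field(default_factory=list)

    def step(self, instruction: str, action: Action) -> bool:
        """Run `action` unless it is critical, flagged, and rejected by the user.

        Returns True if the action was executed (recorded in the trajectory).
        """
        if action.critical and self.detector(instruction, self.trajectory, action):
            if not self.ask_user(instruction, action):
                return False  # rejected before execution; nothing is recorded
        self.trajectory.append(action.name)
        return True


def toy_detector(instruction: str, trajectory: List[str], action: Action) -> bool:
    """Toy stand-in (not InferAct): treat the instruction's last word as the
    target item and flag the action if no earlier step mentions that item."""
    item = instruction.split()[-1].lower()
    return not any(item in step.lower() for step in trajectory)


if __name__ == "__main__":
    agent = GatedAgent(detector=toy_detector, ask_user=lambda instr, act: False)
    print(agent.step("buy red shoes", Action("search[blue hat]", critical=False)))  # True
    print(agent.step("buy red shoes", Action("click[Buy Now]", critical=True)))     # False
```

In this sketch only actions marked `critical` are checked, so routine steps run without interruption. The user is consulted only when the detector raises a flag.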

Summary (original)

Deploying LLM-based agents in real-life applications often faces a critical challenge: the misalignment between agents’ behavior and user intent. Such misalignment may lead agents to unintentionally execute critical actions that carry negative outcomes (e.g., accidentally triggering a ‘buy-now’ in web shopping), resulting in undesirable or even irreversible consequences. Although addressing these issues is crucial, the preemptive detection and correction of misaligned actions remains relatively underexplored. To fill this gap, we introduce InferAct, a novel approach that leverages the belief reasoning ability of LLMs, grounded in Theory-of-Mind, to detect misaligned actions before execution. Once the misalignment is detected, InferAct alerts users for timely correction, preventing adverse outcomes and enhancing the reliability of LLM agents’ decision-making processes. Experiments on three widely used tasks demonstrate that InferAct achieves up to 20% improvements on Macro-F1 against baselines in misaligned action detection. An in-depth evaluation of misalignment correction further highlights InferAct’s effectiveness in improving agent alignment.
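The reported metric, Macro-F1, takes the F1 score of each class separately and averages them. As a result, the misaligned class counts as much as the aligned one even when it is rare. Below is a minimal plain-Python sketch. It assumes binary labels with 1 = misaligned and 0 = aligned, a convention chosen here for illustration. The paper's exact evaluation setup may differ. Per-class F1 is computed as 2TP / (2TP + FP + FN).

```python
from typing import Sequence


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); define it as 0.0 when the class never occurs.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


if __name__ == "__main__":
    # 1 = misaligned, 0 = aligned (illustrative labels, not data from the paper)
    print(round(macro_f1([1, 0, 0, 1], [1, 0, 1, 1]), 3))  # 0.733
```

Because each class contributes equally, consider a detector that never flags anything. On an imbalanced set it can reach high accuracy, yet it scores an F1 of 0 on the misaligned class, which pulls its Macro-F1 down sharply.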

arXiv information

Authors	Haishuo Fang, Xiaodan Zhu, Iryna Gurevych
Published	2024-12-27 14:17:05+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
