Interactive Agents to Overcome Ambiguity in Software Engineering

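Summary

AI agents are increasingly deployed to automate tasks, often on the basis of ambiguous and underspecified user instructions.
Making unwarranted assumptions and failing to ask clarifying questions can lead to suboptimal outcomes, safety risks from tool misuse, and wasted computational resources.
In this work, we study how well LLM agents handle ambiguous instructions in interactive code generation settings by evaluating proprietary and open-weight models across three key steps: (a) leveraging interactivity to improve performance in ambiguous scenarios, (b) detecting ambiguity, and (c) asking targeted questions.
Our findings show that models struggle to distinguish well-specified instructions from underspecified ones.
However, when models do interact on underspecified inputs, they effectively obtain vital information from the user, which leads to significant performance improvements and underscores the value of effective interaction.
Our study highlights critical gaps in how current state-of-the-art models handle ambiguity in complex software engineering tasks, and it structures the evaluation into distinct steps to enable targeted improvements.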

Summary (original)

AI agents are increasingly being deployed to automate tasks, often based on ambiguous and underspecified user instructions. Making unwarranted assumptions and failing to ask clarifying questions can lead to suboptimal outcomes, safety risks due to tool misuse, and wasted computational resources. In this work, we study the ability of LLM agents to handle ambiguous instructions in interactive code generation settings by evaluating proprietary and open-weight models on their performance across three key steps: (a) leveraging interactivity to improve performance in ambiguous scenarios, (b) detecting ambiguity, and (c) asking targeted questions. Our findings reveal that models struggle to distinguish between well-specified and underspecified instructions. However, when models interact for underspecified inputs, they effectively obtain vital information from the user, leading to significant improvements in performance and underscoring the value of effective interaction. Our study highlights critical gaps in how current state-of-the-art models handle ambiguity in complex software engineering tasks and structures the evaluation into distinct steps to enable targeted improvements.
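As a rough illustration of step (b), detecting ambiguity, the sketch below scores whether an agent chose to ask a question against whether the instruction was actually underspecified. This is not the authors' evaluation code: the `Episode` record, its fields, and the `detection_stats` helper are hypothetical names introduced only for this example, and the sample data is made up.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    # Hypothetical record of one agent run (assumed fields, not from the paper).
    underspecified: bool   # ground truth: was key information missing?
    asked_question: bool   # did the agent choose to ask the user anything?


def detection_stats(episodes):
    """Treat 'asked a question' as the agent's prediction of ambiguity."""
    tp = sum(e.underspecified and e.asked_question for e in episodes)
    tn = sum(not e.underspecified and not e.asked_question for e in episodes)
    fp = sum(not e.underspecified and e.asked_question for e in episodes)
    fn = sum(e.underspecified and not e.asked_question for e in episodes)
    total = len(episodes)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Asked although the task was well specified.
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        # Stayed silent although the task was underspecified.
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else 0.0,
    }


# Made-up sample: two underspecified and two well-specified tasks.
episodes = [
    Episode(underspecified=True, asked_question=True),
    Episode(underspecified=True, asked_question=False),
    Episode(underspecified=False, asked_question=False),
    Episode(underspecified=False, asked_question=True),
]
stats = detection_stats(episodes)
print(stats)
```

Separating the false positive and false negative rates keeps the two failure modes apart: asking needless questions on well-specified tasks versus proceeding on unwarranted assumptions.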

arXiv information

Authors	Sanidhya Vijayvargiya, Xuhui Zhou, Akhila Yerukola, Maarten Sap, Graham Neubig
Published	2025-02-18 17:12:26+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
