Can Tool-augmented Large Language Models be Aware of Incomplete Conditions?

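Summary

Recent advances in integrating large language models (LLMs) with tools have enabled the models to interact with real-world environments.
However, these tool-augmented LLMs often encounter incomplete scenarios, either because users provide only partial information or because the necessary tools are unavailable.
Recognizing and handling such scenarios is crucial for the reliability of LLMs, yet the problem remains understudied.
This study examines whether LLMs can identify incomplete conditions and appropriately decide when to refrain from using tools.
To this end, the authors build a dataset by manipulating instances from two existing datasets, removing either a necessary tool or information essential for invoking a tool.
Their experiments show that LLMs often struggle both to identify that information required to use a specific tool is missing and to recognize that no appropriate tool is available.
They further analyze model behavior in different environments and compare model performance against humans.
The research can contribute to more reliable LLMs by addressing scenarios that commonly arise in interactions between humans and LLMs.
The code and dataset will be made publicly available.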

Abstract (original)

Recent advancements in integrating large language models (LLMs) with tools have allowed the models to interact with real-world environments. However, these tool-augmented LLMs often encounter incomplete scenarios when users provide partial information or the necessary tools are unavailable. Recognizing and managing such scenarios is crucial for LLMs to ensure their reliability, but this exploration remains understudied. This study examines whether LLMs can identify incomplete conditions and appropriately determine when to refrain from using tools. To this end, we address a dataset by manipulating instances from two datasets by removing necessary tools or essential information for tool invocation. Our experiments show that LLMs often struggle to identify the absence of information required to utilize specific tools and recognize the absence of appropriate tools. We further analyze model behaviors in different environments and compare their performance against humans. Our research can contribute to advancing reliable LLMs by addressing common scenarios during interactions between humans and LLMs. Our code and dataset will be publicly available.
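
The dataset manipulation described in the abstract (removing a necessary tool, or removing information essential for a tool call) can be illustrated with a minimal sketch. The instance schema, the field names (`query`, `tools`, `label`), the tool names, and both helper functions are hypothetical and chosen only for illustration; they are not taken from the authors' code or from the two source datasets.

```python
from copy import deepcopy


def remove_tool(instance, tool_name):
    """Return a copy of the instance whose tool list no longer offers `tool_name`."""
    out = deepcopy(instance)
    out["tools"] = [t for t in out["tools"] if t["name"] != tool_name]
    out["label"] = "incomplete"  # hypothetical label: the model should refrain from calling a tool
    return out


def remove_information(instance, phrase):
    """Return a copy whose user query no longer contains `phrase`."""
    out = deepcopy(instance)
    # Drop the phrase, then normalise the whitespace left behind.
    out["query"] = " ".join(out["query"].replace(phrase, "").split())
    out["label"] = "incomplete"
    return out


# A made-up complete instance: the query carries every argument the tool needs.
example = {
    "query": "Book a table for two at Luigi's at 7 pm",
    "tools": [
        {"name": "book_restaurant", "required": ["restaurant", "time", "party_size"]},
        {"name": "get_weather", "required": ["city"]},
    ],
    "label": "complete",
}

no_tool = remove_tool(example, "book_restaurant")   # the needed tool is unavailable
no_info = remove_information(example, "at 7 pm")    # the required time is missing
```

In both derived instances the expected behavior is to withhold the tool call, which is the decision the study evaluates. The original instance is left unmodified, so complete and incomplete variants can be compared side by side.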

arXiv information

Authors	Seungbin Yang, ChaeHun Park, Taehee Kim, Jaegul Choo
Published	2025-04-18 13:07:08+00:00

Source and services used

arxiv.jp, Google
