ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios

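Summary

Existing evaluations of tool learning mainly check whether the tools selected for large language models (LLMs) match the expected outcomes.
However, these approaches rely on a limited set of scenarios in which answers can be determined in advance, which diverges from real needs.
In addition, focusing only on outcomes overlooks the intricate capabilities that LLMs need in order to use tools effectively.
To address this, we propose ToolEyes, a fine-grained system tailored to evaluating the tool learning capabilities of LLMs in real-world scenarios.
The system examines seven real-world scenarios and analyzes five dimensions that are crucial to LLMs in tool learning: format alignment, intent comprehension, behavior planning, tool selection, and answer organization.
ToolEyes also incorporates a tool library of approximately 600 tools, which serves as an intermediary between LLMs and the physical world.
Evaluations of ten LLMs across three categories reveal a preference for specific scenarios and limited cognitive abilities in tool learning.
Intriguingly, increasing model size even exacerbates the hindrance to tool learning.
These findings offer instructive insights for advancing the field of tool learning.
The data is available at https://github.com/Junjie-Ye/ToolEyes.git.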
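
As a rough illustration of what a five-dimension, per-scenario evaluation record might look like, the sketch below stores one score per dimension and aggregates them. Only the five dimension names come from the abstract. The `ScenarioResult` class, the `weakest_dimension` helper, the 0 to 1 score scale, and the unweighted mean are all assumptions made for this example and are not taken from ToolEyes itself.

```python
from dataclasses import dataclass
from statistics import mean

# Dimension names follow the abstract; the identifiers are hypothetical.
DIMENSIONS = (
    "format_alignment",
    "intent_comprehension",
    "behavior_planning",
    "tool_selection",
    "answer_organization",
)


@dataclass
class ScenarioResult:
    """Hypothetical record of one model's scores in one scenario."""

    scenario: str   # placeholder scenario label, not a ToolEyes name
    scores: dict    # dimension name -> score, assumed to lie in [0, 1]

    def overall(self) -> float:
        # Assumption: an unweighted mean over the five dimensions.
        missing = [d for d in DIMENSIONS if d not in self.scores]
        if missing:
            raise ValueError(f"missing dimensions: {missing}")
        return mean(self.scores[d] for d in DIMENSIONS)


def weakest_dimension(results):
    """Return the dimension with the lowest mean score across scenarios."""
    per_dim = {d: mean(r.scores[d] for r in results) for d in DIMENSIONS}
    return min(per_dim, key=per_dim.get)
```

Keeping the scores per dimension, rather than collapsing them into a single outcome number, is what lets a report say which capability is lagging instead of only whether the final answer matched.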

Abstract (original)

Existing evaluations of tool learning primarily focus on validating the alignment of selected tools for large language models (LLMs) with expected outcomes. However, these approaches rely on a limited set of scenarios where answers can be pre-determined, diverging from genuine needs. Furthermore, a sole emphasis on outcomes disregards the intricate capabilities essential for LLMs to effectively utilize tools. To tackle this issue, we propose ToolEyes, a fine-grained system tailored for the evaluation of the LLMs’ tool learning capabilities in authentic scenarios. The system meticulously examines seven real-world scenarios, analyzing five dimensions crucial to LLMs in tool learning: format alignment, intent comprehension, behavior planning, tool selection, and answer organization. Additionally, ToolEyes incorporates a tool library boasting approximately 600 tools, serving as an intermediary between LLMs and the physical world. Evaluations involving ten LLMs across three categories reveal a preference for specific scenarios and limited cognitive abilities in tool learning. Intriguingly, expanding the model size even exacerbates the hindrance to tool learning. These findings offer instructive insights aimed at advancing the field of tool learning. The data is available at https://github.com/Junjie-Ye/ToolEyes.git.

arXiv information

Authors	Junjie Ye, Guanyu Li, Songyang Gao, Caishuang Huang, Yilong Wu, Sixian Li, Xiaoran Fan, Shihan Dou, Qi Zhang, Tao Gui, Xuanjing Huang
Published	2024-01-01 12:49:36+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
