T1: A Tool-Oriented Conversational Dataset for Multi-Turn Agentic Planning

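Summary

Large language models (LLMs) have shown impressive capabilities as intelligent agents that can solve complex problems.
However, effective planning in scenarios that involve dependencies between API or tool calls remains a significant challenge, particularly in multi-turn conversations.
To address this, we introduce T1, a tool-augmented, multi-domain, multi-turn conversational dataset designed specifically to capture and manage inter-tool dependencies across diverse domains.
T1 enables rigorous evaluation of an agent's ability to coordinate tool use across nine distinct domains (four single-domain and five multi-domain), aided by an integrated caching mechanism for both short- and long-term memory, and it supports dynamic replanning, such as deciding whether to recompute or reuse cached results.
Beyond facilitating research on tool use and planning, T1 also serves as a benchmark for evaluating the performance of open-source language models.
We present results obtained with T1-Agent, highlighting the ability to plan and reason in complex, tool-dependent scenarios.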

Summary (original)

Large Language Models (LLMs) have demonstrated impressive capabilities as intelligent agents capable of solving complex problems. However, effective planning in scenarios involving dependencies between API or tool calls, particularly in multi-turn conversations, remains a significant challenge. To address this, we introduce T1, a tool-augmented, multi-domain, multi-turn conversational dataset specifically designed to capture and manage inter-tool dependencies across diverse domains. T1 enables rigorous evaluation of agents’ ability to coordinate tool use across nine distinct domains (4 single-domain and 5 multi-domain) with the help of an integrated caching mechanism for both short- and long-term memory, while supporting dynamic replanning, such as deciding whether to recompute or reuse cached results. Beyond facilitating research on tool use and planning, T1 also serves as a benchmark for evaluating the performance of open-source language models. We present results powered by T1-Agent, highlighting their ability to plan and reason in complex, tool-dependent scenarios.
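
The recompute-or-reuse decision mentioned in the abstract can be illustrated with a minimal sketch. This is not the T1 or T1-Agent implementation: the `ToolCache` class, the `max_age` freshness rule, and the `search_flights` tool are all hypothetical names chosen for illustration, and a simple age threshold is only one assumed policy for deciding when a cached tool result is still usable.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Tuple

CacheKey = Tuple[str, Tuple[Tuple[str, Any], ...]]


@dataclass
class ToolCache:
    """Hypothetical cache of tool results, keyed by tool name and arguments."""

    max_age: float  # assumed policy: seconds a cached result stays reusable
    entries: Dict[CacheKey, Tuple[float, Any]] = field(default_factory=dict)

    @staticmethod
    def key(tool: str, args: Dict[str, Any]) -> CacheKey:
        # Sort by argument name so that argument order does not change the key.
        # Argument values must be hashable for the key to work.
        return tool, tuple(sorted(args.items()))

    def call(
        self,
        tool: str,
        fn: Callable[..., Any],
        args: Dict[str, Any],
        now: float,
    ) -> Tuple[Any, str]:
        """Return (result, decision), where decision is 'reused' or 'recomputed'."""
        k = self.key(tool, args)
        hit = self.entries.get(k)
        if hit is not None and now - hit[0] <= self.max_age:
            return hit[1], "reused"
        # Missing or stale entry: run the tool again and store the fresh result.
        result = fn(**args)
        self.entries[k] = (now, result)
        return result, "recomputed"


# Usage with a stand-in tool; `calls` records how often the tool really runs.
calls = []


def search_flights(origin: str, destination: str) -> list:
    calls.append((origin, destination))
    return [f"{origin}->{destination} #1"]


cache = ToolCache(max_age=60.0)
query = {"origin": "SFO", "destination": "JFK"}
first = cache.call("search_flights", search_flights, query, now=0.0)
second = cache.call("search_flights", search_flights, query, now=30.0)
third = cache.call("search_flights", search_flights, query, now=120.0)
```

Passing the clock value in as `now` keeps the sketch deterministic; a real agent would more likely read the time itself and might base the decision on conversation state rather than age alone.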

arXiv information

Authors	Amartya Chakraborty, Paresh Dashore, Nadia Bathaee, Anmol Jain, Anirban Das, Shi-Xiong Zhang, Sambit Sahu, Milind Naphade, Genta Indra Winata
Published	2025-05-22 17:54:32+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
