Can a Single Model Master Both Multi-turn Conversations and Tool Use? CoALM: A Unified Conversational Agentic Language Model

Summary

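Large language models (LLMs) with API-calling capabilities have made it possible to build effective language agents (LAs) and have revolutionized the conventional task-oriented dialogue (TOD) paradigm.
However, current approaches face a critical dilemma: TOD systems are often trained on a limited set of target APIs and need new data to maintain their quality when interfacing with new services, while LAs are not trained to maintain user intent across multi-turn conversations.
Because both robust multi-turn management and advanced function calling are crucial for effective conversational agents, we evaluate these skills on three popular benchmarks: MultiWOZ 2.4 (TOD), BFCL V3 (LA), and API-Bank (LA). Our analysis reveals that specialized approaches excel in one domain but underperform in the other.
To bridge this gap, we introduce CoALM (Conversational Agentic Language Model), a unified approach that integrates both conversational and agentic capabilities.
We created CoALM-IT, a carefully constructed multi-task dataset that interleaves multi-turn ReAct reasoning with complex API usage.
Using CoALM-IT, we train three models, CoALM 8B, CoALM 70B, and CoALM 405B, which outperform top domain-specific models, including GPT-4o, across all three benchmarks.
This demonstrates the feasibility of a single-model approach for both TOD and LA, setting a new standard for conversational agents.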

Abstract (original)

Large Language Models (LLMs) with API-calling capabilities enabled building effective Language Agents (LA), while also revolutionizing the conventional task-oriented dialogue (TOD) paradigm. However, current approaches face a critical dilemma: TOD systems are often trained on a limited set of target APIs, requiring new data to maintain their quality when interfacing with new services, while LAs are not trained to maintain user intent over multi-turn conversations. Because both robust multi-turn management and advanced function calling are crucial for effective conversational agents, we evaluate these skills on three popular benchmarks: MultiWOZ 2.4 (TOD), BFCL V3 (LA), and API-Bank (LA), and our analyses reveal that specialized approaches excel in one domain but underperform in the other. To bridge this chasm, we introduce CoALM (Conversational Agentic Language Model), a unified approach that integrates both conversational and agentic capabilities. We created CoALM-IT, a carefully constructed multi-task dataset that interleaves multi-turn ReAct reasoning with complex API usage. Using CoALM-IT, we train three models CoALM 8B, CoALM 70B, and CoALM 405B, which outperform top domain-specific models, including GPT-4o, across all three benchmarks. This demonstrates the feasibility of a single model approach for both TOD and LA, setting a new standard for conversational agents.

arXiv information

Authors	Emre Can Acikgoz, Jeremiah Greer, Akul Datta, Ze Yang, William Zeng, Oussama Elachqar, Emmanouil Koukoumidis, Dilek Hakkani-Tür, Gokhan Tur
Published	2025-02-18 18:08:56+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
