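Compound AI systems that combine multiple LLM calls, such as Self-Refine and Multi-Agent Debate, achieve strong performance on many AI tasks.
Experiments with popular compound systems such as multi-agent debate and self-refine, using LLMs such as GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5, show that LLMSelector confers 5%-70% accuracy gains compared to using the same LLM for all modules.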
Compound AI systems that combine multiple LLM calls, such as self-refine and multi-agent-debate, achieve strong performance on many AI tasks. We address a core question in optimizing compound systems: for each LLM call or module in the system, how should one decide which LLM to use? We show that these LLM choices have a large effect on quality, but the search space is exponential. We propose LLMSelector, an efficient framework for model selection in compound systems, which leverages two key empirical insights: (i) end-to-end performance is often monotonic in how well each module performs, with all other modules held fixed, and (ii) per-module performance can be estimated accurately by an LLM. Building upon these insights, LLMSelector iteratively selects one module and allocates to it the model with the highest module-wise performance, as estimated by an LLM, until no further gain is possible. LLMSelector is applicable to any compound system with a bounded number of modules, and its number of API calls scales linearly with the number of modules, achieving high-quality model allocation both empirically and theoretically. Experiments with popular compound systems such as multi-agent debate and self-refine using LLMs such as GPT-4o, Claude 3.5 Sonnet and Gemini 1.5 show that LLMSelector confers 5%-70% accuracy gains compared to using the same LLM for all modules.
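The iterative allocation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function `select_models`, the callable `module_score` (a stand-in for the LLM-based estimate of module-wise performance), the module names, the model names, and the toy score table are all hypothetical. The sketch also assumes a simple round-by-round sweep over modules as the way to pick "one module" at a time.

```python
from typing import Callable, Dict, List

Allocation = Dict[str, str]  # module name -> model name


def select_models(
    modules: List[str],
    models: List[str],
    module_score: Callable[[str, str, Allocation], float],
    max_rounds: int = 10,
) -> Allocation:
    """Greedy per-module allocation (illustrative sketch only).

    `module_score(module, model, allocation)` is a hypothetical stand-in for
    an LLM-estimated module-wise performance with all other modules fixed.
    """
    # Start from the baseline: the same model for every module.
    allocation: Allocation = {module: models[0] for module in modules}

    for _ in range(max_rounds):
        changed = False
        for module in modules:
            current = module_score(module, allocation[module], allocation)
            # Pick the model with the highest estimated module-wise score.
            best = max(models, key=lambda k: module_score(module, k, allocation))
            if module_score(module, best, allocation) > current:
                allocation[module] = best
                changed = True
        if not changed:
            # No module can be improved any further, so stop.
            break
    return allocation


# Toy, made-up scores for a three-module self-refine-style pipeline.
TOY_SCORES = {
    ("generator", "model_a"): 0.6,
    ("generator", "model_b"): 0.8,
    ("generator", "model_c"): 0.7,
    ("critic", "model_a"): 0.9,
    ("critic", "model_b"): 0.5,
    ("critic", "model_c"): 0.7,
    ("refiner", "model_a"): 0.4,
    ("refiner", "model_b"): 0.6,
    ("refiner", "model_c"): 0.9,
}


def toy_score(module: str, model: str, allocation: Allocation) -> float:
    # The toy table ignores the rest of the allocation; a real estimate may not.
    return TOY_SCORES[(module, model)]


result = select_models(
    modules=["generator", "critic", "refiner"],
    models=["model_a", "model_b", "model_c"],
    module_score=toy_score,
)
print(result)
```

Each sweep scores every candidate model once per module, so the work per round grows linearly with the number of modules, whereas exhaustively trying every allocation would require 3^3 = 27 end-to-end evaluations even in this small toy setting.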
Authors | Lingjiao Chen, Jared Quincy Davis, Boris Hanin, Peter Bailis, Matei Zaharia, James Zou, Ion Stoica |
Published | 2025-02-20 18:36:25+00:00 |
arXiv page | arxiv_id(pdf) |
Source, services used
arxiv.jp, Google