Optimizing Model Selection for Compound AI Systems

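Summary

Compound AI systems that combine multiple LLM calls, such as Self-Refine and Multi-Agent Debate, achieve strong performance on many AI tasks.
We address a core question in optimizing compound systems: for each LLM call, or module, in the system, how should one decide which LLM to use?
We show that these LLM choices have a large effect on quality, but that the search space is exponential.
We propose LLMSelector, an efficient framework for model selection in compound systems, which leverages two key empirical insights: (i) end-to-end performance is often monotonic in how well each module performs, with all other modules held fixed, and (ii) per-module performance can be estimated accurately by an LLM.
Building on these insights, LLMSelector iteratively selects one module and allocates to it the model with the highest per-module performance, as estimated by an LLM, until no further gain is possible.
LLMSelector is applicable to any compound system with a bounded number of modules, its number of API calls scales linearly with the number of modules, and it achieves high-quality model allocation both empirically and theoretically.
Experiments with popular compound systems such as multi-agent debate and self-refine, using LLMs such as GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5, show that LLMSelector confers 5%-70% accuracy gains compared to using the same LLM for all modules.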
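
To make the exponential search space concrete, here is a minimal sketch that counts the possible model allocations. The function names, the example sizes, and the per-sweep count are illustrative assumptions rather than figures from the paper; a sweep that tries each candidate model once per module is only one way the work could scale linearly with the number of modules.

```python
def exhaustive_allocations(num_modules: int, num_models: int) -> int:
    """Number of distinct ways to assign one model to each module."""
    return num_models ** num_modules


def sweep_evaluations(num_modules: int, num_models: int) -> int:
    """Evaluations in one module-by-module sweep (assumed scheme, not the paper's exact count)."""
    return num_modules * num_models


# Hypothetical sizes: 3 candidate models and a growing number of modules.
counts = {n: (exhaustive_allocations(n, 3), sweep_evaluations(n, 3)) for n in (3, 6, 10)}
for n, (exhaustive, sweep) in counts.items():
    print(f"{n} modules: {exhaustive} allocations vs {sweep} evaluations per sweep")
```

With 3 models, exhaustive search grows from 27 allocations at 3 modules to 59049 at 10 modules, while the assumed sweep grows only from 9 to 30 evaluations.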

Summary (original)

Compound AI systems that combine multiple LLM calls, such as self-refine and multi-agent-debate, achieve strong performance on many AI tasks. We address a core question in optimizing compound systems: for each LLM call or module in the system, how should one decide which LLM to use? We show that these LLM choices have a large effect on quality, but the search space is exponential. We propose LLMSelector, an efficient framework for model selection in compound systems, which leverages two key empirical insights: (i) end-to-end performance is often monotonic in how well each module performs, with all other modules held fixed, and (ii) per-module performance can be estimated accurately by an LLM. Building upon these insights, LLMSelector iteratively selects one module and allocates to it the model with the highest module-wise performance, as estimated by an LLM, until no further gain is possible. LLMSelector is applicable to any compound system with a bounded number of modules, and its number of API calls scales linearly with the number of modules, achieving high-quality model allocation both empirically and theoretically. Experiments with popular compound systems such as multi-agent debate and self-refine using LLMs such as GPT-4o, Claude 3.5 Sonnet and Gemini 1.5 show that LLMSelector confers 5%-70% accuracy gains compared to using the same LLM for all modules.
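
The iterative procedure described in the abstract can be sketched as a module-by-module search. This is a simplified illustration under stated assumptions, not the authors' implementation: `select_models`, `toy_estimate`, the module and model names, and the score table are all hypothetical, and the `estimate` callback stands in for the LLM-based per-module performance estimate.

```python
from typing import Callable, Dict, List

Allocation = Dict[str, str]  # module name -> model name


def select_models(
    modules: List[str],
    models: List[str],
    estimate: Callable[[str, Allocation], float],
    max_rounds: int = 10,
) -> Allocation:
    """Reassign one module at a time until no module's estimated score improves."""
    # Assumed starting point: every module uses the first candidate model.
    allocation: Allocation = {module: models[0] for module in modules}
    for _ in range(max_rounds):
        changed = False
        for module in modules:
            best_model = allocation[module]
            best_score = estimate(module, allocation)
            for model in models:
                # Score this module with a different model, other modules held fixed.
                score = estimate(module, {**allocation, module: model})
                if score > best_score:
                    best_model, best_score = model, score
            if best_model != allocation[module]:
                allocation[module] = best_model
                changed = True
        if not changed:  # no further gain is possible
            break
    return allocation


# Hypothetical per-module scores; in practice an LLM would produce the estimate.
TOY_SCORES = {
    ("generator", "model_a"): 0.6, ("generator", "model_b"): 0.8, ("generator", "model_c"): 0.7,
    ("critic", "model_a"): 0.9, ("critic", "model_b"): 0.5, ("critic", "model_c"): 0.6,
    ("refiner", "model_a"): 0.4, ("refiner", "model_b"): 0.5, ("refiner", "model_c"): 0.9,
}


def toy_estimate(module: str, allocation: Allocation) -> float:
    """Toy stand-in: the score depends only on the model assigned to this module."""
    return TOY_SCORES[(module, allocation[module])]


result = select_models(
    ["generator", "critic", "refiner"],
    ["model_a", "model_b", "model_c"],
    toy_estimate,
)
print(result)
```

In this toy setup each module's score ignores the other modules, so the search settles after one sweep on a different model for each module. The `max_rounds` cap is a safeguard added for the sketch, in case estimates that depend on the other modules keep shifting.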

arXiv information

Authors	Lingjiao Chen, Jared Quincy Davis, Boris Hanin, Peter Bailis, Matei Zaharia, James Zou, Ion Stoica
Published	2025-02-20 18:36:25+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
