Synergistic Weak-Strong Collaboration by Aligning Preferences

Abstract

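Current large language models (LLMs) excel at general reasoning but struggle with specialized tasks that require proprietary or domain-specific knowledge.
Fine-tuning large models for every niche application is often infeasible because of black-box constraints and high computational overhead.
To address this, we propose a collaborative framework that pairs a specialized weak model with a general strong model.
The weak model, tailored to a specific domain, generates initial drafts and background information, while the strong model applies its advanced reasoning to refine those drafts, extending LLM capabilities to critical yet specialized tasks.
To optimize this collaboration, we introduce collaborative feedback for fine-tuning the weak model: it quantifies the influence of the weak model's contributions on the collaboration and builds preference pairs that guide preference tuning of the weak model.
We validate the framework through experiments in three domains.
The collaboration significantly outperforms either model alone by leveraging their complementary strengths.
Moreover, aligning the weak model with the collaborative preference further improves overall performance.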

Abstract (original)

Current Large Language Models (LLMs) excel in general reasoning yet struggle with specialized tasks requiring proprietary or domain-specific knowledge. Fine-tuning large models for every niche application is often infeasible due to black-box constraints and high computational overhead. To address this, we propose a collaborative framework that pairs a specialized weak model with a general strong model. The weak model, tailored to specific domains, produces initial drafts and background information, while the strong model leverages its advanced reasoning to refine these drafts, extending LLMs’ capabilities to critical yet specialized tasks. To optimize this collaboration, we introduce collaborative feedback to fine-tune the weak model, which quantifies the influence of the weak model’s contributions in the collaboration procedure and establishes preference pairs to guide preference tuning of the weak model. We validate our framework through experiments on three domains. We find that the collaboration significantly outperforms each model alone by leveraging complementary strengths. Moreover, aligning the weak model with the collaborative preference further enhances overall performance.
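
The abstract does not spell out how preference pairs are built, so the following is only a minimal sketch of one plausible reading: several weak-model drafts are each refined by the strong model, the refined answers are scored, and the drafts behind the best and worst outcomes become a (chosen, rejected) pair. The names `PreferencePair`, `build_preference_pair`, `refine`, `score`, and `min_gap` are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # weak-model draft whose refined answer scored highest
    rejected: str  # weak-model draft whose refined answer scored lowest


def build_preference_pair(
    prompt: str,
    drafts: List[str],
    refine: Callable[[str, str], str],   # assumed: strong model, (prompt, draft) -> final answer
    score: Callable[[str, str], float],  # assumed: evaluator, (prompt, answer) -> quality score
    min_gap: float = 0.0,
) -> Optional[PreferencePair]:
    """Rank weak-model drafts by the score of the strong model's refined answer."""
    if len(drafts) < 2:
        return None  # a pair needs at least two candidate drafts
    # Score each draft by its downstream effect, not by the draft text itself.
    scored = [(score(prompt, refine(prompt, d)), d) for d in drafts]
    best = max(scored, key=lambda t: t[0])
    worst = min(scored, key=lambda t: t[0])
    if best[0] - worst[0] <= min_gap:
        return None  # drafts are indistinguishable in their effect; skip
    return PreferencePair(prompt=prompt, chosen=best[1], rejected=worst[1])
```

Pairs produced this way could then be passed to any preference-tuning procedure for the weak model; which procedure the authors use is not stated in the abstract.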

arXiv information

Authors	Yizhu Jiao, Xuchao Zhang, Zhaoyang Wang, Yubo Ma, Zhun Deng, Rujia Wang, Chetan Bansal, Saravan Rajmohan, Jiawei Han, Huaxiu Yao
Published	2025-04-22 04:22:09+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
