ChamaleonLLM: Batch-Aware Dynamic Low-Rank Adaptation via Inference-Time Clusters

Abstract

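Recent advances in large language models (LLMs) have shown remarkable performance across diverse tasks.
However, these models are typically deployed with fixed weights, which limits their ability to adapt dynamically to the variability inherent in real-world data during inference.
This paper introduces ChamaleonLLM, a novel framework that enables inference-time adaptation of LLMs by leveraging batch-aware clustering and on-the-fly generation of low-rank updates.
Unlike traditional fine-tuning approaches such as Low-Rank Adaptation (LoRA), or methods that rely on a fixed set of pre-learned uniforms (changeable masks), this method dynamically generates adaptive modifications to the decoder weights based on the aggregated statistics of clustered batches.
By intelligently grouping similar inputs and computing context-aware low-rank updates via a hyper-network, ChamaleonLLM achieves significant performance gains, outperforming conventional LoRA methods while eliminating the overhead of maintaining multiple expert models.
Our experiments highlight the potential of the approach to serve as a versatile and highly adaptive solution for language model inference.
ChamaleonLLM is open-sourced to ensure the reproducibility of the experiments: https://anonymous.4open.science/r/ChamaleonLLM/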
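The abstract does not say which clustering algorithm, input representation, or batch statistics are used. The following is only a rough illustration of what "batch-aware clustering" with "aggregated statistics" could look like. It makes three assumptions: each example is represented by its mean-pooled token embeddings, the clusters come from plain k-means with a deterministic farthest-point initialisation, and the per-cluster mean serves as the aggregated statistic. The function names `kmeans` and `batch_cluster_stats` are hypothetical and are not taken from the ChamaleonLLM repository.

```python
import numpy as np


def kmeans(x: np.ndarray, k: int, iters: int = 20):
    """Plain k-means on the rows of x; returns (labels, centroids).

    Uses a deterministic farthest-point initialisation (an assumption made
    for reproducibility, not something the paper specifies).
    """
    centroids = [x[0]]
    for _ in range(1, k):
        # Squared distance of every row to its nearest already-chosen centroid
        nearest = np.min([((x - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(x[nearest.argmax()])
    centroids = np.stack(centroids).astype(float)

    labels = np.zeros(len(x), dtype=int)
    for _ in range(iters):
        # Squared Euclidean distance from each row to each centroid: (n, k)
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return labels, centroids


def batch_cluster_stats(token_embeddings: np.ndarray, k: int):
    """Hypothetical helper: group a batch of inputs and summarise each group.

    token_embeddings has shape (batch, seq_len, d). Each example is mean-pooled
    to (d,), the pooled vectors are clustered, and each cluster is summarised
    by its member indices and its mean vector.
    """
    pooled = token_embeddings.mean(axis=1)
    labels, _ = kmeans(pooled, k)
    stats = {}
    for j in np.unique(labels):
        idx = np.flatnonzero(labels == j)
        stats[int(j)] = {"indices": idx, "mean": pooled[idx].mean(axis=0)}
    return labels, stats
```

Each cluster's `indices` identify the batch rows that would share one adaptation. Its `mean` is the kind of statistic that would be passed to a weight generator.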

Abstract (original)

Recent advances in large language models (LLMs) have shown remarkable performance across diverse tasks. However, these models are typically deployed with fixed weights, which limits their ability to adapt dynamically to the variability inherent in real-world data during inference. This paper introduces ChamaleonLLM, a novel framework that enables inference-time adaptation of LLMs by leveraging batch-aware clustering and on-the-fly generation of low-rank updates. Unlike traditional fine-tuning approaches such as Low-Rank Adaptation (LoRA) or methods that rely on a fixed set of pre-learned uniforms (changeable masks), our method dynamically generates adaptive modifications to the decoder weights based on the aggregated statistics of clustered batches. By intelligently grouping similar inputs and computing context-aware low-rank updates via a hyper-network, ChamaleonLLM achieves significant performance gains, outperforming conventional LoRA methods while eliminating the overhead of maintaining multiple expert models. Our experiments highlight the potential of our approach to serve as a versatile and highly adaptive solution for language model inference. ChamaleonLLM is open-sourced to ensure the reproducibility of our experiments: https://anonymous.4open.science/r/ChamaleonLLM/
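The abstract states that a hyper-network turns cluster statistics into low-rank updates of the decoder weights, but it does not describe that network. The NumPy sketch below is illustrative only. It assumes a small, untrained two-layer hyper-network that maps a cluster's mean embedding to LoRA-style factors A and B, so that the adapted weight is W + BA. `LowRankHyperNetwork`, `adapted_forward`, the hidden size, the tanh activation, and the random initialisation are all assumptions, not the authors' design.

```python
import numpy as np


class LowRankHyperNetwork:
    """Hypothetical hyper-network: cluster statistic -> LoRA-style factors.

    Maps a statistic vector of size d_stat to A (rank x d_in) and
    B (d_out x rank), so the generated weight update is delta_W = B @ A.
    Weights are random here; in practice they would be trained.
    """

    def __init__(self, d_stat, d_in, d_out, rank, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        self.d_in, self.d_out, self.rank = d_in, d_out, rank
        self.w_hidden = rng.normal(0.0, 1.0 / np.sqrt(d_stat), (d_stat, hidden))
        self.w_a = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, rank * d_in))
        self.w_b = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, d_out * rank))

    def __call__(self, stat):
        h = np.tanh(stat @ self.w_hidden)  # shared hidden representation
        a = (h @ self.w_a).reshape(self.rank, self.d_in)
        b = (h @ self.w_b).reshape(self.d_out, self.rank)
        return a, b


def adapted_forward(x, w, a, b, scale=1.0):
    """Compute x @ (W + scale * B @ A).T for inputs x of shape (n, d_in).

    W is the frozen decoder weight (d_out, d_in). The low-rank term is applied
    as (x @ A.T) @ B.T so the full d_out x d_in update is never materialised.
    """
    return x @ w.T + scale * (x @ a.T) @ b.T
```

In this sketch, every cluster in a batch would get its own (A, B) pair from a single shared hyper-network. This is one way a single set of parameters could stand in for several expert adapters. Computing `(x @ A.T) @ B.T` instead of forming `B @ A` keeps the extra cost proportional to the rank rather than to the full weight size.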

arXiv information

Authors	Kamer Ali Yuksel, Hassan Sawaf
Published	2025-02-06 18:57:06+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
