MUSCLE: A Model Update Strategy for Compatible LLM Evolution

Summary

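Large language models (LLMs) are frequently updated through data or architecture changes to improve their performance.
When updating a model, developers often focus on improving overall performance metrics and place less emphasis on compatibility with previous model versions.
However, users often build a mental model of the functionality and capabilities of the particular machine learning model they interact with.
They must adapt that mental model with every update, a laborious task that can lead to user dissatisfaction.
In practice, fine-tuned downstream task adapters depend on a pretrained LLM base model.
When the base model is updated, these user-facing downstream task models suffer instance regressions, or negative flips: instances that were previously predicted correctly are now predicted incorrectly.
This happens even when the downstream task's training procedure stays the same.
Our work aims to give users seamless model updates in two ways.
First, we provide evaluation metrics for a notion of compatibility with prior model versions, designed specifically for generative tasks but also applicable to discriminative tasks.
We observe regressions and inconsistencies between model versions across a diverse set of tasks and model updates.
Second, we propose a training strategy that minimizes the number of inconsistencies in model updates, involving the training of a compatibility model that can enhance task fine-tuned language models.
From Llama 1 to Llama 2, we reduce negative flips (instances where the prior model version was correct but the new model is incorrect) by up to 40%.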

Summary (original)

Large Language Models (LLMs) are frequently updated due to data or architecture changes to improve their performance. When updating models, developers often focus on increasing overall performance metrics with less emphasis on being compatible with previous model versions. However, users often build a mental model of the functionality and capabilities of a particular machine learning model they are interacting with. They have to adapt their mental model with every update — a draining task that can lead to user dissatisfaction. In practice, fine-tuned downstream task adapters rely on pretrained LLM base models. When these base models are updated, these user-facing downstream task models experience instance regression or negative flips — previously correct instances are now predicted incorrectly. This happens even when the downstream task training procedures remain identical. Our work aims to provide seamless model updates to a user in two ways. First, we provide evaluation metrics for a notion of compatibility to prior model versions, specifically for generative tasks but also applicable for discriminative tasks. We observe regression and inconsistencies between different model versions on a diverse set of tasks and model updates. Second, we propose a training strategy to minimize the number of inconsistencies in model updates, involving training of a compatibility model that can enhance task fine-tuned language models. We reduce negative flips — instances where a prior model version was correct, but a new model incorrect — by up to 40% from Llama 1 to Llama 2.
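The negative-flip idea above can be made concrete with a small sketch. It assumes a common definition from the model-update literature (the fraction of instances the old model got right and the new model gets wrong), not necessarily the exact compatibility metric proposed in this paper, and it uses exact string match as a stand-in correctness check; the paper's metrics for generative tasks are not reproduced here. The helper names `exact_match` and `flip_rates` and the toy predictions are hypothetical.

```python
from typing import List, Sequence, Tuple


def exact_match(predictions: Sequence[str], references: Sequence[str]) -> List[bool]:
    """Per-instance correctness via exact string match (an illustrative assumption)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    return [p.strip() == r.strip() for p, r in zip(predictions, references)]


def flip_rates(old_correct: Sequence[bool], new_correct: Sequence[bool]) -> Tuple[float, float]:
    """Return (negative_flip_rate, positive_flip_rate) over paired per-instance results."""
    if len(old_correct) != len(new_correct):
        raise ValueError("old and new results must cover the same instances")
    n = len(old_correct)
    if n == 0:
        raise ValueError("need at least one instance")
    # Negative flip (instance regression): old model right, new model wrong.
    negative = sum(1 for o, c in zip(old_correct, new_correct) if o and not c)
    # Positive flip: old model wrong, new model right.
    positive = sum(1 for o, c in zip(old_correct, new_correct) if not o and c)
    return negative / n, positive / n


# Hypothetical toy data: both models score 3/4, yet one instance regresses.
references = ["A", "B", "C", "D"]
old_preds = ["A", "B", "X", "D"]  # hypothetical outputs of the old model version
new_preds = ["A", "Y", "C", "D"]  # hypothetical outputs of the updated model

nfr, pfr = flip_rates(exact_match(old_preds, references), exact_match(new_preds, references))
print(f"negative flip rate={nfr:.2f}, positive flip rate={pfr:.2f}")
# prints "negative flip rate=0.25, positive flip rate=0.25"
```

The toy example illustrates why aggregate accuracy alone can hide incompatibility: both versions reach 75% accuracy, but a user who relied on the old model's answer to the second instance now sees a regression.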

arXiv information

Authors	Jessica Echterhoff, Fartash Faghri, Raviteja Vemulapalli, Ting-Yao Hu, Chun-Liang Li, Oncel Tuzel, Hadi Pouransari
Published	2024-07-12 17:12:48+00:00
arXiv page	arxiv_id(pdf)

Provider, services used

arxiv.jp, Google
