FTFT: Efficient and Robust Fine-Tuning by Transferring Training Dynamics

Summary

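Fine-tuning pre-trained language models (PLMs) has been highly successful, but the fine-tuned models remain susceptible to out-of-distribution input.
Dataset cartography is a simple yet effective dual-model approach that improves the robustness of fine-tuned PLMs.
It fine-tunes a model on the original training set (the reference model), selects a subset of important training instances based on the training dynamics, and then fine-tunes again on only these selected examples (the main model).
However, this approach requires fine-tuning the same model twice, which is computationally expensive for large PLMs.
This paper shows that (1) training dynamics are highly transferable across model sizes and pre-training methods, and (2) fine-tuning the main model on these selected training instances achieves higher training efficiency than empirical risk minimization (ERM).
Building on these observations, the authors propose a new fine-tuning approach: Fine-Tuning by transFerring Training dynamics (FTFT).
Compared with dataset cartography, FTFT uses more efficient reference models and aggressive early stopping.
FTFT improves robustness over ERM while lowering the training cost by up to $\sim 50\%$.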
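
The selection step in the pipeline above can be illustrated with a minimal sketch. It assumes the statistics commonly used in dataset cartography: "confidence" is the mean probability the reference model assigns to the gold label across epochs, and "variability" is the standard deviation of that probability. The abstract does not state which statistic or what fraction of the data FTFT selects, so the ranking by variability, the `fraction` parameter, and the function names `training_dynamics` and `select_ambiguous` are all assumptions made for illustration, and the probabilities are toy values rather than results from the paper.

```python
from statistics import mean, pstdev


def training_dynamics(gold_probs_per_epoch):
    """Summarise per-instance training dynamics of a reference model.

    gold_probs_per_epoch[e][i] is the probability the reference model
    assigns to the gold label of instance i after epoch e (toy input here).
    """
    num_instances = len(gold_probs_per_epoch[0])
    dynamics = []
    for i in range(num_instances):
        probs = [epoch[i] for epoch in gold_probs_per_epoch]
        dynamics.append({
            "confidence": mean(probs),     # mean gold-label probability
            "variability": pstdev(probs),  # spread across epochs
        })
    return dynamics


def select_ambiguous(dynamics, fraction=0.33):
    """Return indices of the most variable instances (assumed criterion)."""
    k = max(1, round(fraction * len(dynamics)))
    ranked = sorted(
        range(len(dynamics)),
        key=lambda i: dynamics[i]["variability"],
        reverse=True,
    )
    return sorted(ranked[:k])


# Toy gold-label probabilities: 3 epochs x 4 training instances.
gold_probs = [
    [0.90, 0.20, 0.50, 0.10],
    [0.95, 0.80, 0.40, 0.10],
    [0.99, 0.30, 0.60, 0.20],
]

dynamics = training_dynamics(gold_probs)
selected = select_ambiguous(dynamics, fraction=0.5)
print(selected)  # indices the main model would be fine-tuned on
```

In this sketch the main model would then be fine-tuned only on the instances in `selected`. Under the paper's transferability claim, the probabilities could come from a smaller or differently pre-trained reference model than the main model.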

Abstract (original)

Despite the massive success of fine-tuning Pre-trained Language Models (PLMs), they remain susceptible to out-of-distribution input. Dataset cartography is a simple yet effective dual-model approach that improves the robustness of fine-tuned PLMs. It involves fine-tuning a model on the original training set (i.e. reference model), selecting a subset of important training instances based on the training dynamics, and fine-tuning again only on these selected examples (i.e. main model). However, this approach requires fine-tuning the same model twice, which is computationally expensive for large PLMs. In this paper, we show that (1) training dynamics are highly transferable across model sizes and pre-training methods, and that (2) fine-tuning main models using these selected training instances achieves higher training efficiency than empirical risk minimization (ERM). Building on these observations, we propose a novel fine-tuning approach: Fine-Tuning by transFerring Training dynamics (FTFT). Compared with dataset cartography, FTFT uses more efficient reference models and aggressive early stopping. FTFT achieves robustness improvements over ERM while lowering the training cost by up to $\sim 50\%$.

arXiv information

Authors	Yupei Du, Albert Gatt, Dong Nguyen
Published	2024-12-11 11:08:18+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
