DEM: Distribution Edited Model for Training with Mixed Data Distributions

Summary

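Training with mixed data distributions is a common and important part of building multi-task and instruction-following models.
The diversity of the data distributions and the cost of joint training make the optimization procedure extremely challenging.
Data mixing methods partially address this problem, but they yield sub-optimal performance across data sources and require multiple expensive training runs.
This paper proposes a simple and efficient alternative that better optimizes across data sources: models trained individually on each data source are combined with the base model using basic element-wise vector operations.
The resulting model, the Distribution Edited Model (DEM), is 11x cheaper than standard data mixing and outperforms strong baselines on a variety of benchmarks, with improvements of up to 6.2% on MMLU, 11.5% on BBH, 16.1% on DROP, 6% on MathQA, and 9.3% on HELM for models of size 3B to 13B.
Notably, DEM does not require full re-training when a single data source is modified, which makes it flexible and scalable for training with diverse data sources.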
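
The combination step described above can be illustrated with a small sketch. The summary only states that individually trained models are combined with the base model through "basic element-wise vector operations"; the version below assumes one plausible form, in which each source contributes the element-wise difference between its fine-tuned weights and the base weights, and a weighted sum of those differences is added back to the base. The function names, the `Params` type, the toy weights, and the mixing coefficients are all hypothetical and chosen for illustration only.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flattened list of weights.
Params = Dict[str, List[float]]


def distribution_vector(finetuned: Params, base: Params) -> Params:
    """Element-wise difference between a fine-tuned model and the base model."""
    return {
        name: [f - b for f, b in zip(finetuned[name], base[name])]
        for name in base
    }


def combine(base: Params, vectors: List[Params], weights: List[float]) -> Params:
    """Add a weighted sum of per-source difference vectors to the base weights.

    Assumed combination rule, not confirmed by the summary:
    merged = base + sum_i weights[i] * vectors[i]
    """
    if len(vectors) != len(weights):
        raise ValueError("need exactly one weight per difference vector")
    merged = {name: list(values) for name, values in base.items()}
    for vec, w in zip(vectors, weights):
        for name in merged:
            merged[name] = [m + w * d for m, d in zip(merged[name], vec[name])]
    return merged


# Toy example: one parameter tensor, two data sources.
base = {"layer.weight": [1.0, 2.0, 3.0]}
model_a = {"layer.weight": [1.5, 2.0, 2.0]}  # fine-tuned on source A
model_b = {"layer.weight": [1.0, 4.0, 3.0]}  # fine-tuned on source B

vectors = [distribution_vector(m, base) for m in (model_a, model_b)]
dem = combine(base, vectors, weights=[1.0, 0.5])
print(dem)  # {'layer.weight': [1.5, 3.0, 2.0]}
```

Under this assumed rule, changing one data source only means re-training that source's model and recomputing its difference vector; the other vectors are reused unchanged, which matches the claim that no full re-training is needed.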

Abstract (original)

Training with mixed data distributions is a common and important part of creating multi-task and instruction-following models. The diversity of the data distributions and cost of joint training makes the optimization procedure extremely challenging. Data mixing methods partially address this problem, albeit having a sub-optimal performance across data sources and require multiple expensive training runs. In this paper, we propose a simple and efficient alternative for better optimization of the data sources by combining models individually trained on each data source with the base model using basic element-wise vector operations. The resulting model, namely Distribution Edited Model (DEM), is 11x cheaper than standard data mixing and outperforms strong baselines on a variety of benchmarks, yielding upto 6.2% improvement on MMLU, 11.5% on BBH, 16.1% on DROP, 6% on MathQA, and 9.3% on HELM with models of size 3B to 13B. Notably, DEM does not require full re-training when modifying a single data-source, thus making it very flexible and scalable for training with diverse data sources.

arXiv information

Authors	Dhananjay Ram, Aditya Rawal, Momchil Hardalov, Nikolaos Pappas, Sheng Zha
Published	2024-11-05 11:40:05+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
