Adaptive Data Optimization: Dynamic Sample Selection with Scaling Laws

Abstract

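The composition of pretraining data is a key determinant of foundation model performance, yet there is no standard guideline for allocating a limited compute budget across data sources.
Most current approaches rely either on extensive experiments with smaller models or on dynamic data adjustment that also requires proxy models; both add significant workflow complexity and computational overhead.
This paper introduces Adaptive Data Optimization (ADO), an algorithm that optimizes the data distribution online, concurrently with model training.
Unlike existing methods, ADO requires no external knowledge, no proxy models, and no changes to the model update.
Instead, ADO uses per-domain scaling laws to estimate each domain's learning potential during training and adjusts the data mixture accordingly, which makes it more scalable and easier to integrate.
Experiments show that ADO matches or outperforms prior methods while remaining computationally efficient across compute scales, offering a practical way to adjust the data distribution dynamically without sacrificing flexibility or increasing cost.
Beyond these practical benefits, ADO offers a new perspective on data collection strategies through the lens of scaling laws.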

Abstract (original)

The composition of pretraining data is a key determinant of foundation models’ performance, but there is no standard guideline for allocating a limited computational budget across different data sources. Most current approaches either rely on extensive experiments with smaller models or dynamic data adjustments that also require proxy models, both of which significantly increase the workflow complexity and computational overhead. In this paper, we introduce Adaptive Data Optimization (ADO), an algorithm that optimizes data distributions in an online fashion, concurrent with model training. Unlike existing techniques, ADO does not require external knowledge, proxy models, or modifications to the model update. Instead, ADO uses per-domain scaling laws to estimate the learning potential of each domain during training and adjusts the data mixture accordingly, making it more scalable and easier to integrate. Experiments demonstrate that ADO can achieve comparable or better performance than prior methods while maintaining computational efficiency across different computation scales, offering a practical solution for dynamically adjusting data distribution without sacrificing flexibility or increasing costs. Beyond its practical benefits, ADO also provides a new perspective on data collection strategies via scaling laws.
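
As a rough illustration of the idea described in the abstract, the sketch below fits a power law to each domain's loss history and weights domains by the fitted rate of loss decrease. The functional form L(n) = beta * n^(-alpha), the function names, the sample data, and the proportional weighting rule are all assumptions made for illustration; the abstract does not specify the scaling-law parameterization or the update rule that ADO actually uses.

```python
import math


def fit_power_law(steps, losses):
    """Fit loss = beta * step**(-alpha) by least squares in log-log space.

    Assumed form for illustration only; returns (alpha, beta).
    """
    xs = [math.log(s) for s in steps]
    ys = [math.log(l) for l in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    alpha = -slope
    beta = math.exp(mean_y - slope * mean_x)
    return alpha, beta


def learning_potential(alpha, beta, step):
    """Magnitude of the fitted loss slope, -dL/dn, at the given step."""
    return alpha * beta * step ** (-alpha - 1)


def mixture_weights(histories, step):
    """Weight each domain in proportion to its estimated learning potential.

    `histories` maps a domain name to a (steps, losses) pair.
    Hypothetical rule: the real method may smooth or regularize the weights.
    """
    potentials = {}
    for name, (steps, losses) in histories.items():
        alpha, beta = fit_power_law(steps, losses)
        # Clamp at zero so a domain whose loss is rising gets no weight.
        potentials[name] = max(learning_potential(alpha, beta, step), 0.0)
    total = sum(potentials.values())
    if total == 0.0:
        # Fall back to a uniform mixture when no domain is improving.
        return {name: 1.0 / len(potentials) for name in potentials}
    return {name: p / total for name, p in potentials.items()}


# Synthetic loss curves for two hypothetical domains.
example_steps = [10, 100, 1000]
example_histories = {
    "code": (example_steps, [4.0 * s ** -0.5 for s in example_steps]),
    "web": (example_steps, [3.0 * s ** -0.1 for s in example_steps]),
}
example_weights = mixture_weights(example_histories, step=1000)
```

In this toy setting the "code" curve has already flattened by step 1000, so the slower but still improving "web" domain receives the larger share of the mixture. In an online setting the fit and the weights would be refreshed periodically as new loss measurements arrive.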

arXiv information

Authors	Yiding Jiang, Allan Zhou, Zhili Feng, Sadhika Malladi, J. Zico Kolter
Published	2024-10-15 17:47:44+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
