Enhancing Data Quality in Federated Fine-Tuning of Foundation Models

Summary

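Foundation model training currently relies heavily on public-domain data, which recent studies suggest is nearing exhaustion.
Scaling further will require collaboration among multiple specialized, high-quality private-domain data sources.
However, training models locally without sharing private data creates many obstacles for data quality control.
To address this, we propose a data quality control pipeline for federated fine-tuning of foundation models.
The pipeline computes scores that reflect the quality of the training data and determines a global threshold as a unified standard, aiming to improve global performance.
Our experiments show that the proposed quality control pipeline improves the effectiveness and reliability of model training, leading to better performance.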
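
The score-then-threshold idea can be illustrated with a minimal Python sketch. This is not the authors' implementation. The summary does not say how scores are computed or how the threshold is chosen, so everything here is an assumption made for illustration: the scores are hard-coded placeholders, the threshold is taken as the median score of a small shared reference set, and the names `global_threshold`, `filter_local`, `client_a`, and `client_b` are hypothetical.

```python
from statistics import median


def global_threshold(reference_scores):
    """Assumed rule (not from the paper): median score of a shared reference set."""
    return median(reference_scores)


def filter_local(samples, scores, threshold):
    """Keep only the local samples whose quality score meets the global threshold."""
    return [s for s, q in zip(samples, scores) if q >= threshold]


# Placeholder per-client data: (samples, locally computed quality scores).
# Raw samples never leave the client; only the scalar threshold is shared.
clients = {
    "client_a": (["a1", "a2", "a3"], [0.9, 0.2, 0.6]),
    "client_b": (["b1", "b2"], [0.4, 0.8]),
}

# One threshold applied uniformly, so all clients filter by the same standard.
threshold = global_threshold([0.3, 0.5, 0.7])

kept = {
    name: filter_local(samples, scores, threshold)
    for name, (samples, scores) in clients.items()
}
print(threshold, kept)
```

Each client would then fine-tune only on its retained samples. Using a single shared threshold, rather than a per-client one, is what makes the quality standard uniform across data sources.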

Summary (original)

In the current landscape of foundation model training, there is a significant reliance on public domain data, which is nearing exhaustion according to recent research. To further scale up, it is crucial to incorporate collaboration among multiple specialized and high-quality private domain data sources. However, the challenge of training models locally without sharing private data presents numerous obstacles in data quality control. To tackle this issue, we propose a data quality control pipeline for federated fine-tuning of foundation models. This pipeline computes scores reflecting the quality of training data and determines a global threshold for a unified standard, aiming for improved global performance. Our experiments show that the proposed quality control pipeline facilitates the effectiveness and reliability of the model training, leading to better performance.

arXiv information

Authors	Wanru Zhao, Yaxin Du, Nicholas Donald Lane, Siheng Chen, Yanfeng Wang
Published	2024-03-07 14:28:04+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
