Approximating Language Model Training Data from Weights

Summary

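Modern language models often have open weights but closed training data.
We formalize the problem of approximating training data from model weights and propose several baselines and metrics.
We develop a gradient-based approach that selects the best-matching data from a large public text corpus, and show that it recovers useful data given only the weights of the original and finetuned models.
Even when none of the true training data is known, our method can locate a small subset of public web documents that can be used to train a model to near the original model's performance, for models trained for both classification and supervised finetuning (SFT).
On the AG News classification task, the method improves performance from 65% (using randomly selected data) to 80%, approaching the expert benchmark of 88%.
When applied to a model trained with SFT on MSMARCO web documents, the method reduces perplexity from 3.3 to 2.3, compared with an expert LLAMA model's perplexity of 2.0.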
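
To make the idea of gradient-based selection concrete, here is a minimal sketch, not the authors' implementation. It assumes a toy logistic-regression model and scores each candidate example by the cosine similarity between its descent direction at the base weights and the observed weight change (finetuned minus base). All names (`logistic_grad`, `select_by_gradient_match`, the toy weights and candidates) are hypothetical and chosen only for illustration.

```python
import math

def logistic_grad(w, x, y):
    """Gradient of the logistic loss for one example (x, y in {0, 1}) at weights w."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * xi for xi in x]

def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    na = math.sqrt(sum(ai * ai for ai in a))
    nb = math.sqrt(sum(bi * bi for bi in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def select_by_gradient_match(w_base, w_finetuned, candidates, k):
    """Return indices of the k candidates whose descent direction at w_base
    best aligns with the weight change w_finetuned - w_base (assumed scoring rule)."""
    delta = [wf - wb for wf, wb in zip(w_finetuned, w_base)]
    scored = []
    for idx, (x, y) in enumerate(candidates):
        grad = logistic_grad(w_base, x, y)
        descent = [-g for g in grad]  # direction one gradient step would move the weights
        scored.append((cosine(descent, delta), idx))
    # Highest score first; ties broken by candidate index for determinism.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [idx for _, idx in scored[:k]]

# Toy setup (hypothetical): finetuning raised weight 0 and lowered weight 1.
w_base = [0.0, 0.0]
w_finetuned = [1.0, -1.0]
candidates = [
    ([1.0, 0.0], 1),  # pushes weight 0 up: consistent with the change
    ([0.0, 1.0], 0),  # pushes weight 1 down: consistent with the change
    ([1.0, 0.0], 0),  # pushes weight 0 down: inconsistent
    ([0.0, 1.0], 1),  # pushes weight 1 up: inconsistent
]
selected = select_by_gradient_match(w_base, w_finetuned, candidates, k=2)
print(selected)
```

In this toy case the two candidates whose gradients point along the weight change are selected. A real setting would use a neural language model and a large public corpus as the candidate pool, and the scoring and selection details there may differ from this sketch.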

Abstract (original)

Modern language models often have open weights but closed training data. We formalize the problem of data approximation from model weights and propose several baselines and metrics. We develop a gradient-based approach that selects the highest-matching data from a large public text corpus and show its effectiveness at recovering useful data given only weights of the original and finetuned models. Even when none of the true training data is known, our method is able to locate a small subset of public Web documents that can be used to train a model to close to the original model performance given models trained for both classification and supervised-finetuning. On the AG News classification task, our method improves performance from 65% (using randomly selected data) to 80%, approaching the expert benchmark of 88%. When applied to a model trained with SFT on MSMARCO web documents, our method reduces perplexity from 3.3 to 2.3, compared to an expert LLAMA model’s perplexity of 2.0.

arXiv information

Authors	John X. Morris, Junjie Oscar Yin, Woojeong Kim, Vitaly Shmatikov, Alexander M. Rush
Published	2025-06-18 15:26:43+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
