Intriguing Properties of Data Attribution on Diffusion Models

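Summary

Data attribution seeks to trace a model's outputs back to its training data.
With the recent development of diffusion models, data attribution has become a desirable module for properly assigning valuations to high-quality or copyrighted training samples, so that data contributors are fairly compensated or credited.
Several theoretically motivated methods have been proposed to implement data attribution, aiming to improve the trade-off between computational scalability and effectiveness.
In this work, we conduct extensive experiments and ablation studies on attributing diffusion models, focusing on DDPMs trained on CIFAR-10 and CelebA and on a Stable Diffusion model LoRA-finetuned on ArtBench.
Intriguingly, we report the counter-intuitive observation that theoretically unjustified design choices for attribution empirically outperform previous baselines by a large margin, in terms of both linear datamodeling score and counterfactual evaluation.
Our work presents a significantly more efficient approach for attributing diffusion models, while the unexpected findings suggest that, at least in non-convex settings, constructions guided by theoretical assumptions may lead to inferior attribution performance.
The code is available at https://github.com/sail-sg/D-TRAK.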

Abstract (original)

Data attribution seeks to trace model outputs back to training data. With the recent development of diffusion models, data attribution has become a desired module to properly assign valuations for high-quality or copyrighted training samples, ensuring that data contributors are fairly compensated or credited. Several theoretically motivated methods have been proposed to implement data attribution, in an effort to improve the trade-off between computational scalability and effectiveness. In this work, we conduct extensive experiments and ablation studies on attributing diffusion models, specifically focusing on DDPMs trained on CIFAR-10 and CelebA, as well as a Stable Diffusion model LoRA-finetuned on ArtBench. Intriguingly, we report counter-intuitive observations that theoretically unjustified design choices for attribution empirically outperform previous baselines by a large margin, in terms of both linear datamodeling score and counterfactual evaluation. Our work presents a significantly more efficient approach for attributing diffusion models, while the unexpected findings suggest that at least in non-convex settings, constructions guided by theoretical assumptions may lead to inferior attribution performance. The code is available at https://github.com/sail-sg/D-TRAK.

arXiv information

Authors	Xiaosen Zheng, Tianyu Pang, Chao Du, Jing Jiang, Min Lin
Published	2024-03-15 12:05:14+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
