Unveiling Concept Attribution in Diffusion Models

Summary

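Diffusion models have shown a remarkable ability to generate realistic, high-quality images from text prompts. However, a trained model remains a black box, and little is known about the role its components play in exhibiting a concept such as an object or a style. Recent work uses causal tracing to identify the layers that store knowledge in generative models, but does not show how those layers contribute to a target concept. This work approaches the model interpretability problem from a more general perspective and poses the question: "How do model components work jointly to demonstrate knowledge?" We adapt component attribution to decompose diffusion models and reveal how each component contributes to a concept. In particular, a concept can be erased from a diffusion model by removing its positive components while preserving knowledge of other concepts. Surprisingly, we also show that there exist components that contribute negatively to a concept, which knowledge localization approaches had not discovered. Experimental results confirm the roles of the positive and negative components identified by our framework, giving a complete view of interpreting generative models. Our code is available at https://github.com/mail-research/CAD-attribution4diffusion.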

Summary (original)

Diffusion models have shown remarkable abilities in generating realistic and high-quality images from text prompts. However, a trained model remains a black box; we know little about the role of its components in exhibiting a concept such as objects or styles. Recent works employ causal tracing to localize layers storing knowledge in generative models without showing how those layers contribute to the target concept. In this work, we approach the model interpretability problem from a more general perspective and pose a question: "How do model components work jointly to demonstrate knowledge?" We adapt component attribution to decompose diffusion models, unveiling how a component contributes to a concept. Our framework allows effective model editing: in particular, we can erase a concept from diffusion models by removing positive components while retaining knowledge of other concepts. Surprisingly, we also show there exist components that contribute negatively to a concept, which has not been discovered in the knowledge localization approach. Experimental results confirm the role of positive and negative components pinpointed by our framework, depicting a complete view of interpreting generative models. Our code is available at https://github.com/mail-research/CAD-attribution4diffusion
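
The abstract does not spell out how attribution is computed, so the following is only a toy illustration of the general idea of positive and negative components, not the authors' method. It assumes a hypothetical scalar "concept score" that depends on which components are active, and estimates each component's contribution by leave-one-out ablation. The names `attribute_components`, `erase_concept`, `toy_concept_score`, and the weights are all invented for this sketch.

```python
from typing import Callable, List

Mask = List[float]  # 1.0 = component active, 0.0 = component ablated


def attribute_components(score_fn: Callable[[Mask], float], n: int) -> List[float]:
    """Leave-one-out attribution: how much the score drops when one component is ablated.

    A positive value means the component supports the concept;
    a negative value means the component suppresses it.
    """
    full = [1.0] * n
    base = score_fn(full)
    attributions = []
    for i in range(n):
        mask = list(full)
        mask[i] = 0.0  # ablate component i only
        attributions.append(base - score_fn(mask))
    return attributions


def erase_concept(attributions: List[float], k: int) -> Mask:
    """Build a mask that ablates up to k components with the largest positive attribution."""
    ranked = sorted(range(len(attributions)), key=lambda i: attributions[i], reverse=True)
    mask = [1.0] * len(attributions)
    for i in ranked[:k]:
        if attributions[i] > 0:  # never remove neutral or negative components
            mask[i] = 0.0
    return mask


# Hypothetical toy "model": the concept score is a weighted sum of active components.
TOY_WEIGHTS = [2.0, -1.0, 0.5, 0.0]


def toy_concept_score(mask: Mask) -> float:
    return sum(w * m for w, m in zip(TOY_WEIGHTS, mask))


attributions = attribute_components(toy_concept_score, len(TOY_WEIGHTS))
erase_mask = erase_concept(attributions, k=2)
```

In this toy setting the second component receives a negative attribution, which mirrors the abstract's observation that some components work against a concept. A real diffusion model is not additive in its components, so leave-one-out scores would only be a rough approximation there.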

arXiv information

Authors	Quang H. Nguyen, Hoang Phan, Khoa D. Doan
Published	2024-12-03 16:34:49+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
