Do Concept Replacement Techniques Really Erase Unacceptable Concepts?

Summary

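Generative models, particularly diffusion-based text-to-image (T2I) models, have demonstrated astounding success.
However, aligning them to avoid generating content with unacceptable concepts (e.g., offensive or copyrighted content, or celebrity likenesses) remains a significant challenge.
Concept replacement techniques (CRTs) aim to address this challenge, often by trying to "erase" unacceptable concepts from models.
Recently, model providers have started offering image editing services that accept an image and a text prompt as input and produce an image altered as specified by the prompt.
These are known as image-to-image (I2I) models.
In this paper, we first use an I2I model to empirically demonstrate that today's state-of-the-art CRTs do not in fact erase unacceptable concepts.
Existing CRTs are therefore likely to be ineffective in emerging I2I scenarios, despite their proven ability to remove unwanted concepts in T2I pipelines, which highlights the need to understand this discrepancy between the T2I and I2I settings.
Next, we argue that a good CRT, while replacing unacceptable concepts, should preserve the other concepts specified in the inputs to generative models.
We call this fidelity.
Prior work on CRTs has neglected fidelity in the case of unacceptable concepts.
Finally, we propose the use of targeted image-editing techniques to achieve both effectiveness and fidelity.
We present one such technique, AntiMirror, and demonstrate its viability.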

Summary (original)

Generative models, particularly diffusion-based text-to-image (T2I) models, have demonstrated astounding success. However, aligning them to avoid generating content with unacceptable concepts (e.g., offensive or copyrighted content, or celebrity likenesses) remains a significant challenge. Concept replacement techniques (CRTs) aim to address this challenge, often by trying to ‘erase’ unacceptable concepts from models. Recently, model providers have started offering image editing services which accept an image and a text prompt as input, to produce an image altered as specified by the prompt. These are known as image-to-image (I2I) models. In this paper, we first use an I2I model to empirically demonstrate that today’s state-of-the-art CRTs do not in fact erase unacceptable concepts. Existing CRTs are thus likely to be ineffective in emerging I2I scenarios, despite their proven ability to remove unwanted concepts in T2I pipelines, highlighting the need to understand this discrepancy between T2I and I2I settings. Next, we argue that a good CRT, while replacing unacceptable concepts, should preserve other concepts specified in the inputs to generative models. We call this fidelity. Prior work on CRTs has neglected fidelity in the case of unacceptable concepts. Finally, we propose the use of targeted image-editing techniques to achieve both effectiveness and fidelity. We present such a technique, AntiMirror, and demonstrate its viability.

arXiv information

Authors	Anudeep Das, Gurjot Singh, Prach Chantasantitam, N. Asokan
Published	2025-06-10 17:02:36+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
