Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models

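Summary

Large-scale vision-and-language models such as CLIP are typically trained on web-scale data, which can introduce inappropriate content and lead to unsafe and biased behavior.
This hampers their use in sensitive contexts where trustworthiness matters and could raise significant concerns about their adoption.
Our research introduces a new approach that makes vision-and-language models safer by reducing their sensitivity to NSFW (not safe for work) inputs.
In particular, our methodology seeks to sever "toxic" linguistic and visual concepts, unlearning the link between unsafe linguistic or visual items and unsafe regions of the embedding space.
We show how this can be done by fine-tuning a CLIP model on synthetic data obtained from a large language model trained to convert between safe and unsafe sentences, and from a text-to-image generator.
We conduct extensive experiments on the resulting embedding space for cross-modal retrieval, text-to-image generation, and image-to-text generation, and show that our model can be used effectively with pre-trained generative models.
The source code and trained models are available at https://github.com/aimagelab/safe-clip.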

Summary (original)

Large-scale vision-and-language models, such as CLIP, are typically trained on web-scale data, which can introduce inappropriate content and lead to the development of unsafe and biased behavior. This, in turn, hampers their applicability in sensitive and trustworthy contexts and could raise significant concerns in their adoption. Our research introduces a novel approach to enhancing the safety of vision-and-language models by diminishing their sensitivity to NSFW (not safe for work) inputs. In particular, our methodology seeks to sever ‘toxic’ linguistic and visual concepts, unlearning the linkage between unsafe linguistic or visual items and unsafe regions of the embedding space. We show how this can be done by fine-tuning a CLIP model on synthetic data obtained from a large language model trained to convert between safe and unsafe sentences, and a text-to-image generator. We conduct extensive experiments on the resulting embedding space for cross-modal retrieval, text-to-image, and image-to-text generation, where we show that our model can be remarkably employed with pre-trained generative models. Our source code and trained models are available at: https://github.com/aimagelab/safe-clip.
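
The unlearning idea described above can be illustrated with a toy objective. The sketch below is an assumption made for illustration, not the loss used in the paper: it supposes that each unsafe sentence has a safe counterpart, that a frozen copy of the original encoder supplies target embeddings, and that the fine-tuned encoder is penalized when an unsafe input does not land near the safe target. The function names `cosine_similarity` and `redirection_loss` are hypothetical, and plain Python lists stand in for real CLIP embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def redirection_loss(tuned_unsafe, tuned_safe, frozen_safe):
    """Hypothetical two-term objective (an assumption, not the paper's loss).

    tuned_unsafe: fine-tuned encoder's embedding of the unsafe sentence
    tuned_safe:   fine-tuned encoder's embedding of the safe counterpart
    frozen_safe:  frozen original encoder's embedding of the safe counterpart
    """
    # Redirection term: pull the unsafe input toward the safe target.
    redirect = 1.0 - cosine_similarity(tuned_unsafe, frozen_safe)
    # Preservation term: keep safe inputs where the original encoder put them.
    preserve = 1.0 - cosine_similarity(tuned_safe, frozen_safe)
    return redirect + preserve


# Toy 2-D embeddings: the unsafe input still points away from the safe target.
frozen_safe = [1.0, 0.0]
loss_before = redirection_loss([0.0, 1.0], [1.0, 0.0], frozen_safe)
# After an idealized fine-tuning step the unsafe input coincides with the target.
loss_after = redirection_loss([1.0, 0.0], [1.0, 0.0], frozen_safe)
print(loss_before, loss_after)
```

In this toy setting the loss falls as the unsafe embedding moves toward the safe target while the safe embedding stays in place. A real setup would compute the embeddings with a CLIP text or image encoder and minimize such a loss by gradient descent.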

arXiv information

Authors	Samuele Poppi, Tobia Poppi, Federico Cocchi, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
Published	2024-04-12 09:37:37+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
