LLavaGuard: VLM-based Safeguards for Vision Dataset Curation and Safety Assessment

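Summary

We introduce LlavaGuard, a family of VLM-based safeguard models that provides a versatile framework for assessing the safety compliance of visual content.
Specifically, we designed LlavaGuard for dataset annotation and for safeguarding generative models.
To this end, we collected and annotated a high-quality visual dataset that incorporates a broad safety taxonomy, and we used it to tune VLMs on context-aware safety risks.
As a key innovation, LlavaGuard's new responses contain comprehensive information, including a safety rating, the violated safety categories, and a detailed rationale.
Furthermore, the customizable taxonomy categories we introduce allow LlavaGuard to be aligned to a variety of context-specific scenarios.
Our experiments highlight LlavaGuard's capabilities in complex, real-world applications.
We provide checkpoints ranging from 7B to 34B parameters that demonstrate state-of-the-art performance, with even the smallest model outperforming baselines such as GPT-4.
We publicly release our dataset and model weights and invite further research to address the diverse needs of communities and contexts.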

Summary (original)

We introduce LlavaGuard, a family of VLM-based safeguard models, offering a versatile framework for evaluating the safety compliance of visual content. Specifically, we designed LlavaGuard for dataset annotation and generative model safeguarding. To this end, we collected and annotated a high-quality visual dataset incorporating a broad safety taxonomy, which we use to tune VLMs on context-aware safety risks. As a key innovation, LlavaGuard’s new responses contain comprehensive information, including a safety rating, the violated safety categories, and an in-depth rationale. Further, our introduced customizable taxonomy categories enable the context-specific alignment of LlavaGuard to various scenarios. Our experiments highlight the capabilities of LlavaGuard in complex and real-world applications. We provide checkpoints ranging from 7B to 34B parameters demonstrating state-of-the-art performance, with even the smallest models outperforming baselines like GPT-4. We make our dataset and model weights publicly available and invite further research to address the diverse needs of communities and contexts.
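The abstract says each LlavaGuard response carries a safety rating, the violated category, and a rationale, and that the model is meant for dataset annotation under customizable, context-specific taxonomies. The Python sketch below shows one way such structured output could be consumed when curating an image dataset. It is a minimal illustration, not LlavaGuard's interface. It assumes a JSON response with hypothetical keys `rating`, `category` and `rationale`. The category labels, the sample data, and the helpers `parse_assessment` and `keep_sample` are all made up for illustration. The sketch also applies the customizable taxonomy as a post-hoc filter. That is a simplification and is not necessarily how LlavaGuard itself conditions on a custom taxonomy.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


# Hypothetical record for one model response. The fields mirror the three
# pieces of information named in the abstract (rating, violated category,
# rationale); the exact JSON schema is an assumption.
@dataclass
class SafetyAssessment:
    rating: str     # assumed values: "Safe" or "Unsafe"
    category: str   # violated taxonomy category (hypothetical labels below)
    rationale: str  # free-text justification for the rating


def parse_assessment(raw: str) -> SafetyAssessment:
    """Parse a JSON-formatted response into a SafetyAssessment."""
    data = json.loads(raw)
    return SafetyAssessment(
        rating=data["rating"],
        category=data["category"],
        rationale=data["rationale"],
    )


def keep_sample(assessment: SafetyAssessment, blocked: set[str]) -> bool:
    """Return True if an image should stay in the curated dataset.

    `blocked` stands in for a context-specific taxonomy: an image is dropped
    only if it is rated "Unsafe" under a category this context forbids.
    """
    if assessment.rating != "Unsafe":
        return True
    return assessment.category not in blocked


# Hypothetical responses for three images (made-up data for illustration).
responses = {
    "img_001.jpg": '{"rating": "Safe", "category": "none", "rationale": "A landscape photo."}',
    "img_002.jpg": '{"rating": "Unsafe", "category": "violence", "rationale": "Depicts an assault."}',
    "img_003.jpg": '{"rating": "Unsafe", "category": "weapons", "rationale": "Shows a hunting rifle."}',
}

# In this hypothetical context, weapons are tolerated but violence is not.
blocked_categories = {"violence"}

curated = [
    name
    for name, raw in responses.items()
    if keep_sample(parse_assessment(raw), blocked_categories)
]
print(curated)  # ['img_001.jpg', 'img_003.jpg']
```

Keeping the rationale in the parsed record lets a curator audit why an image was dropped instead of relying on a bare label. Changing `blocked_categories` changes which images survive without re-running the model, which is the point of separating the assessment from the context-specific policy in this sketch.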

arXiv information

Authors	Lukas Helff, Felix Friedrich, Manuel Brack, Kristian Kersting, Patrick Schramowski
Published	2024-06-07 17:44:32+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
