ELITE: Enhanced Language-Image Toxicity Evaluation for Safety

Summary

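Current vision-language models (VLMs) remain vulnerable to malicious prompts that induce harmful outputs.
Existing safety benchmarks for VLMs rely mainly on automated evaluation methods, but these methods struggle to detect implicitly harmful content and can produce inaccurate evaluations.
As a result, we found that existing benchmarks contain data with low levels of harmfulness, ambiguous data, and limited diversity in image-text pair combinations.
To address these issues, we propose the ELITE benchmark, a high-quality safety evaluation benchmark for VLMs, underpinned by our enhanced evaluation method, the ELITE evaluator.
The ELITE evaluator explicitly incorporates a toxicity score to accurately assess harmfulness in multimodal contexts, where VLMs often provide specific, convincing, but harmless descriptions of images.
Using the ELITE evaluator, we filter out ambiguous and low-quality image-text pairs from existing benchmarks and generate diverse combinations of safe and unsafe image-text pairs.
Our experiments show that the ELITE evaluator aligns better with human evaluations than prior automated methods, and that the ELITE benchmark offers improved benchmark quality and diversity.
By introducing ELITE, we pave the way for safer, more robust VLMs and contribute essential tools for evaluating and mitigating safety risks in real-world applications.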

Summary (original)

Current Vision Language Models (VLMs) remain vulnerable to malicious prompts that induce harmful outputs. Existing safety benchmarks for VLMs primarily rely on automated evaluation methods, but these methods struggle to detect implicit harmful content or produce inaccurate evaluations. Therefore, we found that existing benchmarks have low levels of harmfulness, ambiguous data, and limited diversity in image-text pair combinations. To address these issues, we propose the ELITE benchmark, a high-quality safety evaluation benchmark for VLMs, underpinned by our enhanced evaluation method, the ELITE evaluator. The ELITE evaluator explicitly incorporates a toxicity score to accurately assess harmfulness in multimodal contexts, where VLMs often provide specific, convincing, but unharmful descriptions of images. We filter out ambiguous and low-quality image-text pairs from existing benchmarks using the ELITE evaluator and generate diverse combinations of safe and unsafe image-text pairs. Our experiments demonstrate that the ELITE evaluator achieves superior alignment with human evaluations compared to prior automated methods, and the ELITE benchmark offers enhanced benchmark quality and diversity. By introducing ELITE, we pave the way for safer, more robust VLMs, contributing essential tools for evaluating and mitigating safety risks in real-world applications.

arXiv information

Authors	Wonjun Lee, Doehyeon Lee, Eugene Choi, Sangyoon Yu, Ashkan Yousefpour, Haon Park, Bumsub Ham, Suhyun Kim
Published	2025-02-10 04:39:28+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
