FaceXBench: Evaluating Multimodal LLMs on Face Understanding

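Abstract

Multimodal large language models (MLLMs) show impressive problem-solving abilities across a wide range of tasks and domains.
However, their ability to understand faces has not been studied systematically.
To address this gap, we introduce FaceXBench, a comprehensive benchmark designed to evaluate MLLMs on complex face understanding tasks.
FaceXBench contains 5,000 multimodal multiple-choice questions derived from 25 public datasets and a newly created dataset, FaceXAPI.
These questions cover 14 tasks across 6 broad categories and assess the face understanding abilities of MLLMs in bias and fairness, face authentication, recognition, analysis, localization, and tool retrieval.
Using FaceXBench, we conduct an extensive evaluation of 26 open-source MLLMs alongside 2 proprietary models, revealing the challenges specific to complex face understanding tasks.
We analyze the models across three evaluation settings: zero-shot, in-context task description, and chain-of-thought prompting.
Our detailed analysis shows that current MLLMs, including advanced models such as GPT-4o and GeminiPro 1.5, have significant room for improvement.
We believe FaceXBench will be an important resource for developing MLLMs capable of sophisticated face understanding.
Code: https://github.com/Kartik-3004/facexbench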
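
The three evaluation settings and the multiple-choice format can be illustrated with a minimal Python sketch. This is not the FaceXBench code: the prompt wording, the function names `build_prompt` and `accuracy_by_category`, and the record layout are all assumptions made for illustration. The linked repository is the reference for the actual prompts and scoring.

```python
from collections import defaultdict

# Hypothetical labels for the three settings named in the abstract.
SETTINGS = ("zero_shot", "task_description", "chain_of_thought")


def build_prompt(question, options, setting, task_description=""):
    """Format one multiple-choice question for a given evaluation setting.

    The wording of each template is an assumption, not the benchmark's own.
    """
    if setting not in SETTINGS:
        raise ValueError(f"unknown setting: {setting}")
    letters = "ABCDEFGH"
    lines = []
    # In-context task description: prepend a short description of the task.
    if setting == "task_description" and task_description:
        lines.append(f"Task: {task_description}")
    lines.append(question)
    lines += [f"{letters[i]}. {opt}" for i, opt in enumerate(options)]
    # Chain-of-thought: ask for reasoning before the final letter.
    if setting == "chain_of_thought":
        lines.append("Think step by step, then give the letter of your answer.")
    else:
        lines.append("Answer with the letter of the correct option.")
    return "\n".join(lines)


def accuracy_by_category(records):
    """Compute accuracy per category.

    records: iterable of (category, predicted_letter, correct_letter).
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for category, predicted, correct in records:
        totals[category] += 1
        hits[category] += int(predicted.strip().upper() == correct.strip().upper())
    return {c: hits[c] / totals[c] for c in totals}
```

Reporting accuracy per category rather than as a single number matches the abstract's framing of 6 broad categories, though the exact aggregation used by the authors is not stated here.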

Abstract (original)

Multimodal Large Language Models (MLLMs) demonstrate impressive problem-solving abilities across a wide range of tasks and domains. However, their capacity for face understanding has not been systematically studied. To address this gap, we introduce FaceXBench, a comprehensive benchmark designed to evaluate MLLMs on complex face understanding tasks. FaceXBench includes 5,000 multimodal multiple-choice questions derived from 25 public datasets and a newly created dataset, FaceXAPI. These questions cover 14 tasks across 6 broad categories, assessing MLLMs’ face understanding abilities in bias and fairness, face authentication, recognition, analysis, localization, and tool retrieval. Using FaceXBench, we conduct an extensive evaluation of 26 open-source MLLMs alongside 2 proprietary models, revealing the unique challenges in complex face understanding tasks. We analyze the models across three evaluation settings: zero-shot, in-context task description, and chain-of-thought prompting. Our detailed analysis reveals that current MLLMs, including advanced models like GPT-4o and GeminiPro 1.5, show significant room for improvement. We believe FaceXBench will be a crucial resource for developing MLLMs equipped to perform sophisticated face understanding. Code: https://github.com/Kartik-3004/facexbench

arXiv information

Authors	Kartik Narayan, Vibashan VS, Vishal M. Patel
Published	2025-01-17 18:59:55+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
