FEALLM: Advancing Facial Emotion Analysis in Multimodal Large Language Models with Emotional Synergy and Reasoning

Abstract

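Facial Emotion Analysis (FEA) plays a crucial role in visual affective computing, aiming to infer a person's emotional state from facial data.
Scientifically, facial expressions (FEs) result from the coordinated movement of facial muscles and can be decomposed into specific action units (AUs) that provide detailed emotional insights.
However, traditional methods often struggle with limited interpretability and with constrained generalization and reasoning abilities.
Recently, Multimodal Large Language Models (MLLMs) have shown exceptional performance on a variety of visual tasks, but they still face significant challenges in FEA because specialized datasets are lacking and because they cannot capture the intricate relationships between FEs and AUs.
To address these issues, we introduce a novel FEA Instruction Dataset that provides accurate, aligned FE and AU descriptions and establishes causal reasoning relationships between them, and we construct a new benchmark, FEABench.
Moreover, we propose FEALLM, a novel MLLM architecture designed to capture more detailed facial information, enhancing its capability in FEA tasks.
Our model demonstrates strong performance on FEABench and impressive generalization capability in zero-shot evaluation on various datasets, including RAF-DB, AffectNet, BP4D, and DISFA, showcasing its robustness and effectiveness in FEA tasks.
The dataset and code will be available at https://github.com/953206211/FEALLM.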
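
As a rough illustration of the idea that an expression can be read off a set of active action units, the sketch below ranks a few expressions by how closely the active AUs match a prototype. It is not the FEALLM method or its dataset format. The AU prototypes are commonly cited FACS-style combinations included here as an assumption, and the names `PROTOTYPE_AUS` and `rank_expressions` are hypothetical.

```python
# Illustrative only: commonly cited FACS-style AU prototypes (an assumption),
# not the mapping used by FEALLM or its instruction dataset.
PROTOTYPE_AUS = {
    "happiness": {6, 12},       # cheek raiser, lip corner puller
    "sadness": {1, 4, 15},      # inner brow raiser, brow lowerer, lip corner depressor
    "surprise": {1, 2, 5, 26},  # brow raisers, upper lid raiser, jaw drop
}


def rank_expressions(active_aus):
    """Rank expressions by Jaccard overlap between active AUs and each prototype."""
    active = set(active_aus)
    scores = {}
    for name, prototype in PROTOTYPE_AUS.items():
        union = active | prototype
        # Jaccard similarity: |intersection| / |union|; 0.0 if both sets are empty.
        scores[name] = len(active & prototype) / len(union) if union else 0.0
    # Highest score first; ties keep the dictionary's insertion order.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_expressions([6, 12]):
        print(f"{name}: {score:.2f}")
```

A fixed lookup like this offers no reasoning about why a given AU pattern signals an emotion, which is the gap the abstract says the instruction dataset and model are meant to address.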

Abstract (original)

Facial Emotion Analysis (FEA) plays a crucial role in visual affective computing, aiming to infer a person’s emotional state based on facial data. Scientifically, facial expressions (FEs) result from the coordinated movement of facial muscles, which can be decomposed into specific action units (AUs) that provide detailed emotional insights. However, traditional methods often struggle with limited interpretability, constrained generalization and reasoning abilities. Recently, Multimodal Large Language Models (MLLMs) have shown exceptional performance in various visual tasks, while they still face significant challenges in FEA due to the lack of specialized datasets and their inability to capture the intricate relationships between FEs and AUs. To address these issues, we introduce a novel FEA Instruction Dataset that provides accurate and aligned FE and AU descriptions and establishes causal reasoning relationships between them, followed by constructing a new benchmark, FEABench. Moreover, we propose FEALLM, a novel MLLM architecture designed to capture more detailed facial information, enhancing its capability in FEA tasks. Our model demonstrates strong performance on FEABench and impressive generalization capability through zero-shot evaluation on various datasets, including RAF-DB, AffectNet, BP4D, and DISFA, showcasing its robustness and effectiveness in FEA tasks. The dataset and code will be available at https://github.com/953206211/FEALLM.

arXiv information

Authors	Zhuozhao Hu, Kaishen Yuan, Xin Liu, Zitong Yu, Yuan Zong, Jingang Shi, Huanjing Yue, Jingyu Yang
Published	2025-05-19 17:52:15+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
