Multi-modal Facial Affective Analysis based on Masked Autoencoder

Summary

[Title] Multi-modal Facial Affective Analysis Based on Masked Autoencoder

[Summary]

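– Affective analysis based on human facial expressions and other behaviors is important for understanding human psychology.
– The ABAW competition provides Aff-wild2, a high-quality, large-scale dataset for recognizing commonly used emotion representations: Action Units (AU), basic expression categories (EXPR), and Valence-Arousal (VA).
– This paper presents the authors' submission to CVPR 2023: ABAW5. The approach has the following key components.
– First, it uses visual information from a Masked Autoencoder (MAE) model pre-trained in a self-supervised manner on a large-scale face image dataset.
– Next, the MAE encoder is fine-tuned on image frames from Aff-wild2 for the AU, EXPR, and VA tasks, which amounts to static, uni-modal training.
– In addition, it exploits multi-modal and temporal information from the videos, using a transformer-based framework to fuse the multi-modal features.
– In the ABAW5 competition, the approach achieved an average F1 score of 55.49% on the AU track and 41.21% on the EXPR track, and an average CCC of 0.6372 on the VA track, ranking first in the EXPR and AU tracks and second in the VA track.
– Extensive quantitative experiments and ablation studies demonstrate the effectiveness of the proposed method.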
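
The two metrics quoted above, F1 and the Concordance Correlation Coefficient (CCC), can be sketched in a few lines of standard-library Python. This is only an illustration, not the competition's official scoring code. The function names are hypothetical, and it assumes the "average" F1 is an unweighted mean over classes (or over AUs). Returning 0.0 for undefined cases is likewise an arbitrary convention chosen here.

```python
from statistics import fmean


def concordance_ccc(pred, gold):
    """Concordance Correlation Coefficient of two equal-length sequences."""
    if len(pred) != len(gold) or len(pred) == 0:
        raise ValueError("pred and gold must be non-empty and the same length")
    mean_p, mean_g = fmean(pred), fmean(gold)
    # Population (1/N) variances and covariance.
    var_p = fmean([(p - mean_p) ** 2 for p in pred])
    var_g = fmean([(g - mean_g) ** 2 for g in gold])
    cov = fmean([(p - mean_p) * (g - mean_g) for p, g in zip(pred, gold)])
    denom = var_p + var_g + (mean_p - mean_g) ** 2
    # CCC is undefined when denom == 0; 0.0 is an arbitrary fallback (assumption).
    return 2 * cov / denom if denom else 0.0


def f1_binary(pred, gold):
    """F1 score for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for p, g in zip(pred, gold) if p == 1 and g == 1)
    fp = sum(1 for p, g in zip(pred, gold) if p == 1 and g == 0)
    fn = sum(1 for p, g in zip(pred, gold) if p == 0 and g == 1)
    denom = 2 * tp + fp + fn
    # No positives predicted or present: fall back to 0.0 (assumption).
    return 2 * tp / denom if denom else 0.0


def macro_f1(pred_labels, gold_labels, classes):
    """Unweighted mean of one-vs-rest F1 scores over the given classes."""
    return fmean([
        f1_binary([int(p == c) for p in pred_labels],
                  [int(g == c) for g in gold_labels])
        for c in classes
    ])
```

Unlike Pearson correlation, CCC penalizes a constant offset between prediction and annotation: `concordance_ccc([2, 3, 4], [1, 2, 3])` is 4/7 even though the two sequences are perfectly correlated.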

Summary (original)

Human affective behavior analysis focuses on analyzing human expressions or other behaviors to enhance the understanding of human psychology. The CVPR 2023 Competition on Affective Behavior Analysis in-the-wild (ABAW) is dedicated to providing high-quality and large-scale Aff-wild2 for the recognition of commonly used emotion representations, such as Action Units (AU), basic expression categories (EXPR), and Valence-Arousal (VA). The competition is committed to making significant strides in improving the accuracy and practicality of affective analysis research in real-world scenarios. In this paper, we introduce our submission to the CVPR 2023: ABAW5. Our approach involves several key components. First, we utilize the visual information from a Masked Autoencoder (MAE) model that has been pre-trained on a large-scale face image dataset in a self-supervised manner. Next, we finetune the MAE encoder on the image frames from the Aff-wild2 for AU, EXPR and VA tasks, which can be regarded as a static and uni-modal training. Additionally, we leverage the multi-modal and temporal information from the videos and implement a transformer-based framework to fuse the multi-modal features. Our approach achieves impressive results in the ABAW5 competition, with an average F1 score of 55.49% and 41.21% in the AU and EXPR tracks, respectively, and an average CCC of 0.6372 in the VA track. Our approach ranks first in the EXPR and AU tracks, and second in the VA track. Extensive quantitative experiments and ablation studies demonstrate the effectiveness of our proposed method.
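
The self-supervised pre-training mentioned in the abstract relies on hiding a random subset of image patches and reconstructing them from the rest. The sketch below shows only that masking step. The function name is hypothetical, and the 75% mask ratio, 224x224 input, and 16x16 patch size are assumptions borrowed from common MAE defaults; the abstract does not state the authors' settings.

```python
import random


def random_patch_mask(num_patches, mask_ratio=0.75, seed=None):
    """Split patch indices into (visible, masked) lists by random sampling.

    mask_ratio=0.75 is an assumed default, not a value from the abstract.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    ids = list(range(num_patches))
    rng.shuffle(ids)
    num_masked = int(num_patches * mask_ratio)
    masked = sorted(ids[:num_masked])    # hidden from the encoder
    visible = sorted(ids[num_masked:])   # the only patches the encoder sees
    return visible, masked


# Assumed geometry: a 224x224 image in 16x16 patches gives 14 * 14 = 196 patches.
visible, masked = random_patch_mask(196, mask_ratio=0.75, seed=0)
```

In this scheme the encoder processes only the `visible` patches, and a decoder is trained to reconstruct the pixels of the `masked` ones. After pre-training, the encoder alone is kept for fine-tuning on downstream tasks.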

arXiv information

Authors	Wei Zhang, Bowen Ma, Feng Qiu, Yu Ding
Published	2023-04-11 06:41:45+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, OpenAI
