A Lightweight Attention-based Deep Network via Multi-Scale Feature Fusion for Multi-View Facial Expression Recognition

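Summary

Convolutional neural networks (CNNs) and their variants have proven effective for facial expression recognition (FER).
However, they struggle with high computational complexity and multi-view head poses in real-world scenarios.
To tackle these issues, we introduce a lightweight attentional network incorporating multi-scale feature fusion (LANMSFF).
For the first challenge, we carefully design a lightweight network.
We address the second challenge by presenting two novel components: the mass attention (MassAtt) and point wise feature selection (PWFS) blocks.
The MassAtt block simultaneously generates channel and spatial attention maps that recalibrate feature maps, emphasizing important features while suppressing irrelevant ones.
In addition, the PWFS block employs a feature selection mechanism that discards less meaningful features before the fusion process.
This mechanism distinguishes it from previous methods, which fuse multi-scale features directly.
The proposed approach achieved results comparable to state-of-the-art methods in terms of parameter count and robustness to pose variation, with accuracy rates of 90.77% on KDEF, 70.44% on FER-2013, and 86.96% on FERPlus.
The code for LANMSFF is available at https://github.com/AE-1129/LANMSFF.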

Summary (original)

Convolutional neural networks (CNNs) and their variations have shown effectiveness in facial expression recognition (FER). However, they face challenges when dealing with high computational complexity and multi-view head poses in real-world scenarios. We introduce a lightweight attentional network incorporating multi-scale feature fusion (LANMSFF) to tackle these issues. For the first challenge, we carefully design a lightweight network. We address the second challenge by presenting two novel components, namely mass attention (MassAtt) and point wise feature selection (PWFS) blocks. The MassAtt block simultaneously generates channel and spatial attention maps to recalibrate feature maps by emphasizing important features while suppressing irrelevant ones. In addition, the PWFS block employs a feature selection mechanism that discards less meaningful features prior to the fusion process. This mechanism distinguishes it from previous methods that directly fuse multi-scale features. Our proposed approach achieved results comparable to state-of-the-art methods in terms of parameter count and robustness to pose variation, with accuracy rates of 90.77% on KDEF, 70.44% on FER-2013, and 86.96% on FERPlus datasets. The code for LANMSFF is available at https://github.com/AE-1129/LANMSFF.
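The two ideas in the abstract, recalibrating a feature map with channel and spatial attention computed together, and discarding weak features before fusing scales, can be illustrated with a minimal pure-Python sketch. This is not the authors' MassAtt or PWFS implementation: the function names `recalibrate` and `select_then_fuse`, the sigmoid-of-mean attention weights (no learned parameters), the mean-absolute-activation importance score, the per-channel selection, and the `keep` parameter are all assumptions made for illustration. The sketch also assumes every scale has already been resized to the same spatial size.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def recalibrate(fmap):
    """Scale a C x H x W feature map by channel and spatial attention weights."""
    channels = len(fmap)
    height, width = len(fmap[0]), len(fmap[0][0])
    # Channel attention (assumed form): sigmoid of each channel's global average.
    channel_att = [
        sigmoid(sum(sum(row) for row in ch) / (height * width)) for ch in fmap
    ]
    # Spatial attention (assumed form): sigmoid of the cross-channel mean per position.
    spatial_att = [
        [sigmoid(sum(fmap[c][i][j] for c in range(channels)) / channels)
         for j in range(width)]
        for i in range(height)
    ]
    # Both maps come from the same input and are applied in a single step.
    return [
        [[fmap[c][i][j] * channel_att[c] * spatial_att[i][j]
          for j in range(width)]
         for i in range(height)]
        for c in range(channels)
    ]


def select_then_fuse(scales, keep):
    """Keep the `keep` strongest channels of each scale, then concatenate them."""
    fused = []
    for fmap in scales:
        # Hypothetical importance score: mean absolute activation per channel.
        scores = [
            sum(abs(v) for row in ch for v in row) / (len(ch) * len(ch[0]))
            for ch in fmap
        ]
        top = sorted(range(len(fmap)), key=lambda c: scores[c], reverse=True)[:keep]
        # Preserve the original channel order among the surviving channels.
        fused.extend(fmap[c] for c in sorted(top))
    return fused
```

Calling `select_then_fuse([recalibrate(a), recalibrate(b)], keep=k)` on two same-sized feature maps `a` and `b` returns a list of `2 * k` channels, so the fused representation is smaller than a direct concatenation of all channels.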

arXiv information

Authors	Ali Ezati, Mohammadreza Dezyani, Rajib Rana, Roozbeh Rajabi, Ahmad Ayatollahi
Published	2025-02-10 17:57:42+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
