MOL: Joint Estimation of Micro-Expression, Optical Flow, and Landmark via Transformer-Graph-Style Convolution

Abstract

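Facial micro-expression recognition (MER) is challenging because micro-expression (ME) actions are transient and subtle.
Most existing methods rely on hand-crafted features, on key frames such as the onset, apex, and offset frames, or on deep networks that are limited by small-scale, low-diversity datasets.
This paper proposes an end-to-end, micro-action-aware deep learning framework that combines the advantages of transformers, graph convolution, and vanilla convolution.
Specifically, it proposes a novel F5C block, composed of fully-connected convolution and channel correspondence convolution, that extracts local-global features directly from a sequence of raw frames without prior knowledge of key frames.
The transformer-style fully-connected convolution extracts local features while maintaining global receptive fields, and the graph-style channel correspondence convolution models the correlations among feature patterns.
In addition, MER, optical flow estimation, and facial landmark detection are trained jointly by sharing the local-global features.
The latter two tasks help capture subtle facial action information for MER, which can alleviate the impact of insufficient training data.
Extensive experiments demonstrate that the framework (i) outperforms state-of-the-art MER methods on the CASME II, SAMM, and SMIC benchmarks, (ii) works well for optical flow estimation and facial landmark detection, and (iii) can capture subtle facial muscle actions in local regions associated with MEs.
The code is available at https://github.com/CYF-cuber/MOL.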
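
The abstract describes the graph-style channel correspondence convolution only as a way to model correlations among feature patterns. The sketch below illustrates that general idea under stated assumptions and is not the F5C block from the paper: each channel is treated as a graph node, cosine similarity between flattened channel maps serves as the edge score, and a row-wise softmax turns the scores into mixing weights. The function name channel_graph_mix and every design choice in it are hypothetical.

```python
import math


def channel_graph_mix(channels):
    """Mix channels by their pairwise similarity (illustrative only).

    channels: list of C flattened feature maps, each a list of floats
    of equal length. Returns (mixed, adjacency).
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        # Guard against all-zero channels (assumed epsilon).
        return dot / max(norm_a * norm_b, 1e-8)

    # Row-wise softmax over cosine similarities gives a row-stochastic
    # adjacency matrix with one row per channel (graph node).
    adjacency = []
    for a in channels:
        scores = [math.exp(cosine(a, b)) for b in channels]
        total = sum(scores)
        adjacency.append([s / total for s in scores])

    # Each output channel is a weighted sum of all input channels.
    length = len(channels[0])
    mixed = [
        [sum(w * ch[i] for w, ch in zip(row, channels)) for i in range(length)]
        for row in adjacency
    ]
    return mixed, adjacency


# Three toy channels: the first two are identical, the third is orthogonal.
channels = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
mixed, adjacency = channel_graph_mix(channels)
```

In this toy input the two identical channels receive equal weight from each other and a smaller weight from the orthogonal one, which is the kind of correlation-driven mixing the abstract alludes to. A learned version would replace the fixed cosine score with trainable parameters.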

Abstract (original)

Facial micro-expression recognition (MER) is a challenging problem, due to transient and subtle micro-expression (ME) actions. Most existing methods depend on hand-crafted features, key frames like onset, apex, and offset frames, or deep networks limited by small-scale and low-diversity datasets. In this paper, we propose an end-to-end micro-action-aware deep learning framework with advantages from transformer, graph convolution, and vanilla convolution. In particular, we propose a novel F5C block composed of fully-connected convolution and channel correspondence convolution to directly extract local-global features from a sequence of raw frames, without the prior knowledge of key frames. The transformer-style fully-connected convolution is proposed to extract local features while maintaining global receptive fields, and the graph-style channel correspondence convolution is introduced to model the correlations among feature patterns. Moreover, MER, optical flow estimation, and facial landmark detection are jointly trained by sharing the local-global features. The two latter tasks contribute to capturing facial subtle action information for MER, which can alleviate the impact of insufficient training data. Extensive experiments demonstrate that our framework (i) outperforms the state-of-the-art MER methods on CASME II, SAMM, and SMIC benchmarks, (ii) works well for optical flow estimation and facial landmark detection, and (iii) can capture facial subtle muscle actions in local regions associated with MEs. The code is available at https://github.com/CYF-cuber/MOL.
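
The joint training of MER, optical flow estimation, and facial landmark detection can be pictured as one objective that sums a loss per task over shared features. The sketch below is a minimal illustration under assumptions: the abstract does not state which losses or weights are used, so the cross-entropy, endpoint-error, and squared-error terms, the weights w_flow and w_lmk, and all function names are hypothetical.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample (assumed MER loss)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def endpoint_error(pred_flow, true_flow):
    """Mean Euclidean distance between (u, v) flow vectors (assumed flow loss)."""
    dists = [math.hypot(pu - tu, pv - tv)
             for (pu, pv), (tu, tv) in zip(pred_flow, true_flow)]
    return sum(dists) / len(dists)


def landmark_mse(pred_pts, true_pts):
    """Mean squared distance between (x, y) landmarks (assumed landmark loss)."""
    sq = [(px - tx) ** 2 + (py - ty) ** 2
          for (px, py), (tx, ty) in zip(pred_pts, true_pts)]
    return sum(sq) / len(sq)


def joint_loss(l_mer, l_flow, l_lmk, w_flow=1.0, w_lmk=1.0):
    """Weighted sum of the three task losses; the weights are assumptions."""
    return l_mer + w_flow * l_flow + w_lmk * l_lmk


# Toy values: 3 expression classes, 2 flow vectors, 2 landmarks.
l_mer = cross_entropy([2.0, 0.5, -1.0], label=0)
l_flow = endpoint_error([(0.1, 0.0), (0.0, 0.2)], [(0.0, 0.0), (0.0, 0.0)])
l_lmk = landmark_mse([(10.0, 12.0), (30.0, 31.0)], [(10.0, 11.0), (29.0, 31.0)])
total = joint_loss(l_mer, l_flow, l_lmk, w_flow=0.5, w_lmk=0.1)
```

Because all three terms would be backpropagated through the same shared features in a real network, the two auxiliary tasks act as extra supervision for MER, which is how the abstract motivates them when training data is scarce.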

arXiv information

Authors	Zhiwen Shao, Yifan Cheng, Feiran Li, Yong Zhou, Xuequan Lu, Yuan Xie, Lizhuang Ma
Published	2025-06-17 13:35:06+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
