Weakly-Supervised Text-driven Contrastive Learning for Facial Behavior Understanding

Abstract

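Contrastive learning has shown promise for learning robust representations from unlabeled data.
However, constructing effective positive and negative pairs for contrastive learning on facial behavior datasets remains difficult.
Such pairs inevitably encode subject-ID information, and because facial behavior datasets contain only a limited number of subjects, randomly constructed pairs can push similar facial images apart.
To address this problem, we propose using activity descriptions: coarse-grained information that some datasets provide, which conveys high-level semantic information about the image sequences but has often been neglected in previous studies.
More specifically, we introduce CLEF, a two-stage Contrastive Learning with Text-Embedded framework for Facial behavior understanding.
The first stage is a weakly-supervised contrastive learning method that learns representations from positive and negative pairs constructed using the coarse-grained activity information.
The second stage trains the recognition of facial expressions or facial action units (AUs) by maximizing the similarity between each image and its corresponding text label name.
The proposed CLEF achieves state-of-the-art performance on three in-the-lab datasets for AU recognition and three in-the-wild datasets for facial expression recognition.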
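
To make the first stage more concrete, here is a minimal sketch of a weakly-supervised contrastive loss in which two samples count as a positive pair when they share the same coarse activity label. It is written as a generic supervised-contrastive (SupCon-style) loss and is not taken from the paper; the function names, the temperature value, and the toy embeddings are all assumptions made for illustration.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length (assumes v is not the zero vector).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def weak_supcon_loss(embeddings, activity_ids, temperature=0.1):
    """SupCon-style loss: samples with the same activity id are positives.

    Hypothetical helper for illustration; not the authors' exact formulation.
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n)
                     if j != i and activity_ids[j] == activity_ids[i]]
        if not positives:
            continue  # this anchor has no positive in the batch; skip it
        # Scaled similarities between the anchor and every other sample.
        logits = {j: dot(z[i], z[j]) / temperature for j in range(n) if j != i}
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        # Average negative log-probability over this anchor's positives.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0

# Toy batch: the first two vectors are close to each other, as are the last two.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_when_labels_match_geometry = weak_supcon_loss(toy_embeddings, [0, 0, 1, 1])
loss_when_labels_cut_across = weak_supcon_loss(toy_embeddings, [0, 1, 0, 1])
```

In this toy batch the loss is lower when the activity labels agree with the geometry of the embeddings, which is the behavior a training loop would push the encoder toward. Because positives are defined by activity rather than by random sampling, two similar frames from the same activity are not treated as negatives.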

Abstract (original)

Contrastive learning has shown promising potential for learning robust representations by utilizing unlabeled data. However, constructing effective positive-negative pairs for contrastive learning on facial behavior datasets remains challenging. This is because such pairs inevitably encode the subject-ID information, and the randomly constructed pairs may push similar facial images away due to the limited number of subjects in facial behavior datasets. To address this issue, we propose to utilize activity descriptions, coarse-grained information provided in some datasets, which can provide high-level semantic information about the image sequences but is often neglected in previous studies. More specifically, we introduce a two-stage Contrastive Learning with Text-Embedded framework for Facial behavior understanding (CLEF). The first stage is a weakly-supervised contrastive learning method that learns representations from positive-negative pairs constructed using coarse-grained activity information. The second stage aims to train the recognition of facial expressions or facial action units by maximizing the similarity between images and the corresponding text label names. The proposed CLEF achieves state-of-the-art performance on three in-the-lab datasets for AU recognition and three in-the-wild datasets for facial expression recognition.
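
The second stage, which maximizes the similarity between an image and the text of its label, can be illustrated with a small sketch. It assumes a single-label setting (one expression per image), cosine similarity, and a softmax cross-entropy over the label names; the paper may differ on each of these points, and AU recognition in particular is typically multi-label. The function names, the temperature, and the toy vectors (stand-ins for the outputs of an image encoder and a text encoder) are hypothetical.

```python
import math

def cosine(a, b):
    # Cosine similarity between two non-zero vectors.
    num = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return num / (na * nb)

def label_log_probs(image_emb, text_embs, temperature=0.07):
    # Log-softmax over the scaled image-to-label-text similarities.
    logits = [cosine(image_emb, t) / temperature for t in text_embs]
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return [l - log_z for l in logits]

def label_probabilities(image_emb, text_embs, temperature=0.07):
    return [math.exp(lp) for lp in label_log_probs(image_emb, text_embs, temperature)]

def text_similarity_loss(image_emb, text_embs, target_index, temperature=0.07):
    # Minimizing this raises the similarity to the correct label's text
    # relative to the other label names.
    return -label_log_probs(image_emb, text_embs, temperature)[target_index]

# Toy stand-ins for encoder outputs; real embeddings would come from trained models.
label_names = ["happy", "sad", "surprise"]
text_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_embedding = [0.9, 0.1, 0.2]

probs = label_probabilities(image_embedding, text_embeddings)
predicted = label_names[probs.index(max(probs))]
```

At inference time the predicted label is simply the label name whose text embedding is most similar to the image embedding, so no separate classification head is needed in this sketch.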

arXiv information

Authors	Xiang Zhang, Taoyue Wang, Xiaotian Li, Huiyuan Yang, Lijun Yin
Published	2023-08-25 14:31:35+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
