RAZE: Region Guided Self-Supervised Gaze Representation Learning

Abstract

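Automatic gaze estimation is an important problem in vision-based assistive technology, with use cases in emerging areas such as augmented reality, virtual reality, and human-computer interaction. Over the past few years, interest in unsupervised and self-supervised learning paradigms has grown as a way to overcome the need for large-scale annotated data. This paper proposes RAZE, a region-guided self-supervised gaze representation learning framework that leverages non-annotated facial image data. RAZE learns gaze representations through an auxiliary supervision task, pseudo-gaze zone classification, whose objective is to classify the visual field into gaze zones (left, right, and center) using the relative position of the pupil centers. To this end, the authors automatically assign pseudo gaze zone labels to 154K web-crawled images and learn feature representations with the "Ize-Net" framework. Ize-Net is a capsule-layer-based CNN architecture that can efficiently capture rich eye representations. The discriminative behaviour of the learned features is evaluated on four benchmark datasets: CAVE, TabletGaze, MPII, and RT-GENE. In addition, the generalizability of the proposed network is evaluated on two other downstream tasks (driver gaze estimation and visual attention estimation), demonstrating the effectiveness of the learned gaze representation.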

Abstract (original)

Automatic eye gaze estimation is an important problem in vision-based assistive technology, with use cases in different emerging topics such as augmented reality, virtual reality and human-computer interaction. Over the past few years, there has been an increasing interest in unsupervised and self-supervised learning paradigms, as they overcome the requirement of large-scale annotated data. In this paper, we propose RAZE, a Region guided self-supervised gAZE representation learning framework which leverages non-annotated facial image data. RAZE learns gaze representation via auxiliary supervision, i.e. pseudo-gaze zone classification, where the objective is to classify the visual field into different gaze zones (i.e. left, right and center) by leveraging the relative position of pupil-centers. Thus, we automatically annotate pseudo gaze zone labels of 154K web-crawled images and learn feature representations via the 'Ize-Net' framework. 'Ize-Net' is a capsule layer based CNN architecture which can efficiently capture rich eye representation. The discriminative behaviour of the feature representation is evaluated on four benchmark datasets: CAVE, TabletGaze, MPII and RT-GENE. Additionally, we evaluate the generalizability of the proposed network on two other downstream tasks (i.e. driver gaze estimation and visual attention estimation), which demonstrates the effectiveness of the learnt eye gaze representation.
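
The pseudo-labelling idea in the abstract (assigning a left, right, or center zone from the relative position of the pupil centers) can be illustrated with a minimal sketch. The abstract does not give the actual labelling rule, so everything here is an assumption for illustration: the function names, the use of horizontal eye-corner coordinates, the averaging over both eyes, and the `margin` threshold are hypothetical and are not taken from the paper. "Left" and "right" refer to image coordinates.

```python
from typing import Iterable, Tuple

# One eye as (left_corner_x, right_corner_x, pupil_x) in image coordinates.
# This layout is a hypothetical convention chosen for the sketch.
Eye = Tuple[float, float, float]


def relative_pupil_position(left_corner_x: float,
                            right_corner_x: float,
                            pupil_x: float) -> float:
    """Return the pupil position as a fraction of the eye width (0 = left corner)."""
    width = right_corner_x - left_corner_x
    if width <= 0:
        raise ValueError("right corner must lie to the right of the left corner")
    return (pupil_x - left_corner_x) / width


def pseudo_gaze_zone(eyes: Iterable[Eye], margin: float = 0.1) -> str:
    """Assign a coarse zone label from the mean relative pupil position.

    `margin` is an assumed tolerance around the eye midpoint (0.5);
    it is not a value reported by the authors.
    """
    ratios = [relative_pupil_position(*eye) for eye in eyes]
    if not ratios:
        raise ValueError("at least one eye is required")
    mean_ratio = sum(ratios) / len(ratios)
    if mean_ratio < 0.5 - margin:
        return "left"
    if mean_ratio > 0.5 + margin:
        return "right"
    return "center"


if __name__ == "__main__":
    # Two eyes whose pupils sit near the left corners in the image.
    print(pseudo_gaze_zone([(0.0, 10.0, 2.0), (20.0, 30.0, 22.0)]))
```

Labels produced this way need no manual annotation, which is what lets a classifier be trained on unlabelled web images; in practice the corner and pupil coordinates would come from a facial landmark detector, which is outside the scope of this sketch.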

arXiv information

Authors	Neeru Dubey, Shreya Ghosh, Abhinav Dhall
Published	2022-08-05 13:02:04+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
