LG-Gaze: Learning Geometry-aware Continuous Prompts for Language-Guided Gaze Estimation

Summary

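The generalization ability of gaze estimation models is often severely limited by factors unrelated to gaze, especially when the training dataset is small.
Existing strategies address this with various domain generalization techniques, but with limited success, because relying solely on numeric regression labels carries a risk of overfitting.
Recent progress in pre-trained vision-language models has motivated us to make use of the rich semantic information they provide.
In this paper, we propose a new approach that reformulates gaze estimation as a vision-language alignment problem.
Our framework, Language-Guided Gaze Estimation (LG-Gaze), draws on the rich prior knowledge of vision-language models to learn continuous, geometry-sensitive features for gaze estimation.
Specifically, LG-Gaze aligns gaze features with continuous language features through our proposed multimodal contrastive regression loss, which assigns adaptive weights to different negative samples.
In addition, to better fit the labels of the gaze estimation task, we propose a geometry-aware interpolation method for obtaining more precise gaze embeddings.
Extensive experiments validate the effectiveness of the framework on four different cross-domain evaluation tasks.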
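
The summary does not spell out the loss, so the following is only a minimal sketch of how a contrastive loss with per-negative weights could look, not the authors' formulation. It assumes gaze labels are (pitch, yaw) angles in radians, uses an InfoNCE-style objective, and weights each negative by the angular distance between its gaze label and the anchor's label. The function names, the pitch/yaw-to-vector convention, the weighting rule, and the temperature `tau` are all hypothetical.

```python
import math


def gaze_to_vector(pitch, yaw):
    """Convert (pitch, yaw) in radians to a unit 3D direction (assumed convention)."""
    return (
        -math.cos(pitch) * math.sin(yaw),
        -math.sin(pitch),
        -math.cos(pitch) * math.cos(yaw),
    )


def angular_distance(u, v):
    """Angle in radians between two unit vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.acos(max(-1.0, min(1.0, dot)))


def cosine(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def weighted_contrastive_loss(image_feats, text_feats, gazes, tau=0.1):
    """InfoNCE-style loss where negative j is weighted by its label distance to i.

    Hypothetical weighting: w_ij = angle(gaze_i, gaze_j) / pi, so w_ij is in [0, 1].
    """
    n = len(image_feats)
    dirs = [gaze_to_vector(p, y) for p, y in gazes]
    total = 0.0
    for i in range(n):
        pos = math.exp(cosine(image_feats[i], text_feats[i]) / tau)
        neg = 0.0
        for j in range(n):
            if j == i:
                continue
            w = angular_distance(dirs[i], dirs[j]) / math.pi
            neg += w * math.exp(cosine(image_feats[i], text_feats[j]) / tau)
        total += -math.log(pos / (pos + neg))
    return total / n
```

Under this assumed weighting, negatives whose gaze labels are close to the anchor's contribute little to the denominator, so they are pushed apart less strongly than negatives with very different gaze directions.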

Abstract (original)

The ability of gaze estimation models to generalize is often significantly hindered by various factors unrelated to gaze, especially when the training dataset is limited. Current strategies aim to address this challenge through different domain generalization techniques, yet they have had limited success due to the risk of overfitting when solely relying on value labels for regression. Recent progress in pre-trained vision-language models has motivated us to capitalize on the abundant semantic information available. We propose a novel approach in this paper, reframing the gaze estimation task as a vision-language alignment issue. Our proposed framework, named Language-Guided Gaze Estimation (LG-Gaze), learns continuous and geometry-sensitive features for gaze estimation benefit from the rich prior knowledges of vision-language models. Specifically, LG-Gaze aligns gaze features with continuous linguistic features through our proposed multimodal contrastive regression loss, which customizes adaptive weights for different negative samples. Furthermore, to better adapt to the labels for gaze estimation task, we propose a geometry-aware interpolation method to obtain more precise gaze embeddings. Through extensive experiments, we validate the efficacy of our framework in four different cross-domain evaluation tasks.
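
The abstract names a geometry-aware interpolation method but gives no details, so the sketch below uses spherical linear interpolation (slerp) purely as a stand-in to illustrate the general idea. It assumes two anchor gaze directions with known unit-norm embeddings and places the query embedding between them according to the query's angular position. All names here (`slerp`, `interpolate_embedding`, the anchor arguments) are hypothetical and are not taken from the paper.

```python
import math


def angle(u, v):
    """Angle in radians between two unit vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.acos(max(-1.0, min(1.0, dot)))


def slerp(u, v, t):
    """Spherical linear interpolation between unit vectors u and v, t in [0, 1].

    Antipodal inputs are not handled in this sketch.
    """
    omega = angle(u, v)
    if omega < 1e-8:
        return tuple(u)  # vectors coincide; nothing to interpolate
    s = math.sin(omega)
    a = math.sin((1.0 - t) * omega) / s
    b = math.sin(t * omega) / s
    return tuple(a * x + b * y for x, y in zip(u, v))


def interpolate_embedding(query_dir, dir_a, dir_b, emb_a, emb_b):
    """Embed a query gaze direction by interpolating two anchor embeddings.

    The interpolation factor is the query's relative angular distance to
    anchor A versus anchor B (an assumed rule, for illustration only).
    """
    da = angle(dir_a, query_dir)
    db = angle(query_dir, dir_b)
    if da + db == 0.0:
        return tuple(emb_a)
    t = da / (da + db)
    return slerp(emb_a, emb_b, t)
```

Interpolating on the sphere rather than along a straight line keeps the result at unit norm, which matters when embeddings are compared by cosine similarity.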

arXiv information

Authors	Pengwei Yin, Jingjing Wang, Guanzhong Zeng, Di Xie, Jiang Zhu
Published	2024-11-13 13:46:15+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
