Human Gaze Boosts Object-Centered Representation Learning

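Summary

Recent self-supervised learning (SSL) models trained on human-like egocentric visual input perform substantially worse than humans on image recognition tasks.
These models are trained on raw, uniform visual input collected from head-mounted cameras.
Humans differ: the anatomy of the retina and visual cortex relatively amplifies central visual information, that is, the region around the gaze location.
This selective amplification likely helps humans form object-centered visual representations.
Here, we investigate whether focusing on central visual information improves egocentric visual object learning.
We simulate five months of egocentric visual experience using the large-scale Ego4D dataset and generate gaze locations with a human gaze prediction model.
To account for the importance of central vision in humans, we crop the visual region around the gaze location.
Finally, we train a time-based SSL model on these modified inputs.
Our experiments show that focusing on central vision yields better object-centered representations.
Our analysis shows that the SSL model exploits the temporal dynamics of gaze movements to build stronger visual representations.
Overall, our work marks a significant step toward biologically inspired learning of visual representations.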
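
The cropping step described in the summary can be illustrated with a minimal sketch. It assumes frames are NumPy arrays of shape (height, width, channels) and that the gaze location is given in pixel coordinates. The function name `crop_around_gaze`, the square window, and the choice to shift the window so it stays inside the frame are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def crop_around_gaze(frame, gaze_xy, crop_size):
    """Return a crop_size x crop_size window centered on the gaze point.

    Hypothetical helper: the window is shifted (not padded) when the gaze
    lies near the frame border, so the output size is always the same.
    """
    height, width = frame.shape[:2]
    if crop_size > min(height, width):
        raise ValueError("crop_size must fit inside the frame")

    half = crop_size // 2
    x, y = int(round(gaze_xy[0])), int(round(gaze_xy[1]))

    # Clamp the top-left corner so the window never leaves the frame.
    left = min(max(x - half, 0), width - crop_size)
    top = min(max(y - half, 0), height - crop_size)

    return frame[top:top + crop_size, left:left + crop_size]


# Usage: a dummy 10 x 12 RGB frame with the gaze near the center.
frame = np.arange(10 * 12 * 3).reshape(10, 12, 3)
central_crop = crop_around_gaze(frame, gaze_xy=(6, 5), crop_size=4)
corner_crop = crop_around_gaze(frame, gaze_xy=(11, 9), crop_size=4)
```

Shifting the window at the border is only one option; padding the frame or discarding such frames would be equally plausible, and the summary does not say which the authors use.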

Summary (original)

Recent self-supervised learning (SSL) models trained on human-like egocentric visual inputs substantially underperform on image recognition tasks compared to humans. These models train on raw, uniform visual inputs collected from head-mounted cameras. This is different from humans, as the anatomical structure of the retina and visual cortex relatively amplifies the central visual information, i.e. around humans’ gaze location. This selective amplification in humans likely aids in forming object-centered visual representations. Here, we investigate whether focusing on central visual information boosts egocentric visual object learning. We simulate 5-months of egocentric visual experience using the large-scale Ego4D dataset and generate gaze locations with a human gaze prediction model. To account for the importance of central vision in humans, we crop the visual area around the gaze location. Finally, we train a time-based SSL model on these modified inputs. Our experiments demonstrate that focusing on central vision leads to better object-centered representations. Our analysis shows that the SSL model leverages the temporal dynamics of the gaze movements to build stronger visual representations. Overall, our work marks a significant step toward bio-inspired learning of visual representations.
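
The abstract does not spell out the objective behind the "time-based SSL model". One common way to learn from temporal structure is a contrastive loss that pulls together embeddings of temporally adjacent frames, and the sketch below shows that idea in NumPy. The InfoNCE form, the function name `temporal_info_nce`, and the temperature value are assumptions made for illustration, not the authors' stated method.

```python
import numpy as np


def temporal_info_nce(z_t, z_next, temperature=0.1):
    """InfoNCE loss where row i of z_next is the positive for row i of z_t.

    z_t, z_next: arrays of shape (batch, dim) holding embeddings of frames
    at time t and at the following time step. All other rows in the batch
    act as negatives. Hypothetical sketch, not the paper's exact loss.
    """
    # L2-normalize so the dot product is a cosine similarity.
    z_t = z_t / np.linalg.norm(z_t, axis=1, keepdims=True)
    z_next = z_next / np.linalg.norm(z_next, axis=1, keepdims=True)

    logits = z_t @ z_next.T / temperature
    # Subtract the row maximum for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The diagonal holds the log-probability of each true temporal pair.
    return float(-np.mean(np.diag(log_probs)))


# Usage: temporally aligned pairs should score lower than mismatched ones.
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
loss_aligned = temporal_info_nce(z, z + 0.01 * rng.normal(size=z.shape))
loss_mismatched = temporal_info_nce(z, z[::-1])
```

Under this assumed objective, cropping around the gaze changes which image regions end up as temporal positives, which is one way to read the claim that the model exploits the dynamics of gaze movements.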

arXiv information

Authors	Timothy Schaumlöffel, Arthur Aubret, Gemma Roig, Jochen Triesch
Publication date	2025-01-06 12:21:40+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
