Discovering Hidden Visual Concepts Beyond Linguistic Input in Infant Learning

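Summary

Infants rapidly develop complex visual understanding, even before they acquire linguistic input.
Because computer vision seeks to replicate the human visual system, understanding infant visual development may offer valuable insights.
This paper presents an interdisciplinary study exploring the following question: can a computational model that mimics the infant learning process develop broader visual concepts that extend beyond the vocabulary it has heard, much as infants naturally learn?
To investigate this, we analyze a model recently published in Science by Vong et al., which is trained on longitudinal, egocentric images of a single child paired with transcribed parental speech.
We introduce a training-free framework that can discover visual concept neurons hidden in the model's internal representations.
Our findings show that these neurons can classify objects outside the model's original vocabulary.
Furthermore, we compare the visual representations of the infant-like model with those of modern computer vision models, such as CLIP and ImageNet pre-trained models, highlighting key similarities and differences.
Ultimately, our work bridges cognitive science and computer vision by analyzing the internal representations of a computational model trained on an infant's visual and linguistic inputs.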

Abstract (original)

Infants develop complex visual understanding rapidly, even preceding the acquisition of linguistic inputs. As computer vision seeks to replicate the human vision system, understanding infant visual development may offer valuable insights. In this paper, we present an interdisciplinary study exploring this question: can a computational model that imitates the infant learning process develop broader visual concepts that extend beyond the vocabulary it has heard, similar to how infants naturally learn? To investigate this, we analyze a recently published model in Science by Vong et al., which is trained on longitudinal, egocentric images of a single child paired with transcribed parental speech. We introduce a training-free framework that can discover visual concept neurons hidden in the model’s internal representations. Our findings show that these neurons can classify objects outside the model’s original vocabulary. Furthermore, we compare the visual representations in infant-like models with those in modern computer vision models, such as CLIP or ImageNet pre-trained models, highlighting key similarities and differences. Ultimately, our work bridges cognitive science and computer vision by analyzing the internal representations of a computational model trained on an infant’s visual and linguistic inputs.

arXiv information

Authors	Xueyi Ke, Satoshi Tsutsui, Yayun Zhang, Bihan Wen
Published	2025-01-09 12:55:55+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
