Time to augment self-supervised visual representation learning

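Summary

Biological vision systems are unmatched in their ability to learn visual representations without supervision.
In machine learning, self-supervised learning (SSL) has brought major advances in forming object representations without supervision.
Such systems learn representations that are invariant to augmentation operations on images, such as cropping or flipping.
In contrast, biological vision systems exploit the temporal structure of visual experience during natural interactions with objects.
This gives access to "augmentations" not commonly used in SSL, such as viewing the same object from multiple viewpoints or against different backgrounds.
Here, we systematically investigate and compare the potential benefits of such time-based augmentations during natural interactions for learning object categories.
Our results show that time-based augmentations achieve large performance gains over state-of-the-art image augmentations.
Specifically, our analyses reveal that:
1) 3-D object manipulations drastically improve the learning of object categories;
2) viewing objects against changing backgrounds is important for learning to discard background-related information from the latent representation.
Overall, we conclude that time-based augmentations during natural interactions with objects can substantially improve self-supervised learning, narrowing the gap between artificial and biological vision systems.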

Summary (original)

Biological vision systems are unparalleled in their ability to learn visual representations without supervision. In machine learning, self-supervised learning (SSL) has led to major advances in forming object representations in an unsupervised fashion. Such systems learn representations invariant to augmentation operations over images, like cropping or flipping. In contrast, biological vision systems exploit the temporal structure of the visual experience during natural interactions with objects. This gives access to ‘augmentations’ not commonly used in SSL, like watching the same object from multiple viewpoints or against different backgrounds. Here, we systematically investigate and compare the potential benefits of such time-based augmentations during natural interactions for learning object categories. Our results show that time-based augmentations achieve large performance gains over state-of-the-art image augmentations. Specifically, our analyses reveal that: 1) 3-D object manipulations drastically improve the learning of object categories; 2) viewing objects against changing backgrounds is important for learning to discard background-related information from the latent representation. Overall, we conclude that time-based augmentations during natural interactions with objects can substantially improve self-supervised learning, narrowing the gap between artificial and biological vision systems.
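To make the idea of a time-based augmentation concrete, the sketch below builds positive pairs from temporally adjacent frames instead of from two random crops of one image. It is an illustration only, not the authors' implementation: the function name `temporal_positive_pairs`, the `episode_ids` array (assumed to mark which frames belong to one uninterrupted interaction with an object), and the `max_gap` parameter are all hypothetical.

```python
import numpy as np


def temporal_positive_pairs(episode_ids, max_gap=1, rng=None):
    """Pair each frame with a later frame from the same episode.

    episode_ids: 1-D sequence; episode_ids[t] identifies the interaction
        episode that frame t belongs to (a hypothetical labeling, assumed
        here to mean "the same object is being viewed").
    max_gap: largest allowed temporal distance between anchor and positive.
    Returns an integer array of shape (n_pairs, 2) with (anchor, positive)
    frame indices.
    """
    rng = np.random.default_rng() if rng is None else rng
    episode_ids = np.asarray(episode_ids)
    pairs = []
    for t in range(len(episode_ids)):
        # Candidate positives: up to `max_gap` following frames that are
        # still part of the same episode as frame t.
        candidates = [
            t + d
            for d in range(1, max_gap + 1)
            if t + d < len(episode_ids) and episode_ids[t + d] == episode_ids[t]
        ]
        if candidates:
            pairs.append((t, int(rng.choice(candidates))))
    return np.array(pairs, dtype=int).reshape(-1, 2)


# Frames 0-2 show one object, frames 3-4 another; pairs never cross episodes.
pairs = temporal_positive_pairs([0, 0, 0, 1, 1], max_gap=1)
print(pairs.tolist())  # [[0, 1], [1, 2], [3, 4]]
```

In a contrastive or similar SSL objective, each (anchor, positive) pair would take the place of two augmented views of the same image, so changes in viewpoint or background between neighboring frames act as the augmentation.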

arXiv information

Authors	Arthur Aubret,Markus Ernst,Céline Teulière,Jochen Triesch
Published	2022-12-21 10:55:11+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
