CNN-JEPA: Self-Supervised Pretraining Convolutional Neural Networks Using Joint Embedding Predictive Architecture

Summary

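Self-supervised learning (SSL) has become an important approach for pretraining large neural networks, making it possible to scale model and dataset sizes to an unprecedented degree.
Recent advances such as I-JEPA have produced promising results with Vision Transformers, but adapting such methods to convolutional neural networks (CNNs) poses its own challenges.
This paper introduces CNN-JEPA, a new SSL method that applies the joint embedding predictive architecture approach to CNNs.
Our method combines a sparse CNN encoder for processing masked inputs, a fully convolutional predictor that uses depthwise separable convolutions, and an improved masking strategy.
We show that CNN-JEPA outperforms I-JEPA with ViT architectures on ImageNet-100, reaching 73.3% linear top-1 accuracy with a standard ResNet-50 encoder.
Compared with other CNN-based SSL methods, CNN-JEPA needs 17-35% less training time for the same number of epochs and comes close to the linear and k-NN top-1 accuracies of BYOL, SimCLR, and VICReg.
Our approach offers a simpler and more efficient alternative to existing SSL methods for CNNs, requiring only minimal augmentations and no separate projector network.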

Abstract (original)

Self-supervised learning (SSL) has become an important approach in pretraining large neural networks, enabling unprecedented scaling of model and dataset sizes. While recent advances like I-JEPA have shown promising results for Vision Transformers, adapting such methods to Convolutional Neural Networks (CNNs) presents unique challenges. In this paper, we introduce CNN-JEPA, a novel SSL method that successfully applies the joint embedding predictive architecture approach to CNNs. Our method incorporates a sparse CNN encoder to handle masked inputs, a fully convolutional predictor using depthwise separable convolutions, and an improved masking strategy. We demonstrate that CNN-JEPA outperforms I-JEPA with ViT architectures on ImageNet-100, achieving 73.3% linear top-1 accuracy with a standard ResNet-50 encoder. Compared to other CNN-based SSL methods, CNN-JEPA requires 17-35% less training time for the same number of epochs and approaches the linear and k-NN top-1 accuracies of BYOL, SimCLR, and VICReg. Our approach offers a simpler, more efficient alternative to existing SSL methods for CNNs, requiring minimal augmentations and no separate projector network.
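
The abstract mentions a predictor built from depthwise separable convolutions. As a rough illustration of why that design is lightweight, the sketch below compares the weight count of a standard convolution with that of a depthwise convolution followed by a 1x1 pointwise convolution. The function names are hypothetical, and the kernel size and channel count are illustrative assumptions (2048 is the output width of a standard ResNet-50); none of these values are taken from the paper's predictor configuration.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias terms ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k conv plus a 1 x 1 pointwise conv (bias ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Assumed, illustrative sizes; not the paper's settings.
    c_in = c_out = 2048
    k = 3

    full = standard_conv_params(c_in, c_out, k)
    separable = depthwise_separable_params(c_in, c_out, k)

    print(f"standard conv weights:            {full}")
    print(f"depthwise separable conv weights: {separable}")
    print(f"ratio (separable / standard):     {separable / full:.3f}")
```

Under these assumed sizes the separable layer needs roughly a ninth of the weights of the standard layer, which is the general reason such layers are a common choice for lightweight convolutional heads.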

arXiv information

Authors	András Kalapos, Bálint Gyires-Tóth
Published	2024-08-14 12:48:37+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
