Beyond Random Augmentations: Pretraining with Hard Views

Abstract

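Self-supervised learning (SSL) methods typically rely on random image augmentations, or views, to make models invariant to different transformations.
We hypothesize that the efficacy of pretraining pipelines based on conventional random view sampling can be enhanced by explicitly selecting views that benefit learning progress.
A simple yet effective approach is to select hard views, that is, views that yield a higher loss.
In this paper we propose Hard View Pretraining (HVP), a learning-free strategy that extends random view generation by exposing the model to more challenging samples during SSL pretraining.
HVP consists of the following iterative steps: 1) randomly sample multiple views and forward each view through the pretrained model, 2) create pairs of two views and compute their loss, 3) adversarially select the pair that yields the highest loss under the current model state, and 4) perform a backward pass with the selected pair.
In contrast to the existing hard view literature, we are the first to demonstrate the effectiveness of hard view pretraining at scale, in particular by training on the full ImageNet-1k dataset and evaluating across multiple SSL methods, ConvNets, and ViTs.
As a result, HVP sets a new state of the art on DINO ViT-B/16, reaching 78.8% linear evaluation accuracy (a 0.6% improvement), and achieves consistent gains of 1% for both 100- and 300-epoch pretraining, with similar improvements across transfer tasks for DINO, SimSiam, iBOT, and SimCLR.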

Abstract (original)

Self-Supervised Learning (SSL) methods typically rely on random image augmentations, or views, to make models invariant to different transformations. We hypothesize that the efficacy of pretraining pipelines based on conventional random view sampling can be enhanced by explicitly selecting views that benefit the learning progress. A simple yet effective approach is to select hard views that yield a higher loss. In this paper, we propose Hard View Pretraining (HVP), a learning-free strategy that extends random view generation by exposing models to more challenging samples during SSL pretraining. HVP encompasses the following iterative steps: 1) randomly sample multiple views and forward each view through the pretrained model, 2) create pairs of two views and compute their loss, 3) adversarially select the pair yielding the highest loss according to the current model state, and 4) perform a backward pass with the selected pair. In contrast to existing hard view literature, we are the first to demonstrate hard view pretraining’s effectiveness at scale, particularly training on the full ImageNet-1k dataset, and evaluating across multiple SSL methods, ConvNets, and ViTs. As a result, HVP sets a new state-of-the-art on DINO ViT-B/16, reaching 78.8% linear evaluation accuracy (a 0.6% improvement) and consistent gains of 1% for both 100 and 300 epoch pretraining, with similar improvements across transfer tasks in DINO, SimSiam, iBOT, and SimCLR.
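The four HVP steps listed in the abstract can be illustrated with a small pure-Python sketch. This is not the authors' implementation: `sample_views`, `negative_cosine`, and `select_hardest_pair` are hypothetical names, Gaussian jitter stands in for real image augmentations, the identity function stands in for the model, and negative cosine similarity is only an assumed toy substitute for whichever SSL objective is being trained.

```python
import itertools
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def negative_cosine(a: Vector, b: Vector) -> float:
    """Toy pair loss (assumption): lower when two embeddings point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return -dot / norm


def sample_views(image: Vector, n_views: int, rng: random.Random) -> List[Vector]:
    """Hypothetical stand-in for random augmentation: add Gaussian jitter."""
    return [[x + rng.gauss(0.0, 0.5) for x in image] for _ in range(n_views)]


def select_hardest_pair(
    views: Sequence[Vector],
    embed: Callable[[Vector], Vector],
    pair_loss: Callable[[Vector, Vector], float],
) -> Tuple[int, int, float]:
    """Return (i, j, loss) for the pair of views with the highest loss."""
    if len(views) < 2:
        raise ValueError("need at least two views to form a pair")
    # Step 1: forward every sampled view through the current model.
    embeddings = [embed(v) for v in views]
    # Step 2: compute the loss of every unordered pair of views.
    scored = [
        (pair_loss(embeddings[i], embeddings[j]), i, j)
        for i, j in itertools.combinations(range(len(views)), 2)
    ]
    # Step 3: select the pair that the current model finds hardest.
    loss, i, j = max(scored, key=lambda item: item[0])
    return i, j, loss


if __name__ == "__main__":
    rng = random.Random(0)
    views = sample_views([1.0, 2.0, 3.0], n_views=4, rng=rng)
    i, j, loss = select_hardest_pair(views, embed=lambda v: v, pair_loss=negative_cosine)
    # Step 4 (not shown): run the backward pass on views[i] and views[j] only.
    print(f"hardest pair: ({i}, {j}), loss = {loss:.4f}")
```

With N sampled views the selection scores N(N-1)/2 pairs but reuses the N forward passes, so the extra cost per step grows mainly with the number of views. In a deep learning framework the scoring passes would presumably run without gradient tracking, with only the selected pair going through the usual forward and backward pass; that detail is an assumption here, not something the abstract states.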

arXiv information

Authors	Fabio Ferreira, Ivo Rapant, Jörg K. H. Franke, Frank Hutter
Published	2025-02-06 12:39:59+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
