Visual Atoms: Pre-training Vision Transformers with Sinusoidal Waves

Summary

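Formula-driven supervised learning (FDSL) has been shown to be an effective way to pre-train vision transformers; in particular, ExFractalDB-21k was shown to exceed the pre-training effect of ImageNet-21k.
These studies also indicate that contours matter more than textures when pre-training vision transformers.
However, there has been no systematic investigation of why these contour-oriented synthetic datasets can match the accuracy of real datasets, which leaves room for skepticism.
This work develops a new methodology, based on circular harmonics, for systematically exploring the design space of contour-oriented synthetic datasets.
The methodology makes it possible to search efficiently for the optimal range of FDSL parameters and to maximize the variety of synthetic images in the dataset, which the authors found to be a critical factor.
When the resulting dataset, VisualAtom-21k, is used to pre-train ViT-Base, fine-tuning on ImageNet-1k reaches a top-1 accuracy of 83.7%.
This is close to the top-1 accuracy achieved with JFT-300M pre-training (84.2%), while using 1/14 as many images.
Unlike JFT-300M, which is a static dataset, synthetic datasets will continue to improve in quality, and this work is evidence of that potential.
FDSL is also free of the common issues associated with real images, such as privacy and copyright concerns, labeling costs and errors, and ethical biases.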

Summary (original)

Formula-driven supervised learning (FDSL) has been shown to be an effective method for pre-training vision transformers, where ExFractalDB-21k was shown to exceed the pre-training effect of ImageNet-21k. These studies also indicate that contours mattered more than textures when pre-training vision transformers. However, the lack of a systematic investigation as to why these contour-oriented synthetic datasets can achieve the same accuracy as real datasets leaves much room for skepticism. In the present work, we develop a novel methodology based on circular harmonics for systematically investigating the design space of contour-oriented synthetic datasets. This allows us to efficiently search the optimal range of FDSL parameters and maximize the variety of synthetic images in the dataset, which we found to be a critical factor. When the resulting new dataset VisualAtom-21k is used for pre-training ViT-Base, the top-1 accuracy reached 83.7% when fine-tuning on ImageNet-1k. This is close to the top-1 accuracy (84.2%) achieved by JFT-300M pre-training, while the number of images is 1/14. Unlike JFT-300M which is a static dataset, the quality of synthetic datasets will continue to improve, and the current work is a testament to this possibility. FDSL is also free of the common issues associated with real images, e.g. privacy/copyright issues, labeling costs/errors, and ethical biases.
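The abstract describes contour-oriented images built from sinusoidal waves and circular harmonics but does not give the generating formula. As a rough illustration only, the sketch below perturbs the radius of a circle with a sum of sine waves to produce a closed contour. The function name `sinusoidal_contour`, its parameters, and the specific radius formula are assumptions made for this example; they are not taken from the paper and should not be read as the authors' generator.

```python
import math


def sinusoidal_contour(base_radius, harmonics, num_points=360):
    """Return (x, y) points of a closed contour around the origin.

    Hypothetical helper for illustration, not the paper's implementation.
    harmonics: iterable of (frequency, amplitude, phase) tuples. Integer
    frequencies keep the contour closed after one full turn.
    """
    points = []
    for i in range(num_points):
        theta = 2.0 * math.pi * i / num_points
        # Assumed form: a base circle whose radius is modulated by a sum
        # of sinusoidal waves in the angle theta.
        modulation = sum(a * math.sin(n * theta + p) for n, a, p in harmonics)
        r = base_radius * (1.0 + modulation)
        points.append((r * math.cos(theta), r * math.sin(theta)))
    return points


# Example: a unit circle perturbed by a 3-lobed and a 7-lobed wave.
contour = sinusoidal_contour(1.0, [(3, 0.2, 0.0), (7, 0.05, 0.5)])
```

Under this assumed parameterization, varying the frequencies, amplitudes, and phases changes the contour shape, which gives one simple picture of how a formula-driven design space can be sampled for variety.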

arXiv information

Authors	Sora Takashima, Ryo Hayamizu, Nakamasa Inoue, Hirokatsu Kataoka, Rio Yokota
Published	2023-03-02 09:47:28+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
