PAD: Self-Supervised Pre-Training with Patchwise-Scale Adapter for Infrared Images

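Summary

Self-supervised learning (SSL) for RGB images has been highly successful, but research on SSL for infrared images remains limited, mainly because of three challenges: 1) there is no suitable large-scale infrared pre-training dataset; 2) the distinctive characteristics of non-iconic infrared images make common pre-training tasks such as masked image modeling (MIM) less effective; and 3) fine-grained textures are scarce, which makes general image features particularly hard to learn.
To address these problems, we construct the Multi-Scene Infrared Pre-training (MSIP) dataset, which consists of 178,756 images, and introduce object-sensitive random RoI cropping, an image preprocessing method, to handle the challenge posed by non-iconic images.
To reduce the impact of weak textures on feature learning, we propose a pre-training paradigm called Pre-training with ADapter (PAD), which uses adapters to learn domain-specific features while freezing the parameters pre-trained on ImageNet, so that the general feature extraction capability is retained.
This new paradigm is applicable to any transformer-based SSL method.
In addition, to coordinate pre-trained and newly learned features more flexibly across layers and patches, we introduce a patchwise-scale adapter with dynamically learnable scale factors.
Extensive experiments on three downstream tasks show that PAD, with only 1.23M pre-trainable parameters, outperforms other baseline paradigms, including continual full pre-training on MSIP.
The code and dataset are available at https://github.com/casiatao/PAD.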
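
The adapter idea described above can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: the class name `PatchwiseScaleAdapter`, the bottleneck (down-projection, ReLU, up-projection) structure, the zero-initialized up-projection, and the choice to predict one scale per patch from the patch token with a linear layer are all hypothetical. The actual code is in the repository linked above.

```python
import random


def init_linear(n_in, n_out, rng, std=0.02):
    """Return (weight, bias) for a linear layer with small random weights."""
    weight = [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]
    return weight, [0.0] * n_out


def linear(x, weight, bias):
    """Compute W x + b for a single vector x (weight is a list of rows)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


class PatchwiseScaleAdapter:
    """Hypothetical bottleneck adapter whose output is scaled per patch."""

    def __init__(self, dim, bottleneck, seed=0):
        rng = random.Random(seed)
        self.down = init_linear(dim, bottleneck, rng)
        # Assumption: zero-initialized up-projection, so training starts
        # from the frozen backbone's behavior.
        self.up = ([[0.0] * bottleneck for _ in range(dim)], [0.0] * dim)
        # Assumption: one scalar scale per patch, predicted from the patch token.
        self.scale = init_linear(dim, 1, rng)

    def num_params(self):
        """Count trainable adapter parameters (weights plus biases)."""
        layers = (self.down, self.up, self.scale)
        return sum(len(w) * len(w[0]) + len(b) for w, b in layers)

    def forward(self, patches, frozen_out):
        """Add the scaled adapter branch to the frozen layer's output."""
        result = []
        for x, y in zip(patches, frozen_out):
            hidden = [max(0.0, h) for h in linear(x, *self.down)]  # ReLU
            delta = linear(hidden, *self.up)
            s = linear(x, *self.scale)[0]  # dynamic scale for this patch
            result.append([yi + s * di for yi, di in zip(y, delta)])
        return result


adapter = PatchwiseScaleAdapter(dim=4, bottleneck=2)
patches = [[1.0, 2.0, 3.0, 4.0], [0.5, 0.0, -1.0, 2.0]]
# Stand-in values for the output of a frozen, ImageNet-pre-trained layer.
frozen_out = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
adapted = adapter.forward(patches, frozen_out)
```

With the up-projection at zero, `adapted` equals `frozen_out`, so the sketch starts from the pre-trained features. In this toy configuration the adapter has 27 parameters, and only those would be updated during pre-training while the backbone stays frozen.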

Summary (original)

Self-supervised learning (SSL) for RGB images has achieved significant success, yet there is still limited research on SSL for infrared images, primarily due to three prominent challenges: 1) the lack of a suitable large-scale infrared pre-training dataset, 2) the distinctiveness of non-iconic infrared images rendering common pre-training tasks like masked image modeling (MIM) less effective, and 3) the scarcity of fine-grained textures making it particularly challenging to learn general image features. To address these issues, we construct a Multi-Scene Infrared Pre-training (MSIP) dataset comprising 178,756 images, and introduce object-sensitive random RoI cropping, an image preprocessing method, to tackle the challenge posed by non-iconic images. To alleviate the impact of weak textures on feature learning, we propose a pre-training paradigm called Pre-training with ADapter (PAD), which uses adapters to learn domain-specific features while freezing parameters pre-trained on ImageNet to retain the general feature extraction capability. This new paradigm is applicable to any transformer-based SSL method. Furthermore, to achieve more flexible coordination between pre-trained and newly-learned features in different layers and patches, a patchwise-scale adapter with dynamically learnable scale factors is introduced. Extensive experiments on three downstream tasks show that PAD, with only 1.23M pre-trainable parameters, outperforms other baseline paradigms including continual full pre-training on MSIP. Our code and dataset are available at https://github.com/casiatao/PAD.

arXiv information

Authors	Tao Zhang, Kun Ding, Jinyong Wen, Yu Xiong, Zeyu Zhang, Shiming Xiang, Chunhong Pan
Published	2023-12-13 14:57:28+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
