Enhancing Few-shot Image Classification with Cosine Transformer

Abstract

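This paper addresses few-shot image classification, in which unlabeled query samples must be classified given only a small number of labeled support samples.
A major challenge in few-shot learning is that the visual appearance of an object varies so widely that a handful of support samples cannot represent it comprehensively.
This can produce a large gap between support and query samples and undermine the performance of few-shot algorithms.
We tackle this problem by proposing the Few-shot Cosine Transformer (FS-CT), which effectively obtains a relational map between supports and queries for few-shot tasks.
FS-CT consists of two parts: a learnable prototypical embedding network that obtains categorical representations from support samples, including hard cases, and a transformer encoder that effectively obtains the relational map between the support and query samples.
We introduce Cosine Attention, a more robust and stable attention module that significantly enhances the transformer module and improves FS-CT accuracy by 5% to over 20% compared with the default scaled dot-product mechanism.
Our method achieves competitive results on mini-ImageNet, CUB-200, and CIFAR-FS in 1-shot and 5-shot learning tasks across backbones and few-shot configurations.
We also developed a custom few-shot dataset for yoga pose recognition to demonstrate the algorithm's potential for practical use.
FS-CT with cosine attention is a lightweight, simple few-shot algorithm that can be applied to a wide range of domains, such as healthcare, medicine, and security surveillance.
The official implementation of the Few-shot Cosine Transformer is available at https://github.com/vinuni-vishc/Few-Shot-Cosine-Transformer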
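
The abstract contrasts Cosine Attention with the default scaled dot-product mechanism but does not give a formula. The sketch below is only an illustration under stated assumptions: it assumes the cosine variant scores each query-key pair by cosine similarity and skips the softmax. All function names and the toy data are hypothetical and are not taken from the official repository. It shows one property that could plausibly account for the claimed stability: cosine weights depend only on the direction of a feature vector, while scaled dot-product weights also change with its magnitude.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_weights(queries, keys):
    """Standard attention weights: softmax(q . k / sqrt(d)) for each query."""
    d = len(keys[0])
    return [softmax([dot(q, k) / math.sqrt(d) for k in keys]) for q in queries]

def cosine_weights(queries, keys, eps=1e-8):
    """Assumed cosine variant: raw cosine similarities in [-1, 1], no softmax."""
    def norm(v):
        return math.sqrt(dot(v, v))
    return [[dot(q, k) / (norm(q) * norm(k) + eps) for k in keys] for q in queries]

def attend(weights, values):
    """Weighted sum of the value vectors for each query."""
    dim = len(values[0])
    return [[sum(w * v[j] for w, v in zip(row, values)) for j in range(dim)]
            for row in weights]

# Hypothetical toy data: two class prototypes (keys and values) and two
# query features that point in the same direction but differ 100x in scale.
prototypes = [[1.0, 0.0], [0.0, 1.0]]
queries = [[10.0, 1.0], [0.1, 0.01]]

sdp = scaled_dot_product_weights(queries, prototypes)
cos = cosine_weights(queries, prototypes)

print("scaled dot-product:", sdp)  # the two rows differ strongly
print("cosine:", cos)              # the two rows are almost identical
print("cosine output:", attend(cos, prototypes))
```

In this toy setting the large query attends almost entirely to the first prototype under scaled dot-product attention, while the small query spreads its weight nearly evenly. The cosine weights are the same for both queries.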

Abstract (original)

This paper addresses the few-shot image classification problem, where the classification task is performed on unlabeled query samples given a small amount of labeled support samples only. One major challenge of the few-shot learning problem is the large variety of object visual appearances that prevents the support samples to represent that object comprehensively. This might result in a significant difference between support and query samples, therefore undermining the performance of few-shot algorithms. In this paper, we tackle the problem by proposing Few-shot Cosine Transformer (FS-CT), where the relational map between supports and queries is effectively obtained for the few-shot tasks. The FS-CT consists of two parts, a learnable prototypical embedding network to obtain categorical representations from support samples with hard cases, and a transformer encoder to effectively achieve the relational map from two different support and query samples. We introduce Cosine Attention, a more robust and stable attention module that enhances the transformer module significantly and therefore improves FS-CT performance from 5% to over 20% in accuracy compared to the default scaled dot-product mechanism. Our method performs competitive results in mini-ImageNet, CUB-200, and CIFAR-FS on 1-shot learning and 5-shot learning tasks across backbones and few-shot configurations. We also developed a custom few-shot dataset for Yoga pose recognition to demonstrate the potential of our algorithm for practical application. Our FS-CT with cosine attention is a lightweight, simple few-shot algorithm that can be applied for a wide range of applications, such as healthcare, medical, and security surveillance. The official implementation code of our Few-shot Cosine Transformer is available at https://github.com/vinuni-vishc/Few-Shot-Cosine-Transformer

arXiv information

Authors	Quang-Huy Nguyen, Cuong Q. Nguyen, Dung D. Le, Hieu H. Pham
Published	2023-07-21 16:54:18+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
