Combining Metric Learning and Attention Heads For Accurate and Efficient Multilabel Image Classification

Summary

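Multi-label image classification predicts a set of labels from a given image.
Unlike multiclass classification, where only one label is assigned per image, this setup applies to a broader range of applications.
In this work, we revisit two popular approaches to multi-label classification: transformer-based heads and graph-processing branches that exploit label-relation information.
Although transformer-based heads are considered to achieve better results than graph-based branches, we argue that with a proper training strategy, graph-based methods show only a small drop in accuracy while spending fewer computational resources on inference.
In our training strategy, instead of the Asymmetric Loss (ASL), the de facto standard for multi-label classification, we introduce a metric-learning modification of it.
In each binary classification sub-problem, it operates on $L_2$-normalized feature vectors from the backbone and enforces the angles between the normalized representations of positive and negative samples to be as large as possible.
This provides better discrimination than binary cross-entropy loss does on unnormalized features.
With the proposed loss and training strategy, we obtain state-of-the-art (SOTA) results among single-modality methods on widely used multi-label classification benchmarks such as MS-COCO, PASCAL-VOC, NUS-Wide and Visual Genome 500. The source code of our method is available as part of OpenVINO Training Extensions: https://github.com/openvinotoolkit/deep-object-reid/tree/multilabel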
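For reference, the following is a minimal Python sketch of the Asymmetric Loss (ASL) baseline that the summary mentions, in its commonly used form: per-label sigmoid probabilities, separate focusing exponents for positives and negatives, and a probability shift `clip` that zeroes out easy negatives. The function name and the default values `gamma_pos=0`, `gamma_neg=4`, `clip=0.05` are illustrative assumptions, not settings taken from this paper.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def asymmetric_loss(
    logits: Sequence[float],
    targets: Sequence[int],
    gamma_pos: float = 0.0,  # illustrative default, not from the paper
    gamma_neg: float = 4.0,  # illustrative default, not from the paper
    clip: float = 0.05,      # illustrative default, not from the paper
    eps: float = 1e-8,
) -> float:
    """ASL summed over the labels of a single image.

    logits  -- one raw (unnormalized) score per label
    targets -- 1 if the label is present in the image, else 0
    """
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        if y == 1:
            # Positive term: (1 - p)^gamma_pos down-weights easy positives.
            total -= (1.0 - p) ** gamma_pos * math.log(max(p, eps))
        else:
            # Negative term: shift p down by `clip`, so negatives with
            # p <= clip contribute nothing, then focus with p_m^gamma_neg.
            p_m = max(p - clip, 0.0)
            total -= p_m ** gamma_neg * math.log(max(1.0 - p_m, eps))
    return total


if __name__ == "__main__":
    # Hypothetical 3-label image: label 0 present, labels 1 and 2 absent.
    print(asymmetric_loss([2.0, -3.0, 0.5], [1, 0, 0]))
```

Setting `gamma_pos=0`, `gamma_neg=0` and `clip=0` reduces the function to plain binary cross-entropy on raw logits, i.e. the unnormalized-feature baseline the summary compares against.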

Summary (original)

Multi-label image classification allows predicting a set of labels from a given image. Unlike multiclass classification, where only one label per image is assigned, such a setup is applicable to a broader range of applications. In this work we revisit two popular approaches to multilabel classification: transformer-based heads and graph-processing branches that exploit label-relation information. Although transformer-based heads are considered to achieve better results than graph-based branches, we argue that with the proper training strategy, graph-based methods can show only a small accuracy drop while spending fewer computational resources on inference. In our training strategy, instead of Asymmetric Loss (ASL), which is the de facto standard for multilabel classification, we introduce a metric learning modification of it. In each binary classification sub-problem it operates on $L_2$-normalized feature vectors coming from a backbone and enforces the angles between the normalized representations of positive and negative samples to be as large as possible. This provides better discrimination ability than binary cross-entropy loss does on unnormalized features. With the proposed loss and training strategy, we obtain SOTA results among single-modality methods on widely used multilabel classification benchmarks such as MS-COCO, PASCAL-VOC, NUS-Wide and Visual Genome 500. Source code of our method is available as part of OpenVINO Training Extensions: https://github.com/openvinotoolkit/deep-object-reid/tree/multilabel
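The abstract describes the metric-learning modification only at a high level. One plausible reading, sketched below under explicit assumptions, replaces raw logits with the cosine similarity between the L2-normalized backbone feature and an L2-normalized per-label weight vector, then applies an additive cosine margin: a positive label's loss becomes small only once cos θ is well above `margin`, and a negative label's only once cos θ is well below `-margin`. The names `angular_margin_asl`, `scale` and `margin`, their values, and the choice of an additive cosine margin are hypothetical; this is not the authors' loss, and ASL's probability clipping is omitted for brevity.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def l2_normalize(v: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Scale a vector to unit L2 norm (eps guards against a zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def angular_margin_asl(
    feature: Sequence[float],
    class_weights: Sequence[Sequence[float]],
    targets: Sequence[int],
    scale: float = 20.0,   # hypothetical temperature s
    margin: float = 0.2,   # hypothetical additive cosine margin m
    gamma_pos: float = 0.0,
    gamma_neg: float = 4.0,
    eps: float = 1e-8,
) -> float:
    """Hypothetical ASL-style loss on cosine logits for one image."""
    f = l2_normalize(feature)
    total = 0.0
    for w, y in zip(class_weights, targets):
        # cos(theta) between the normalized feature and this label's weight.
        cos_t = sum(a * b for a, b in zip(f, l2_normalize(w)))
        if y == 1:
            # Positive: logit s*(cos - m), small loss only if cos >> +m.
            p = sigmoid(scale * (cos_t - margin))
            total -= (1.0 - p) ** gamma_pos * math.log(max(p, eps))
        else:
            # Negative: logit s*(cos + m), small loss only if cos << -m.
            p = sigmoid(scale * (cos_t + margin))
            total -= p ** gamma_neg * math.log(max(1.0 - p, eps))
    return total


if __name__ == "__main__":
    weights = [[1.0, 0.0], [-1.0, 0.0]]  # hypothetical per-label weights
    print(angular_margin_asl([3.0, 0.0], weights, [1, 0]))  # well separated
    print(angular_margin_asl([3.0, 0.0], weights, [0, 1]))  # labels swapped
```

Because both the feature and the weights are normalized, multiplying the feature by a positive constant leaves the loss unchanged; only the angles matter, which is the property the abstract contrasts with binary cross-entropy on unnormalized features.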

arXiv information

Authors	Kirill Prokofiev, Vladislav Sovrasov
Published	2022-12-20 10:00:24+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
