Keep It SimPool: Who Said Supervised Transformers Suffer from Attention Deficit?

Summary

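Convolutional networks and vision transformers differ in their pairwise interactions, in how they pool across layers, and in how they pool at the end of the network.
Does the latter really need to be different?
As a by-product of pooling, vision transformers provide spatial attention for free, but its quality is usually low unless the model is self-supervised, a point that has not been well studied.
Is supervision really the problem?
In this work, we develop a generic pooling framework and then formulate a number of existing methods as instantiations of it.
By discussing the properties of each group of methods, we derive SimPool, a simple attention-based pooling mechanism that replaces the default one in both convolutional and transformer encoders.
We find that, whether supervised or self-supervised, this improves performance on pre-training and downstream tasks and yields attention maps that delineate object boundaries in all cases.
One could thus call SimPool universal.
To our knowledge, we are the first to obtain attention maps in supervised transformers of at least as good quality as self-supervised ones, without explicit losses or modifications to the architecture.
Code is available at https://github.com/billpsomas/simpool.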
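
To make the idea of attention-based pooling concrete, here is a minimal sketch in plain Python. It is not the SimPool implementation (see the repository above for that) and it makes several simplifying assumptions of its own: there are no learned projections, the query is simply the average of the patch features, and the function name `attention_pool` and the toy `patches` input are hypothetical. It only illustrates how a single pooled vector and a per-patch attention map can come out of the same operation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features):
    """Pool p patch feature vectors (each of length d) into one vector.

    Illustrative assumption: the query is the global average of the
    patches and no learned weights are used.
    Returns (pooled, attention): a length-d vector and p attention weights.
    """
    p, d = len(features), len(features[0])
    # Query: plain average of all patch features (global average pooling).
    query = [sum(f[j] for f in features) / p for j in range(d)]
    # Scaled dot-product similarity between the query and each patch.
    scores = [sum(q * x for q, x in zip(query, f)) / math.sqrt(d)
              for f in features]
    # Attention weights over patches; they are non-negative and sum to 1.
    attention = softmax(scores)
    # Pooled vector: attention-weighted average of the patch features.
    pooled = [sum(a * f[j] for a, f in zip(attention, features))
              for j in range(d)]
    return pooled, attention


# Toy input: three patches with two-dimensional features (hypothetical data).
patches = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
pooled, attention = attention_pool(patches)
```

Laying the `attention` weights back out on the patch grid gives a spatial attention map, which is the "by-product of pooling" the summary refers to.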

Abstract (original)

Convolutional networks and vision transformers have different forms of pairwise interactions, pooling across layers and pooling at the end of the network. Does the latter really need to be different? As a by-product of pooling, vision transformers provide spatial attention for free, but this is most often of low quality unless self-supervised, which is not well studied. Is supervision really the problem? In this work, we develop a generic pooling framework and then we formulate a number of existing methods as instantiations. By discussing the properties of each group of methods, we derive SimPool, a simple attention-based pooling mechanism as a replacement of the default one for both convolutional and transformer encoders. We find that, whether supervised or self-supervised, this improves performance on pre-training and downstream tasks and provides attention maps delineating object boundaries in all cases. One could thus call SimPool universal. To our knowledge, we are the first to obtain attention maps in supervised transformers of at least as good quality as self-supervised, without explicit losses or modifying the architecture. Code at: https://github.com/billpsomas/simpool.

arXiv information

Authors	Bill Psomas, Ioannis Kakogeorgiou, Konstantinos Karantzalos, Yannis Avrithis
Published	2023-09-13 11:28:27+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
