Vision Transformer Adapters for Generalizable Multitask Learning

Summary

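We introduce the first multitask vision transformer adapters that learn generalizable task affinities applicable to novel tasks and domains.
Integrated into an off-the-shelf vision transformer backbone, our adapters can solve multiple dense vision tasks simultaneously in a parameter-efficient manner, unlike existing multitask transformers, which are parametrically expensive.
In contrast to concurrent methods, our approach requires no retraining or fine-tuning whenever a new task or domain is added.
Within the adapter framework, we introduce a task-adapted attention mechanism that combines gradient-based task similarities with attention-based ones.
The learned task affinities generalize to three settings: zero-shot task transfer, unsupervised domain adaptation, and generalization to novel domains without fine-tuning.
We show that our approach outperforms not only existing convolutional neural network-based multitask methods but also vision transformer-based ones.
Our project page is at \url{https://ivrl.github.io/VTAGML}.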

Summary (original)

We introduce the first multitasking vision transformer adapters that learn generalizable task affinities which can be applied to novel tasks and domains. Integrated into an off-the-shelf vision transformer backbone, our adapters can simultaneously solve multiple dense vision tasks in a parameter-efficient manner, unlike existing multitasking transformers that are parametrically expensive. In contrast to concurrent methods, we do not require retraining or fine-tuning whenever a new task or domain is added. We introduce a task-adapted attention mechanism within our adapter framework that combines gradient-based task similarities with attention-based ones. The learned task affinities generalize to the following settings: zero-shot task transfer, unsupervised domain adaptation, and generalization without fine-tuning to novel domains. We demonstrate that our approach outperforms not only the existing convolutional neural network-based multitasking methods but also the vision transformer-based ones. Our project page is at \url{https://ivrl.github.io/VTAGML}.
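
The abstract describes combining gradient-based task similarities with attention-based ones but does not give the formulation. As a rough illustration only, the sketch below assumes the gradient-based similarity is the cosine similarity between per-task gradient vectors and that the two similarity sources are mixed with a simple weighted average. The function names, the toy gradients, the placeholder attention scores, and the `alpha` weight are all hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero gradient as unrelated to every task
    return dot / (norm_u * norm_v)


def gradient_affinity(task_grads):
    """Pairwise cosine similarity between per-task gradient vectors.

    Assumption: each task's gradient w.r.t. shared parameters has been
    flattened into a plain list of floats.
    """
    names = list(task_grads)
    return {
        (a, b): cosine_similarity(task_grads[a], task_grads[b])
        for a in names
        for b in names
    }


def combine_affinities(grad_aff, attn_aff, alpha=0.5):
    """Weighted average of two affinity tables (hypothetical mixing rule)."""
    return {
        pair: alpha * grad_aff[pair] + (1.0 - alpha) * attn_aff[pair]
        for pair in grad_aff
    }


# Toy gradients for three dense vision tasks (made-up numbers).
task_grads = {
    "segmentation": [1.0, 0.0, 1.0],
    "depth": [1.0, 0.0, 0.5],
    "normals": [-1.0, 0.5, 0.0],
}
grad_aff = gradient_affinity(task_grads)

# Placeholder attention-based similarities (made-up numbers).
attn_aff = {pair: (1.0 if pair[0] == pair[1] else 0.2) for pair in grad_aff}

combined = combine_affinities(grad_aff, attn_aff, alpha=0.5)
```

In this toy setup, tasks whose gradients point in similar directions (segmentation and depth) receive a higher combined affinity than tasks whose gradients conflict (segmentation and normals).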

arXiv information

Authors	Deblina Bhattacharjee, Sabine Süsstrunk, Mathieu Salzmann
Published	2023-08-23 18:40:48+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
