Swiss Army Knife: Synergizing Biases in Knowledge from Vision Foundation Models for Multi-Task Learning

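Abstract

Vision Foundation Models (VFMs) have demonstrated strong performance on many downstream tasks.
However, because of inherent representation biases that stem from their different training paradigms, each VFM has strengths and weaknesses across different vision tasks.
Fusing the strengths of multiple VFMs for downstream tasks is an intuitive strategy, but exploiting these biases effectively remains a major challenge.
This paper proposes a novel and versatile "Swiss Army Knife" (SAK) solution that adaptively distills knowledge from a committee of VFMs to enhance multi-task learning.
Unlike existing methods that use a single backbone for knowledge transfer, the approach preserves each teacher's unique representation bias by pairing lightweight Teacher-Specific Adapter Path modules with a Teacher-Agnostic Stem.
By dynamically selecting and combining representations with Mixture-of-Representations Routers, SAK can synergize the complementary strengths of multiple VFMs.
Extensive experiments show that SAK outperforms the prior state of the art in multi-task learning by 10% on the NYUD-v2 benchmark, while also providing a flexible and robust framework that can readily accommodate more advanced model designs.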

Abstract (original)

Vision Foundation Models (VFMs) have demonstrated outstanding performance on numerous downstream tasks. However, due to their inherent representation biases originating from different training paradigms, VFMs exhibit advantages and disadvantages across distinct vision tasks. Although amalgamating the strengths of multiple VFMs for downstream tasks is an intuitive strategy, effectively exploiting these biases remains a significant challenge. In this paper, we propose a novel and versatile ‘Swiss Army Knife’ (SAK) solution, which adaptively distills knowledge from a committee of VFMs to enhance multi-task learning. Unlike existing methods that use a single backbone for knowledge transfer, our approach preserves the unique representation bias of each teacher by collaborating the lightweight Teacher-Specific Adapter Path modules with the Teacher-Agnostic Stem. Through dynamic selection and combination of representations with Mixture-of-Representations Routers, our SAK is capable of synergizing the complementary strengths of multiple VFMs. Extensive experiments show that our SAK remarkably outperforms prior state of the arts in multi-task learning by 10% on the NYUD-v2 benchmark, while also providing a flexible and robust framework that can readily accommodate more advanced model designs.
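The abstract names three components (a Teacher-Agnostic Stem, Teacher-Specific Adapter Paths, and Mixture-of-Representations Routers) but gives no implementation details. The following is only a rough NumPy sketch of how such pieces might fit together, not the authors' implementation. Everything in it is an assumption made for illustration: the low-rank residual form of the adapter, the pooled softmax gate, the inclusion of the stem output among the routed candidates, the teacher and task names, and all shapes.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    # Numerically stable softmax over a 1-D vector.
    e = np.exp(x - x.max())
    return e / e.sum()

class AdapterPath:
    """Hypothetical lightweight per-teacher branch: low-rank residual on shared features."""
    def __init__(self, dim, rank):
        self.down = rng.normal(0.0, 0.02, (dim, rank))
        self.up = rng.normal(0.0, 0.02, (rank, dim))

    def __call__(self, shared):
        # shared: (tokens, dim) -> (tokens, dim)
        return shared + np.maximum(shared @ self.down, 0.0) @ self.up

class Router:
    """Hypothetical per-task gate: softmax weights over candidate representations."""
    def __init__(self, dim, n_reps):
        self.w = rng.normal(0.0, 0.02, (dim, n_reps))

    def __call__(self, reps):
        # reps: (n_reps, tokens, dim)
        pooled = reps.mean(axis=(0, 1))            # (dim,)
        gates = softmax(pooled @ self.w)           # (n_reps,), sums to 1
        mixed = np.tensordot(gates, reps, axes=1)  # weighted sum -> (tokens, dim)
        return mixed, gates

tokens, dim = 16, 32
shared = rng.normal(size=(tokens, dim))            # stand-in for the stem output
teachers = ["teacher_a", "teacher_b", "teacher_c"] # placeholder teacher names
paths = {t: AdapterPath(dim, rank=4) for t in teachers}

# Candidate representations: the shared one plus one per teacher (an assumption).
reps = np.stack([shared] + [paths[t](shared) for t in teachers])

tasks = ["segmentation", "depth"]                  # placeholder task names
routers = {task: Router(dim, n_reps=reps.shape[0]) for task in tasks}
outputs = {task: routers[task](reps) for task in tasks}
```

In this sketch each task gets its own gate over the same pool of representations, which is one plausible reading of "dynamic selection and combination"; the paper itself should be consulted for the actual design.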

arXiv information

Authors	Yuxiang Lu, Shengcao Cao, Yu-Xiong Wang
Published	2024-10-18 17:32:39+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
