LION: Implicit Vision Prompt Tuning

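Summary

Despite their recent competitive performance across a range of vision tasks, vision Transformers still suffer from heavy computational costs.
Vision prompt learning has recently offered an economical solution to this problem, since it avoids fine-tuning the whole large-scale model.
However, the efficiency of existing methods remains far from satisfactory because they insert extensive prompt blocks and rely on tricky prompt designs.
This paper proposes an efficient vision model named impLicit vIsion prOmpt tuNing (LION), motivated by deep implicit models, which keep memory costs stable across various complex tasks.
Specifically, LION merely inserts two equilibrium implicit layers at the two ends of the pre-trained main backbone while keeping the backbone's parameters frozen.
In addition, the parameters of these two layers are pruned according to the lottery ticket hypothesis.
The performance obtained by LION is promising on a wide range of datasets.
In particular, LION reduces the number of training parameters by up to 11.5% while achieving higher performance than the state-of-the-art baseline VPT, especially in challenging scenes.
Furthermore, the proposed LION generalizes well, making it an easy way to boost transfer learning in the future.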
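
As a rough illustration of the idea of wrapping a frozen backbone with two equilibrium implicit layers, here is a minimal pure-Python sketch. It is not the authors' implementation: the layer form `z = tanh(W z + U x + b)`, the residual wiring in `forward`, the toy `frozen_backbone`, and the `make_layer` initializer are all assumptions made for illustration. The point is that an equilibrium layer is defined by a fixed point rather than a stack of layers, so the solver only keeps the current iterate in memory.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def equilibrium_layer(x, W, U, b, tol=1e-8, max_iter=200):
    """Return z* with z* = tanh(W z* + U x + b), found by fixed-point iteration.

    Only the current iterate is stored, so memory does not grow with the
    number of iterations.
    """
    ux = matvec(U, x)
    z = [0.0] * len(b)
    for _ in range(max_iter):
        wz = matvec(W, z)
        z_next = [math.tanh(wz[i] + ux[i] + b[i]) for i in range(len(b))]
        if max(abs(p - q) for p, q in zip(z_next, z)) < tol:
            return z_next
        z = z_next
    return z


def make_layer(dim, rng, scale=0.1):
    """Hypothetical initializer; small entries in W keep the iteration a contraction."""
    W = [[rng.uniform(-scale, scale) for _ in range(dim)] for _ in range(dim)]
    U = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(dim)]
    return W, U, [0.0] * dim


def frozen_backbone(h):
    """Toy stand-in for the frozen pre-trained backbone (no trainable parameters)."""
    return [2.0 * v + 1.0 for v in h]


def forward(x, front, back):
    """Assumed wiring: residual implicit layer, frozen backbone, residual implicit layer."""
    h = [xi + zi for xi, zi in zip(x, equilibrium_layer(x, *front))]
    h = frozen_backbone(h)
    return [hi + zi for hi, zi in zip(h, equilibrium_layer(h, *back))]


rng = random.Random(0)
front, back = make_layer(4, rng), make_layer(4, rng)
print(forward([0.5, -1.0, 0.25, 2.0], front, back))
```

In this sketch only the `front` and `back` parameters would be trained; the backbone function is never updated, which mirrors the frozen-backbone setting described in the summary.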

Summary (original)

Despite recent competitive performance across a range of vision tasks, vision Transformers still suffer from heavy computational costs. Recently, vision prompt learning has provided an economical solution to this problem without fine-tuning the whole large-scale model. However, the efficiency of existing models is still far from satisfactory due to the insertion of extensive prompt blocks and tricky prompt designs. In this paper, we propose an efficient vision model named impLicit vIsion prOmpt tuNing (LION), which is motivated by deep implicit models with stable memory costs for various complex tasks. In particular, we merely insert two equilibrium implicit layers at the two ends of the pre-trained main backbone, with the parameters in the backbone frozen. Moreover, we prune the parameters in these two layers according to the lottery ticket hypothesis. The performance obtained by our LION is promising on a wide range of datasets. In particular, our LION reduces the number of training parameters by up to 11.5% while obtaining higher performance than the state-of-the-art baseline VPT, especially in challenging scenes. Furthermore, we find that our proposed LION has good generalization performance, making it an easy way to boost transfer learning in the future.
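
The abstract says only that the two added layers are pruned according to the lottery hypothesis and does not give the exact criterion. The sketch below therefore shows a common procedure from the lottery ticket literature, offered as an assumption rather than the paper's recipe: drop the smallest-magnitude trained weights, then restart the surviving weights from their initial values. The function names `magnitude_mask` and `rewind_to_init` and the example numbers are hypothetical.

```python
def magnitude_mask(trained_weights, prune_fraction):
    """Return a 0/1 mask that drops the smallest-magnitude fraction of weights."""
    n_prune = int(len(trained_weights) * prune_fraction)
    # Indices sorted from smallest to largest absolute trained value.
    order = sorted(range(len(trained_weights)), key=lambda i: abs(trained_weights[i]))
    mask = [1] * len(trained_weights)
    for i in order[:n_prune]:
        mask[i] = 0
    return mask


def rewind_to_init(init_weights, mask):
    """Surviving weights restart from their initial values; pruned ones become zero."""
    return [w * m for w, m in zip(init_weights, mask)]


# Illustrative numbers only (not from the paper).
init = [0.1, 0.2, -0.3, 0.4, 0.5, -0.6]
trained = [0.9, -0.05, 0.3, 0.01, -0.7, 0.2]
mask = magnitude_mask(trained, prune_fraction=0.5)
print(mask)
print(rewind_to_init(init, mask))
```

Because the backbone is frozen, a mask like this would apply only to the parameters of the two inserted layers, which is where any reduction in trainable parameters would come from.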

arXiv information

Authors	Haixin Wang, Jianlong Chang, Xiao Luo, Jinan Sun, Zhouchen Lin, Qi Tian
Published	2024-03-27 16:20:52+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
