iConFormer: Dynamic Parameter-Efficient Tuning with Input-Conditioned Adaptation

Abstract

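Transfer learning based on full fine-tuning (FFT) of a pre-trained encoder and a task-specific decoder is becoming increasingly complex as deep models grow exponentially. Parameter-efficient fine-tuning (PEFT) approaches that use adapters built from small learnable layers have emerged as an alternative to FFT, achieving comparable performance while keeping training efficient. However, the adapter's inflexibility with respect to input instances limits its ability to learn task-specific information across diverse downstream tasks. This paper proposes a new PEFT approach, the input-Conditioned transFormer (iConFormer), which uses a dynamic adapter conditioned on the input instance. To ensure flexible learning on input instances across a range of downstream tasks, the dynamic adapter includes an input-Conditioned Network (iCoN) that enables instance-level feature transformation. Specifically, iCoN generates channel-wise convolutional kernels for each feature and transforms the feature with an adaptive convolution, which lets it capture fine-grained, task-specific details suited to the downstream task. Experimental results show that, by tuning only 1.6% to 2.8% of the Transformer backbone parameters, iConFormer achieves performance comparable to FFT in monocular depth estimation and semantic segmentation, and outperforms FFT in image classification and instance segmentation. The proposed method also consistently outperforms recent PEFT methods on all of these tasks.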
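To make the idea of input-conditioned, channel-wise kernels more concrete, here is a minimal pure-Python sketch. It is not the authors' implementation: the names `make_kernels`, `depthwise_conv`, and `dynamic_adapter`, the weight matrix `proj`, and the choice to generate kernels from globally average-pooled features through a single linear map are all assumptions made for illustration. The abstract only states that iCoN generates one convolutional kernel per channel from the input and applies it as an adaptive convolution.

```python
import random


def make_kernels(x, proj, k):
    """Generate one k x k kernel per channel from the input itself.

    x:    feature map as nested lists, shape C x H x W
    proj: hypothetical linear generator weights, shape (C*k*k) x C
    """
    channels = len(x)
    # Global average pooling: one scalar summary per channel (an assumption).
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]
    # Linear map from the pooled vector to C*k*k kernel entries.
    flat = [sum(w * p for w, p in zip(row, pooled)) for row in proj]
    # Reshape the flat vector into C kernels of size k x k.
    return [
        [flat[(c * k + i) * k:(c * k + i + 1) * k] for i in range(k)]
        for c in range(channels)
    ]


def depthwise_conv(x, kernels):
    """Apply kernel c to channel c only, with zero padding (same output size)."""
    k = len(kernels[0])
    pad = k // 2
    out = []
    for ch, ker in zip(x, kernels):
        height, width = len(ch), len(ch[0])
        res = [[0.0] * width for _ in range(height)]
        for i in range(height):
            for j in range(width):
                s = 0.0
                for di in range(k):
                    for dj in range(k):
                        ii, jj = i + di - pad, j + dj - pad
                        # Positions outside the map count as zero.
                        if 0 <= ii < height and 0 <= jj < width:
                            s += ker[di][dj] * ch[ii][jj]
                res[i][j] = s
        out.append(res)
    return out


def dynamic_adapter(x, proj, k=3):
    """Input-conditioned transform: the kernels depend on x, then filter x."""
    return depthwise_conv(x, make_kernels(x, proj, k))


if __name__ == "__main__":
    random.seed(0)
    C, H, W, K = 2, 4, 4, 3
    # Only these generator weights would be trained; the backbone stays frozen.
    proj = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(C * K * K)]
    x = [[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
    y = dynamic_adapter(x, proj, K)
    print(len(y), len(y[0]), len(y[0][0]))
```

The point of the sketch is the contrast with a static adapter: a static adapter applies the same learned weights to every input, whereas here two different inputs produce two different sets of kernels from the same trainable `proj`.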

Abstract (original)

Transfer learning based on full fine-tuning (FFT) of the pre-trained encoder and task-specific decoder becomes increasingly complex as deep models grow exponentially. Parameter-efficient fine-tuning (PEFT) approaches using adapters consisting of small learnable layers have emerged as an alternative to FFT, achieving comparable performance while maintaining high training efficiency. However, the inflexibility of the adapter with respect to input instances limits its capability of learning task-specific information in diverse downstream tasks. In this paper, we propose a novel PEFT approach, input-Conditioned transFormer, termed iConFormer, that leverages a dynamic adapter conditioned on the input instances. To secure flexible learning ability on input instances in various downstream tasks, we introduce an input-Conditioned Network (iCoN) in the dynamic adapter that enables instance-level feature transformation. To be specific, iCoN generates channel-wise convolutional kernels for each feature and transforms it using an adaptive convolution process to effectively capture task-specific and fine-grained details tailored to downstream tasks. Experimental results demonstrate that by tuning just 1.6% to 2.8% of the Transformer backbone parameters, iConFormer achieves performance comparable to FFT in monocular depth estimation and semantic segmentation, while outperforming it in image classification and instance segmentation. Also, the proposed method consistently outperforms recent PEFT methods for all the tasks mentioned above.

arXiv information

Authors	Hayeon Jo, Hyesong Choi, Minhee Cho, Dongbo Min
Published	2025-04-04 14:33:25+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
