Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation

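Summary

Parameter-efficient tuning (PET) has attracted attention because it reduces the number of trained parameters while maintaining performance and saves hardware resources, but few studies have examined dense prediction tasks or the interaction between modalities.
This paper investigates efficient tuning for referring image segmentation.
We propose a novel adapter called Bridger that facilitates cross-modal information exchange and injects task-specific information into the pre-trained model.
We also design a lightweight decoder for image segmentation.
Evaluated on challenging benchmarks, our approach achieves comparable or superior performance with only 1.61\% to 3.38\% backbone parameter updates.
The code is available at \url{https://github.com/kkakkkka/ETRIS}.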
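
To give a feel for how a small adapter can amount to only a few percent of a frozen backbone, the arithmetic can be sketched as below. This is a rough illustration, not the Bridger design: it assumes a generic bottleneck adapter (a down-projection followed by an up-projection), and the backbone size, layer widths, and adapter count are made-up values, not figures from the paper. It also assumes the percentage is measured relative to the backbone parameter count.

```python
def linear_params(d_in: int, d_out: int, bias: bool = True) -> int:
    """Number of weights (and optional biases) in one dense layer."""
    return d_in * d_out + (d_out if bias else 0)


def bottleneck_adapter_params(d_model: int, d_bottleneck: int) -> int:
    """Generic bottleneck adapter: down-projection then up-projection, both with bias."""
    return linear_params(d_model, d_bottleneck) + linear_params(d_bottleneck, d_model)


def trainable_fraction(frozen: int, trainable: int) -> float:
    """Trainable parameters expressed relative to the frozen backbone size."""
    return trainable / frozen


# Illustrative numbers only (assumptions, not taken from the paper).
backbone = 100_000_000   # hypothetical frozen backbone parameter count
num_adapters = 12        # hypothetical: one adapter per backbone layer
adapter = bottleneck_adapter_params(d_model=768, d_bottleneck=64)
ratio = trainable_fraction(backbone, adapter * num_adapters)

print(f"{adapter} parameters per adapter, {ratio:.2%} of the backbone")
```

Shrinking the bottleneck width or attaching adapters to fewer layers lowers the ratio further, which is the basic trade-off any parameter-efficient method has to balance against accuracy.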

Summary (original)

Parameter Efficient Tuning (PET) has gained attention for reducing the number of parameters while maintaining performance and providing better hardware resource savings, but few studies investigate dense prediction tasks and interaction between modalities. In this paper, we do an investigation of efficient tuning problems on referring image segmentation. We propose a novel adapter called Bridger to facilitate cross-modal information exchange and inject task-specific information into the pre-trained model. We also design a lightweight decoder for image segmentation. Our approach achieves comparable or superior performance with only 1.61\% to 3.38\% backbone parameter updates, evaluated on challenging benchmarks. The code is available at \url{https://github.com/kkakkkka/ETRIS}.

arXiv information

Authors	Zunnan Xu, Zhihong Chen, Yong Zhang, Yibing Song, Xiang Wan, Guanbin Li
Published	2023-07-21 12:46:15+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
