UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers

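Summary

Real-world data contains a vast amount of multimodal information, of which vision and language are the two most representative modalities.
Moreover, increasingly heavy models such as Transformers have drawn researchers' attention to model compression.
However, how to compress multimodal models, especially vision-language Transformers, remains under-explored.
This paper proposes Unified and Progressive Pruning (UPop) as a universal vision-language Transformer compression framework. It incorporates two components:
1) Unified search for multimodal subnets in a continuous optimization space starting from the original model, which enables automatic assignment of pruning ratios among compressible modalities and structures.
2) Progressive search and retraining of the subnet, which maintains convergence between search and retraining and attains higher compression ratios.
Experiments on multiple generative and discriminative vision-language tasks, including Visual Reasoning, Image Caption, Visual Question Answer, Image-Text Retrieval, Text-Image Retrieval, and Image Classification, demonstrate the effectiveness and versatility of the proposed UPop framework.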

Abstract (original)

Real-world data contains a vast amount of multimodal information, among which vision and language are the two most representative modalities. Moreover, increasingly heavier models, e.g., Transformers, have attracted the attention of researchers to model compression. However, how to compress multimodal models, especially vision-language Transformers, is still under-explored. This paper proposes the Unified and Progressive Pruning (UPop) as a universal vision-language Transformer compression framework, which incorporates 1) unifiedly searching multimodal subnets in a continuous optimization space from the original model, which enables automatic assignment of pruning ratios among compressible modalities and structures; 2) progressively searching and retraining the subnet, which maintains convergence between the search and retrain to attain higher compression ratios. Experiments on multiple generative and discriminative vision-language tasks, including Visual Reasoning, Image Caption, Visual Question Answer, Image-Text Retrieval, Text-Image Retrieval, and Image Classification, demonstrate the effectiveness and versatility of the proposed UPop framework.
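
The two ideas named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the fixed importance scores, and the linear ramp of the pruning ratio are all assumptions made for illustration. In the sketch, "unified" means that units from every modality are ranked in one shared pool, so per-modality pruning ratios fall out of the ranking rather than being set by hand; "progressive" means that the pruned fraction grows step by step toward the target instead of being applied at once. A real system would retrain the model and update the scores between steps, which is omitted here.

```python
def progressive_schedule(target_ratio, num_steps):
    """Hypothetical linear ramp of the pruning ratio up to target_ratio."""
    return [target_ratio * step / num_steps for step in range(1, num_steps + 1)]


def unified_prune_mask(scores, ratio):
    """Rank all units from all groups jointly and drop the lowest fraction.

    scores: dict mapping a group name (e.g. a modality) to a list of
            importance scores, one per prunable unit (assumed given).
    ratio:  fraction of all units, pooled across groups, to prune.
    Returns a dict mapping each group name to a list of 0/1 keep flags.
    """
    pooled = [
        (score, name, index)
        for name, values in scores.items()
        for index, score in enumerate(values)
    ]
    pooled.sort()  # lowest scores first, regardless of group
    num_pruned = int(ratio * len(pooled) + 1e-9)
    pruned = {(name, index) for _, name, index in pooled[:num_pruned]}
    return {
        name: [0 if (name, index) in pruned else 1 for index in range(len(values))]
        for name, values in scores.items()
    }


def progressive_unified_prune(scores, target_ratio, num_steps):
    """Apply the unified mask at each step of the ramp and return the last one.

    Scores stay fixed in this sketch; retraining between steps is omitted.
    """
    masks = {name: [1] * len(values) for name, values in scores.items()}
    for ratio in progressive_schedule(target_ratio, num_steps):
        masks = unified_prune_mask(scores, ratio)
    return masks


# Made-up scores for two modalities with four prunable units each.
example_scores = {
    "vision": [0.9, 0.1, 0.8, 0.2],
    "language": [0.7, 0.6, 0.3, 0.5],
}
final_masks = progressive_unified_prune(example_scores, target_ratio=0.375, num_steps=3)
for group, mask in final_masks.items():
    print(group, mask, "pruned fraction:", 1 - sum(mask) / len(mask))
```

With these made-up scores, the shared ranking removes more units from the vision group than from the language group, although only a single global ratio was specified.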

arXiv information

Authors	Dachuan Shi, Chaofan Tao, Ying Jin, Zhendong Yang, Chun Yuan, Jiaqi Wang
Published	2023-01-31 16:18:52+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
