Foundation Models on a Budget: Approximating Blocks in Large Vision Models

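Summary

Foundation models perform impressively across many tasks and domains, but they demand massive computational resources, which raises concerns about accessibility and sustainability.
Earlier attempts to shrink foundation models do not fully solve the problem, because their additional training steps end up increasing the computational load.
Recent work shows that deep neural networks exhibit similarities among their internal representations.
Similarities between networks have enabled techniques such as model stitching and merging, but similarities within a single network remain underexplored as a way to improve efficiency.
This paper proposes Transformer Blocks Approximation (TBA), a new method that uses intra-network similarities to identify and approximate transformer blocks in large vision models.
TBA replaces these blocks with lightweight, closed-form transformations, without retraining or fine-tuning the rest of the model.
The proposed method reduces the number of parameters while keeping the impact on the downstream task minimal.
The effectiveness and generalizability of TBA are validated through extensive experiments across multiple datasets (e.g., ImageNet-1k and CIFAR100) and state-of-the-art pretrained vision models (e.g., ViT, DINOv2, and DeiT).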
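
As an illustration of what a "closed-form transformation" replacing a block could look like, here is a minimal sketch that fits a single linear map from a block's input activations to its output activations by ordinary least squares. This is an assumption made for illustration, not necessarily the transformation TBA uses: the summary does not specify it. The names `fit_linear_map` and `relative_error` are hypothetical, and the "block" is a synthetic stand-in rather than a real vision model.

```python
import numpy as np


def fit_linear_map(x_in: np.ndarray, x_out: np.ndarray) -> np.ndarray:
    """Return the least-squares T with x_in @ T ~= x_out (closed form, no gradient steps)."""
    t, *_ = np.linalg.lstsq(x_in, x_out, rcond=None)
    return t


def relative_error(approx: np.ndarray, target: np.ndarray) -> float:
    """Frobenius-norm error of approx relative to the norm of target."""
    return float(np.linalg.norm(approx - target) / np.linalg.norm(target))


rng = np.random.default_rng(0)
n_tokens, dim = 512, 16

# Hypothetical stand-in for the activations entering a transformer block.
x_in = rng.normal(size=(n_tokens, dim))

# Hypothetical stand-in for the block itself: a residual update that is mostly
# linear plus a small nonlinear term, so its output stays similar to its input.
w = rng.normal(size=(dim, dim)) / np.sqrt(dim)
x_out = x_in + 0.1 * (x_in @ w) + 0.01 * np.tanh(x_in)

t = fit_linear_map(x_in, x_out)
err_linear = relative_error(x_in @ t, x_out)
err_identity = relative_error(x_in, x_out)  # baseline: skip the block entirely

print(f"identity: {err_identity:.4f}  linear map: {err_linear:.4f}")
```

In a real model, `x_in` and `x_out` would be activations recorded at a block's input and output on a small set of images. The fitted matrix would then stand in for the block at inference time while the rest of the network is left unchanged.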

Summary (original)

Foundation Models have shown impressive performance in various tasks and domains, yet they require massive computational resources, raising concerns about accessibility and sustainability. Previous attempts to reduce foundation model size fall short of fully addressing the problem, as they end up increasing computational load through additional training steps. Recent works reveal that deep neural networks exhibit internal representation similarities. While inter-network similarities have enabled techniques such as model stitching and merging, intra-network similarities remain underexplored for improving efficiency. In this paper, we propose Transformer Blocks Approximation (TBA), a novel method that leverages intra-network similarities to identify and approximate transformer blocks in large vision models. TBA replaces these blocks using lightweight, closed-form transformations, without retraining or fine-tuning the rest of the model. The proposed method reduces the number of parameters while having minimal impact on the downstream task. We validate the effectiveness and generalizability of TBA through extensive experiments across multiple datasets (e.g., Imagenet-1k and CIFAR100) and state-of-the-art pretrained vision models (e.g., ViT, DiNO-v2, and DEiT).

arXiv information

Authors	Irene Cannistraci,Simone Antonelli,Emanuele Palumbo,Thomas M. Sutter,Emanuele Rodolà,Bastian Rieck,Julia E. Vogt
Published	2025-05-27 16:22:32+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
