LiFT: A Surprisingly Simple Lightweight Feature Transform for Dense ViT Descriptors

Abstract

We present a simple self-supervised method to enhance the performance of ViT features for dense downstream tasks. Our Lightweight Feature Transform (LiFT) is a straightforward and compact postprocessing network that can be applied to enhance the features of any pre-trained ViT backbone. LiFT is fast and easy to train with a self-supervised objective, and it boosts the density of ViT features at minimal extra inference cost. Furthermore, we demonstrate that LiFT can be applied with approaches that use additional task-specific downstream modules, as we integrate LiFT with ViTDet for COCO detection and segmentation. Despite the simplicity of LiFT, we find that it is not simply learning a more complex version of bilinear interpolation. Instead, our LiFT training protocol leads to several desirable emergent properties that benefit ViT features in dense downstream tasks. These include greater scale invariance for features and better object boundary maps. By simply training LiFT for a few epochs, we show improved performance on keypoint correspondence, detection, segmentation, and object discovery tasks. Overall, LiFT provides an easy way to unlock the benefits of denser feature arrays for a fraction of the computational cost. For more details, refer to our project page at https://www.cs.umd.edu/~sakshams/LiFT/.
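
For context on the bilinear-interpolation baseline mentioned in the abstract, the sketch below upsamples a grid of patch features by plain bilinear interpolation. It is not LiFT and not the authors' code: the function name `bilinear_upsample`, the `factor` parameter, the corner-aligned sampling convention, and the nested-list layout `[rows][cols][channels]` are all assumptions chosen for illustration. As an example of scale, a ViT with 16×16 patches on a 224×224 image produces a 14×14 feature grid, which this baseline would turn into 28×28 at `factor=2`.

```python
from typing import List

# Assumed layout for illustration: features[row][col][channel]
Grid = List[List[List[float]]]


def bilinear_upsample(features: Grid, factor: int = 2) -> Grid:
    """Upsample a patch-feature grid by `factor` using bilinear interpolation.

    Corner-aligned sampling is assumed: the four corner cells of the output
    coincide with the four corner cells of the input.
    """
    h, w = len(features), len(features[0])
    channels = len(features[0][0])
    out_h, out_w = h * factor, w * factor

    out: Grid = []
    for i in range(out_h):
        # Map output row i to a fractional input row.
        y = i * (h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            # Map output column j to a fractional input column.
            x = j * (w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            wx = x - x0
            cell = []
            for c in range(channels):
                # Blend horizontally on the two neighbouring rows, then vertically.
                top = (1 - wx) * features[y0][x0][c] + wx * features[y0][x1][c]
                bottom = (1 - wx) * features[y1][x0][c] + wx * features[y1][x1][c]
                cell.append((1 - wy) * top + wy * bottom)
            row.append(cell)
        out.append(row)
    return out


# Toy 2x2 grid with one channel per patch.
coarse = [[[0.0], [1.0]], [[2.0], [3.0]]]
dense = bilinear_upsample(coarse, factor=2)
```

Every output value here is a fixed weighted average of neighbouring input features, with no learned parameters. The abstract's claim is that LiFT does more than this kind of smoothing.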

arXiv information

Authors	Saksham Suri, Matthew Walmer, Kamal Gupta, Abhinav Shrivastava
Published	2024-03-21 17:59:55+00:00
