Learning Customized Visual Models with Retrieval-Augmented Knowledge

Abstract

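Image-text contrastive learning models such as CLIP show strong task transfer ability.
The generality and usability of these visual models come from a web-scale data collection process that ensures broad concept coverage, followed by expensive pre-training that feeds all of that knowledge into the model weights.
As an alternative, we propose REACT (REtrieval-Augmented CusTomization), a framework that retrieves relevant web knowledge to build visual models customized for a target domain.
We retrieve the most relevant image-text pairs (about 3% of the CLIP pre-training data) from a web-scale database as external knowledge, and we customize the model by training only new modularized blocks while keeping all of the original weights frozen.
The effectiveness of REACT is demonstrated through extensive experiments on classification, retrieval, detection, and segmentation tasks, covering zero-shot, few-shot, and full-shot settings.
In particular, on zero-shot classification, REACT improves over CLIP by up to 5.4% on ImageNet and by up to 3.7% on the ELEVATER benchmark (20 datasets).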
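
The retrieval step described in the abstract can be illustrated with a toy example. The sketch below is not the authors' pipeline: it assumes that target-domain concepts and database image-text pairs are already embedded as vectors, and it ranks pairs by cosine similarity. The function names `cosine` and `retrieve_top_k`, the pair ids, and the 2-D embeddings are all hypothetical and chosen only for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve_top_k(query_embeddings, database, k):
    """Collect the ids of the k most similar database pairs for each query.

    query_embeddings: one vector per target-domain concept (hypothetical input).
    database: list of (pair_id, embedding) tuples standing in for a web-scale index.
    Ids are returned in retrieval order, without duplicates.
    """
    selected = []
    seen = set()
    for q in query_embeddings:
        # Brute-force ranking; a real web-scale index would need approximate search.
        ranked = sorted(database, key=lambda item: cosine(q, item[1]), reverse=True)
        for pair_id, _ in ranked[:k]:
            if pair_id not in seen:
                seen.add(pair_id)
                selected.append(pair_id)
    return selected

# Toy data: embeddings invented for illustration only.
database = [
    ("pair_a", [1.0, 0.0]),
    ("pair_b", [0.9, 0.1]),
    ("pair_c", [0.0, 1.0]),
    ("pair_d", [-1.0, 0.0]),
]
queries = [[1.0, 0.05], [0.1, 1.0]]
retrieved = retrieve_top_k(queries, database, k=1)
```

With `k=1`, each query contributes its single nearest pair, so only a small, domain-relevant subset of the database is kept for customization.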

Abstract (original)

Image-text contrastive learning models such as CLIP have demonstrated strong task transfer ability. The high generality and usability of these visual models are achieved via a web-scale data collection process to ensure broad concept coverage, followed by expensive pre-training to feed all the knowledge into model weights. Alternatively, we propose REACT, REtrieval-Augmented CusTomization, a framework to acquire the relevant web knowledge to build customized visual models for target domains. We retrieve the most relevant image-text pairs (~3% of CLIP pre-training data) from the web-scale database as external knowledge, and propose to customize the model by only training new modularized blocks while freezing all the original weights. The effectiveness of REACT is demonstrated via extensive experiments on classification, retrieval, detection and segmentation tasks, including zero, few, and full-shot settings. Particularly, on the zero-shot classification task, compared with CLIP, it achieves up to 5.4% improvement on ImageNet and 3.7% on the ELEVATER benchmark (20 datasets).
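
The idea of training only newly added parameters while the original weights stay frozen can be shown with a deliberately tiny example. This is a scalar toy under stated assumptions, not the REACT architecture: the "pre-trained model" is a single weight `frozen_w`, the "new block" is a single additive weight `new_w`, and the names `predict` and `customize` are hypothetical.

```python
def predict(x, frozen_w, new_w):
    """Output of the frozen base model plus the contribution of the new block."""
    return frozen_w * x + new_w * x

def customize(data, frozen_w, steps=200, lr=0.05):
    """Fit only new_w by gradient descent on mean squared error.

    frozen_w is read but never updated, mirroring the frozen original weights.
    """
    new_w = 0.0  # zero init: the customized model starts out identical to the frozen one
    for _ in range(steps):
        grad = 0.0
        for x, y in data:
            error = predict(x, frozen_w, new_w) - y
            grad += 2.0 * error * x  # d(error^2)/d(new_w)
        new_w -= lr * grad / len(data)
    return new_w

frozen_w = 1.0                                 # stands in for pre-trained weights
data = [(1.0, 1.5), (2.0, 3.0), (3.0, 4.5)]    # toy target-domain data, y = 1.5 * x
new_w = customize(data, frozen_w)
```

Because the new weight starts at zero, the model initially reproduces the frozen model exactly, and all adaptation to the target data is carried by the added parameter.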

arXiv information

Authors	Haotian Liu, Kilho Son, Jianwei Yang, Ce Liu, Jianfeng Gao, Yong Jae Lee, Chunyuan Li
Published	2023-01-17 18:59:06+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
