Chameleon: A Data-Efficient Generalist for Dense Visual Prediction in the Wild

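Summary

Large language models have evolved into data-efficient generalists, benefiting from a universal language interface and large-scale pre-training.
However, building a data-efficient generalist for dense visual prediction poses a distinct challenge because label structures differ from task to task.
As a result, generalizing to unseen dense prediction tasks in the low-data regime is not straightforward and has received little attention from earlier vision generalists.
In this work, we explore a universal model that can flexibly adapt to unseen dense label structures from a few examples, so that it can serve as a data-efficient vision generalist across diverse real-world scenarios.
To this end, we build our method on a powerful meta-learning framework and explore several axes, such as flexible adaptation mechanisms and scalability, to improve its performance and versatility on real-world problems.
We evaluate the model across a range of unseen real-world scenarios where low-shot learning is desirable, including video, 3D, medical, biological, and user-interactive tasks.
Equipped with a generic architecture and an effective adaptation mechanism, our model flexibly adapts to all of these tasks with at most 50 labeled images, a significant advance over existing data-efficient generalist approaches.
Code is available at https://github.com/GitGyun/chameleon.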

Abstract (original)

Large language models have evolved data-efficient generalists, benefiting from the universal language interface and large-scale pre-training. However, constructing a data-efficient generalist for dense visual prediction presents a distinct challenge due to the variation in label structures across different tasks. Consequently, generalization to unseen dense prediction tasks in the low-data regime is not straightforward and has received less attention from previous vision generalists. In this study, we explore a universal model that can flexibly adapt to unseen dense label structures with a few examples, enabling it to serve as a data-efficient vision generalist in diverse real-world scenarios. To this end, we base our method on a powerful meta-learning framework and explore several axes to improve its performance and versatility for real-world problems, such as flexible adaptation mechanisms and scalability. We evaluate our model across a spectrum of unseen real-world scenarios where low-shot learning is desirable, including video, 3D, medical, biological, and user-interactive tasks. Equipped with a generic architecture and an effective adaptation mechanism, our model flexibly adapts to all of these tasks with at most 50 labeled images, showcasing a significant advancement over existing data-efficient generalist approaches. Codes are available at https://github.com/GitGyun/chameleon.

arXiv information

Authors	Donggyun Kim, Seongwoong Cho, Semin Kim, Chong Luo, Seunghoon Hong
Published	2024-11-18 13:03:19+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
