Advances in Multimodal Adaptation and Generalization: From Traditional Approaches to Foundation Models

Abstract

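In real-world scenarios, achieving domain adaptation and generalization is a significant challenge, because models must adapt to or generalize across unknown target distributions. Extending these capabilities to unseen multimodal distributions, that is, multimodal domain adaptation and generalization, is even harder because of the distinct characteristics of the different modalities. Significant progress has been made over the years, with applications ranging from action recognition to semantic segmentation. In addition, the emergence of large-scale pre-trained multimodal foundation models such as CLIP has inspired work that leverages these models to improve adaptation and generalization performance, or that adapts them to downstream tasks. This survey provides the first comprehensive review of recent advances from traditional approaches to foundation models, covering (1) multimodal domain adaptation, (2) multimodal test-time adaptation, (3) multimodal domain generalization, (4) domain adaptation and generalization with the help of multimodal foundation models, and (5) adaptation of multimodal foundation models. For each topic, we formally define the problem and thoroughly review existing methods. We also analyze relevant datasets and applications and highlight open challenges and potential future research directions. We maintain an active repository containing up-to-date literature at https://github.com/donghao51/Awesome-Multimodal-Adaptation.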

Abstract (original)

In real-world scenarios, achieving domain adaptation and generalization poses significant challenges, as models must adapt to or generalize across unknown target distributions. Extending these capabilities to unseen multimodal distributions, i.e., multimodal domain adaptation and generalization, is even more challenging due to the distinct characteristics of different modalities. Significant progress has been made over the years, with applications ranging from action recognition to semantic segmentation. Besides, the recent advent of large-scale pre-trained multimodal foundation models, such as CLIP, has inspired works leveraging these models to enhance adaptation and generalization performances or adapting them to downstream tasks. This survey provides the first comprehensive review of recent advances from traditional approaches to foundation models, covering: (1) Multimodal domain adaptation; (2) Multimodal test-time adaptation; (3) Multimodal domain generalization; (4) Domain adaptation and generalization with the help of multimodal foundation models; and (5) Adaptation of multimodal foundation models. For each topic, we formally define the problem and thoroughly review existing methods. Additionally, we analyze relevant datasets and applications, highlighting open challenges and potential future research directions. We maintain an active repository that contains up-to-date literature at https://github.com/donghao51/Awesome-Multimodal-Adaptation.

arXiv information

Authors	Hao Dong, Moru Liu, Kaiyang Zhou, Eleni Chatzi, Juho Kannala, Cyrill Stachniss, Olga Fink
Published	2025-02-03 16:01:11+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
