Multimodal Prompt Perceiver: Empower Adaptiveness, Generalizability and Fidelity for All-in-One Image Restoration

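Summary

Despite significant progress, all-in-one image restoration (IR) still struggles to handle complex real-world degradations.
This paper introduces MPerceiver, a new multimodal prompt learning approach that uses Stable Diffusion (SD) priors to improve the adaptiveness, generalizability, and fidelity of all-in-one image restoration.
Specifically, we develop a dual-branch module that learns two types of SD prompts: textual prompts for holistic representation and visual prompts for multiscale detail representation.
Both prompts are dynamically adjusted by degradation predictions from the CLIP image encoder, allowing adaptive responses to diverse unknown degradations.
In addition, a plug-in detail refinement module improves restoration fidelity through direct encoder-to-decoder information transformation.
To evaluate the method, MPerceiver is trained on 9 tasks for all-in-one IR and outperforms state-of-the-art task-specific methods on most of them.
After multitask pre-training, MPerceiver attains a generalized representation in low-level vision and shows remarkable zero-shot and few-shot capabilities on unseen tasks.
Extensive experiments on 16 IR tasks underscore MPerceiver's advantages in adaptiveness, generalizability, and fidelity.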
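
The abstract says that both prompts are adjusted by degradation predictions, but it does not say how. One plausible reading, shown here only as a minimal sketch and not as the authors' implementation, is a probability-weighted blend over a pool of per-degradation prompt vectors. The names `softmax`, `mix_prompts`, `degradation_logits`, and `prompt_pool` are hypothetical, and plain Python lists stand in for the real CLIP outputs and SD prompt embeddings.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_prompts(degradation_logits, prompt_pool):
    """Blend one prompt vector per degradation type by predicted probability.

    Assumption: degradation_logits holds one score per degradation type
    (for example from a classifier on top of an image encoder), and
    prompt_pool holds one equally sized prompt vector per type.
    """
    if len(degradation_logits) != len(prompt_pool):
        raise ValueError("need exactly one prompt vector per degradation type")
    weights = softmax(degradation_logits)
    dim = len(prompt_pool[0])
    return [
        sum(w * prompt[i] for w, prompt in zip(weights, prompt_pool))
        for i in range(dim)
    ]


# Hypothetical pool: one 2-dimensional prompt each for "rain" and "noise".
pool = [[1.0, 0.0], [0.0, 1.0]]
print(mix_prompts([0.0, 0.0], pool))  # equal scores give an even blend
```

With a confident prediction the blend collapses toward a single prompt, and with an uncertain one it interpolates between prompts, which is one way a single model could respond to unknown or mixed degradations.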

Abstract (original)

Despite substantial progress, all-in-one image restoration (IR) grapples with persistent challenges in handling intricate real-world degradations. This paper introduces MPerceiver: a novel multimodal prompt learning approach that harnesses Stable Diffusion (SD) priors to enhance adaptiveness, generalizability and fidelity for all-in-one image restoration. Specifically, we develop a dual-branch module to master two types of SD prompts: textual for holistic representation and visual for multiscale detail representation. Both prompts are dynamically adjusted by degradation predictions from the CLIP image encoder, enabling adaptive responses to diverse unknown degradations. Moreover, a plug-in detail refinement module improves restoration fidelity via direct encoder-to-decoder information transformation. To assess our method, MPerceiver is trained on 9 tasks for all-in-one IR and outperforms state-of-the-art task-specific methods across most tasks. Post multitask pre-training, MPerceiver attains a generalized representation in low-level vision, exhibiting remarkable zero-shot and few-shot capabilities in unseen tasks. Extensive experiments on 16 IR tasks underscore the superiority of MPerceiver in terms of adaptiveness, generalizability and fidelity.

arXiv information

Authors	Yuang Ai, Huaibo Huang, Xiaoqiang Zhou, Jiexiang Wang, Ran He
Published	2024-03-20 16:12:57+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
