Infrared and visible Image Fusion with Language-driven Loss in CLIP Embedding Space

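Summary

Infrared-visible image fusion (IVIF) has attracted considerable attention because the two image modalities are highly complementary.
Because ground-truth fused images are unavailable, the output of current deep-learning-based methods depends heavily on mathematically defined loss functions.
Without ground truth, it is hard to define the fused image well in mathematical terms, which limits the performance of existing fusion methods.
In this paper, we first propose expressing the objective of IVIF in natural language. This avoids the explicit mathematical modeling of the fusion output required by current losses and makes full use of the advantages of language expression to improve fusion performance.
To this end, we present a comprehensive fusion objective expressed in language and use CLIP to encode the relevant texts into the multi-modal embedding space.
A language-driven fusion model is then constructed in the embedding space by establishing the relationships among the embedding vectors that represent the fusion objective and the input image modalities.
Finally, a language-driven loss is derived so that, via supervised training, the actual IVIF is aligned with the embedded language-driven fusion model.
Experiments show that our method obtains much better fusion results than existing techniques.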

Summary (original)

Infrared-visible image fusion (IVIF) has attracted much attention owing to the highly-complementary properties of the two image modalities. Due to the lack of ground-truth fused images, the fusion output of current deep-learning based methods heavily depends on the loss functions defined mathematically. As it is hard to well mathematically define the fused image without ground truth, the performance of existing fusion methods is limited. In this paper, we first propose to use natural language to express the objective of IVIF, which can avoid the explicit mathematical modeling of fusion output in current losses, and make full use of the advantage of language expression to improve the fusion performance. For this purpose, we present a comprehensive language-expressed fusion objective, and encode relevant texts into the multi-modal embedding space using CLIP. A language-driven fusion model is then constructed in the embedding space, by establishing the relationship among the embedded vectors to represent the fusion objective and input image modalities. Finally, a language-driven loss is derived to make the actual IVIF aligned with the embedded language-driven fusion model via supervised training. Experiments show that our method can obtain much better fusion results than existing techniques.
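
The abstract does not give the formula for the language-driven loss, so the following is only a minimal sketch of one plausible reading, not the authors' formulation. It assumes a directional loss: the shift from each input image embedding to the fused image embedding should point the same way as the shift from the text describing that input to the text describing the fused image. All function names are hypothetical, and the short lists stand in for embeddings that a real CLIP encoder would produce.

```python
import math


def subtract(a, b):
    """Element-wise difference a - b of two equal-length vectors."""
    return [x - y for x, y in zip(a, b)]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def directional_loss(img_src, img_fused, txt_src, txt_fused):
    """1 - cos between the image-space shift and the text-space shift.

    Assumption: this directional form is an illustration only; the
    abstract does not state the actual loss.
    """
    image_shift = subtract(img_fused, img_src)
    text_shift = subtract(txt_fused, txt_src)
    return 1.0 - cosine(image_shift, text_shift)


def language_driven_loss(e_ir, e_vis, e_fused, t_ir, t_vis, t_fused):
    """Sum of the directional terms for the infrared and visible inputs."""
    return (directional_loss(e_ir, e_fused, t_ir, t_fused)
            + directional_loss(e_vis, e_fused, t_vis, t_fused))


# Toy 3-D vectors standing in for CLIP embeddings (hypothetical values).
e_ir, e_vis = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
t_ir, t_vis, t_fused = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]

loss_good = language_driven_loss(e_ir, e_vis, [1.0, 1.0, 0.0], t_ir, t_vis, t_fused)
loss_bad = language_driven_loss(e_ir, e_vis, [1.0, 0.0, 1.0], t_ir, t_vis, t_fused)
```

In this toy setup, a fused embedding that moves the way the text embeddings say it should yields a lower loss than one that drifts in an unrelated direction. In practice the image embeddings would come from a differentiable image encoder so that the loss can train the fusion network.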

arXiv information

Authors	Yuhao Wang, Lingjuan Miao, Zhiqiang Zhou, Lei Zhang, Yajun Qiao
Published	2024-02-26 03:08:01+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
