DMTNet: Dynamic Multi-scale Network for Dual-pixel Images Defocus Deblurring with Transformer

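Summary

Recent work has achieved excellent results on defocus deblurring from dual-pixel data using convolutional neural networks (CNNs), but the scarcity of data has limited the exploration of vision transformers for this task.
In addition, existing methods use fixed parameters and a fixed network architecture to deblur images whose blur distribution and content differ, which also limits the generalization ability of the model.
This paper proposes a dynamic multi-scale network, named DMTNet, for defocus deblurring of dual-pixel images.
DMTNet consists mainly of two modules: a feature extraction module and a reconstruction module.
The feature extraction module is composed of several vision transformer blocks, whose strong feature extraction capability is used to obtain richer features and improve the robustness of the model.
The reconstruction module is composed of several Dynamic Multi-scale Sub-reconstruction Modules (DMSSRM).
A DMSSRM restores an image by adaptively assigning weights to features from different scales according to the blur distribution and content information of the input image.
DMTNet combines the advantages of transformers and CNNs: the vision transformer raises the performance ceiling of the CNN, and the inductive bias of the CNN lets the transformer extract more robust features without relying on a large amount of data.
DMTNet might be the first attempt to use a vision transformer to restore blurred images to sharpness.
Combined with a CNN, a vision transformer may achieve better performance on small datasets.
Experimental results on popular benchmarks show that DMTNet significantly outperforms state-of-the-art methods.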
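
The summary says that a DMSSRM assigns weights to features from different scales depending on the input. The abstract does not give the actual layer design, so the following is only a minimal sketch of the general idea of input-dependent multi-scale weighting, not the authors' implementation. The function names, the mean-pooled descriptor, the per-scale `gate_weights` parameters, and the softmax gating are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_multiscale_fuse(scale_features, gate_weights):
    """Fuse per-scale feature vectors with input-dependent weights.

    scale_features: list of S feature vectors (lists of C floats), one per
        scale. Hypothetical stand-in for multi-scale feature maps.
    gate_weights: list of S floats. Hypothetical learned gating parameters.
    Returns (fused_vector, per_scale_weights).
    """
    # Summarize each scale with its mean activation (assumed descriptor).
    descriptors = [sum(f) / len(f) for f in scale_features]
    # Scores depend on the input, so the weights change from image to image.
    scores = [w * d for w, d in zip(gate_weights, descriptors)]
    weights = softmax(scores)
    channels = len(scale_features[0])
    fused = [
        sum(weights[s] * scale_features[s][c] for s in range(len(scale_features)))
        for c in range(channels)
    ]
    return fused, weights


# Usage: three scales, two channels each; values are made up.
features = [[0.2, 0.4], [1.0, 1.2], [0.1, 0.3]]
fused, weights = dynamic_multiscale_fuse(features, gate_weights=[1.0, 1.0, 1.0])
```

Because the gating scores are computed from the features themselves, two inputs with different blur or content statistics receive different scale weights, in contrast to the fixed-parameter fusion that the summary criticizes.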

Summary (original)

Recent works achieve excellent results in defocus deblurring task based on dual-pixel data using convolutional neural network (CNN), while the scarcity of data limits the exploration and attempt of vision transformer in this task. In addition, the existing works use fixed parameters and network architecture to deblur images with different distribution and content information, which also affects the generalization ability of the model. In this paper, we propose a dynamic multi-scale network, named DMTNet, for dual-pixel images defocus deblurring. DMTNet mainly contains two modules: feature extraction module and reconstruction module. The feature extraction module is composed of several vision transformer blocks, which uses its powerful feature extraction capability to obtain richer features and improve the robustness of the model. The reconstruction module is composed of several Dynamic Multi-scale Sub-reconstruction Module (DMSSRM). DMSSRM can restore images by adaptively assigning weights to features from different scales according to the blur distribution and content information of the input images. DMTNet combines the advantages of transformer and CNN, in which the vision transformer improves the performance ceiling of CNN, and the inductive bias of CNN enables transformer to extract more robust features without relying on a large amount of data. DMTNet might be the first attempt to use vision transformer to restore the blurring images to clarity. By combining with CNN, the vision transformer may achieve better performance on small datasets. Experimental results on the popular benchmarks demonstrate that our DMTNet significantly outperforms state-of-the-art methods.

arXiv information

Authors	Dafeng Zhang, Xiaobing Wang
Published	2022-09-13 14:47:09+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
