Event Camera Demosaicing via Swin Transformer and Pixel-focus Loss

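Summary

Recent research has focused on improving high-quality imaging with event cameras, with most of this work concentrated in the RGB domain. These advances, however, often overlook the unique challenges that inherent flaws in event camera sensor design create in the RAW domain. Specifically, the sensor design causes a partial loss of pixel values, which poses new challenges for RAW-domain processing such as demosaicing. The problem is compounded because most RAW-domain research assumes that every pixel contains a value, so these methods cannot be applied directly to event camera demosaicing. To address this, we present a Swin-Transformer-based backbone and a pixel-focus loss function for demosaicing with missing pixel values in RAW-domain processing. Our core motivation is to refine a general, widely applicable foundation model from the RGB domain for RAW-domain processing, thereby broadening the model's applicability across the entire imaging pipeline. Our method uses multi-scale processing and space-to-depth techniques to ensure efficiency and reduce computational complexity. Based on our finding that the training loss follows a long-tailed distribution, we also propose the Pixel-focus Loss function for fine-tuning the network to improve its convergence. We validated the method on the MIPI Demosaic Challenge dataset, and subsequent analytical experiments confirmed its effectiveness. All code and trained models are available at https://github.com/yunfanLu/ev-demosaic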

Summary (original)

Recent research has highlighted improvements in high-quality imaging guided by event cameras, with most of these efforts concentrating on the RGB domain. However, these advancements frequently neglect the unique challenges introduced by the inherent flaws in the sensor design of event cameras in the RAW domain. Specifically, this sensor design results in the partial loss of pixel values, posing new challenges for RAW domain processes like demosaicing. The challenge intensifies as most research in the RAW domain is based on the premise that each pixel contains a value, making the straightforward adaptation of these methods to event camera demosaicing problematic. To this end, we present a Swin-Transformer-based backbone and a pixel-focus loss function for demosaicing with missing pixel values in RAW domain processing. Our core motivation is to refine a general and widely applicable foundational model from the RGB domain for RAW domain processing, thereby broadening the model’s applicability within the entire imaging process. Our method harnesses multi-scale processing and space-to-depth techniques to ensure efficiency and reduce computing complexity. We also propose the Pixel-focus Loss function for network fine-tuning to improve network convergence based on our discovery of a long-tailed distribution in training loss. Our method has undergone validation on the MIPI Demosaic Challenge dataset, with subsequent analytical experimentation confirming its efficacy. All code and trained models are released here: https://github.com/yunfanLu/ev-demosaic
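
The space-to-depth step mentioned in the abstract can be illustrated with a generic sketch. The abstract does not describe the authors' implementation, so the code below only shows the standard operation: each block of neighbouring pixels is folded into the channel axis, so later layers work on a smaller spatial grid. The function name space_to_depth, the (H, W, C) layout, and the 2x2 block size are assumptions made for this illustration.

```python
import numpy as np


def space_to_depth(x: np.ndarray, block: int = 2) -> np.ndarray:
    """Rearrange an (H, W, C) array into (H/block, W/block, C*block*block).

    Generic space-to-depth; not taken from the paper's code.
    """
    h, w, c = x.shape
    if h % block or w % block:
        raise ValueError("height and width must be divisible by block")
    # Split each spatial axis into (coarse index, offset inside the block).
    x = x.reshape(h // block, block, w // block, block, c)
    # Bring the two coarse indices to the front, block offsets after them.
    x = x.transpose(0, 2, 1, 3, 4)
    # Fold the block offsets and the channels into a single depth axis.
    return x.reshape(h // block, w // block, block * block * c)


# Hypothetical 4x4 single-channel RAW frame with values 0..15.
raw = np.arange(16, dtype=np.float32).reshape(4, 4, 1)
packed = space_to_depth(raw, block=2)
print(packed.shape)   # (2, 2, 4)
print(packed[0, 0])   # [0. 1. 4. 5.]
```

With a 2x2 block the spatial resolution is halved along each axis while the channel count grows fourfold, so no pixel values are discarded; the efficiency gain comes from running subsequent layers on fewer spatial positions.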

arXiv information

Authors	Yunfan Lu, Yijie Xu, Wenzong Ma, Weiyu Guo, Hui Xiong
Published	2024-04-03 13:30:56+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
