Blur-aware Spatio-temporal Sparse Transformer for Video Deblurring

Summary

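Video deblurring restores the blurred regions of the current frame using information from other frames in the video sequence.
Mainstream approaches extract information from the video sequence with bidirectional feature propagation, spatio-temporal transformers, or a combination of both.
However, memory and computational constraints limit the temporal window length of spatio-temporal transformers, which prevents them from extracting longer-range temporal context from the video sequence.
In addition, bidirectional feature propagation is highly sensitive to inaccurate optical flow in blurry frames, so errors accumulate during propagation.
To address these problems, we propose BSSTNet, the Blur-aware Spatio-temporal Sparse Transformer Network.
It introduces a blur map that converts the originally dense attention into a sparse form, allowing information from the entire video sequence to be used more extensively.
Specifically, BSSTNet (1) uses a longer temporal window in the transformer, drawing on information from more distant frames to restore the blurry pixels in the current frame,
and (2) introduces bidirectional feature propagation guided by blur maps, which reduces the error accumulation caused by blurry frames.
Experimental results show that BSSTNet outperforms state-of-the-art methods on the GoPro and DVD datasets.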
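The summary says the blur map turns dense attention into a sparse form, but it does not say how tokens are selected. The sketch below is only one plausible reading, not BSSTNet's actual implementation: it assumes a hypothetical per-token blur score in [0, 1] and drops every key/value token whose score exceeds a threshold before running ordinary scaled dot-product attention. The function name `blur_aware_sparse_attention`, the `threshold` parameter, and the plain-Python attention are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def blur_aware_sparse_attention(
    query: Vector,
    keys: Sequence[Vector],
    values: Sequence[Vector],
    blur: Sequence[float],
    threshold: float = 0.5,
) -> Tuple[Vector, List[int]]:
    """Hypothetical sketch: attend only to tokens whose blur score is <= threshold.

    keys/values hold tokens gathered from every frame in the temporal window;
    blur[i] in [0, 1] is an assumed per-token blur score (1 = very blurry).
    Returns the attended output and the indices of the tokens that were kept.
    """
    # Sparsify: discard blurry key/value tokens before computing attention.
    kept = [i for i, b in enumerate(blur) if b <= threshold]
    if not kept:
        raise ValueError("no sharp tokens left to attend to")

    # Scaled dot-product scores, computed only for the kept tokens.
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, keys[i])) for i in kept]
    weights = softmax(scores)

    # Weighted sum of the kept value vectors.
    out = [0.0] * len(values[kept[0]])
    for w, i in zip(weights, kept):
        for j, v in enumerate(values[i]):
            out[j] += w * v
    return out, kept


if __name__ == "__main__":
    out, kept = blur_aware_sparse_attention(
        query=[1.0, 0.0],
        keys=[[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]],
        values=[[10.0], [100.0], [30.0]],
        blur=[0.1, 0.9, 0.2],
    )
    # The blurry token (index 1, value 100) is excluded from the output.
    print(kept, out)  # prints [0, 2] [20.0]
```

Because only the kept tokens enter the softmax, the cost grows with the number of sharp tokens rather than with every token in the window. That kind of saving is what could let a transformer afford a longer temporal window under the same memory budget.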

Summary (original)

Video deblurring relies on leveraging information from other frames in the video sequence to restore the blurred regions in the current frame. Mainstream approaches employ bidirectional feature propagation, spatio-temporal transformers, or a combination of both to extract information from the video sequence. However, limitations in memory and computational resources constrain the temporal window length of the spatio-temporal transformer, preventing the extraction of longer temporal contextual information from the video sequence. Additionally, bidirectional feature propagation is highly sensitive to inaccurate optical flow in blurry frames, leading to error accumulation during the propagation process. To address these issues, we propose BSSTNet, the Blur-aware Spatio-temporal Sparse Transformer Network. It introduces the blur map, which converts the originally dense attention into a sparse form, enabling a more extensive utilization of information throughout the entire video sequence. Specifically, BSSTNet (1) uses a longer temporal window in the transformer, leveraging information from more distant frames to restore the blurry pixels in the current frame, and (2) introduces bidirectional feature propagation guided by blur maps, which reduces error accumulation caused by blurry frames. The experimental results demonstrate that the proposed BSSTNet outperforms state-of-the-art methods on the GoPro and DVD datasets.
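The abstract also describes bidirectional feature propagation guided by blur maps. The following is a minimal sketch under loud assumptions, not the authors' module. It assumes one hypothetical scalar blur score per frame and omits the optical-flow warping that a real propagation module would apply. Each frame blends its own features with the state carried from its neighbour, trusting the carried state more as the frame gets blurrier. `propagate` and `_blend` are hypothetical names, and averaging the forward and backward passes is an assumed fusion rule.

```python
from typing import List, Sequence

Feature = List[float]


def _blend(current: Feature, carried: Feature, blur: float) -> Feature:
    """Mix a frame's own features with the state carried from a neighbour.

    Sharp frames (blur near 0) mostly keep their own features, which limits how
    far earlier errors can travel; blurry frames (blur near 1) mostly reuse the
    carried state instead of injecting their own unreliable features.
    """
    return [(1.0 - blur) * c + blur * p for c, p in zip(current, carried)]


def propagate(features: Sequence[Feature], blur: Sequence[float]) -> List[Feature]:
    """Hypothetical blur-guided bidirectional propagation (no flow warping)."""
    if len(features) != len(blur):
        raise ValueError("need one blur score per frame")
    n = len(features)
    if n == 0:
        return []

    # Forward pass: carry state from frame 0 towards the last frame.
    forward: List[Feature] = []
    state = list(features[0])
    for t in range(n):
        state = _blend(features[t], state, blur[t])
        forward.append(state)

    # Backward pass: carry state from the last frame towards frame 0.
    backward: List[Feature] = [[] for _ in range(n)]
    state = list(features[-1])
    for t in reversed(range(n)):
        state = _blend(features[t], state, blur[t])
        backward[t] = state

    # Fuse both directions with a simple average (an assumed fusion rule).
    return [[(a + b) / 2.0 for a, b in zip(f, bk)] for f, bk in zip(forward, backward)]


if __name__ == "__main__":
    # Frame 1 is fully blurry and carries an outlier value of 100.
    print(propagate([[2.0], [100.0], [4.0]], [0.0, 1.0, 0.0]))  # prints [[2.0], [3.0], [4.0]]
```

In the example, the fully blurry middle frame is rebuilt from its sharp neighbours, so its outlier value never enters either propagated state. In this simplified setting, that is how down-weighting blurry frames keeps their errors from accumulating along the sequence.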

arXiv information

Authors	Huicong Zhang, Haozhe Xie, Hongxun Yao
Published	2024-06-11 17:59:56+00:00
arXiv site	arxiv_id(pdf)

Provider, services used

arxiv.jp, Google
