Learning Spatiotemporal Frequency-Transformer for Compressed Video Super-Resolution

Abstract

Compressed video super-resolution (VSR) aims to restore high-resolution frames from compressed low-resolution counterparts. Most recent VSR approaches enhance an input frame by borrowing relevant textures from neighboring video frames. Although some progress has been made, it remains highly challenging to extract and transfer high-quality textures effectively from compressed videos, where most frames are heavily degraded. In this paper, we propose a novel Frequency-Transformer for compressed video super-resolution (FTVSR) that conducts self-attention over a joint space-time-frequency domain. First, we divide a video frame into patches and transform each patch into DCT spectral maps in which each channel represents a frequency band. This design enables fine-grained self-attention on each frequency band, so that real visual textures can be distinguished from artifacts and then used for video frame restoration. Second, we study different self-attention schemes and find that a divided attention, which conducts joint space-frequency attention before applying temporal attention on each frequency band, yields the best video enhancement quality. Experimental results on two widely used video super-resolution benchmarks show that FTVSR outperforms state-of-the-art approaches on both uncompressed and compressed videos with clear visual margins. Code is available at https://github.com/researchmm/FTVSR.
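
The patch-to-spectrum step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that); it only shows one plausible way to turn a frame into per-frequency maps. The 8x8 patch size, the grayscale input, the orthonormal DCT-II, and the function names `dct_matrix`, `dct2`, and `frame_to_spectral_maps` are all assumptions made for illustration.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis; row k is the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def dct2(block, basis):
    """2D DCT-II of a square block, computed as C @ block @ C^T."""
    n = len(basis)
    # Transform along the vertical axis: tmp = C @ block
    tmp = [[sum(basis[u][i] * block[i][j] for i in range(n)) for j in range(n)]
           for u in range(n)]
    # Transform along the horizontal axis: out = tmp @ C^T
    return [[sum(tmp[u][j] * basis[v][j] for j in range(n)) for v in range(n)]
            for u in range(n)]


def frame_to_spectral_maps(frame, patch=8):
    """Split a grayscale frame (list of rows) into non-overlapping patches.

    Returns maps[u][v][by][bx]: the DCT coefficient of frequency (u, v) for
    the patch at block row `by` and block column `bx`. Each (u, v) index acts
    as one "channel", i.e. one frequency band (patch size 8 is an assumption).
    """
    height, width = len(frame), len(frame[0])
    if height % patch or width % patch:
        raise ValueError("frame size must be a multiple of the patch size")
    basis = dct_matrix(patch)
    rows, cols = height // patch, width // patch
    maps = [[[[0.0] * cols for _ in range(rows)] for _ in range(patch)]
            for _ in range(patch)]
    for by in range(rows):
        for bx in range(cols):
            block = [frame[by * patch + i][bx * patch:(bx + 1) * patch]
                     for i in range(patch)]
            coeffs = dct2(block, basis)
            for u in range(patch):
                for v in range(patch):
                    maps[u][v][by][bx] = coeffs[u][v]
    return maps


# A flat 16x16 frame has energy only in the DC band (u = v = 0).
flat = [[1.0] * 16 for _ in range(16)]
maps = frame_to_spectral_maps(flat)
print(round(maps[0][0][0][0], 6))  # 8.0
```

In this layout, each `maps[u][v]` is a small spatial map for a single frequency band, which is the kind of per-band representation over which attention could then be computed. Compression artifacts tend to concentrate in particular bands, which is the intuition the abstract gives for separating real texture from artifacts.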

arXiv information

Authors	Zhongwei Qiu, Huan Yang, Jianlong Fu, Dongmei Fu
Published	2022-08-05 07:02:30+00:00

Source and services used

arxiv.jp, DeepL
