Enhancing Learned Image Compression via Cross Window-based Attention

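Abstract

In recent years, learned image compression methods have demonstrated better rate-distortion performance than traditional image compression methods.
Recent methods use convolutional neural networks (CNNs), variational autoencoders (VAEs), invertible neural networks (INNs), and transformers.
Despite their significant contributions, a main drawback of these models is their poor performance in capturing local redundancy.
Therefore, to exploit global features together with local redundancy, we propose a CNN-based solution integrated with a feature encoding module.
The feature encoding module encodes important features before feeding them to the CNN, and then uses cross-scale window-based attention to further capture local redundancy.
Cross-scale window-based attention is inspired by the attention mechanism in transformers and effectively enlarges the receptive field.
Both the feature encoding module and the cross-scale window-based attention module in our architecture are flexible and can be incorporated into other network architectures.
We evaluate our method on the Kodak and CLIC datasets and show that our approach is effective and on par with state-of-the-art methods.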

Abstract (original)

In recent years, learned image compression methods have demonstrated superior rate-distortion performance compared to traditional image compression methods. Recent methods utilize convolutional neural networks (CNN), variational autoencoders (VAE), invertible neural networks (INN), and transformers. Despite their significant contributions, a main drawback of these models is their poor performance in capturing local redundancy. Therefore, to leverage global features along with local redundancy, we propose a CNN-based solution integrated with a feature encoding module. The feature encoding module encodes important features before feeding them to the CNN and then utilizes cross-scale window-based attention, which further captures local redundancy. Cross-scale window-based attention is inspired by the attention mechanism in transformers and effectively enlarges the receptive field. Both the feature encoding module and the cross-scale window-based attention module in our architecture are flexible and can be incorporated into any other network architecture. We evaluate our method on the Kodak and CLIC datasets and demonstrate that our approach is effective and on par with state-of-the-art methods.
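
The abstract does not spell out how cross-scale window-based attention is computed, so the following is only a minimal sketch of the general idea under stated assumptions: queries come from a small non-overlapping window, while keys and values come from a larger window centered on it, which is one way attention can enlarge the receptive field. The function name `cross_scale_window_attention`, the parameters `window` and `context`, the 1-D token layout, and the use of identity projections in place of learned weights are all hypothetical simplifications, not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_scale_window_attention(tokens, window=2, context=4):
    """Illustrative 1-D window attention (assumed design, not the paper's).

    tokens:  list of feature vectors (lists of floats), all the same length.
    window:  number of query tokens per non-overlapping window.
    context: size of the larger key/value window centered on the query
             window; it is clipped at the sequence borders.
    """
    if context < window:
        raise ValueError("context must be at least as large as window")
    n = len(tokens)
    dim = len(tokens[0])
    pad = (context - window) // 2  # extra tokens taken on each side
    out = []
    for start in range(0, n, window):
        end = min(start + window, n)
        lo = max(0, start - pad)
        hi = min(n, end + pad)
        keys = tokens[lo:hi]  # keys double as values (identity projections)
        for query in tokens[start:end]:
            # Scaled dot-product scores between the query and every key.
            scores = [
                sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                for key in keys
            ]
            weights = softmax(scores)
            # Attention output: weighted average of the value vectors.
            out.append(
                [sum(w * key[j] for w, key in zip(weights, keys)) for j in range(dim)]
            )
    return out
```

With `window=2` and `context=4`, each query attends to its own window plus one neighboring token on each side, so information from outside the local window reaches every output position. Setting `context` equal to `window` reduces the sketch to plain window attention.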

arXiv information

Authors	Priyanka Mudgal, Feng Liu
Published	2024-10-29 16:25:34+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
