Bi-Level Spatial and Channel-aware Transformer for Learned Image Compression

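Summary

Recent advances in learned image compression (LIC) have demonstrated better performance than traditional hand-crafted codecs.
These learning-based methods typically use convolutional neural networks (CNNs) or Transformer-based architectures.
However, these nonlinear approaches often overlook the frequency characteristics of images, which limits their compression efficiency.
To address this, we propose a new Transformer-based image compression method that strengthens the transform stage by taking into account the frequency components within the feature map.
Our method integrates a new Hybrid Spatial-Channel Attention Transformer Block (HSCATB), in which a spatial branch processes high and low frequencies independently in the attention layer and a Channel-aware Self-Attention (CaSA) module captures information across channels, significantly improving compression performance.
We also introduce a Mixed Local-Global Feed Forward Network (MLGFFN) inside the Transformer block to improve the extraction of the diverse, rich information that effective compression depends on.
Together, these innovations improve the transform's ability to project data into a more decorrelated latent space, which in turn raises overall compression efficiency.
Experimental results show that our framework outperforms state-of-the-art LIC methods in rate-distortion performance.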

Summary (original)

Recent advancements in learned image compression (LIC) methods have demonstrated superior performance over traditional hand-crafted codecs. These learning-based methods often employ convolutional neural networks (CNNs) or Transformer-based architectures. However, these nonlinear approaches frequently overlook the frequency characteristics of images, which limits their compression efficiency. To address this issue, we propose a novel Transformer-based image compression method that enhances the transformation stage by considering frequency components within the feature map. Our method integrates a novel Hybrid Spatial-Channel Attention Transformer Block (HSCATB), where a spatial-based branch independently handles high and low frequencies at the attention layer, and a Channel-aware Self-Attention (CaSA) module captures information across channels, significantly improving compression performance. Additionally, we introduce a Mixed Local-Global Feed Forward Network (MLGFFN) within the Transformer block to enhance the extraction of diverse and rich information, which is crucial for effective compression. These innovations collectively improve the transformation’s ability to project data into a more decorrelated latent space, thereby boosting overall compression efficiency. Experimental results demonstrate that our framework surpasses state-of-the-art LIC methods in rate-distortion performance.
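
To make the idea of attention "across channels" concrete, here is a minimal sketch of generic channel-wise self-attention in plain Python. It is not the authors' CaSA module, whose details the abstract does not give. It assumes identity query/key/value projections, a single head, and a feature map already flattened to C channels by N spatial positions; the function names and the toy input are hypothetical. The point it illustrates is that the attention matrix is C x C (channel to channel) rather than N x N (position to position).

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def channel_self_attention(features):
    """Self-attention across channels (illustrative sketch only).

    features: list of C channels, each a flat list of N spatial values.
    Returns (output, weights): output is C x N, weights is C x C.
    Assumption: identity query/key/value projections, single head.
    """
    n_positions = len(features[0])
    scale = 1.0 / math.sqrt(n_positions)
    # C x C similarity: each channel is compared with every other channel.
    scores = [
        [scale * sum(a * b for a, b in zip(ch_i, ch_j)) for ch_j in features]
        for ch_i in features
    ]
    weights = [softmax(row) for row in scores]
    # Each output channel is a weighted mix of all input channels.
    output = [
        [sum(w * ch[p] for w, ch in zip(row, features)) for p in range(n_positions)]
        for row in weights
    ]
    return output, weights


# Hypothetical toy feature map: 3 channels, 4 positions (a flattened 2x2 map).
toy = [
    [1.0, 0.0, 1.0, 0.0],
    [0.9, 0.1, 1.1, 0.0],
    [0.0, 1.0, 0.0, 1.0],
]
mixed, attn = channel_self_attention(toy)
```

In this toy input, channels 0 and 1 have similar spatial patterns, so channel 0 attends more strongly to channel 1 than to channel 2. A learned model would add trainable projections and multiple heads on top of this basic structure.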

arXiv information

Authors	Hamidreza Soltani, Erfan Ghasemi
Published	2024-08-07 15:35:25+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
