On Disentangled Training for Nonlinear Transform in Learned Image Compression

Abstract

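Learned image compression (LIC) has demonstrated better rate-distortion (R-D) performance than traditional codecs, but it is inefficient to train: training a state-of-the-art model from scratch can take more than two weeks.
Existing LIC methods overlook the slow convergence caused by energy compaction when learning nonlinear transforms.
In this paper, we first show that such energy compaction consists of two components: feature decorrelation and uneven energy modulation.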
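As a rough illustration of these two components (a textbook two-sample example, not taken from the paper), assume a pair of unit-variance features with correlation \rho. The symbols \Sigma_x, Q, S, s_1 and s_2 are introduced here for illustration only. An orthonormal rotation Q removes the correlation, and a diagonal scaling S then changes how energy is distributed across the two outputs:

```latex
\Sigma_x = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}, \qquad
Q = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \qquad
Q \Sigma_x Q^{\top} = \begin{pmatrix} 1+\rho & 0 \\ 0 & 1-\rho \end{pmatrix}

S = \operatorname{diag}(s_1, s_2), \qquad
S Q \Sigma_x Q^{\top} S^{\top} = \begin{pmatrix} s_1^2 (1+\rho) & 0 \\ 0 & s_2^2 (1-\rho) \end{pmatrix}
```

The rotation preserves total energy (the trace stays at 2) while making the outputs uncorrelated. The scaling step is what allows the per-component energies to be modulated unevenly.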
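On this basis, we propose a linear auxiliary transform (AuxT) to disentangle energy compaction in the training of nonlinear transforms.
The proposed AuxT obtains a coarse approximation that achieves efficient energy compaction, so that distribution fitting by the nonlinear transforms is reduced to fitting fine details.
We then develop wavelet-based linear shortcuts (WLSs) for AuxT, which use wavelet-based downsampling and orthogonal linear projection for feature decorrelation, and subband-aware scaling for uneven energy modulation.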
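The three ingredients named above can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a one-level Haar wavelet, a fixed rotation as the orthogonal projection, and hand-picked scale factors. The function names (`haar_downsample`, `project`, `scale_subbands`, `energy`), the rotation angle, and the scale values are all hypothetical stand-ins for whatever parameters a real model would use.

```python
import math


def haar_downsample(x):
    """One-level orthonormal 2D Haar transform of a 2D list (even height and width).

    Returns four half-resolution subbands:
    [low-pass, column-difference, row-difference, diagonal-difference].
    """
    h, w = len(x), len(x[0])
    bands = [[[0.0] * (w // 2) for _ in range(h // 2)] for _ in range(4)]
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = x[i][j], x[i][j + 1]
            c, d = x[i + 1][j], x[i + 1][j + 1]
            bands[0][i // 2][j // 2] = (a + b + c + d) / 2.0  # local average
            bands[1][i // 2][j // 2] = (a - b + c - d) / 2.0  # left minus right
            bands[2][i // 2][j // 2] = (a + b - c - d) / 2.0  # top minus bottom
            bands[3][i // 2][j // 2] = (a - b - c + d) / 2.0  # diagonal
    return bands


def project(bands, q):
    """Mix the subband channels at every position with matrix q (rows = outputs)."""
    h, w = len(bands[0]), len(bands[0][0])
    out = [[[0.0] * w for _ in range(h)] for _ in q]
    for k, q_row in enumerate(q):
        for c, coeff in enumerate(q_row):
            for i in range(h):
                for j in range(w):
                    out[k][i][j] += coeff * bands[c][i][j]
    return out


def scale_subbands(bands, scales):
    """Multiply each channel by its own scale factor."""
    return [[[s * v for v in row] for row in band] for band, s in zip(bands, scales)]


def energy(band):
    """Sum of squared values in a 2D list."""
    return sum(v * v for row in band for v in row)


# Smooth 4x4 test image (illustrative data only).
image = [[float(i + j) for j in range(4)] for i in range(4)]
bands = haar_downsample(image)

# Hypothetical orthogonal projection: rotate the two middle channels by 30 degrees.
theta = math.pi / 6
q = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, math.cos(theta), -math.sin(theta), 0.0],
    [0.0, math.sin(theta), math.cos(theta), 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
mixed = project(bands, q)

# Hypothetical subband-aware scales: keep the low-pass channel, shrink the details.
scaled = scale_subbands(mixed, [1.0, 0.5, 0.5, 0.25])

total = sum(energy(b) for b in bands)
print("low-pass share of energy:", energy(bands[0]) / total)
```

The Haar step and the rotation are both orthonormal, so they preserve total energy and only redistribute it across channels. Only the final scaling changes the energy of each channel, which is why the two roles can be treated separately.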
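AuxT is lightweight and plug-and-play, and can be integrated into diverse LIC models to address the slow convergence issue.
Experimental results show that the proposed approach trains LIC models 2 times faster while also achieving an average 1% BD-rate reduction.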
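For readers unfamiliar with the metric, the following is a minimal sketch of how a BD-rate figure is commonly computed (the Bjøntegaard delta with a cubic fit of log-rate against PSNR). It assumes exactly four R-D points per curve, so the cubic fit reduces to interpolation. The function names and all rate and PSNR values are made up for illustration and are not results from the paper.

```python
import math


def _cubic_through(xs, ys):
    """Return the Lagrange interpolant through the points (a cubic for 4 points)."""
    def poly(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return poly


def _simpson(f, lo, hi):
    """Simpson's rule on a single panel; exact for polynomials up to degree 3."""
    mid = 0.5 * (lo + hi)
    return (hi - lo) / 6.0 * (f(lo) + 4.0 * f(mid) + f(hi))


def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average bitrate change (in percent) of the test codec at equal PSNR.

    Negative values mean the test codec needs fewer bits than the anchor.
    """
    fit_a = _cubic_through(psnr_anchor, [math.log(r) for r in rate_anchor])
    fit_t = _cubic_through(psnr_test, [math.log(r) for r in rate_test])
    # Integrate only over the PSNR range covered by both curves.
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    avg_log_diff = (_simpson(fit_t, lo, hi) - _simpson(fit_a, lo, hi)) / (hi - lo)
    return (math.exp(avg_log_diff) - 1.0) * 100.0


# Made-up R-D points: the test codec uses 1% fewer bits at every PSNR level.
psnr = [30.0, 33.0, 35.0, 37.0]
anchor_bpp = [0.2, 0.4, 0.6, 0.9]
test_bpp = [0.99 * r for r in anchor_bpp]
result = bd_rate(anchor_bpp, psnr, test_bpp, psnr)
print("BD-rate (%):", round(result, 4))
```

Averaging the difference in log-rate rather than in raw rate is what makes the metric a relative (percentage) saving, so it is not dominated by the high-bitrate end of the curve.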
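To the best of our knowledge, this is one of the first successful attempts to significantly improve the convergence of LIC with comparable or better rate-distortion performance.
Code will be released at https://github.com/qingshi9974/AuxT.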

Abstract (original)

Learned image compression (LIC) has demonstrated superior rate-distortion (R-D) performance compared to traditional codecs, but is challenged by training inefficiency that could incur more than two weeks to train a state-of-the-art model from scratch. Existing LIC methods overlook the slow convergence caused by compacting energy in learning nonlinear transforms. In this paper, we first reveal that such energy compaction consists of two components, i.e., feature decorrelation and uneven energy modulation. On such basis, we propose a linear auxiliary transform (AuxT) to disentangle energy compaction in training nonlinear transforms. The proposed AuxT obtains coarse approximation to achieve efficient energy compaction such that distribution fitting with the nonlinear transforms can be simplified to fine details. We then develop wavelet-based linear shortcuts (WLSs) for AuxT that leverages wavelet-based downsampling and orthogonal linear projection for feature decorrelation and subband-aware scaling for uneven energy modulation. AuxT is lightweight and plug-and-play to be integrated into diverse LIC models to address the slow convergence issue. Experimental results demonstrate that the proposed approach can accelerate training of LIC models by 2 times and simultaneously achieves an average 1\% BD-rate reduction. To our best knowledge, this is one of the first successful attempt that can significantly improve the convergence of LIC with comparable or superior rate-distortion performance. Code will be released at \url{https://github.com/qingshi9974/AuxT}

arXiv information

Authors	Han Li, Shaohui Li, Wenrui Dai, Maida Cao, Nuowen Kan, Chenglin Li, Junni Zou, Hongkai Xiong
Published	2025-01-23 15:32:06+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
