Robustly overfitting latents for flexible neural image compression

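Abstract

Neural image compression has made substantial progress.
State-of-the-art models are based on variational autoencoders and outperform classical codecs.
A neural compression model learns to encode an image into a quantized latent representation. That representation can be sent efficiently to the decoder, which decodes the quantized latents into a reconstructed image.
Although these models have proven successful in practice, they yield sub-optimal results because of imperfect optimization and the limited capacity of the encoder and decoder.
Recent work shows how stochastic Gumbel annealing (SGA) can be used to refine the latents of pre-trained neural image compression models.
We extend this idea by introducing SGA+, which comprises three different methods that build upon SGA.
We show how our method improves overall compression performance in terms of the rate-distortion (R-D) trade-off compared with its predecessors.
In addition, we show how refining the latents with our best-performing method improves compression performance on both the Tecnick and CLIC datasets.
Our method is deployed for a pre-trained hyperprior and for a more flexible model.
Further, we analyze the proposed methods in detail and show that they are less sensitive to hyperparameter choices.
Finally, we show how each method can be extended from two-class rounding to three-class rounding.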

Abstract (original)

Neural image compression has made a great deal of progress. State-of-the-art models are based on variational autoencoders and are outperforming classical models. Neural compression models learn to encode an image into a quantized latent representation that can be efficiently sent to the decoder, which decodes the quantized latent into a reconstructed image. While these models have proven successful in practice, they lead to sub-optimal results due to imperfect optimization and limitations in the encoder and decoder capacity. Recent work shows how to use stochastic Gumbel annealing (SGA) to refine the latents of pre-trained neural image compression models. We extend this idea by introducing SGA+, which contains three different methods that build upon SGA. We show how our method improves the overall compression performance in terms of the R-D trade-off, compared to its predecessors. Additionally, we show how refinement of the latents with our best-performing method improves the compression performance on both the Tecnick and CLIC datasets. Our method is deployed for a pre-trained hyperprior and for a more flexible model. Further, we give a detailed analysis of our proposed methods and show that they are less sensitive to hyperparameter choices. Finally, we show how each method can be extended to three- instead of two-class rounding.
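
To make the idea of two-class rounding concrete, here is a minimal sketch of the kind of temperature-controlled rounding distribution that SGA-style latent refinement relies on. It assumes the atanh-based logits described in the earlier SGA work; the three SGA+ variants are not specified in this abstract and are not reproduced here. The function names `two_class_rounding_probs` and `soft_round`, and the `eps` clipping value, are hypothetical choices for illustration only.

```python
import numpy as np

def two_class_rounding_probs(v, tau, eps=1e-5):
    """Return (p_floor, p_ceil) for each latent value in v at temperature tau.

    Assumption: logits are -atanh(distance to the candidate integer) / tau,
    as in the original SGA formulation; this is not the SGA+ method.
    """
    v = np.asarray(v, dtype=float)
    r = v - np.floor(v)                # distance to the floor, in [0, 1)
    r = np.clip(r, eps, 1.0 - eps)     # keep atanh finite at the boundaries
    logit_floor = -np.arctanh(r) / tau
    logit_ceil = -np.arctanh(1.0 - r) / tau
    m = np.maximum(logit_floor, logit_ceil)  # subtract max for a stable softmax
    e_floor = np.exp(logit_floor - m)
    e_ceil = np.exp(logit_ceil - m)
    z = e_floor + e_ceil
    return e_floor / z, e_ceil / z

def soft_round(v, tau):
    """Expected rounded value under the two-class distribution.

    The upper candidate is taken as floor(v) + 1, so an exact integer
    still has two distinct candidates.
    """
    v = np.asarray(v, dtype=float)
    p_floor, p_ceil = two_class_rounding_probs(v, tau)
    low = np.floor(v)
    return p_floor * low + p_ceil * (low + 1.0)

if __name__ == "__main__":
    latents = np.array([0.2, 0.5, 2.7])
    for tau in (10.0, 1.0, 0.1):
        # Lower temperatures push the expectation toward hard rounding.
        print(tau, soft_round(latents, tau))
```

In a refinement loop, the temperature would typically be annealed toward zero while the latents are optimized against the rate-distortion loss of a fixed, pre-trained model, so that the relaxed values converge to hard-rounded integers. The annealing schedule and the loss are left out of this sketch.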

arXiv information

Authors	Yura Perugachi-Diaz, Arwin Gansekoele, Sandjai Bhulai
Published	2024-11-05 14:00:12+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
