HiNeRV: Video Compression with Hierarchical Encoding-based Neural Representation

Summary

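Learning-based video compression is currently a popular research topic with the potential to compete with conventional standard video codecs.
In this context, Implicit Neural Representations (INRs) have previously been used to represent and compress image and video content, and they decode relatively fast compared with other methods.
However, existing INR-based methods have not achieved rate-quality performance comparable to the state of the art in video compression.
This is mainly because the network architectures they use are simple, which limits their representation capability.
This paper proposes HiNeRV, an INR that combines lightweight layers with novel hierarchical positional encodings.
We employ depth-wise convolutional, MLP, and interpolation layers to build a deep and wide network architecture with high capacity.
HiNeRV is also a unified representation that encodes videos in both frames and patches at the same time, offering higher performance and flexibility than existing methods.
We further build a video codec based on HiNeRV, together with a refined pipeline for training, pruning, and quantization that better preserves HiNeRV's performance during lossy model compression.
The proposed method was evaluated on both the UVG and MCL-JCV datasets for video compression, showing significant improvement over all existing INR baselines and competitive performance against learning-based codecs (72.3% overall bit rate saving over HNeRV and 43.4% over DCVC on the UVG dataset, measured in PSNR).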
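
To give a rough sense of why depth-wise convolutions count as lightweight, the sketch below compares weight counts for a standard convolution against a depth-wise convolution followed by a per-pixel linear (1 x 1) layer. The pairing, the channel count, and the kernel size are illustrative assumptions, not values taken from the paper.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k


def depthwise_params(channels: int, k: int) -> int:
    """Weights in a depth-wise k x k convolution: one k x k filter per channel."""
    return channels * k * k


def pointwise_params(c_in: int, c_out: int) -> int:
    """Weights in a 1 x 1 (per-pixel linear) layer (bias omitted)."""
    return c_in * c_out


# Hypothetical sizes chosen only for illustration.
channels, kernel = 256, 3

standard = conv_params(channels, channels, kernel)
light = depthwise_params(channels, kernel) + pointwise_params(channels, channels)

print("standard conv weights:      ", standard)
print("depth-wise + 1x1 weights:   ", light)
print("ratio (standard / light):   ", round(standard / light, 2))
```

Under these assumed sizes the lighter pairing needs far fewer weights per layer, which leaves room in a fixed parameter budget for a deeper or wider network.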

Summary (original)

Learning-based video compression is currently a popular research topic, offering the potential to compete with conventional standard video codecs. In this context, Implicit Neural Representations (INRs) have previously been used to represent and compress image and video content, demonstrating relatively high decoding speed compared to other methods. However, existing INR-based methods have failed to deliver rate-quality performance comparable with the state of the art in video compression. This is mainly due to the simplicity of the employed network architectures, which limits their representation capability. In this paper, we propose HiNeRV, an INR that combines lightweight layers with novel hierarchical positional encodings. We employ depth-wise convolutional, MLP and interpolation layers to build a deep and wide network architecture with high capacity. HiNeRV is also a unified representation encoding videos in both frames and patches at the same time, which offers higher performance and flexibility than existing methods. We further build a video codec based on HiNeRV and a refined pipeline for training, pruning and quantization that can better preserve HiNeRV's performance during lossy model compression. The proposed method has been evaluated on both UVG and MCL-JCV datasets for video compression, demonstrating significant improvement over all existing INR baselines and competitive performance when compared to learning-based codecs (72.3% overall bit rate saving over HNeRV and 43.4% over DCVC on the UVG dataset, measured in PSNR).
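
The abstract mentions pruning and quantization as steps of lossy model compression. The sketch below shows generic magnitude pruning followed by symmetric uniform quantization on a toy weight list. It is a minimal illustration under stated assumptions and is not the refined pipeline from the paper; the function names, the sparsity of 0.5, and the 6-bit width are all hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    if n_prune == 0:
        return list(weights)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:n_prune])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_uniform(weights, bits):
    """Symmetric uniform quantization to signed integer codes of width `bits`."""
    q_max = 2 ** (bits - 1) - 1
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / q_max if max_abs > 0 else 1.0
    codes = [max(-q_max, min(q_max, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


# Toy weights, chosen only for illustration.
weights = [0.02, -0.75, 0.40, -0.01, 0.90, 0.10]

pruned = magnitude_prune(weights, sparsity=0.5)
codes, scale = quantize_uniform(pruned, bits=6)
restored = dequantize(codes, scale)

print("pruned:  ", pruned)
print("codes:   ", codes)
print("restored:", [round(w, 3) for w in restored])
```

Only the integer codes and the single scale value would need to be stored, and the pruned zeros compress well under entropy coding. The model is lossy because `restored` only approximates the original weights.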

arXiv information

Authors	Ho Man Kwan,Ge Gao,Fan Zhang,Andrew Gower,David Bull
Published	2024-01-26 15:54:39+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
