NERV++: An Enhanced Implicit Neural Video Representation

Summary

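Neural fields, also known as implicit neural representations (INRs), have shown a remarkable ability to represent, generate, and manipulate various data types, enabling continuous data reconstruction with a small memory footprint.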
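Although promising, INRs applied to video compression still need to improve their rate-distortion performance substantially. They also require a huge number of parameters and long training to capture high-frequency details, which limits their wider applicability.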
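In INR-based video coding the bitstream is essentially the network itself, so rate is commonly reported as model bits divided by the number of video pixels, and distortion as PSNR. The minimal Python sketch below shows that arithmetic under stated assumptions: the whole bitstream is taken to be the quantized weights (no entropy coding or side information), and the function names and example numbers are hypothetical.

```python
import math


def bits_per_pixel(num_params: int, bits_per_param: float,
                   num_frames: int, height: int, width: int) -> float:
    """Rate of an INR codec: model size in bits spread over every pixel of the video.

    Assumption: the bitstream is only the (quantized) network weights;
    entropy-coding gains and side information are ignored in this sketch.
    """
    total_bits = num_params * bits_per_param
    return total_bits / (num_frames * height * width)


def psnr(mse: float, max_value: float = 1.0) -> float:
    """Distortion as PSNR in dB for pixel values in [0, max_value]."""
    if mse <= 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical numbers, chosen only to illustrate the arithmetic:
# a 3M-parameter model stored at 8 bits per weight, 600 frames of 1920x1080.
rate = bits_per_pixel(3_000_000, 8, 600, 1080, 1920)
quality = psnr(mse=1e-3)
print(f"{rate:.4f} bpp, {quality:.2f} dB")
```

With these illustrative numbers the rate is about 0.019 bpp. The sketch also makes the parameter-count problem concrete: at a fixed bit depth, halving the number of weights halves the rate.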
Resolving this problem remains quite challenging; doing so would make INRs more accessible for compression tasks.
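We take a step toward addressing these shortcomings by introducing NeRV++, a neural representation for videos that is a more straightforward yet effective enhancement of the original NeRV decoder architecture. It features separable conv2d residual blocks (SCRBs) that sandwich the upsampling block (UB), and a bilinear interpolation skip layer for improved feature representation.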
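The block structure described above can be sketched in NumPy as follows. This is an illustrative reading of the summary, not the authors' implementation: the use of GELU, a 1x1 convolution before the pixel shuffle in the upsampling block (NeRV-style blocks use a larger kernel), a 1x1 projection on the skip path to match channel counts, and all function and parameter names (`scrb`, `nervpp_block`, `dw1`, ...) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)


def depthwise_conv3x3(x, w):
    """Per-channel 3x3 cross-correlation, zero padding, stride 1. x: (C, H, W), w: (C, 3, 3)."""
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += w[:, i, j][:, None, None] * xp[:, i:i + h, j:j + wd]
    return out


def pointwise_conv(x, w):
    """1x1 convolution: x (C_in, H, W), w (C_out, C_in) -> (C_out, H, W)."""
    return np.einsum("oc,chw->ohw", w, x)


def gelu(x):
    # tanh approximation of GELU (activation choice is an assumption)
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


def scrb(x, dw, pw):
    """Hypothetical SCRB: depthwise 3x3 -> pointwise 1x1 -> GELU, plus identity shortcut."""
    return x + gelu(pointwise_conv(depthwise_conv3x3(x, dw), pw))


def pixel_shuffle(x, r):
    """(C*r*r, H, W) -> (C, H*r, W*r), same channel layout as PyTorch's PixelShuffle."""
    cr2, h, w = x.shape
    c = cr2 // (r * r)
    return x.reshape(c, r, r, h, w).transpose(0, 3, 1, 4, 2).reshape(c, h * r, w * r)


def bilinear_upsample(x, r):
    """Bilinear upsampling by an integer factor r (half-pixel centers, edge clamping)."""
    def interp_axis(a, axis, n_out):
        n_in = a.shape[axis]
        src = np.clip((np.arange(n_out) + 0.5) / r - 0.5, 0, n_in - 1)
        lo = np.floor(src).astype(int)
        hi = np.minimum(lo + 1, n_in - 1)
        shape = [1] * a.ndim
        shape[axis] = n_out
        t = (src - lo).reshape(shape)
        return np.take(a, lo, axis=axis) * (1 - t) + np.take(a, hi, axis=axis) * t

    _, h, w = x.shape
    return interp_axis(interp_axis(x, 1, h * r), 2, w * r)


def nervpp_block(x, p, r):
    """Hypothetical NeRV++-style block: SCRB -> UB -> SCRB, plus a bilinear skip path."""
    y = scrb(x, p["dw1"], p["pw1"])
    y = gelu(pixel_shuffle(pointwise_conv(y, p["up"]), r))  # UB: conv, pixel shuffle, activation
    y = scrb(y, p["dw2"], p["pw2"])
    skip = pointwise_conv(bilinear_upsample(x, r), p["skip"])  # project to C_out channels
    return y + skip


c_in, c_out, r, h, w = 8, 4, 2, 9, 16
params = {
    "dw1": rng.normal(0, 0.1, (c_in, 3, 3)),
    "pw1": rng.normal(0, 0.1, (c_in, c_in)),
    "up": rng.normal(0, 0.1, (c_out * r * r, c_in)),
    "dw2": rng.normal(0, 0.1, (c_out, 3, 3)),
    "pw2": rng.normal(0, 0.1, (c_out, c_out)),
    "skip": rng.normal(0, 0.1, (c_out, c_in)),
}
x = rng.normal(size=(c_in, h, w))
out = nervpp_block(x, params, r)
print(out.shape)
```

One plausible motivation for the skip layer is that it hands each block a smooth, upsampled copy of its input, so the convolutional branch only has to learn a correction on top of that bilinear estimate.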
NeRV++ allows a video to be represented directly as a function approximated by a neural network, and significantly increases representation capacity beyond current INR-based video codecs.
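Representing a video as a function means the decoder takes a frame index and returns that frame. The sketch below assumes a NeRV-style sinusoidal embedding of the normalized index t, Γ(t) = (sin(b^0 π t), cos(b^0 π t), ..., sin(b^(l-1) π t), cos(b^(l-1) π t)), with b = 1.25 and l = 80 taken as assumed defaults. The tiny random linear "decoder" `ToyVideoINR` is a hypothetical stand-in for the trained NeRV++ network and only illustrates the interface.

```python
import numpy as np


def frame_embedding(t, b=1.25, l=80):
    """Sinusoidal encoding of a normalized frame index t in [0, 1] (b, l are assumed values).

    Returns 2*l values interleaved as sin, cos for frequencies b^0 * pi, ..., b^(l-1) * pi.
    """
    freqs = (b ** np.arange(l)) * np.pi * t
    return np.stack([np.sin(freqs), np.cos(freqs)], axis=-1).reshape(-1)


class ToyVideoINR:
    """Hypothetical stand-in for f_theta: one random linear layer from embedding to RGB frame.

    A real NeRV-style decoder reshapes an MLP output into a small feature map and
    upsamples it with convolutional blocks; this toy only shows index -> frame.
    """

    def __init__(self, height, width, l=80, seed=0):
        rng = np.random.default_rng(seed)
        self.shape = (3, height, width)
        self.l = l
        self.weight = rng.normal(0, 0.05, (3 * height * width, 2 * l))

    def __call__(self, frame_index, num_frames):
        t = frame_index / max(num_frames - 1, 1)  # normalize index to [0, 1]
        z = self.weight @ frame_embedding(t, l=self.l)
        return 1.0 / (1.0 + np.exp(-z.reshape(self.shape)))  # sigmoid keeps pixels in (0, 1)


model = ToyVideoINR(height=4, width=6)
frames = [model(i, num_frames=10) for i in range(10)]
print(frames[0].shape)
```

Because each frame is a single forward pass on its index, any frame can be decoded independently; compressing the video then amounts to storing (and quantizing) the network weights.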
We evaluate the method on the UVG, MCL-JCV, and Bunny datasets, achieving competitive results for video compression with INRs.
This result narrows the gap to autoencoder-based video coding and marks a significant step forward in INR-based video compression research.

Summary (original)

Neural fields, also known as implicit neural representations (INRs), have shown a remarkable capability of representing, generating, and manipulating various data types, allowing for continuous data reconstruction at a low memory footprint. Though promising, INRs applied to video compression still need to improve their rate-distortion performance by a large margin, and require a huge number of parameters and long training iterations to capture high-frequency details, limiting their wider applicability. Resolving this problem, which remains quite challenging, would make INRs more accessible in compression tasks. We take a step towards resolving these shortcomings by introducing NeRV++, an enhanced implicit neural video representation, as a more straightforward yet effective enhancement over the original NeRV decoder architecture, featuring separable conv2d residual blocks (SCRBs) that sandwich the upsampling block (UB), and a bilinear interpolation skip layer for improved feature representation. NeRV++ allows videos to be directly represented as a function approximated by a neural network, and significantly enhances the representation capacity beyond current INR-based video codecs. We evaluate our method on the UVG, MCL-JCV, and Bunny datasets, achieving competitive results for video compression with INRs. This achievement narrows the gap to autoencoder-based video coding, marking a significant stride in INR-based video compression research.

arXiv information

Authors	Ahmed Ghorbel, Wassim Hamidouche, Luce Morin
Published	2024-02-28 13:00:32+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
