Lossless data compression by large models

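Summary

After 80 years of research, millions of papers, and a wide range of applications, modern data compression methods are slowly reaching their limits.
Yet the extravagant speed requirements of 6G communication raise a major open question that calls for revolutionary new ideas in data compression.
We have previously shown that, under reasonable assumptions, all understanding or learning is compression.
Large language models (LLMs) understand data better than ever before.
Can they help us compress data?
LLMs may be seen as approximating the uncomputable Solomonoff induction.
Under this new uncomputable paradigm, we therefore present LMCompress.
LMCompress shatters all previous lossless compression algorithms, doubling the lossless compression ratios of JPEG-XL for images, FLAC for audio, and H.264 for video, and quadrupling the compression ratio of bz2 for text.
The better a large model understands the data, the better LMCompress compresses.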
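
The link between "understanding" and compression can be made concrete with a small sketch. It assumes, as is common for model-based compressors but is not stated in the abstract, that next-symbol probabilities drive an arithmetic coder, so the compressed size is close to the sum of -log2 p over the data. The function name `ideal_code_length_bits` and the simple count-based byte models are hypothetical stand-ins for a large model, not the LMCompress method itself.

```python
import math
from collections import defaultdict


def ideal_code_length_bits(data: bytes, order: int) -> float:
    """Sum of -log2 p(symbol | context) under an adaptive count-based model.

    An arithmetic coder driven by the same predictions would emit roughly
    this many bits, plus a small constant overhead (assumption: this is an
    illustration, not the coder used by LMCompress).
    """
    counts = defaultdict(lambda: defaultdict(int))  # context -> symbol -> count
    totals = defaultdict(int)                       # context -> total count
    bits = 0.0
    for i, symbol in enumerate(data):
        # The context is the previous `order` bytes (empty when order == 0).
        context = data[max(0, i - order):i]
        # Laplace smoothing over the 256 possible byte values.
        p = (counts[context][symbol] + 1) / (totals[context] + 256)
        bits += -math.log2(p)
        # Update the model only after coding, so a decoder could mirror it.
        counts[context][symbol] += 1
        totals[context] += 1
    return bits


sample = b"abracadabra " * 1000
order0_bits = ideal_code_length_bits(sample, order=0)  # ignores context
order2_bits = ideal_code_length_bits(sample, order=2)  # sees 2 previous bytes

print(f"raw size:      {8 * len(sample)} bits")
print(f"order-0 model: {order0_bits:.0f} bits")
print(f"order-2 model: {order2_bits:.0f} bits")
```

The order-2 model predicts this repetitive text better than the order-0 model, so its ideal code length is shorter. A large model plays the same role with far stronger predictions.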

Summary (original)

Modern data compression methods are slowly reaching their limits after 80 years of research, millions of papers, and wide range of applications. Yet, the extravagant 6G communication speed requirement raises a major open question for revolutionary new ideas of data compression. We have previously shown all understanding or learning are compression, under reasonable assumptions. Large language models (LLMs) understand data better than ever before. Can they help us to compress data? The LLMs may be seen to approximate the uncomputable Solomonoff induction. Therefore, under this new uncomputable paradigm, we present LMCompress. LMCompress shatters all previous lossless compression algorithms, doubling the lossless compression ratios of JPEG-XL for images, FLAC for audios, and H.264 for videos, and quadrupling the compression ratio of bz2 for texts. The better a large model understands the data, the better LMCompress compresses.

arXiv information

Authors	Ziguang Li, Chao Huang, Xuliang Wang, Haibo Hu, Cole Wyeth, Dongbo Bu, Quan Yu, Wen Gao, Xingwu Liu, Ming Li
Published	2025-04-30 15:11:38+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
