Dodo: Dynamic Contextual Compression for Decoder-only LMs

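Summary

Transformer-based language models (LMs) are inefficient on long contexts.
The authors propose Dodo, a method for context compression.
Instead of keeping one vector per token, as a standard transformer does, Dodo represents text with a dynamic number of hidden states at each layer, which reduces the time and memory cost of self-attention to a fraction of the usual amount.
Off-the-shelf models such as LLaMA can be adapted to Dodo with parameter-efficient tuning methods such as LoRA.
In use, Dodo can serve either as an autoregressive LM or as a context compressor for downstream tasks.
Experiments in language modeling, question answering, and summarization show that Dodo retains its capabilities on these tasks while drastically reducing overhead during decoding.
For example, in the autoencoding task, Dodo compresses context at a 20x compression ratio and reaches a BLEU score of 98% on reconstruction, which is nearly lossless encoding.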

Abstract (original)

Transformer-based language models (LMs) are inefficient in long contexts. We propose Dodo, a solution for context compression. Instead of one vector per token in a standard transformer model, Dodo represents text with a dynamic number of hidden states at each layer, reducing the cost of self-attention to a fraction of typical time and space. Moreover, off-the-shelf models such as LLaMA can be adapted to Dodo by efficient parameter tuning methods such as LoRA. In use, Dodo can act as either an autoregressive LM or a context compressor for downstream tasks. We demonstrate through experiments in language modeling, question answering, and summarization that Dodo retains capabilities in these tasks, while drastically reducing the overhead during decoding. For example, in the autoencoding task, Dodo shrinks context at a 20x compression ratio with a BLEU score of 98% for reconstruction, achieving nearly lossless encoding.
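
To make the 20x figure concrete, the following is a rough back-of-the-envelope sketch, not the authors' implementation. It assumes a single uniform compression ratio applied at every layer, whereas the abstract describes a dynamic number of hidden states per layer. The function names `compressed_states` and `kv_cache_bytes`, the 1,000-token context, and the model dimensions (32 layers, hidden size 4096, 2-byte values) are all illustrative assumptions, not values taken from the paper.

```python
import math


def compressed_states(n_tokens: int, ratio: float) -> int:
    """Hidden states kept per layer, assuming one uniform compression ratio.

    Assumption: Dodo chooses the number of states dynamically; a fixed
    ratio is used here only to illustrate the order of magnitude.
    """
    if n_tokens < 0 or ratio < 1:
        raise ValueError("n_tokens must be >= 0 and ratio must be >= 1")
    return math.ceil(n_tokens / ratio)


def kv_cache_bytes(n_states: int, n_layers: int, hidden_size: int,
                   bytes_per_value: int = 2) -> int:
    """Approximate key/value cache size for a decoder-only transformer.

    Each layer stores one key vector and one value vector of length
    hidden_size for every cached state, hence the factor of 2.
    """
    return 2 * n_states * n_layers * hidden_size * bytes_per_value


if __name__ == "__main__":
    n_tokens = 1000                    # illustrative context length
    n_layers, hidden_size = 32, 4096   # illustrative model dimensions

    kept = compressed_states(n_tokens, ratio=20)
    full = kv_cache_bytes(n_tokens, n_layers, hidden_size)
    small = kv_cache_bytes(kept, n_layers, hidden_size)

    print(f"states kept per layer: {kept} of {n_tokens}")
    print(f"KV cache: {full / 2**20:.1f} MiB -> {small / 2**20:.1f} MiB")
```

Under these assumptions, both the cache size and the per-token cost of attending to the context scale linearly with the number of cached states, so each shrinks by roughly the compression ratio.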

arXiv information

Authors	Guanghui Qin, Corby Rosset, Ethan C. Chau, Nikhil Rao, Benjamin Van Durme
Published	2024-06-13 15:19:24+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
