E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning

Abstract

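In large language models (LLMs), the ability to process long contexts is increasingly important for tasks such as multi-round dialogue, code generation, and document summarization.
This paper addresses three goals that are collectively termed the "impossible triangle": improving long-context performance, reducing computational complexity, and leveraging pretrained models.
We introduce E2LLM (Encoder Elongated Large Language Models), a new approach that effectively navigates this paradox.
The method splits a long context into chunks, compresses each chunk into embedding vectors with a pretrained text encoder, and uses an adapter to align these representations with a decoder-only LLM.
Two training objectives, one focused on reconstructing the encoder output and one on long-context instruction fine-tuning, are employed to help the LLM understand the resulting soft prompts.
Experimental results show that E2LLM achieves superior performance in long-context scenarios while balancing efficiency, performance, and compatibility with pretrained models.
The framework thus represents a significant advance in the field and contributes to effective long-text modeling.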
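
The chunk, encode, and adapt pipeline described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `toy_encoder` and `make_adapter` are hypothetical stand-ins for the pretrained text encoder and the adapter, the dimensions are arbitrary, and emitting exactly one vector per chunk is an assumption made here for simplicity. The point is only to show how the number of positions handed to the decoder shrinks from one per token to one per chunk.

```python
import random


def split_into_chunks(tokens, chunk_size):
    """Split a token sequence into consecutive chunks of at most chunk_size tokens."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def toy_encoder(chunk, enc_dim):
    """Hypothetical stand-in for a pretrained text encoder.

    Each token gets a deterministic pseudo-random vector (seeded by the
    token string), and the chunk embedding is the mean of those vectors.
    """
    vec = [0.0] * enc_dim
    for tok in chunk:
        rng = random.Random(tok)
        for d in range(enc_dim):
            vec[d] += rng.uniform(-1.0, 1.0)
    return [v / len(chunk) for v in vec]


def make_adapter(enc_dim, llm_dim, seed=0):
    """Hypothetical adapter: one randomly initialised linear layer that maps
    encoder space (enc_dim) to the decoder's embedding space (llm_dim).
    In a real system these weights would be trained."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 0.1) for _ in range(enc_dim)] for _ in range(llm_dim)]

    def adapter(vec):
        return [sum(w * x for w, x in zip(row, vec)) for row in weights]

    return adapter


def build_soft_prompt(tokens, chunk_size, enc_dim, llm_dim):
    """Chunk the context, encode each chunk, and project it with the adapter.

    Assumption: one soft-prompt vector per chunk."""
    adapter = make_adapter(enc_dim, llm_dim)
    chunks = split_into_chunks(tokens, chunk_size)
    return [adapter(toy_encoder(chunk, enc_dim)) for chunk in chunks]


tokens = [f"tok{i}" for i in range(1000)]
soft_prompt = build_soft_prompt(tokens, chunk_size=128, enc_dim=8, llm_dim=16)
print(len(tokens), "tokens ->", len(soft_prompt),
      "soft-prompt vectors of width", len(soft_prompt[0]))
# prints: 1000 tokens -> 8 soft-prompt vectors of width 16
```

In this sketch the decoder would attend over 8 soft-prompt positions instead of 1000 token positions, which is the intuition behind the efficiency claim. The two training objectives mentioned in the abstract are not modelled here.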

Abstract (original)

In the realm of Large Language Models (LLMs), the ability to process long contexts is increasingly crucial for tasks such as multi-round dialogues, code generation, and document summarization. This paper addresses the challenges of enhancing the long-context performance, reducing computational complexity, and leveraging pretrained models collectively termed the ‘impossible triangle.’ We introduce E2LLM (Encoder Elongated Large Language Models), a novel approach that effectively navigates this paradox. The method involves splitting long contexts into chunks, compressing each into embedding vectors via a pretrained text encoder, and utilizing an adapter to align these representations with a decoder-only LLM. Two training objectives, focusing on reconstruction of the encoder output and long-context instruction fine-tuning, are employed to facilitate the understanding of soft prompts by the LLM. Experimental results demonstrate that E2LLM achieves superior performance in long-context scenarios while balancing efficiency, performance, and compatibility with pretrained models. Our framework thus represents a significant advancement in the field, contributing to effective long-text modeling.

arXiv information

Authors	Zihan Liao, Jun Wang, Hang Yu, Lingxiao Wei, Jianguo Li, Jun Wang, Wei Zhang
Published	2024-09-10 17:44:35+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
