An Effective Framework to Help Large Language Models Handle Numeric-involved Long-context Tasks

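Summary

Large language models (LLMs) have demonstrated remarkable capabilities in handling long texts and achieve near-perfect performance on traditional retrieval tasks.
However, their performance degrades significantly when numerical calculations over a long context are required.
Under normal settings, current LLMs typically cannot handle numeric-involved long-context tasks because of inherent limitations in processing complex and massive information simultaneously.
Some CoT-style prompting methods can improve accuracy, but they require a large number of output tokens, which is costly and slow.
To address this issue, we propose a workflow that decomposes a numeric-involved long-context task into four low-level subtasks: judging, extracting, processing with code, and conclusion.
The first two subtasks are relatively simple, so smaller models can be used to process the long context efficiently.
When numerical calculations are required, the workflow uses code generated by LLMs to sidestep LLMs' weakness at calculation.
Results on two numeric-involved long-context benchmarks show that the workflow not only improves accuracy but also significantly reduces the cost of API calls.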
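
The four subtasks named in the summary can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract gives no prompts, model names, or chunking details, so every function name here (`run_workflow`, `toy_judge`, `toy_extract`, `toy_write_code`, `toy_conclude`) is hypothetical, and the model calls are replaced by deterministic stand-ins so that the control flow can be run as-is. Treating each line of the context as one chunk is also an assumption.

```python
import re
from typing import Callable, List


def run_workflow(
    context: str,
    question: str,
    judge: Callable[[str, str], bool],
    extract: Callable[[str, str], List[float]],
    write_code: Callable[[str], str],
    conclude: Callable[[str, float], str],
) -> str:
    """Hypothetical four-stage pipeline: judge, extract, process with code, conclude."""
    # Assumption: each non-empty line of the long context is one chunk.
    chunks = [line for line in context.splitlines() if line.strip()]

    # 1. Judging: a (small) model decides whether a chunk is relevant.
    relevant = [c for c in chunks if judge(c, question)]

    # 2. Extracting: a (small) model pulls the numbers out of relevant chunks.
    values: List[float] = []
    for chunk in relevant:
        values.extend(extract(chunk, question))

    # 3. Processing with code: a model writes code and the arithmetic is
    #    done by the interpreter, not by the model. exec() on generated code
    #    is unsafe outside a sandbox; it is used here only for illustration.
    namespace = {"values": values}
    exec(write_code(question), namespace)
    result = namespace["result"]

    # 4. Conclusion: turn the computed result into a final answer.
    return conclude(question, result)


# Deterministic stand-ins for the model calls (all hypothetical).
def toy_judge(chunk: str, question: str) -> bool:
    return "sales" in chunk


def toy_extract(chunk: str, question: str) -> List[float]:
    return [float(m) for m in re.findall(r"sales: (\d+(?:\.\d+)?)", chunk)]


def toy_write_code(question: str) -> str:
    return "result = sum(values)"


def toy_conclude(question: str, result: float) -> str:
    return f"Total sales: {result}"


CONTEXT = (
    "Region A sales: 120\n"
    "Weather was mild.\n"
    "Region B sales: 80\n"
    "Office moved in May.\n"
    "Region C sales: 45.5\n"
)
QUESTION = "What are the total sales across all regions?"

answer = run_workflow(
    CONTEXT, QUESTION, toy_judge, toy_extract, toy_write_code, toy_conclude
)
print(answer)  # Total sales: 245.5
```

In this sketch only the judging and extracting stages read the long context, which is where the summary says smaller models suffice. The code-writing stage sees only the question and the extracted values.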

Summary (original)

Large Language Models (LLMs) have demonstrated remarkable capabilities in handling long texts and have almost perfect performance in traditional retrieval tasks. However, their performance significantly degrades when it comes to numerical calculations in the long-context. Numeric-involved long-context tasks typically cannot be addressed by current LLMs in normal settings due to their inherent limitations in simultaneously handling complex and massive information. Some CoT like prompting methods can improve accuracy but demands massive output tokens, which is costly and slow. To address this issue, we propose a workflow, which decompose a numeric-involved long-context task into 4 low-level subtasks: judging, extracting and processing with code and conclusion. The former 2 subtasks is relatively simple, which allows us to use smaller models for efficiently processing long context. When numerical calculations are required, we use code generated by LLMs to avoid the disadvantage of LLM not being good at calculations. The results in 2 numeric-involved long-context benchmarks demonstrate our workflow can not only improve accuracy, but also significantly reduce the cost of API calls.

arXiv information

Author	Yijiong Yu
Published	2024-11-15 12:39:02+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
