Attribute or Abstain: Large Language Models as Long Document Assistants

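Summary

LLMs can help people work with long documents, but they are known to hallucinate.
Attribution can increase trust in LLM responses: the LLM provides evidence supporting its response, which makes the response easier to verify.
Existing approaches to attribution have been evaluated only in RAG settings, where the initial retrieval step confounds LLM performance.
This differs crucially from the long document setting, where retrieval is not required but could still help.
As a result, an evaluation of attribution specific to long documents is missing.
To fill this gap, we present LAB, a benchmark of 6 diverse long document tasks with attribution, and experiment with different approaches to attribution on 4 LLMs of different sizes, both prompted and fine-tuned.
We find that citation, i.e. generating the response and extracting the evidence in a single step, performs best in most cases.
We investigate whether the "Lost in the Middle" phenomenon exists for attribution, but find no evidence of it.
We also find that evidence quality can predict response quality on datasets with simple responses, but not on datasets with complex responses, because models struggle to provide evidence for complex claims.
We release code and data for further investigation.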

Summary (original)

LLMs can help humans working with long documents, but are known to hallucinate. Attribution can increase trust in LLM responses: The LLM provides evidence that supports its response, which enhances verifiability. Existing approaches to attribution have only been evaluated in RAG settings, where the initial retrieval confounds LLM performance. This is crucially different from the long document setting, where retrieval is not needed, but could help. Thus, a long document specific evaluation of attribution is missing. To fill this gap, we present LAB, a benchmark of 6 diverse long document tasks with attribution, and experiment with different approaches to attribution on 4 LLMs of different sizes, both prompted and fine-tuned. We find that citation, i.e. response generation and evidence extraction in one step, mostly performs best. We investigate whether the “Lost in the Middle” phenomenon exists for attribution, but do not find this. We also find that evidence quality can predict response quality on datasets with simple responses, but not so for complex responses, as models struggle with providing evidence for complex claims. We release code and data for further investigation.

arXiv information

Authors	Jan Buchmann, Xiao Liu, Iryna Gurevych
Published	2024-07-10 16:16:02+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
