How Well Do Large Language Models Truly Ground?

Summary

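Relying on the inherent knowledge of large language models (LLMs) can cause problems such as hallucinations, lack of control, and difficulty integrating variable knowledge.
To mitigate this, LLMs can be probed to generate responses grounded in external context, which is often supplied as input (knowledge-augmented models).
However, previous research tends to take a narrow view of the term "grounding", often checking only whether the response contains the correct answer, which does not ensure the reliability of the response as a whole.
To address this limitation, the authors introduce a strict definition of grounding: a model is considered truly grounded when its responses (1) fully utilize the necessary knowledge from the provided context and (2) do not exceed the knowledge within that context.
They introduce a new dataset and a grounding metric to assess this definition, and run experiments across 13 LLMs of different sizes and training methods to provide insight into the factors that influence grounding performance.
The findings contribute to a better understanding of how to improve grounding capabilities and suggest an area of improvement toward more reliable and controllable LLM applications.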
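
The two conditions in the strict definition can be read as a recall-style check and a precision-style check. The sketch below is only an illustration of that reading and is not the metric used in the paper: it assumes that the response, the context, and the necessary knowledge have already been broken down into comparable fact strings, and the names `GroundingScore` and `score_grounding` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class GroundingScore:
    """Hypothetical container for the two conditions of strict grounding."""

    recall: float     # condition (1): share of necessary facts the response uses
    precision: float  # condition (2): share of response facts found in the context

    @property
    def truly_grounded(self) -> bool:
        # Strictly grounded only when both conditions hold completely.
        return self.recall == 1.0 and self.precision == 1.0


def score_grounding(
    response_facts: set[str],
    context_facts: set[str],
    necessary_facts: set[str],
) -> GroundingScore:
    """Score a response under the assumption that facts match by exact string.

    A real evaluation would need a way to extract facts and to judge whether
    two facts mean the same thing; exact matching is a simplification.
    """
    used = necessary_facts & response_facts
    recall = len(used) / len(necessary_facts) if necessary_facts else 1.0

    supported = response_facts & context_facts
    precision = len(supported) / len(response_facts) if response_facts else 1.0

    return GroundingScore(recall=recall, precision=precision)


if __name__ == "__main__":
    context = {"a", "b", "c"}
    necessary = {"a", "b"}
    print(score_grounding({"a", "b"}, context, necessary))
    print(score_grounding({"a", "d"}, context, necessary))
```

Under these assumptions, a response that contains the correct answer but also adds a fact absent from the context fails condition (2), which is the case an answer-only check would miss.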

Abstract (original)

Reliance on the inherent knowledge of Large Language Models (LLMs) can cause issues such as hallucinations, lack of control, and difficulties in integrating variable knowledge. To mitigate this, LLMs can be probed to generate responses by grounding on external context, often given as input (knowledge-augmented models). Yet, previous research is often confined to a narrow view of the term ‘grounding’, often only focusing on whether the response contains the correct answer or not, which does not ensure the reliability of the entire response. To address this limitation, we introduce a strict definition of grounding: a model is considered truly grounded when its responses (1) fully utilize necessary knowledge from the provided context, and (2) don’t exceed the knowledge within the contexts. We introduce a new dataset and a grounding metric to assess this new definition and perform experiments across 13 LLMs of different sizes and training methods to provide insights into the factors that influence grounding performance. Our findings contribute to a better understanding of how to improve grounding capabilities and suggest an area of improvement toward more reliable and controllable LLM applications.

arXiv information

Authors	Hyunji Lee, Sejune Joo, Chaeeun Kim, Joel Jang, Doyoung Kim, Kyoung-Woon On, Minjoon Seo
Published	2023-11-15 16:11:27+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
