Molecular Facts: Desiderata for Decontextualization in LLM Fact Verification


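Automatic factuality verification of large language model (LLM) generations is increasingly used to combat hallucinations.
We argue that fully atomic facts are not the right representation, and define two criteria for molecular facts: decontextuality, or how well they can stand alone, and minimality, or how little extra information is added to achieve decontextuality.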


Automatic factuality verification of large language model (LLM) generations is increasingly used to combat hallucinations. A major point of tension in the literature is the granularity of this fact-checking: larger chunks of text are hard to fact-check, but more atomic facts like propositions may lack the context needed to interpret them correctly. In this work, we assess the role of context in these atomic facts. We argue that fully atomic facts are not the right representation, and define two criteria for molecular facts: decontextuality, or how well they can stand alone, and minimality, or how little extra information is added to achieve decontextuality. We quantify the impact of decontextualization on minimality, then present a baseline methodology for generating molecular facts automatically, aiming to add the right amount of information. We compare against various methods of decontextualization and find that molecular facts balance minimality with fact verification accuracy in ambiguous settings.


Authors: Anisha Gunjal, Greg Durrett
Published: 2024-06-28 17:43:48+00:00
Categories: cs.AI, cs.CL