On Synthesizing Data for Context Attribution in Question Answering

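Abstract

Question answering (QA) accounts for a significant portion of LLM usage "in the wild".
However, LLMs sometimes produce false or misleading responses, also known as "hallucinations".
Grounding the generated answers in the information provided in the context, that is, providing evidence for the generated text, is therefore paramount for the trustworthiness of LLMs.
Providing this information is the task of context attribution.
In this paper, we systematically study LLM-based approaches to this task: (i) zero-shot inference, (ii) LLM ensembling, and (iii) fine-tuning small LMs on synthetic data generated by larger LLMs.
Our key contribution is SynQA, a novel generative strategy for synthesizing context attribution data.
Given selected context sentences, an LLM generates QA pairs that are supported by those sentences.
This leverages the natural strength of LLMs in text generation while ensuring clear attribution paths in the synthetic training data.
We show that attribution data synthesized via SynQA is highly effective for fine-tuning small LMs for context attribution across different QA tasks and domains.
Finally, a user study validates the usefulness of small LMs (fine-tuned on synthetic data from SynQA) for context attribution in QA.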

Abstract (original)

Question Answering (QA) accounts for a significant portion of LLM usage ‘in the wild’. However, LLMs sometimes produce false or misleading responses, also known as ‘hallucinations’. Therefore, grounding the generated answers in contextually provided information — i.e., providing evidence for the generated text — is paramount for LLMs’ trustworthiness. Providing this information is the task of context attribution. In this paper, we systematically study LLM-based approaches for this task, namely we investigate (i) zero-shot inference, (ii) LLM ensembling, and (iii) fine-tuning of small LMs on synthetic data generated by larger LLMs. Our key contribution is SynQA: a novel generative strategy for synthesizing context attribution data. Given selected context sentences, an LLM generates QA pairs that are supported by these sentences. This leverages LLMs’ natural strengths in text generation while ensuring clear attribution paths in the synthetic training data. We show that the attribution data synthesized via SynQA is highly effective for fine-tuning small LMs for context attribution in different QA tasks and domains. Finally, with a user study, we validate the usefulness of small LMs (fine-tuned on synthetic data from SynQA) in context attribution for QA.
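The generation step described in the abstract (select context sentences first, then have an LLM write a QA pair that those sentences support) can be sketched roughly as follows. This is a minimal illustration rather than the authors' implementation: the function names, the record layout, and the `generate_qa` callable are hypothetical, and uniform random sampling of the supporting sentences is an assumption, since the abstract does not say how sentences are selected.

```python
import random
from typing import Callable, Dict, List


def build_synqa_example(
    context_sentences: List[str],
    generate_qa: Callable[[List[str]], Dict[str, str]],
    num_support: int = 2,
    seed: int = 0,
) -> Dict[str, object]:
    """Pick supporting sentences first, then request a QA pair grounded in them."""
    rng = random.Random(seed)
    k = min(num_support, len(context_sentences))
    # Assumption: supporting sentences are sampled uniformly at random.
    support_ids = sorted(rng.sample(range(len(context_sentences)), k))
    support = [context_sentences[i] for i in support_ids]
    # generate_qa is a hypothetical stand-in for a call to a larger LLM.
    qa = generate_qa(support)
    return {
        "context": context_sentences,
        "question": qa["question"],
        "answer": qa["answer"],
        # The attribution labels are known by construction, not predicted.
        "attribution": support_ids,
    }


def fake_generate_qa(support: List[str]) -> Dict[str, str]:
    """Placeholder generator so the sketch runs without any model."""
    return {
        "question": "What do the selected sentences state?",
        "answer": " ".join(support),
    }
```

Because the evidence is chosen before the question and answer are written, each synthetic record carries its attribution labels for free; a small LM could then be fine-tuned to predict `attribution` from `context`, `question`, and `answer`.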

arXiv information

Authors	Gorjan Radevski, Kiril Gashteovski, Shahbaz Syed, Christopher Malon, Sebastien Nicolas, Chia-Chien Hung, Timo Sztyler, Verena Heußer, Wiem Ben Rim, Masafumi Enomoto, Kunihiro Takeoka, Masafumi Oyamada, Goran Glavaš, Carolin Lawrence
Published	2025-06-16 16:22:39+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
