Multi-hop Evidence Pursuit Meets the Web: Team Papelo at FEVER 2024

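Summary

Separating disinformation from fact on the web has long challenged both the search and the reasoning abilities of humans.
We show that the reasoning ability of large language models (LLMs) can be combined with the retrieval ability of modern search engines to automate this process and verify claims in an explainable way.
We integrate LLMs and search under a multi-hop evidence pursuit strategy.
The strategy uses a sequence-to-sequence model to generate an initial question from the input claim, searches for and formulates an answer to that question, and then uses an LLM to iteratively generate follow-up questions that pursue the missing evidence.
We demonstrate the system on the FEVER 2024 (AVeriTeC) shared task.
Compared with a strategy of generating all questions at once, our method obtains a label accuracy that is 0.045 higher and an AVeriTeC score (which evaluates the adequacy of the evidence) that is 0.155 higher.
Through ablations, we show the importance of several design choices: the question generation method, medium-sized context, reasoning with one document at a time, adding metadata, paraphrasing, reducing the problem to two classes, and reconsidering the final verdict.
Our submitted system achieves an AVeriTeC score of 0.510 on the dev set and 0.477 on the test set.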

Abstract (original)

Separating disinformation from fact on the web has long challenged both the search and the reasoning powers of humans. We show that the reasoning power of large language models (LLMs) and the retrieval power of modern search engines can be combined to automate this process and explainably verify claims. We integrate LLMs and search under a multi-hop evidence pursuit strategy. This strategy generates an initial question based on an input claim using a sequence to sequence model, searches and formulates an answer to the question, and iteratively generates follow-up questions to pursue the evidence that is missing using an LLM. We demonstrate our system on the FEVER 2024 (AVeriTeC) shared task. Compared to a strategy of generating all the questions at once, our method obtains .045 higher label accuracy and .155 higher AVeriTeC score (evaluating the adequacy of the evidence). Through ablations, we show the importance of various design choices, such as the question generation method, medium-sized context, reasoning with one document at a time, adding metadata, paraphrasing, reducing the problem to two classes, and reconsidering the final verdict. Our submitted system achieves .510 AVeriTeC score on the dev set and .477 AVeriTeC score on the test set.
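
The multi-hop loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: every name here (`PursuitComponents`, `pursue_evidence`, `max_hops`, `NO_ANSWER`) is hypothetical, the question generator, search engine, and LLM are stand-in callables, and the stopping rule and toy labels are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

QAPair = Tuple[str, str]
NO_ANSWER = "No answer found."  # hypothetical placeholder, not from the paper


@dataclass
class PursuitComponents:
    """Hypothetical pluggable parts; none of these names come from the paper."""
    first_question: Callable[[str], str]                     # stands in for the seq2seq model
    search: Callable[[str], List[str]]                       # stands in for a web search engine
    answer: Callable[[str, str, str], Optional[str]]         # (claim, question, doc) -> answer or None
    follow_up: Callable[[str, List[QAPair]], Optional[str]]  # stands in for the LLM; None means stop
    verdict: Callable[[str, List[QAPair]], str]              # stands in for the LLM's final label


def pursue_evidence(claim: str, parts: PursuitComponents,
                    max_hops: int = 3) -> Tuple[str, List[QAPair]]:
    """Collect question-answer evidence hop by hop, then ask for a verdict."""
    evidence: List[QAPair] = []
    question: Optional[str] = parts.first_question(claim)
    for _ in range(max_hops):
        if question is None:
            break
        # Read retrieved documents one at a time; keep the first usable answer.
        found = None
        for doc in parts.search(question):
            found = parts.answer(claim, question, doc)
            if found is not None:
                break
        evidence.append((question, found if found is not None else NO_ANSWER))
        # Ask for a follow-up question targeting evidence that is still missing.
        question = parts.follow_up(claim, evidence)
    return parts.verdict(claim, evidence), evidence


# Toy stand-ins so the sketch runs without any model or network access.
toy = PursuitComponents(
    first_question=lambda claim: "Who said it?",
    search=lambda q: ["irrelevant page", "page mentioning " + q],
    answer=lambda claim, q, doc: doc if q in doc else None,
    follow_up=lambda claim, ev: "When was it said?" if len(ev) < 2 else None,
    verdict=lambda claim, ev: ("Supported"
                               if all(a != NO_ANSWER for _, a in ev)
                               else "Not Enough Evidence"),
)

if __name__ == "__main__":
    label, qa_pairs = pursue_evidence("An example claim.", toy)
    print(label)
    for q, a in qa_pairs:
        print(q, "->", a)
```

Passing the components in as callables keeps the control flow separate from any particular model or search API, so the contrast with generating all questions at once is visible in the loop itself: each follow-up question is conditioned on the evidence gathered so far.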

arXiv information

Author	Christopher Malon
Published	2024-11-08 18:25:06+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
