Let’s CONFER: A Dataset for Evaluating Natural Language Inference Models on CONditional InFERence and Presupposition

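Summary

Natural Language Inference (NLI) is the task of determining whether a pair of sentences stands in a relation of entailment, contradiction, or neutrality.
NLI models perform well on many inference tasks, but their ability to handle fine-grained pragmatic inferences, in particular presuppositions in conditionals, remains underexplored.
This study introduces CONFER, a new dataset designed to evaluate how NLI models process inference in conditional sentences.
The authors evaluate four NLI models, two of them pre-trained, to examine how well they generalize to conditional reasoning.
They also evaluate Large Language Models (LLMs), including GPT-4o, LLaMA, Gemma, and DeepSeek-R1, in zero-shot and few-shot prompting settings to analyze their ability to infer presuppositions with and without prior context.
The findings indicate that NLI models struggle with presuppositional reasoning in conditionals, and that fine-tuning on existing NLI datasets does not necessarily improve their performance.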

Summary (original)

Natural Language Inference (NLI) is the task of determining whether a sentence pair represents entailment, contradiction, or a neutral relationship. While NLI models perform well on many inference tasks, their ability to handle fine-grained pragmatic inferences, particularly presupposition in conditionals, remains underexplored. In this study, we introduce CONFER, a novel dataset designed to evaluate how NLI models process inference in conditional sentences. We assess the performance of four NLI models, including two pre-trained models, to examine their generalization to conditional reasoning. Additionally, we evaluate Large Language Models (LLMs), including GPT-4o, LLaMA, Gemma, and DeepSeek-R1, in zero-shot and few-shot prompting settings to analyze their ability to infer presuppositions with and without prior context. Our findings indicate that NLI models struggle with presuppositional reasoning in conditionals, and fine-tuning on existing NLI datasets does not necessarily improve their performance.
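The zero-shot and few-shot settings mentioned in the abstract, with and without prior context, can be illustrated with a minimal prompt-building sketch. Everything here is an assumption made for illustration: the prompt wording, the `NLIExample`, `build_prompt`, and `parse_label` names, and the example sentences are hypothetical and are not taken from the CONFER dataset or from the authors' evaluation code.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# The three standard NLI labels named in the abstract.
LABELS = ("entailment", "contradiction", "neutral")


@dataclass(frozen=True)
class NLIExample:
    """One premise/hypothesis pair (hypothetical structure, not the CONFER schema)."""
    premise: str
    hypothesis: str
    label: Optional[str] = None    # None for the item to be classified
    context: Optional[str] = None  # optional prior context


def format_example(ex: NLIExample) -> str:
    """Render one example; the label slot is left open when no label is given."""
    lines = []
    if ex.context:
        lines.append(f"Context: {ex.context}")
    lines.append(f"Premise: {ex.premise}")
    lines.append(f"Hypothesis: {ex.hypothesis}")
    lines.append(f"Label: {ex.label}" if ex.label else "Label:")
    return "\n".join(lines)


def build_prompt(query: NLIExample, demonstrations: Sequence[NLIExample] = ()) -> str:
    """Zero-shot when `demonstrations` is empty, few-shot otherwise."""
    instruction = (
        "Decide whether the premise entails, contradicts, or is neutral "
        "toward the hypothesis. Answer with one word."
    )
    blocks = [instruction]
    blocks.extend(format_example(d) for d in demonstrations)
    blocks.append(format_example(query))
    return "\n\n".join(blocks)


def parse_label(model_output: str) -> Optional[str]:
    """Map free-form model output to one of LABELS, or None if no label is found."""
    lowered = model_output.lower()
    for label in LABELS:
        if label in lowered:
            return label
    return None


if __name__ == "__main__":
    # Hypothetical conditional whose antecedent carries a presupposition.
    query = NLIExample(
        premise="If Mary's brother visits, she will cook dinner.",
        hypothesis="Mary has a brother.",
    )
    demo = NLIExample(
        premise="A man is playing a guitar.",
        hypothesis="A man is playing an instrument.",
        label="entailment",
    )
    print(build_prompt(query))          # zero-shot, no prior context
    print(build_prompt(query, [demo]))  # few-shot with one demonstration
```

Passing a `context` string to the query adds a `Context:` line before the premise, which is one simple way to contrast the "with prior context" and "without prior context" conditions. Sending the prompt to a model is left out on purpose, since the abstract does not describe the inference setup.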

arXiv information

Authors	Tara Azin, Daniel Dumitrescu, Diana Inkpen, Raj Singh
Published	2025-06-06 14:42:20+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
