PROPHET: An Inferable Future Forecasting Benchmark with Causal Intervened Likelihood Estimation

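Summary

Predicting future events is one of the ultimate aspirations of artificial intelligence.
Recent advances in large language model (LLM)-based systems have shown remarkable potential for forecasting future events, attracting significant interest from the research community.
Several benchmarks now evaluate forecasting capability by formalizing event prediction as a retrieval-augmented generation (RAG) and reasoning task.
In these benchmarks, each prediction question is answered using relevant retrieved news articles.
However, because these benchmarks do not consider whether a question can be supported by valid or sufficient rationales, some of their questions may be inherently non-inferable.
To address this issue, we introduce PROPHET, a new benchmark of inferable forecasting questions paired with relevant news for retrieval.
To ensure the benchmark's inferability, we propose Causal Intervened Likelihood (CIL), a statistical measure that assesses inferability through causal inference.
To construct the benchmark, we first collected recent trend-forecasting questions and then filtered the data using CIL, yielding an inferable benchmark for event prediction.
Through extensive experiments, we first demonstrate the validity of CIL and use it to investigate event prediction in depth.
We then evaluate several representative prediction systems on PROPHET and draw insights for future directions.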

Abstract (original)

Predicting future events stands as one of the ultimate aspirations of artificial intelligence. Recent advances in large language model (LLM)-based systems have shown remarkable potential in forecasting future events, thereby garnering significant interest in the research community. Currently, several benchmarks have been established to evaluate the forecasting capabilities by formalizing the event prediction as a retrieval-augmented generation (RAG) and reasoning task. In these benchmarks, each prediction question is answered with relevant retrieved news articles. However, because there is no consideration on whether the questions can be supported by valid or sufficient supporting rationales, some of the questions in these benchmarks may be inherently noninferable. To address this issue, we introduce a new benchmark, PROPHET, which comprises inferable forecasting questions paired with relevant news for retrieval. To ensure the inferability of the benchmark, we propose Causal Intervened Likelihood (CIL), a statistical measure that assesses inferability through causal inference. In constructing this benchmark, we first collected recent trend forecasting questions and then filtered the data using CIL, resulting in an inferable benchmark for event prediction. Through extensive experiments, we first demonstrate the validity of CIL and in-depth investigations into event prediction with the aid of CIL. Subsequently, we evaluate several representative prediction systems on PROPHET, drawing valuable insights for future directions.

arXiv information

Authors	Zhengwei Tao, Zhi Jin, Bincheng Li, Xiaoying Bai, Haiyan Zhao, Chengfeng Dou, Xiancai Chen, Jia Li, Linyu Li, Chongyang Tao
Published	2025-04-02 08:57:42+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
