MTBench: A Multimodal Time Series Benchmark for Temporal Reasoning and Question Answering

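Abstract

Understanding the relationship between textual news and the evolution of time series is an important yet under-explored challenge in applied data science.
Multimodal learning has gained traction, but existing multimodal time-series datasets fall short in evaluating the cross-modal reasoning and complex question answering that are essential for capturing the intricate interactions between narrative information and temporal patterns.
To bridge this gap, we introduce the Multimodal Time Series Benchmark (MTBench), a large-scale benchmark designed to evaluate large language models (LLMs) on time-series and text understanding across the financial and weather domains.
MTBench consists of paired time-series and text data: financial news with the corresponding stock price movements, and weather reports aligned with historical temperature records.
Unlike existing benchmarks that focus on isolated modalities, MTBench provides a comprehensive testbed for jointly reasoning over structured numerical trends and unstructured textual narratives.
The richness of MTBench makes it possible to formulate diverse tasks that require a deep understanding of both text and time-series data, including time-series forecasting, semantic and technical trend analysis, and news-driven question answering (QA).
These tasks target a model's ability to capture temporal dependencies, extract key insights from textual context, and integrate cross-modal information.
We evaluate state-of-the-art LLMs on MTBench and analyze how effectively they model the complex relationships between news narratives and temporal patterns.
Our findings reveal significant challenges for current models, including difficulty capturing long-term dependencies, interpreting causality in financial and weather trends, and effectively fusing multimodal information.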

Abstract (original)

Understanding the relationship between textual news and time-series evolution is a critical yet under-explored challenge in applied data science. While multimodal learning has gained traction, existing multimodal time-series datasets fall short in evaluating cross-modal reasoning and complex question answering, which are essential for capturing complex interactions between narrative information and temporal patterns. To bridge this gap, we introduce Multimodal Time Series Benchmark (MTBench), a large-scale benchmark designed to evaluate large language models (LLMs) on time series and text understanding across financial and weather domains. MTbench comprises paired time series and textual data, including financial news with corresponding stock price movements and weather reports aligned with historical temperature records. Unlike existing benchmarks that focus on isolated modalities, MTbench provides a comprehensive testbed for models to jointly reason over structured numerical trends and unstructured textual narratives. The richness of MTbench enables formulation of diverse tasks that require a deep understanding of both text and time-series data, including time-series forecasting, semantic and technical trend analysis, and news-driven question answering (QA). These tasks target the model’s ability to capture temporal dependencies, extract key insights from textual context, and integrate cross-modal information. We evaluate state-of-the-art LLMs on MTbench, analyzing their effectiveness in modeling the complex relationships between news narratives and temporal patterns. Our findings reveal significant challenges in current models, including difficulties in capturing long-term dependencies, interpreting causality in financial and weather trends, and effectively fusing multimodal information.

arXiv information

Authors	Jialin Chen, Aosong Feng, Ziyu Zhao, Juan Garza, Gaukhar Nurbek, Cheng Qin, Ali Maatouk, Leandros Tassiulas, Yifeng Gao, Rex Ying
Published	2025-03-21 05:04:53+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
