Explain with Visual Keypoints Like a Real Mentor! A Benchmark for Multimodal Solution Explanation

Abstract

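As the mathematical reasoning capabilities of large language models (LLMs) advance rapidly, AI systems are increasingly being adopted in educational settings to help students understand problem-solving processes.
However, one critical component remains underexplored in current LLM-generated explanations: visual explanation.
In real-world teaching, human tutors routinely use visual aids such as diagrams, markings, and highlights to improve conceptual clarity.
To bridge this gap, the authors introduce a new task, visual solution explanation, which requires generating explanations that incorporate newly introduced visual elements essential for understanding (e.g., auxiliary lines, annotations, or geometric constructions).
To evaluate model performance on this task, they propose MathExplain, a multimodal benchmark of 997 math problems annotated with visual keypoints and with corresponding explanatory text that references those elements.
Their empirical results show that while some closed-source models demonstrate promising capabilities on visual solution-explaining, current open-source general-purpose models perform inconsistently, particularly in identifying relevant visual components and producing coherent keypoint-based explanations.
The authors expect that visual solution-explaining and the MathExplain dataset will catalyze further research on multimodal LLMs in education and advance their deployment as effective, explanation-oriented AI tutors.
Code and data will be released publicly.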

Abstract (original)

With the rapid advancement of mathematical reasoning capabilities in Large Language Models (LLMs), AI systems are increasingly being adopted in educational settings to support students’ comprehension of problem-solving processes. However, a critical component remains underexplored in current LLM-generated explanations: visual explanation. In real-world instructional contexts, human tutors routinely employ visual aids – such as diagrams, markings, and highlights – to enhance conceptual clarity. To bridge this gap, we introduce a novel task of visual solution explanation, which requires generating explanations that incorporate newly introduced visual elements essential for understanding (e.g., auxiliary lines, annotations, or geometric constructions). To evaluate model performance on this task, we propose MathExplain, a multimodal benchmark consisting of 997 math problems annotated with visual keypoints and corresponding explanatory text that references those elements. Our empirical results show that while some closed-source models demonstrate promising capabilities on visual solution-explaining, current open-source general-purpose models perform inconsistently, particularly in identifying relevant visual components and producing coherent keypoint-based explanations. We expect that visual solution-explaining and the MathExplain dataset will catalyze further research on multimodal LLMs in education and advance their deployment as effective, explanation-oriented AI tutors. Code and data will be released publicly.

arXiv information

Authors	Jaewoo Park, Jungyang Park, Dongju Jang, Jiwan Chung, Byungwoo Yoo, Jaewoo Shin, Seonjoon Park, Taehyeong Kim, Youngjae Yu
Published	2025-04-07 14:23:25+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
