Explain with Visual Keypoints Like a Real Mentor! A Benchmark for Multimodal Solution Explanation

Abstract

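With the rapid progress of mathematical reasoning in large language models (LLMs), AI systems are increasingly adopted in educational settings to help students understand problem-solving processes. However, one important element remains underexplored in the explanations that current LLMs generate: visual explanation. In real classrooms, human tutors routinely use visual aids such as diagrams, markings, and highlights to improve conceptual clarity. To bridge this gap, we introduce a new task, visual solution explanation, which requires not only solving a problem but also generating an explanation that incorporates newly introduced visual elements essential for understanding (auxiliary lines, annotations, geometric constructions, and so on). To evaluate model performance on this task, we propose MathExplain, a multimodal benchmark of 997 math problems annotated with visual keypoints and with corresponding explanatory text that refers to those elements. Our empirical results show that, while some closed-source models show promising ability in visual solution explanation, current open-source general-purpose models perform inconsistently, particularly in identifying relevant visual elements and generating coherent keypoint-based explanations. We hope that visual solution explanation and the MathExplain dataset will encourage further research on multimodal LLMs in education and advance their deployment as effective, explanation-oriented AI tutors. Code and data are to be released publicly.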

Abstract (original)

With the rapid advancement of mathematical reasoning capabilities in large language models (LLMs), AI systems are increasingly being adopted in educational settings to support students’ comprehension of problem-solving processes. However, a critical component remains underexplored in current LLM-generated explanations: visual explanation. In real-world instructional contexts, human tutors routinely employ visual aids, such as diagrams, markings, and highlights, to enhance conceptual clarity. To bridge this gap, we introduce a novel task of visual solution explanation, which requires not only solving problems but also generating explanations that incorporate newly introduced visual elements essential for understanding (e.g., auxiliary lines, annotations, or geometric constructions). To evaluate model performance on this task, we propose MathExplain, a multimodal benchmark consisting of 997 math problems annotated with visual keypoints and corresponding explanatory text that references those elements. Our empirical results show that while some closed-source models demonstrate promising capabilities on visual solution-explaining, current open-source general-purpose models perform inconsistently, particularly in identifying relevant visual components and producing coherent keypoint-based explanations. We expect that visual solution-explaining and the MathExplain dataset will catalyze further research on multimodal LLMs in education and advance their deployment as effective, explanation-oriented AI tutors. Code and data will be released publicly.

arXiv information

Authors	Jaewoo Park, Jungyang Park, Dongju Jang, Jiwan Chung, Byungwoo Yoo, Jaewoo Shin, Seonjoon Park, Taehyeong Kim, Youngjae Yu
Published	2025-04-04 06:03:13+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
