Can Large Language Models Replicate ITS Feedback on Open-Ended Math Questions?

Summary

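Intelligent tutoring systems (ITSs) often include an automated feedback component that delivers a predefined feedback message to a student when the system detects a predefined error.
Such feedback components are usually built with template-based approaches.
These approaches demand considerable effort from human experts, who must identify a limited set of likely student errors and write corresponding feedback for each one.
The limitation is especially apparent in open-ended math questions, where students can make a large number of distinct errors.
In our work, we examine whether large language models (LLMs) can generate feedback for open-ended math questions that resembles the feedback of an established ITS built on a template-based approach.
We fine-tune both open-source and proprietary LLMs on real student responses paired with the feedback the ITS provided.
We measure the quality of the generated feedback with text similarity metrics.
We find that both open-source and proprietary models show promise at reproducing feedback seen during training, but neither generalizes well to previously unseen student errors.
These results suggest that LLMs can learn the format of the feedback but cannot fully understand the mathematical errors that students make.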
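
The abstract does not say which text similarity metrics were used. As an illustration only, the sketch below shows one common kind of metric: a ROUGE-L-style F-measure based on the longest common subsequence (LCS) of whitespace tokens. The function names `lcs_length` and `rouge_l_f1` and the example feedback strings are hypothetical and are not taken from the paper.

```python
def lcs_length(a, b):
    """Return the length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L-style F1 between two strings, using lowercase whitespace tokens.

    Assumption: this simplified tokenization stands in for whatever
    preprocessing a real evaluation pipeline would apply.
    """
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: template feedback from an ITS vs. model-generated feedback.
its_feedback = "Check the sign of the slope"
llm_feedback = "Check the sign of the intercept"
score = rouge_l_f1(its_feedback, llm_feedback)
```

A score near 1 means the generated message closely matches the wording of the template feedback. A surface metric like this rewards reproducing the format of the feedback, so a high score does not by itself show that the underlying mathematical error was identified correctly.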

Abstract (original)

Intelligent Tutoring Systems (ITSs) often contain an automated feedback component, which provides a predefined feedback message to students when they detect a predefined error. To such a feedback component, we often resort to template-based approaches. These approaches require significant effort from human experts to detect a limited number of possible student errors and provide corresponding feedback. This limitation is exemplified in open-ended math questions, where there can be a large number of different incorrect errors. In our work, we examine the capabilities of large language models (LLMs) to generate feedback for open-ended math questions, similar to that of an established ITS that uses a template-based approach. We fine-tune both open-source and proprietary LLMs on real student responses and corresponding ITS-provided feedback. We measure the quality of the generated feedback using text similarity metrics. We find that open-source and proprietary models both show promise in replicating the feedback they see during training, but do not generalize well to previously unseen student errors. These results suggest that despite being able to learn the formatting of feedback, LLMs are not able to fully understand mathematical errors made by students.

arXiv information

Authors	Hunter McNichols, Jaewook Lee, Stephen Fancsali, Steve Ritter, Andrew Lan
Published	2024-05-10 11:53:53+00:00

Source and services used

arxiv.jp, Google
