Improving the Validity of Automatically Generated Feedback via Reinforcement Learning

Summary

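Automatically generating feedback with large language models (LLMs) in intelligent tutoring systems and online learning platforms has the potential to improve learning outcomes for many students. Feedback content must be valid, especially in subjects such as math, which requires a model that understands the problem, the solution, and where the student's error lies. Feedback must also be pedagogically valid so that it reflects effective tutoring strategies, such as explaining possible misconceptions and encouraging the student, among other desirable features. In this work, we address both problems, automatically generating feedback and evaluating it, while considering both correctness and alignment. First, we propose a rubric for evaluating math feedback and show that GPT-4 can use it effectively to annotate human-written and LLM-generated feedback. Second, we propose a framework for feedback generation that uses reinforcement learning (RL) to optimize both correctness and alignment. Specifically, we use GPT-4's annotations to create preferences over feedback pairs in an augmented dataset for training via direct preference optimization (DPO). Using Llama 2, an open-source LLM, we show that our method significantly improves the correctness and alignment of generated feedback.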

Summary (original)

Automatically generating feedback via large language models (LLMs) in intelligent tutoring systems and online learning platforms has the potential to improve the learning outcomes of many students. However, both feedback generation and evaluation are challenging: feedback content has to be valid especially in subjects like math, which requires models to understand the problem, the solution, and where the student’s error lies. Feedback also has to be pedagogically valid to reflect effective tutoring strategies, such as explaining possible misconceptions and encouraging the student, among other desirable features. In this work, we address both problems of automatically generating and evaluating feedback while considering both correctness and alignment. First, we propose a rubric for evaluating math feedback and show that GPT-4 is able to effectively use it to annotate human-written and LLM-generated feedback. Second, we propose a framework for feedback generation that optimizes both correctness and alignment using reinforcement learning (RL). Specifically, we use GPT-4’s annotations to create preferences over feedback pairs in an augmented dataset for training via direct preference optimization (DPO). We show that our methods significantly increase the correctness and alignment of generated feedback with Llama 2, an open-source LLM, qualitatively analyze our generation and evaluation systems using case studies, and outline several areas for future work.
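The preference-based training step described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each GPT-4 rubric annotation has been collapsed into a single numeric score per feedback message; the abstract does not say how preferences are actually derived, and the names `build_preference_pairs` and `dpo_loss` are hypothetical. The loss itself is the standard DPO objective for one pair, computed from sequence log-probabilities under the trained policy and a frozen reference model.

```python
import math
from itertools import combinations


def build_preference_pairs(scored_feedback):
    """Turn (feedback_text, rubric_score) items into (chosen, rejected) pairs.

    Assumption: a higher score means better feedback. Ties are skipped
    because they express no preference.
    """
    pairs = []
    for (text_a, score_a), (text_b, score_b) in combinations(scored_feedback, 2):
        if score_a > score_b:
            pairs.append((text_a, text_b))
        elif score_b > score_a:
            pairs.append((text_b, text_a))
    return pairs


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for a single preference pair.

    Inputs are log-probabilities of the full chosen/rejected responses
    under the policy being trained and under the frozen reference model.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)) in a numerically stable softplus form
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))
```

When the policy assigns relatively more probability to the preferred feedback than the reference model does, the loss falls below log 2; in practice the log-probabilities would come from a model such as Llama 2 rather than being supplied by hand.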

arXiv information

Authors	Alexander Scarlatos, Digory Smith, Simon Woodhead, Andrew Lan
Published	2024-03-02 20:25:50+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
