Generative AI for Programming Education: Benchmarking ChatGPT, GPT-4, and Human Tutors

Abstract

Generative AI and large language models hold great promise in enhancing computing education by powering next-generation educational technologies for introductory programming. Recent works have studied these models for different scenarios relevant to programming education; however, these works are limited for several reasons, as they typically consider already outdated models or only specific scenario(s). Consequently, there is a lack of a systematic study that benchmarks state-of-the-art models for a comprehensive set of programming education scenarios. In our work, we systematically evaluate two models, ChatGPT (based on GPT-3.5) and GPT-4, and compare their performance with human tutors for a variety of scenarios. We evaluate using five introductory Python programming problems and real-world buggy programs from an online platform, and assess performance using expert-based annotations. Our results show that GPT-4 drastically outperforms ChatGPT (based on GPT-3.5) and comes close to human tutors’ performance for several scenarios. These results also highlight settings where GPT-4 still struggles, providing exciting future directions on developing techniques to improve the performance of these models.

arXiv information

Authors	Tung Phung, Victor-Alexandru Pădurean, José Cambronero, Sumit Gulwani, Tobias Kohn, Rupak Majumdar, Adish Singla, Gustavo Soares
Published	2023-06-29 17:57:40+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
