‘Which LLM should I use?’: Evaluating LLMs for tasks performed by Undergraduate Computer Science Students

Abstract

This study evaluates the effectiveness of various large language models (LLMs) in performing tasks common among undergraduate computer science students. Although a number of studies in the computing education community have explored the use of LLMs for a variety of tasks, comprehensive research that compares different LLMs and evaluates which are most effective for which tasks is lacking. Our research systematically assesses some of the publicly available LLMs, such as Google Bard, ChatGPT (3.5), GitHub Copilot Chat, and Microsoft Copilot, across diverse tasks commonly encountered by undergraduate computer science students in India. These tasks include code explanation and documentation, solving class assignments, technical interview preparation, learning new concepts and frameworks, and email writing. The evaluation for these tasks was carried out by pre-final-year and final-year undergraduate computer science students and provides insights into the models' strengths and limitations. This study aims to guide students as well as instructors in selecting suitable LLMs for a specific task and offers insights into how LLMs can be used constructively by students and instructors.

arXiv information

Authors	Vibhor Agarwal, Madhav Krishan Garg, Sahiti Dharmavaram, Dhruv Kumar
Published	2024-04-03 14:19:44+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
