Multilingual Performance of a Multimodal Artificial Intelligence System on Multisubject Physics Concept Inventories

Summary

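We investigate the multilingual and multimodal performance of GPT-4o, a large language model-based artificial intelligence (AI) system, using a diverse set of physics concept inventories that span multiple languages and subject categories.
The inventories, sourced from the PhysPort website, cover classical physics topics such as mechanics, electromagnetism, optics, and thermodynamics, as well as relativity, quantum mechanics, astronomy, mathematics, and laboratory skills.
Unlike previous text-only studies, we uploaded the inventories as images to reflect what a student would see on paper, thereby assessing the system's multimodal capabilities.
Our results show that performance varies across subjects, with laboratory skills standing out as the weakest.
We also observe differences across languages, with English and European languages showing the strongest performance.
Notably, the relative difficulty of an inventory item is largely independent of the language of the survey.
Comparing the AI results with the existing literature on student performance, we find that the AI system outperforms average post-instruction undergraduate students in all subject categories except laboratory skills.
Furthermore, the AI performs worse on items that require visual interpretation of images than on purely text-based items.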

Abstract (original)

We investigate the multilingual and multimodal performance of a large language model-based artificial intelligence (AI) system, GPT-4o, using a diverse set of physics concept inventories spanning multiple languages and subject categories. The inventories, sourced from the PhysPort website, cover classical physics topics such as mechanics, electromagnetism, optics, and thermodynamics, as well as relativity, quantum mechanics, astronomy, mathematics, and laboratory skills. Unlike previous text-only studies, we uploaded the inventories as images to reflect what a student would see on paper, thereby assessing the system’s multimodal functionality. Our results indicate variation in performance across subjects, with laboratory skills standing out as the weakest. We also observe differences across languages, with English and European languages showing the strongest performance. Notably, the relative difficulty of an inventory item is largely independent of the language of the survey. When comparing AI results to existing literature on student performance, we find that the AI system outperforms average post-instruction undergraduate students in all subject categories except laboratory skills. Furthermore, the AI performs worse on items requiring visual interpretation of images than on those that are purely text-based.

arXiv information

Authors	Gerd Kortemeyer, Marina Babayeva, Giulia Polverini, Ralf Widenhorn, Bor Gregorcic
Published	2025-04-01 10:02:28+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
