AutoIRT: Calibrating Item Response Theory Models with Automated Machine Learning

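Summary

Item response theory (IRT) is a class of interpretable factor models widely used in computerized adaptive tests (CATs), such as language proficiency tests.
Traditionally, these models are fit as parametric mixed effects models of the probability that a test taker answers a test item (i.e., a question) correctly.
Neural network extensions of these models, such as BertIRT, require specialized architectures and parameter tuning.
The authors propose a multistage fitting procedure that is compatible with out-of-the-box automated machine learning (AutoML) tools.
The procedure uses a Monte Carlo EM (MCEM) outer loop with a two-stage inner loop: the first stage trains a non-parametric AutoML grade model on item features, and the second stage fits an item-specific parametric model.
This greatly accelerates the modeling workflow for scoring tests.
The authors demonstrate the method's effectiveness by applying it to the Duolingo English Test, a high-stakes online English proficiency test.
They show that the resulting model is typically better calibrated, achieves better predictive performance, and produces more accurate scores than existing methods (non-explanatory IRT models and explanatory IRT models such as BERT-IRT).
Along the way, they provide a brief survey of machine learning methods for calibrating item parameters for CATs.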
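
As a small illustration of the parametric model of the probability of a correct answer mentioned above, the following sketch uses the two-parameter logistic (2PL) form, which is one common IRT parameterization. The summary does not say which parametric form the authors use, so the 2PL choice and the names `p_correct`, `theta`, `a`, and `b` are assumptions made for illustration only.

```python
import math


def p_correct(theta: float, a: float = 1.0, b: float = 0.0) -> float:
    """Probability of a correct response under an assumed 2PL IRT model.

    theta: test taker ability
    a:     item discrimination (a = 1.0 gives the Rasch / 1PL special case)
    b:     item difficulty
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# A test taker whose ability equals the item difficulty has an even chance.
even_chance = p_correct(theta=0.5, a=1.2, b=0.5)

# A more able test taker is more likely to answer the same item correctly.
stronger = p_correct(theta=1.5, a=1.2, b=0.5)
```

In this form, `b` shifts the curve along the ability axis and `a` controls how sharply the probability rises around `b`.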

Abstract (original)

Item response theory (IRT) is a class of interpretable factor models that are widely used in computerized adaptive tests (CATs), such as language proficiency tests. Traditionally, these are fit using parametric mixed effects models on the probability of a test taker getting the correct answer to a test item (i.e., question). Neural net extensions of these models, such as BertIRT, require specialized architectures and parameter tuning. We propose a multistage fitting procedure that is compatible with out-of-the-box Automated Machine Learning (AutoML) tools. It is based on a Monte Carlo EM (MCEM) outer loop with a two stage inner loop, which trains a non-parametric AutoML grade model using item features followed by an item specific parametric model. This greatly accelerates the modeling workflow for scoring tests. We demonstrate its effectiveness by applying it to the Duolingo English Test, a high stakes, online English proficiency test. We show that the resulting model is typically more well calibrated, gets better predictive performance, and more accurate scores than existing methods (non-explanatory IRT models and explanatory IRT models like BERT-IRT). Along the way, we provide a brief survey of machine learning methods for calibration of item parameters for CATs.
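
The loop structure described in the abstract (an MCEM outer loop around a two-stage inner loop) can be sketched roughly as below. This is a toy under strong assumptions, not the authors' implementation: a plain logistic regression stands in for the AutoML grade model, the item-specific parametric model is reduced to a Rasch (1PL) difficulty per item, the Monte Carlo step is a simple random-walk Metropolis sampler, and the data are simulated. All function and variable names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sample_abilities(theta, b, y, rng, n_steps=20, step=0.5):
    """Monte Carlo E-step (assumed form): random-walk Metropolis per test taker."""
    def log_post(t):
        p = np.clip(sigmoid(t[:, None] - b[None, :]), 1e-9, 1 - 1e-9)
        loglik = (y * np.log(p) + (1 - y) * np.log(1 - p)).sum(axis=1)
        return loglik - 0.5 * t ** 2  # standard normal prior on ability

    cur = log_post(theta)
    for _ in range(n_steps):
        prop = theta + step * rng.normal(size=theta.shape)
        new = log_post(prop)
        accept = np.log(rng.uniform(size=theta.shape)) < new - cur
        theta = np.where(accept, prop, theta)
        cur = np.where(accept, new, cur)
    return theta


def fit_grade_model(theta, X, y, n_iter=300, lr=0.5):
    """Stage 1 stand-in for AutoML: logistic regression on [ability, item features, 1]."""
    n, m = y.shape
    F = np.column_stack([np.repeat(theta, m), np.tile(X, (n, 1)), np.ones(n * m)])
    t = y.ravel()  # row i * m + j holds the response of taker i to item j
    w = np.zeros(F.shape[1])
    for _ in range(n_iter):
        w -= lr * F.T @ (sigmoid(F @ w) - t) / len(t)  # gradient descent step
    return w


def auto_irt_sketch(y, X, n_outer=5, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize abilities from each test taker's clipped proportion correct.
    prop = np.clip(y.mean(axis=1), 0.05, 0.95)
    theta = np.log(prop / (1 - prop))
    grid = np.linspace(-3, 3, 13)  # ability grid used to distill the grade model
    b = np.zeros(y.shape[1])
    for _ in range(n_outer):
        # Stage 1: grade model predicting correctness from ability and item features.
        w = fit_grade_model(theta, X, y)
        # Stage 2: per-item Rasch difficulty, least-squares fit to the
        # grade model's predicted logits on the ability grid.
        logits = w[0] * grid[None, :] + (X @ w[1:-1] + w[-1])[:, None]
        b = (grid[None, :] - logits).mean(axis=1)
        # Monte Carlo E-step: resample abilities given the item parameters.
        theta = sample_abilities(theta, b, y, rng)
    return theta, b


# Simulated data (assumption): difficulty is a linear function of item features.
rng = np.random.default_rng(1)
n_takers, n_items, n_feats = 200, 30, 3
X = rng.normal(size=(n_items, n_feats))
b_true = X @ np.array([1.0, -0.5, 0.25])
theta_true = rng.normal(size=n_takers)
probs = sigmoid(theta_true[:, None] - b_true[None, :])
y = (rng.uniform(size=probs.shape) < probs).astype(float)

theta_hat, b_hat = auto_irt_sketch(y, X)
```

In a real pipeline, `fit_grade_model` is where an off-the-shelf AutoML tool would be called; the second stage then compresses its predictions into interpretable per-item parameters that a CAT can use for scoring.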

arXiv information

Authors	James Sharpnack, Phoebe Mulcaire, Klinton Bicknell, Geoff LaFlair, Kevin Yancey
Published	2024-09-13 13:36:51+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
