The Impact of Item-Writing Flaws on Difficulty and Discrimination in Item Response Theory

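Summary

High-quality test items are essential for educational assessment, particularly within Item Response Theory (IRT).
Traditional validation methods rely on resource-intensive pilot testing to estimate item difficulty and discrimination.
More recently, Item-Writing Flaw (IWF) rubrics have emerged as a domain-general approach for evaluating test items based on textual features.
However, their relationship to IRT parameters has not been well explored.
To address this gap, we conducted a study of more than 7,000 multiple-choice questions across various STEM subjects (e.g., math and biology).
Using an automated approach, we annotated each question with a 19-criterion IWF rubric and studied its relationship to data-driven IRT parameters.
Our analysis revealed statistically significant links between the number of IWFs and the IRT difficulty and discrimination parameters, particularly in the life and physical science domains.
We further observed that specific IWF criteria affect item quality with differing severity (e.g., negative wording versus implausible distractors).
Overall, IWFs are useful for predicting IRT parameters, particularly for screening low-difficulty MCQs, but they cannot replace traditional data-driven validation methods.
Our findings highlight the need for further research on domain-general evaluation rubrics and on algorithms that understand domain-specific content for robust item validation.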

Summary (original)

High-quality test items are essential for educational assessments, particularly within Item Response Theory (IRT). Traditional validation methods rely on resource-intensive pilot testing to estimate item difficulty and discrimination. More recently, Item-Writing Flaw (IWF) rubrics emerged as a domain-general approach for evaluating test items based on textual features. However, their relationship to IRT parameters remains underexplored. To address this gap, we conducted a study involving over 7,000 multiple-choice questions across various STEM subjects (e.g., math and biology). Using an automated approach, we annotated each question with a 19-criteria IWF rubric and studied relationships to data-driven IRT parameters. Our analysis revealed statistically significant links between the number of IWFs and IRT difficulty and discrimination parameters, particularly in life and physical science domains. We further observed how specific IWF criteria can impact item quality more and less severely (e.g., negative wording vs. implausible distractors). Overall, while IWFs are useful for predicting IRT parameters–particularly for screening low-difficulty MCQs–they cannot replace traditional data-driven validation methods. Our findings highlight the need for further research on domain-general evaluation rubrics and algorithms that understand domain-specific content for robust item validation.
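
For readers unfamiliar with the two IRT parameters named in the abstract, the following minimal sketch shows how difficulty and discrimination enter a two-parameter logistic (2PL) model. The abstract does not state which IRT model the authors fit, so the 2PL form is an assumption here, and the function name `p_correct_2pl` and the example values are purely illustrative.

```python
import math


def p_correct_2pl(theta: float, a: float, b: float) -> float:
    """Probability of a correct response under a 2PL IRT model (assumed model).

    theta: learner ability
    a:     item discrimination (how sharply the item separates abilities)
    b:     item difficulty (ability at which the probability is 0.5)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


if __name__ == "__main__":
    # Hypothetical items: same difficulty, different discrimination.
    for a in (0.5, 1.0, 2.0):
        probs = [round(p_correct_2pl(theta, a, b=0.0), 3) for theta in (-1.0, 0.0, 1.0)]
        print(f"a={a}: {probs}")
```

In this form, a low-difficulty item (small `b`) is answered correctly by most learners, and a low-discrimination item (small `a`) gives similar probabilities to weak and strong learners, which is why both parameters are used as indicators of item quality.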

arXiv information

Authors	Robin Schmucker, Steven Moore
Published	2025-03-13 16:47:07+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
