This paper primarily employs Item Response Theory (IRT) to estimate item characteristics and student proficiency levels as reflected in the exam results. The process includes algorithms for estimating item characteristic parameters and statistical techniques for selecting items from an item pool. Classical Test Theory (CTT) is also applied to gain additional insight into item characteristics. The mathematical framework behind each algorithm and model is introduced in detail. Our ultimate objective is to create an item bank that unifies the items from the various exam versions onto a common scale. The technique for placing items from different versions on the same scale is test equating, which is likewise described in detail.
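To make the IRT machinery concrete, the following is a minimal sketch in Python of the Rasch model's item response function together with a Newton-Raphson maximum-likelihood estimate of a student's proficiency given known item difficulties. The function names (`rasch_prob`, `estimate_theta`) are illustrative, not from the paper, and real calibrations would estimate item difficulties jointly from the full response matrix rather than assume them known.

```python
import math

def rasch_prob(theta, b):
    """Rasch model: probability of a correct response for a student
    with proficiency theta on an item with difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_theta(responses, difficulties, iters=20):
    """Newton-Raphson MLE of proficiency theta, assuming item
    difficulties are already calibrated. responses are 0/1 scores."""
    theta = 0.0
    for _ in range(iters):
        probs = [rasch_prob(theta, b) for b in difficulties]
        # Gradient of the log-likelihood: observed minus expected score.
        grad = sum(x - p for x, p in zip(responses, probs))
        # Fisher information, used as the (negative) second derivative.
        info = sum(p * (1.0 - p) for p in probs)
        theta += grad / info
    return theta
```

Note that the MLE is undefined for all-correct or all-incorrect response patterns; operational software handles those cases separately (e.g. with Bayesian estimators).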
After a comprehensive analysis to determine the appropriate model within the framework of Item Response Theory (IRT), the Rasch model was selected for all versions of the exams. One item from version 9 was removed from the item pool because of its unsatisfactory item fit. Based on the item-map plot, we conclude that all three exam versions were relatively easy for students, as evidenced by the high position of the exams on the map. Ultimately, the item bank was successfully established through the application of two test-equating methods, with results indicating its reliability.
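Because the paper does not spell out which two equating methods were used, the sketch below illustrates one standard option for Rasch-calibrated forms: mean-mean equating through common (anchor) items, where a single shift constant places a new form's difficulties on the base form's scale. The function names and numbers are illustrative assumptions, not values from the study.

```python
def mean_mean_shift(anchor_base, anchor_new):
    """Mean-mean equating constant for the Rasch model: the shift that
    maps new-form anchor difficulties onto the base-form scale."""
    return sum(anchor_base) / len(anchor_base) - sum(anchor_new) / len(anchor_new)

def equate(new_difficulties, shift):
    """Place all new-form item difficulties on the base scale."""
    return [b + shift for b in new_difficulties]

# Hypothetical anchor-item difficulties shared by both forms.
shift = mean_mean_shift([0.5, 1.0], [0.0, 0.5])   # shift = 0.5
base_scale = equate([0.0, 0.5, 2.0], shift)       # [0.5, 1.0, 2.5]
```

The Rasch model makes this especially simple: because all items share a common discrimination, a single additive constant suffices, whereas 2PL/3PL equating (e.g. mean-sigma or Stocking-Lord) must also rescale the slope.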