Implementation of the Omega (ω) Index to Detect Large-Scale Systematic Cheating
- Pub. date: October 15, 2019
- Pages: 1307-1322
Cheating detection is an important issue in standardized testing, especially in large-scale settings. Statistical approaches are often computationally intensive and require specialised software. We present a two-stage approach that quickly filters suspected groups using statistical testing on an IRT-based answer-copying index. We also present an approach to mitigate data contamination and improve the performance of the index. The computation of the index was implemented through a modified version of an open source R package, thus enabling wider access to the method. Using data from PIRLS 2011 (N = 64,232), we conduct a simulation to demonstrate our approach. Type I error was well controlled and no control group was falsely flagged for cheating, while 16 (combined n = 12,569) of the 18 (combined n = 14,149) simulated groups were detected. Implications for system-level cheating detection and further improvements to the approach are discussed.
Keywords: Answer-copying indices, item response theory, PIRLS, cheating detection, standardized testing, test integrity.
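As a rough illustration of the kind of IRT-based answer-copying index the abstract refers to, the sketch below computes an ω-style statistic in Python: the observed number of answer matches between a suspected copier and a source is compared with the number expected under a response model, and the difference is standardized. This is a minimal sketch only, not the authors' implementation (which is described as a modified open source R package). The function names, the per-item match probabilities, and the observed match count are hypothetical values chosen for illustration. In practice the probabilities would come from a fitted item response model.

```python
import math


def omega_index(match_probs, observed_matches):
    """Standardized difference between observed and expected answer matches.

    match_probs: for each item, the model-based probability that the suspected
    copier independently selects the same response as the source (assumed to
    be supplied by a fitted IRT model; hypothetical here).
    observed_matches: number of items on which the two responses agree.
    """
    expected = sum(match_probs)
    variance = sum(p * (1.0 - p) for p in match_probs)
    if variance <= 0.0:
        raise ValueError("match probabilities must not all be 0 or 1")
    return (observed_matches - expected) / math.sqrt(variance)


def upper_tail_p(z):
    """One-sided p-value under a standard normal reference distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical example: 8 items, 7 observed matches.
probs = [0.25, 0.40, 0.30, 0.55, 0.20, 0.35, 0.45, 0.30]
observed = 7

omega = omega_index(probs, observed)
p_value = upper_tail_p(omega)
print(f"omega = {omega:.3f}, one-sided p = {p_value:.5f}")
```

A large positive value indicates more agreement than the response model predicts for independent work. How individual results are aggregated and tested at the group level in the two-stage approach is not specified in the abstract, so that step is not sketched here.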