Reliability of the Analytic Rubric and Checklist for the Assessment of Story Writing Skills: G and Decision Study in Generalizability Theory

N. Bilge Uzun, Devrim Alici, Mehtap Aktas


APA 6th edition
Uzun, N. B., Alici, D., & Aktas, M. (2019). Reliability of the Analytic Rubric and Checklist for the Assessment of Story Writing Skills: G and Decision Study in Generalizability Theory. European Journal of Educational Research, 8(1), 169-180. doi:10.12973/eu-jer.8.1.169

Harvard
Uzun, N.B., Alici, D., & Aktas, M. 2019, 'Reliability of the Analytic Rubric and Checklist for the Assessment of Story Writing Skills: G and Decision Study in Generalizability Theory', European Journal of Educational Research, vol. 8, no. 1, pp. 169-180. Available from: http://dx.doi.org/10.12973/eu-jer.8.1.169

Chicago 16th edition
Uzun, N. Bilge, Devrim Alici, and Mehtap Aktas. "Reliability of the Analytic Rubric and Checklist for the Assessment of Story Writing Skills: G and Decision Study in Generalizability Theory". European Journal of Educational Research 8, no. 1 (2019): 169-180. doi:10.12973/eu-jer.8.1.169

Abstract

The purpose of this study is to examine the reliability of an analytic rubric and a checklist developed for the assessment of story writing skills by means of generalizability theory. The study group consisted of 52 fifth-grade primary school students and 20 raters at Mersin University. The G study was carried out with the fully crossed h × p × g (story × rater × performance task) design, with the scoring keys treated as a fixed facet. The decision (D) study was carried out by varying the conditions of the task facet. For both scoring keys, the source of variance related to the stories accounted for a high percentage of variance among the main effects, while the hp (story × rater) interaction accounted for a high percentage among the interaction effects. The highest variance in the design belonged to the hpg (story × rater × performance task) interaction effect, which may indicate sources of variability and error not included in the design. Examination of the G and phi coefficients calculated for both scoring keys showed that scoring with the analytic rubric is more reliable and generalizable. According to the decision studies, the number of tasks used in this study was found to be the most appropriate.

Keywords: Story writing skills, performance assessment, checklist, rubric, generalizability theory.
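The G and phi coefficients mentioned in the abstract can be illustrated with a simplified, fully crossed one-facet p × r (story × rater) design; the study's h × p × g design extends the same logic with a task facet. This is a minimal sketch, not the authors' analysis: the scores are hypothetical and the function names are ours. Variance components are estimated from the expected mean squares of a two-way ANOVA with one score per cell; the G coefficient uses only relative error (the pr interaction), while phi also counts the rater main effect as error.

```python
import numpy as np

def g_study_pxr(scores):
    """Estimate variance components for a fully crossed person x rater
    (p x r) design with one score per cell, via expected mean squares."""
    scores = np.asarray(scores, dtype=float)
    n_p, n_r = scores.shape
    grand = scores.mean()
    p_means = scores.mean(axis=1)
    r_means = scores.mean(axis=0)

    # Mean squares for persons, raters, and the residual (pr) term
    ms_p = n_r * np.sum((p_means - grand) ** 2) / (n_p - 1)
    ms_r = n_p * np.sum((r_means - grand) ** 2) / (n_r - 1)
    resid = scores - p_means[:, None] - r_means[None, :] + grand
    ms_pr = np.sum(resid ** 2) / ((n_p - 1) * (n_r - 1))

    # Variance components (negative estimates truncated to zero)
    var_pr = ms_pr                          # interaction confounded with error
    var_p = max((ms_p - ms_pr) / n_r, 0.0)  # object of measurement
    var_r = max((ms_r - ms_pr) / n_p, 0.0)  # rater main effect
    return var_p, var_r, var_pr

def coefficients(var_p, var_r, var_pr, n_r):
    """G (relative) and phi (absolute) coefficients for n_r raters;
    varying n_r here is the D-study step."""
    g = var_p / (var_p + var_pr / n_r)
    phi = var_p / (var_p + (var_r + var_pr) / n_r)
    return g, phi

# Hypothetical data: 3 stories, each scored by the same 2 raters
scores = [[8, 7], [6, 5], [4, 3]]
var_p, var_r, var_pr = g_study_pxr(scores)
g, phi = coefficients(var_p, var_r, var_pr, n_r=2)
```

In this toy data the raters rank the stories identically, so the pr component is zero and G is 1.0, while the constant one-point rater severity difference lowers phi; rerunning `coefficients` with other values of `n_r` mirrors the D-study logic of changing the number of conditions in a facet.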

