Document Type : Research Paper


Department of Foreign Languages and Linguistics, Shiraz University


Educators often employ various training techniques to reduce raters’ subjectivity. Negotiation is one such technique: it can help novice raters co-construct a shared understanding of a writing assessment when they rate collaboratively. Little research, however, has examined the rating behaviors of novice raters as they negotiate, or the effect of negotiation on their understanding of writing and rubric features. This qualitative study tracked 11 novice raters’ scoring behaviors and examined their textual foci across three phases of scoring with an analytic rubric: pre-negotiation, negotiation, and post-negotiation. To ensure triangulation, multiple data sources were gathered and analyzed: raters’ verbal protocols from independent scoring in the initial and final phases, audio-recorded interactions during the negotiation phase, and semi-structured interviews. Results indicated that in their initial independent ratings, raters scored mostly according to their own understanding of writing skill and the writing features they considered important; the negotiation sessions, however, helped them refine their judgments and attend to a wider array of textual features more consistently and in line with the rubric, thereby expanding their understanding of the rubric categories. Post-negotiation ratings were also more similar to negotiation ratings than to pre-negotiation ratings, indicating that raters attended to more rubric features when scoring. The findings have implications for rater training: in the absence of expert raters to train novices, negotiation can serve as a useful technique for improving raters’ understanding of rubric features.
