Research on teachers’ grading has shown great variability among teachers regarding both the process and the product of grading, resulting in low comparability and issues of inequality when grades are used for selection purposes. Despite this, little is known about the merits or disadvantages of different models of grading. In this study, a methodology for comparing two models of grading in terms of (a) agreement between assessors (reliability) and (b) justifications for the grades assigned (validity) was applied to a small sample of teachers (n = 24). The design was experimental: teachers were randomly assigned to one of two conditions, in which they graded the same student performance using either an analytic or a holistic approach. Grades were compared in terms of agreement and rank correlation, and justifications were analyzed with content analysis. Findings suggest that the analytic condition yields substantially higher agreement among assessors than the holistic condition (66 versus 46 percent agreement; Cohen’s kappa .60 versus .41), as well as a higher rank correlation (Spearman’s rho .97 versus .94), without any major differences in how the grades were justified. Instead, there was relatively strong consensus among most raters in the sample.
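
For readers who wish to see how the three agreement statistics reported above relate to one another, the sketch below shows how percent agreement, Cohen's kappa, and Spearman's rho could be computed for a pair of assessors in Python. The grade data are invented for illustration, and the pairwise setup is a simplification (the study involved 24 teachers, not two); only the metrics themselves correspond to those used in the study.

```python
# Hypothetical illustration of the agreement statistics reported above.
# The grades here are made up; only the metrics (percent agreement,
# Cohen's kappa, Spearman's rho) correspond to those used in the study.
from scipy.stats import spearmanr
from sklearn.metrics import cohen_kappa_score

# Two assessors' grades (A-F scale) for the same set of student
# performances -- entirely invented values for illustration.
assessor_1 = ["A", "B", "B", "C", "D", "C", "E", "B", "C", "D"]
assessor_2 = ["A", "B", "C", "C", "D", "C", "E", "C", "C", "D"]

# Percent agreement: share of performances given identical grades.
agreement = sum(g1 == g2 for g1, g2 in zip(assessor_1, assessor_2)) / len(assessor_1)

# Cohen's kappa: agreement corrected for chance agreement.
kappa = cohen_kappa_score(assessor_1, assessor_2)

# Spearman's rho: rank correlation, computed on an ordinal encoding
# of the grade scale (F = 0 ... A = 5).
scale = {"F": 0, "E": 1, "D": 2, "C": 3, "B": 4, "A": 5}
rho, p_value = spearmanr([scale[g] for g in assessor_1],
                         [scale[g] for g in assessor_2])

print(f"Percent agreement: {agreement:.0%}")
print(f"Cohen's kappa:     {kappa:.2f}")
print(f"Spearman's rho:    {rho:.2f}")
```

Note the complementary roles of the statistics: percent agreement and kappa capture exact matches (with kappa discounting chance), while rank correlation can remain high even when assessors disagree on exact grades, as long as they order the performances similarly.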