This report was commissioned from the University of Bristol by the National Assessment Agency (NAA). The research question, identified by NAA, was whether markers’ severity in scoring changed over the marking period in quality assurance data provided by the marking agency. The focus was Key Stage 3 English Writing. As such, this research was not a designed study; instead it was an analysis of existing operational data.
The operational quality assurance data came from a new process involving entering marks on-line, which was introduced to the national curriculum tests in 2008. Differences between markers’ scores and the ‘true’ marks generated by an expert panel were used to measure marking quality. Criteria for acceptable marking were set using deviations for each item and tolerances across a number of items. Only marks from pupils’ work included in the quality assurance checks were available for analysis. Live marking data were not available.
Multilevel models were constructed to investigate average marking severity in initial marking checks following the training (standardisation) process. Changes in severity over subsequent checks (benchmarking) were also modelled. The longer and shorter writing tasks were modelled separately to accommodate different effects for each task. Higher-ranking, experienced and new markers were also modelled separately, which allowed an exploration of different effects for these types of markers. Shortcomings in the data also warranted the estimation of separate models.
On average, the differences between markers’ marks and the true marks were small at the first marking check and were generally not significantly different from zero. The exceptions were that higher-ranking markers over-marked by 0.8 of a mark, on average, for the longer writing task, and that experienced and new markers under-marked by 0.4 and 0.9 of a mark respectively, on average, for the shorter writing task.
Changes in the average level of severity over the marking checks were generally not statistically significant, but there were significant effects in which the experienced and new markers became more lenient over the checks in the shorter writing task. These were small average effects: an increase of one quarter of a mark for experienced markers and half a mark for new markers by the fifth marking check. These groups were, on average, severe on the first marking check, but their general move towards more lenient marking over the series of checks brought them closer to the true marks.
To what extent can these findings be generalised to the quality of marking of the summer 2008 Key Stage 3 English Writing examination? Generalisability is undermined by the use of small samples of work in the marking checks relative to the population of pupils, as these small samples were unlikely to be representative. Additionally, markers were aware that their performance was being monitored when these data were collected and that the consequence of poor performance was termination of employment, which was likely to have been a motivating factor. On the positive side, these data were collected during operational timeframes from real markers who had undergone standard training procedures. True marks were generated by a pool of expert examiners, so there should be a high degree of confidence in the yardstick. The quality assurance process functioned to weed out poor markers, not to monitor live marking quality. To the extent that the quality assurance data are representative of live marking, these findings can be generalised to the live marking. Further work would be needed to establish this. Nonetheless, it would be unwarranted to completely disregard data from quality assurance checks as irrelevant to concerns about marking accuracy.
|Translated title of the contribution||Changes in the severity of marking as a function of number of scripts marked: Key stage 3 English writing summer 2008 quality assurance data|
|Publisher||National Assessment Agency|
|Number of pages||30|
|Publication status||Published - 28 Feb 2009|