This study investigated the use of WriteToLearn to evaluate Chinese undergraduate English majors' essays, focusing on its scoring ability and the accuracy of its error feedback. Participants were 163 second-year English majors from a university in Sichuan Province, who wrote 326 essays in response to two writing prompts. Each essay was scored by WriteToLearn and by four trained human raters. Many-facet Rasch measurement (MFRM) was used to calibrate WriteToLearn's rating performance across the full set of essays against that of the four human raters. The error feedback WriteToLearn provided on 60 randomly selected essays was then evaluated against the feedback provided by the human raters. Two main findings emerged for scoring: WriteToLearn was more consistent than the four human raters but markedly more stringent, and it failed to score seven essays. For error feedback, WriteToLearn achieved an overall precision of 49% and a recall of 18.7%. Its precision therefore fell well short of the 90% minimum that Burstein, Chodorow, and Leacock (2003) set for a reliable error-detection tool. Furthermore, it had difficulty identifying the errors Chinese undergraduate English majors made in the use of articles, prepositions, word choice, and expression.
- accuracy of error feedback
- automated writing evaluation
- Chinese undergraduate English majors
- scoring ability
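The precision and recall figures above can be illustrated with a minimal sketch, assuming the standard definitions used in error-detection evaluation with the human raters' annotations as the reference: precision is the share of the tool's flags that are genuine errors, and recall is the share of human-marked errors that the tool flags. The `FeedbackCounts` class, the function names, and the counts in the usage line are hypothetical and are not taken from the study; only the 90% threshold comes from Burstein, Chodorow, and Leacock (2003).

```python
from dataclasses import dataclass

# 90% minimum precision attributed to Burstein, Chodorow, and Leacock (2003)
MIN_PRECISION = 0.90


@dataclass
class FeedbackCounts:
    """Hypothetical tallies comparing a tool's error flags with human annotations."""
    true_positives: int   # flags that human raters also marked as errors
    false_positives: int  # flags that human raters did not consider errors
    false_negatives: int  # human-marked errors the tool did not flag


def precision(c: FeedbackCounts) -> float:
    """Share of the tool's flags that are genuine errors (0.0 if nothing was flagged)."""
    flagged = c.true_positives + c.false_positives
    return c.true_positives / flagged if flagged else 0.0


def recall(c: FeedbackCounts) -> float:
    """Share of human-marked errors that the tool flagged (0.0 if there were none)."""
    actual = c.true_positives + c.false_negatives
    return c.true_positives / actual if actual else 0.0


def meets_precision_threshold(c: FeedbackCounts) -> bool:
    """True if precision reaches the assumed 90% reliability minimum."""
    return precision(c) >= MIN_PRECISION


if __name__ == "__main__":
    # Illustrative counts only, not the study's data
    example = FeedbackCounts(true_positives=30, false_positives=20, false_negatives=70)
    print(precision(example), recall(example), meets_precision_threshold(example))
    # prints: 0.6 0.3 False
```

Precision is checked against the threshold rather than recall because a tool that flags many non-errors can mislead learners. A tool with low recall simply leaves some errors unmentioned.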