Manually annotated corpora of writing products may greatly contribute to writing research: they offer detailed insights into the quality of these texts, into the text features actually attended to by human text raters, into possibilities and difficulties for the use of automatic writing analytics and writing tools, and into the relations between different text quality dimensions. This paper presents the Utrecht System for Annotation of Learner text (USALT), which covers both general features (orthography, punctuation, wording, coherence) and genre-specific elements (such as openings, endings, structuring devices and politeness). Each annotation contains up to three items (annotation unit, problem type, part-of-speech tag). USALT reflects various text quality dimensions, notably correctness, comprehensibility and appropriateness (both stylistically and in terms of genre conventions).
We present a USALT analysis of 371 texts produced by Dutch students in grades 7-9 (aged 12-15 years), taken from the so-called Schrijfmeters corpus. The assignment was to write a letter about ‘typically Dutch things’ to a Swedish girl about to emigrate to the Netherlands. USALT reliabilities were adequate. In terms of problem frequency, we were struck by the pervasiveness of punctuation problems. Furthermore, the orthography and punctuation problems together present considerable difficulties for automatic analysis of original learner texts at this level. A remarkable result regarding relations between various text quality dimensions is that the frequency of orthography problems correlates more strongly with genre convention problems than with lexico-grammatical problems. We also used the annotations as predictors of the holistic scores assigned to the texts by human raters. Standardized annotation frequencies by themselves account for 45% of the score variance, with a prominent role for annotations regarding genre elements; text length by itself explains 52%. The best model includes both text length and annotations (65% explained variance). In ongoing work, USALT is being extended to handle argumentative writing assignments.