Learner Corpora without Error Tagging
The article explores the possibility of adopting a form-to-function perspective when annotating
learner corpora, in order to gain deeper insight into the systematic features of interlanguage. A split
between forms and functions (or categories) is desirable both to avoid the "comparative fallacy"
and because, especially in basic varieties, forms may precede functions (e.g., what
resembles a "noun" might have a different function, or a function may show up in unexpected forms).
In the computer-aided error analysis tradition, every item produced by learners is mapped onto a grid of
error tags based on the categories of the target language. By contrast, we believe it is
possible to record, and make retrievable, both words and character sequences independently of their
functional-grammatical label in the target language. To this end, at the University of Pavia we
adapted a probabilistic POS tagger designed for L1 data to L2 data. Although this move may attract
criticism, we found it more productive to work with "virtual categories" than with errors. The
article outlines the theoretical background of the project and presents examples that illustrate
the potential of SLA-oriented (non-error-based) tagging.
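The abstract does not describe the annotation format, so what follows is only a minimal Python sketch, under stated assumptions, of what keeping forms apart from target-language functions can mean for retrieval. The `Token` fields, the labels `PRON`, `V_ARE`, `V_FIN` and `N`, and the example utterance are all hypothetical illustrations, not the Pavia tagset or corpus data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    form: str                          # surface string exactly as the learner produced it
    virtual_cat: str                   # hypothetical label derived from form/distribution, not from a target norm
    target_cat: Optional[str] = None   # optional target-language reading, stored separately


# Invented basic-variety utterance (illustrative only, not the article's data):
# an infinitive-like form "mangiare" used where Italian would require a finite verb.
utterance: List[Token] = [
    Token("io", "PRON"),
    Token("mangiare", "V_ARE", target_cat="V_FIN"),
    Token("pane", "N"),
]


def by_char_sequence(tokens: List[Token], ending: str) -> List[str]:
    """Retrieve tokens by a character sequence, ignoring every functional label."""
    return [t.form for t in tokens if t.form.endswith(ending)]


def by_virtual_cat(tokens: List[Token], cat: str) -> List[str]:
    """Retrieve tokens by virtual category, independently of any target_cat."""
    return [t.form for t in tokens if t.virtual_cat == cat]


print(by_char_sequence(utterance, "are"))  # ['mangiare']
print(by_virtual_cat(utterance, "V_ARE"))  # ['mangiare']
```

In an error-tagging scheme the token "mangiare" would typically be described only as a deviation from its target-language reading; in this sketch that reading is an optional extra field, so the form remains retrievable whether or not anyone has assigned it a target-language function.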