Linguistik online




Korpora bedrohter Sprachen als eierlegende Wollmilchsau? Das Beispiel GENIE (engl.: Corpora of threatened languages as a jack of all trades? The example of GENIE)

Roland Marti/Bistra Andreeva/William Barry (ORT)



The question of documentation is central to any work with corpora of threatened languages and has important consequences for corpus structure. The 'Korpus für GEsprochenes NIEdersorbisch (engl.: Corpus for spoken Lower Sorbian)' (GENIE) is a case in point. GENIE focuses on spoken language. The corpus, made up of more than 60 hours of recordings, comprises a) recordings made by Sorbian Radio (1956–2006) with different forms of Lower Sorbian, b) dialect recordings from the Serbski kulturny archiw (engl.: Sorbian Culture Archive) (SKA) in Budyšin/Bautzen (1951–1971), and c) new recordings, also of the dialect forms of Lower Sorbian (2005–2006).

These data support the analysis of

– the relationship of dialects to the standard form of Lower Sorbian (phonetic/phonological, diachronic/synchronic),
– the relationship of spoken to written Sorbian,
– the influence of German,
– the development of idiolectal forms.

The corpus is also of direct practical use in

– the WITAJ project for the revitalization of Lower Sorbian,
– decisions on codification (spelling, grammar).

Problems for analysis stem from the eclectic nature of the recordings, the considerable variation in recording quality, and the lack of annotation. The talk discusses the prospects for, and the problems facing, spoken corpora of threatened languages.
