Linguistik online 39, 3/2009
Corpora of Threatened Languages as a Do-It-All Solution ("eierlegende Wollmilchsau")? The Example of GENIE
Roland Marti/Bistra Andreeva/William Barry (ORT)
The question of documentation is central to any work with corpora of threatened languages and has important consequences for the corpus structure. The 'Korpus für GEsprochenes NIEdersorbisch' (English: 'Corpus for Spoken Lower Sorbian'; GENIE, www.coli.uni-saarland.de/genie/) is a case in point. GENIE focuses on spoken language. The corpus, made up of more than 60 hours of recordings, comprises a) recordings made by Sorbian Radio (1956–2006) with different forms of Lower Sorbian, b) dialect recordings from the Serbski kulturny archiw (English: Sorbian Culture Archive; SKA) in Budyšin/Bautzen (1951–1971), and c) new recordings, also of the dialect forms of Lower Sorbian (2005–2006).
These data support the analysis of
– the relationship of dialects to the standard form of Lower Sorbian (phonetic/phonological, diachronic/synchronic),
The corpus is also of direct practical use in
– the WITAJ project for the revitalization of Lower Sorbian
Problems for analysis stem from the eclectic nature of the recordings, the considerable variation in recording quality, and the lack of annotation. The talk discusses the prospects for, and the problems facing, spoken corpora of threatened languages.