Linguistik online 28, 3/06

Das Internet als linguistisches Korpus (The Internet as a Linguistic Corpus)

Hans Bickel (Basel)



This article discusses whether the Internet can be used as a linguistic corpus. It is based on experience gained in connection with the Variantenwörterbuch des Deutschen (Dictionary of Standard German Variants), which was compiled between 1997 and 2004. In order to identify national and regional variants of the German language in Germany, Austria and Switzerland, it was necessary to work with a large linguistic corpus that could also provide data on the frequency of rather rare words. The question was: Is the Internet suitable as a corpus for linguistic frequency analysis? The WWW is suitable as a corpus only

1. if reliable and reproducible results can be obtained;
2. if the results are closely related to the language as it is actually used.

The test showed that the Internet is an extremely useful corpus for obtaining information on word frequency. Its enormous size and the large number of different text types make it an extremely versatile corpus that is systematically related to written language as it is actually used.


