
- Documentation of the use of the Danish language
The aim of the Korpus 2000 project is to document how the Danish language was used around the year 2000. It does so in the form of a text corpus in which words and phrases can be looked up via this website. The texts that make up Korpus 2000 were written mainly between 1998 and 2002.
To give Korpus 2000 as much variety as possible, with texts from many different domains, a comprehensive collection of Danish texts, called a text bank, was first established. Annotations were then added to each of the texts and text excerpts in the text bank.
These annotations record information such as who wrote the text, when it was written, what kind of text it is, and in which medium it was originally published.
When the collection of texts was completed in spring 2002, text excerpts were systematically selected according to specified criteria and compiled into one large document, which constitutes the actual corpus. Korpus 2000 comprises about 28 million words of running text. Every word in the corpus has been annotated with its part of speech and its morphological (inflectional) features, so that this information can be used in corpus research.
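As a rough illustration of why word-level annotation helps, the following minimal Python sketch represents annotated tokens and filters them by part of speech, morphological feature, or lemma. The token layout, the tag names (`NOUN`, `VERB`, `PRON`), the feature notation (`pl`, `def`, `past`) and the sample sentence are all hypothetical; they are not the actual Korpus 2000 tagset or the format used by CQP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    word: str   # surface form as it appears in the text
    lemma: str  # dictionary (base) form
    pos: str    # part-of-speech tag (hypothetical tag names)
    morph: str  # morphological features, dot-separated (hypothetical notation)


# A tiny made-up sample: "Hun læste bøgerne" ("She read the books").
SENTENCE = [
    Token("Hun", "hun", "PRON", "sg"),
    Token("læste", "læse", "VERB", "past"),
    Token("bøgerne", "bog", "NOUN", "pl.def"),
]


def find(tokens, pos=None, morph=None):
    """Return tokens whose POS tag equals `pos` and whose features include `morph`."""
    return [
        t for t in tokens
        if (pos is None or t.pos == pos)
        and (morph is None or morph in t.morph.split("."))
    ]


def lemma_hits(tokens, lemma):
    """Return the surface forms of all tokens belonging to the given lemma."""
    return [t.word for t in tokens if t.lemma == lemma]
```

A lookup by lemma such as `lemma_hits(SENTENCE, "bog")` finds the inflected form "bøgerne", which a plain string search for "bog" would miss, and `find(SENTENCE, pos="NOUN", morph="pl")` restricts the result to plural nouns.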
Korpus 2000 was then loaded into the corpus query system CQP, developed by the Institut für Maschinelle Sprachverarbeitung at the University of Stuttgart, and was made accessible via this website in August 2002.
In this way, anyone interested in how Danish was used around the year 2000 can carry out their own linguistic research. Korpus 2000 thus serves as an up-to-date supplement to traditional Danish dictionaries.
Korpus 90 consists of text excerpts written between 1988 and 1992. It is quite similar to Korpus 2000 in composition and size and therefore serves as an older corpus against which Korpus 2000 can be compared.