Welcome to the public corpus documentation site of the Society for Danish Language and Literature, DSL.

The site is maintained by Jørg Asmussen, DSL.

This site comprises all kinds of documentation and other writings related to the gathering of comprehensive LGP corpora of modern Danish carried out at DSL. These corpora are mainly used by the ordnet.dk project, cf. ordnet.dk publications.

This site also comprises a number of NLP resources. Some of these resources are available exclusively at ja-korpus.dsl.lan, which means that they are accessible only from within the dsl domain (DSL only). Other resources come as password-protected zip files (password). Some are freely available (free).

General Corpus Documentation

This documentation lays the groundwork for the corpus work at DSL.

Corpus Retrieval

Software Tools

Text Classification

Resources for download

Some of the resources listed below are available for public download. However, most of them require a password to unzip. To obtain a password, please send an email to korpus@dsl.dk with a brief description of the purpose(s) for which you intend to use the resources.

If you download resources from this site, you agree to the following conditions:

  1. The language material may be used only for the indicated purpose(s) and must not be copied or transferred to a third party. Without special prior arrangement, it must not be used commercially or form part of a commercial product.
  2. The Society for Danish Language and Literature must be credited in publications or products (including products in digital form such as software programmes or Internet applications) that are based entirely or partly on DSL language material. Additionally, DSL is to receive a copy of such publications or products. If the publications or products are put on the Internet, they must provide a link to www.dsl.dk.


For copyright reasons, the corpora listed below comprise sentences or shorter excerpts in arbitrary order. They do not contain full texts.

Word lists