Corpus Resources & Documentation
Corpus Resources for Download
This page lists corpus resources that you may download. Most resources come as password-protected zip files (tagged password). To obtain a password, please send an email to firstname.lastname@example.org with a brief description of the purpose(s) for which you intend to use the resources.
Some resources are accessible only from within the DSL domain (tagged DSL only).
Note: By downloading resources from this site, you agree to the conditions for using them.
For copyright reasons, the corpora listed below consist of sentences or shorter excerpts in arbitrary order. They do not contain full texts.
- Korpus 90 – 32 million tokens of written Danish LGP gathered around 1990, ePOS-tagged and lemmatized (password)
- Korpus 2000 – 30 million tokens of written Danish LGP gathered around 2000, ePOS-tagged and lemmatized (password)
- Korpus 2010 – 45 million tokens of written Danish LGP gathered around 2010 as part of the DK-CLARIN Project, ePOS-tagged and lemmatized (password)
- ePAROLE – beta version of the Danish PAROLE corpus tagged with the ePOS tag set. No documentation is available yet; refer to Design of the ePOS tagger instead. (password)
- 10,000 most frequently used lemmas in Danish – More…
- Full-form lexicon: lemmas with inflected forms – More…
Corpus Resources & Documentation • Jørg Asmussen @ DSL