
This page pertains to UD version 2.

UD for Dutch

Tokenization and Word Segmentation


Instruction: Describe the general rules for delimiting words (for example, based on whitespace and punctuation) and exceptions to these rules. Specify whether words with spaces and/or multiword tokens occur. Include links to further language-specific documentation if available.


Morphology

Tags

Detailed documentation of the decisions regarding POS tags in the original data can be found in the D-COI POS-tagging and lemmatization manual.

Features

Detailed documentation of the decisions regarding features in the original data can be found in the D-COI POS-tagging and lemmatization manual.

Syntax

The Dutch UD treebanks were automatically converted from treebanks that had been annotated and manually corrected. Detailed documentation of the original syntactic annotation can be found in the syntactic annotation manual of the Lassy project. The data included in the UD treebanks can be explored using the PaQu interface, which supports querying both the original and the UD annotation.


Treebanks

There are two Dutch UD treebanks:


Instruction: Treebank-specific pages are generated automatically from the README file in the treebank repository and from the data in the latest release. Link to the respective *-index.html page in the treebanks folder, using the language code and the treebank code in the file name.