
This page pertains to UD version 2.

UD for Turkish

This is a work-in-progress overview of the UD annotation for Turkish.

Unfortunately, different treebanks follow (slightly) different annotation guidelines, and as of v2.4, several uncoordinated correction efforts were known. As of v2.14, a group named the UD Turkic Group is working on unifying the Turkish treebanks.

Tokenization and Word Segmentation

For more details, see tokenization.

Morphology

Turkish has rich inflectional and derivational morphology. Some morphological phenomena are not yet satisfactorily annotated in UD v2. These include missing feature-value pairs, e.g., ‘reflexive voice’, which is marked with the language-specific value Voice=Rfl. Another open issue is the need for multiple values of certain UD morphological features. For example, the verb gelemeselerdi ‘if they were not able to come’ expresses two different modalities, which requires assigning both Pot and Cnd to the Mood feature. Currently, such multiple values are expressed by concatenating them in alphabetical order, resulting in feature-value pairs like Mood=CndPot. Besides Mood, Voice may also have multiple values.
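The concatenation convention can be illustrated with a small sketch. The helpers `combine_values` and `format_feats` are hypothetical and do not come from any UD tool. The feature set in the example is an illustrative, partial analysis of gelemeselerdi and is not copied from a treebank. The sketch also assumes the CoNLL-U convention that features in the FEATS column are listed alphabetically by name.

```python
def combine_values(values):
    """Hypothetical helper: concatenate multiple values of one feature
    in alphabetical order, e.g. ["Pot", "Cnd"] -> "CndPot".
    Plain sorted() is alphabetical here because UD values are capitalized."""
    return "".join(sorted(values))


def format_feats(features):
    """Hypothetical helper: build a FEATS string from a mapping of
    feature name -> list of values. Returns "_" if there are no features."""
    if not features:
        return "_"
    # List features alphabetically by name (case-insensitive), as in CoNLL-U.
    pairs = sorted(features.items(), key=lambda kv: kv[0].lower())
    return "|".join(f"{name}={combine_values(vals)}" for name, vals in pairs)


# Illustrative, partial feature set for gelemeselerdi
feats = {"Mood": ["Pot", "Cnd"], "Number": ["Plur"], "Person": ["3"], "Polarity": ["Neg"]}
print(format_feats(feats))  # Mood=CndPot|Number=Plur|Person=3|Polarity=Neg
```

Because the values are sorted before they are joined, the same combination always yields the same string (CndPot, never PotCnd), whatever order the values were collected in.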

Tags

This is an overview only. For more detailed discussion and examples, see the list of Turkish POS tags and Turkish features.

Syntax

This is an overview only. For more detailed discussion and examples, see the list of relations.

Relations Overview

Treebanks

As of UD 2.13, there are nine Turkish UD treebanks, with more in progress.