This page pertains to UD version 2.

UD for Old Turkish

UD Old Turkish is an effort to digitize and annotate texts in the Old Turkic script, or to annotate them from an existing digitization. The texts are either existing ones or ones constructed to be coherent and structurally well-formed. Having the entire corpus in the Old Turkic script is a precondition for this language. This document is intentionally rough rather than precise, because the annotation approach may change drastically over time.

Tokenization and Word Segmentation

The only guarantee is that the colon punctuation mark (which functions roughly like whitespace) delimits the letters that precede it; this does not guarantee that the letters between two colons constitute a single word.
On the subtleties of word segmentation, the treebank follows the reference work “Ahmet Bican Ercilasun, Türk Kağanlığı ve Türk Bengü Taşları, Dergâh Yayınları” exactly.
Treebanks should treat whitespace as a character in its own right rather than implying it through SpaceAfter=Yes (or through the absence of SpaceAfter=No).
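
The points above can be illustrated with a minimal Python sketch. It is not part of the treebank tooling, and it rests on several assumptions: that the colon-like mark is encoded as U+205A TWO DOT PUNCTUATION (pass another character if the corpus encodes it differently), that "whitespace is explicit" can be read as "every token line carries SpaceAfter=No in MISC", and that multiword-token ranges and empty nodes can be ignored. The function names `colon_spans` and `tokens_missing_space_after_no` are hypothetical.

```python
# Assumption: the colon-like separator is U+205A TWO DOT PUNCTUATION.
SEPARATOR = "\u205A"


def colon_spans(text, separator=SEPARATOR):
    """Split raw text into the letter runs delimited by the separator.

    Each run is only a candidate unit: it may hold one word or several,
    so real word segmentation still has to follow the reference work.
    """
    return [span for span in text.split(separator) if span]


def tokens_missing_space_after_no(conllu):
    """Return (ID, FORM) pairs for token lines whose MISC lacks SpaceAfter=No.

    Simplification: comment lines, blank lines, multiword-token ranges
    (IDs such as 1-2) and empty nodes (IDs such as 1.1) are skipped.
    """
    missing = []
    for line in conllu.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed CoNLL-U token line
        token_id, form, misc = cols[0], cols[1], cols[9]
        if "-" in token_id or "." in token_id:
            continue
        if "SpaceAfter=No" not in misc.split("|"):
            missing.append((token_id, form))
    return missing
```

Calling `colon_spans` on a raw line yields the colon-delimited runs, which are then segmented further by hand. Running `tokens_missing_space_after_no` over a CoNLL-U file gives a quick list of tokens where a following space would be implied instead of being annotated explicitly.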

Morphology

Features

TODO

Syntax

TODO

Treebanks

There is one Old Turkish UD treebank: