
This page pertains to UD version 2.

UD Kazakh KTB

Language: Kazakh (code: kk)
Family: Turkic, Northwestern

This treebank has been part of Universal Dependencies since the UD v1.3 release.

The following people have contributed to making this treebank part of UD: Aibek Makazhanov, Jonathan North Washington, Francis Tyers.

Repository: UD_Kazakh-KTB
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.13

License: CC BY-SA 4.0

Genre: wiki, fiction, news

Questions, comments? General annotation questions (either Kazakh-specific or cross-linguistic) can be raised in the main UD issue tracker. Bugs in this treebank can be reported in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [aibek • makazhanov (æt) nu • edu • kz, jonathan • north • washington (æt) gmail • com, ftyers (æt) prompsit • com]. Development of the treebank happens outside the UD repository, so any bug must be fixed either in the original data source or in the conversion procedure. Do not submit pull requests against the UD repository.

Annotation Source
Lemmas annotated manually
UPOS annotated manually in non-UD style, automatically converted to UD
XPOS annotated manually
Features annotated manually in non-UD style, automatically converted to UD
Relations annotated manually, natively in UD style

Description

The UD Kazakh treebank is a combination of text from various sources, including Wikipedia, some folk tales, sentences from the UDHR, news and phrasebook sentences. Sentence IDs include partial document identifiers.

Both the tokenisation and the morphological processing in the Kazakh UD treebank follow the principles of the Turkic lexica in Apertium.

The file designated as “train” is only a small sample that shows shared task participants what the data looks like. The treebank is too small to support a standard training-development-test split. Instead, users are advised to merge both files, then jack-knife and report the results of ten-fold cross-validation.
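The merge-and-cross-validate procedure can be sketched as follows. This is a minimal illustration, not an official UD script: the function names, the seeded sentence-level shuffle and the round-robin fold assignment are all assumptions. Because sentence IDs carry partial document identifiers, some users may prefer to split by document instead of by sentence.

```python
import random
import re
from typing import Iterator, List, Sequence, Tuple


def split_sentences(conllu_text: str) -> List[str]:
    """Split CoNLL-U text into sentence blocks, which are separated by blank lines."""
    blocks = re.split(r"\n\s*\n", conllu_text.strip())
    return [block for block in blocks if block.strip()]


def load_sentences(paths: Sequence[str]) -> List[str]:
    """Merge the sentences of several CoNLL-U files into one list."""
    sentences: List[str] = []
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            sentences.extend(split_sentences(handle.read()))
    return sentences


def k_fold_splits(
    sentences: Sequence[str], k: int = 10, seed: int = 0
) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (train, test) pairs; every sentence is in exactly one test fold."""
    shuffled = list(sentences)
    # Assumption: a seeded sentence-level shuffle; the source does not specify one.
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train, folds[i]
```

A typical use would be `load_sentences([...])` on the two distributed files (paths omitted here), followed by training and evaluating a parser once per pair from `k_fold_splits` and averaging the ten scores.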

Acknowledgments

Please cite the following papers if you use the Kazakh UD treebank:

@inproceedings{tyers_tl2015,
author = {Tyers, Francis M. and Washington, Jonathan N.},
title = {Towards a Free/Open-source Universal-dependency Treebank for Kazakh},
booktitle = {3rd International Conference on Turkic Languages Processing,
(TurkLang 2015)},
pages = {276--289},
year = {2015},
}

@inproceedings{makazhan_tl2015,
author = {Makazhanov, Aibek and
Sultangazina, Aitolkyn and
Makhambetov, Olzhas and
Yessenbayev, Zhandos},
title = {Syntactic Annotation of Kazakh: Following the Universal Dependencies Guidelines. A report},
booktitle = {3rd International Conference on Turkic Languages Processing,
(TurkLang 2015)},
pages = {338--350},
year = {2015},
}

Statistics of UD Kazakh KTB

POS Tags

ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, SYM, VERB, X

Features

Aspect, Case, Degree, Evident, Gender, Mood, Number, Number[psor], NumType, Person, Person[psor], Polarity, Polite, PronType, Reflex, Tense, VerbForm, Voice

Relations

acl, acl:relcl, advcl, advmod, amod, appos, aux, case, cc, ccomp, clf, compound, compound:lvc, conj, cop, csubj, dep, det, discourse, fixed, flat:name, iobj, mark, nmod, nmod:poss, nsubj, nummod, obj, obl, obl:own, orphan, parataxis, punct, root, vocative, xcomp

Tokenization and Word Segmentation

Morphology

Tags

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

Syntax

Auxiliary Verbs and Copula

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).
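As a rough illustration of this restriction, the sketch below counts dependency relations in CoNLL-U data where the parent is a VERB and the child is a NOUN or PRON. The function name is hypothetical, the exclusion of PROPN children is an assumption based on the wording above, and the toy sentence was made up for the example rather than taken from the treebank.

```python
from collections import Counter


def verb_nominal_relations(conllu_text: str) -> Counter:
    """Count relations whose parent is a VERB and whose child is a NOUN or PRON."""
    counts: Counter = Counter()
    sentence = {}  # token id -> (UPOS, head id, relation)

    def flush() -> None:
        for upos, head, deprel in sentence.values():
            parent = sentence.get(head)
            # Assumption: PROPN children are not counted as nouns here.
            if upos in {"NOUN", "PRON"} and parent and parent[0] == "VERB":
                counts[deprel] += 1
        sentence.clear()

    for line in conllu_text.splitlines():
        line = line.rstrip()
        if not line:  # a blank line ends the sentence
            flush()
            continue
        if line.startswith("#"):  # sentence-level comment
            continue
        cols = line.split("\t")
        # Skip multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 1.1).
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        sentence[cols[0]] = (cols[3], cols[6], cols[7])
    flush()  # the last sentence may lack a trailing blank line
    return counts


# Toy sentence invented for illustration ("I read a book.").
TOY = "\n".join([
    "# text = Мен кітап оқыдым.",
    "1\tМен\tмен\tPRON\t_\t_\t3\tnsubj\t_\t_",
    "2\tкітап\tкітап\tNOUN\t_\t_\t3\tobj\t_\t_",
    "3\tоқыдым\tоқы\tVERB\t_\t_\t0\troot\t_\t_",
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_",
]) + "\n"

print(dict(verb_nominal_relations(TOY)))  # {'nsubj': 1, 'obj': 1}
```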

Verbs with Reflexive Core Objects

Relations Overview