
This page pertains to UD version 2.


UD Chukchi HSE

Language: Chukchi (code: ckt)
Family: Chukotko-Kamchatkan

This treebank has been part of Universal Dependencies since the UD v2.7 release.

The following people have contributed to making this treebank part of UD: Francis Tyers, Karina Mishchenkova.

Repository: UD_Chukchi-HSE
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.15

License: CC BY-SA 4.0

Genre: spoken

Questions, comments? General annotation questions (either Chukchi-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on Github. If you want to collaborate, please contact [ftyers (æt) iu • edu]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.

Annotation	Source
Lemmas	not available
UPOS	annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion
XPOS	not available
Features	not available
Relations	annotated manually, natively in UD style

Description

This treebank is a manual annotation of data from the multimedia annotated corpus of the Chuklang project, a dialectal corpus of the Amguema variant of Chukchi.

The corpus contains spoken Chukchi in the Amguema variant. Chukchi is a polysynthetic language spoken in the Chukotka Autonomous Okrug in the north-east of Siberia.

Acknowledgments

This work is entirely based on the glossed corpus developed by the Chuklang project, which maintains its own acknowledgements.

References

If you use this in your work, please cite:

Tyers, F. M. and Mishchenkova, K. (2020). “Dependency annotation of noun incorporation in polysynthetic languages”. Proceedings of the Fourth Workshop on Universal Dependencies (UDW 2020), pp. 195–204.

@inproceedings{tyers:20,
  author    = {Francis M. Tyers and Karina Mishchenkova},
  title     = {Dependency annotation of noun incorporation in polysynthetic languages},
  booktitle = {Proceedings of the Fourth Workshop on Universal Dependencies (UDW 2020)},
  pages     = {195--204},
  year      = 2020
}

Statistics of UD Chukchi HSE

POS Tags

ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – VERB – X

Features

Relations

acl – acl:attr – acl:relat – advcl – advmod – advmod:emph – amod – appos – aux – aux:neg – case – cc – ccomp – conj – cop – dep – det – discourse – dislocated – flat – flat:foreign – flat:name – mark – nmod – nmod:attr – nmod:poss – nmod:relat – nsubj – nummod – obj – obl – orphan – parataxis – parataxis:rep – punct – reparandum – root – vocative – xcomp

Tokenization and Word Segmentation

This corpus contains 1004 sentences, 5389 tokens and 6124 syntactic words.

This corpus contains 1004 tokens (19%) that are not followed by a space.

This corpus does not contain words with spaces.

This corpus contains 3 types of words that contain both letters and punctuation. Examples: Санкт-Петербург, Санкт-Петербургэты, по-русски

This corpus contains 653 multi-word tokens. On average, one multi-word token consists of 2.13 syntactic words.
There are 493 types of multi-word tokens. Examples: ынӄэнэ, ӄэԓюӄъым, ӄоԓьым, ынкъамэ, гымнинэ, ԓюутэ, читъым, ынӄэна, ынӄэнъымэ, этъым, Ынӄорыӈа, иквъиӈа, иквъэтэ, нивӄинэ, нэмыӄэе, ынӄоръым, ынӄэнъым, эвына, энмэна, янотъым, Ӄорыӈэ, ӄэԓёӄъым, Апэтыпԓыткокэ, Гымнанъым, Игытъым, Къама, Нанъяачьым, Наӄамэ, Опопыӈа, Ынӄорыӈ, аʼачекъым, вае, гымыкытԓьэн, итыкэ, микынтим, миӈкыриыʼм, мурыгрээн, мытыпкирмыкъым, мытԓемыкъым, ниԓьуткуԓьэтӄинъым, нэмыӄэйъым, нэмэӈэ, пыкиргъиӈэ, тынотапынмынтагъакъым, ыныгрээн, ынӄорыӈэ, ынӄэнатаӈэ, ынӈинэ, ытръэчьым, ытԓыгынэ.
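
The tokenization figures above can be recomputed from a CoNLL-U file. The following is a minimal sketch, not the script that generated this page: it assumes standard CoNLL-U conventions (ten tab-separated columns, ID ranges such as `1-2` for multi-word tokens, `SpaceAfter=No` in the MISC column). The function names and the toy sentence with placeholder forms are invented for illustration and are not taken from the treebank.

```python
def conllu_counts(text):
    """Count sentences, surface tokens, syntactic words, multi-word tokens
    and tokens not followed by a space in CoNLL-U formatted text."""
    counts = {"sentences": 0, "tokens": 0, "words": 0, "mwts": 0, "no_space": 0}
    covered_until = 0      # last word ID covered by the current multi-word token
    in_sentence = False
    for line in text.splitlines():
        if not line.strip():           # a blank line ends the sentence
            in_sentence = False
            covered_until = 0
            continue
        if line.startswith("#"):       # sentence-level comment
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue
        if not in_sentence:
            counts["sentences"] += 1
            in_sentence = True
        tok_id, misc = cols[0], cols[9].split("|")
        if "." in tok_id:              # empty node: neither token nor word
            continue
        if "-" in tok_id:              # multi-word token range, e.g. 1-2
            covered_until = int(tok_id.split("-")[1])
            counts["mwts"] += 1
            counts["tokens"] += 1
            counts["no_space"] += "SpaceAfter=No" in misc
            continue
        counts["words"] += 1
        if int(tok_id) > covered_until:   # word that is its own surface token
            counts["tokens"] += 1
            counts["no_space"] += "SpaceAfter=No" in misc
    return counts


def mean_words_per_mwt(tokens, words, mwts):
    """Every token is either one word or one multi-word token, so the words
    inside multi-word tokens number words - (tokens - mwts)."""
    return (words - tokens + mwts) / mwts


def row(tok_id, form, misc="_"):
    """Build one CoNLL-U line with only ID, FORM and MISC filled in."""
    return "\t".join([tok_id, form, "_", "_", "_", "_", "_", "_", "_", misc])


# Hypothetical toy sentence: one multi-word token "ab" split into "a" + "b".
SAMPLE = "\n".join([
    "# sent_id = toy-1",
    row("1-2", "ab"),
    row("1", "a"),
    row("2", "b"),
    row("3", "c", "SpaceAfter=No"),
    row("4", "."),
    "",
])

sample_counts = conllu_counts(SAMPLE)
print(sample_counts)

# Consistency check on the figures reported in this section.
reported_average = round(mean_words_per_mwt(5389, 6124, 653), 2)
reported_no_space_percent = round(100 * 1004 / 5389)
print(reported_average, reported_no_space_percent)
```

Applied to the reported totals, the same relation gives (6124 - 5389 + 653) / 653 ≈ 2.13 words per multi-word token, and 1004 / 5389 ≈ 19% of tokens without a following space, which agrees with the numbers stated above.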

Morphology

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

Syntax

Auxiliary Verbs and Copula

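This corpus uses 1 lemma as a copula (cop). Examples: _ (lemmas are not annotated in this treebank, so the lemma is the placeholder "_").

This corpus uses 1 lemma as an auxiliary (aux). Examples: _ (lemmas are not annotated in this treebank, so the lemma is the placeholder "_").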

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).

nsubj
- VERB--NOUN (146)
- VERB--PRON (109)

obj
- VERB--NOUN (138)
- VERB--PRON (35)

iobj
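
No iobj relations occur in this corpus, so no pairs are listed for it. The parent-child counts in this section could be reproduced along the following lines. This is a hedged sketch assuming standard CoNLL-U columns (UPOS in column 4, HEAD in column 7, DEPREL in column 8) and exact relation labels without subtypes; the function names and the three-word toy sentence are invented for illustration.

```python
from collections import Counter


def count_sentence(rows, counts, relations):
    """Add the VERB--NOUN / VERB--PRON pairs of one sentence to counts."""
    upos_by_id = {r[0]: r[3] for r in rows}
    for r in rows:
        child_upos, head, deprel = r[3], r[6], r[7]
        if (deprel in relations
                and upos_by_id.get(head) == "VERB"
                and child_upos in ("NOUN", "PRON")):
            counts[(deprel, f"VERB--{child_upos}")] += 1


def core_argument_counts(text, relations=("nsubj", "obj", "iobj")):
    """Count core-argument relations between verbal parents and
    nominal or pronominal children in CoNLL-U formatted text."""
    counts = Counter()
    rows = []
    for line in text.splitlines() + [""]:   # trailing "" flushes the last sentence
        if not line.strip():
            count_sentence(rows, counts, relations)
            rows = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            # Keep plain word lines only; skip ranges (1-2) and empty nodes (1.1).
            if len(cols) == 10 and cols[0].isdigit():
                rows.append(cols)
    return counts


def word(tok_id, form, upos, head, deprel):
    """Build one CoNLL-U word line with the columns used here filled in."""
    return "\t".join([tok_id, form, "_", upos, "_", "_", head, deprel, "_", "_"])


# Hypothetical toy sentence with placeholder forms.
SAMPLE = "\n".join([
    word("1", "w1", "PRON", "2", "nsubj"),
    word("2", "w2", "VERB", "0", "root"),
    word("3", "w3", "NOUN", "2", "obj"),
    "",
])

pair_counts = core_argument_counts(SAMPLE)
for (deprel, pair), n in sorted(pair_counts.items()):
    print(deprel, pair, n)
```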

Relations Overview

This corpus uses 10 relation subtypes: acl:attr, acl:relat, advmod:emph, aux:neg, flat:foreign, flat:name, nmod:attr, nmod:poss, nmod:relat, parataxis:rep.
The following 8 relation types are not used in this corpus at all: iobj, csubj, expl, clf, fixed, compound, list, goeswith.
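
Both figures follow from the relation inventory listed under Relations. As a small sketch of that derivation, the snippet below takes the inventory from this page, separates the subtypes (labels containing a colon), and compares the remaining base labels against the 37 universal relations of UD v2. The variable names are illustrative only.

```python
# The 37 universal dependency relations defined in UD v2.
UNIVERSAL_RELATIONS = set("""
acl advcl advmod amod appos aux case cc ccomp clf compound conj cop csubj
dep det discourse dislocated expl fixed flat goeswith iobj list mark nmod
nsubj nummod obj obl orphan parataxis punct reparandum root vocative xcomp
""".split())

# Relation inventory of this treebank, as listed under Relations on this page.
CORPUS_RELATIONS = """
acl acl:attr acl:relat advcl advmod advmod:emph amod appos aux aux:neg case
cc ccomp conj cop dep det discourse dislocated flat flat:foreign flat:name
mark nmod nmod:attr nmod:poss nmod:relat nsubj nummod obj obl orphan
parataxis parataxis:rep punct reparandum root vocative xcomp
""".split()

# Subtypes are labels of the form base:subtype.
subtypes = sorted(r for r in CORPUS_RELATIONS if ":" in r)

# A universal relation counts as used if it occurs bare or as the base of a subtype.
used_base = {r.split(":")[0] for r in CORPUS_RELATIONS}
unused = sorted(UNIVERSAL_RELATIONS - used_base)

print(len(subtypes), subtypes)
print(len(unused), unused)
```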