UD Chukchi HSE
Language: Chukchi (code: ckt)
Family: Chukotko-Kamchatkan
This treebank has been part of Universal Dependencies since the UD v2.7 release.
The following people have contributed to making this treebank part of UD: Francis Tyers, Karina Mischenkova.
Repository: UD_Chukchi-HSE
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.15
License: CC BY-SA 4.0
Genre: spoken
Questions, comments? General annotation questions (either Chukchi-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [ftyers (æt) iu • edu]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.
Annotation | Source |
---|---|
Lemmas | not available |
UPOS | annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion |
XPOS | not available |
Features | not available |
Relations | annotated manually, natively in UD style |
Description
This treebank is a manual annotation of texts from the multimedia annotated corpus of the Chuklang project, a dialectal corpus of the Amguema variant of Chukchi.
The texts are spoken Chukchi. Chukchi is a polysynthetic language spoken in the Chukotka Autonomous Okrug in the north-east of Siberia.
Acknowledgments
This work is entirely based on the glossed corpus developed by the Chuklang project. The project maintains its own acknowledgments.
References
If you use this in your work, please cite:
- Tyers, F. M. and Mishchenkova, K. (2020) “Dependency annotation of noun incorporation in polysynthetic languages”. Proceedings of the Fourth Workshop on Universal Dependencies (UDW 2020). pp. 195–204.
@inproceedings{tyers:20,
author = {Francis M. Tyers and Karina Mishchenkova},
title = {Dependency annotation of noun incorporation in polysynthetic languages},
booktitle = {Proceedings of the Fourth Workshop on Universal Dependencies (UDW 2020)},
pages = {195--204},
year = 2020
}
Statistics of UD Chukchi HSE
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – VERB – X
Features
Relations
acl – acl:attr – acl:relat – advcl – advmod – advmod:emph – amod – appos – aux – aux:neg – case – cc – ccomp – conj – cop – dep – det – discourse – dislocated – flat – flat:foreign – flat:name – mark – nmod – nmod:attr – nmod:poss – nmod:relat – nsubj – nummod – obj – obl – orphan – parataxis – parataxis:rep – punct – reparandum – root – vocative – xcomp
Tokenization and Word Segmentation
- This corpus contains 1004 sentences, 5389 tokens and 6124 syntactic words.
- This corpus contains 1004 tokens (19%) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 3 types of words that contain both letters and punctuation. Examples: Санкт-Петербург, Санкт-Петербургэты, по-русски
- This corpus contains 653 multi-word tokens. On average, one multi-word token consists of 2.13 syntactic words.
- There are 493 types of multi-word tokens. Examples: ынӄэнэ, ӄэԓюӄъым, ӄоԓьым, ынкъамэ, гымнинэ, ԓюутэ, читъым, ынӄэна, ынӄэнъымэ, этъым, Ынӄорыӈа, иквъиӈа, иквъэтэ, нивӄинэ, нэмыӄэе, ынӄоръым, ынӄэнъым, эвына, энмэна, янотъым, Ӄорыӈэ, ӄэԓёӄъым, Апэтыпԓыткокэ, Гымнанъым, Игытъым, Къама, Нанъяачьым, Наӄамэ, Опопыӈа, Ынӄорыӈ, аʼачекъым, вае, гымыкытԓьэн, итыкэ, микынтим, миӈкыриыʼм, мурыгрээн, мытыпкирмыкъым, мытԓемыкъым, ниԓьуткуԓьэтӄинъым, нэмыӄэйъым, нэмэӈэ, пыкиргъиӈэ, тынотапынмынтагъакъым, ыныгрээн, ынӄорыӈэ, ынӄэнатаӈэ, ынӈинэ, ытръэчьым, ытԓыгынэ.
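The tokenization figures above can be recomputed from a CoNLL-U file. The following is a minimal sketch, not the script that generated this page: the function names `conllu_counts` and `avg_mwt_length`, the helper `row`, and the toy sample are all assumptions made for illustration, and the sample forms are made up rather than taken from the treebank.

```python
def row(tok_id, form, misc="_"):
    """Build one 10-column CoNLL-U line; unused columns are left as '_'."""
    return "\t".join([tok_id, form, "_", "_", "_", "_", "_", "_", "_", misc])


# Toy data with made-up forms, NOT taken from the treebank.
SAMPLE = "\n".join([
    "# sent_id = toy-1",
    row("1-2", "ab", misc="SpaceAfter=No"),  # multi-word token covering words 1 and 2
    row("1", "a"),
    row("2", "b"),
    row("3", "."),
    "",
    "# sent_id = toy-2",
    row("1", "c"),
    "",
])


def conllu_counts(text):
    """Count sentences, surface tokens, syntactic words and multi-word tokens."""
    sentences = tokens = words = mwts = no_space = 0
    covered_until = 0   # last word ID covered by the current multi-word token
    in_sentence = False
    for line in text.splitlines():
        if not line.strip():            # blank line ends a sentence
            in_sentence = False
            covered_until = 0
            continue
        if line.startswith("#"):        # sentence-level comment
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue
        if not in_sentence:
            in_sentence = True
            sentences += 1
        tok_id, misc = cols[0], cols[9].split("|")
        if "." in tok_id:               # empty node: neither token nor word
            continue
        if "-" in tok_id:               # range line: one surface token
            mwts += 1
            tokens += 1
            covered_until = int(tok_id.split("-")[1])
            no_space += "SpaceAfter=No" in misc
            continue
        words += 1
        if int(tok_id) > covered_until:  # word that is its own surface token
            tokens += 1
            no_space += "SpaceAfter=No" in misc
    return {"sentences": sentences, "tokens": tokens, "words": words,
            "multiword_tokens": mwts, "no_space_after": no_space}


def avg_mwt_length(words, tokens, mwts):
    """Average number of syntactic words per multi-word token."""
    return (words - tokens + mwts) / mwts


print(conllu_counts(SAMPLE))
```

Plugging the reported totals into the same arithmetic is consistent with the page: (6124 - 5389 + 653) / 653 rounds to 2.13 words per multi-word token, and 1004 / 5389 rounds to 19%.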
Morphology
Tags
- This corpus uses 16 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB, X
- This corpus does not use the following tags: SYM
- This corpus contains 52 word types tagged as particles (PART): Аны, Аԓым, Кытвыԓӄун, Мытив, Нука, Нӄэн, Пԓевыт, Тэӈуйӈэ, Ынӄэӄ, Ынӈин, Ытԓён, а, амвынэ, аԓымы, ва, вай, ванэ, вынэ, вэнԓыги, вэчьым, е, иʼм, итык, китаӄун, м, мачынан, уйӈэ, ъы, ъым, ъыма, ъэм, ы, ыʼм, ынръам, ынӄэ, ынӄэн, ьым, э, эʼм, эвын, энмэн, это, ӄаӈан, ӄун, ӄытԓыги, Ӈуттэԓв, ӈ, ӈа, ӈан, ӈэ, ӈэвэӄ, ԓыгэн
- Lemmas are not annotated in this treebank, so the only lemma value counted in the items below is the placeholder _.
- This corpus contains 1 lemma tagged as a pronoun (PRON): _
- This corpus contains 1 lemma tagged as a determiner (DET): _
- Out of the above, 1 lemma occurred sometimes as PRON and sometimes as DET: _
- This corpus contains 1 lemma tagged as an auxiliary (AUX): _
- Out of the above, 1 lemma occurred sometimes as AUX and sometimes as VERB: _
- This corpus does not use the VerbForm feature.
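The tag inventory statement in this list (16 of 17 UPOS tags used, SYM unused) can be checked with a set difference. This is a small sketch; the variable names are illustrative assumptions, and the `used` set is copied from the list above rather than read from the data.

```python
# The 17 universal part-of-speech tags defined by Universal Dependencies.
UD_UPOS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}

# Tags reported as used in this treebank (copied from the list above).
used = set(
    "ADJ ADP ADV AUX CCONJ DET INTJ NOUN NUM PART PRON PROPN PUNCT SCONJ VERB X".split()
)

unused = sorted(UD_UPOS - used)
print(len(used), "of", len(UD_UPOS), "tags used; unused:", unused)
```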
Nominal Features
Degree and Polarity
Verbal Features
Pronouns, Determiners, Quantifiers
Other Features
Syntax
Auxiliary Verbs and Copula
- This corpus uses 1 lemma as a copula (cop). Examples: _.
- This corpus uses 1 lemma as an auxiliary (aux). Examples: _.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
  - VERB--NOUN (146)
  - VERB--PRON (109)
- obj
  - VERB--NOUN (138)
  - VERB--PRON (35)
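Counts of this kind pair each dependent's relation with the UPOS tags of its head and of the dependent itself. The sketch below shows one way to tally them from CoNLL-U text; `core_argument_pairs`, `row`, and the toy sample are hypothetical names and made-up data for illustration, not the procedure or data behind the figures above.

```python
from collections import Counter


def row(tok_id, form, upos, head, deprel):
    """Build one 10-column CoNLL-U line; unused columns are left as '_'."""
    return "\t".join([tok_id, form, "_", upos, "_", "_", head, deprel, "_", "_"])


# Toy data with made-up forms, NOT taken from the treebank.
SAMPLE = "\n".join([
    row("1", "a", "PRON", "2", "nsubj"),
    row("2", "b", "VERB", "0", "root"),
    row("3", "c", "NOUN", "2", "obj"),
    row("4", ".", "PUNCT", "2", "punct"),
    "",
    row("1", "d", "NOUN", "2", "nsubj"),
    row("2", "e", "VERB", "0", "root"),
    "",
])


def core_argument_pairs(text, relations=("nsubj", "obj")):
    """Count (relation, 'VERB--CHILD') pairs where the child is NOUN or PRON."""
    counts = Counter()
    sentence = []
    # The extra "" guarantees that the last sentence is flushed.
    for line in text.splitlines() + [""]:
        if line.strip():
            cols = line.split("\t")
            # Keep plain word lines only: skip comments, ranges and empty nodes.
            if not line.startswith("#") and len(cols) == 10 and cols[0].isdigit():
                sentence.append(cols)
            continue
        upos_by_id = {cols[0]: cols[3] for cols in sentence}
        for cols in sentence:
            child_upos, head, deprel = cols[3], cols[6], cols[7]
            head_upos = upos_by_id.get(head)  # None for the root (head 0)
            if (deprel in relations and head_upos == "VERB"
                    and child_upos in ("NOUN", "PRON")):
                counts[(deprel, f"{head_upos}--{child_upos}")] += 1
        sentence = []
    return counts


for (deprel, pair), n in sorted(core_argument_pairs(SAMPLE).items()):
    print(deprel, pair, n)
```

The relation is compared exactly, so subtyped labels would need to be added to `relations` explicitly; this treebank lists no subtypes of nsubj or obj.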
Relations Overview
- This corpus uses 10 relation subtypes: acl:attr, acl:relat, advmod:emph, aux:neg, flat:foreign, flat:name, nmod:attr, nmod:poss, nmod:relat, parataxis:rep
- The following 8 relation types are not used in this corpus at all: iobj, csubj, expl, clf, fixed, compound, list, goeswith
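Both statements in this overview follow from the relation list given under Relations: a subtype is any label containing a colon, and an unused type is any of the 37 universal relations whose base label never occurs. A short sketch, with illustrative variable names and the label list copied from this page:

```python
# The 37 universal dependency relations defined by Universal Dependencies.
UD_RELATIONS = set("""
acl advcl advmod amod appos aux case cc ccomp clf compound conj cop csubj dep
det discourse dislocated expl fixed flat goeswith iobj list mark nmod nsubj
nummod obj obl orphan parataxis punct reparandum root vocative xcomp
""".split())

# Relation labels listed for this treebank (copied from the Relations section).
used_labels = """
acl acl:attr acl:relat advcl advmod advmod:emph amod appos aux aux:neg case cc
ccomp conj cop dep det discourse dislocated flat flat:foreign flat:name mark
nmod nmod:attr nmod:poss nmod:relat nsubj nummod obj obl orphan parataxis
parataxis:rep punct reparandum root vocative xcomp
""".split()

subtypes = sorted(label for label in used_labels if ":" in label)
used_base = {label.split(":")[0] for label in used_labels}
unused = sorted(UD_RELATIONS - used_base)

print(len(subtypes), "subtypes:", ", ".join(subtypes))
print(len(unused), "unused types:", ", ".join(unused))
```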