UD Azerbaijani TueCL
Language: Azerbaijani (code: az)
Family: Turkic
This treebank has been part of Universal Dependencies since the UD v2.14 release.
The following people have contributed to making this treebank part of UD: Soudabeh Eslami, Çağrı Çöltekin.
Repository: UD_Azerbaijani-TueCL
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.15
License: CC BY-SA 4.0
Genre: grammar-examples
Questions, comments? General annotation questions (either Azerbaijani-specific or cross-linguistic) can be raised in the main UD issue tracker. Bugs in this treebank can be reported in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [soudabeh • eslami (æt) student • uni-tuebingen • de, cagri • coeltekin (æt) uni-tuebingen • de]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.
Annotation | Source |
---|---|
Lemmas | annotated manually |
UPOS | annotated manually, natively in UD style |
XPOS | not available |
Features | not available |
Relations | annotated manually, natively in UD style |
Description
This is a small treebank of grammatical examples for Azerbaijani. The
treebank tries to be neutral about the particular variety (North or
South Azerbaijani) and hence uses the ISO code for the macrolanguage
(az).
Azerbaijani-TueCL contains about 110 sentences in total: 20 Cairo sentences and about 90 sentences suggested by the UD Turkic Group. This treebank is part of the UD Turkic Treebank. Translations of all sentences are available in English, Turkish and Kyrgyz.
Azerbaijani is currently written in three alphabets: the Persian alphabet in the South, and the Cyrillic and Latin alphabets in the North. At present this treebank contains only sentences in Latin script; more sentences in the (Arabic-based) Persian alphabet will be added.
Acknowledgments
We are deeply thankful to the UD Turkic Group for their weekly informative meetings and discussions and for all the support we have received.
References
- (citation)
Statistics of UD Azerbaijani TueCL
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PRON – PROPN – PUNCT – SCONJ – VERB
Features
Relations
acl – advcl – advmod – advmod:emph – amod – appos – aux – case – cc – ccomp – compound – compound:lvc – compound:redup – conj – cop – det – discourse – fixed – flat – mark – nmod – nmod:poss – nsubj – nsubj:outer – nummod – obj – obl – orphan – parataxis – punct – root – vocative – xcomp
Tokenization and Word Segmentation
- This corpus contains 109 sentences, 655 tokens and 663 syntactic words.
- This corpus contains 137 tokens (21%) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 3 types of words that contain both letters and punctuation. Examples: yoxdu(r), düktürdü(r), saatdı(r)
- This corpus contains 8 multi-word tokens. On average, one multi-word token consists of 2.00 syntactic words.
- There are 7 types of multi-word tokens. Examples: evdəki, Denizinkinin, aşpazxananınkını, bağdakı, kitabxanadakılar, pərəstarıdı, sәninkindәn.
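The counts in this section can in principle be recomputed from the CoNLL-U file. The following is a minimal sketch, assuming standard 10-column CoNLL-U input; the function name `conllu_stats` and the two-sentence sample are made up for illustration, and the sample's segmentation of evdəki is an assumption rather than a copy of the treebank's annotation.

```python
def conllu_stats(text):
    """Count sentences, surface tokens, syntactic words, multi-word tokens
    and tokens marked SpaceAfter=No in a CoNLL-U string."""
    stats = {"sentences": 0, "tokens": 0, "words": 0, "mwts": 0, "no_space": 0}
    for block in text.strip().split("\n\n"):
        rows = [line.split("\t") for line in block.splitlines()
                if line and not line.startswith("#")]
        if not rows:
            continue
        stats["sentences"] += 1
        covered_until = 0  # last word ID covered by the latest multi-word token
        for cols in rows:
            tok_id, misc = cols[0], cols[9]
            if "." in tok_id:      # empty node: neither a token nor a word
                continue
            if "-" in tok_id:      # multi-word token range such as "1-2"
                covered_until = int(tok_id.split("-")[1])
                stats["mwts"] += 1
                is_token = True
            else:                  # ordinary syntactic word
                stats["words"] += 1
                is_token = int(tok_id) > covered_until
            if is_token:
                stats["tokens"] += 1
                if "SpaceAfter=No" in misc.split("|"):
                    stats["no_space"] += 1
    return stats


def make_conllu(sentences):
    """Build a CoNLL-U string from (ID, FORM, MISC) triples; other columns are '_'."""
    blocks = []
    for rows in sentences:
        lines = ["\t".join([tok_id, form] + ["_"] * 7 + [misc])
                 for tok_id, form, misc in rows]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n\n"


# Hypothetical sample, NOT taken from the treebank.
SAMPLE = make_conllu([
    [("1-2", "evdəki", "_"), ("1", "evdə", "_"), ("2", "ki", "_"),
     ("3", "kitab", "SpaceAfter=No"), ("4", ".", "_")],
    [("1", "Yaxşı", "SpaceAfter=No"), ("2", ".", "_")],
])

print(conllu_stats(SAMPLE))
```

The figures reported above are mutually consistent under this counting scheme: 8 multi-word tokens of 2 words each add 8 words to the 655 tokens, giving 663 syntactic words, and 137 of 655 tokens is roughly 21%.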
Morphology
Tags
- This corpus uses 14 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PRON, PROPN, PUNCT, SCONJ, VERB
- This corpus does not use the following tags: PART, SYM, X
- This corpus contains 11 lemmas tagged as pronouns (PRON): -ki, Nәmә, belə, biz, bu, kim, mən, nə, o, siz, sən
- This corpus contains 6 lemmas tagged as determiners (DET): bir, bu, harda, heç, o, tamam
- Out of the above, 2 lemmas occurred sometimes as PRON and sometimes as DET: bu, o
- This corpus contains 4 lemmas tagged as auxiliaries (AUX): bil, dəyil, i, ol
- Out of the above, 2 lemmas occurred sometimes as AUX and sometimes as VERB: bil, ol
- This corpus does not use the VerbForm feature.
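The tag and lemma overlaps listed above follow from simple set operations. As a small sketch, using only the tag and lemma lists given in this section (the variable names are illustrative):

```python
# The 17 universal POS tags defined by UD.
UPOS_ALL = {"ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
            "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X"}

# Tags and lemmas as listed in this section.
UPOS_USED = {"ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
             "PRON", "PROPN", "PUNCT", "SCONJ", "VERB"}
PRON_LEMMAS = {"-ki", "Nәmә", "belə", "biz", "bu", "kim", "mən", "nə", "o", "siz", "sən"}
DET_LEMMAS = {"bir", "bu", "harda", "heç", "o", "tamam"}

unused_tags = sorted(UPOS_ALL - UPOS_USED)        # tags never used in the corpus
pron_and_det = sorted(PRON_LEMMAS & DET_LEMMAS)   # lemmas tagged both PRON and DET

print(unused_tags)
print(pron_and_det)
```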
Nominal Features
Degree and Polarity
Verbal Features
Pronouns, Determiners, Quantifiers
Other Features
Syntax
Auxiliary Verbs and Copula
- This corpus uses 1 lemma as a copula (cop). Example: i.
- This corpus uses 3 lemmas as auxiliaries (aux). Examples: dəyil, bil, ol.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
- VERB--NOUN (25)
- VERB--PRON (9)
- obj
- VERB--NOUN (32)
- VERB--PRON (3)
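Counts of this kind can be obtained by pairing each dependent with the UPOS tag of its head. The sketch below assumes a sentence is given as (ID, UPOS, HEAD, DEPREL) tuples and matches relation labels exactly, so subtypes such as nsubj:outer are not counted; whether the official statistics treat subtypes this way is an assumption. The function name and the toy sentence are hypothetical.

```python
from collections import Counter


def core_argument_pairs(sentence, relations=("nsubj", "obj")):
    """Count parent--child UPOS pairs for the given relations, restricted to
    verbal parents and nominal or pronominal children."""
    upos_by_id = {tok_id: upos for tok_id, upos, _, _ in sentence}
    counts = Counter()
    for tok_id, upos, head, deprel in sentence:
        if (deprel in relations
                and upos in ("NOUN", "PRON")
                and upos_by_id.get(head) == "VERB"):
            counts[(deprel, f"VERB--{upos}")] += 1
    return counts


# Hypothetical toy sentence: pronoun subject, noun object, verb root, punctuation.
toy_sentence = [
    (1, "PRON", 3, "nsubj"),
    (2, "NOUN", 3, "obj"),
    (3, "VERB", 0, "root"),
    (4, "PUNCT", 3, "punct"),
]

pairs = core_argument_pairs(toy_sentence)
print(dict(pairs))
```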
Relations Overview
- This corpus uses 5 relation subtypes: advmod:emph, compound:lvc, compound:redup, nmod:poss, nsubj:outer
- The following 9 relation types are not used in this corpus at all: iobj, csubj, expl, dislocated, clf, list, goeswith, reparandum, dep
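The two statements above can be checked against the relation list given under Relations. A minimal sketch, assuming the 37 universal relation types of UD v2 as the reference inventory (variable names are illustrative):

```python
# The 37 universal dependency relation types of UD v2.
UNIVERSAL_RELATIONS = {
    "acl", "advcl", "advmod", "amod", "appos", "aux", "case", "cc", "ccomp",
    "clf", "compound", "conj", "cop", "csubj", "dep", "det", "discourse",
    "dislocated", "expl", "fixed", "flat", "goeswith", "iobj", "list", "mark",
    "nmod", "nsubj", "nummod", "obj", "obl", "orphan", "parataxis", "punct",
    "reparandum", "root", "vocative", "xcomp",
}

# Relation labels used in this corpus, as listed in the Relations section.
USED_LABELS = [
    "acl", "advcl", "advmod", "advmod:emph", "amod", "appos", "aux", "case",
    "cc", "ccomp", "compound", "compound:lvc", "compound:redup", "conj", "cop",
    "det", "discourse", "fixed", "flat", "mark", "nmod", "nmod:poss", "nsubj",
    "nsubj:outer", "nummod", "obj", "obl", "orphan", "parataxis", "punct",
    "root", "vocative", "xcomp",
]

subtypes = sorted(label for label in USED_LABELS if ":" in label)
used_base_types = {label.split(":")[0] for label in USED_LABELS}
unused_types = sorted(UNIVERSAL_RELATIONS - used_base_types)

print(subtypes)
print(unused_types)
```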