UD for Kurmanji

Tokenization and Word Segmentation
- In general, words are delimited by whitespace characters. Exceptions are described below.
- According to typographical rules, many punctuation marks are attached to a neighboring word. We tokenize them as separate tokens (words), except in the following cases:
  - The period marking an abbreviation: Dr. “doctor” is one token.
  - The apostrophe (or occasionally a hyphen) is not treated as punctuation when it occurs between a number and its morphological suffix, as in 15’ê, 1932’an.
- Kurmanji has a class of verbs called ‘Lêkerên hevedudanî’, which resemble English phrasal verbs. They typically consist of two or three parts separated by spaces in writing. In passive and causative forms, however, the parts are written together.
- There are several closed classes of contractions that are treated as multi-word tokens and segmented to individual syntactic words. The most prominent type is a pronoun fused with the future auxiliary: ezê = ez + dê “I will”.
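The tokenization rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lexicons `ABBREVIATIONS` and `CONTRACTIONS` are hypothetical and contain just the examples given above, the function names are invented for this sketch, and the input string is an artificial sequence of tokens rather than a real Kurmanji sentence. The ID ranges follow the CoNLL-U convention for multi-word tokens.

```python
import re

# Hypothetical, deliberately tiny lexicons built from the examples above;
# a real tokenizer would need complete lists.
ABBREVIATIONS = {"Dr."}
CONTRACTIONS = {"ezê": ["ez", "dê"]}  # pronoun + future auxiliary

# Alternatives are tried in order: abbreviations, number + suffix
# (joined by an apostrophe or hyphen), plain words, single punctuation marks.
TOKEN = re.compile("|".join([
    "|".join(re.escape(a) for a in sorted(ABBREVIATIONS)),
    r"\d+['’-]\w+",
    r"\w+",
    r"[^\w\s]",
]))


def tokenize(text):
    """Return (surface_token, syntactic_words) pairs for a raw string."""
    return [(tok, CONTRACTIONS.get(tok.lower(), [tok]))
            for tok in TOKEN.findall(text)]


def conllu_rows(tokens):
    """Assign CoNLL-U style IDs; multi-word tokens get a range line first."""
    rows, next_id = [], 1
    for surface, words in tokens:
        if len(words) > 1:
            rows.append((f"{next_id}-{next_id + len(words) - 1}", surface))
        for word in words:
            rows.append((str(next_id), word))
            next_id += 1
    return rows


if __name__ == "__main__":
    # Artificial token sequence, not a real sentence.
    for row in conllu_rows(tokenize("ezê Dr. 1932’an, 15’ê.")):
        print("\t".join(row))
```

Note that the lookup lowercases the surface form, so a sentence-initial contraction would be split into lowercase words; whether that is desirable depends on the treebank's conventions.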
Tags
- Kurmanji uses all 17 universal POS categories, including particles (PART). Only two word types are tagged PART: jî “also” and ma.
- Kurmanji has four auxiliaries; three of them inflect like verbs (and can act as full verbs depending on context), while dê is an uninflected particle:
  - The copula bûn “to be”.
  - The future tense marker dê.
  - The passive auxiliary hatin “to come” (it combines with an infinitive of the lexical verb).
  - The causative auxiliary dan “to give” (it combines with an infinitive of the lexical verb).
- Verbs with modal meaning are not considered auxiliary in Kurmanji.
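The closed list of auxiliaries lends itself to a simple consistency check on (lemma, UPOS) pairs. The following is a minimal sketch, not an official validator: the table `AUXILIARIES` and the function `check_aux` are names invented here, and the rule that dê should not be tagged VERB is an inference from the statement that it is an uninflected particle.

```python
# Lemma -> (function, inflects like a verb), taken from the list above.
AUXILIARIES = {
    "bûn": ("copula", True),
    "dê": ("future", False),
    "hatin": ("passive", True),
    "dan": ("causative", True),
}


def check_aux(lemma, upos):
    """Return a problem description, or None if the pair looks consistent."""
    if upos == "AUX" and lemma not in AUXILIARIES:
        # Covers modal verbs, which are not auxiliaries in Kurmanji.
        return f"'{lemma}' is tagged AUX but is not a listed auxiliary"
    if upos == "VERB" and lemma in AUXILIARIES and not AUXILIARIES[lemma][1]:
        # Assumption of this sketch: an uninflected marker cannot be a full verb.
        return f"'{lemma}' is an uninflected marker and should not be VERB"
    return None


if __name__ == "__main__":
    # "MODAL_LEMMA" is a placeholder, not a real Kurmanji lemma.
    for pair in [("hatin", "AUX"), ("hatin", "VERB"), ("MODAL_LEMMA", "AUX")]:
        print(pair, "->", check_aux(*pair))
```

The inflecting auxiliaries are accepted under both AUX and VERB, since they can act as full verbs depending on context.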
- There are four main (de)verbal forms, distinguished by the UPOS tag and the value of the VerbForm feature.
Nominal Features
- Nominal words (NOUN, PROPN) have an inherent Gender feature with one of two values. The gender of the referent is reflected by PRON and DET.
- The Number feature has two values. The following parts of speech inflect for number: NOUN, PROPN, PRON, DET, VERB, AUX, marginally NUM.
- Case has 4 possible values. It occurs with the nominal words, i.e., NOUN, PROPN, PRON, ADJ, DET, NUM.
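The mapping between nominal features and the parts of speech that carry them can be written down as a lookup table and used to flag unexpected combinations. This is a hedged sketch: `FEATURE_UPOS`, `parse_feats` and `unexpected_features` are names made up for this illustration, the table only restates the lists above, and feature values are not validated at all.

```python
# Feature -> UPOS tags on which it is expected, as listed above.
FEATURE_UPOS = {
    "Gender": {"NOUN", "PROPN", "PRON", "DET"},
    "Number": {"NOUN", "PROPN", "PRON", "DET", "VERB", "AUX", "NUM"},
    "Case": {"NOUN", "PROPN", "PRON", "ADJ", "DET", "NUM"},
}


def parse_feats(feats):
    """Parse a CoNLL-U FEATS string such as 'Case=Nom|Number=Sing'."""
    if feats in ("", "_"):
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def unexpected_features(upos, feats):
    """List the nominal features that the table does not expect on this UPOS."""
    return sorted(
        name for name in parse_feats(feats)
        if name in FEATURE_UPOS and upos not in FEATURE_UPOS[name]
    )


if __name__ == "__main__":
    # Values here are generic UD values used only to form a valid FEATS string.
    print(unexpected_features("NOUN", "Case=Nom|Number=Sing"))
    print(unexpected_features("ADV", "Number=Plur"))
```

Features outside the table are ignored, so the check can be run on full FEATS strings without producing noise for verbal or pronominal features.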
Degree and Polarity
- Degree applies to adjectives (ADJ) and has one of three possible values: Pos, Cmp, Sup. For example, zêde “a lot of”, zêdetir “more”, zêdetirîn “most”.
- Polarity has one value, Neg (affirmative polarity is not marked explicitly), and applies primarily to verbs (VERB, AUX), determiners (DET) and adverbs (ADV).
Verbal Features
- Aspect is Perf (perfective) or Prog (progressive); it can also be unmarked.
- Finite verbs always have one of four values of Mood.
- Verbs in the indicative mood always have one of four values of Tense.
- Evident (evidentiality) has only one value.
Pronouns, Determiners, Quantifiers
- PronType is used with pronouns (PRON), determiners (DET) and adverbs (ADV).
- NumType is used with numerals (NUM), restricted to a subset of its values.
- The Reflex feature marks reflexive pronouns (xwe).
- Person is a lexical feature of personal pronouns (PRON) and has three values: 1, 2 and 3. Person is not marked on other types of pronouns or on nouns, although they can almost always be interpreted as the 3rd person.
Core Arguments, Oblique Arguments and Adjuncts
- Nominal subject (nsubj) is a noun phrase in the nominative case, without a preposition.
- Objects may be bare noun phrases in the accusative (oblique) case.
Non-verbal Clauses
- The copula verb bûn “to be” is used in equational, attributional, locative, possessive and benefactive nonverbal clauses.
There is 1 Kurmanji UD treebank: