UD for Ukrainian
Tokenization and Word Segmentation
- In general, words are delimited by whitespace characters. Description of exceptions follows.
- According to typographical rules, many punctuation marks are attached to a neighboring word. We always tokenize them as separate tokens (words); that holds even for hyphenated compounds such as українсько-чеський “Ukrainian-Czech” (three tokens) and for abbreviations such as [і] т.i. “and so on” (four tokens).
- A whitespace separating digits in a large number is not treated as a word separator. For example, 1 000 000 (“1,000,000” by English rules) is one token.
- There are two closed classes of contractions that are treated as multi-word tokens and segmented into individual syntactic words:
- Pronouns like ні́де (немає де) “there is nowhere to”, ні́кому (немає кому) “there is no one to” (not to be confused with ніде́ “nowhere”, ніко́му “to no one”; note the accent).
- Nouns fused with the numeral пів “half of”: півметра “half a meter”. This spelling is now deprecated in standard Ukrainian; the correct spelling uses a whitespace: пів метра.
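The whitespace, punctuation and digit-group rules above can be sketched as a small regular-expression tokenizer. This is only an illustration, not the tokenizer used for the treebank: the name `tokenize` is hypothetical, digit groups are assumed to be separated by a plain space, apostrophes inside words are not handled, and the multi-word tokens (ні́де, півметра) are not segmented.

```python
import re

# Alternatives are tried left to right, so spaced digit groups win over plain numbers.
TOKEN_RE = re.compile(
    r"\d{1,3}(?: \d{3})+"       # 1 000 000: space-separated digit groups form one token
    r"|\d+"                      # any other run of digits
    r"|(?:[^\W\d_]\u0301?)+"     # letters, each optionally followed by a combining acute accent
    r"|\S"                       # every other non-space character is a token of its own
)


def tokenize(text: str) -> list[str]:
    """Split text into tokens following the simplified rules sketched above."""
    return TOKEN_RE.findall(text)
```

With these rules, `tokenize("українсько-чеський")` yields three tokens and `tokenize("1 000 000")` yields one. Two adjacent but unrelated numbers (such as `3 100`) would be merged, so real data needs a more careful treatment.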
Morphology
Tags
This is an overview only. For more detailed discussion and examples, see the list of Ukrainian POS tags and Ukrainian features.
- Ukrainian uses all 17 universal POS categories, including particles (PART). At present, more than 100 word types are tagged PART.
- The pronoun (PRON) vs. determiner (DET) distinction is based on word lists because the traditional grammar does not define determiners. In general, words that inflect for gender, to be able to agree with a modified noun, are tagged DET, even if they act independently in a given sentence; that includes possessives. Pronominal quantifiers (which the traditional grammar includes in numerals) are DET as well.
- Ukrainian has just one auxiliary verb (AUX), бути (“to be”) with its derivative бувати.
- In other words, бути and бувати are the only lemmas that occur with the AUX tag. They may also occur as a normal VERB if they are used in purely existential sentences (i.e. sentences that do not even indicate location; if they do, бути is treated as a copula).
- Note that this may change in the future. Existential sentences could be treated as elliptical versions of locational sentences; then the verb would be the root, but it could still be tagged as AUX, and the AUX-VERB distinction could be anchored in the lexicon.
- Verbs with modal meaning are not considered auxiliary in Ukrainian.
- There are five main (de)verbal forms, distinguished by the UPOS tag and the value of the VerbForm feature:
Nominal Features
- Nominal words (NOUN, PROPN and PRON) have an inherent Gender feature with one of three values: Masc, Fem or Neut. In some cases the masculine gender is further subclassified by the Animacy values Anim and Inan. Feminine and neuter nominals do not distinguish animacy grammatically.
- The two main values of the Number feature are Sing and Plur. The following parts of speech inflect for number: NOUN, PROPN, PRON, ADJ, DET, VERB, AUX (finite, participles and converbs), marginally NUM.
- Remnants of the Dual number occur only in the instrumental Case of a few nouns and all the agreeing parts of speech.
- Selected nouns are plurale tantum (Ptan) or singulare tantum (Coll). These two values are lexical and cannot be used with the agreeing adjectives, determiners or verbs. They also never occur with pronouns.
- Case has 7 possible values: Nom, Gen, Dat, Acc, Voc, Loc, Ins. It occurs with the nominal words, i.e., NOUN, PROPN, PRON, ADJ, DET, NUM. It can occur with participles but only with those tagged as ADJ. It never occurs with verbs.
- The Case feature also occurs with prepositions (ADP). Here it is a lexical feature. Prepositions do not inflect for case but they subcategorize for the case of their noun phrase.
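Some of the constraints on Case and Number can be expressed as a simple consistency check over the UPOS and FEATS columns of a CoNLL-U token. The following is a minimal sketch under stated assumptions: the helper names `parse_feats` and `check_nominal` are hypothetical, and only the rules spelled out in this section are tested (for example, Gender and Animacy are not checked).

```python
CASES = {"Nom", "Gen", "Dat", "Acc", "Voc", "Loc", "Ins"}
# Parts of speech on which Case may appear (ADP carries it as a lexical feature).
CASE_UPOS = {"NOUN", "PROPN", "PRON", "ADJ", "DET", "NUM", "ADP"}
# Ptan and Coll are lexical values and must not appear on these parts of speech.
NO_LEXICAL_NUMBER_UPOS = {"ADJ", "DET", "VERB", "AUX", "PRON"}


def parse_feats(feats: str) -> dict[str, str]:
    """Turn a CoNLL-U FEATS string such as 'Case=Gen|Number=Sing' into a dict."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def check_nominal(upos: str, feats: str) -> list[str]:
    """Return a list of problems found; an empty list means no rule was violated."""
    values = parse_feats(feats)
    problems = []
    case = values.get("Case")
    if case is not None:
        if case not in CASES:
            problems.append(f"unknown Case value: {case}")
        if upos not in CASE_UPOS:
            problems.append(f"Case is not expected on {upos}")
    if values.get("Number") in {"Ptan", "Coll"} and upos in NO_LEXICAL_NUMBER_UPOS:
        problems.append(f"lexical Number value is not expected on {upos}")
    return problems
```

A call such as `check_nominal("VERB", "Case=Nom|Number=Sing")` reports one problem, because Case never occurs with verbs.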
Verbal Features
- Verbs have a lexical Aspect, either imperfective (Imp) or perfective (Perf).
- The Aspect feature should also be used with the corresponding derived nouns and adjectives (participles), if they have the VerbForm feature.
- Finite verbs always have the Mood feature; its values include Ind and Imp. The Ukrainian conditional is formed periphrastically using the past participle of the content verb and a special form of the auxiliary verb, б.
- Verbs in the indicative mood always have one of three values of Tense: Past, Pres or Fut.
- Imperative forms do not have the Tense feature.
- There are two values of the Voice feature: Act and Pass. Only the passive participle has Voice=Pass. All other verb forms have Voice=Act.
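The interplay of Mood, Tense, Voice and Aspect can be illustrated with a small check over an already parsed feature dictionary. This is a sketch only: the function name `check_verbal` is hypothetical, and it covers just the rules stated in this section. Note that Imp means "imperfective" as a value of Aspect and "imperative" as a value of Mood.

```python
TENSES = {"Past", "Pres", "Fut"}
ASPECTS = {"Imp", "Perf"}


def check_verbal(feats: dict[str, str]) -> list[str]:
    """Return a list of problems found in the verbal features of one word."""
    problems = []
    mood = feats.get("Mood")
    tense = feats.get("Tense")
    # Imperative forms do not have the Tense feature.
    if mood == "Imp" and tense is not None:
        problems.append("imperative form carries Tense")
    # Indicative forms always have one of the three Tense values.
    if mood == "Ind" and tense not in TENSES:
        problems.append("indicative form lacks a valid Tense")
    # Only the passive participle has Voice=Pass.
    if feats.get("Voice") == "Pass" and feats.get("VerbForm") != "Part":
        problems.append("Voice=Pass on a form that is not a participle")
    # Aspect is either imperfective or perfective.
    if "Aspect" in feats and feats["Aspect"] not in ASPECTS:
        problems.append("unknown Aspect value")
    return problems
```

For example, a feature set with Mood=Imp and Tense=Pres is reported as inconsistent, while Aspect=Imp combined with Mood=Ind and Tense=Pres passes.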
Syntax
This is an overview only. For more detailed discussion and examples, see the list of Ukrainian relations.
Core Arguments, Oblique Arguments and Adjuncts
- Nominal subject (nsubj) is a noun phrase in the nominative case, without preposition.
- If the noun phrase is quantified, it may be in the genitive, which is required by the quantifier. If this is the case, then the quantifier is attached using a special relation, either nummod:gov or det:numgov.
- An infinitive verb may serve as the subject and is labeled as clausal subject, csubj. On the other hand, verbal nouns as subjects are just nsubj.
- A finite subordinate clause may serve as the subject and is labeled csubj.
- Objects defined in the Ukrainian grammar may be bare noun phrases in accusative, dative, genitive or instrumental, or prepositional phrases in accusative, dative, genitive, locative or instrumental.
- Bare accusative, dative, genitive and instrumental objects are considered core.
- All prepositional objects are considered oblique.
- Accusative objects of some verbs alternate with finite clausal complements, which are labeled ccomp.
- If a verb subcategorizes for the infinitive (e.g. modal verbs or verbs of control), the infinitival complement is labeled xcomp.
- If a verb subcategorizes for two core objects, one of them accusative (or ccomp) and the other non-accusative, then the non-accusative object is labeled iobj. Core nominal objects in other situations are labeled just obj.
- Adjuncts (a.k.a. adverbial modifiers realized as noun phrases) are usually prepositional phrases, but they can be bare noun phrases as well. They are labeled obl:
- Temporal modifiers realized as accusative noun phrases: повернусь днями “I will come back in a few days.”
- Instrumental noun phrases expressing the way or means with which something was done. Example: побив пса палкою “he beat up the dog with a stick.”
- All prepositional phrases that are not prepositional objects (i.e., their role and form is not defined lexically by the predicate) are adjuncts.
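The choice between obj, iobj and obl for nominal complements can be sketched as follows. This is an illustration under stated assumptions, not the annotation procedure itself: the class `Complement` and the function `label_objects` are hypothetical, and the input is assumed to contain only the complements that the verb subcategorizes for (bare noun phrase adjuncts, which are obl, must be filtered out beforehand).

```python
from dataclasses import dataclass

# Cases in which a bare noun phrase object counts as a core argument.
CORE_CASES = {"Acc", "Dat", "Gen", "Ins"}


@dataclass
class Complement:
    form: str
    case: str
    has_preposition: bool = False


def label_objects(complements: list[Complement], has_ccomp: bool = False) -> list[str]:
    """Assign obj, iobj or obl to each complement selected by the verb."""
    core = [c for c in complements if not c.has_preposition and c.case in CORE_CASES]
    # A ccomp counts as the accusative-like member of a pair of core objects.
    n_core = len(core) + (1 if has_ccomp else 0)
    has_acc_like = has_ccomp or any(c.case == "Acc" for c in core)
    labels = []
    for c in complements:
        if c.has_preposition or c.case not in CORE_CASES:
            labels.append("obl")   # all prepositional objects are oblique
        elif n_core == 2 and has_acc_like and c.case != "Acc":
            labels.append("iobj")  # the non-accusative member of the pair
        else:
            labels.append("obj")
    return labels
```

For an illustrative verb phrase such as дав братові книгу “gave the brother a book”, the dative братові is labeled iobj and the accusative книгу is labeled obj; a single dative object on its own is labeled obj.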
Non-verbal Clauses
- The copula verb бути (“to be”) is used in equational, attributional, locative, possessive and benefactive nonverbal clauses. Purely existential clauses (without indicating location) use бути as well, but there it is treated as the head of the clause and tagged VERB.
Relations Overview
- The following relation subtypes are used in Ukrainian:
- acl:adv adverbs acting as amod
- acl:relcl relative clauses
- advcl:pred advcl with secondary predication
- advcl:svc adverbial infinitive
- advmod:det adverbial modification by a determiner
- compound:svc
- conj:svc coordination of serial verbs
- det:numgov pronominal quantifiers that are attached as children of the quantified noun but govern its case
- det:nummod pronominal quantifiers in cases in which they do not govern the case of the quantified noun
- flat:abs clausal absolutive
- flat:foreign non-first words in foreign phrases
- flat:name human names
- flat:range numerical and temporal ranges
- flat:repeat repetitions
- flat:sibl
- flat:title
- nummod:gov cardinal numbers that are attached as children of the counted noun but govern its case
- parataxis:discourse clausal discourse
- parataxis:newsent connects sentences inside of a multi-sentence quote where it’s impossible to split the parent sentence
- parataxis:rel clauses relative to the whole parent sentence, that is, to the parent sentence predication itself
- vocative:cl clausal vocative
- xcomp:pred xcomp with secondary predication
- The following relation types are not used in Ukrainian at all: clf
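The inventory above can be turned into a quick sanity check for the DEPREL column. This is a minimal sketch: the function name `is_expected_deprel` is hypothetical, and relations without a subtype are accepted without being validated against the full universal inventory, which is an assumption made for brevity.

```python
# Relation subtypes listed for Ukrainian in the overview above.
UK_SUBTYPES = {
    "acl:adv", "acl:relcl", "advcl:pred", "advcl:svc", "advmod:det",
    "compound:svc", "conj:svc", "det:numgov", "det:nummod", "flat:abs",
    "flat:foreign", "flat:name", "flat:range", "flat:repeat", "flat:sibl",
    "flat:title", "nummod:gov", "parataxis:discourse", "parataxis:newsent",
    "parataxis:rel", "vocative:cl", "xcomp:pred",
}
# Universal relation types that are not used in Ukrainian at all.
UNUSED_RELATIONS = {"clf"}


def is_expected_deprel(deprel: str) -> bool:
    """Return True if the relation label is compatible with the lists above."""
    base = deprel.split(":", 1)[0]
    if base in UNUSED_RELATIONS:
        return False
    if ":" in deprel:
        return deprel in UK_SUBTYPES
    # Plain universal relations are accepted without further checks (assumption).
    return True
```

Under these assumptions, `is_expected_deprel("nummod:gov")` is true, while a subtype that is not in the list, or any use of clf, is rejected.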
Treebanks
Currently there is a single Ukrainian UD treebank: