UD for Armenian (Eastern)
Tokenization and Word Segmentation
- Words are generally delimited by whitespace or punctuation. Description of exceptions follows.
- According to typographical rules, many punctuation marks are attached to a neighboring word. We always tokenize them as separate tokens (words); this holds even for hyphenated compounds such as անգլո-ամերիկյան “anglo-american” (three tokens) and for abbreviations such as թ. “year” (two tokens).
- Numerical expressions (including dates, expressions with hyphen and Armenian endings) are treated as single words and may contain punctuation or whitespace: 1.1.1970, 1/1/1970, 11:00, 2 000, 10-15, 2,15, 1-ին “1st”, 1700-ամյա “1700-year-old”, ՆԱՏՕ-ական “belonging-to-NATO”.
- Words containing “infixed” punctuation (e.g. question, exclamation, emphasis and Armenian abbreviation marks), as in ինչո՞ւ = ինչու + ՞ “why?”, are treated as multi-word tokens and segmented into individual syntactic words. For more details, see tokenization.
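The segmentation of infixed punctuation can be illustrated with a minimal sketch. The function name `split_infixed` and the choice to place the base word first, followed by the marks, are assumptions made for illustration; they mirror the ինչո՞ւ = ինչու + ՞ example above but are not taken from any UD tool.

```python
# Armenian marks that are written inside a word (Unicode Armenian block).
INFIX_MARKS = {
    "\u055E",  # ՞ question mark
    "\u055C",  # ՜ exclamation mark
    "\u055B",  # ՛ emphasis mark
    "\u055F",  # ՟ abbreviation mark
}


def split_infixed(surface: str) -> list[str]:
    """Split a surface token into syntactic words (hypothetical helper).

    The base word with all infixed marks removed comes first,
    followed by each mark in the order in which it appeared.
    A token without infixed marks is returned unchanged.
    """
    marks = [ch for ch in surface if ch in INFIX_MARKS]
    if not marks:
        return [surface]
    base = "".join(ch for ch in surface if ch not in INFIX_MARKS)
    return [base] + marks
```

The surface form would remain the multi-word token, and the returned items would be its syntactic words.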
Morphology
Tags
This is an overview only. For more detailed discussion and examples, see the list of Armenian POS tags and Armenian features.
- Armenian uses all 17 universal POS categories, including particles (PART). The exact list of particles is still being worked out.
- The tag DET is used for articles and pronominal words used with a determiner function, including possessives (the traditional grammar does not define determiners, but distinguishes pronominal modifiers). The tag PRON is reserved for pronouns occurring as the head of a noun phrase. Pronominal quantifiers (which the traditional grammar includes in pronouns) are DET as well.
- Eastern Armenian has one auxiliary verb (AUX), եմ (“to be”), but the lemma լինել is also possible. The latter is in fact just an aspectual variant of եմ, but it is a separate lemma because the morphological process that relates it to եմ is considered derivational. There is another auxiliary, տալ (“cause/make someone perform an action”), for periphrastic causatives. Auxiliaries are all verbal in Eastern Armenian and are used in the following constructions:
- The copula with non-verbal predicates.
- Periphrastic present tense (present form of եմ + imperfective or resultative participle of the main verb).
- Periphrastic past tense (present form of եմ + perfect participle of the main verb; imperfect form of եմ + imperfective, perfect, future-I and resultative participles of the main verb).
- Periphrastic future tense (present form of եմ + future-I participle of the main verb).
- Periphrastic negated conditional (negated present or imperfect form of եմ + connegative form of the main verb).
- Periphrastic “secondary compound tenses” (any form of լինել, including periphrastic forms, + processual, resultative and future-I participles of the main verb).
- Periphrastic causative (any form of տալ, including periphrastic forms, + infinitive of the main verb).
- In other words, եմ, լինել and տալ are the only lemmas that occur with the AUX tag. The exception is the finite existentials կամ and ունեմ in combination with a resultative participle. Note that եմ and լինել may also occur as a normal VERB if they are used in purely existential sentences, i.e. sentences that do not indicate location; if they do indicate location, եմ and լինել are treated as copulas.
- Verbs with modal meaning are not considered to be auxiliaries in Armenian.
- There are five main (de)verbal forms, distinguished by the UPOS tag and the value of the VerbForm feature:
- Although the resultative, subject and future-I participles can be used adjectivally and can be negated, they are generally tagged VERB. The only exception is future-II participles, which are tagged ADJ.
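The restriction on AUX lemmas described above lends itself to a mechanical check. The following is a minimal sketch; the function name `aux_lemma_allowed` and its boolean parameter are hypothetical, and the check covers only the lemma rule, not the construction types.

```python
# Lemmas that may carry the AUX tag according to the overview above.
AUX_LEMMAS = {"եմ", "լինել", "տալ"}
# Finite existentials allowed as AUX only with a resultative participle.
EXISTENTIAL_EXCEPTIONS = {"կամ", "ունեմ"}


def aux_lemma_allowed(lemma: str, with_resultative_participle: bool = False) -> bool:
    """Return True if `lemma` may be tagged AUX (hypothetical helper)."""
    if lemma in AUX_LEMMAS:
        return True
    return lemma in EXISTENTIAL_EXCEPTIONS and with_resultative_participle
```

A validator could call this for every token tagged AUX and report the lemmas for which it returns False.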
Nominal Features
- Nominal words (NOUN, PROPN and some PRON) have an inherent Animacy feature with one of two values: Hum, Nhum. Note that this may change in the future: nominal words could be treated as having inherent Hum or Nhum and would be further subclassified by the layered Animacy[gram] values Anim vs. Inan.
- The two main values of the Number feature are Sing and Plur. The following parts of speech inflect for number: NOUN, PROPN, PRON, VERB, AUX (finite).
- Selected nouns are plurale tantum (Ptan) or singulare tantum (Coll). These two values are lexical and cannot be used with the agreeing verbs. They also never occur with pronouns. Coll occurs with gerundives.
- There is a language-specific value Assoc (associative plural). This is also lexical and occurs with NOUN and PROPN. Some pronouns (մերոնք, ձերոնք, իմոնք, քոնոնք) are also associative.
- Case has 6 possible values: Nom, Gen, Dat, Abl, Ins, Loc. It occurs with the nominal words, i.e., NOUN, PROPN, PRON, DET, and gerundives. Note that Gen occurs only with pronouns and determiners.
- The Case feature also occurs with some adpositions, subclassified as “localizers” (ADP). It is an inflectional feature here.
- The two main values of the Definite feature are Def and Ind. The following parts of speech inflect for definiteness: NOUN, PROPN, PRON. With gerundives and with resultative and subject participles, the feature sometimes encodes the lexical person of the possessor, although they can almost always be interpreted as the 3rd person. We mark them as Def (see the layered features below).
- Degree applies to adjectives (ADJ) and some adverbs (ADV) and has one of four possible values: Pos, Cmp, Sup, Abs.
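The value inventories listed in this section can be collected into a small lookup table for sanity-checking a FEATS string. This is a sketch only: the names `NOMINAL_VALUES`, `parse_feats` and `invalid_nominal_feats` are hypothetical, the table covers just the five features named above, and multi-valued features (comma-separated values) are not handled.

```python
# Allowed values for the nominal features named in this section.
NOMINAL_VALUES = {
    "Animacy": {"Hum", "Nhum"},
    "Number": {"Sing", "Plur", "Ptan", "Coll", "Assoc"},
    "Case": {"Nom", "Gen", "Dat", "Abl", "Ins", "Loc"},
    "Definite": {"Def", "Ind"},
    "Degree": {"Pos", "Cmp", "Sup", "Abs"},
}


def parse_feats(feats: str) -> dict[str, str]:
    """Parse a CoNLL-U FEATS string such as 'Case=Nom|Number=Sing'."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def invalid_nominal_feats(feats: str) -> list[str]:
    """List Feature=Value pairs whose value is not in the table above.

    Features that are not in the table are ignored, not rejected.
    """
    problems = []
    for name, value in parse_feats(feats).items():
        allowed = NOMINAL_VALUES.get(name)
        if allowed is not None and value not in allowed:
            problems.append(f"{name}={value}")
    return problems
```

For instance, a string containing `Case=Voc` would be reported, since that value is not among the six cases listed above.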
Verbal Features
- Verbs have a lexical Subcat, either intransitive (Intr) or transitive (Tran).
- Verbs have one of six values of Aspect: Dur, Imp, Iter, Perf, Prog or Prosp.
- Note that in Armenian the iterative is considered a lexical feature of verbs. Iterative verbs have morphologically related non-iterative counterparts, but this is not a regular system and the two verbs are represented by different lemmas. We mark them as biaspectual.
- The Aspect feature should also be used with the corresponding deverbatives if they have the VerbForm feature.
- Finite verbs always have one of five values of Mood: Cnd, Imp, Ind, Nec or Sub.
- The necessitative mood is only used with the necessitative particle պիտի or the impersonal predicative պետք է. The subjunctive finite form of the main verb that is needed to form a periphrastic necessitative is not marked with this feature.
- The negated conditional mood is only used with indicative auxiliaries (եմ, էի). The connegative of the main verb that is needed to form a periphrastic negated conditional is also marked with this feature.
- Verbs in the indicative mood always have one of three values of Tense: mainly Past, and for auxiliaries and some content verbs Pres or Imp.
- Imperative and necessitative forms do not have the Tense feature (note that imperfect and present necessitatives are distinguished analytically).
- Subjunctive and conditional forms have either Tense=Imp or Tense=Pres; these are formally imperfect or present, but semantically future.
- There are five values of the Voice feature: Act, Cau, Mid, Pass, Rcp. Active and causative verbs have Subcat=Tran, the other three Subcat=Intr.
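The dependency between Voice and Subcat stated above (active and causative verbs are transitive, the other three voices intransitive) can be written as a small consistency check. This is a sketch; `EXPECTED_SUBCAT` and `subcat_matches_voice` are hypothetical names, and features are assumed to be available as a plain dictionary.

```python
# Subcat value expected for each Voice value, as stated in this section.
EXPECTED_SUBCAT = {
    "Act": "Tran",
    "Cau": "Tran",
    "Mid": "Intr",
    "Pass": "Intr",
    "Rcp": "Intr",
}


def subcat_matches_voice(feats: dict[str, str]) -> bool:
    """Check that Subcat agrees with Voice (hypothetical helper).

    Tokens lacking either feature are accepted; a Voice value that is
    not in the table is rejected.
    """
    voice = feats.get("Voice")
    subcat = feats.get("Subcat")
    if voice is None or subcat is None:
        return True
    return EXPECTED_SUBCAT.get(voice) == subcat
```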
Polarity
- Polarity has two values, Pos and Neg, and applies primarily to verbs (VERB, AUX) that can be negated using the bound morpheme չ-.
- Occasionally ոչ occurs as an independent negation particle (PART) and is marked with Polarity=Neg. The Polarity=Neg feature is also used with the necessitative mood particles (չ)պիտի and (չ)պետք է.
- Negated nouns are usually limited to those derived from verbs (չունեցողի, չգրվածները).
- The Polarity feature is not used with pronouns and determiners, although there is a subset of negative pronouns and determiners. The PronType=Neg feature is used there instead.
- The Polarity=Neg feature is not used with the negated conditional mood; the Connegative=Yes feature is used there instead.
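The division of labour between Polarity=Neg, PronType=Neg and Connegative=Yes described above can be summarised as a small decision function. This is only a sketch: the function name `negation_feature` and its `connegative` flag are assumptions for illustration, and deciding whether a form is actually negative or connegative is left to the caller.

```python
def negation_feature(upos: str, connegative: bool = False) -> str:
    """Pick the feature that marks negation on a negative form.

    Hypothetical helper following the rules in this section:
    pronouns and determiners use PronType=Neg, connegative forms of
    the negated conditional use Connegative=Yes, and everything else
    (verbs, auxiliaries, particles, deverbal nouns) uses Polarity=Neg.
    """
    if upos in {"PRON", "DET"}:
        return "PronType=Neg"
    if connegative:
        return "Connegative=Yes"
    return "Polarity=Neg"
```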
Pronouns, Determiners, Quantifiers
- PronType is used with pronouns (PRON), determiners (DET) and adverbs (ADV).
- NumType is used with numerals (NUM), adjectives (ADJ) and adverbs (ADV).
- The Poss feature marks possessive personal determiners (e.g. իմ “my”), possessive interrogative and relative determiners (e.g. ում, որի “whose”), possessive nouns (e.g. հայրիկինը “father’s”) and possessive adjectives (e.g. հայոց “Armenian, Armenians’, referring to Armenians”).
- The Reflex feature marks reflexive pronouns (ինձ, քեզ, իրեն, մեզ, ձեզ, իրենց) and determiners (իր, իրենց). In Armenian it is always used together with PronType=Emp or PronType=Prs.
- Person is a lexical feature of personal pronouns (PRON) and has three values, 1, 2 and 3. With personal possessive determiners (DET), the feature actually encodes the person of the possessor. Person is not marked on other types of pronouns or on nouns, although they can almost always be interpreted as the 3rd person.
- The Polite feature distinguishes informal second-person pronouns (դու, դուք, Polite=Infm) from the formal Դուք (Polite=Form). The formal pronoun is phonologically identical in all its case forms to the second-person plural դուք, but it is distinguished in orthography by the capital letter Դ. We tag it as second person (because that is its meaning) and we also tag its number (it is used for singular addressees) despite the fact that it combines with second-person plural verbs. The parser must learn that a Number=Sing|Person=2|Polite=Form subject attaches to Number=Plur|Person=2 verbs, while a Number=Sing|Person=2|Polite=Infm subject attaches to Number=Sing|Person=2 verbs.
- There are three layered features, Person[psor], Number[psor] and Deixis[psor]. They appear with nouns, gerundives, certain pronouns and adpositions and encode the lexical person/number of the possessor or the position of an entity relative to either the speaker or the hearer. The extra layer is needed to distinguish these lexical features from the inflectional person and number that mark agreement with the modified (possessed) noun.
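The agreement mismatch that the formal pronoun creates can be made explicit in a short sketch. The function name `expected_verb_agreement` is hypothetical, and features are assumed to be given as a plain dictionary; the only rule encoded is the one stated above, namely that a Polite=Form second-person subject combines with a plural verb.

```python
def expected_verb_agreement(subject_feats: dict[str, str]) -> dict[str, str]:
    """Return the Number/Person a finite verb is expected to show.

    Hypothetical helper: the verb normally copies Number and Person
    from the subject, except that the formal second-person pronoun
    (tagged Number=Sing) combines with a plural verb.
    """
    person = subject_feats.get("Person")
    number = subject_feats.get("Number")
    if person == "2" and subject_feats.get("Polite") == "Form":
        number = "Plur"
    expected = {"Number": number, "Person": person}
    # Drop features the subject does not specify.
    return {name: value for name, value in expected.items() if value is not None}
```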
Other Features
- Besides the layered features listed above, there are several other language-specific features:
- The following universal features are not used in Armenian: Gender, Evident.
Syntax
This is an overview only. For more detailed discussion and examples, see the list of Armenian relations, as well as Armenian-specific examples scattered across the documentation of constructions.
Core Arguments, Oblique Arguments and Adjuncts
- Nominal subject (nsubj) is a noun phrase in the nominative case, without preposition.
- If the noun is quantified in partitive meaning, it may be in the ablative: Հնչեցին Կոմիտասի երգերից “Sounded from songs of Komitas.”
- An infinitive verb may serve as the subject and is labeled as clausal subject, csubj. On the other hand, verbal nouns or gerundives as subjects are just nsubj.
- A finite subordinate clause may serve as the subject and is labeled csubj.
- Objects defined in Armenian grammar may be bare noun phrases in the nominative or dative (as direct objects or “voice objects”) and in the dative, ablative, instrumental or locative, or adpositional phrases mainly in the dative (as indirect objects / “objects of nature” or adpositional objects). For the purposes of UD, objects are divided into core objects, labeled obj or iobj, and oblique objects, labeled obl.
- Bare nominative and dative objects are considered core. This includes the single dative object of a verb that subcategorizes for it. Example: մոտենում էր քաղաքին “He was approaching the city.”
- All adpositional objects are considered oblique.
- Nominative objects of some verbs alternate with finite clausal complements, which are labeled ccomp.
- If a verb subcategorizes for the infinitive (e.g. modal verbs or verbs of control), the infinitival complement is labeled xcomp.
- If a verb subcategorizes for two core objects, one of them nominative (or ccomp) and the other non-nominative (mainly dative), then the non-nominative object is labeled iobj. Core nominal objects in other situations are labeled just obj.
- Adjuncts are usually adpositional phrases, but they can be bare noun phrases as well (following the Armenian grammar, adverbial modifiers are realized as noun phrases). They are labeled obl:
- Temporal or locational modifiers realized as noun phrases: կեսգիշերին եկավ “He came at midnight.”
- Dative noun phrases with a benefactive or possessive role, i.e. if the verb does not subcategorize for a single dative object and is not a verb of giving (or similar), where the dative could be interpreted as the recipient. Example: նրան սուրճ եփեց “he made (for) him coffee.”
- Instrumental or directional noun phrases expressing the way or means with which something was done or direction from some point. Example: հաճույքով լսում էր “He was listening with pleasure.”
- Adpositional phrases whose role and form are not defined lexically by the predicate are adjuncts.
- In passive clauses (both reflexive and reciprocal), the subject is labeled nsubj:pass or csubj:pass, depending on whether it is nominal or clausal.
- In causative clauses (both bare and periphrastic causative), the subject is labeled with nsubj:caus.
- The auxiliary verb in periphrastic causative is labeled aux:caus.
- The demoted agent of the action (if present) has the form of a bare dative and is labeled iobj:agent.
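The core/oblique decision for objects described in this section can be approximated by a small function. This is a rough sketch under stated assumptions: the name `object_label` and its parameters are hypothetical, bare objects in cases other than nominative and dative are assumed to be oblique, and the function presupposes that the phrase is already known to be an object rather than an adjunct (telling a subcategorized dative from a benefactive one needs lexical knowledge that is not modelled here).

```python
# Cases in which a bare object counts as a core argument.
CORE_CASES = {"Nom", "Dat"}


def object_label(case: str, adpositional: bool = False,
                 has_nominative_coobject: bool = False) -> str:
    """Choose obj, iobj or obl for an object phrase (hypothetical helper).

    `has_nominative_coobject` is True when the same verb also takes a
    nominative object or a ccomp.
    """
    if adpositional:
        return "obl"  # all adpositional objects are oblique
    if case not in CORE_CASES:
        return "obl"  # assumption: bare Abl/Ins/Loc objects are oblique
    if case != "Nom" and has_nominative_coobject:
        return "iobj"  # non-nominative member of a two-object frame
    return "obj"
```

Under these assumptions, the dative in մոտենում էր քաղաքին would come out as obj, since it is the verb's single dative object.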
Non-verbal Clauses
- The copula verb եմ (“be”) is used in equational, attributional, locative, possessive and benefactive nonverbal clauses.
- Purely existential clauses (without indicating location) normally use different lemmas, լինել or կամ, and they are treated as the head of the clause and tagged VERB.
Relations Overview
- The following relation subtypes are used in Armenian:
- acl:relcl for relative clauses
- advcl:relcl for relative clause modifiers of clauses
- advmod:emph for adverbs or particles that modify noun phrases and emphasize or negate them
- aux:caus for causative auxiliaries
- aux:ex for existential verbs used as auxiliaries
- case:loc for postpositional localizers
- compound:lvc for light compound verbs
- compound:redup for reduplicated compounds
- compound:svc for serial compound verbs
- csubj:pass for clausal subjects of passive verbs
- det:poss for possessive determiners
- iobj:agent for agentive indirect objects of causative verbs
- nmod:npmod for noun phrases
- nmod:poss for possessive modifiers
- nsubj:caus for nominal subjects of causative verbs
- nsubj:pass for nominal subjects of passive verbs
- obl:agent for agents of passive verbs
- The following relation types are not used in Armenian at all: clf
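The inventory above can be turned into a simple label check. This is a sketch only: `ARMENIAN_SUBTYPES`, `UNUSED_RELATIONS` and `check_deprel` are hypothetical names, the subtype set is copied from the list in this section, and the check does not verify that an unsubtyped label is a valid universal relation.

```python
# Relation subtypes listed in this overview.
ARMENIAN_SUBTYPES = {
    "acl:relcl", "advcl:relcl", "advmod:emph", "aux:caus", "aux:ex",
    "case:loc", "compound:lvc", "compound:redup", "compound:svc",
    "csubj:pass", "det:poss", "iobj:agent", "nmod:npmod", "nmod:poss",
    "nsubj:caus", "nsubj:pass", "obl:agent",
}
# Universal relations that the overview says are not used at all.
UNUSED_RELATIONS = {"clf"}


def check_deprel(deprel: str) -> str:
    """Classify a DEPREL value as 'ok', 'unused' or 'unknown-subtype'."""
    base = deprel.split(":", 1)[0]
    if base in UNUSED_RELATIONS:
        return "unused"
    if ":" in deprel and deprel not in ARMENIAN_SUBTYPES:
        return "unknown-subtype"
    return "ok"
```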
Treebanks
There are two Eastern Armenian UD treebanks and one treebank in Western Armenian: