Universal features
For core part-of-speech categories, see the universal POS tags. The features listed here distinguish additional lexical and grammatical properties of words, not covered by the POS tags.
Abbr
: abbreviation
Values: | Yes |
Boolean feature. Is this an abbreviation? Note that the abbreviated word(s) typically belong to a part of speech other than u-pos/X.
Note: This feature is new in UD version 2. It was used as a language-specific addition in several treebanks in version 1.
Yes
: it is an abbreviation
Examples
- [en] etc., J., UK
AdpType
: adposition type
Values: | Circ | Post | Prep | Voc |
Prep
: preposition
Examples
- [en] in, on, to, from
Post
: postposition
Examples
- [de] entlang in der Strasse entlang “along the street”
Circ
: circumposition
Examples
- [de] von … an in von dieser Stelle an “from this place on”
Voc
: vocalized preposition
In Slavic languages, some prepositions are non-syllabic and their form has to be changed in some contexts to facilitate pronunciation.
Examples
- [cs] ke, ku, se, ve, ze
- [cs] k, k, s, v, z are the non-vocalized equivalents
The same phenomenon exists in Slovak, Russian and probably elsewhere.
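The pairing shown in the Czech examples can be written down as a small lookup table. This is only an illustrative sketch: the names `DEVOCALIZED` and `adp_type_feature` are hypothetical, and the table covers just the five forms listed above.

```python
# Vocalized Czech prepositions from the examples above, mapped to their
# non-vocalized equivalents. Only these five forms are covered.
DEVOCALIZED = {"ke": "k", "ku": "k", "se": "s", "ve": "v", "ze": "z"}


def adp_type_feature(form: str) -> str:
    """Return AdpType=Voc for a listed vocalized form, else AdpType=Prep.

    Assumption: the caller already knows that `form` is a Czech preposition.
    The lookup is purely form-based and cannot disambiguate homographs.
    """
    return "AdpType=Voc" if form.lower() in DEVOCALIZED else "AdpType=Prep"
```

For instance, `adp_type_feature("ve")` yields `AdpType=Voc`, while `adp_type_feature("v")` yields `AdpType=Prep`.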
AdvType
: adverb type
Semantic subclasses of adverbs. They are annotated in some tagsets (e.g. Bulgarian, Czech, Hindi, Japanese) and would probably apply to many other languages if their tagsets cared to cover them. Note that the “PronType” feature also applies to some adverbs and is orthogonal to “AdvType”.
Man
: adverb of manner
Examples
- [en] how, so
Loc
: adverb of location
Examples
- [en] where, here, there
- [cs] kde “where”, odkud “from where”, kudy “through where”, kam “where to”
Tim
: adverb of time
Examples
- [en] when, now, then
- [cs] kdy “when”, odkdy “since when”, dokdy “till when”
Deg
: adverb of quantity or degree
Note that there is a fuzzy borderline between adverbs of degree and indefinite numerals (as they are called in some grammars).
Examples
- [cs] více, méně “more, less”
Cau
: adverb of cause
Examples
- [en] why
Mod
: adverb of modal nature
The Czech examples below are similar to modal verbs: they take infinitives as arguments and add the meaning of possibility, necessity or recommendedness. I suspect that the Bulgarian example (transliteration of French “à propos”) is used differently but its native tagset also calls it “modal”.
Examples
- [bg] апропо
- [cs] možno “possible”, nutno “necessary”, radno “advisable”, třeba “necessary”
Animacy
: animacy
Values: | Anim | Hum | Inan | Nhum |
Similarly to Gender (and to the African noun classes), animacy is usually a lexical feature of nouns and inflectional feature of other parts of speech (pronouns, adjectives, determiners, numerals, verbs) that mark agreement with nouns. Some languages distinguish only gender, some only animacy, and in some languages both gender and animacy play a role in the grammar. (Some non-UD tagsets then combine the two features into an extended system of genders; however, in UD the two features are annotated separately.)
Similarly to gender, the values of animacy refer to semantic properties of the noun, but this is only an approximation, referring to the prototypical members of the category. There are nouns that are treated as grammatically animate, although semantically they are inanimate.
The following table is an example of a three-way animacy distinction (human – animate nonhuman – inanimate) in the declension of the masculine determiner który “which” in Polish (boldface forms in the upper and lower rows differ from the middle row):
gender | sg-nom | sg-gen | sg-dat | sg-acc | sg-ins | sg-loc | pl-nom | pl-gen | pl-dat | pl-acc | pl-ins | pl-loc |
---|---|---|---|---|---|---|---|---|---|---|---|---|
animate human | który | którego | któremu | którego | którym | którym | **którzy** | których | którym | **których** | którymi | których |
animate non-human | który | którego | któremu | którego | którym | którym | które | których | którym | które | którymi | których |
inanimate | który | którego | któremu | **który** | którym | którym | które | których | którym | które | którymi | których |
In the corresponding paradigm of Czech, only two values are distinguished: masculine animate and masculine inanimate:
gender | sg-nom | sg-gen | sg-dat | sg-acc | sg-ins | sg-loc | pl-nom | pl-gen | pl-dat | pl-acc | pl-ins | pl-loc |
---|---|---|---|---|---|---|---|---|---|---|---|---|
animate | který | kterého | kterému | kterého | kterým | kterém | kteří | kterých | kterým | které | kterými | kterých |
inanimate | který | kterého | kterému | který | kterým | kterém | které | kterých | kterým | které | kterými | kterých |
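The slots in which the rows of the two paradigms differ can be found mechanically. The following is a minimal sketch that copies the forms from the tables above; the names `SLOTS`, `POLISH`, `CZECH` and `distinguishing_slots` are hypothetical helpers introduced for illustration.

```python
from typing import Dict, List

# Column labels shared by both tables above.
SLOTS: List[str] = [
    "sg-nom", "sg-gen", "sg-dat", "sg-acc", "sg-ins", "sg-loc",
    "pl-nom", "pl-gen", "pl-dat", "pl-acc", "pl-ins", "pl-loc",
]

# Forms copied verbatim from the Polish table (masculine "który").
POLISH: Dict[str, List[str]] = {
    "animate human": "który którego któremu którego którym którym "
                     "którzy których którym których którymi których".split(),
    "animate non-human": "który którego któremu którego którym którym "
                         "które których którym które którymi których".split(),
    "inanimate": "który którego któremu który którym którym "
                 "które których którym które którymi których".split(),
}

# Forms copied verbatim from the Czech table (masculine "který").
CZECH: Dict[str, List[str]] = {
    "animate": "který kterého kterému kterého kterým kterém "
               "kteří kterých kterým které kterými kterých".split(),
    "inanimate": "který kterého kterému který kterým kterém "
                 "které kterých kterým které kterými kterých".split(),
}


def distinguishing_slots(paradigm: Dict[str, List[str]],
                         row_a: str, row_b: str) -> List[str]:
    """List the slots in which two rows of a paradigm have different forms."""
    return [slot
            for slot, a, b in zip(SLOTS, paradigm[row_a], paradigm[row_b])
            if a != b]
```

With these tables, the Polish human and non-human animate rows differ in the plural nominative and accusative, the Polish non-human animate and inanimate rows differ only in the singular accusative, and the Czech animate and inanimate rows differ in the singular accusative and the plural nominative.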
More generally: Some languages distinguish animate vs. inanimate (e.g. Czech masculines), some languages distinguish human vs. non-human (e.g. Yuwan, a Ryukyuan language), and others distinguish three values, human vs. non-human animate vs. inanimate (e.g. Polish masculines).
Anim
: animate
Human beings, animals, fictional characters, names of professions etc. are normally animate. Even nouns that are normally inanimate can be inflected as animate if they are personified. And some words in some languages can grammatically behave like animates although there is no obvious semantic reason for that.
Examples
- [cs] malí kluci “small boys”
- [cs] malí psi “small dogs”
Inan
: inanimate
Nouns that are not animate are inanimate.
Examples
- [cs] malé domy “small houses”
- [pl] małe domy “small houses”
Hum
: human
A subset of animates where the prototypical member is a human being but not an animal. Again, there may be exceptions that do not fit the class semantically but belong to it grammatically.
Examples
- [pl] mali chłopcy “small boys”
Nhum
: non-human
In languages that only distinguish human from non-human, this value includes
inanimates. In languages that distinguish human animates, non-human animates
and inanimates, this value is used only for non-human animates, while Inan
is used for inanimates.
Examples
- [pl] małe psy “small dogs”
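How the four values are distributed depends on which distinction a language makes. The sketch below spells this out under simplifying assumptions: the system labels (`anim-inan`, `hum-nhum`, `hum-nhum-inan`) and the referent classes (`human`, `animal`, `thing`) are hypothetical names chosen for this example, and the mapping follows only the prototypical semantics, so it ignores nouns whose grammatical animacy deviates from their meaning.

```python
from typing import Dict

# For each type of system, the Animacy value assigned to a prototypical
# human, animal and thing. Keys are illustrative labels, not UD terms.
ANIMACY_BY_SYSTEM: Dict[str, Dict[str, str]] = {
    # e.g. Czech masculines: animate vs. inanimate
    "anim-inan": {"human": "Anim", "animal": "Anim", "thing": "Inan"},
    # e.g. Yuwan: human vs. non-human (Nhum includes inanimates)
    "hum-nhum": {"human": "Hum", "animal": "Nhum", "thing": "Nhum"},
    # e.g. Polish masculines: human vs. non-human animate vs. inanimate
    "hum-nhum-inan": {"human": "Hum", "animal": "Nhum", "thing": "Inan"},
}


def animacy_value(system: str, referent: str) -> str:
    """Return the Animacy value for a prototypical referent in a system."""
    return ANIMACY_BY_SYSTEM[system][referent]
```

Note how `thing` maps to `Nhum` in a two-way human/non-human system but to `Inan` in a three-way system, which is the distinction described for the Nhum value.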
Aspect
: aspect
Values: | Hab | Imp | Iter | Perf | Prog | Prosp |
Aspect is typically a feature of verbs. It may also occur with other parts of speech (nouns, adjectives, adverbs), depending on whether borderline word forms such as gerunds and participles are classified as verbs or as the other category.
Aspect is a feature that specifies duration of the action in time, whether the action has been completed etc. In some languages (e.g. English), some tenses are actually combinations of tense and aspect. In other languages (e.g. Czech), aspect and tense are separate, although not completely independent of each other.
In Czech and other Slavic languages, aspect is a lexical feature. Pairs of imperfective and perfective verbs exist and are often morphologically related but the space is highly irregular and the verbs are considered to belong to separate lemmas.
Since we proceed bottom-up, the current standard covers only a few aspect values found in corpora. See Wikipedia (http://en.wikipedia.org/wiki/Grammatical_aspect) for a long list of other possible aspects.
Imp
: imperfect aspect
The action took / takes / will take some time span and there is no information whether and when it was / will be completed.
Examples
- [cs] péci “to bake” (Imp); pekl chleba “he baked / was baking bread”
Perf
: perfect aspect
The action has been / will have been completed. Since there is emphasis on one point on the time scale (the point of completion), this aspect does not work well with the present tense. For example, Czech morphology can create present forms of perfective verbs but these actually have a future meaning.
Examples
- [cs] upéci “to bake” (Perf); upekl chleba “he baked / has baked bread”
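The interaction between perfective aspect and present-tense morphology in Czech can be stated as a one-line rule. This is a rough sketch, not a full model of Czech tense: the function name `czech_time_reference` is hypothetical, and the tense labels `Past`, `Pres` and `Fut` are assumed here for illustration.

```python
def czech_time_reference(aspect: str, morphological_tense: str) -> str:
    """Return the time reference conveyed by a Czech finite verb form.

    Encodes only the rule described above: a morphologically present form
    of a perfective verb has future meaning. Everything else is passed
    through unchanged.
    """
    if aspect == "Perf" and morphological_tense == "Pres":
        return "Fut"
    return morphological_tense
```

So a present form of perfective upéci is interpreted as future, whereas a present form of imperfective péci stays present.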
Prosp
: prospective aspect
In general, prospective aspect can be described as relative future: the action is/was/will be expected to take place at a moment that follows the reference point; the reference point itself can be in past, present or future. In the English sentence When I got home yesterday, John called and said he would arrive soon, the last clause (he would arrive soon) is in prospective aspect. Nevertheless, English does not have overt affixal morphemes dedicated to the prospective aspect, and we do not need the label in English. But other languages do; the -ko suffix in Basque is an example.
Note that this value was called Pro in UD v1 and it has been renamed Prosp in UD v2.
Examples
- [eu] Liburua irakurriko behar du. lit. book-a read-Prosp must AUX “He must go to read a book.”
Prog
: progressive aspect
English progressive tenses (I am eating, I have been doing …) have this aspect. They are constructed analytically (auxiliary + present participle) but the -ing participle is so bound to progressive meaning that it seems a good idea to annotate it with this feature (we have to distinguish it from the past participle somehow; we may use both the “Tense” and the “Aspect” features).
In languages other than English, the progressive meaning may be expressed by morphemes bound to the main verb, which makes this value even more justified. An example is Turkish with its two distinct progressive morphemes, -yor and -mekte.
Examples
- [tr] eve gidiyor “she is going home (now)”
- [tr] eve gitmekte “she is going home (now)”
- [tr] eve gidiyordu “she was going home (when I saw her)”
- [tr] eve gitmekteydi “she was going home (when I saw her)”
Hab
: habitual aspect
The action takes place habitually (daily, weekly, annually etc.) or is a usual occurrence.
Examples
- [ga] Bíonn an seoladh poist céanna ag na vótóirí uilig “Each voter (usually) has the same postal address”
Iter
: iterative / frequentative aspect
Denotes repeated action. Attested e.g. in Hungarian.
Iteratives also exist in Czech with this name but their meaning is rather habitual. They can be formed only from imperfective verbs and they are usually not classified as a separate aspect; they are just Aspect=Imp.
Note: This value is new in UD v2 but a similar value has been used in UD v1 as language-specific for Hungarian, though it was called frequentative there (Freq).
Examples
- [hu] üt “hit”, ütöget “hit several times”
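The two version notes in this section (Pro renamed to Prosp, and the Hungarian-specific Freq of UD v1 corresponding to Iter) suggest a simple value mapping when upgrading v1 annotation. The sketch below is an assumption-laden illustration: the names `V1_TO_V2_ASPECT` and `upgrade_aspect` are hypothetical, and treating Freq as a plain rename to Iter is a simplification, since the text only calls the two values similar.

```python
# Aspect values that changed between UD v1 and UD v2, per the notes above.
# Freq -> Iter is assumed to be a one-to-one replacement (Hungarian only).
V1_TO_V2_ASPECT = {"Pro": "Prosp", "Freq": "Iter"}


def upgrade_aspect(feature: str) -> str:
    """Rewrite a single 'Name=Value' pair; leave non-Aspect features alone."""
    name, _, value = feature.partition("=")
    if name != "Aspect":
        return feature
    return f"Aspect={V1_TO_V2_ASPECT.get(value, value)}"
```

Values that did not change, such as `Aspect=Imp`, pass through untouched.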
Case
: case
Values:
Core: | Abs | Acc | Erg | Nom |
Non-core: | Abe | Ben | Cau | Cmp | Cns | Com | Dat | Dis | Equ | Gen | Ins | Par | Tem | Tra | Voc |
Local: | Abl | Add | Ade | All | Del | Ela | Ess | Ill | Ine | Lat | Loc | Per | Sbe | Sbl | Spl | Sub | Sup | Ter |
Case is usually an inflectional feature of nouns and, depending on language, other parts of speech (pronouns, adjectives, determiners, numerals, verbs) that mark agreement with nouns.
Case can also be a lexical feature of adpositions and describe the case meaning that the adposition contributes to the nominal in which it appears. (This usage of the feature is typical for languages that do not have case morphology on nouns. For languages that have both adpositions and morphological case, the traditional set of cases is determined by the nominal forms and it does not cover adpositional meanings.) In some non-UD tagsets, case of adpositions is used as a valency feature (saying that the adposition requires its nominal argument to be in that morphological case); however, annotating adposition valency case in UD treebanks would be superfluous because the same case feature can be found at the nominal to which the adposition belongs.
Case helps specify the role of the noun phrase in the sentence, especially in free-word-order languages. For example, the nominative and accusative cases often distinguish subject and object of the verb, while in fixed-word-order languages these functions would be distinguished merely by the positions of the nouns in the sentence.
Here on the level of morphosyntactic features we are dealing with case expressed morphologically, i.e. by bound morphemes (affixes). Note that on a higher level case can be understood more broadly as the role, and it can be also expressed by adding an adposition to the noun. What is expressed by affixes in one language can be expressed using adpositions in another language. Cf. the u-dep/case dependency label.
Examples
- [cs] nominative matka “mother”, genitive matky, dative matce, accusative matku, vocative matko, locative matce, instrumental matkou
- [de] nominative der Mann “the man”, genitive des Mannes, dative dem Mann, accusative den Mann
- [en] nominative/direct case he, she, accusative/oblique case him, her.
The descriptions of the individual case values below include semantic hints about the prototypical meaning of the case. Bear in mind that quite often a case will be used for a meaning that is totally unrelated to the meaning mentioned here. Valency of verbs, adpositions and other words will determine that the noun phrase must be in a particular grammatical case to fill a particular valency slot (semantic role). It is much the same as trying to explain the meaning of prepositions: most people would agree that the central meaning of English in is location in space or time but there are phrases where the meaning is less locational: In God we trust. Say it in English.
Note that Indian corpora based on the so-called Paninian model use a related feature called vibhakti. It is a merger of the Case feature described here and of various postpositions. Values of the feature are language-dependent because they are copies of the relevant morphemes (either bound morphemes or postpositions). Vibhakti can be mapped onto the Case values described here if we know 1. which source values are bound morphemes (postpositions are separate nodes for us) and 2. what their meaning is. For instance, the genitive case (Gen) in Bengali is marked using the suffix -ra (-র), i.e. vib=era. In Hindi, the suffix has been split off the noun and it is now written as a separate word – the postposition kā/kī/ke (का/की/के). Even if the postpositional phrase can be understood as a genitive noun phrase, the noun is not in genitive. Instead, the postposition requires that it takes one of three case forms that are marked directly on the noun: the oblique case (Acc).
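The inventory of Case values given at the start of this section, with its three groups, can be kept as a small lookup structure, for example to sanity-check annotation. This is a minimal sketch; the names `CASE_GROUPS` and `case_group` are hypothetical, and the grouping simply mirrors the Values listing.

```python
from typing import Dict, Optional, Set

# Case values grouped exactly as in the Values listing of this section.
CASE_GROUPS: Dict[str, Set[str]] = {
    "core": {"Abs", "Acc", "Erg", "Nom"},
    "non-core": {"Abe", "Ben", "Cau", "Cmp", "Cns", "Com", "Dat", "Dis",
                 "Equ", "Gen", "Ins", "Par", "Tem", "Tra", "Voc"},
    "local": {"Abl", "Add", "Ade", "All", "Del", "Ela", "Ess", "Ill", "Ine",
              "Lat", "Loc", "Per", "Sbe", "Sbl", "Spl", "Sub", "Sup", "Ter"},
}


def case_group(value: str) -> Optional[str]:
    """Return the group of a Case value, or None if it is not listed."""
    for group, values in CASE_GROUPS.items():
        if value in values:
            return group
    return None
```

A return value of `None` signals a value outside the universal inventory, for example a typo.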
Nom
: nominative / direct
The base form of the noun, typically used as citation form (lemma). In many languages this is the word form used for subjects of clauses. If the language has only two cases, which are called “direct” and “oblique”, the direct case will be marked Nom.
Examples
- [en] She sleeps.
- [en] He loves her.
- [cs] Jana spí. “Jana sleeps.”
- [cs] Pavel miluje Janu. “Pavel loves Jana.”
Acc
: accusative / oblique
Perhaps the second most widely spread morphological case. In many languages this is the word form used for direct objects of verbs. If the language has only two cases, which are called “direct” and “oblique”, the oblique case will be marked Acc.
Examples
- [en] He loves her.
- [cs] Pavel miluje Janu. “Pavel loves Jana.”
Abs
: absolutive
Some languages (e.g. Basque) do not use nominative-accusative to distinguish subjects and objects. Instead, they use the contrast of absolutive-ergative.
The absolutive case marks subject of intransitive verb and direct object of transitive verb.
Examples
- [eu] Maria lotan dago. “Maria is sleeping.”
Erg
: ergative
Some languages (e.g. Basque) do not use nominative-accusative to distinguish subjects and objects. Instead, they use the contrast of absolutive-ergative.
The ergative case marks subject of transitive verb.
Examples
- [eu] Juanek Maria maite du. “Juan loves Maria.”
Dat
: dative
In many languages this is the word form used for indirect objects of verbs.
Examples
- [de] Ich gebe meinem Bruder ein Geschenk. “I give my brother a present.” (meinem Bruder “my brother” is dative and ein Geschenk “a present” is accusative.)
Gen
: genitive
Prototypical meaning of genitive is that the noun phrase somehow belongs to its governor; it would often be translated by the English preposition of. English has the “Saxon genitive” formed by the suffix ’s; but we will normally not need the feature in English because the suffix gets separated from the noun during tokenization.
Note that despite considerable semantic overlap, the genitive case is not the same as the feature of possessivity (Poss). Possessivity is a lexical feature, i.e. it applies to lemma and its whole paradigm. Genitive is a feature of just a subset of word forms of the lemma. Semantics of possessivity is much more clearly defined while the genitive (as many other cases) may be required in situations that have nothing to do with possessing. For example, [cs] bez prezidentovy dcery “without the president’s daughter” is a prepositional phrase containing the preposition bez “without”, the possessive adjective prezidentovy “president’s” and the noun dcery “daughter”. The possessive adjective is derived from the noun prezident but it is really an adjective (with separate lemma and paradigm), not just a form of the noun. In addition, both the adjective and the noun are in their genitive forms (the nominative would be prezidentova dcera). There is nothing possessive about this particular occurrence of the genitive. It is there because the preposition bez always requires its argument to be in genitive.
Examples
- [cs] Praha je hlavní město České republiky. “Prague is the capital of the Czech Republic.”
Note that in Basque, Gen should be used for possessive genitive (as opposed to locative genitive): diktadorearen erregimena “dictator’s regime”; diktadore “dictator”.
Voc
: vocative
The vocative case is a special form of noun used to address someone. Thus it predominantly appears with animate nouns (see the feature of Animacy). Nevertheless this is not a grammatical restriction and inanimate things can be addressed as well.
Examples
- [cs] Co myslíš, Filipe? “What do you think, Filip?”
Ins
: instrumental / instructive
The role from which the name of the instrumental case is derived is that the noun is used as instrument to do something (as in [cs] psát perem “to write using a pen”). Many other meanings are possible, e.g. in Czech the instrumental is required by the preposition s “with” and thus it includes the meaning expressed in other languages by the comitative case.
In Czech the instrumental is also used for the agent-object in passive constructions (cf. the English preposition by).
Examples
- [cs] Tento zákon byl schválen vládou. “This bill has been approved by the government.” (Passive example)
A semantically similar case called instructive is used rarely in Finnish to express “with (the aid of)”. It can be applied to infinitives that behave much like nouns in Finnish. We propose one label for both instrumental and instructive (instrumental is not defined in Finnish).
Examples
- [fi] lähteä “to leave”; 2003 lähtien “since 2003” (second infinitive in the instructive case)
- [fi] yllättää “to surprise”; sekaantui yllättäen valtataisteluun lit. was-involved-in by-surprise.Ins power-struggle.Ill.
Par
: partitive
In Finnish the partitive case expresses indefinite identity and unfinished actions without result.
Examples
- [fi] kolme taloa “three houses” (the -a suffix of talo)
- [fi] rakastan tätä taloa “I love this house”
- [fi] saanko lainata kirjaa? “can I borrow the book?” (the -a suffix of kirja)
- [fi] lasissa on maitoa “there is (some) milk in the glass”
Examples comparing partitive with accusative: ammuin karhun “I shot a bear.Acc” (and I know that it is dead); ammuin karhua “I shot at a bear.Par” (but I may have missed).
Using accusative instead of partitive may also substitute the missing future tense: luen kirjan “I will read the book.Acc”; luen kirjaa “I am reading the book.Par”.
Dis
: distributive
The distributive case conveys that something happened to every member of a set, one at a time. Or it may express frequency.
Examples
- [hu] fejenként “per capita”
- [hu] esetenként “in some cases”
- [hu] hetenként “once per week, weekly”
- [hu] tízpercenként “every ten minutes”
Ess
: essive / prolative
The essive case expresses a temporary state; it often corresponds to English “as a …”. A similar case in Basque is called prolative and it should be tagged Ess too.
Examples
- [fi] lapsi “child”; lapsena “as a child / when he/she was a child”
- [et] laps “child”; lapsena “as a child”
- [eu] erreformista “reformer”; erreformistatzat “as a reformer”
Tra
: translative / factive
The translative case expresses a change of state (“it becomes X”, “it changes to X”). Also used for the phrase “in language X”. In the Szeged Treebank, this case is called factive.
Examples
- [fi] pitkä “long”; kasvoi pitkäksi “grew long”
- [fi] englanti “English language”; englanniksi “in/into English”
- [fi] kello kuusi “six o’clock”; kello kuudeksi “by six o’clock”
- [et] kell kuus “six o’clock”; kella kuueks “by six o’clock”
- [hu] Oroszlány halott várossá válhat. lit. Oroszlány dead city.Tra could-become. “Oroszlány could become a dead city.”
Com
: comitative / associative
The comitative (also called associative) case corresponds to English “together with …”
Examples
- [et] koer “dog”; koeraga “with dog”
Abe
: abessive / caritive / privative
The abessive case (also called caritive or privative) corresponds to the English preposition without.
Examples
- [fi] raha “money”; rahatta “without money”
- [kpv] сьӧмтӧг “without money”
Cau
: causative / motivative / purposive
Noun in this case is the cause or purpose of something. In Hungarian it also seems to be used frequently with currency (“to buy something for the money”) and it also can mean the goal of something.
Examples
- [hu] Egy világcég benzinkútjánál 7183 forintért tankoltam. lit. a world-wide.company petrol.station.Ade 7183 forint.Cau refueled “I refueled my car at the petrol station of a world-wide company for 7183 forints.”
- [hu] Elmentem a boltba tejért. lit. went the shop.Ill milk.Cau “I went to the shop to buy milk.”
- [eu] jokaera “behavior”; jokaeragatik “because of behavior”
Ben
: benefactive / destinative
The benefactive case corresponds to the English preposition for.
Examples
- [eu] mutil “boy”; mutilarentzat “for boys”
Cns
: considerative
The considerative case denotes something that is given in exchange for something else. It is used in Warlpiri (Andrews 2007, p.164).
Examples
- [wbp] miyi “food”; miyiwanawana “for food” (Japanangkarlu kaju karli yinyi miyiwanawana “Japanangka is giving me a boomerang in exchange for food”)
Cmp
: comparative
The comparative case means “than X”. It marks the standard of comparison and it differs from the comparative Degree, which marks the property being compared. It occurs in Dravidian and Northeast-Caucasian languages.
Examples
- [mr] हे फूल त्या फुलापेक्षा सुंदर आहे. (Hē phūla tyā phulāpēkṣā sundara āhē.) “This flower is more beautiful than that flower.”
Equ
: equative
The equative case means “X-like”, “similar to X”, “same as X”. It marks the standard of comparison and it differs from the equative Degree, which marks the property being compared. It occurs in Turkish.
Examples
- [tr] ben “I”; bence “like me”
Location and direction
Loc
: locative
The locative case often expresses location in space or time, which gave it its name. As elsewhere, non-locational meanings also exist and they are not rare. Uralic languages have a complex set of fine-grained locational and directional cases (see below) instead of the locative. Even in languages that have locative, some location roles may be expressed using other cases (e.g. because those cases are required by a preposition).
In Slavic languages this is the only case that is used exclusively in combination with prepositions (but such a restriction may not hold in other languages that have locative).
Examples
- [cs] V červenci jsem byl ve Švédsku. “In July I was in Sweden.”
- [cs] Mluvili jsme tam o morfologii. “We talked there about morphology.” (Non-locational non-temporal example)
Lat
: lative / directional allative
The lative case denotes movement towards/to/into/onto something. A similar case in Basque is called directional allative (Spanish adlativo direccional). However, lative is typically thought of as a union of allative, illative and sublative, while in Basque it is derived from allative, which also exists independently.
Examples
- [eu] etxerantz “toward house/home”
- [eu] behe “low”; beherantz “down”
Ibarretxe-Antuñano (2004: 282) says about directional and terminal allative in Basque: “What crucially distinguishes these two cases from the allative is that, on top of profiling the goal, they also profile the path, or to be more precise, some of the components of the path.”
Ter
: terminative / terminal allative
The terminative case specifies where something ends in space or time. A similar case in Basque is called terminal allative (Spanish adlativo terminal). While the lative (or directional allative) specifies only the general direction, the terminative (terminal allative) also says that the destination is reached.
Examples
- [et] jõeni “down to the river”; kella kuueni “till six o’clock”
- [hu] a házig “up to the house”; hat óráig “till six o’clock”
- [eu] etxeraino “up to the house”; erdi “half”; erdiraino “up to the half”
Internal location
Ine
: inessive
The inessive case expresses location inside of something.
Examples
- [hu] ház “house”; házban “in the house”
- [fi] talo “house”; talossa “in the house”
- [et] maja “house”; majas “in the house”
Ill
: illative / inlative
The illative case expresses direction into something.
Examples
- [hu] ház “house”; házba “into the house”
- [fi] talo “house”; taloon “into the house”
- [et] maja “house”; majasse “into the house”
Ela
: elative / inelative
The elative case expresses direction out of something.
Examples
- [hu] ház “house”; házból “from the house”
- [fi] talo “house”; talosta “from the house”
- [et] maja “house”; majast “from the house”
Add
: additive
Distinguished by some scholars in Estonian, not recognized by traditional grammar, exists in the Multext-East Estonian tagset and in the Eesti keele puudepank. It has the meaning of illative, and some grammars will thus consider the additive just an alternative form of illative. Forms of this case exist only in singular and not for all nouns.
Examples
- [et] riik “government”; riigisse “to the government” (singular illative); riiki “to the government” (singular additive)
External location
Ade
: adessive
The adessive case expresses location at, on the surface, or near something. The corresponding directional cases are allative (towards something) and ablative (from something).
Examples
- [hu] pénztár “cash desk”; pénztárnál “at the cash desk”
- [fi] pöytä “table”; pöydällä “on the table”
- [et] laud “table”; laual “on the table”
Note that adessive is used to express location on the surface of something in Finnish and Estonian, but does not carry this meaning in Hungarian.
All
: allative / adlative
The allative case expresses direction to something (destination is adessive, i.e. at or on that something).
Examples
- [hu] pénztár “cash desk”; pénztárhoz “to the cash desk”
- [fi] pöytä “table”; pöydälle “onto the table”
Abl
: ablative / adelative
Prototypical meaning: direction from some point. In systems that distinguish different source locations (e.g. in Uralic languages), this case corresponds to the “adelative”, that is, the source is adessive.
Examples
- [hu] a barátomtól jövök “I’m coming from my friend”
- [fi] pöydältä “from the table”; katolta “from the roof”; rannalta “from the beach”
Higher location
Sup
: superessive
Used to express location higher than a reference point (atop something or above something). Attested in Nakh-Dagestanian languages and also in Hungarian (while other Uralic languages express this location with the adessive case, Hungarian has both adessive and superessive).
Examples
- [hu] asztal “table”; asztalon “on the table”
- [hu] könyvek “books”; könyveken “on books”
- [dar] ustuj “table”; ustujčeb “on the table”
- [lez] векьел (veq’el) “on grass”
Spl
: superlative
The superlative case is used in Nakh-Dagestanian languages to express the destination of movement, originally to the top of something, and, by extension, in other figurative meanings as well.
Note that Hungarian assigns this meaning to the sublative case, which otherwise indicates that the destination is below (not above) something.
Examples
- [dar] ʁarʁa “stone”; ʁarʁaliče “onto the stone”
- [lez] вичелди (vičeldi) “onto himself”
Del
: delative / superelative
Used in Hungarian and in Nakh-Dagestanian languages to express the movement from the surface of something (like “moved off the table”).
Other meanings are possible as well, e.g. “about something”.
Examples
- [hu] asztal “table”; az asztalról “off the table”
- [hu] Budapestről jövök “I am coming from Budapest”
- [dar] bahičela “from (on) the wall”
- [lez] балкIандилай (balk’andilaj) “off the horse”
Lower location
Sub
: subessive
Used to express location lower than a reference point (under something or below something). Attested in Nakh-Dagestanian languages.
Examples
- [lez] тарцин сериндик (tarcin serindik) “under the shade of a tree”
Sbl
: sublative
The original meaning of the sublative case is movement towards a place under or lower than something, that is, the destination is subessive. It is attested in Nakh-Dagestanian languages. Note however that like many other cases, it is now used in abstract senses that are not apparently connected to the spatial meaning: for example, in Lezgian it may indicate the cause of something.
Hungarian uses the sublative label for what would be better categorized as superlative, as it expresses the movement to the surface of something (e.g. “to climb a tree”), and, by extension, other figurative meanings as well (e.g. “to university”).
Examples
- [hu] Belgrádtól 150 kilométerre délnyugatra lit. Belgrade.Abl 150 kilometer.Sbl southwest.Sbl “150 kilometers southwest of Belgrade”
- [hu] hajó “ship”; hajóra “onto the ship”
- [hu] bokorra “on the shrub”
- [lez] Жанавур кашакди гиликьна. (Žanavur kašakdi giliq’na.) “The wolf died of hunger.”
Sbe
: subelative
Used to express movement or direction from under something.
Examples
- [lez] Палту михиникай куьрсарнава. (Paltu mixinikaj kyrsarnava.) “The coat hangs from the nail.”
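The twelve cases described under internal, external, higher and lower location form a regular grid of region by orientation (being at a place, moving to it, moving from it). The sketch below records that grid; the names `LOCAL_CASE_GRID` and `local_case` and the labels `at`, `to`, `from` are hypothetical. It captures prototypical meanings only: it leaves out Add (which has the meaning of illative) and does not model language-specific labeling such as the Hungarian use of the sublative label.

```python
from typing import Dict, Tuple

# (region, orientation) -> Case value, following the definitions above.
LOCAL_CASE_GRID: Dict[Tuple[str, str], str] = {
    ("internal", "at"): "Ine", ("internal", "to"): "Ill", ("internal", "from"): "Ela",
    ("external", "at"): "Ade", ("external", "to"): "All", ("external", "from"): "Abl",
    ("higher", "at"): "Sup", ("higher", "to"): "Spl", ("higher", "from"): "Del",
    ("lower", "at"): "Sub", ("lower", "to"): "Sbl", ("lower", "from"): "Sbe",
}


def local_case(region: str, orientation: str) -> str:
    """Look up the prototypical local case for a region and orientation."""
    return LOCAL_CASE_GRID[(region, orientation)]
```

For example, `local_case("internal", "to")` gives `Ill`, matching [fi] taloon “into the house”.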
Per
: perlative
The perlative case denotes movement along something. It is used in Warlpiri (Andrews 2007, p.162). Note that Unimorph mentions the English preposition “along” in connection with what they call prolative/translative; but we have different definitions of those two cases.
Examples
- [wbp] yurutu “road”; yurutuwana “along the road” (Pirli kalujana yurutuwana yirrarni “They are putting stones along the road”)
Tem
: temporal
The temporal case is used to indicate time.
Examples
- [hu] hétkor “at seven (o’clock)”; éjfélkor “at midnight”; karácsonykor “at Christmas”
References
- Avery D. Andrews: The major functions of the noun phrase. In: Timothy Shopen (ed.) (2007): Language Typology and Syntactic Description, Volume I: Clause Structure. Second Edition. Cambridge University Press. ISBN 978-0-521-58156-1.
- Iraide Ibarretxe-Antuñano (2004): “Polysemy in Basque locational cases”. Belgian Journal of Linguistics 18: 271–298.
Clusivity
: clusivity
Values: | Ex | In |
Clusivity is a feature of first-person plural personal pronouns. As such, it can also be reflected by inflection of verbs, e.g. in Plains Cree (Wolvengrey 2011 p. 66).
In
: inclusive
Includes the listener, i.e. we = I + you (+ optionally they).
Examples
- [id] kita “we”
- [crk] kiwīcihānaw “we (I+you) help him”
Ex
: exclusive
Excludes the listener, i.e. we = I + they.
Examples
- [id] kami “we”
- [crk] niwīcihānān “we (I+they) help him”
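The two definitions (we = I + you, optionally plus others, versus we = I + they) amount to a test on whether the listener is part of the group. A minimal sketch follows; the function name `clusivity` and the participant labels `speaker`, `listener` and `other` are hypothetical.

```python
from typing import Iterable, Optional


def clusivity(referents: Iterable[str]) -> Optional[str]:
    """Return 'In' or 'Ex' for a first-person plural group, else None.

    `referents` lists who "we" refers to, using the labels "speaker",
    "listener" and "other".
    """
    group = set(referents)
    if "speaker" not in group or len(group) < 2:
        return None  # not a first-person plural reference
    return "In" if "listener" in group else "Ex"
```

Under this sketch, [id] kita corresponds to a group such as `{"speaker", "listener"}` and [id] kami to `{"speaker", "other"}`.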
References
- Arok Elessar Wolvengrey. 2011. Semantic and pragmatic functions in Plains Cree syntax (PhD thesis). LOT, Utrecht, Netherlands. ISBN 978-94-6093-051-5.
Clusivity[obj]
: clusivity agreement with object
Values: | Ex | In |
Clusivity is a feature of first-person plural personal pronouns. As such, it can also be reflected by inflection of verbs, e.g. in Mbyá Guaraní.
Some languages are head-marking, which means that the verbal morphology can cross-reference multiple core arguments, not just the subject. If the cross-reference involves the Clusivity of the argument, we have two layers of Clusivity on the verb: Clusivity[subj], and (for transitive verbs) Clusivity[obj].
While it would be possible to make the subject layer the default and use just Clusivity for it, the explicit labeling of both layers is probably more helpful in such languages, as it can reduce confusion.
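A layered feature name such as Clusivity[obj] can be split into the base feature and its layer when a feature string is read. The following is a minimal sketch rather than an official tool: the function name parse_feats and the example feature string are hypothetical, and the sketch assumes the usual pipe-separated Name=Value notation with "_" for an empty feature set.

```python
import re

# A feature name is either plain ("Number") or layered ("Clusivity[obj]").
FEATURE_NAME = re.compile(r"^([A-Za-z0-9]+)(?:\[([a-z0-9]+)\])?$")


def parse_feats(feats):
    """Parse 'Name=Value|Name[layer]=Value' into {(name, layer): value}.

    The layer is None for features without a layer.
    """
    parsed = {}
    if feats in ("", "_"):  # "_" is assumed to mark an empty feature set
        return parsed
    for pair in feats.split("|"):
        name, value = pair.split("=", 1)
        match = FEATURE_NAME.match(name)
        if match is None:
            raise ValueError(f"malformed feature name: {name!r}")
        parsed[(match.group(1), match.group(2))] = value
    return parsed


# Illustrative feature string (not taken from a treebank): a third-person
# subject acting on a first-person plural inclusive object.
example = "Clusivity[obj]=In|Number[obj]=Plur|Person[obj]=1|Person[subj]=3"
features = parse_feats(example)
print(features[("Clusivity", "obj")])  # value of the object layer
```

Keeping the layer as a separate key component makes it easy to ask for all features of one argument, or for one feature across all layers.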
In
: inclusive object
Includes the listener, i.e. we = I + you (+ optionally they).
Examples
- [gun] Ñande, ñanderayvu ra’e, añete’i po ra’e, chejaryi. “She truly loves us (me+you), my grandmother.” (lit. 1.PL.INCL, B1.PL.INCL-R-love MIR, truth=DIM EPIS MIR, B1.SG-grandmother)
Ex
: exclusive object
Excludes the listener, i.e. we = I + they.
Examples
- [gun] Ore upecha orejaryi orereroayvu. “Our grandmothers advised us (me+them) like this.” (lit. 1.PL.EXCL like.this B1.PL.EXCL-grandmother B1.PL.EXCL-R-COM-speak)
Clusivity[psor]
: possessor’s clusivity
Values: | Ex | In |
Clusivity is a feature of first-person plural personal pronouns. Clusivity[psor] is possessor’s clusivity, marked e.g. on nouns in Mbyá Guaraní. These noun forms would be translated to English as possessive pronoun + noun.
This layered feature is conveniently used for possessive inflections of nouns, although nouns normally do not have a Clusivity feature, meaning that no other layers are needed. Nevertheless, the possessive morphology typically also includes Number, which could be confused with the number of the noun, and we thus have Person[psor] together with Number[psor].
This layered feature is normally not used with possessive pronouns. They traditionally have just simple Clusivity. (And in some languages, possessive pronouns are actually identical to personal pronouns in the genitive case.)
In
: inclusive possessor
Includes the listener, i.e. we = I + you (+ optionally they).
Examples
- [gun] ñandejaryi “our (my+your) grandmother” (lit. B1.PL.INCL-grandmother)
Ex
: exclusive possessor
Excludes the listener, i.e. we = I + they.
Examples
- [gun] orejaryi “our (my+their) grandmother” (lit. B1.PL.EXCL-grandmother)
Clusivity[subj]
: clusivity agreement with subject
Values: | Ex | In |
Clusivity is a feature of first-person plural personal pronouns. As such, it can also be reflected by inflection of verbs, e.g. in Mbyá Guaraní.
Some languages are head-marking, which means that the verbal morphology can cross-reference multiple core arguments, not just the subject. If the cross-reference involves the Clusivity of the argument, we have two layers of Clusivity on the verb: Clusivity[subj], and (for transitive verbs) Clusivity[obj].
While it would be possible to make the subject layer the default and use just Clusivity for it, the explicit labeling of both layers is probably more helpful in such languages, as it can reduce confusion.
In
: inclusive subject
Includes the listener, i.e. we = I + you (+ optionally they).
Examples
- [gun] Mba’echa pa ñande jaiko? “How do we (I+you) live?” (lit. how Q 1.PL.INCL A1.PL.INCL-live)
Ex
: exclusive subject
Excludes the listener, i.e. we = I + they.
Examples
- [gun] Upei roiko upeicha. “Then we (I+they) lived like this.” (lit. afterwards A1.PL.EXCL-live like.this)
ConjType
: conjunction type
Values: | Comp | Oper | Pred |
We already distinguished the two main types, coordinating and subordinating conjunctions, at the level of POS tags. However, there are other subtypes that are not yet accounted for.
Comp
: comparing conjunction
Examples: [de] wie (as), als (than)
Oper
: mathematical operator
Note that operators can be expressed either using symbols or using words.
Examples: [cs] krát (times), plus, minus
Pred
: subordinating conjunction introducing a secondary predicate
Examples: [pl] jako (as)
Definite
: definiteness or state
Values: | Com | Cons | Def | Ind | Spec |
Definiteness is typically a feature of nouns, adjectives and articles. Its value distinguishes whether we are talking about something known and concrete, or something general or unknown. It can be marked on definite and indefinite articles, or directly on nouns, adjectives etc. In Arabic, definiteness is also called the “state”.
Ind
: indefinite
In languages where Spec is distinguished, the value Ind is interpreted as non-specific indefinite, i.e. “any (one) stick”.
Examples
- [en] a dog
- [sv] en hund “a dog”
- [lkt] c’ą wążi ‘aų wo “put a [any] stick on [the fire]”
Spec
: specific indefinite
Specific indefinite, e.g. “a certain stick”.
Occurs e.g. in Lakota.
In languages where it is used, the value Ind is interpreted as non-specific indefinite, i.e. “any (one) stick”.
Examples
- [lkt] c’ą wą ‘ag.li’ “he brought a [certain] stick”
Def
: definite
Examples
- [en] the dog
- [sv] hunden “the dog”
- [lkt] c’ą kį “the stick”
Cons
: construct state / reduced definiteness
Used for the construct state in Arabic. If two nouns are in a genitive relation, the first one (the “nomen regens”) has “reduced definiteness”; the second is the genitive and can be either definite or indefinite. The reduced form has neither the definite morpheme (article) nor the indefinite morpheme (nunation).
Note that in UD v1 this value was called Red. It has been renamed Cons in UD v2.
Examples
- [ar] indefinite state: حلوَةٌ ḥulwatun “a sweet”; definite state: الحلوَةُ al-ḥulwatu “the sweet”; construct state: حلوَةُ ḥulwatu “sweet of”.
Com
: complex
Used in improper annexation in Arabic. The genitive construction described above normally consists of two nouns (first reduced, second genitive). That is called proper annexation or iḍāfa. If the first member is an adjective or adjectivally used participle and the second member is a definite noun, the construction is called improper annexation or false iḍāfa. The result is a compound adjective that is usually used as an attributive adjunct and thus must agree in definiteness with the noun it modifies. Its first part (the adjective or participle) may get again the definite article. Although it may look the same as the form for the definite state, it is assigned a special value of complex state to reflect the different origin. See also Hajič et al. page 3.
Examples:
- [ar] مُخْتَلِفٌ muxtalifun “different/various” (active participle, Form VIII); نَوْعٌ ج أنْوَاعٌ nawˀun ja anwāˀun “kind”; مُخْتَلِفُ الأنْوَاعِ muxtalifu al-anwāˀi “of various kinds” (false iḍāfa); مَشَاكِلُ مُخْتَلِفَةُ الأنْوَاعِ mašākilu muxtalifatu al-anwāˀi “problems of various kinds”; اَلْمَشَاكِلُ الْمُخْتَلِفَةُ الأنْوَاعِ al-mašākilu al-muxtalifatu al-anwāˀi “the problems of various kinds”.
Degree
: degree
Values: | Abs | Aug | Cmp | Dim | Equ | Pos | Sup |
Degree of comparison is typically an inflectional feature of some adjectives and adverbs. A different flavor of degree is diminutives and augmentatives, which often apply to nouns but are not restricted to them.
Pos
: positive, first degree
This is the base form that merely states a quality of something, without comparing it to qualities of others. Note that although this degree is traditionally called “positive”, negative properties can be compared, too.
Examples
- [en] young man
- [cs] mladý muž
Equ
: equative
The quality of one object is compared to the same quality of another object, and the result is that they are identical or similar (“as X as”). Note that it marks the adjective and it is distinct from the equative Case, which marks the standard of comparison.
Examples
- [et] pikkune (pikkus+ne) “as tall as”
Cmp
: comparative, second degree
The quality of one object is compared to the same quality of another object.
Examples
- [en] the man is younger than me
- [cs] ten muž je mladší než já
Sup
: superlative, third degree
The quality of one object is compared to the same quality of all other objects within a set.
Examples
- [en] this is the youngest man in our team
- [cs] toto je nejmladší muž v našem týmu
Abs
: absolute superlative
Some languages can express morphologically that the studied quality of the given object is so strong that there is hardly any other object exceeding it. The quality is not actually compared to any particular set of objects.
Examples
- [es] guapo “handsome”; guapísimo “indescribably handsome”
Dim
: diminutive
Morphologically derived form of a noun that indicates small size, or, metaphorically, affection towards the entity described by the noun. While nouns are the prototypical category in which diminutives are formed, the feature is not restricted to nouns and in some languages similar morphology can be observed with other categories (adjectives, verbs).
Examples
- [cs] člověk “man”; človíček “little man”
- [nl] appel “apple”; appeltje “little apple”
Aug
: augmentative
Morphologically derived form of a noun that indicates large size or force. While nouns are the prototypical category in which augmentatives are formed, the feature is not restricted to nouns and in some languages similar morphology can be observed with other categories (adjectives, verbs).
Examples
- [cs] chlap “guy”; chlapák “big guy, macho”
- [pt] apartamento “apartment”; apartamentão “big apartment”
Deixis
: relative location encoded in demonstratives
Values: | Abv | Bel | Even | Med | Nvis | Prox | Remt |
Deixis is typically a feature of demonstrative pronouns, determiners, and adverbs. Its value classifies the location of the referred entity with respect to the location of the speaker or of the hearer. The common distinction is distance (proximate vs. remote entities); in some languages, elevation is distinguished as well (e.g., the entity is located higher or lower than the speaker).
If it is necessary to distinguish the person whose location is the reference point (speaker or hearer), the feature DeixisRef can be used in addition to Deixis. See also the Wolof examples below. DeixisRef is not needed if all deictic expressions in the language are relative to the same person (probably the speaker).
Prox
: proximate
The entity is close to the reference point (e.g., to the speaker).
Examples
- [en] this dog
- [en] here
- [es] aquí “here”
- [eu] hau “he/she (nearby)”
- [wo] xaj bii “this dog” (close to me, wherever you may be) Deixis=Prox|DeixisRef=1
- [wo] xaj boobu “that dog / the dog in question” (close to you, far from me) Deixis=Prox|DeixisRef=2
- [kha] u-ne “he (near)”
Med
: medial
The entity is neither close nor far away from the reference point (e.g., from the speaker).
Examples
- [es] ahí “there”
- [eu] hori “he/she (not close)”
- [wo] xaj boobale “that dog” (far away from both of us, but closer to you than to me) Deixis=Med|DeixisRef=2
- [kha] u-to “he (not near, not far)”
Remt
: remote, distal
The entity is far away from the reference point (e.g., from the speaker).
Examples
- [en] that dog
- [en] there
- [es] allí “there”
- [eu] hura “he/she (over there, yonder)”
- [wo] xaj bale “that dog” (far away from me, wherever you may be) Deixis=Remt|DeixisRef=1
- [kha] u-tay “he (far away, visible)”
Nvis
: not visible
The entity is remote and not visible. In Khasi, where this distinction is made, the Remt value can be used to annotate “remote but visible”.
Examples
- [kha] u-to “he (far away, not visible)”
Abv
: above the reference point
Occurs e.g. in Aghul [agx], Lak [lbe], and Khasi [kha]. The entity is both remote from the speaker and above them.
Examples
- [agx] te “that” (remote, elevation-neutral)
- [agx] le “that (above)”
- [lbe] k’a “that (above speaker)”
- [kha] u-tey “he (above)”
Even
: at the same level as the reference point
Occurs e.g. in Lak [lbe]. The entity is both remote and at the same level as the speaker.
Examples
- [lbe] ga “that” (Elevation neutral in current usage. In older usage, this pronoun pointed below the reference point.)
- [lbe] ta “that (same level)” (In older usage, ta was the pronoun unmarked for elevation, but in current usage it denotes the same level as the reference point.)
Bel
: below the reference point
Occurs e.g. in Aghul [agx] and Khasi [kha]. The entity is both remote from the speaker and below them.
Examples
- [agx] te “that” (remote, elevation-neutral)
- [agx] ge “that (below)”
- [kha] u-thie “he (below)”
DeixisRef
: person to which deixis is relative
Values: | 1 | 2 |
DeixisRef is a feature of demonstrative pronouns, determiners, and adverbs, accompanying Deixis when necessary. Deixis encodes the position of an entity relative to either the speaker or the hearer. If it is necessary to distinguish the person whose location is the reference point (speaker or hearer), DeixisRef is used.
DeixisRef is not needed if all deictic expressions in the language are relative to the same person (probably the speaker), or if they do not distinguish the reference point.
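The relation between the two features can be illustrated with a small consistency check. This is only a sketch under stated assumptions: the function name check_deixis is hypothetical, the value sets are copied from the Values lines of Deixis and DeixisRef, and treating DeixisRef without Deixis as a problem is our reading of “accompanying Deixis”, not an official validation rule.

```python
DEIXIS_VALUES = {"Abv", "Bel", "Even", "Med", "Nvis", "Prox", "Remt"}
DEIXISREF_VALUES = {"1", "2"}


def check_deixis(feats):
    """Return a list of problems found in a {feature: value} dictionary."""
    problems = []
    deixis = feats.get("Deixis")
    ref = feats.get("DeixisRef")
    if deixis is not None and deixis not in DEIXIS_VALUES:
        problems.append(f"unknown Deixis value: {deixis}")
    if ref is not None:
        if ref not in DEIXISREF_VALUES:
            problems.append(f"unknown DeixisRef value: {ref}")
        # Assumption: DeixisRef only makes sense next to a Deixis value.
        if deixis is None:
            problems.append("DeixisRef without Deixis")
    return problems


# Wolof "xaj boobu": close to the hearer.
print(check_deixis({"Deixis": "Prox", "DeixisRef": "2"}))
# A reference point with no deictic value is flagged.
print(check_deixis({"DeixisRef": "1"}))
```

Deixis alone passes the check, which matches languages where all deictic expressions are relative to the same person.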
1
: deixis relative to the first person participant (speaker)
Examples
- [wo] xaj bii “this dog” (close to me, wherever you may be) Deixis=Prox|DeixisRef=1
- [wo] xaj bale “that dog” (far away from me, wherever you may be) Deixis=Remt|DeixisRef=1
2
: deixis relative to the second person participant (hearer)
Examples
- [wo] xaj boobu “that dog / the dog in question” (close to you, far from me) Deixis=Prox|DeixisRef=2
- [wo] xaj boobale “that dog” (far away from both of us, but closer to you than to me) Deixis=Med|DeixisRef=2
Echo
: is this an echo word or a reduplicative?
Is this a reduplicative or echo word? Such words occur in Hindi and other Indian languages. In Hyderabad Dependency Treebank they get their own part-of-speech tags RDP and ECH, respectively. We do not want to treat them as separate parts of speech because they could be assigned a POS independent of their RDP or ECH status (same as the word that they echo). Perhaps we should merge this also with the “hyph” feature to something called “compound”?
Rdp
: reduplicative
The word is a copy of a previous word. In Hindi, this would add the meaning of distribution (“one rupee each”), separation (“sit separately”), variety, diversity or just emphasis.
Examples: [hi] “कभी - कभी” = “kabhī - kabhī” = “sometimes”, “कभी” = “kabhī” = “sometimes”; “एक एक” = “eka eka” = “one each”, “एक” = “eka” = “one”
Ech
: echo
The word rhymes with a previous word but it is not identical to it and typically it does not have any meaning of its own. In Hindi it generalizes the meaning of the previous word and eventually translates as “or something”, “etc.” etc.
Examples: [hi] “चाय वाय” = “čāya vāya” = “tea or something” (as in “Have some tea or something.”)
For more details see Rupert Snell and Simon Weightman: Teach Yourself Hindi, Section 16.4 and 16.5, pages 210 – 211.
Evident
: evidentiality
Values: | Fh | Nfh |
Evidentiality is the morphological marking of a speaker’s source of information (Aikhenvald, 2004). It is sometimes viewed as a category of mood and modality.
Many different values are attested in the world’s languages. At present we only cover the firsthand vs. non-firsthand distinction, needed in Turkish. There it distinguishes the normal past tense (firsthand, also definite past tense, seen past tense) from the so-called miş-past (non-firsthand, renarrative, indefinite, heard past tense).
Aikhenvald also distinguishes reported evidentiality, occurring in Estonian and Latvian, among others. We currently use the quotative Mood for this.
Note: Evident is a new universal feature in UD version 2. It was used as a language-specific feature (under the name Evidentiality) in UD v1 for Turkish.
Fh
: firsthand
Examples
- [tr] geldi “he/she/it came” (and I was there and saw them coming)
Nfh
: non-firsthand
Examples
- [tr] gelmiş “he/she/it has come” (I did not witness them coming but I know it because someone told me / because I see that they are there now)
References
- Aikhenvald, Alexandra Y. 2004. Evidentiality. Oxford: Oxford University Press.
ExtPos
: external part of speech
Values: | ADJ | ADP | ADV | CCONJ | DET | INTJ | PRON | PROPN | SCONJ |
This feature differs significantly from all other features: It describes neither the lexical category, nor the inflectional paradigm slot of the token it appears on. Rather than to the individual token, it pertains to a multiword expression and indicates the part of speech that the expression would get if it were analyzed as a single word. ExtPos is annotated at the head node of the multiword expression. The possible values are taken from the defined UPOS tags and no other values are allowed (not even at the language-specific level). The main motivation for ExtPos is that the multiword expression may behave like a part of speech different from the UPOS of the head node; however, ExtPos is sometimes used even if it is identical to the UPOS of the head node. Also, it is not strictly necessary that the expression is multiword – if one of the words of the expression is omitted by mistake, or if a single word has been coerced into a part of speech different from its lexical one, ExtPos may be used to signal it.
ExtPos is strongly recommended for fixed functional multiword expressions (the head node has one or more children attached via the fixed relation). These should normally lead to ExtPos values ADP, ADV, CCONJ, DET, PRON, SCONJ (the fixed relation should not be used for compounds that work like content words). However, ExtPos is occasionally useful in other situations, too: for example, when a multiword expression acts as a proper noun (although its parts behave like other words) or as an interjection.
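The constraints described above can be sketched as a simple check over one sentence. This is an illustration, not the official validator: the function name check_extpos and the token dictionaries (with keys id, head, deprel, feats) are hypothetical, the allowed values are copied from the Values line, and the “unusual” warning merely mirrors the recommendation about fixed expressions.

```python
ALLOWED_EXTPOS = {"ADJ", "ADP", "ADV", "CCONJ", "DET", "INTJ", "PRON", "PROPN", "SCONJ"}
# Values that fixed functional expressions should normally lead to.
TYPICAL_FIXED_EXTPOS = {"ADP", "ADV", "CCONJ", "DET", "PRON", "SCONJ"}


def check_extpos(tokens):
    """Return messages about ExtPos usage in a list of token dictionaries."""
    messages = []
    # Heads of fixed expressions are the nodes that have a child attached as "fixed".
    heads_of_fixed = {t["head"] for t in tokens if t["deprel"] == "fixed"}
    for t in tokens:
        extpos = t["feats"].get("ExtPos")
        if extpos is not None and extpos not in ALLOWED_EXTPOS:
            messages.append(f"token {t['id']}: ExtPos={extpos} is not an allowed value")
        if t["id"] in heads_of_fixed:
            if extpos is None:
                messages.append(f"token {t['id']}: head of a fixed expression has no ExtPos")
            elif extpos in ALLOWED_EXTPOS and extpos not in TYPICAL_FIXED_EXTPOS:
                messages.append(f"token {t['id']}: ExtPos={extpos} is unusual for a fixed expression")
    return messages


# "by and large" as an isolated fragment; the head of "by" is set to 0 for simplicity.
by_and_large = [
    {"id": 1, "form": "by", "head": 0, "deprel": "root", "feats": {"ExtPos": "ADV"}},
    {"id": 2, "form": "and", "head": 1, "deprel": "fixed", "feats": {}},
    {"id": 3, "form": "large", "head": 1, "deprel": "fixed", "feats": {}},
]
print(check_extpos(by_and_large))
```

A missing ExtPos on the head is reported as a message rather than raised as an error, because the feature is recommended, not required.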
ADJ
: adjective-like expression
Examples
- [sv] före detta (a multiword adjective paraphrasable as “former”, lit. “before that”; the first node is ADV)
ADP
: adposition-like expression
Multiword adpositions occur in many languages. Often they are grammaticalized prepositional phrases.
Examples
- [cs] na rozdíl ode mne “in contrast to me” (here the first node is the technical head and it is a preposition itself, so UPOS = ExtPos)
- [cs] nehledě na jeho úspěchy “disregarding his achievements” (here the first node is a VERB)
ADV
: adverb-like expression
Examples
- [en] by and large (a multiword adverb paraphrasable as “altogether”; the first node is ADP)
CCONJ
: coordinating conjunction-like expression
Examples
- [fr] ainsi que “as well as” (ainsi = ADV)
DET
: determiner-like expression
Examples
- [fr] le volcan émet de la vapeur “the volcano emits steam” (de = ADP)
INTJ
: interjection-like expression
Examples
- [es] ¡Por Dios! “for God’s sake” (por = ADP, Dios = PROPN)
PRON
: pronoun-like expression
Examples
- [en] each other (each = DET)
PROPN
: proper noun-like expression
Examples
- [cs] Jeho kniha Most přes řeku Kwai byla zfilmována. “His book The Bridge over the River Kwai was made into a movie.” (Most = NOUN)
SCONJ
: subordinator-like expression
Examples
- [fr] bien que “although” (bien = ADV)
Foreign
: is this a foreign word?
Values: | Yes |
Boolean feature. Is this a foreign word? Not a loan word and not a foreign name but a genuinely foreign word appearing inside native text, e.g. inside direct speech, titles of books etc. This feature would apply either to the u-pos/X part of speech (unanalyzable token), or to other parts of speech if we know and are willing to annotate the class to which the word belongs in its original language.
See discussion at Foreign Expressions and Code-Switching.
Historical Note: This feature is new in UD version 2. It was used as a language-specific addition in several treebanks in version 1 but it was not considered boolean and three values were foreseen. Since the additional values were used extremely rarely, they are not part of the universal definition of this feature in UD v2.
Yes
: it is foreign
Example: [en] He said I could “dra åt helvete!”
Gender
: gender
Values: | Com | Fem | Masc | Neut |
Gender is usually a lexical feature of nouns and an inflectional feature of other parts of speech (pronouns, adjectives, determiners, numerals, verbs) that mark agreement with nouns. In English, gender affects only the choice of the personal pronoun (he / she / it) and the feature is usually not encoded in English tagsets.
See also the related feature of Animacy.
African languages have an analogous feature of noun classes: there might be separate grammatical categories for flat objects, long thin objects etc.
Masc
: masculine gender
Nouns denoting male persons are masculine. Other nouns may be also grammatically masculine, without any relation to sex.
Examples
- [cs] hrad “castle”
Fem
: feminine gender
Nouns denoting female persons are feminine. Other nouns may be also grammatically feminine, without any relation to sex.
Examples
- [de] Burg “castle”
Neut
: neuter gender
Some languages have only the masculine/feminine distinction while others also have this third gender for nouns that are neither masculine nor feminine (grammatically).
Examples
- [en] castle
- [cs] dítě “child”
- [sv] barn “child”
Com
: common gender
Some languages do not distinguish masculine/feminine most of the time but they do distinguish neuter vs. non-neuter (Swedish neutrum / utrum). The non-neuter is called common gender.
Note that it could also be expressed as a combined value Gender=Fem,Masc. Nevertheless we keep Com also as a separate value. Combined feature values should only be used in exceptional, undecided cases, not for something that occurs systematically in the grammar. Language-specific extensions to these guidelines should determine whether the Com value is appropriate for a particular language.
Note further that the Com value is not intended for cases where we just cannot derive the gender from the word itself (without seeing the context), while the language actually distinguishes Masc and Fem. For example, in Spanish, nouns distinguish two genders, masculine and feminine, and every noun can be classified as either Masc or Fem. Adjectives are supposed to agree with nouns in gender (and number), which they typically achieve by alternating -o / -a. But then there are adjectives such as grande or feliz that have only one form for both genders. So we cannot tell whether they are masculine or feminine unless we see the context. Yet they are either masculine or feminine (feminine in una ciudad grande, masculine in un puerto grande). Therefore in Spanish we should not tag grande with Gender=Com. Instead, we should either drop the gender feature entirely (suggesting that this word does not inflect for gender) or tag individual instances of grande as either masculine or feminine, depending on context.
Examples
- [sv] väg “way”
Gender[dat]
: gender agreement with the dative argument
Finite verbs in many Indo-European languages agree in person and number with their subject. Some languages in other families are head-marking, which means that the verbal morphology can cross-reference multiple core arguments.
In Basque (a polypersonal language), certain verbs overtly mark agreement with up to three arguments: one in the absolutive case, one in ergative and one in dative. Thus in dakarkiogu “we bring it to him/her”, akar is the stem (ekarri = “bring”), d stands for “it” (absolutive argument is the direct object of transitive verbs), ki stands for the dative case, o stands for “he” and gu stands for “we” (ergative argument is the subject of transitive verbs).
In the informal register, there are also separate forms for masculine and feminine arguments, although gender is otherwise not distinguished in Basque.
- Gender[erg] is the gender of the ergative argument of the verb.
- Gender[dat] is the gender of the dative argument of the verb.
Masc
: masculine dative argument
Examples
- [eu] ukan ezak “have it” Gender[erg]=Masc|Number[erg]=Sing|Person[erg]=2|Polite[erg]=Inf|Number[abs]=Sing|Person[abs]=3 (imperative addressing a man)
Fem
: feminine dative argument
Examples
- [eu] ukan ezan “have it” Gender[erg]=Fem|Number[erg]=Sing|Person[erg]=2|Polite[erg]=Inf|Number[abs]=Sing|Person[abs]=3 (imperative addressing a woman)
Gender[erg]
: gender agreement with the ergative argument
Finite verbs in many Indo-European languages agree in person and number with their subject. Some languages in other families are head-marking, which means that the verbal morphology can cross-reference multiple core arguments.
In Basque (a polypersonal language), certain verbs overtly mark agreement with up to three arguments: one in the absolutive case, one in ergative and one in dative. Thus in dakarkiogu “we bring it to him/her”, akar is the stem (ekarri = “bring”), d stands for “it” (absolutive argument is the direct object of transitive verbs), ki stands for the dative case, o stands for “he” and gu stands for “we” (ergative argument is the subject of transitive verbs).
In the informal register, there are also separate forms for masculine and feminine arguments, although gender is otherwise not distinguished in Basque.
- Gender[erg] is the gender of the ergative argument of the verb.
- Gender[dat] is the gender of the dative argument of the verb.
Masc
: masculine ergative argument
Examples
- [eu] ukan ezak “have it” Gender[erg]=Masc|Number[erg]=Sing|Person[erg]=2|Polite[erg]=Inf|Number[abs]=Sing|Person[abs]=3 (imperative addressing a man)
Fem
: feminine ergative argument
Examples
- [eu] ukan ezan “have it” Gender[erg]=Fem|Number[erg]=Sing|Person[erg]=2|Polite[erg]=Inf|Number[abs]=Sing|Person[abs]=3 (imperative addressing a woman)
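The layered features given for ukan ezak above can be combined into a single feature string. The sketch below is illustrative only: the helper name format_feats is hypothetical, and it assumes the CoNLL-U convention that features are joined with | and sorted alphabetically by name without regard to case.

```python
def format_feats(features):
    """Join a {name: value} dictionary into one feature string.

    Features are sorted by name, ignoring case; an empty set becomes "_".
    """
    if not features:
        return "_"
    ordered = sorted(features.items(), key=lambda item: item[0].lower())
    return "|".join(f"{name}={value}" for name, value in ordered)


# Features of the ergative argument (the addressee) and of the absolutive argument.
ergative = {"Gender[erg]": "Masc", "Number[erg]": "Sing", "Person[erg]": "2", "Polite[erg]": "Inf"}
absolutive = {"Number[abs]": "Sing", "Person[abs]": "3"}

print(format_feats({**ergative, **absolutive}))
# Gender[erg]=Masc|Number[abs]=Sing|Number[erg]=Sing|Person[abs]=3|Person[erg]=2|Polite[erg]=Inf
```

Sorting by name interleaves the two layers, so features of the same argument do not necessarily end up next to each other.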
Gender[obj]
: gender agreement with object
Finite verbs in many Indo-European languages agree in person and number with their subject. Some languages in other families are head-marking, which means that the verbal morphology can cross-reference multiple core arguments, not just the subject. If the cross-reference involves the Gender of the argument, we have two layers of Gender on the verb: Gender[subj], and (for transitive verbs) Gender[obj].
In Basque (a polypersonal language), certain verbs overtly mark agreement with up to three arguments: one in the absolutive case, one in ergative and one in dative. Thus in dakarkiogu “we bring it to him/her”, akar is the stem (ekarri = “bring”), d stands for “it” (absolutive argument is the direct object of transitive verbs), ki stands for the dative case, o stands for “he” and gu stands for “we” (ergative argument is the subject of transitive verbs).
In the informal register, there are also separate forms for masculine and feminine arguments, although gender is otherwise not distinguished in Basque.
- Gender[erg] is the gender of the ergative argument of the verb. The corresponding feature in Interset 2.041 is called erggender.
- Gender[dat] is the gender of the dative argument of the verb. The corresponding feature in Interset 2.041 is called datgender.
Masc
: masculine object
Examples: [eu] ukan ezak “have it” Gender[erg]=Masc|Number[abs]=Sing|Number[erg]=Sing|Person[abs]=3|Person[erg]=2|Polite[erg]=Inf (imperative addressing a man)
Fem
: feminine object
Examples: [eu] ukan ezan “have it” Gender[erg]=Fem|Number[abs]=Sing|Number[erg]=Sing|Person[abs]=3|Person[erg]=2|Polite[erg]=Inf (imperative addressing a woman)
Gender[psor]
: possessor’s gender
Values: | Fem | Masc | Neut |
Possessive adjectives and pronouns may have two different genders: that of the possessed object (gender agreement with modified noun) and that of the possessor (lexical feature, inherent gender). The Gender[psor] feature captures the possessor’s gender.
In the Czech examples below, the masculine Gender[psor] implies using one of the suffixes -ův, -ova, -ovo, and the feminine Gender[psor] implies using one of -in, -ina, -ino.
Masc
: masculine possessor
Examples
- [cs] otcův syn (father’s son; Gender[psor]=Masc|Gender=Masc); otcova dcera (father’s daughter; Gender[psor]=Masc|Gender=Fem); otcovo dítě (father’s child; Gender[psor]=Masc|Gender=Neut).
Fem
: feminine possessor
Examples
- [cs] matčin syn (mother’s son; Gender[psor]=Fem|Gender=Masc); matčina dcera (mother’s daughter; Gender[psor]=Fem|Gender=Fem); matčino dítě (mother’s child; Gender[psor]=Fem|Gender=Neut).
Neut
: neuter possessor
Examples
- [cs] Dítě plakalo, protože někdo odnesl jeho hračku. “The child wept because somebody took away its (=the child’s) toy.”
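The correspondence between the Czech suffixes and the two gender features can be written down as a lookup table. This is a deliberately naive sketch: the function name guess_possessive_genders is hypothetical, and it assumes that its input is already known to be a derived possessive adjective in the nominative singular, like the six forms above. It is not a tagger; many ordinary Czech words end in the same letters.

```python
# (suffix, possessor's gender, gender of the possessed noun); longer suffixes first.
SUFFIXES = [
    ("ova", "Masc", "Fem"),
    ("ovo", "Masc", "Neut"),
    ("ův", "Masc", "Masc"),
    ("ina", "Fem", "Fem"),
    ("ino", "Fem", "Neut"),
    ("in", "Fem", "Masc"),
]


def guess_possessive_genders(form):
    """Map a nominative singular possessive adjective to its two gender features.

    Returns None if none of the six suffixes matches.
    """
    for suffix, possessor_gender, agreement_gender in SUFFIXES:
        if form.endswith(suffix):
            return {"Gender[psor]": possessor_gender, "Gender": agreement_gender}
    return None


for form in ["otcův", "otcova", "otcovo", "matčin", "matčina", "matčino"]:
    print(form, guess_possessive_genders(form))
```

The table makes the two layers visible: the stem-forming part of the suffix (-ův/-ov- vs. -in-) reflects the possessor, while the ending agrees with the possessed noun.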
Gender[subj]
: gender agreement with subject
Finite verbs in many Indo-European languages agree in person and number with their subject. Some languages in other families are head-marking, which means that the verbal morphology can cross-reference multiple core arguments, not just the subject. If the cross-reference involves the Gender of the argument, we have two layers of Gender on the verb: Gender[subj], and (for transitive verbs) Gender[obj].
While it would be possible to make the subject layer the default and use just Gender for it, the explicit labeling of both layers is probably more helpful in such languages, as it can reduce confusion.
In Basque (a polypersonal language), certain verbs overtly mark agreement with up to three arguments: one in the absolutive case, one in ergative and one in dative. Thus in dakarkiogu “we bring it to him/her”, akar is the stem (ekarri = “bring”), d stands for “it” (absolutive argument is the direct object of transitive verbs), ki stands for the dative case, o stands for “he” and gu stands for “we” (ergative argument is the subject of transitive verbs).
In the informal register, there are also separate forms for masculine and feminine arguments, although gender is otherwise not distinguished in Basque.
- Gender[erg] is the gender of the ergative argument of the verb. The corresponding feature in Interset 2.041 is called erggender.
- Gender[dat] is the gender of the dative argument of the verb. The corresponding feature in Interset 2.041 is called datgender.
Masc
: masculine subject
Examples: [eu] ukan ezak “have it” Gender[erg]=Masc|Number[abs]=Sing|Number[erg]=Sing|Person[abs]=3|Person[erg]=2|Polite[erg]=Inf (imperative addressing a man)
Fem
: feminine subject
Examples: [eu] ukan ezan “have it” Gender[erg]=Fem|Number[abs]=Sing|Number[erg]=Sing|Person[abs]=3|Person[erg]=2|Polite[erg]=Inf (imperative addressing a woman)
Hyph
: hyphenated compound or part of it
Boolean feature. Is this part of a hyphenated compound? Depending on tokenization, the compound may be one token or be split into several tokens; then the tokens need tags.
These are words corresponding to prefixes such as inter- (inter disciplinary), post- (post traumatic), un- (un avoidable), di- (di transitive) and so on in English, but which are realized as distinct tokens (without the hyphen) in different languages.
Yes
: it is part of hyphenated compound
Note that this depends on the tokenization conventions used in the language. For example, in Czech (see below), česko-slovenský is tokenized as three tokens: česko, the hyphen, and slovenský. While slovenský is a normal adjective in Czech, česko is derived from an adjectival stem but it is in a form that can never occur as a separate word. On the other hand, it can be combined with many other adjectives denoting affiliation with a country or region: česko-moravský, česko-německý, česko-americký etc. If tokenization left it as one token, the whole word česko-slovenský would be simply an adjective and no Hyph=Yes would be used in the annotation.
Examples
- [cs] česko-slovenský “Czecho-Slovak”
- [en] Anglo-Saxon
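The three-token analysis of česko-slovenský can be sketched as follows. The function names split_hyphenated and mark_hyph are hypothetical, and the choice to put Hyph=Yes only on the non-final parts is an assumption based on the Czech description above (česko cannot stand alone, slovenský is a normal adjective); other tokenization conventions may call for a different treatment.

```python
def split_hyphenated(word):
    """Split a word at hyphens, keeping each hyphen as a token of its own."""
    tokens = []
    parts = word.split("-")
    for index, part in enumerate(parts):
        if part:
            tokens.append(part)
        if index < len(parts) - 1:
            tokens.append("-")
    return tokens


def mark_hyph(word):
    """Return (form, features) pairs; non-final word parts get Hyph=Yes."""
    tokens = split_hyphenated(word)
    last = len(tokens) - 1
    return [
        (token, "Hyph=Yes" if token != "-" and index < last else "_")
        for index, token in enumerate(tokens)
    ]


print(mark_hyph("česko-slovenský"))
print(mark_hyph("Anglo-Saxon"))
```

A word without a hyphen comes back as a single token with no Hyph feature, which corresponds to the case where tokenization leaves the compound intact.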
Mood
: mood
Values: | Adm | Cnd | Des | Imp | Ind | Int | Irr | Jus | Nec | Opt | Pot | Prp | Qot | Sub |
Mood is a feature that expresses modality and subclassifies finite verb forms.
Ind
: indicative or realis
The indicative can be considered the default mood. A verb in indicative merely states that something happens, has happened or will happen, without adding any attitude of the speaker.
Examples
- [cs] Studuješ na univerzitě. “You study at the university.”
- [de] Du studierst an der Universität. “You study at the university.”
- [fr] Tu le fais. “You do it.”
- [tr] eve gidiyor “she is going home”
- [tr] eve gitti “she went home”
- [et] Sa ei tule. “You are not coming.”
- [pt] Ela foi para casa. “She went home.”
- [sq] Ti flet shqip. “You speak Albanian.”
Imp
: imperative
The speaker uses imperative to order or ask the addressee to do the action of the verb.
Examples
- [cs] Studuj na univerzitě! “Study at the university!”
- [de] Studiere an der Universität! “Study at the university!”
- [tr] eve git “go home!”
- [tr] eve gidin “go home!” (plural)
- [tr] eve gitsin “[let him] go home!” (3rd person imperative)
- [sa] ब्रूहि राजः / brūhi rājaḥ “tell the king”
Cnd
: conditional
The conditional mood is used to express actions that would have taken place under some circumstances but that actually did not / do not happen. Grammars of some languages may classify the conditional as a tense (rather than a mood) but e.g. in Czech it combines with two different tenses (past and present).
Examples
- [cs] Kdybych byl chytrý, studoval bych na univerzitě. “If I were smart I would study at the university” (note that only the auxiliary bych is specific to conditional; the active participle byl is also needed to analytically form the conditional mood, however, it will only be tagged as participle because it can also be used to form past tense indicative.)
- [tr] eve gittiyse “if she went home”
- [tr] eve gidiyorsa “if she is going home”
- [tr] eve giderse “if she goes home”
- [tr] eve gidecekdiyse “if she was going to go home”
Pot
: potential
The action of the verb is possible but not certain. This mood corresponds to the modal verbs can, might, be able to. Used e.g. in Finnish. See also the optative.
Examples
- [tr] eve gidebilir “she can go home”
- [tr] eve gidemeyebilir “she may not be able to go home”
Sub
: subjunctive / conjunctive
The subjunctive mood is used under certain circumstances in subordinate clauses, typically for actions that are subjective or otherwise uncertain. In German, it may also be used to convey the conditional meaning.
Examples
- [fr] Je veux que tu le fasses “I want you to do it” lit. I want that you it do.Sub
Jus
: jussive / injunctive
The jussive mood expresses the desire that the action happens; it is thus close to both imperative and optative.
Unlike in the desiderative, it is the speaker, not the subject, who wishes that it happens.
Used e.g. in Arabic. We also map the Sanskrit injunctive to Mood=Jus.
Examples
- [sa] मैवं वोचः / maivaṁ vocaḥ “Do not speak this way”
Prp
: purposive
Means “in order to”, occurs in Amazonian and Australian languages, such as Arabana.
Examples
- [ard] Antha yukarnda puntyi manilhiku. “I am going to get some meat.”
Qot
: quotative
The quotative mood is used e.g. in Estonian to denote direct speech. The boundary between this mood and the non-first-hand Evidentiality is blurred.
Examples
- [et] Sa ei tulevat. “You are reportedly not coming.”
Opt
: optative
Expresses exclamations like “May you have a long life!” or “If only I were rich!” In Turkish it also expresses suggestions. In Sanskrit it may express possibility (cf. the potential mood in other languages).
Examples
- [tr] eve gidelim ‘let’s go home’
- [sa] अप्रधानः प्रधानः स्यात् / apradhānaḥ pradhānaḥ syāt “the unimportant person may be (become) important”
Des
: desiderative
The desiderative mood corresponds to the modal verb “want to”: “He wants to come.” Used e.g. in Japanese or Turkish.
Examples
- [ja] 食べたい / tabetai “want to eat”
Nec
: necessitative
The necessitative mood expresses necessity and corresponds to the modal verbs “must, should, have to”: “He must come.”
Examples
- [tr] eve gitmeli “she should go home”
- [tr] eve gitmeliydi “she should have gone home”
Int
: interrogative
Verbs in some languages have a special interrogative form that is used in yes-no questions. This is attested, for instance, in the Turkic languages. Celtic languages have it for the copula but not for normal verbs.
Examples
- [ug] يېدىڭىزمۇ؟ / yëdingizmu? “Have you eaten?”
- [ga] Nach in aghaidh easa atá sé ag snámh? “Isn’t he swimming against the tide / fighting a losing battle?”
- [gd] Rudeigin mu dheidhinn sgrìob a Venezuela, an e? “Something about a trip to Venezuela, isn’t it?”
Irr
: irrealis
The irrealis mood denotes an action that is not known to have happened. As such, it is an umbrella term
for a group of more specific moods such as conditional, potential, or desiderative. Some languages
do not distinguish these finer shades of meaning but they do distinguish realis (which we tag with
the same feature as indicative, Ind) and irrealis.
Examples
- [quc] Xaq ta ne kimbʼe iwukʼ. “Let me be with you.” (“Que fuera yo con ustedes.”)
Adm
: admirative
Expresses surprise, irony or doubt. Occurs in Albanian, other Balkan languages, and in Caddo (Native American from Oklahoma).
Examples
- [sq] Ti fliske shqip! “You (surprisingly) speak Albanian!”
NameType
: type of named entity
Values: | Com | Geo | Giv | Nat | Oth | Pat | Pro | Prs | Sur |
Classification of named entities (token-based, no nesting of entities etc.)
The feature applies mainly to the PROPN tag;
in multi-word foreign names, adjectives may also have this feature
(they preserve the ADJ tag but at the same time they would not exist in the
host language otherwise than in the named entity).
Geo
: geographical name
Names of cities, countries, rivers, mountains etc.
Examples
- [cs] Praha “Prague”, Kostelec nad Černými lesy , Německo “Germany”
Prs
: name of person
This value is used if it is not known whether it is a given or a family name, but it is known that it is a personal name.
Examples
- [sms] Ja seeʹst vueʹppes leäi Laurikainen “And they had a guide Laurikainen”
Giv
: given name of person
Given name (not family name). This is usually the first name in European and American names. In Chinese names, the last two syllables (of three) are usually the given name.
Examples
- [en] George Bush
Pat
: patronymic in a name of a person
Patronymic (not given name and not family name). This is the middle name in East Slavic personal names.
Examples
- [uk] директора Бульби Олександра Миколайовича / dyrektora Buľby Oleksandra Mykolajovyča
Sur
: surname / family name of person
Family name (surname). This is usually the last name in European and American names. In Chinese names, the first syllable (of three) is usually the surname.
Examples
- [en] George Bush
Nat
: nationality
Name denoting a member of a particular nation, or inhabitant of a particular territory.
Examples
- [cs] Čech “Czech”, Němec “German”, Pražan “Praguer”
- [cs] Po válce byli Němci z Československa vyhnáni. “After the war, the Germans were expelled from Czechoslovakia.”
Com
: company, organization
Examples
- [en] Microsoft, UNESCO
Pro
: product
Examples
- [en] Opel Vectra
Oth
: other
Names of stadiums, guerilla bases, events etc.
Examples
- [en] the COLING 2020 conference
NounClass
: noun class
Values: | Bantu1 | Bantu2 | Bantu3 | Bantu4 | Bantu5 | Bantu6 | Bantu7 | Bantu8 | Bantu9 | Bantu10 | Bantu11 | Bantu12 | Bantu13 | Bantu14 | Bantu15 | Bantu16 | Bantu17 | Bantu18 | Bantu19 | Bantu20 | Bantu21 | Bantu22 | Bantu23 |
| Wol1 | Wol2 | Wol3 | Wol4 | Wol5 | Wol6 | Wol7 | Wol8 | Wol9 | Wol10 | Wol11 | Wol12 |
NounClass is similar to Gender and Animacy because it is to a large extent
a lexical category of nouns, and other parts of speech (pronouns, adjectives,
determiners, numerals, verbs) inflect for it to show agreement.
The distinction between gender and noun class is not sharp and is partially conditioned by the traditional terminology of a given language family. In general, the feature is called gender if the number of possible values is relatively low (typically 2-4) and the partition correlates with sex of people and animals. In language families where the number of categories is high (10-20), the feature is usually called noun class. No language family uses both the features.
In Bantu languages, the noun class also encodes Number; therefore it is
a lexical-inflectional feature of nouns. The words should be annotated with
the Number feature in addition to NounClass, despite the fact that people
who know Bantu could infer the number from the noun class. The lemma of the
noun should be its singular form.
The set of values of this feature is specific for a language family or group.
Within the group, it is possible to identify classes that have similar meaning
across languages (although some classes may have merged or disappeared in
some languages in the group). The value of the NounClass feature consists
of a short identifier of the language group (e.g., Bantu), and the number
of the class (there is a standardized class numbering system accepted by
scholars of the various Bantu languages; similar numbering systems should be
created for the other families that have noun classes).
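The composition of a NounClass value (group identifier followed by class number) can be taken apart mechanically. The following is a minimal sketch; the helper name `split_noun_class` and the regular expression are illustrative assumptions, not part of any official tool.

```python
import re

# A group identifier (capitalized letters) followed by the class number.
_NOUN_CLASS = re.compile(r"([A-Z][a-z]+)(\d+)")


def split_noun_class(value):
    """Split a NounClass value such as 'Bantu12' into ('Bantu', 12)."""
    match = _NOUN_CLASS.fullmatch(value)
    if match is None:
        raise ValueError(f"not a NounClass value: {value!r}")
    return match.group(1), int(match.group(2))


bantu = split_noun_class("Bantu12")
wolof = split_noun_class("Wol3")
```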
List of noun classes in Swahili
(from https://en.wikipedia.org/wiki/Noun_class)
Class number | Prefix | Typical meaning |
---|---|---|
1 | m-, mw-, mu- | singular: persons |
2 | wa-, w- | plural: persons (a plural counterpart of class 1) |
3 | m-, mw-, mu- | singular: plants |
4 | mi-, my- | plural: plants (a plural counterpart of class 3) |
5 | ji-, j-, Ø- | singular: fruits |
6 | ma-, m- | plural: fruits (a plural counterpart of class 5, 9, 11, seldom 1) |
7 | ki-, ch- | singular: things |
8 | vi-, vy- | plural: things (a plural counterpart of class 7) |
9 | n-, ny-, m-, Ø- | singular: animals, things |
10 | n-, ny-, m-, Ø- | plural: animals, things (a plural counterpart of class 9 and 11) |
11 | u-, w-, uw- | singular: no clear semantics |
15 | ku-, kw- | verbal nouns |
16 | pa- | locative meanings: close to something |
17 | ku- | indefinite locative or directive meaning |
18 | mu-, m- | locative meanings: inside something |
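Because the class already implies the number, an annotation script could fill in Number from the class, as the guidelines ask for both features to be present. The sketch below encodes only what the Swahili table above states (which classes are singular and which are plural); the function name `swahili_noun_feats` is hypothetical, and classes 15 to 18 are left without Number because the table gives them no number value.

```python
# Singular and plural classes as listed in the Swahili table above.
SINGULAR_CLASSES = {1, 3, 5, 7, 9, 11}
PLURAL_CLASSES = {2, 4, 6, 8, 10}


def swahili_noun_feats(class_number):
    """Build a FEATS string with NounClass and, where known, Number."""
    feats = {"NounClass": f"Bantu{class_number}"}
    if class_number in SINGULAR_CLASSES:
        feats["Number"] = "Sing"
    elif class_number in PLURAL_CLASSES:
        feats["Number"] = "Plur"
    # Features are joined in alphabetical order of their names.
    return "|".join(f"{name}={value}" for name, value in sorted(feats.items()))


watoto = swahili_noun_feats(2)     # class of watoto "children"
kusoma = swahili_noun_feats(15)    # class of kusoma "reading; to read"
```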
Bantu1
: singular, persons
The corresponding plural class is Bantu2.
Examples
- [sw] mtoto “child”
Bantu2
: plural, persons
The corresponding singular class is Bantu1.
Examples
- [sw] watoto “children”
Bantu3
: singular, plants, thin objects
The corresponding plural class is Bantu4.
Examples
- [sw] mti “tree”
Bantu4
: plural, plants, thin objects
The corresponding singular class is Bantu3.
Examples
- [sw] miti “trees”
Bantu5
: singular, fruits, round objects, paired things
The corresponding plural class is Bantu6.
Examples
- [sw] jiwe “stone”
Bantu6
: plural, fruits, round objects, paired things
The corresponding singular class is Bantu5, also Bantu9, Bantu11, seldom Bantu1.
Examples
- [sw] mawe “stones”
Bantu7
: singular, things, diminutives
The corresponding plural class is Bantu8.
Examples
- [sw] kitabu “book”
Bantu8
: plural, things, diminutives
The corresponding singular class is Bantu7.
Examples
- [sw] vitabu “books”
Bantu9
: singular, animals, things
The corresponding plural class is Bantu10 or Bantu6.
Examples
- [sw] ndege “bird”
Bantu10
: plural, animals, things
The corresponding singular class is Bantu9.
Examples
- [sw] ndege “birds” (plural of the noun is identical to singular; however, verbs agree with the zi- prefix in plural and with i- in singular)
Bantu11
: long thin objects, natural phenomena, abstracts
Examples
- [sw] utoto “childhood”
Bantu12
: singular, small things, diminutives
The corresponding plural class is Bantu13 or Bantu14.
Examples
- [lg] embwa “dog” → akabwa “puppy”
Bantu13
: plural or mass, small amount of mass
Examples
- [lg] mazzi “water” → otuzzi “drop of water”
Bantu14
: plural, diminutives
In Ganda, this is the plural counterpart of Bantu12.
Examples
- [lg] obubwa “puppies”
Bantu15
: verbal nouns, infinitives
Examples
- [sw] -soma “read” → kusoma “reading; to read”
Bantu16
: definite location, close to something
Examples
- [sw] pahali “place”
Bantu17
: indefinite location, direction, movement
Examples
- [sw] kule “there”
Bantu18
: definite location, inside something
Examples
- [sw] mule “in there”
Bantu19
: little bit of, pejorative plural
Bantu class 19 may signify “a little bit of” or a plural with a pejorative nuance, as in Hunde.
Examples
- [hke] hyùndù “a bit of porridge”
- [hke] hìkátsì “frail females”
- [hke] hyábánà “thin children”
Bantu20
: singular, augmentatives
In Ganda, the corresponding plural class is Bantu6 or Bantu22.
Examples
- [lg] musajja “man” → ogusajja “giant”
Bantu21
: singular, augmentatives, derogatives
Examples
- [ve] ḓinga “large lump of earth”
- [ve] ḓanḓa “big clumsy hand”
Bantu22
: plural, augmentatives
The corresponding singular class is Bantu20.
Examples
- [lg] agasajja “giants”
Bantu23
: location with place names
Examples
- [lg] elugala “at Lugala”
Noun Classes in Wolof
Wolof is a non-Bantu Niger-Congo language. It has noun classes but their semantics cannot be easily mapped onto the Bantu classes. The class is morphologically unmarked on nouns (although it is an inherent property of the lexeme) but determiners have to show agreement with the class.
The Wolof noun class system lacks semantic coherence. One reason for this is that noun classification in Wolof is sometimes based on factors other than semantics, including phonology and morphology. Even these are only tendencies; in most cases no clear semantic, phonological or morphological criterion explains the classification.
Examples
The following table shows the forms of proximate demonstratives in the first ten noun classes; classes 2 and 8 are plural, the rest are singular.
Wol1 | Wol2 | Wol3 | Wol4 | Wol5 | Wol6 | Wol7 | Wol8 | Wol9 | Wol10 | English |
---|---|---|---|---|---|---|---|---|---|---|
ki | | gi | ji | bi | mi | li | | si | wi | “this” |
| ñi | | | | | | yi | | | “these” |
Wolof classes 11 and 12, although behaving like noun classes, have meanings that are adverbial rather than nominal: class 11 is for location, class 12 for manner.
Wol11 | Wol12 |
fi “here” | ni “so” |
Wol1
: Wolof noun class 1/k (singular human)
Examples
- [wo] nitug Afrig ki
Wol2
: Wolof noun class 2/ñ (plural human)
Examples
- [wo] jigéen ñi
Wol3
: Wolof noun class 3/g (singular)
Examples
- [wo] dexug Gaambi gi
Wol4
: Wolof noun class 4/j (singular)
Examples
- [wo] jenn jamono ji
Wol5
: Wolof noun class 5/b (singular)
For example, “dog” is in the b class.
Examples
- [wo] xaj bi “this dog” (dog class-DEF.PROX)
- [wo] xaj ba “that dog” (dog class-DEF.REMT)
- [wo] buur bi
Wol6
: Wolof noun class 6/m (singular)
For example, “sheep” is in the m class.
Examples
- [wo] xar mi “this sheep” (sheep class-DEF.PROX)
- [wo] at mi
Wol7
: Wolof noun class 7/l (singular)
Examples
- [wo] ndongo li
Wol8
: Wolof noun class 8/y (plural non-human)
Examples
- [wo] nguur yii
Wol9
: Wolof noun class 9/s (singular)
Examples
- [wo] sàmm si
Wol10
: Wolof noun class 10/w (singular)
Examples
- [wo] sama nag wa
Wol11
: Wolof noun class 11/f (location)
Examples
- [wo] fi “here”
- [wo] fa “there”
Wol12
: Wolof noun class 12/n (manner)
Examples
- [wo] ni “so”
NounType
: noun type
Values: | Clf |
We already split common and proper nouns at the level of UPOS tags but some tagsets mark other distinctions.
Clf
: classifier
Chinese classifiers between cardinal numbers and nouns, or between determiners and nouns.
Examples
- [zh] 三項工程 / sān xiàng gōngchéng “three projects”
NumForm
: numeral form
Values: | Combi | Digit | Roman | Word |
Feature of cardinal and ordinal numbers. Is the number expressed by digits or as a word? This feature appears in a number of tagsets. Note that it is currently a bit Euro-centric because it distinguishes (Euro)Arabic digits and Roman numerals, but what about digits in various other scripts? In texts in many Indian scripts and in the Arabic script both native digits and Euro-Arabic digits can appear (e.g. 2014 vs. २०१४ in Devanagari).
Word
: number expressed as word
Examples: one, two, three
Digit
: number expressed using digits
Examples: 1, 2, 3
Combi
: digits combined with a suffix
Examples: [lt] 15-oji (15th)
Roman
: roman numeral
Examples: I, II, III
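The four values can be told apart by the surface shape of the token alone, at least roughly. The function below is a naive heuristic sketch, not a description of how any treebank assigns NumForm: the name `guess_num_form` and all patterns are assumptions, it does not check that the token is a numeral at all, and the Roman pattern will also accept ordinary uppercase words such as the English pronoun I. Note that `\d` in Python matches decimal digits of any script, so Devanagari digits also come out as Digit.

```python
import re


def guess_num_form(token):
    """Guess the NumForm value of a numeral token from its shape."""
    if re.fullmatch(r"\d+([.,]\d+)*", token):
        return "Digit"
    # Digits followed by an optional separator and a letter suffix: 15-oji.
    if re.fullmatch(r"\d+\W?[^\W\d]+", token):
        return "Combi"
    # Only checks the alphabet, not whether the numeral is well-formed.
    if re.fullmatch(r"[IVXLCDM]+", token):
        return "Roman"
    return "Word"


forms = [guess_num_form(t) for t in ["3", "15-oji", "III", "three"]]
```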
NumType
: numeral type
Values: | Card | Dist | Frac | Mult | Ord | Range | Sets |
Some languages (especially Slavic) have a complex system of numerals. For example, in the school grammar of Czech, the main part of speech is “numeral”, it includes almost everything where counting is involved and there are various subtypes. It also includes interrogative, relative, indefinite and demonstrative words referring to numbers (words like kolik / how many, tolik / so many, několik / some, a few), so at the same time we may have a non-empty value of PronType. (In English, these words are called quantifiers and they are considered a subgroup of determiners.)
From the syntactic point of view, some numeral types behave like adjectives
and some behave like adverbs. We tag them u-pos/ADJ and
u-pos/ADV respectively. Thus the NumType feature applies to
several different parts of speech:
- u-pos/NUM: cardinal numerals
- u-pos/DET: quantifiers
- u-pos/ADJ: definite adjectival, e.g. ordinal numerals
- u-pos/ADV: adverbial (e.g. ordinal and multiplicative) numerals, both definite and pronominal
Card
: cardinal number or corresponding interrogative / relative / indefinite / demonstrative word
Note that in some Indo-European languages there is a fuzzy borderline between numerals and nouns for thousand, million and billion.
Examples
- [en] one, two, three
- [cs] jeden, dva, tři “one, two, three”; kolik “how many”; několik “some”; tolik “so many”; mnoho “many”; málo “few”
- [cs] čtvero, patero, desatero (specific forms of four, five, ten;
they are morphologically, syntactically and stylistically distinct from the
default forms čtyři, pět, deset; in Czech grammar they are classified
as “generic numerals”, which also encompasses some other rare types;
nevertheless, Card is the closest match for them among the universal types)
Ord
: ordinal number or corresponding interrogative / relative / indefinite / demonstrative word
This is a subtype of adjective or (in some languages) of adverb.
Examples
- [en] first, second, third
- [cs] adjectival: první “first”; druhý “second”, třetí “third”; kolikátý lit. how manieth “which rank”; několikátý “some rank”; tolikátý “this/that rank”
- [cs] adverbial: poprvé “for the first time”; podruhé “for the second time”; potřetí “for the third time”; pokolikáté “for which time”, poněkolikáté “for x-th time”, potolikáté
Mult
: multiplicative numeral or corresponding interrogative / relative / indefinite / demonstrative word
This is a subtype of adjective or adverb.
Examples
- [sl] dvojen “double, twofold”; trojen “triple, threefold”; četveren “fourfold”
- [cs] dvojí “twofold”; trojí “threefold” (multiplicative adjectives)
- [cs] jednou “once”; dvakrát “twice”; třikrát “three times”; kolikrát “how many times”, několikrát “several times”; tolikrát “so many times” (multiplicative adverbs)
Frac
: fraction
This is a subtype of cardinal numbers, occasionally distinguished in corpora. It may denote a fraction or just the denominator of the fraction. In various languages these words may behave morphologically and syntactically as nouns or ordinal numerals.
Examples
- [en] three-quarters
- [cs] půl / polovina “half”; třetina “one third”; čtvrt / čtvrtina “quarter”
Sets
: number of sets of things; collective numeral
Morphologically distinct class of numerals used to count sets of things, or nouns that are pluralia tantum. Some authors call this type collective numeral.
Examples
- [cs] dvoje / troje boty “two / three [pairs of] shoes”; as opposed to normal cardinal numbers: dvě / tři boty “two / three shoes”
Dist
: distributive numeral
Used to express that the same quantity is distributed to each member in a set of targets.
Examples
- [hu] három-három in gyermekenként három-három ezer forinttal “three thousand forint per child”
Range
: range of values
This could be considered a subtype of cardinal numbers, occasionally distinguished in corpora.
Examples
- [en] two-five “two to five” (provided tokenization leaves it as one token.)
Number
: number
Values: | Coll | Count | Dual | Grpa | Grpl | Inv | Pauc | Plur | Ptan | Sing | Tri |
Number is usually an inflectional feature of nouns and,
depending on language, other parts of speech (pronouns,
adjectives, determiners, numerals,
verbs) that mark agreement with nouns.
In languages where noun phrases are pluralized using a specific function
word (pluralizer), this function word is tagged DET and Number=Plur is its lexical feature.
Sing
: singular number
A singular noun denotes one person, animal or thing.
Examples
- [en] car
Plur
: plural number
A plural noun denotes several persons, animals or things.
Examples
- [en] cars
- [yo] àwọn àgùntàn “the sheep (plural)”
- [tl] mga guro “teachers”
Dual
: dual number
A dual noun denotes two persons, animals or things.
Examples
- [sl] singular glas “voice”, dual glasova “voices”, plural glasovi “voices”
- [ar] singular سَنَةٌ sanatun “year”, dual سَنَتَانِ sanatāni “years”, plural سِنُونَ sinūna “years”.
Tri
: trial number
A trial pronoun denotes three persons, animals or things. It occurs in pronouns of several Austronesian languages, such as Biak.
Examples
- [bhw] sko “they three”
- [bhw] singular ibiser “he is hungry”, dual subiser “they two are hungry”, trial skobiser “they three are hungry”, plural sibiser “they are hungry”
Pauc
: paucal number
A paucal noun denotes “a few” persons, animals or things.
Examples
- [wbp] singular karli “boomerang”, paucal karlipatu “a few boomerangs”
Grpa
: greater paucal number
A greater paucal noun denotes “more than several but not many” persons, animals or things. It occurs in Sursurunga, an Austronesian language.
Examples
- [sgz] singular iau “I”, dual giur “the two of us”, paucal gimtul “the few of us”, greater paucal gimhat “we”, plural gim “we”
Grpl
: greater plural number
A greater plural noun denotes “many, all possible” persons, animals or things. Precise semantics varies across languages.
Examples
- [ff] singular ngesa “field”, plural gese “fields”, greater plural geseeli “many fields”
Inv
: inverse number
Inverse number means non-default for that particular noun. (Some nouns are by default assumed to be singular, some dual or plural.) Occurs e.g. in Kiowa.
Examples
- [kio] ę́:dè sân khópdɔ́: “This child is sick.” (basic, singular)
- [kio] ę́:dè sân ę̀khópdɔ́: “These two children are sick.” (basic, dual)
- [kio] ę́:gɔ̀ są̂:dɔ̀ èkhópdɔ́: “These children are sick.” (inverse, plural)
Count
: count plural
A special plural form of nouns (and other parts of speech, such as adjectives) if they occur after numerals.
In Bulgarian and Macedonian, this form is known variously as “counting form”,
“count plural” or “quantitative plural” (Sussex and Cubberley 2006, p. 324).
(The form originates in the Proto-Slavic dual but it should not be marked
Number=Dual because 1. the dual vanished from Bulgarian and 2. the form is
no longer semantically tied to the number two.)
Other languages (e.g., Russian) have forms that are not necessarily related to dual, yet they are used exclusively with numerals.
Examples
- [bg] три стола / tri stola “three chairs” vs. столове / stolove “chairs”
- [ru] шага́, шара́, ряда́ / šagá, šará, rjadá “steps, balls, rows”
Ptan
: plurale tantum
Some nouns appear only in the plural form even though they denote one
thing (semantic singular); some tagsets mark this distinction.
Grammatically they behave like plurals, so Plur is obviously the
back-off value here; however, if the language also marks gender, the
non-existence of a singular form sometimes means that the gender is
unknown. In Czech, a special type of numerals is used when counting
nouns that are plurale tantum (NumType=Sets).
Examples
- [en] scissors, pants
- [cs] nůžky, kalhoty
Coll
: collective / mass / singulare tantum
Collective or mass or singulare tantum is a special case of singular. It applies to words that use grammatical singular to describe sets of objects, i.e. semantic plural. Although in theory they might be able to form plural, in practice it would be rarely semantically plausible. Sometimes, the plural form exists and means “several sorts of” or “several packages of”.
Examples
- [cs] lidstvo “mankind”
References
- Sussex, Roland and Cubberley, Paul. 2006. The Slavic Languages. Cambridge University Press.
Number[abs]
: number agreement with absolutive argument
Finite verbs in many Indo-European languages agree in person and number with their subject.
In Basque (a polypersonal language), certain verbs overtly mark agreement with up to three arguments: one in the absolutive case, one in ergative and one in dative. Thus in dakarkiogu “we bring it to him/her”, akar is the stem (ekarri = “bring”), d stands for “it” (absolutive argument is the direct object of transitive verbs), ki stands for the dative case, o stands for “he” and gu stands for “we” (ergative argument is the subject of transitive verbs).
- Number[abs] is the number of the absolutive argument of the verb. The corresponding feature in Interset 2.041 is called absnumber.
- Number[erg] is the number of the ergative argument of the verb. The corresponding feature in Interset 2.041 is called ergnumber.
- Number[dat] is the number of the dative argument of the verb. The corresponding feature in Interset 2.041 is called datnumber.
One may want to use just Number instead of Number[abs].
However, there are two issues with that (at least in Basque).
First, the absolutive argument is not always the subject. For transitive verbs, it is the object, so the parallelism with nominative-accusative languages would be weak anyway.
Second, and more important, some Basque finite verbs have additional morphemes of nominal inflection.
Thus their form reflects the person-number agreement with the absolutive argument (nor), and nominal inflection (case, number etc.) at the same time.
Examples: dena (Number=Sing|Number[abs]=Sing), dituena (Number=Sing|Number[abs]=Plur|Number[erg]=Sing), dugunak (Number=Plur|Number[abs]=Sing|Number[erg]=Plur), direnak (Number=Plur|Number[abs]=Plur).
So we reserve the Number feature for nominal inflection, and the Number[abs] feature for agreement.
Note that we also define Person[abs] and Polite[abs], although there is no direct conflict for these features.
But it is better to have these features aligned with Person[erg], Polite[erg], Person[dat] and Polite[dat].
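The separation between the plain Number feature and its agreement layers is easy to see once a feature string is parsed. The sketch below reads a `|`-separated feature string such as those in the Basque examples above; the function name `parse_feats` and the choice of keying the result by (name, layer) pairs are illustrative assumptions.

```python
import re

# name, optional [layer], value: e.g. Number[abs]=Sing
_FEATURE = re.compile(r"([A-Za-z0-9]+)(?:\[([a-z0-9]+)\])?=([A-Za-z0-9,]+)")


def parse_feats(feats):
    """Map (name, layer) to value; layer is None for the plain feature."""
    result = {}
    if feats == "_":  # conventional placeholder for "no features"
        return result
    for item in feats.split("|"):
        match = _FEATURE.fullmatch(item)
        if match is None:
            raise ValueError(f"malformed feature: {item!r}")
        name, layer, value = match.groups()
        result[(name, layer)] = value
    return result


dugunak = parse_feats("Number=Plur|Number[abs]=Sing|Number[erg]=Plur")
```

For dugunak, the nominal number is found under `("Number", None)` while the agreement with the absolutive and ergative arguments sits under `("Number", "abs")` and `("Number", "erg")`, so the values never collide.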
Sing
: singular absolutive argument
Examples: [eu] dakarkiogu Number[abs]=Sing|Number[dat]=Sing
Plur
: plural absolutive argument
Examples: [eu] dakarkiogu Number[erg]=Plur
Number[dat]
: number agreement with dative argument
The description of polypersonal agreement in Basque given under Number[abs] applies here as well.
Sing
: singular dative argument
Examples: [eu] dakarkiogu Number[abs]=Sing|Number[dat]=Sing
Plur
: plural dative argument
Examples: [eu] dakarkiogu Number[erg]=Plur
Number[erg]
: number agreement with ergative argument
The description of polypersonal agreement in Basque given under Number[abs] applies here as well.
Sing
: singular ergative argument
Examples: [eu] dakarkiogu Number[abs]=Sing|Number[dat]=Sing
Plur
: plural ergative argument
Examples: [eu] dakarkiogu Number[erg]=Plur
Number[obj]
: number agreement with object
Finite verbs in many Indo-European languages agree in person and number with their subject.
Some languages in other families are head-marking, which means that the verbal morphology can cross-reference
multiple core arguments, not just the subject. If the cross-reference involves the Number of the argument,
we have two layers of Number on the verb: Number[subj], and (for transitive verbs) Number[obj].
In Basque (a polypersonal language), certain verbs overtly mark agreement with up to three arguments: one in the absolutive case, one in ergative and one in dative. Thus in dakarkiogu “we bring it to him/her”, akar is the stem (ekarri = “bring”), d stands for “it” (absolutive argument is the direct object of transitive verbs), ki stands for the dative case, o stands for “he” and gu stands for “we” (ergative argument is the subject of transitive verbs).
Number[abs]
is the number of the absolutive argument of the verb. The corresponding feature in Interset 2.041 is calledabsnumber
.Number[erg]
is the number of the ergative argument of the verb. The corresponding feature in Interset 2.041 is calledergnumber
.Number[dat]
is the number of the dative argument of the verb. The corresponding feature in Interset 2.041 is calleddatnumber
.
One may want to use just Number
instead of Number[abs]
.
However, there are two issues with that (at least in Basque).
First, the absolutive argument is not always the subject. For transitive verbs, it is the object, so the parallelism with nominative-accusative languages would be weak anyway.
Second, and more important, some Basque finite verbs have additional morphemes of nominal inflection.
Thus their form reflects the person-number agreement with the absolutive argument (nor), and nominal inflection (case, number etc.) at the same time.
Examples: dena (Number=Sing|Number[abs]=Sing), dituena (Number=Sing|Number[abs]=Plur|Number[erg]=Sing), dugunak (Number=Plur|Number[abs]=Sing|Number[erg]=Plur), direnak (Number=Plur|Number[abs]=Plur).
So we reserve the Number feature for nominal inflection, and the Number[abs] feature for agreement.
Note that we also define Person[abs] and Polite[abs], although there is no direct conflict for these features. But it is better to have these features aligned with Person[erg], Polite[erg], Person[dat] and Polite[dat].
Sing
: singular object
Examples: [eu] dakarkiogu Number[abs]=Sing|Number[dat]=Sing
Dual
: dual object
Examples: [wbp] Nyanyi karnapalangu wawirrijarra. lit. see-NONPAST PRES-1SG(SUBJ)-3DU(OBJ) kangaroo-DU(ABS) “I see two kangaroos.”
Plur
: plural object
Examples: [eu] dakarkiogu Number[erg]=Plur
Number[psed]
: possessed object’s number
Number[psed] is the possessee’s (possessed, owned noun phrase’s) number. In
Hungarian, possession can be marked on the possessor or on the
possessed. It is possible, though rare, that a noun has three
distinct number features: its own grammatical number, number of its
possessor and number of its possession. Examples from the
Multext-East Hungarian lexicon:
- könnyedén (SSS)
- könny = a tear (singular)
- könnyed = your tear (singular owner)
- könnyedé = (possession) of your tear (singular possession)
- könnyedén = (on the possession) of your tear (superessive case)
- ellenfeleié (PSS)
- ellenfél = an opponent (singular)
- ellenfele = his/her/its opponent (singular owner)
- ellenfelei = his/her/its opponents (core plural, singular owner)
- ellenfeleié = (possession) of his/her/its opponents (singular possession)
- életeké (SPS)
- él = point (singular)
- élek = points (plural)
- élén = his/her/its point (singular owner)
- élünk = our point (plural owner)
- életeké = (possession) of our point (singular possession)
- tárgyalópartnereinkét (PPS)
- tárgyalópartner = negotiator (singular)
- tárgyalópartnerei = his/her/its negotiators (plural, singular owner)
- tárgyalópartnereinkét = (possession) of our negotiators (plural, plural owner, singular possession, accusative case)
Words marked for plural possessions are very rare, though. Note that in the following example from Multext-East, Columbus is marked for plural possession, but not for his own owner.
- Kolumbuszéinál
- Kolumbusz = Columbus (singular)
- Kolumbuszéi = (possessions) of Columbus (plural possession)
- Kolumbuszéinál = (at the possessions) of Columbus (adessive case)
Sing
: singular possession
Examples
- [hu] ellenfeleié “(possession) of his/her/its opponents” (singular possession)
Plur
: plural possession
Examples
- [hu] Kolumbuszéi “(possessions) of Columbus” (plural possession)
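The three number layers of a Hungarian noun can be written out as a single feature string. The following is a minimal sketch rather than part of the guidelines: the helper name `format_feats` is hypothetical, and the sketch assumes the usual CoNLL-U convention that features are joined by `|` and sorted alphabetically by name, ignoring case.

```python
def format_feats(features):
    """Serialize a {name: value} mapping into a FEATS-style string.

    Assumption: features are joined with "|" and ordered alphabetically
    by name, ignoring case. An empty mapping is written as "_".
    """
    if not features:
        return "_"
    items = sorted(features.items(), key=lambda kv: kv[0].lower())
    return "|".join(f"{name}={value}" for name, value in items)


# tárgyalópartnereinkét (PPS): plural noun, plural owner, singular possession.
# Only the three number layers are shown; other features (e.g. case) are omitted.
pps = {
    "Number[psor]": "Plur",  # number of the possessor ("our")
    "Number": "Plur",        # the noun's own number ("negotiators")
    "Number[psed]": "Sing",  # number of the possessed item
}
print(format_feats(pps))
```

The plain `Number` key stays reserved for the noun's own number, so the two possessive layers can be added or left out without touching it.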
Number[psor]
: possessor’s number
Possessives may have two different numbers: that of the possessed object (number agreement with the modified noun) and that of the possessor. The Number[psor] feature captures the possessor’s number.
Sing
: singular possessor
Examples
- [en] my, his, her, its
- [cs] můj pes “my dog” Number[psor]=Sing|Number=Sing
- [cs] mí psi “my dogs” Number[psor]=Sing|Number=Plur
- [hsb] twojim, jeho, jeje
Dual
: dual possessor
Examples
- [hsb] jeju
Plur
: plural possessor
Examples
- [en] our, their
- [cs] náš pes “our dog” Number[psor]=Plur|Number=Sing
- [cs] naši psi “our dogs” Number[psor]=Plur|Number=Plur
- [hsb] naš, waš, jich
Number[subj]
: number agreement with subject
Finite verbs in many Indo-European languages agree in person and number with their subject.
Some languages in other families are head-marking, which means that the verbal morphology can cross-reference multiple core arguments, not just the subject. If the cross-reference involves the Number of the argument, we have two layers of Number on the verb: Number[subj], and (for transitive verbs) Number[obj].
While it would be possible to make the subject layer the default and use just Number for it, the explicit labeling of both layers is probably more helpful in such languages, as it can reduce confusion.
In Basque (a polypersonal language), certain verbs overtly mark agreement with up to three arguments: one in the absolutive case, one in ergative and one in dative. Thus in dakarkiogu “we bring it to him/her”, akar is the stem (ekarri = “bring”), d stands for “it” (absolutive argument is the direct object of transitive verbs), ki stands for the dative case, o stands for “he” and gu stands for “we” (ergative argument is the subject of transitive verbs).
- Number[abs] is the number of the absolutive argument of the verb. The corresponding feature in Interset 2.041 is called absnumber.
- Number[erg] is the number of the ergative argument of the verb. The corresponding feature in Interset 2.041 is called ergnumber.
- Number[dat] is the number of the dative argument of the verb. The corresponding feature in Interset 2.041 is called datnumber.
One may want to use just Number instead of Number[abs].
However, there are two issues with that (at least in Basque).
First, the absolutive argument is not always the subject. For transitive verbs, it is the object, so the parallelism with nominative-accusative languages would be weak anyway.
Second, and more important, some Basque finite verbs have additional morphemes of nominal inflection.
Thus their form reflects the person-number agreement with the absolutive argument (nor), and nominal inflection (case, number etc.) at the same time.
Examples: dena (Number=Sing|Number[abs]=Sing), dituena (Number=Sing|Number[abs]=Plur|Number[erg]=Sing), dugunak (Number=Plur|Number[abs]=Sing|Number[erg]=Plur), direnak (Number=Plur|Number[abs]=Plur).
So we reserve the Number feature for nominal inflection, and the Number[abs] feature for agreement.
Note that we also define Person[abs] and Polite[abs], although there is no direct conflict for these features. But it is better to have these features aligned with Person[erg], Polite[erg], Person[dat] and Polite[dat].
Sing
: singular subject
Examples: [eu] dakarkiogu Number[abs]=Sing|Number[dat]=Sing
Plur
: plural subject
Examples: [eu] dakarkiogu Number[erg]=Plur
PartType
: particle type
Values: | Emp | Inf | Int | Mod | Neg | Vbp |
Types of particles are found in various tagsets and are highly language-specific. The list here is not exhaustive. Language-specific documentation should provide a version of this page tailored to the given language.
Mod
: modal particle
Examples: [bg] май (possibly), нека (let), [cs] ať, kéž, nechť (let)
Emp
: particle of emphasis
Examples: [bg] даже (even)
Inf
: infinitive marker
Examples: [en] to, [de] zu, [da] at, [sv] att
Int
: question particle
Required in some languages to form a yes-no question.
Examples: [pl] czy
Neg
: negation particle
Negates a clause or a smaller phrase.
Examples: [en] not, [de] nicht
Vbp
: separated verb prefix in German
They are analogous to verbal particles in other Germanic languages, which again overlap with adpositions and adverbs. Do we want to tag them as adpositions/adverbs and add this feature?
Examples: [de] vor (in stellen Sie sich vor)
Person
: person
Values: | 0 | 1 | 2 | 3 | 4 |
Person is typically a feature of personal and possessive pronouns / determiners, and of verbs. On verbs it is in fact an agreement feature that marks the person of the verb’s subject (some languages, e.g. Basque, can also mark the person of objects). Person marked on verbs makes it unnecessary to always add a personal pronoun as subject and thus subjects are sometimes dropped (pro-drop languages).
0
: zero person
Zero person is for impersonal statements; it appears in Finnish as well as in Santa Ana Pueblo Keres. The construction is distinctive in Finnish but it does not use unique morphology that would necessarily require a feature. However, it is morphologically distinct in Keres (Davis 1964:75): the fourth (zero) person is used “when the subject of the action is obscure, as when the speaker is telling of something that he himself did not observe. It is also used when the subject of the action is inferior to the object, as when an animal is the subject and a human being the object.”
Examples
- [kee] gàku “he (third person) bit him”
- [kee] c̓àku “he (zero/fourth person) bit him”
1
: first person
In singular, the first person refers just to the speaker / author. In plural, it must include the speaker and one or more additional persons. Some languages (e.g. Taiwanese) distinguish inclusive and exclusive 1st person plural pronouns: the former include the addressee of the utterance (i.e. I + you), the latter exclude them (i.e. I + they).
Examples
- [en] I, we
- [cs] dělám “I do”
2
: second person
In singular, the second person refers to the addressee of the utterance / text. In plural, it may mean several addressees and optionally some third persons too.
Examples
- [en] you
- [cs] děláš “you do”
3
: third person
The third person refers to one or more persons that are neither speakers nor addressees.
Examples
- [en] he, she, it, they
- [cs] dělá “he/she/it does”
4
: fourth person
The fourth person can be understood as a third person argument morphologically distinguished from another third person argument, e.g. in Navajo.
Examples
- [kee] gàku “he (third person) bit him”
- [kee] c̓àku “he (zero/fourth person) bit him”
References
- Davis, Irvine. 1964. The language of Santa Ana Pueblo (anthropological papers, no. 69). Smithsonian Institution Bureau of American Ethnology, Bulletin 191: Anthropological Papers, Numbers 68-74, Washington, DC: United States Government Printing Office, 53–190.
Person[abs]
: person agreement with the absolutive argument
Finite verbs in many Indo-European languages agree in person and number with their subject.
In Basque (a polypersonal language), certain verbs overtly mark agreement with up to three arguments: one in the absolutive case, one in ergative and one in dative. Thus in dakarkiogu “we bring it to him/her”, akar is the stem (ekarri = “bring”), d stands for “it” (absolutive argument is the direct object of transitive verbs), ki stands for the dative case, o stands for “he” and gu stands for “we” (ergative argument is the subject of transitive verbs).
- Person[abs] is the person of the absolutive argument of the verb. The corresponding feature in Interset 2.041 is called absperson.
- Person[erg] is the person of the ergative argument of the verb. The corresponding feature in Interset 2.041 is called ergperson.
- Person[dat] is the person of the dative argument of the verb. The corresponding feature in Interset 2.041 is called datperson.
One may want to use just Person instead of Person[abs].
However, there are two issues with that (at least in Basque).
First, the absolutive argument is not always the subject. For transitive verbs, it is the object, so the parallelism with nominative-accusative languages would be weak anyway.
Second, we cannot avoid Number[abs] (both Number and Number[abs] can occur at one word) and thus we keep Person[abs] to demonstrate that it is the same layer of agreement for both the features.
1
: first person absolutive argument
Examples: [eu] dakarkiogu Person[erg]=1
2
: second person absolutive argument
Examples: [eu] dakarkiozu Person[erg]=2
3
: third person absolutive argument
Examples: [eu] dakarkiogu Person[abs]=3|Person[dat]=3
Person[dat]
: person agreement with the dative argument
Finite verbs in many Indo-European languages agree in person and number with their subject.
In Basque (a polypersonal language), certain verbs overtly mark agreement with up to three arguments: one in the absolutive case, one in ergative and one in dative. Thus in dakarkiogu “we bring it to him/her”, akar is the stem (ekarri = “bring”), d stands for “it” (absolutive argument is the direct object of transitive verbs), ki stands for the dative case, o stands for “he” and gu stands for “we” (ergative argument is the subject of transitive verbs).
- Person[abs] is the person of the absolutive argument of the verb. The corresponding feature in Interset 2.041 is called absperson.
- Person[erg] is the person of the ergative argument of the verb. The corresponding feature in Interset 2.041 is called ergperson.
- Person[dat] is the person of the dative argument of the verb. The corresponding feature in Interset 2.041 is called datperson.
One may want to use just Person instead of Person[abs].
However, there are two issues with that (at least in Basque).
First, the absolutive argument is not always the subject. For transitive verbs, it is the object, so the parallelism with nominative-accusative languages would be weak anyway.
Second, we cannot avoid Number[abs] (both Number and Number[abs] can occur at one word) and thus we keep Person[abs] to demonstrate that it is the same layer of agreement for both the features.
1
: first person dative argument
Examples: [eu] dakarkiogu Person[erg]=1
2
: second person dative argument
Examples: [eu] dakarkiozu Person[erg]=2
3
: third person dative argument
Examples: [eu] dakarkiogu Person[abs]=3|Person[dat]=3
Person[erg]
: person agreement with the ergative argument
Finite verbs in many Indo-European languages agree in person and number with their subject.
In Basque (a polypersonal language), certain verbs overtly mark agreement with up to three arguments: one in the absolutive case, one in ergative and one in dative. Thus in dakarkiogu “we bring it to him/her”, akar is the stem (ekarri = “bring”), d stands for “it” (absolutive argument is the direct object of transitive verbs), ki stands for the dative case, o stands for “he” and gu stands for “we” (ergative argument is the subject of transitive verbs).
- Person[abs] is the person of the absolutive argument of the verb. The corresponding feature in Interset 2.041 is called absperson.
- Person[erg] is the person of the ergative argument of the verb. The corresponding feature in Interset 2.041 is called ergperson.
- Person[dat] is the person of the dative argument of the verb. The corresponding feature in Interset 2.041 is called datperson.
One may want to use just Person instead of Person[abs].
However, there are two issues with that (at least in Basque).
First, the absolutive argument is not always the subject. For transitive verbs, it is the object, so the parallelism with nominative-accusative languages would be weak anyway.
Second, we cannot avoid Number[abs] (both Number and Number[abs] can occur at one word) and thus we keep Person[abs] to demonstrate that it is the same layer of agreement for both the features.
1
: first person ergative argument
Examples
- [eu] dakarkiogu Person[erg]=1
- [eu] dizkizuet (lemma *edun) Number[erg]=Sing|Person[erg]=1|Number[abs]=Plur|Person[abs]=3|Number[dat]=Plur|Person[dat]=2
- [eu] dizkizugu (lemma *edun) Number[erg]=Plur|Person[erg]=1|Number[abs]=Plur|Person[abs]=3|Number[dat]=Sing|Person[dat]=2
2
: second person ergative argument
Examples
- [eu] dakarkiozu Person[erg]=2
- [eu] dizkidazu (lemma *edun) Number[erg]=Sing|Person[erg]=2|Number[abs]=Plur|Person[abs]=3|Number[dat]=Sing|Person[dat]=1
3
: third person ergative argument
Examples
- [eu] zizkigun, dizkigu (lemma *edun) Number[erg]=Sing|Person[erg]=3|Number[abs]=Plur|Person[abs]=3|Number[dat]=Plur|Person[dat]=1
- [eu] zizkieten, dizkiete (lemma *edun) Number[erg]=Plur|Person[erg]=3|Number[abs]=Plur|Person[abs]=3|Number[dat]=Plur|Person[dat]=3
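When several layers are marked on one verb form, it can help to regroup the features by argument. The sketch below is only an illustration, not an official tool: the function name `group_by_layer` and the regular expression are assumptions made for this example. It splits a feature string on `|`, separates each name from its bracketed layer, and collects the values per layer.

```python
import re
from collections import defaultdict

# A feature name optionally followed by a layer in square brackets,
# e.g. "Number" or "Number[erg]".
_NAME_RE = re.compile(r"^(?P<name>[^\[\]]+)(?:\[(?P<layer>[^\[\]]+)\])?$")


def group_by_layer(feats):
    """Group a FEATS-style string by layer.

    Returns {layer: {feature: value}}; unlayered features are stored
    under the key None. The string "_" stands for "no features".
    """
    if feats == "_":
        return {}
    grouped = defaultdict(dict)
    for pair in feats.split("|"):
        key, value = pair.split("=", 1)
        match = _NAME_RE.match(key)
        if match is None:
            raise ValueError(f"unexpected feature name: {key!r}")
        grouped[match.group("layer")][match.group("name")] = value
    return dict(grouped)


# Feature values of dizkizuet as listed above, in alphabetical order.
feats = ("Number[abs]=Plur|Number[dat]=Plur|Number[erg]=Sing|"
         "Person[abs]=3|Person[dat]=2|Person[erg]=1")
for layer, values in group_by_layer(feats).items():
    print(layer, values)
```

Keeping unlayered features under `None` means that a form such as dena, which carries both Number and Number[abs], keeps its nominal number apart from its agreement number.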
Person[obj]
: person agreement with object
Finite verbs in many Indo-European languages agree in person and number with their subject.
Some languages in other families are head-marking, which means that the verbal morphology can cross-reference multiple core arguments, not just the subject. If the cross-reference involves the Person of the argument, we have two layers of Person on the verb: Person[subj], and (for transitive verbs) Person[obj].
In Basque (a polypersonal language), certain verbs overtly mark agreement with up to three arguments: one in the absolutive case, one in ergative and one in dative. Thus in dakarkiogu “we bring it to him/her”, akar is the stem (ekarri = “bring”), d stands for “it” (absolutive argument is the direct object of transitive verbs), ki stands for the dative case, o stands for “he” and gu stands for “we” (ergative argument is the subject of transitive verbs).
- Person[abs] is the person of the absolutive argument of the verb. The corresponding feature in Interset 2.041 is called absperson.
- Person[erg] is the person of the ergative argument of the verb. The corresponding feature in Interset 2.041 is called ergperson.
- Person[dat] is the person of the dative argument of the verb. The corresponding feature in Interset 2.041 is called datperson.
One may want to use just Person instead of Person[abs].
However, there are two issues with that (at least in Basque).
First, the absolutive argument is not always the subject. For transitive verbs, it is the object, so the parallelism with nominative-accusative languages would be weak anyway.
Second, we cannot avoid Number[abs] (both Number and Number[abs] can occur at one word) and thus we keep Person[abs] to demonstrate that it is the same layer of agreement for both the features.
1
: first person object
Examples: [eu] dakarkiogu Person[erg]=1
2
: second person object
Examples: [eu] dakarkiozu Person[erg]=2
3
: third person object
Examples: [eu] dakarkiogu Person[abs]=3|Person[dat]=3
Person[psor]
: possessor’s person
Person[psor] is the possessor’s person, marked e.g. on Hungarian nouns. These noun forms would be translated into English as possessive pronoun + noun.
This layered feature is conveniently used for possessive inflections of nouns, although nouns normally do not have a Person feature, meaning that no other layers are needed. Nevertheless, the possessive morphology typically also includes Number, which must be multi-layered on nouns, and we thus have Person[psor] together with Number[psor].
This layered feature is normally not used with possessive pronouns. They traditionally have just simple Person. (And in some languages, possessive pronouns are actually identical to personal pronouns in the genitive case.)
1
: first person possessor
Examples: [hu] kutya = dog; kutyám = my dog; kutyánk = our dog.
2
: second person possessor
Examples: [hu] kutya = dog; kutyád = your.Sing dog; kutyátok = your.Plur dog.
3
: third person possessor
Examples: [hu] kutya = dog; kutyája = his/her/its dog; kutyájuk = their dog.
Person[subj]
: person agreement with subject
Finite verbs in many Indo-European languages agree in person and number with their subject.
Some languages in other families are head-marking, which means that the verbal morphology can cross-reference multiple core arguments, not just the subject. If the cross-reference involves the Person of the argument, we have two layers of Person on the verb: Person[subj], and (for transitive verbs) Person[obj].
While it would be possible to make the subject layer the default and use just Person for it, the explicit labeling of both layers is probably more helpful in such languages, as it can reduce confusion.
In Basque (a polypersonal language), certain verbs overtly mark agreement with up to three arguments: one in the absolutive case, one in ergative and one in dative. Thus in dakarkiogu “we bring it to him/her”, akar is the stem (ekarri = “bring”), d stands for “it” (absolutive argument is the direct object of transitive verbs), ki stands for the dative case, o stands for “he” and gu stands for “we” (ergative argument is the subject of transitive verbs).
- Person[abs] is the person of the absolutive argument of the verb. The corresponding feature in Interset 2.041 is called absperson.
- Person[erg] is the person of the ergative argument of the verb. The corresponding feature in Interset 2.041 is called ergperson.
- Person[dat] is the person of the dative argument of the verb. The corresponding feature in Interset 2.041 is called datperson.
One may want to use just Person instead of Person[abs].
However, there are two issues with that (at least in Basque).
First, the absolutive argument is not always the subject. For transitive verbs, it is the object, so the parallelism with nominative-accusative languages would be weak anyway.
Second, we cannot avoid Number[abs] (both Number and Number[abs] can occur at one word) and thus we keep Person[abs] to demonstrate that it is the same layer of agreement for both the features.
1
: first person subject
Examples: [eu] dakarkiogu Person[erg]=1
2
: second person subject
Examples: [eu] dakarkiozu Person[erg]=2
3
: third person subject
Examples: [eu] dakarkiogu Person[abs]=3|Person[dat]=3
Polarity
: polarity
Values: | Neg | Pos |
Polarity is typically a feature of verbs,
adjectives, sometimes also adverbs and
nouns in languages that negate using bound
morphemes.
In languages that negate using a function word, Polarity is used to mark that function word, unless it is a pro-form already marked with PronType=Neg (see below).
Positive polarity (affirmativeness) is rarely, if at all, encoded using overt morphology. The feature value Polarity=Pos is usually used to signal that a lemma has negative forms but this particular form is not negative. Using the feature in such cases is somewhat optional for words that can be negated but rarely are. Language-specific documentation should define under which circumstances the positive polarity is annotated.
In Czech, for instance, all verbs and adjectives can be negated using the prefix ne-.
In English, verbs are negated using the particle not.
English adjectives can be negated with not, or sometimes using prefixes
(wise – unwise, probable – improbable),
although the use of prefixes is less productive than in Czech.
In general, only the most grammatical (as opposed to lexical) forms of negation should receive Polarity=Neg.
Note that Polarity=Neg is not the same thing as PronType=Neg. For pronouns and other pronominal parts of speech there is no such binary opposition as for verbs and adjectives. (There is no such thing as an “affirmative pronoun”.)
The Polarity feature can also be used to distinguish the response interjections yes and no.
Pos
: positive, affirmative
Examples
- [cs] přišel “he came”
- [cs] velký “big”
- [en] yes
Neg
: negative
Examples
- [cs] nepřišel “he did not come”
- [cs] nevelký “not big”
- [en] not
- [en] nor
- [en] no as in no, I don’t think so; but not as in we have no bananas
Polite
: politeness
Values: | Elev | Form | Humb | Infm |
Various languages have various means to express politeness or respect; some of the means are morphological. Three to four dimensions of politeness are distinguished in linguistic literature. The Polite feature currently covers (and mixes) two of them; a more elaborate system of feature values may be devised in future versions of UD if needed. The two axes covered are:
- speaker-referent axis (meant to include the addressee when he happens to be the referent)
- speaker-addressee axis (word forms depend on who is the addressee, although the addressee is not referred to)
Changing pronouns and/or person and/or number of the verb forms when respectable persons are addressed in Indo-European languages belongs to the speaker-referent axis because the honorific pronouns are used to refer to the addressee.
In Czech, formal second person has the same form for singular and plural, and is identical to informal second person plural. This involves both the pronoun and the finite verb but not a participle, which has no special formal form (that is, formal singular is identical to informal singular, not to informal plural).
In German, Spanish or Hindi, both number and person are changed (informal third person is used as formal second person) and in addition, special pronouns are used that only occur in the formal register ([de] Sie; [es] usted, ustedes; [hi] आप āpa).
In Japanese, verbs and other words have polite and informal forms but the polite forms are not referring to the addressee (they are not in second person). They are just used because of who the addressee is, even if the topic does not involve the addressee at all. This kind of polite language is called teineigo (丁寧語) and belongs to the speaker-addressee axis. Nevertheless, we currently use the same values for both axes, i.e. Polite=Form can be used for teineigo too. This approach may be refined in future.
Infm
: informal register
Usage varies but if the language distinguishes levels of politeness, then the informal register is usually meant for communication with family members and close friends.
Examples
- [cs] ty jdeš / vy jdete (you go.Sing/Plur)
- [de] du gehst / ihr geht (you go.Sing/Plur)
- [es] tú vas / vosotros vais (you go.Sing/Plur)
- [ja] 行かない ikanai (will not go)
Form
: formal register
Usage varies but if the language distinguishes levels of politeness, then the polite register is usually meant for communication with strangers and people of higher social status than the speaker.
Examples
- [cs] vy jdete (you go.Sing/Plur)
- [de] Sie gehen (you go.Sing/Plur)
- [es] usted va / ustedes van (you go.Sing/Plur)
- [ja] 行きません ikimasen (will not go)
Elev
: referent elevating
This register belongs to the speaker-referent axis and can be seen as a subtype of the formal register there. As an example, Japanese sonkeigo (尊敬語) is a set of honorific forms that elevate the status of the referent.
Examples
- [ja] なさる nasaru, なさいます nasaimasu (to do; when talking about a customer or a superior)
Humb
: speaker humbling
This register belongs to the speaker-referent axis and can be seen as a subtype of the formal register there. As an example, Japanese kenjōgo (謙譲語) is a set of honorific forms that lower the speaker’s status, thereby raising the referent’s status by comparison.
Examples
- [ja] いたす itasu, いたします itashimasu (to do; when referring to one’s own actions or the actions of a group member)
References
- Brown, Penelope and Stephen C. Levinson. 1987. Politeness: Some Universals in Language Usage. Studies in Interactional Sociolinguistics, Cambridge, UK: Cambridge University Press.
- Comrie, Bernard. 1976. Linguistic politeness axes: Speaker-addressee, speaker-referent, speaker-bystander. Pragmatics Microfiche 1.7(A3). Department of Linguistics, University of Cambridge.
- Wenger, James R. 1982. Some Universals of Honorific Language with Special Reference to Japanese. Ph.D. thesis, University of Arizona, Tucson, AZ.
Polite[abs]
: politeness agreement with absolutive argument
Finite verbs in many Indo-European languages agree in person and number with their subject; for the second person this also affects the politeness register. In Basque (a polypersonal language), certain verbs overtly mark agreement with up to three arguments: one in the absolutive case, one in ergative and one in dative. Thus in dakarkiogu “we bring it to him/her”, akar is the stem (ekarri = “bring”), d stands for “it” (absolutive argument is the direct object of transitive verbs), ki stands for the dative case, o stands for “he” and gu stands for “we” (ergative argument is the subject of transitive verbs).
- Polite[abs] is the politeness of the absolutive argument of the verb. The corresponding feature in Interset 2.041 is called abspoliteness.
- Polite[erg] is the politeness of the ergative argument of the verb. The corresponding feature in Interset 2.041 is called ergpoliteness.
- Polite[dat] is the politeness of the dative argument of the verb. The corresponding feature in Interset 2.041 is called datpoliteness.
One may want to use just Polite instead of Polite[abs].
However, there are two issues with that (at least in Basque).
First, the absolutive argument is not always the subject. For transitive verbs, it is the object, so the parallelism with nominative-accusative languages would be weak anyway.
Second, we cannot avoid Number[abs] (both Number and Number[abs] can occur at one word) and thus we keep Polite[abs] to demonstrate that it is the same layer of agreement for both the features.
Infm
: informal absolutive argument
Examples: [eu] ezan, ezak Polite[erg]=Infm
Form
: polite, formal absolutive argument
Examples: [eu] ezazu Polite[erg]=Form (politeness-neutral form is ezazue)
Polite[dat]
: politeness agreement with dative argument
Finite verbs in many Indo-European languages agree in person and number with their subject; for the second person this also affects the politeness register. In Basque (a polypersonal language), certain verbs overtly mark agreement with up to three arguments: one in the absolutive case, one in ergative and one in dative. Thus in dakarkiogu “we bring it to him/her”, akar is the stem (ekarri = “bring”), d stands for “it” (absolutive argument is the direct object of transitive verbs), ki stands for the dative case, o stands for “he” and gu stands for “we” (ergative argument is the subject of transitive verbs).
- Polite[abs] is the politeness of the absolutive argument of the verb. The corresponding feature in Interset 2.041 is called abspoliteness.
- Polite[erg] is the politeness of the ergative argument of the verb. The corresponding feature in Interset 2.041 is called ergpoliteness.
- Polite[dat] is the politeness of the dative argument of the verb. The corresponding feature in Interset 2.041 is called datpoliteness.
One may want to use just Polite instead of Polite[abs].
However, there are two issues with that (at least in Basque).
First, the absolutive argument is not always the subject. For transitive verbs, it is the object, so the parallelism with nominative-accusative languages would be weak anyway.
Second, we cannot avoid Number[abs] (both Number and Number[abs] can occur at one word) and thus we keep Polite[abs] to demonstrate that it is the same layer of agreement for both the features.
Infm
: informal dative argument
Examples: [eu] ezan, ezak Polite[erg]=Infm
Form
: polite, formal dative argument
Examples: [eu] ezazu Polite[erg]=Form (politeness-neutral form is ezazue)
Polite[erg]
: politeness agreement with ergative argument
Finite verbs in many Indo-European languages agree in person and number with their subject; for the second person this also affects the politeness register. In Basque (a polypersonal language), certain verbs overtly mark agreement with up to three arguments: one in the absolutive case, one in ergative and one in dative. Thus in dakarkiogu “we bring it to him/her”, akar is the stem (ekarri = “bring”), d stands for “it” (absolutive argument is the direct object of transitive verbs), ki stands for the dative case, o stands for “he” and gu stands for “we” (ergative argument is the subject of transitive verbs).
- Polite[abs] is the politeness of the absolutive argument of the verb. The corresponding feature in Interset 2.041 is called abspoliteness.
- Polite[erg] is the politeness of the ergative argument of the verb. The corresponding feature in Interset 2.041 is called ergpoliteness.
- Polite[dat] is the politeness of the dative argument of the verb. The corresponding feature in Interset 2.041 is called datpoliteness.
One may want to use just Polite instead of Polite[abs].
However, there are two issues with that (at least in Basque).
First, the absolutive argument is not always the subject. For transitive verbs, it is the object, so the parallelism with nominative-accusative languages would be weak anyway.
Second, we cannot avoid Number[abs] (both Number and Number[abs] can occur at one word) and thus we keep Polite[abs] to demonstrate that it is the same layer of agreement for both the features.
Infm
: informal ergative argument
Examples: [eu] ezan, ezak Polite[erg]=Infm
Form
: polite, formal ergative argument
Examples: [eu] ezazu Polite[erg]=Form (politeness-neutral form is ezazue)
Poss
: possessive
Values: | Yes |
Boolean feature of pronouns, determiners or adjectives. It tells whether the word is possessive.
While many tagsets would have “possessive” as one of the various pronoun types, this feature is intentionally separate from PronType, as it is orthogonal to pronominal types. Several of the pronominal types can be optionally possessive, and adjectives can too.
Yes
: it is possessive
Note that there is no No value. If the word is not possessive, the Poss feature is simply not mentioned in the FEATS column (so an empty value has the meaning of No).
Examples
- [en] my, your, his, mine, yours, whose
- [cs] possessive determiners: můj, tvůj, jeho, její, náš, váš, svůj, čí, jejichž
- [cs] possessive adjectives: otcův “father’s”, matčin “mother’s”
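Because Poss has no No value, code that reads the feature has to treat absence as the negative case. The following is a minimal sketch, assuming the usual `Name=Value|Name=Value` feature string with `_` for an empty column; the function names and the sample strings are hypothetical.

```python
def parse_feats(feats):
    """Turn a feature string like "Poss=Yes|PronType=Prs" into a dict."""
    if feats == "_":  # "_" marks an empty feature column
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))

def is_possessive(feats):
    # There is no Poss=No: a missing Poss feature is read as "not possessive".
    return parse_feats(feats).get("Poss") == "Yes"

print(is_possessive("Poss=Yes|PronType=Prs"))  # e.g. [en] my
print(is_possessive("PronType=Prs"))           # e.g. [en] I
print(is_possessive("_"))                      # no features at all
```

The same pattern applies to any Boolean feature that only has the value Yes.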
PrepCase
: case form sensitive to prepositions
Personal pronouns in some languages have different forms depending on whether they are objects of prepositions or not. For instance, Czech on (he) without prepositions has the forms jemu/DAT, jeho/ACC, jím/INS, while with a preposition it is němu/DAT, něho/ACC, ním/INS. Similarly, Portuguese pronouns in prepositional oblique case take forms different from oblique pronouns serving as direct objects of verbs: eu/NOM (I), me/ACC (give me that), mim/PREP-ACC (come to me).
Default empty value means that the word form is neutral w.r.t. prepositions.
Npr
: non-prepositional case
This word form must not be used after a preposition.
Examples: [cs] jemu “him” (dative)
Pre
: prepositional case
This word form must be used after a preposition.
Examples: [cs] k němu “to him” (dative)
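The Czech forms quoted above can be arranged as a lookup keyed by case and PrepCase value. This is only an illustrative sketch: the table `FORMS` and the function `pronoun_form` are hypothetical, and the table covers just the six forms of on mentioned in this section.

```python
# Forms of Czech "on" (he) quoted in the text, keyed by (Case, PrepCase).
FORMS = {
    ("Dat", "Npr"): "jemu", ("Dat", "Pre"): "němu",
    ("Acc", "Npr"): "jeho", ("Acc", "Pre"): "něho",
    ("Ins", "Npr"): "jím",  ("Ins", "Pre"): "ním",
}

def pronoun_form(case, after_preposition):
    """Pick the form of "on" for a case, depending on a preceding preposition."""
    prep_case = "Pre" if after_preposition else "Npr"
    return FORMS[(case, prep_case)]

print(pronoun_form("Dat", after_preposition=False))
print("k " + pronoun_form("Dat", after_preposition=True))
```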
PronType
: pronominal type
Values: | Art | Dem | Emp | Exc | Ind | Int | Neg | Prs | Rcp | Rel | Tot |
This feature typically applies to pronouns, pronominal adjectives (determiners), pronominal numerals (quantifiers) and pronominal adverbs.
Prs
: personal or possessive personal pronoun or determiner
See also the Poss feature, which distinguishes normal personal pronouns from possessives. Note that Prs also includes reflexive personal/possessive pronouns (e.g. [cs] se / svůj; see the Reflex feature).
Examples
- [en] I, you, he, she, it, we, they, my, your, his, her, its, our, their, mine, yours, hers, ours, theirs
- [cs] já, ty, on, ona, ono, my, vy, oni, ony, se, můj, tvůj, jeho, její, náš, váš, jejich, svůj
Rcp
: reciprocal pronoun
This value is used for pronouns that are specifically reciprocal. If a reflexive pronoun can be used to convey reciprocal meaning, it is still labeled as reflexive (PronType=Prs|Reflex=Yes). It is not marked as reciprocal in contexts in which it is used reciprocally.
Reciprocal means that there is a plural subject and every member of the group does the thing described by the predicate to every other member of the group. A reciprocal pronoun is used in the object position to signal such a configuration.
Examples
- [de] einander “each other”
- [da] hinanden “each other”
Art
: article
Article is a special case of determiner that bears the feature of definiteness (in other languages, the feature may be marked directly on nouns).
Examples
- [en] a, an, the
- [de] ein, eine, der, die, das
- [es] un, una, el, la
Int
: interrogative pronoun, determiner, numeral or adverb
Note that possessive interrogative determiners (whose) can be distinguished by the Poss feature.
Examples:
- [cs/en] kdo / who, co / what, který / which, čí / whose, kolik / how many, how much, kolikátý / how-maniest (ordinal quantifier), kolikrát / how many times, kde / where, kam / where to, kdy / when, jak / how, proč / why
Rel
: relative pronoun, determiner, numeral or adverb
Note that in many languages this class heavily overlaps with interrogatives, yet there are pronouns that are only relative, and in some languages (Bulgarian, Hindi) the two classes are distinct.
Examples:
- [cs] jenž, což “which”, “that” (relative but not interrogative pronouns); jehož “whose” (possessive relative pronoun)
Exc
: exclamative determiner
Exclamative pro-adjectives (determiners) express the speaker’s surprise towards the modified noun, e.g. what in “What a surprise!” In many languages, exclamative determiners are recruited from the set of interrogative determiners. Therefore, not all tagsets distinguish them.
Examples:
- [it] che
- [cs] jaký as in “Jaké překvapení!”
- [en] what as in “What a surprise!”
Dem
: demonstrative pronoun, determiner, numeral or adverb
These are often parallel to interrogatives. Some tagsets might also distinguish a separate feature of distance (here / there; [es] aquí / ahí / allí).
Examples
- [cs/en] tento / this, tamten / that, takový / such, týž / same, tolik / so much, tolikátý / so-maniest (ordinal number), tolikrát / so many times, tady / here, tam / there, teď / now, tehdy / then, tak / so
Emp
: emphatic determiner
Emphatic pro-adjectives (determiners) emphasize the nominal they depend on. There are similarities with reflexive and demonstrative pronouns / determiners.
Examples
- [ro] însuși
- [cs] sám
- [en] himself as in “He himself did it.”
Tot
: total (collective) pronoun, determiner or adverb
Examples
- [cs/en] každý / every, everybody, everyone, each, všechno / everything, all, všude / everywhere, vždy / always
Neg
: negative pronoun, determiner or adverb
Negative pronominal words are distinguished from negating particles and from words that inflect for polarity (verbs, adjectives etc.). Those words do not use PronType=Neg; they use Polarity=Neg instead. See the Polarity feature for further details.
Examples:
- [cs/en] nikdo / nobody, nic / nothing, nijaký / no, ničí / no one’s (possessive negative determiner), žádný / no, none, nikde / nowhere, nikdy / never, nijak / no way (lit. “no-how”)
Ind
: indefinite pronoun, determiner, numeral or adverb
Note that some tagsets might further subclassify this category to distinguish “some” from “any” etc. Such distinctions are not part of universal features but may be added in language-specific extensions.
Examples
- [cs/en] někdo / somebody, něco / something, některý / some, něčí / someone’s (possessive indefinite pronoun), několik / a few, several (indefinite numeral/quantifier), několikátý / “a fewth”, “severalth” (indefinite ordinal numeral), několikrát / a few times, several times, někde / somewhere, někdy / sometimes, nějak / somehow
- [cs/en] kdokoli / anybody, cokoli / anything, kterýkoli / any, číkoli / anyone’s (possessive indefinite pronoun), kdekoli / anywhere, kdykoli / any time, jakkoli / anyhow
- [cs/en] málokdo / few people, leckdo / quite a few people, kdosi / somebody…
PunctSide
: which side of paired punctuation is this?
Distinguishes between the initial and final forms of paired punctuation (brackets, quotation marks, question and exclamation marks in Spanish). Note that “initial” and “final” are better terms than “left” and “right”. The latter would be confusing in languages written from right to left, like Arabic.
Ini
: initial (left bracket in English texts)
Examples
- [is] „gríðarlegan fjölda“ “a huge number”
Fin
: final (right bracket in English texts)
Examples
- [is] „gríðarlegan fjölda“ “a huge number”
PunctType
: punctuation type
Values: | Brck | Colo | Comm | Dash | Elip | Excl | Peri | Qest | Quot | Semi | Slsh |
Many tagsets have just one tag for punctuation. Others classify punctuation in more detail.
Peri
: period at the end of sentence or clause
Examples
- [es] ¿Por qué? -, se pregunta. “Why? – she wonders.”
Elip
: ellipsis
Examples
- [pl] Nie wiem, dlaczego ją wybrałem… “I don’t know why I chose her …”
Qest
: question mark
Examples
- [es] ¿Por qué? -, se pregunta. “Why? – she wonders.”
Excl
: exclamation mark
Examples
- [es] ¡Fijese en la lectura racista de la Sirenita!. “Notice the racist reading of the Little Mermaid!”
Quot
: quotation marks (various sorts in various languages)
Examples
- [es] El experto reconoció que estaba ”preocupado” “The expert acknowledged that he was “concerned””
Brck
: bracket
Examples
- [es] Ejército del Sur del Líbano (ELS) “South Lebanon Army (SLA)”
Comm
: comma
Examples
- [es] ¿Por qué? -, se pregunta. “Why? – she wonders.”
Colo
: colon
Examples
- [es] La mejor defensa: La postura española. “The best defense: The Spanish position.”
Semi
: semicolon
Examples
- [es] Las distancias son largas; las esperas, desesperantes. “The distances are long; the waiting desperate.”
Dash
: dash, hyphen
Examples
- [es] ¿Por qué? -, se pregunta. “Why? – she wonders.”
Slsh
: slash or backslash
Examples
- [pl] Zaczynała w RSC w połowie lat 60., grając pacjentkę szpitala dla obłąkanych w „Marat/Sade”. “She started out at the RSC in the mid-Sixties playing an asylum-inmate in Marat/Sade.”
Reflex
: reflexive
Values: | Yes |
Boolean feature, typically of pronouns or determiners. It tells whether the word is reflexive, i.e. refers to the subject of its clause.
While many tagsets would have “reflexive” as one of the various
pronoun types, this feature is intentionally separate from
PronType.
When used with pronouns and determiners, it should be combined with PronType=Prs, regardless of whether they really distinguish the Person feature (in some languages they do, in others they do not).
Note that forms that are canonically reflexive sometimes have other functions in the language, too. The feature Reflex=Yes denotes the word type, not its actual function in context (which can be distinguished by dependency relation types). Hence the feature is not restricted to situations where the word is used truly reflexively.
For example, reflexive clitics in European languages often have a wide array of possible functions (middle, passive, inchoative, impersonal, or even as a lexical morpheme). Besides that, reflexives in some languages are also used for emphasis (while other languages have separate emphatic pronouns), and in some languages they signal reciprocity (while other languages have separate reciprocal pronouns).
Using Reflex=Yes with all of them has the benefit that they can be easily identified. However, if annotators can distinguish contexts where a reflexive pronoun is used reciprocally or emphatically, Reflex=Yes may be combined with PronType=Rcp or PronType=Emp instead of PronType=Prs.
Note that while some languages also have reflexive verbs, these are in fact fused verbs with reflexive pronouns, as in Spanish despertarse or Russian проснуться (both meaning “to wake up”). Thus in these cases the fused token will be split into two syntactic words, one of them being a reflexive pronoun. In languages where the reflexive pronoun is not split off, it may be more appropriate to mark the verb as middle Voice than to use Reflex=Yes with the verb.
Yes
: it is reflexive
Note that there is no No value. If the word is not reflexive, the Reflex feature is simply not mentioned in the FEATS column (so an empty value has the meaning of No).
Examples
- [cs] reflexive personal pronouns: se, si; reflexive possessive pronoun: svůj
Style
: style or sublanguage to which this word form belongs
Values: | Arch | Coll | Expr | Form | Rare | Slng | Vrnc | Vulg |
This may be a lexical feature (some words-lemmas are archaic, some are colloquial) or a morphological feature (inflectional patterns may systematically change between dialects or styles). English pronouns offer a useful case study: thou is archaic; whom is often somewhat formal; ya is colloquial, used in a casual/familiar way (See ya!); y’all is vernacular (especially associated with certain regions); and wtf is arguably an expressive variant of the pronoun what in contexts where a nominal is required (Wtf are you doing?!).
Besides real morphology, the choices that make a particular word form belong to a different style may also be orthographic.
This feature could be used in many languages but only a few choose to actually annotate it. Seen in Bulgarian, Czech, Danish, English, Finnish and Hungarian.
Arch
: archaic, obsolete
This value should be used if it is desirable in a language to mark archaic lexemes or archaic morphological forms.
Language-specific guidelines must define what exactly it means to be archaic. Note that there are theoretical
problems, especially if we want to annotate diachronic corpora with various stages of the language. There is only
one set of guidelines per language, which should accommodate all stages and genres. It would be unfortunate if
most words in older texts had to be labeled as Style=Arch. Hence, the only useful application of the feature is
probably for words that were already archaic at the time of production of the text.
Examples
- [en] Thou shalt not kill. (The modern equivalent would be You shall not kill.)
Rare
: rare
Examples
- [cs] Co ale dohnalo mladého ambiciosního člověka k sebevraždě? “But what drove the young, ambitious man to commit suicide?” (The more frequent equivalent would be ambiciózního.)
Form
: formal, literary
Examples
- [da] Det vil hindre mange misforståelser mellem vore to partier. “It will prevent many misunderstandings between our two parties.”
Coll
: colloquial
Examples
- [cs] Pojedete do zahraničí s cestovkou? “Are you going abroad with a travel agency?” (The more formal equivalent would be cestovní kanceláří.)
Vrnc
: vernacular
Examples
- [cs] A tak jsem po čase dělal kmotra nové knize: “Slovácko sa nesúdí“. “And so, over time, I made the godfather of a new book: “Slovácko sa nesúdí” (“Slovácko does not judge”).” (This is an East-Moravian dialect of Czech; its standard equivalent would be se nesoudí.)
Slng
: slang
Examples
- [cs] Superdobrý kšeft ovlivňuje jednoznačně počasí. “The super-good business is clearly affected by the weather.” (A more neutral equivalent would be obchod, výnos, výdělek.)
Expr
: expressive, emotional
This indicates a distinctive morphological or spelling choice for added expressiveness (with respect to pronunciation or meaning).
In the case of an expressive spelling variant, this feature should be paired with a CorrectForm in the MISC column, as explained in the page on typos.
Compare the Typo feature, which covers errors and typographical unexpectedness.
Examples
- [cs] Vezeme také několik set čokoládiček. “We also take several hundred chocolates.” (The diminutive signals affection rather than size. The neutral equivalent would be čokolád.)
- [en] Kinds of expressive spelling variation include: expressive lengthening (niiiiice), dialectal or colloquial pronunciation (Hahvahd), censored characters (sh*t), symbolic characters (CA$H), etc. As CA$H defies typographical convention, it should also be labeled Typo=Yes.
Vulg
: vulgar
Examples
- [cs] Doporučuji vrátit parchanta do košíčku, postrčit po vodě a na hloubce ho převrhnout. “I recommend returning the bastard to the basket, pushing it over the water and overturning it at depth.”
Subcat
: subcategorization
Values: | Ditr | Indir | Intr | Tran |
Lexical feature of verbs. Some tagsets distinguish intransitive and transitive verbs. In many languages however, subcategorization of verbs is much more complex than this.
Intr
: intransitive verb
A verb that does not take arguments other than the subject.
Examples
- [en] to go
Indir
: indirect verb
A verb that does not require a direct object but requires an oblique argument.
Examples
- [en] to rely on something
Tran
: transitive verb
A verb that takes a direct (accusative) object as argument (in addition to the subject). These verbs can be passivized; the direct object then becomes the subject.
Examples
- [en] to do something, to be done by somebody
Ditr
: ditransitive verb
A verb that takes two core objects as arguments (in addition to the subject). These verbs can be passivized.
Examples
- [en] to give somebody something, to be given something by somebody
Tense
: tense
Values: | Fut | Imp | Past | Pqp | Pres |
Tense is typically a feature of verbs. It may also occur with other parts of speech (nouns, adjectives, adverbs), depending on whether borderline word forms such as participles are classified as verbs or as the other category.
Tense is a feature that specifies the time when the action took / takes / will take place, in relation to a reference point. The reference is often the moment of producing the sentence, but it can be also another event in the context. In some languages (e.g. English), some tenses are actually combinations of tense and aspect. In other languages (e.g. Czech), aspect and tense are separate, although not completely independent of each other.
Note that we are defining features that apply to a single word. If a tense is constructed periphrastically (two or more words, e.g. auxiliary verb indicative + participle of the main verb) and none of the participating words are specific to this tense, then the features will probably not directly reveal the tense. For instance, [en] I had been there is past perfect (pluperfect) tense, formed periphrastically by the simple past tense of the auxiliary to have and the past participle of the main verb to be. The auxiliary will be tagged VerbForm=Fin|Mood=Ind|Tense=Past and the participle will have VerbForm=Part|Tense=Past; neither of the two will have Tense=Pqp. On the other hand, Portuguese can form the pluperfect morphologically as just one word, such as estivera, which will thus be tagged VerbForm=Fin|Mood=Ind|Tense=Pqp.
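The contrast between the periphrastic English pluperfect and the one-word Portuguese form can be checked mechanically. The following is a minimal sketch using the feature strings quoted above; the token lists and the helper names `tenses` and `has_pluperfect_form` are hypothetical, hand-written for illustration rather than read from a treebank.

```python
# Hand-written (form, features) pairs for the two examples in the text.
had_been = [
    ("had", "VerbForm=Fin|Mood=Ind|Tense=Past"),   # auxiliary, simple past
    ("been", "VerbForm=Part|Tense=Past"),          # past participle
]
estivera = [("estivera", "VerbForm=Fin|Mood=Ind|Tense=Pqp")]

def tenses(tokens):
    """Collect the Tense value of each word (None if the feature is absent)."""
    values = []
    for _form, feats in tokens:
        pairs = dict(pair.split("=", 1) for pair in feats.split("|"))
        values.append(pairs.get("Tense"))
    return values

def has_pluperfect_form(tokens):
    """True only if some single word carries Tense=Pqp."""
    return "Pqp" in tenses(tokens)

print(tenses(had_been), has_pluperfect_form(had_been))
print(tenses(estivera), has_pluperfect_form(estivera))
```

A search for Tense=Pqp alone will therefore miss analytic pluperfects such as had been; finding those requires looking at the construction, not at single-word features.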
Past
: past tense / preterite / aorist
The past tense denotes actions that happened before a reference point. In the prototypical case, the reference point is the moment of producing the sentence and the past event happened before the speaker speaks about it. However, Tense=Past is also used to distinguish past participles from other kinds of participles, and past converbs from other kinds of converbs; in these cases, the reference point may itself be in the past or future, when compared to the moment of speaking. For instance, the Czech converb spatřivše “having seen” in the sentence spatřivše vojáky, velmi se ulekli “having seen the soldiers, they got very scared” describes an event that is anterior to the event of getting scared. It also happens to be anterior to the moment of speaking, but that fact is not encoded in the converb itself; it is rather a consequence of “getting scared” being in the past tense.
Among finite forms, the simple past in English is an example of Tense=Past. In German, this is the Präteritum. In Turkish, this is the non-narrative past. In Bulgarian, this is the aorist, the aspect-neutral past tense that can be used freely with both imperfective and perfective verbs (see also imperfect).
Examples
- [en] he went home
- [en] he has gone home
Pres
: present / non-past tense / aorist
The present tense denotes actions that are in progress (or states that are valid) at a reference point; it may also describe events that usually happen.
In the prototypical case, the reference point is the moment of producing the sentence; however, Tense=Pres is also used to distinguish present participles from other kinds of participles, and present converbs from other kinds of converbs. In these cases, the reference point may be in the past or future when compared to the moment of speaking. For instance, the English present participle may be used to form a past progressive tense: he was watching TV when I arrived.
Some languages (e.g. Uralic) only distinguish past vs. non-past morphologically, and then Tense=Pres can be used to represent the non-past form. (In some grammar descriptions, e.g. Turkic or Mongolic, this non-past form may be termed aorist, but note that in other languages the term is actually used for a past tense, as noted above. Therefore the term is better avoided in UD annotation.)
Similarly, some Slavic languages (e.g. Czech), although they do distinguish the future tense, nevertheless have a subset of verbs where the morphologically present form actually has a future meaning.
Examples
- [en] he goes home
- [en] he was going home
Fut
: future tense
The future tense denotes actions that will happen after a reference point; in the prototypical case, the reference point is the moment of producing the sentence.
Examples
- [es] irá a la casa “he/she/it will go home”
Imp
: imperfect
Used in e.g. Bulgarian and Croatian, imperfect is a special case of the past tense. Note that, unfortunately, imperfect tense is not always the same as past tense + imperfective aspect. For instance, in Bulgarian, there is lexical aspect, inherent in verb meaning, and grammatical aspect, which does not necessarily always match the lexical one. In main clauses, imperfective verbs can have imperfect tense and perfective verbs have perfect tense. However, both rules can be violated in embedded clauses.
Examples
- [bg] тя оставаше, където той и да отидеше / tja ostavaše, kădeto toj i da otideše “it remained where he left it”
Pqp
: pluperfect
The pluperfect denotes an action that happened before another action in the past. This value does not apply to English, where the pluperfect (past perfect) is constructed analytically. It applies e.g. to Portuguese.
Examples
- [pt] afirmou que os sequestradores já ligaram “he said that the kidnappers had already called”
Typo
: is this a misspelled word?
Values: | Yes |
Indicates an erroneous or typographically unexpected word form.
Most unexpected spellings are typographical errors (inadvertent on the part of the author). Also unexpected: creatively using special characters or spaces for visual effect; or unusual character encoding. For transcribed speech, no distinction is made between the original speaker and the transcriber, so a mispronunciation like shilly for silly is also treated like a typo. This feature can also encompass clear errors in word choice, such as learner errors and dysfluencies (e.g. lesser where fewer is appropriate, or eats instead of eat).
Note that “typographically unexpected” is interpreted in the context of the genre. Abbreviations or popular informal spellings are not necessarily unexpected. See Abbr.
Superfluous word-internal spaces are addressed using the goeswith relation to connect parts of the word. Typo=Yes should be used with the goeswith head (and this is enforced by validation for treebanks that use features).
The correct spelling can be indicated in the MISC column with the CorrectForm feature, as discussed in the page on typos.
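The pairing of Typo=Yes with a CorrectForm entry in MISC can be sketched as follows. This is a minimal illustration, assuming both columns use `|`-separated `Key=Value` items with `_` for an empty column; the function names are hypothetical, and the sample token is based on the Barak Obama example given for this feature (correct spelling Barack).

```python
def parse_column(value):
    """Parse a "|"-separated Key=Value column (features or MISC); "_" is empty."""
    if value == "_":
        return {}
    parsed = {}
    for item in value.split("|"):
        key, _, val = item.partition("=")  # tolerate items without "="
        parsed[key] = val
    return parsed

def intended_form(form, feats, misc):
    """Return CorrectForm for a Typo=Yes token if given, else the form as written."""
    if parse_column(feats).get("Typo") != "Yes":
        return form
    return parse_column(misc).get("CorrectForm", form)

print(intended_form("Barak", "Typo=Yes", "CorrectForm=Barack"))
print(intended_form("Obama", "Number=Sing", "_"))
```

The sketch falls back to the written form when CorrectForm is missing, since the text says the correct spelling can be indicated, not that it must be.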
Capitalization, etc.:
Cases where an unexpected form of a letter is used within a word (e.g., unexpected capitalization choices) should be handled on a language- and treebank-specific basis. In a social media treebank, for example, it may not be practical to flag all nonstandard capitalization choices as Typo=Yes given the wide variability of capitalization in unedited writing.
Stylistic choices:
Typo=Yes is intended for specifically orthographic unexpectedness, not unexpected word variants in general. If the author is taken to be signaling an intentionally modified pronunciation of a word, inventing a new word, or making a pun, that is not Typo if the unexpectedness is reflected phonologically.
The optional Style feature may be useful in such cases.
Deliberate, well-established conventions of altering the written forms of words, e.g. censoring profanity with nonalphabetic symbols, should also be considered expressive stylistic choices rather than typographical unexpectedness.
Extra words:
For extra or missing words, see the policy on errors.
A valid word that is superfluous in the sentence and attached as reparandum does not receive Typo=Yes.
Yes
: it is typo
Examples
- [en] Barak Obama
VerbForm
: form of verb or deverbative
Values: | Conv | Fin | Gdv | Ger | Inf | Part | Sup | Vnoun |
Even though the name of the feature seems to suggest that it is used exclusively with verbs, this is not the case. Some verb forms in some languages actually form a gray zone between verbs and other parts of speech (nouns, adjectives and adverbs). For instance, participles may be classified either as verbs or as adjectives, depending on language and context. In both cases VerbForm=Part may be used to separate them from other verb forms or other types of adjectives.
Fin
: finite verb
Rule of thumb: if it has non-empty Mood, it is finite. But beware that some tagsets conflate verb forms and moods into one feature.
Examples
- [en] I do, he does
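The rule of thumb for Fin can be written as a quick consistency check. The following is a rough sketch only: the function names and sample feature strings are hypothetical, and, as the caveat above warns, tagsets that conflate verb forms and moods may legitimately break the rule.

```python
def parse_feats(feats):
    """Turn a feature string into a dict; "_" means no features."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))

def looks_finite(feats):
    """Rule of thumb from the text: a non-empty Mood suggests a finite form."""
    return "Mood" in parse_feats(feats)

def mood_verbform_mismatch(feats):
    """Flag words that have Mood but an explicit VerbForm other than Fin."""
    pairs = parse_feats(feats)
    return "Mood" in pairs and pairs.get("VerbForm", "Fin") != "Fin"

print(looks_finite("Mood=Ind|Tense=Pres|VerbForm=Fin"))   # e.g. [en] does
print(looks_finite("VerbForm=Inf"))
print(mood_verbform_mismatch("Mood=Ind|VerbForm=Part"))   # worth a manual look
```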
Inf
: infinitive
Infinitive is the citation form of verbs in many languages. Unlike in English, it often has morphological form that is distinct from the finite forms. Infinitives may be used together with auxiliaries to form periphrastic tenses (e.g. future tense [cs] budu sedět v letadle “I will sit in a plane”), they appear as arguments of modal verbs etc. In some languages, e.g. in Hindi, they behave similarly to nouns and are used as such (similar to the gerund in English). Nevertheless, this observation is not universal and, e.g. in Slavic languages, infinitives are quite distinct from verbal nouns.
Examples
- [de] ich muss gehen “I must go”
- [pt] eu preciso ir “I must go”
Sup
: supine
Supine is a rare verb form. It survives in some Slavic languages (Slovenian) and is used instead of infinitive as the argument of motion verbs (old [cs] jdu spat lit. I-go sleep).
A form called “supine” also exists in Swedish where it is a special form of the participle, used to form the composite past form of a verb. It is used after the auxiliary verb ha (to have) but not after vara (to be):
Examples
- [sv] Simple past: I ate (the) dinner = Jag åt maten (using preterite)
- [sv] Composite past: I have eaten (the) dinner = Jag har ätit maten (using supine)
- [sv] Past participle common: (The) dinner is eaten = Maten är äten (using past participle)
- [sv] Past participle neuter: (The) apple is eaten = Äpplet är ätet
- [sv] Past participle plural: (The) apples are eaten = Äpplena är ätna
Part
: participle, verbal adjective
Participle is a non-finite verb form that shares properties of verbs and adjectives. Its usage varies across languages. It may be used to form various periphrastic verb forms such as complex tenses and passives; it may be also used purely adjectively.
Other features may help to distinguish past/present participles (English), active/passive participles (Czech), imperfect/perfect participles (Hindi) etc.
Examples
- [en] he could have been prepared if he had foreseen it; I will be driving home.
Conv
: converb, transgressive, adverbial participle, verbal adverb
The converb, also called adverbial participle or transgressive, is a non-finite verb form that shares properties of verbs and adverbs. It appears e.g. in Slavic and Indo-Aryan languages.
Note that this value was called Trans in UD v1 and it has been renamed Conv in UD v2.
Examples
- [cs] zírali na mne, pevně svírajíce své zbraně “they stared at me while gripping their guns firmly”; udělavši večeři, zavolala rodinu ke stolu “having prepared the dinner, she called her family to the table”
Gdv
: gerundive
Used in Latin and Ancient Greek. Not to be confused with the gerund.
Examples
- [la] puer laudandus est “the boy should be praised”
Ger
: gerund
Using VerbForm=Ger is discouraged and alternatives should be considered first, because the term gerund is rather confusing: the English gerund is a verbal noun or a converb, and it shares its morphological form with the present participle (which may mean that the tagset will not distinguish it from the participle); the gerundio in Spanish and other Romance languages shows some similarities with present participles and with converbs, but not with verbal nouns; likewise, some Slavists use the English term gerund to denote converbs (adverbial participles), which should be labeled VerbForm=Conv; and UD version 1 recommended (inspired by English) using it for verbal nouns, which in UD v2 should use VerbForm=Vnoun.
However, the feature is still available in UD v2 and can be used if the alternatives do not seem acceptable. The feature may be removed in future versions, but a comprehensive investigation has to be done first.
Examples
- [en] I look forward to seeing you; he turns a blind eye to my being late
Vnoun
: verbal noun, masdar
Verbal nouns other than infinitives. Also called masdars by some authors, e.g. Haspelmath, 1995.
While in some languages verbal noun and infinitive may be two labels for the same category (and then the language-specific documentation must specify which label should be used), in other languages these categories are distinct. For example, most Slavic languages have infinitive as a specific, uninflected form of the verb, and they also have derived verbal nouns, which behave much like ordinary nouns, have a noun-like distribution (different from infinitives), and inflect for case and number.
Examples
- [cs] dělání “doing”
References
- Haspelmath, Martin. 1995. The converb as a cross-linguistically valid category. Converbs in Cross-Linguistic Perspective: Structure and Meaning of Adverbial Verb Forms – Adverbial Participles, Gerunds –, edited by Martin Haspelmath and Ekkehard König, Berlin: Mouton de Gruyter, Empirical Approaches to Language Typology, 1–56.
VerbType
: verb type
Values: | Aux | Cop | Mod | Light | Quasi |
We already split auxiliary and non-auxiliary verbs at the level of UPOS tags. The VerbType feature may be used to capture finer distinctions that some tagsets make.
Aux
: auxiliary verb
Verb used to create periphrastic verb forms (tenses, passives etc.). In many languages there will be ambiguity between auxiliary and other usages, thus the same verb should get different feature values depending on context.
Examples
- [af] Dit het tot ‘n verenigde staat gelei. “This has led to a united state.”
Cop
: copula verb
Verb used to make nominal predicates from adjectives, nouns or participles. Some languages omit the copula or use other means to create nominal predicates. In languages that have copula, it is often the equivalent of the verb “to be”.
Examples
- [en] It is purple.
Mod
: modal verb
A group of verbs traditionally distinguished in grammars of some languages. They take infinitive of another verb as argument (with or without infinitive-marking conjunction, in languages that have it) and add various modes of possibility, necessity etc. to the meaning of the infinitive. There are other verbs that take infinitives as arguments but they are not considered modal (e.g. phasal verbs such as “to begin to do something”). The set of modal verbs for a language is closed and can be enumerated. Depending on language-internal considerations, modal verbs may be considered a subset of auxiliaries (AUX) or non-auxiliary verbs (VERB).
Note that some languages (e.g. Turkish) use special forms of the main verb instead of combining it with a modal verb.
Examples
- [de] dürfen “may”, können “can”, mögen “want/like”, müssen “must”, sollen “shall”, wollen “want”, wissen “know (how to)”
- [cs] muset “must”, mít “shall, have (to)”, moci “can”, smět “may, be allowed (to)”, umět “know (how to)”, chtít “want”
Light
: light (support) verb
Light or support verb is used in verbo-nominal constructions where the main part of the meaning is contributed by a noun complement but it is not just a nominal predicate with a copula. An English example would be to take a nap, where take is the light verb. It is often the case that the light verb can also function as a normal verb in the language (cf. to take two dollars). If the light verb constructions are used frequently in a language (e.g. Hindi or Japanese) or if there is a dedicated light verb that cannot be used as normal verb, it makes sense to mark light verbs with a dedicated feature value.
Examples
- [ja] する / suru “do”
Quasi
: quasi-verb
A word that functions partially as a verb and is tagged VERB, yet is defective in some other aspects that are typical of verbs in the given language. For example, quasi-verbs in Polish function as predicates and take infinitives of regular verbs as complements, yet their morphology is not verbal: they are more like frozen forms of adjectives.
Examples
- [pl] można “possible”, trzeba “necessary”, warto “worth”
Voice
: voice
Values: | Act | Antip | Bfoc | Cau | Dir | Inv | Lfoc | Mid | Pass | Rcp |
Voice is typically a feature of verbs. It may also occur with other parts of speech (nouns, adjectives, adverbs), depending on whether borderline word forms such as gerunds and participles are classified as verbs or as the other category.
For Indo-European speakers, voice means mainly the active-passive distinction. In other languages, other shades of verb meaning are categorized as voice.
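As a rough illustration of how the value list above might be checked in practice, the sketch below reads a CoNLL-U style FEATS string (pairs joined by "|", with "_" for no features) and returns the Voice values it finds. The function names are hypothetical. It treats anything outside the universal list as an error, which a treebank with language-specific values would need to relax.

```python
# Universal values of Voice, as listed above.
VOICE_VALUES = {
    "Act", "Antip", "Bfoc", "Cau", "Dir", "Inv", "Lfoc", "Mid", "Pass", "Rcp",
}


def parse_feats(feats):
    """Parse a FEATS string such as 'Tense=Past|Voice=Act' into a dict."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def voice_values(feats):
    """Return the list of Voice values in a FEATS string (empty if absent).

    Several values may be joined with commas, e.g. 'Voice=Act,Pass'.
    Raises ValueError for a value that is not in VOICE_VALUES.
    """
    raw = parse_feats(feats).get("Voice")
    if raw is None:
        return []
    values = raw.split(",")
    for value in values:
        if value not in VOICE_VALUES:
            raise ValueError(f"unknown Voice value: {value}")
    return values


print(voice_values("Tense=Past|VerbForm=Part|Voice=Act"))
print(voice_values("Case=Nom|Number=Sing"))
```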
Act
: active or actor-focus voice
The subject of the verb is the doer of the action (agent), the object is affected by the action (patient). This label is also used for the actor-focus voice of Austronesian languages.
Examples
- [cs] Napadli jsme nepřítele. “We attacked the enemy” (the active participle napadli can be used to form either past tense or conditional mood; here it forms the past tense.)
- [grc] λύει τὸν ἵππον μου (luei ton hippon mou) “he frees my horse”
- [hu] mos “wash”
- [tr] Barış Filiz’i öptü. “Barış kissed Filiz.”
- [tl] Naglilinis siya ng bahay. “He/she cleans a/the house.”
- [yii] Waguɖaŋgu guda:ga wawa:l. “The man saw the dog.” (lit. man-ERG dog.ABS see.ACT-PAST)
Mid
: middle voice
Between active and passive, needed e.g. in Ancient Greek or Sanskrit. The subject is both doer and undergoer in a sense: he is acting upon himself.
Examples
- [grc] λύομαι τὸν ἵππον (luomai ton hippon) “I free (my own) horse”
Rcp
: reciprocal voice
With a plural subject, all members are doers and undergoers, acting upon each other.
Examples
- [tr] Filiz ve Barış öpüştüler. “Filiz and Barış kissed.”
Pass
: passive or patient-focus voice
The subject of the verb is affected by the action (patient). The doer (agent) is either unexpressed or appears as an oblique dependent or an object of the verb. This label is also used for the patient-focus voice of Austronesian languages. Note the subtyped dependency relations nsubj:pass, csubj:pass, expl:pass, and aux:pass for analytic components of passive constructions.
Examples
- [cs] Jsme napadeni nepřítelem. “We are attacked by the enemy” (the passive participle napadeni is used to form passive in all tenses; here it forms the present passive.)
- [tl] Nililinis niya ang bahay. “He/she cleans the house.”
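The passive-subtyped relations mentioned above can be picked out of a sentence with a simple membership test. This is only a sketch: the function name is hypothetical, and the relation list in the usage line is made up for illustration rather than taken from a treebank.

```python
# The four passive subtypes named in the description of Pass.
PASSIVE_RELATIONS = {"nsubj:pass", "csubj:pass", "expl:pass", "aux:pass"}


def passive_dependents(deprels):
    """Return, in order, the relations in deprels that are passive subtypes."""
    return [rel for rel in deprels if rel in PASSIVE_RELATIONS]


# Illustrative relation labels for a three-word passive clause.
print(passive_dependents(["aux:pass", "root", "obl"]))
```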
Antip
: antipassive voice
In ergative-absolutive languages, the absolutive P argument is demoted to an oblique dependent and the ergative A argument takes the absolutive form, thus transforming a transitive clause into an intransitive one.
Examples
- [yii] Wagu:ɖa gudaganda wawa:ɖiɲu. “The man saw the dog.” (lit. man.ABS dog-DAT see-ANTIP-PAST)
Lfoc
: location-focus voice
The subject of the verb indicates location or direction, while the doer and the undergoer/theme are coded as objects.
Examples
- [tl] Aalisan ng babae ng bigas ang sako para sa bata. “A/the woman will take some rice out of the sack for a/the child.”
Bfoc
: beneficiary-focus voice
The subject of the verb indicates the beneficiary, while the doer and the undergoer/theme are coded as objects.
Examples
- [tl] Ipagaalis ng babae ng bigas sa sako ang bata. “A/the woman will take some rice out of a/the sack for the child.”
Dir
: direct voice
Used in direct-inverse voice systems, e.g. in Algonquian languages of North America. Direct means that the argument that is higher in the salience hierarchy is the subject. Example hierarchy: human 1st person – 2nd – 3rd – non-human animate – inanimate.
Examples
- [crk] Niwīcihānānak. “We help them.” (ni-wīcih-ā-nān-ak lit. 1PL[subj]-help-DIR-3[obj]-PL[obj])
Inv
: inverse voice
Used in direct-inverse voice systems, e.g. in Algonquian languages of North America. Inverse voice marking means that the argument lower in the hierarchy functions as subject.
Examples
- [crk] Niwīcihikonānak. “They help us.” (ni-wīcih-iko-nān-ak lit. 1PL[subj]-help-INV-3[obj]-PL[obj])
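The choice between Dir and Inv can be sketched as a comparison of positions in the salience hierarchy. The code below uses the example hierarchy given under Dir. The rank labels and the function name are assumptions made for illustration, and the hierarchy of an actual language may be ordered differently. The descriptions do not say what happens when both arguments have the same rank, so the sketch returns None in that case.

```python
# Example hierarchy from the description of Dir, most salient first.
# The labels themselves are hypothetical shorthand.
SALIENCE = ["1", "2", "3", "animate", "inanimate"]


def direct_or_inverse(subject, obj):
    """Return 'Dir' if the subject outranks the object, 'Inv' if the object
    outranks the subject, and None if both have the same rank."""
    subject_rank = SALIENCE.index(subject)
    object_rank = SALIENCE.index(obj)
    if subject_rank < object_rank:
        return "Dir"
    if subject_rank > object_rank:
        return "Inv"
    return None


print(direct_or_inverse("1", "3"))  # as in "We help them."
print(direct_or_inverse("3", "1"))  # as in "They help us."
```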
Cau
: causative voice
Causative forms of verbs are classified as a voice category because, when compared to the basic active form, they change the number of participants and their mapping on semantic roles. (See, e.g., the documentation of the METU Sabanci treebank (page 26).) Note that this is a feature of verbs. Some languages also have a causative case of nouns.
Examples
- [hu] mosat “make somebody wash”
- [tr] karıştırıyor “is confusing” (= is causing somebody to be confused)