Layered universal features
In some languages, some features are marked more than once on the same word. We say that there are several layers of the feature. The exact meaning of individual layers is language-dependent.
For example, possessive adjectives, determiners and pronouns may have two different values of Gender and two of Number. One value is determined by agreement with the modified (possessed) noun, just as other (non-possessive) adjectives and determiners agree in gender and number with the nouns they modify. The other value is determined lexically because it is a property of the possessor. The following table shows that English distinguishes only the possessor’s gender and number; Hindi distinguishes gender in agreement, and number both in agreement and for the possessor (there is no neuter gender in Hindi); German distinguishes both features in both dimensions (more differences would be seen if we also showed German dative and accusative forms, not just nominatives).
Possessor / Agreement | Sing Masc | Sing Fem | Sing Neut | Plur Masc | Plur Fem |
Sing Masc [en] | his son | his daughter | his house | his sons | his daughters |
Sing Masc [de] | sein Sohn | seine Tochter | sein Haus | seine Söhne | seine Töchter |
Sing Masc [hi] | usakā bēṭā | usakī bēṭī | - | usakē bēṭē | usakī bēṭiyām̐ |
Sing Fem [en] | her son | her daughter | her house | her sons | her daughters |
Sing Fem [de] | ihr Sohn | ihre Tochter | ihr Haus | ihre Söhne | ihre Töchter |
Sing Fem [hi] | usakā bēṭā | usakī bēṭī | - | usakē bēṭē | usakī bēṭiyām̐ |
Sing Neut [en] | its son | its daughter | its house | its sons | its daughters |
Sing Neut [de] | sein Sohn | seine Tochter | sein Haus | seine Söhne | seine Töchter |
Plur [en] | their son | their daughter | their house | their sons | their daughters |
Plur [de] | ihr Sohn | ihre Tochter | ihr Haus | ihre Söhne | ihre Töchter |
Plur [hi] | unakā bēṭā | unakī bēṭī | - | unakē bēṭē | unakī bēṭiyām̐ |
If a feature is (or can be) layered in a language, the name of the feature must indicate the layer. An additional identifier in square brackets is used to distinguish layers, e.g. Gender[psor] for the possessor’s gender. We recommend that the layer identifiers consist of lowercase English letters [a-z] and/or digits [0-9]. The layers, their meaning and their identifiers must be defined in a language-specific extension to this documentation. For each layered feature, one layer may be defined as default and the corresponding features then appear without identifier, e.g. Gender=Masc|Gender[psor]=Fem.
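The naming convention above can be illustrated with a minimal parsing sketch. It assumes features are joined with `|` as in the examples in this document; the function name `parse_feats` and the exact regular expression are illustrative choices, not part of the guidelines. The default layer is represented by `None`.

```python
import re

# Feature name, optional layer identifier in square brackets, "=", value.
# The layer part follows the recommendation above: lowercase letters and/or digits.
FEATURE_RE = re.compile(r"^([A-Z][A-Za-z0-9]*)(?:\[([a-z0-9]+)\])?=([A-Za-z0-9,]+)$")


def parse_feats(feats):
    """Parse a feature string into a dict keyed by (feature, layer).

    The default layer (no identifier) is stored with layer None.
    """
    parsed = {}
    for item in feats.split("|"):
        match = FEATURE_RE.match(item)
        if match is None:
            raise ValueError(f"malformed feature: {item!r}")
        name, layer, value = match.groups()
        parsed[(name, layer)] = value
    return parsed


feats = parse_feats("Gender=Masc|Gender[psor]=Fem")
print(feats[("Gender", None)])    # gender on the default layer
print(feats[("Gender", "psor")])  # possessor's gender
```

Keying on the pair (feature, layer) keeps the default layer and the named layers side by side, so a lookup never confuses Gender with Gender[psor].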
In the following, we list some examples of layered features attested in existing corpora. These may be used as inspiration or they may be used as-is in treebanks for which they are found appropriate. Note that even if a treebank uses a layered feature from this section, it should still be described in the language-specific documentation.
Gender[psor]
Possessive adjectives and pronouns may have two different genders: that of the possessed object (gender agreement with modified noun) and that of the possessor (lexical feature, inherent gender).
The Gender[psor] feature captures the possessor’s gender.
In the Czech examples below, the masculine Gender[psor] implies using one of the suffixes -ův, -ova, -ovo, and the feminine Gender[psor] implies using one of -in, -ina, -ino.
Masc: masculine possessor
Examples: [cs]
- otcův syn (father’s son; Gender=Masc|Gender[psor]=Masc)
- otcova dcera (father’s daughter; Gender=Fem|Gender[psor]=Masc)
- otcovo dítě (father’s child; Gender=Neut|Gender[psor]=Masc)
Fem: feminine possessor
Examples: [cs]
- matčin syn (mother’s son; Gender=Masc|Gender[psor]=Fem)
- matčina dcera (mother’s daughter; Gender=Fem|Gender[psor]=Fem)
- matčino dítě (mother’s child; Gender=Neut|Gender[psor]=Fem)
In other languages (Hebrew, Arabic), the possessor’s gender and number are agreement rather than lexical features:
Examples: [he] HKPH FL HARC (perimeter of country).
Features of the two nouns are as follows:
- perimeter: Gender=Masc|Gender[psor]=Fem|Number=Sing|Number[psor]=Sing
- country: Definite=Def|Gender=Fem|Number=Sing
The [psor] features of perimeter are dictated by agreement with the possessor, country.
(This is a partial description of this example. HKPH has many morphological analyses; some are masculine and single-layered, others are feminine and single-layered. The right morphosyntactic analysis can only be found by detecting the two layers of agreement features and identifying this specific agreement pattern.)
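The agreement pattern in the Hebrew example can be checked mechanically. The following is a minimal sketch, assuming feature strings in the `|`-separated form used above; the helper names `parse_feats` and `psor_agrees` are hypothetical and introduced only for illustration.

```python
def parse_feats(feats):
    """Split 'Name=Value|Name[layer]=Value' into a plain dict of name -> value."""
    return dict(item.split("=", 1) for item in feats.split("|"))


def psor_agrees(possessed_feats, possessor_feats, features=("Gender", "Number")):
    """Check that each [psor] feature on the possessed noun equals the
    possessor's own default-layer value of the same feature.

    A feature that is absent on both sides is treated as agreeing.
    """
    possessed = parse_feats(possessed_feats)
    possessor = parse_feats(possessor_feats)
    return all(
        possessed.get(f"{name}[psor]") == possessor.get(name)
        for name in features
    )


perimeter = "Gender=Masc|Gender[psor]=Fem|Number=Sing|Number[psor]=Sing"
country = "Definite=Def|Gender=Fem|Number=Sing"
print(psor_agrees(perimeter, country))  # True
```

A single-layered analysis of the first noun, such as Gender=Masc|Number=Sing, would fail this check, which is one way to filter out the competing analyses mentioned above.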
Number[psor]
Possessives may have two different numbers: that of the possessed object (number agreement with modified noun) and that of the possessor. The Number[psor] feature captures the possessor’s number.
Sing: singular possessor
Examples:
- [en] my, his, her, its
- [cs] můj pes (my dog; Number=Sing|Number[psor]=Sing); mí psi (my dogs; Number=Plur|Number[psor]=Sing)
Plur: plural possessor
Examples:
- [en] our, their
- [cs] náš pes (our dog; Number=Sing|Number[psor]=Plur); naši psi (our dogs; Number=Plur|Number[psor]=Plur)
Person[psor]
The possessor’s person is marked e.g. on Hungarian nouns. These noun forms would be translated to English as possessive pronoun + noun.
Note that it is reasonable to make this a layered feature even though the default Person is normally not marked on nouns. In relation to verbs (which may have to mark person agreement with nouns), a noun is almost always in the third person. So even if this default person is not explicitly marked morphologically, and the default Person probably does not appear among the features of the noun, we should not use the default layer of Person to mark the possessor. If we abused the default layer, the annotation would no longer be parallel to personal pronouns that could be substituted for the noun.
On the other hand, we probably do not want a separate [psor] layer for the person of possessive determiners / pronouns. They usually modify nouns, not verbs, so agreement with verbs does not play any role. Arguably they have only one Person feature and it is lexical (while for the Hungarian nouns, Person[psor] is inflectional). Moreover, in some languages possessive pronouns are actually identical to personal pronouns in the genitive case, and it is logical that they have the same Person as in the nominative.
1: first person possessor
Examples: [hu] kutya = dog; kutyám = my dog; kutyánk = our dog.
2: second person possessor
Examples: [hu] kutya = dog; kutyád = your.Sing dog; kutyátok = your.Plur dog.
3: third person possessor
Examples: [hu] kutya = dog; kutyája = his/her/its dog; kutyájuk = their dog.
János csontja
lit. John his-bone
John’s bone
János csontjai
lit. John his-bones
John’s bones
Péternek sok pénze van.
lit. to-Peter much his-money there-is
Peter has a lot of money.
Number[psee]
This feature seems to be very specific to Hungarian. It denotes the possessee’s (possessed, owned noun phrase’s) number. Hungarian has three types of number in the nominal inflection:
- The number of the noun (inflectional, non-agreement).
- The number of owners that own the noun (inflectional, agreement with possessor that may or may not be pronounced).
- The number of the context-given referent, which is some possession of the noun, i.e. belongs to the noun (anaphoric possessive; in a sense, this is an agreement feature, but the head noun is not pronounced in the sentence).
Examples from the Multext-East Hungarian lexicon:
- könnyedén (SSS)
  - könny = a tear (singular)
  - könnyed = your tear (singular owner)
  - könnyedé = (possession) of your tear (singular possession)
  - könnyedén = (on the possession) of your tear (superessive case)
- ellenfeleié (PSS)
  - ellenfél = an opponent (singular)
  - ellenfele = his/her/its opponent (singular owner)
  - ellenfelei = his/her/its opponents (core plural, singular owner)
  - ellenfeleié = (possession) of his/her/its opponents (singular possession)
- életeké (SPS)
  - él = point (singular)
  - élek = points (plural)
  - élén = his/her/its point (singular owner)
  - élünk = our point (plural owner)
  - életeké = (possession) of our point (singular possession)
- tárgyalópartnereinkét (PPS)
  - tárgyalópartner = negotiator (singular)
  - tárgyalópartnerei = his/her/its negotiators (plural, singular owner)
  - tárgyalópartnereinkét = (possession) of our negotiators (plural, plural owner, singular possession, accusative case)
Words marked for plural possessions are very rare, though. Note that in the following example from Multext-East, Columbus is marked for plural possession, but not for his own owner.
- Kolumbuszéinál
  - Kolumbusz = Columbus (singular)
  - Kolumbuszéi = (possessions) of Columbus (plural possession)
  - Kolumbuszéinál = (at the possessions) of Columbus (adessive case)
See also Éva Dékány (2014): The syntax of anaphoric possessives in Hungarian: In anaphoric possessives the possessed noun, the head of the whole nominal phrase, is not pronounced, and its reference has to be recovered from the context. The possessor in Hungarian anaphoric possessives has to bear the -é suffix.
Since Number[psee]=Plur is extremely rare, this feature is not so important for distinguishing singular and plural possessions. However, the mere presence of Number[psee]=Sing signals that the -é suffix is present and thus that there is an unpronounced possession.
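As a rough illustration, the three-letter codes in the Multext-East examples above (SSS, PSS, SPS, PPS) can be mapped to the three number layers. This sketch assumes that the letters stand for the number of the noun, of the possessor and of the possessee, in that order; this reading is inferred from the glosses and is not stated by the lexicon documentation quoted here. The function names are hypothetical.

```python
NUMBER = {"S": "Sing", "P": "Plur"}
# Assumed order of the three letters: noun, possessor (owner), possessee.
LAYERS = ("Number", "Number[psor]", "Number[psee]")


def code_to_feats(code):
    """Turn a three-letter code such as 'SPS' into a '|'-separated feature string."""
    if len(code) != len(LAYERS):
        raise ValueError(f"expected {len(LAYERS)} letters, got {code!r}")
    pairs = [f"{layer}={NUMBER[letter]}" for layer, letter in zip(LAYERS, code)]
    return "|".join(sorted(pairs))


def has_unpronounced_possession(feats):
    """True if any Number[psee] value is present, i.e. the -é suffix is there."""
    return any(item.startswith("Number[psee]=") for item in feats.split("|"))


feats = code_to_feats("PPS")  # the code given for tárgyalópartnereinkét
print(feats)
print(has_unpronounced_possession(feats))
print(has_unpronounced_possession("Number=Sing|Number[psor]=Sing"))  # no -é suffix
```

The second function reflects the point made above: the value of Number[psee] rarely matters, but its presence alone marks an anaphoric possessive.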
Layered verb agreement in Basque
Verbs in many Indo-European languages must agree in person and number with their subject. This is what the Person and Number features of verbs typically denote.
Some verbs in Basque must agree in person and number with up to three arguments: the absolutive argument (subject of intransitive verbs and object of transitive verbs), the ergative argument (subject of transitive verbs) and the dative argument (indirect object).
We could make the absolutive agreement the default, thus using Person and Number without layer identifiers. If there is also one of the other two arguments, we will have Person[erg], Number[erg] and Person[dat], Number[dat], respectively.
Example: nahi dizkiegu, lemma = nahi_izan,
feats = Number=Plur|Number[dat]=Plur|Number[erg]=Plur|Person=3|Person[dat]=3|Person[erg]=1
(we want them to them).
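To make the three agreement layers in the Basque example easier to read, the features can be grouped by argument. This is a minimal sketch; the function name `group_by_layer` and the label `abs` for the default (absolutive) layer are hypothetical and used here only for display.

```python
import re

# Feature name, optional layer in square brackets, "=", value.
ITEM_RE = re.compile(r"^(\w+)(?:\[(\w+)\])?=(\w+)$")


def group_by_layer(feats, default_layer="abs"):
    """Group features by layer; features without an identifier go to default_layer."""
    grouped = {}
    for item in feats.split("|"):
        match = ITEM_RE.match(item)
        if match is None:
            raise ValueError(f"malformed feature: {item!r}")
        name, layer, value = match.groups()
        grouped.setdefault(layer or default_layer, {})[name] = value
    return grouped


feats = "Number=Plur|Number[dat]=Plur|Number[erg]=Plur|Person=3|Person[dat]=3|Person[erg]=1"
for layer, agreement in group_by_layer(feats).items():
    print(layer, agreement)
```

For nahi dizkiegu this yields a third person plural absolutive argument (them), a third person plural dative argument (to them) and a first person plural ergative argument (we).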