
This page pertains to UD version 2.

Can Determiners Have Children in the Tree Structure?

This page discusses nodes that are attached to their parents via the det relation. When we say determiner in this text, we are referring to the relation and not to the UPOS category DET. Of course, there is a strong correlation between the two but it is not absolute. For example, a DET could be promoted when its noun is elided; its incoming relation would then be something other than det and the rules discussed here would not apply.

Determiners are generally considered function words in UD. As such, they are typically leaf nodes, that is, they do not have children. However, this rule is not strict and the guidelines list some exceptions for various function words including determiners.

For several years, the UD validator did not enforce this rule for determiners while it was doing so for other function words. The test was finally implemented in September 2024 under the label leaf-det-clf, later split into two tests, leaf-det and leaf-clf; since this page is about determiners and not about classifiers, we are interested in leaf-det. Like other similar tests for function words, it allows determiners to have children if the (universal part of the) relation between the determiner and the child is in a predefined subset:

Light Adverbials

The guidelines also acknowledge that “certain types of function words can take a restricted class of modifiers, mainly light adverbials (including negation).” They explicitly give one example that involves a determiner: not every linguist. Ideally the validator should have the list of permitted modifiers (light adverbials) for each language; as such lists are currently not available, the validator has additional deprel-level exceptions, although they potentially open the door for other adverbials that should not be allowed:
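The mechanics of such a test can be sketched in a few lines of Python. This is a minimal illustration and not the code of the UD validator: the function names are hypothetical, and the set ALLOWED_UNDER_DET is a placeholder assumption rather than the validator's actual list. The sketch shows two points made above: the test is triggered by the incoming relation det (not by the UPOS tag DET), and only the universal part of the child's relation is compared.

```python
# Illustrative sketch only; not the UD validator's implementation.
# ALLOWED_UNDER_DET is a placeholder assumption, not the validator's real list.
ALLOWED_UNDER_DET = {"fixed", "goeswith", "conj", "cc", "punct", "advmod"}


def universal(deprel):
    """Strip a language-specific subtype: 'det:numgov' -> 'det'."""
    return deprel.split(":")[0]


def read_nodes(conllu):
    """Return {id: (form, head, deprel)} for the basic nodes of one sentence."""
    nodes = {}
    for line in conllu.splitlines():
        cols = line.split("\t")
        # Skip comments, blank lines, multiword ranges (3-4) and empty nodes (3.1).
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        nodes[int(cols[0])] = (cols[1], int(cols[6]), cols[7])
    return nodes


def leaf_det_violations(nodes, allowed=ALLOWED_UNDER_DET):
    """List (parent form, child form, child deprel) for disallowed children of det nodes."""
    violations = []
    for form, head, deprel in nodes.values():
        parent = nodes.get(head)
        if parent is None:  # the node hangs directly on the artificial root
            continue
        # The parent counts as a determiner because of its relation, not its UPOS.
        if universal(parent[2]) == "det" and universal(deprel) not in allowed:
            violations.append((parent[0], form, deprel))
    return violations


# "not every linguist", the example from the guidelines
SAMPLE = "\n".join([
    "1\tnot\tnot\tPART\t_\t_\t2\tadvmod\t_\t_",
    "2\tevery\tevery\tDET\t_\t_\t3\tdet\t_\t_",
    "3\tlinguist\tlinguist\tNOUN\t_\t_\t0\troot\t_\t_",
])
print(leaf_det_violations(read_nodes(SAMPLE)))
# []
print(leaf_det_violations(read_nodes(SAMPLE), allowed={"fixed"}))
# [('every', 'not', 'advmod')]
```

Passing the allowed set as a parameter makes it easy to see how each additional deprel-level exception widens what the test accepts.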

Problematic Constructions

After the validation test was introduced, it identified problems in over 200 UD treebanks. For some of them, their maintainers suggested that the guidelines should be extended to allow additional specific constructions. The full discussion is in Issue #1059; as it is extremely long and messy, we try to summarize the interesting points here.

Compound Determiners

(Laura Rituma)

In Latvian, we have several expressions considered as compound pronouns in Latvian traditional grammar which consist of one particle and one pronoun. For example, kaut kāds where kaut is a particle and kāds is a pronoun (this expression roughly means ‘some kind of’). Currently, we annotate the particle as discourse, dependent on the pronoun, and the pronoun occasionally becomes det if the expression describes a noun. This leads to a validation error.

The particles in these expressions usually are kaut, diez, diezin, nez, nezin, and they all have very fuzzy, hard to pin down semantics so we feel uncomfortable annotating them as adverbs.

We would like to annotate these expressions as compound (instead of fixed) because the pronoun is the second element in the phrase and we feel that it is the head of the phrase: the pronoun inflects together with the noun and bears most of the semantic meaning of the expression.

It seems that the test should not report cases where a determiner has a compound child. After all, compound is just a signal that two nodes together act almost like one word, but in contrast to fixed, one of them can be considered the head.

Classifiers under Determiners

According to the guidelines on classifiers, in the demonstrative + classifier construction in Chinese, the classifier forms a constituent with the demonstrative and must be attached as its child. Therefore, clf must be added as an exception to the validation rule. This is also an exception to the general rule that function words are not chained in UD trees.

Determiners under Determiners

As the guidelines say, multiple determiners are always attached directly to the head noun:

All/DET these/DET three/NUM books/NOUN .
det(books, All)
det(books, these)
nummod(books, three)

However, some languages have constructions that look quite different from the English example above.

(Daniel Swanson)

In Hebrew (both Ancient and Modern), demonstrative pronouns have their own determiners, as in “the men the these” = “these men”. It is also parallel to how adjectival modification works in Modern Hebrew. Maybe determiners under demonstratives could be allowed in some languages but not the others?

# x- so the RTL text doesn't make this unreadable
1	ה	x-ה	DET	art	PronType=Art	2	det	_	Gloss=the|Ref=GEN_19.8
2	אֲנָשִׁ֤ים	x-אישׁ	NOUN	subs	Gender=Masc|Number=Plur	0	root	_	Gloss=man|Ref=GEN_19.8
3-4	הָאֵל֙	_	_	_	_	_	_	_	_
3	הָ	x-ה	DET	art	PronType=Art	4	det	_	Gloss=the|Ref=GEN_19.8
4	אֵל֙	x-אל	PRON	prde	Number=Plur|PronType=Dem	2	det	_	Gloss=these|Ref=GEN_19.8

Case under Determiners

(Petr Kocharov)

Somewhat similar to demonstratives with articles in Hebrew, Classical Armenian has repeated case markers on a noun and its demonstrative. Prepositions and articles can be repeated with modifiers, including demonstrative pronominal adjectives, within an NP, cf.

i kʻarancʻ y ayscʻanē
det(kʻarancʻ, ayscʻanē)
case(kʻarancʻ, i)
case(ayscʻanē, y)

Attaching the case marker of the demonstrative to the head noun, which has its own copy of the case marker, would be odd.

Unfortunately, if case under det is allowed in all languages and not just under Classical Armenian demonstratives, it will open the door for cases that are clear errors. For example, the following is in the current Chinese data (brought up by Koichi Yasuoka).

# text = 她的這本書
1	她	她	PRON	PRP	Person=3	5	det	_	SpaceAfter=No
2	的	的	PART	DEC	Case=Gen	1	case	_	SpaceAfter=No
3	這	這	DET	DT	_	5	det	_	SpaceAfter=No
4	本	本	NOUN	NNB	_	3	clf	_	SpaceAfter=No
5	書	書	NOUN	NN	_	0	root	_	SpaceAfter=No

DZ: 她的 tā de “of her” is a prototypical example of nmod. So changing the current det(書, 她) to nmod(書, 她) will solve it. (On the other hand, the classifier under the second determiner in this example is correct according to the guidelines.)
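DZ's point can be checked mechanically. The following minimal sketch (the helper names are hypothetical and not part of any UD tool) lists the children of det-attached nodes in the example above, before and after relabeling 她 as nmod.

```python
# Illustrative sketch; helper names are hypothetical, not part of any UD tool.
ROWS = [
    # (id, form, head, deprel), copied from the CoNLL-U example above
    (1, "她", 5, "det"),
    (2, "的", 1, "case"),
    (3, "這", 5, "det"),
    (4, "本", 3, "clf"),
    (5, "書", 0, "root"),
]


def children_of_det(rows):
    """Return (parent form, child form, child deprel) for every child of a det node."""
    by_id = {node_id: (form, head, deprel) for node_id, form, head, deprel in rows}
    found = []
    for node_id, form, head, deprel in rows:
        parent = by_id.get(head)
        if parent is not None and parent[2].split(":")[0] == "det":
            found.append((parent[0], form, deprel))
    return found


def relabel(rows, node_id, new_deprel):
    """Return a copy of rows with the deprel of one node changed."""
    return [
        (i, form, head, new_deprel if i == node_id else deprel)
        for i, form, head, deprel in rows
    ]


print(children_of_det(ROWS))
# [('她', '的', 'case'), ('這', '本', 'clf')]
print(children_of_det(relabel(ROWS, 1, "nmod")))
# [('這', '本', 'clf')]
```

After the relabeling, the only remaining child of a determiner is the classifier, which the guidelines on classifiers require to be attached there.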

Parentheticals

Laura Rituma added a Latvian example where parataxis may be needed under a determiner.

tādā godīgā iestādē ieperinājušies daži (tikai daži!) zagļi “a few (only a few!) thieves have nested in such an honest institution”

tādā godīgā iestādē ieperinājušies daži ( tikai daži ! ) zagļi
det(zagļi, daži-5)
parataxis(daži-5, daži-8)

Dan Zeman thinks it may warrant a clarification/amendment of the guidelines, allowing parenthetical parataxis of determiners similar to coordination. But Joakim Nivre thinks that even here we see two determiners that should be attached as siblings to the head noun.

Sylvain Kahane also had an example of parataxis but that one turned out to be unproblematic (see below) because the parent node, although tagged DET, is not annotated syntactically as a det but reparandum.

Reduplication (flat)

(Flavio Cecchini)

  1. The already mentioned reduplication, which is treated through flat:redup in Latin treebanks. One example is quot quot from quot: while the latter means ‘as many as’, the reduplication has a distributive sense as in ‘for each possible one…’ (this expression is sometimes even univerbated). I think to annotate them separately, each depending on the head, is not the right way to deal with them: here we do not have two or more different terms, but really the same one “cloning” itself. On the other hand, flat is really the closest relation we have to fixed, which would cause no problem, but is not a correct choice (well, in my opinion it is never the correct choice).
    • Problem: horizontal relation

Can we deactivate the validation rule if the child of det is a flat relation?

Dan: Why is fixed not a good choice? Flavio: Because it is productive and not idiosyncratic.

Problems with Referentiality

A large part of the discussion drifted to the related problem of deciding between det and nmod (or their :poss subtypes). Joakim believes that the guidelines imply, despite not saying it explicitly, that if the word is referential, it should be attached as nmod rather than det. This would apply to all words referring to possessors, regardless of whether they are tagged as PRON, DET, NOUN or ADJ. But other people (including Dan) do not understand the guidelines this way.

The referentiality criterion would nevertheless have the advantage that some problems with the leaf-det test would disappear. The problems belonging to this class are listed in this section. The occasional need to attach an apposition or a relative clause to a determiner is caused by the fact that the determiner is referential (because it is a possessive). If the referentiality criterion gains support and is approved via an amendment of the guidelines, many treebanks will require large changes. But in the current context, the leaf-det test should probably be relaxed for the special cases below.

Relative Clause Modifying a Referent Hidden in a Possessive Determiner

(Flavio Cecchini)

Latin: The phrase nostra qui remansissemus caede ‘the murder of us who are left (behind)’, but more literally ‘our who are left murder’, since nostra is the inflected possessive determiner for the 1st person plural. What happens here is that the possessive adds a nominal person, as it were, and this person is another referent beyond the noun caede ‘murder’ in this phrase; as such, the relative can target it (or at least, Cicero pleases himself in doing so). We could not really justify an analysis where we shift the relative under the head noun, since the murder is not one of its arguments.
    • Problem: the relative clause dependent of the determiner cannot be traced back to the referent of its head

nostra qui remansissemus caede
det(caede, nostra)
acl:relcl(nostra, remansissemus)
nsubj(remansissemus, qui)

Can we deactivate this validation rule if the head element has the feature Person, at least for acl:relcl?

Apposition

(Jack Rueter)

A possessive determiner may have an appositional child: His, Fred’s, friends come from all over. This is in fact not a problem in English, where personal possessives are pronouns rather than determiners (and they are attached as nmod:poss), but in other languages it is a problem. For example in Erzya:

Конат-конат сонзэ (Степан Иваныч) ладсо сырелгадсть… / Konat-konat sonzè (Stepan Ivanyč) ladso syrelgadstʹ… “such-such.PL his/her (Stepan Ivanych) way.INE become.older.3PL”

Конат-конат сонзэ ( Степан Иваныч ) ладсо сырелгадсть
obl(сырелгадсть, ладсо)
det(ладсо, сонзэ)
appos(сонзэ, Степан)

Koichi Yasuoka provided a Chinese example of apposition.

Nominal Possessive Modifier of a Determiner

(Janine Siewert)

Low Saxon: It is explained in Section 5.1 here: https://aclanthology.org/2024.lrec-main.1388.pdf The gloss and translation of the sentence can be found in Section 4.3.

Ik sto in der Gemoene iarem Denste “I stand in the service of the parish.” (lit. “I stand in the.DAT parish.DAT her.DAT service.DAT”)

Ik sto in der Gemoene iarem Denste
nsubj(sto, Ik)
obl(sto, Denste)
case(Denste, in)
det(Denste, iarem)
nmod:poss(iarem, Gemoene)
det(Gemoene, der)

Attaching the possessor in dative case to the possessee instead of the determiner does not represent the way this construction works because 1) the dative possessor cannot be attached to the possessee without the determiner and 2) the possessee can be dropped while the determiner cannot. E.g., in the example in my paper, “In der Gemoene iarem.” (literally “in the parish hers”) is a valid answer to a specification question in whose service the person stands. (A note to German speakers: Masculine and neuter nouns show that this is indeed a dative, not a genitive.) The alternative to change the determiners’ tags to PRON in Low Saxon would go against UD’s own definition of determiners.

DZ: Is iarem coreferential with der Gemoene? Attaching Gemoene as nmod:poss of iarem is odd because it suggests that Gemoene is the possessor of her, not of the service. The cited paper also says: “Among the UD languages, we have found comparable constructions in Afrikaans, Frisian Dutch, and Norwegian, but the annotation has been inconsistent across these languages.” The annotation indeed should be made consistent, but maybe one of those languages uses an analysis that works well under the UD guidelines?

Subsequent comment by Flavio and Dan.

Quantifiers

Pronominal quantifiers, as opposed to definite cardinal numerals, are treated as determiners following the UD guidelines (see DET, NUM; the distinction is currently not mentioned directly in the guidelines for det and nummod, but these relations are normally used to connect the quantifier with the counted noun, so the UPOS distinction projects to the relation distinction straightforwardly). Nevertheless, pronominal quantifiers can be modified to further specify the quantity or to compare it with some other quantity, as in these Czech examples:

třikrát tolik dětí než X \n three.times so.many children than X
advmod(tolik, třikrát)
advmod(so.many, three.times)
det:numgov(dětí, tolik)
det:numgov(children, so.many)
nmod(tolik, X-5)
nmod(so.many, X-11)
case(X-5, než)
case(X-11, than)

víc rozdílů než společných prvků \n more differences than common elements
det:numgov(rozdílů, víc)
det:numgov(differences, more)
nmod(víc, prvků)
nmod(more, elements)
case(prvků, než)
case(elements, than)
amod(prvků, společných)
amod(elements, common)

o 600000 méně lidí \n by 600,000 fewer people
case(600000, o)
case(600,000, by)
nmod(méně, 600000)
nmod(fewer, 600,000)
det:numgov(lidí, méně)
det:numgov(people, fewer)

The validation exception that could capture these cases looks at the features of the determiner. If there is NumType or Degree, nmod children are allowed. Alternatively, we could require that the relation between the quantifier and its modifier is obl because the quantifier is not really a nominal. Then we would not need a new exception because obl is already allowed for other reasons.
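A minimal sketch of the feature-based exception described above, under stated assumptions: the function names are hypothetical, the only logic shown is "an nmod child is tolerated when the determiner carries NumType or Degree", and the feature values in the usage lines are chosen for illustration only. It is not validator code.

```python
# Illustrative sketch of the proposed exception; not validator code.

def parse_feats(feats):
    """Turn a CoNLL-U FEATS string such as 'Degree=Cmp|NumType=Card' into a dict."""
    if feats in ("_", ""):
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def nmod_allowed_under_det(det_feats, child_deprel):
    """True if the child is an nmod (any subtype) and the determiner has NumType or Degree."""
    if child_deprel.split(":")[0] != "nmod":
        return False
    feats = parse_feats(det_feats)
    return "NumType" in feats or "Degree" in feats


# Feature strings below are illustrative assumptions, not taken from a treebank.
print(nmod_allowed_under_det("Degree=Cmp|NumType=Card|PronType=Ind", "nmod"))
# True
print(nmod_allowed_under_det("PronType=Dem", "nmod"))
# False
print(nmod_allowed_under_det("Degree=Cmp", "acl:relcl"))
# False
```

Under the alternative mentioned above (attaching the modifier as obl), no such feature check would be needed.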

Problematic Constructions that Do Not Need an Exception

Adverbial Clauses

(Nathan Schneider)

In English, such is a demonstrative determiner and it may license an advcl, as in these results. The guidelines on sufficiency and excess for so and similar say the advcl should attach to the adjective or adverb, not the noun in a case like sufficient flour. Then in such a high price that nobody could afford it, we may want to attach the advcl dependent to such.

Ideally, the validator should allow advcl specifically in English and only if the head is such. If there are similar constructions in other languages, they should be also registered specifically for those languages and not en bloc for the whole UD.
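One way to register such exceptions per language rather than for the whole of UD would be a lookup keyed by language code. The sketch below is an assumption about how this could be organized, not a description of the validator; the table name, its layout, and the function are hypothetical, and the single entry encodes only the English such case discussed here.

```python
# Hypothetical layout: language code -> {child relation: lemmas of det parents
# under which that relation is tolerated}. Not taken from the UD validator.
LANGUAGE_SPECIFIC_DET_CHILDREN = {
    "en": {"advcl": {"such"}},
}


def language_allows(lang, det_lemma, child_deprel):
    """True if this language registers child_deprel under a det with this lemma."""
    by_relation = LANGUAGE_SPECIFIC_DET_CHILDREN.get(lang, {})
    # Compare only the universal part of the relation, ignoring any subtype.
    lemmas = by_relation.get(child_deprel.split(":")[0], set())
    return det_lemma.lower() in lemmas


print(language_allows("en", "such", "advcl"))  # True
print(language_allows("en", "the", "advcl"))   # False
print(language_allows("cs", "such", "advcl"))  # False
```

Keeping the lemma in the table, and not just the relation, prevents the exception from licensing advcl under every English determiner.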

(Joakim Nivre) You have the choice of treating such as amod, in which case it is unproblematic to attach an advcl to it. If you treat such as det, you instead have to attach the clause to its head (that is, to the whole phrase). This is similar to how we treat some comparative constructions.

(Dan Zeman) If such is amod, then it should probably also have the ADJ UPOS tag, although it looks like the current validator will not complain if it sees a DET attached as amod (it definitely does complain if it sees an ADJ attached as det).

Vor allem

(Leonie Weissweiler)

German vor allem and unter anderem – resolved in a separate issue?

Parentheticals in Spoken Data

(Sylvain Kahane)

DZ: Is attaching the parenthetical to the first determiner better than attaching it to the noun (kiosk)? Apart from the non-projectivity – I realize that there is similarity between this and the discourse point above.

SK: Yes it is similar to the discourse marker case and I propose the same solution. Moreover, in this case, “a, I don’t know how to call that” forms a kind of semantic and prosodic unit, which is not the case of “I don’t know how to call that, a kiosk”. I really want to attach the parenthesis to the first determiner.

DZ: Now I realize that here, too, we shouldn’t have a problem because the first determiner should be attached to the kiosk as a reparandum:

a , I do n't know how to call that , a kiosk
reparandum(kiosk, a-1)
det(kiosk, a-12)
parataxis(a-1, know)

SK: My mistake! Your validator doesn’t forbid discourse, parataxis, or orphan depending on a DET which is reparandum. And it is good like this.

(DZ: Note however that the question of parentheticals depending on determiners is broader and one of the examples mentioned earlier shows that we may want to allow them anyway.)

Fillers in Spoken Data

(Sylvain Kahane)

For spoken data, we need the following to be added to the validator:

Dan Zeman: Does the interjection have to be attached to one of the determiners? The discourse page says that they are attached to the most relevant nearby unit, which is not very helpful, but I thought they would be attached at clause level (yes, it would be non-projective in this case).

Sylvain: Yes, I really want to analyse the discourse marker as the marker of the reparandum. If you want to keep a constrained rule, we can allow discourse only when the determiner is a reparandum. Clearly the determiners around the discourse marker are “the most relevant nearby units”.

Dan: I am not sure I understand. If the discourse child is attached to a parent that itself is a reparandum, we do not have a problem at all (regardless of whether the UPOS tag of the parent is DET). We would have a problem only if the determiner parent were attached as det.

SK: My mistake! Your validator doesn’t forbid discourse, parataxis, or orphan depending on a DET which is reparandum. And it is good like this.

False Starts with the dep Relation

(Sylvain Kahane)

dep for false starts such as “the last, the last day”: here “the last” forms a phrase the head of which is missing and we decided to have dep(the, last). I am not against another solution, as long as “the last” is still a phrase.

DZ: I don’t have a strong position for the first two points above but here I do. I really don’t think that dep deserves any dedicated rule anywhere in UD, it is a last resort in datasets but its usage should be minimized. I don’t understand what makes this case so different from cases where we use reparandum? I always thought we would use it for false starts as well; the only thing that makes them different from true repairs is that the reparandum is identical with the repair, but I still see a strong analogy, when it is uttered for the second time, the first attempt is canceled in a sense. Moreover, here I would not attach the reparandum to the article but to the head of the last day, which would make it unproblematic w.r.t. the rule that determiners do not have children.

SK: The problem is not with the last day, the reparandum starts from day. The problem is the analysis of the last, the false start itself. I don’t like the idea of analyzing it as a correct phrase, for instance with det(last,the). I want to keep the information that it is a false start and not a complete phrase. That is why we chose dep(the, last), but I am ok to use another relation. I give you another example of false start, which I would like to analyze similarly:

les gens qu’on, qu’on voit pour la première fois ‘people who we, who we see for the first time’ (In French the relative pronoun cannot be omitted)

Here we have two pronouns qu’ and on, which form the false start together, and I don’t see what could be the link between them apart from dep. Maybe a dedicated relation such as flat:disfluency?

DZ: It is not a complete phrase but I would still find it natural to apply the UD rules for ellipsis (=> for incomplete phrases), promote last and draw the relation det(last, the). I think the information that it is a false start is already encoded in the incoming reparandum relation (which could be further subtyped to reparandum:falsestart if it is needed).

the last , the last day
det(last-2, the-1)
punct(last-2, ,)
reparandum(day, last-2)
det(day, the-4)
amod(day, last-5)

DZ: As for the French example, I understand your concern, although I would claim that the UD ellipsis policy provides a possible solution here, too: orphan(on, qu').

les gens qu' on , qu' on voit pour la première fois
orphan(on-4, qu'-3)
punct(on-4, ,)
reparandum(voit, on-4)
obj(voit, qu'-6)
nsubj(voit, on-7)

SK: Maybe a dedicated relation such as flat:disfluency?

DZ: Right now I think I prefer the ellipsis solution sketched above, but flat would probably still be better than dep. Neither of the two solutions should trigger the leaf-det validation test.

SK: My mistake! Your validator doesn’t forbid discourse, parataxis, or orphan depending on a DET which is reparandum. And it is good like this.

Semantic Coordination, Syntactic Flat?

(Koichi Yasuoka)

In Classical Chinese 彼此兵 “those and these soldiers” is invalidated by this rule. The English translation has coordinate determiners but there is no coordinating conjunction in the original and 彼此 “that this” are connected via flat. Then “that” is attached as det to “soldier”.

Flavio (and Dan): Here the simplest solution would be to use conj instead of flat. (But in the end, flat may be allowed because of other things, namely reduplication of function words.)