Enhanced Dependencies
We always intended the Universal Dependencies representation to be used in shallow natural language understanding tasks such as
relation extraction or biomedical event extraction. For such tasks, one is typically interested in the relation between certain
entities, e.g., the relation between two persons or whether one protein interacts with another. UD is particularly well suited
for such tasks as UD trees contain many direct dependencies between content words and many of the dependency labels provide a
lot of information about the type of relation between two content words. However, for some constructions, the dependency path
between two content words of interest can be very long in a UD tree, which complicates determining how the content words are
related. Further, some dependency types such as obl or nmod are used for many different types of
arguments and modifiers, and therefore they are not very informative on their own. For these reasons, we also provide guidelines
for an enhanced representation, which makes some of the implicit relations between words more explicit, and augments some of
the dependency labels to facilitate the disambiguation of types of arguments and modifiers.
Enhanced UD graphs may contain some or all of the following enhancements, which are described in the sections below. If a corpus does not annotate any of the enhancements defined in the guidelines, it should always have the underscore character in the DEPS column. That is, the enhanced graph should not be just an exact copy of the basic tree for all sentences in the corpus. Otherwise it creates the impression that the user can expect some enhancements while there are actually none.
- Empty (null) nodes for elided predicates
- Propagation of incoming dependencies to conjuncts
- Propagation of outgoing dependencies from conjuncts
- Additional subject relations for control and raising constructions
- Coreference in relative clause constructions
- Modifier labels that contain the preposition, other case marker or conjunction
Note that the enhanced graph is not necessarily a supergraph of the basic tree, i.e., the graph is not required to contain all the basic dependency relations. For this reason, all relations of the enhanced graph (also the ones that are present in the basic UD tree) have to be included in the DEPS column of a CoNLL-U file. See the specification of the CoNLL-U file format for details.
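For tools that read the DEPS column, the point above can be illustrated with a short sketch. It assumes the standard ten tab-separated CoNLL-U columns, with DEPS holding a `|`-separated list of head:relation pairs; the function names and the sample line are made up for illustration and are not part of any UD tool.

```python
def parse_deps(deps):
    """Parse a DEPS value such as '4:nsubj|6:nsubj' into (head, label) pairs."""
    if deps == "_":
        return []
    pairs = []
    for item in deps.split("|"):
        # Split on the first colon only: the label itself may contain colons.
        head, label = item.split(":", 1)
        # Heads stay strings because empty nodes have IDs such as '5.1'.
        pairs.append((head, label))
    return pairs


def basic_edge_kept(token_line):
    """Return True if the basic HEAD/DEPREL pair also appears in DEPS."""
    cols = token_line.rstrip("\n").split("\t")
    head, deprel, deps = cols[6], cols[7], cols[8]
    return (head, deprel) in parse_deps(deps)


# Hypothetical token line, constructed for illustration only.
line = "3\tJohn\tJohn\tPROPN\t_\t_\t1\tconj\t1:conj|4:nsubj|6:nsubj\t_"
print(parse_deps(line.split("\t")[8]))
print(basic_edge_kept(line))
```

A token whose basic relation is orphan would typically make `basic_edge_kept` return False, because the enhanced graph replaces that relation rather than copying it.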
Furthermore, the dependency relation labels in the enhanced graph in DEPS may contain certain extensions that are not permitted
in the basic relation type in the DEPREL column. The regular expression restricting relation labels in DEPREL is simple:
the label can contain only lowercase English letters and at most one colon, which separates the universal and the language-specific
part of the label:
^[a-z]+(:[a-z]+)?$
In contrast, the relation label in DEPS may contain up to three colons, separating up to
four sections. One of the sections (never the first one) may also contain non-ASCII letters (lowercase letters or letters without a case distinction), combining marks, and the underscore character:
^[a-z]+(:[a-z]+)?(:[\p{Ll}\p{Lm}\p{Lo}\p{M}]+(_[\p{Ll}\p{Lm}\p{Lo}\p{M}]+)*)?(:[a-z]+)?$
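The two patterns can be checked programmatically. Python's built-in `re` module does not support `\p{...}` classes, so the sketch below first maps every character to a class symbol using `unicodedata` and then matches an equivalent "skeleton" pattern. This workaround and all names in it are assumptions made for illustration, not the official validator.

```python
import re
import unicodedata

DEPREL_RE = re.compile(r"[a-z]+(:[a-z]+)?")
# Skeleton pattern: 'a' stands for an ASCII lowercase letter, 'u' for any other
# character in the Unicode categories Ll, Lm, Lo or M*.
EDEPREL_SKELETON_RE = re.compile(r"a+(:a+)?(:[au]+(_[au]+)*)?(:a+)?")


def skeleton(label):
    """Map each character of a label to a class symbol for the skeleton regex."""
    out = []
    for ch in label:
        cat = unicodedata.category(ch)
        if "a" <= ch <= "z":
            out.append("a")
        elif ch in ":_":
            out.append(ch)
        elif cat in ("Ll", "Lm", "Lo") or cat.startswith("M"):
            out.append("u")
        else:
            out.append("!")  # any other character can never match
    return "".join(out)


def is_valid_deprel(label):
    """Check a basic relation label (DEPREL column)."""
    return DEPREL_RE.fullmatch(label) is not None


def is_valid_enhanced_deprel(label):
    """Check an enhanced relation label (the part after the head in DEPS)."""
    return EDEPREL_SKELETON_RE.fullmatch(skeleton(label)) is not None


print(is_valid_deprel("nsubj:pass"))                       # True
print(is_valid_deprel("nsubj:pass:xsubj"))                 # False
print(is_valid_enhanced_deprel("obl:tmod:в_течение:gen"))  # True
print(is_valid_enhanced_deprel("Obl:to"))                  # False
```

With the third-party `regex` package, the original pattern could probably be used directly, since that package supports Unicode property classes.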
Only the first section, the universal relation, is mandatory. The other sections are optional but if they appear, they must appear
in the order described below. We provide a more detailed explanation of the extra sections later on this page; here is a summary:
- Universal dependency relation. In addition to the 37 relations defined in the basic representation, the relation can also be ref.
- Documented relation subtype (either language-specific or more general) from the basic representation.
- The string xsubj, denoting external subject relations of xcomp predicates. This extension is used only with nsubj, csubj, and their subtypes such as nsubj:pass. It does not combine with the other extensions described below because they do not apply to subjects.
- Case and similar information: adposition or conjunction that occurs as a case, mark or cc dependent of the node whose relation to its parent is being enhanced. Note that this is the only part where non-ASCII letters are permitted within the enhanced relation label. The word should be normalized (lowercased, no typos), i.e., in general we take its lemma. However, if the case/mark dependent is a fixed multi-word expression, the lemma of the expression is not necessarily composed of lemmas of the individual member words. For instance, the string representing the English expression “As Opposed To” is as_opposed_to. That is, the casing is normalized from “As” to “as” etc., but “opposed” is not replaced by its lemma “oppose” because the expression is fixed. Similarly, grammaticalized deverbal connectives such as “regarding” may in some languages (if required by the language-specific guidelines) still be tagged VERB, despite being attached as case, and their lemma will thus be verbal (“regard”); nevertheless, the corresponding deprel extension should be the grammaticalized form, i.e., “regarding”. Language-specific guidelines may also specify that certain synonyms (e.g., “toward” and “towards”) be mapped to the same enhanced label, despite having different lemmas. We use the underscore character (“_”) to connect member words. The same approach can also be taken when a node has multiple case markers that are not annotated as a fixed expression, e.g., out_of for “out of business”.
- Case information: morphological case of the node whose relation to its parent is being enhanced. The value corresponds to the value of the Case feature but it is lowercased (e.g., gen instead of Gen). Unlike in morphological features, multivalues with comma (Case=Acc,Dat) are not allowed. Case information in enhanced relations must be fully disambiguated.
Ellipsis
(See also the guidelines on ellipsis.)
In the enhanced representation, we add special empty (null) nodes in clauses in which a predicate is elided. (Although the node is termed ‘empty’ in the CoNLL-U format specification, and although it does not correspond to an overt surface token, its FORM, LEMMA, UPOS, XPOS and FEATS may be optionally filled with the assumed values; here they can be copied from the overt occurrence of the predicate.)
# visual-style 5 6 orphan color:red
# visual-style 2 5 conj color:red
# visual-style 5 4 cc color:red
1 I _ _ _ _ 2 nsubj _ _
2 like _ _ _ _ 0 root _ _
3 tea _ _ _ _ 2 obj _ _
4 and _ _ _ _ 5 cc _ _
5 you _ _ _ _ 2 conj _ _
6 coffee _ _ _ _ 5 orphan _ _
7 . _ _ _ _ 2 punct _ _
|
# visual-style 6 7 obj color:blue
# visual-style 6 5 nsubj color:blue
# visual-style 2 6 conj color:blue
# visual-style 6 4 cc color:blue
1 I _ _ _ _ 2 nsubj _ _
2 like _ _ _ _ 0 root _ _
3 tea _ _ _ _ 2 obj _ _
4 and _ _ _ _ 6 cc _ _
5 you _ _ _ _ 6 nsubj _ _
6 E5.1 _ _ _ _ 2 conj _ _
7 coffee _ _ _ _ 6 obj _ _
8 . _ _ _ _ 2 punct _ _
|
# visual-style 8 10 orphan color:red
# visual-style 2 8 conj color:red
# visual-style 8 7 cc color:red
1 Mary _ _ _ _ 2 nsubj _ _
2 wants _ _ _ _ 0 root _ _
3 to _ _ _ _ 4 mark _ _
4 buy _ _ _ _ 2 xcomp _ _
5 a _ _ _ _ 6 det _ _
6 book _ _ _ _ 4 obj _ _
7 and _ _ _ _ 8 cc _ _
8 Jenny _ _ _ _ 2 conj _ _
9 a _ _ _ _ 10 det _ _
10 CD _ _ _ _ 8 orphan _ _
11 . _ _ _ _ 2 punct _ _
|
# visual-style 9 8 nsubj color:blue
# visual-style 10 12 obj color:blue
# visual-style 9 10 xcomp color:blue
# visual-style 9 7 cc color:blue
# visual-style 4 1 nsubj color:blue
# visual-style 10 8 nsubj color:blue
1 Mary _ _ _ _ 2 nsubj 4:nsubj _
2 wants _ _ _ _ 0 root _ _
3 to _ _ _ _ 4 mark _ _
4 buy _ _ _ _ 2 xcomp _ _
5 a _ _ _ _ 6 det _ _
6 book _ _ _ _ 4 obj _ _
7 and _ _ _ _ 9 cc _ _
8 Jenny _ _ _ _ 9 nsubj 10:nsubj _
9 E8.1 _ _ _ _ 2 conj _ _
10 E8.2 _ _ _ _ 9 xcomp _ _
11 a _ _ _ _ 12 det _ _
12 CD _ _ _ _ 10 obj _ _
13 . _ _ _ _ 2 punct _ _
|
Note that this is a case in which the enhanced UD graph is not a supergraph of the basic tree as the basic tree contains orphan
relations, which are not present in the enhanced UD graph.
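In CoNLL-U files, empty nodes are identified by decimal IDs such as 5.1 (displayed as E5.1 in the examples above), placed after the word with the same integer part. The following sketch shows one way to recognize and order such IDs. The helper names are hypothetical, and multiword-token ranges such as 1-2 are deliberately not handled.

```python
def parse_node_id(node_id):
    """Split a node ID into (major, minor); minor is 0 for ordinary words."""
    major, _, minor = node_id.partition(".")
    return int(major), int(minor) if minor else 0


def is_empty_node(node_id):
    """Empty nodes are the ones with a non-zero decimal part."""
    return parse_node_id(node_id)[1] > 0


# Node IDs of "I like tea and you E5.1 coffee ."
ids = ["1", "2", "3", "4", "5", "5.1", "6", "7"]
print([i for i in ids if is_empty_node(i)])          # ['5.1']
print(sorted(["6", "5.1", "5"], key=parse_node_id))  # ['5', '5.1', '6']
```

Sorting by the (major, minor) pair keeps each empty node directly after the word it follows, which a plain string sort would not guarantee for longer sentences.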
Propagation of incoming dependencies to conjuncts
In the basic representation, the governor and dependents of a conjoined phrase are all attached to the first conjunct. This often leads to very long dependency paths between content words. The enhanced representation therefore also contains dependencies between the other conjuncts and the governor and dependents of the phrase.
Conjoined subjects and objects
When the subject is a conjoined noun phrase, each of the conjuncts is attached to the predicate.
1 Paul _ _ _ _ 5 nsubj _ _
2 and _ _ _ _ 3 cc _ _
3 Mary _ _ _ _ 1 conj _ _
4 are _ _ _ _ 5 aux _ _
5 running _ _ _ _ 0 root _ _
6 . _ _ _ _ 5 punct _ _
|
# visual-style 5 3 nsubj color:blue
1 Paul _ _ _ _ 5 nsubj _ _
2 and _ _ _ _ 3 cc _ _
3 Mary _ _ _ _ 1 conj 5:nsubj _
4 are _ _ _ _ 5 aux _ _
5 running _ _ _ _ 0 root _ _
6 . _ _ _ _ 5 punct _ _
|
The same is true for conjoined objects.
1 Paul _ _ _ _ 2 nsubj _ _
2 bought _ _ _ _ 0 root _ _
3 apples _ _ _ _ 2 obj _ _
4 and _ _ _ _ 5 cc _ _
5 oranges _ _ _ _ 3 conj _ _
6 . _ _ _ _ 2 punct _ _
|
# visual-style 2 5 obj color:blue
1 Paul _ _ _ _ 2 nsubj _ _
2 bought _ _ _ _ 0 root _ _
3 apples _ _ _ _ 2 obj _ _
4 and _ _ _ _ 5 cc _ _
5 oranges _ _ _ _ 3 conj 2:obj _
6 . _ _ _ _ 2 punct _ _
|
This leads to slightly strange dependencies in the case of collective subjects or objects:
1 Paul _ _ _ _ 5 nsubj _ _
2 and _ _ _ _ 3 cc _ _
3 Mary _ _ _ _ 1 conj _ _
4 are _ _ _ _ 5 aux _ _
5 meeting _ _ _ _ 0 root _ _
6 . _ _ _ _ 5 punct _ _
|
# visual-style 5 3 nsubj color:blue
1 Paul _ _ _ _ 5 nsubj _ _
2 and _ _ _ _ 3 cc _ _
3 Mary _ _ _ _ 1 conj 5:nsubj _
4 are _ _ _ _ 5 aux _ _
5 meeting _ _ _ _ 0 root _ _
6 . _ _ _ _ 5 punct _ _
|
1 Mary _ _ _ _ 3 nsubj _ _
2 is _ _ _ _ 3 aux _ _
3 eating _ _ _ _ 0 root _ _
4 mac _ _ _ _ 3 obj _ _
5 and _ _ _ _ 6 cc _ _
6 cheese _ _ _ _ 4 conj _ _
7 . _ _ _ _ 3 punct _ _
|
# visual-style 3 6 obj color:blue
1 Mary _ _ _ _ 3 nsubj _ _
2 is _ _ _ _ 3 aux _ _
3 eating _ _ _ _ 0 root _ _
4 mac _ _ _ _ 3 obj _ _
5 and _ _ _ _ 6 cc _ _
6 cheese _ _ _ _ 4 conj 3:obj _
7 . _ _ _ _ 3 punct _ _
|
However, as the distinction between distributive and collective readings is often context-dependent, we take the simplest approach and always attach all conjuncts to the predicate.
When the subject is attached to a control or raising predicate, there is a dependency between the matrix verb and each conjunct and between the embedded verb and each conjunct.
1 Mary _ _ _ _ 4 nsubj _ _
2 and _ _ _ _ 3 cc _ _
3 John _ _ _ _ 1 conj _ _
4 wanted _ _ _ _ 0 root _ _
5 to _ _ _ _ 6 mark _ _
6 buy _ _ _ _ 4 xcomp _ _
7 a _ _ _ _ 8 det _ _
8 hat _ _ _ _ 6 obj _ _
9 . _ _ _ _ 4 punct _ _
|
# visual-style 4 3 nsubj color:blue
# visual-style 6 1 nsubj color:blue
# visual-style 6 3 nsubj color:blue
1 Mary _ _ _ _ 4 nsubj 6:nsubj _
2 and _ _ _ _ 3 cc _ _
3 John _ _ _ _ 1 conj 4:nsubj|6:nsubj _
4 wanted _ _ _ _ 0 root _ _
5 to _ _ _ _ 6 mark _ _
6 buy _ _ _ _ 4 xcomp _ _
7 a _ _ _ _ 8 det _ _
8 hat _ _ _ _ 6 obj _ _
9 . _ _ _ _ 4 punct _ _
|
Conjoined modifiers
Each conjunct in a conjoined modifier phrase gets attached to the governor of the modifier phrase. For example, the following phrase contains a conjoined adjectival phrase that modifies a noun. In the enhanced representation, there is an additional amod relation between the noun river and the second conjunct wide.
1 a _ _ _ _ 5 det _ _
2 long _ _ _ _ 5 amod _ _
3 and _ _ _ _ 4 cc _ _
4 wide _ _ _ _ 2 conj _ _
5 river _ _ _ _ 0 root _ _
|
# visual-style 5 4 amod color:blue
1 a _ _ _ _ 5 det _ _
2 long _ _ _ _ 5 amod _ _
3 and _ _ _ _ 4 cc _ _
4 wide _ _ _ _ 2 conj 5:amod _
5 river _ _ _ _ 0 root _ _
|
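The propagation of incoming dependencies described in this section can be sketched in a few lines. The code below assumes the basic tree is given as (head, dependent, label) triples. The function name is hypothetical, and skipping the root relation is a simplifying assumption of this sketch rather than something the text above prescribes.

```python
def propagate_incoming(edges):
    """Copy the incoming relation of a first conjunct to its other conjuncts.

    `edges` is a list of (head, dependent, label) triples of the basic tree.
    Only the additional enhanced edges are returned.
    """
    incoming = {dep: (head, label) for head, dep, label in edges}
    added = []
    for head, dep, label in edges:
        if label.split(":")[0] == "conj" and head in incoming:
            governor, relation = incoming[head]
            if relation != "root":  # assumption: root is not propagated
                added.append((governor, dep, relation))
    return added


# "a long and wide river"
basic = [(5, 1, "det"), (5, 2, "amod"), (4, 3, "cc"), (2, 4, "conj"), (0, 5, "root")]
print(propagate_incoming(basic))  # [(5, 4, 'amod')]
```

Because the first conjunct's relation is copied unchanged, collective readings such as "Paul and Mary are meeting" receive the same treatment as distributive ones, in line with the approach taken above.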
Propagation of outgoing dependencies from conjuncts
In the basic representation, dependents that are shared by all conjuncts of a conjoined phrase are attached only to the first conjunct. This often leads to very long dependency paths between content words. The enhanced representation therefore also contains dependencies between the other conjuncts and these shared dependents.
Conjoined verbs and verb phrases
When two verbs share their objects (or other complements), the subject and the object of the conjoined verbs are attached to every conjunct.
1 The _ _ _ _ 2 det _ _
2 store _ _ _ _ 3 nsubj _ _
3 buys _ _ _ _ 0 root _ _
4 and _ _ _ _ 5 cc _ _
5 sells _ _ _ _ 3 conj _ _
6 cameras _ _ _ _ 3 obj _ _
7 . _ _ _ _ 3 punct _ _
|
# visual-style 5 2 nsubj color:blue
# visual-style 5 6 obj color:blue
1 The _ _ _ _ 2 det _ _
2 store _ _ _ _ 3 nsubj 5:nsubj _
3 buys _ _ _ _ 0 root _ _
4 and _ _ _ _ 5 cc _ _
5 sells _ _ _ _ 3 conj _ _
6 cameras _ _ _ _ 3 obj 5:obj _
7 . _ _ _ _ 3 punct _ _
|
However, if the complements of the second verb are not shared, only the shared dependents are attached to every conjunct.
1 She _ _ _ _ 3 nsubj _ _
2 was _ _ _ _ 3 aux _ _
3 reading _ _ _ _ 0 root _ _
4 or _ _ _ _ 5 cc _ _
5 watching _ _ _ _ 3 conj _ _
6 a _ _ _ _ 7 det _ _
7 movie _ _ _ _ 5 obj _ _
8 . _ _ _ _ 3 punct _ _
|
# visual-style 5 1 nsubj color:blue
# visual-style 5 2 aux color:blue
1 She _ _ _ _ 3 nsubj 5:nsubj _
2 was _ _ _ _ 3 aux 5:aux _
3 reading _ _ _ _ 0 root _ _
4 or _ _ _ _ 5 cc _ _
5 watching _ _ _ _ 3 conj _ _
6 a _ _ _ _ 7 det _ _
7 movie _ _ _ _ 5 obj _ _
8 . _ _ _ _ 3 punct _ _
|
Similarly, the enhanced representation can also distinguish private dependents of the first verb. Note however that in this case it cannot be inferred from the basic representation automatically.
1 She _ _ _ _ 3 nsubj _ _
2 was _ _ _ _ 3 aux _ _
3 watching _ _ _ _ 0 root _ _
4 a _ _ _ _ 5 det _ _
5 movie _ _ _ _ 3 obj _ _
6 or _ _ _ _ 7 cc _ _
7 reading _ _ _ _ 3 conj _ _
8 . _ _ _ _ 3 punct _ _
|
# visual-style 7 1 nsubj color:blue
# visual-style 7 2 aux color:blue
1 She _ _ _ _ 3 nsubj 7:nsubj _
2 was _ _ _ _ 3 aux 7:aux _
3 watching _ _ _ _ 0 root _ _
4 a _ _ _ _ 5 det _ _
5 movie _ _ _ _ 3 obj _ _
6 or _ _ _ _ 7 cc _ _
7 reading _ _ _ _ 3 conj _ _
8 . _ _ _ _ 3 punct _ _
|
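Since shared and private dependents cannot always be told apart automatically, any automatic propagation of outgoing dependencies is a heuristic. As a hedged illustration, the sketch below shares only the subject of the first conjunct, and only with conjuncts that have no subject of their own. Both the heuristic and the function name are assumptions of this example; objects and other complements are left alone.

```python
def propagate_subject(edges):
    """Share the first conjunct's subject with conjuncts that lack their own.

    `edges` is a list of (head, dependent, label) triples of the basic tree.
    Only the additional enhanced edges are returned.
    """
    children = {}
    for head, dep, label in edges:
        children.setdefault(head, []).append((dep, label))

    def subjects(node):
        return [(d, l) for d, l in children.get(node, [])
                if l.split(":")[0] in ("nsubj", "csubj")]

    added = []
    for head, dep, label in edges:
        if label.split(":")[0] == "conj" and not subjects(dep):
            for subj, subj_label in subjects(head):
                added.append((dep, subj, subj_label))
    return added


# "The store buys and sells cameras."
basic = [(2, 1, "det"), (3, 2, "nsubj"), (0, 3, "root"), (5, 4, "cc"),
         (3, 5, "conj"), (3, 6, "obj"), (3, 7, "punct")]
print(propagate_subject(basic))  # [(5, 2, 'nsubj')]
```

The object cameras is not propagated here, although it is shared in this sentence; deciding that requires information beyond the basic tree, as the "watching a movie or reading" example shows.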
Controlled/raised subjects
The basic trees lack a subject dependency between a controlled verb and its controller
or between an embedded verb and its raised subject. In the enhanced graph, there is an
additional dependency between the embedded verb and the subject of the matrix clause.
This dependency can be recognized by the extension (subtype) :xsubj.
Basic | Enhanced |
---|---|
1 Mary _ _ _ _ 2 nsubj _ _
2 wants _ _ _ _ 0 root _ _
3 to _ _ _ _ 4 mark _ _
4 buy _ _ _ _ 2 xcomp _ _
5 a _ _ _ _ 6 det _ _
6 book _ _ _ _ 4 obj _ _
7 . _ _ _ _ 2 punct _ _
|
# visual-style 4 1 nsubj:xsubj color:blue
1 Mary _ _ _ _ 2 nsubj 4:nsubj:xsubj _
2 wants _ _ _ _ 0 root _ _
3 to _ _ _ _ 4 mark _ _
4 buy _ _ _ _ 2 xcomp _ _
5 a _ _ _ _ 6 det _ _
6 book _ _ _ _ 4 obj _ _
7 . _ _ _ _ 2 punct _ _
|
1 She _ _ _ _ 2 nsubj _ _
2 seems _ _ _ _ 0 root _ _
3 to _ _ _ _ 5 mark _ _
4 be _ _ _ _ 5 aux _ _
5 reading _ _ _ _ 2 xcomp _ _
6 a _ _ _ _ 7 det _ _
7 book _ _ _ _ 5 obj _ _
8 . _ _ _ _ 2 punct _ _
|
# visual-style 5 1 nsubj:xsubj color:blue
1 She _ _ _ _ 2 nsubj 5:nsubj:xsubj _
2 seems _ _ _ _ 0 root _ _
3 to _ _ _ _ 5 mark _ _
4 be _ _ _ _ 5 aux _ _
5 reading _ _ _ _ 2 xcomp _ _
6 a _ _ _ _ 7 det _ _
7 book _ _ _ _ 5 obj _ _
8 . _ _ _ _ 2 punct _ _
|
1 Mary _ _ _ _ 2 nsubj _ _
2 made _ _ _ _ 0 root _ _
3 me _ _ _ _ 2 obj _ _
4 buy _ _ _ _ 2 xcomp _ _
5 the _ _ _ _ 6 det _ _
6 house _ _ _ _ 4 obj _ _
7 . _ _ _ _ 2 punct _ _
|
# visual-style 4 3 nsubj:xsubj color:blue
1 Mary _ _ _ _ 2 nsubj _ _
2 made _ _ _ _ 0 root _ _
3 me _ _ _ _ 2 obj 4:nsubj:xsubj _
4 buy _ _ _ _ 2 xcomp _ _
5 the _ _ _ _ 6 det _ _
6 house _ _ _ _ 4 obj _ _
7 . _ _ _ _ 2 punct _ _
|
1 Mary _ _ _ _ 2 nsubj _ _
2 wants _ _ _ _ 0 root _ _
3 me _ _ _ _ 2 obj _ _
4 to _ _ _ _ 6 mark _ _
5 be _ _ _ _ 6 aux:pass _ _
6 promoted _ _ _ _ 2 xcomp _ _
7 . _ _ _ _ 2 punct _ _
|
# visual-style 6 3 nsubj:pass:xsubj color:blue
1 Mary _ _ _ _ 2 nsubj _ _
2 wants _ _ _ _ 0 root _ _
3 me _ _ _ _ 2 obj 6:nsubj:pass:xsubj _
4 to _ _ _ _ 6 mark _ _
5 be _ _ _ _ 6 aux:pass _ _
6 promoted _ _ _ _ 2 xcomp _ _
7 . _ _ _ _ 2 punct _ _
|
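The examples above suggest a simple procedure for adding external subjects, sketched below under strong assumptions: the controller is taken to be the object of the matrix verb if there is one, and the matrix subject otherwise. This choice is wrong for subject-control verbs that also take an object (e.g., "promise"), and clausal subjects are ignored. The function name is hypothetical.

```python
def add_xsubj(edges):
    """Add external subject edges for xcomp predicates (naive controller choice).

    `edges` is a list of (head, dependent, label) triples of the basic tree.
    Only the additional enhanced edges are returned.
    """
    children = {}
    for head, dep, label in edges:
        children.setdefault(head, []).append((dep, label))

    def first_child(node, relations):
        for dep, label in children.get(node, []):
            if label.split(":")[0] in relations:
                return dep
        return None

    added = []
    for head, dep, label in edges:
        if label.split(":")[0] != "xcomp":
            continue
        # Assumption: an object of the matrix verb controls if present,
        # otherwise the matrix subject does.
        controller = first_child(head, ("obj", "iobj"))
        if controller is None:
            controller = first_child(head, ("nsubj",))
        if controller is None:
            continue
        # A passive auxiliary on the embedded verb yields nsubj:pass:xsubj.
        passive = any(l == "aux:pass" for _, l in children.get(dep, []))
        added.append((dep, controller, "nsubj:pass:xsubj" if passive else "nsubj:xsubj"))
    return added


# "Mary wants me to be promoted."
basic = [(2, 1, "nsubj"), (0, 2, "root"), (2, 3, "obj"), (6, 4, "mark"),
         (6, 5, "aux:pass"), (2, 6, "xcomp"), (2, 7, "punct")]
print(add_xsubj(basic))  # [(6, 3, 'nsubj:pass:xsubj')]
```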
Relative clauses
In basic trees, relative pronouns are attached to the main predicate of the relative clause (typically with an nsubj or obj
relation). In the corresponding enhanced graphs, the relative pronoun is attached to its antecedent with the
special ref relation and the antecedent is attached as a dependent of the node that is the parent of the relative
pronoun in the basic tree. Typically this parent is the main predicate of the relative clause, but it is not always so
(see examples below).
In the case where there is no explicit relative pronoun, there is no ref relation in the enhanced graph but the
antecedent is still annotated as a dependent of a node in the relative clause, depending on the role it plays in the
relative clause.
Note that such graphs contain a cycle.
# visual-style 4 3 nsubj color:red
1 the the DET _ Definite=Def|PronType=Art 2 det _ _
2 boy boy NOUN _ Gender=Masc|Number=Sing 0 root _ _
3 who who PRON _ PronType=Rel 4 nsubj _ _
4 lived lived VERB _ Mood=Ind|Tense=Past|VerbForm=Fin 2 acl:relcl _ _
|
# visual-style 4 2 nsubj color:blue
# visual-style 2 3 ref color:blue
1 the the DET _ Definite=Def|PronType=Art 2 det _ _
2 boy boy NOUN _ Gender=Masc|Number=Sing 0 root 4:nsubj _
3 who who PRON _ PronType=Rel 2 ref _ _
4 lived lived VERB _ Mood=Ind|Tense=Past|VerbForm=Fin 2 acl:relcl _ _
|
# visual-style 5 3 obj color:red
1 the the DET _ Definite=Def|PronType=Art 2 det _ _
2 book book NOUN _ Gender=Neut|Number=Sing 0 root _ _
3 that that PRON _ PronType=Rel 5 obj _ _
4 I I PRON _ Number=Sing|Person=1|PronType=Prs 5 nsubj _ _
5 read read VERB _ Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin 2 acl:relcl _ _
|
# visual-style 5 2 obj color:blue
# visual-style 2 3 ref color:blue
1 the the DET _ Definite=Def|PronType=Art 2 det _ _
2 book book NOUN _ Gender=Neut|Number=Sing 0 root 5:obj _
3 that that PRON _ PronType=Rel 2 ref _ _
4 I I PRON _ Number=Sing|Person=1|PronType=Prs 5 nsubj _ _
5 read read VERB _ Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin 2 acl:relcl _ _
|
1 the the DET _ Definite=Def|PronType=Art 2 det _ _
2 book book NOUN _ Gender=Neut|Number=Sing 0 root _ _
3 I I PRON _ Number=Sing|Person=1|PronType=Prs 4 nsubj _ _
4 read read VERB _ Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin 2 acl:relcl _ _
|
# visual-style 4 2 obj color:blue
1 the the DET _ Definite=Def|PronType=Art 2 det _ _
2 book book NOUN _ Gender=Neut|Number=Sing 0 root 4:obj _
3 I I PRON _ Number=Sing|Person=1|PronType=Prs 4 nsubj _ _
4 read read VERB _ Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin 2 acl:relcl _ _
|
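For the cases with an explicit relative pronoun shown above, the rewriting can be sketched as follows. The code assumes the basic tree is given as (head, dependent, label) triples together with the IDs of words tagged PronType=Rel. It copies the pronoun's original label to the antecedent unchanged, which is a simplification, and it covers neither relative clauses without a pronoun nor pronouns that head the relative clause. The function name is hypothetical.

```python
def enhance_relcl(edges, relative_ids):
    """Rewrite relative pronoun edges: add `ref` and re-attach the antecedent.

    `edges` holds (head, dependent, label) triples of the basic tree and
    `relative_ids` the IDs of words tagged PronType=Rel.
    """
    head_of = {dep: head for head, dep, _ in edges}
    relcl_antecedent = {dep: head for head, dep, label in edges
                        if label == "acl:relcl"}

    def antecedent_for(node):
        # Climb from the pronoun to the nearest acl:relcl predicate.
        while node in head_of:
            node = head_of[node]
            if node in relcl_antecedent:
                return relcl_antecedent[node]
        return None

    enhanced = []
    for head, dep, label in edges:
        antecedent = antecedent_for(dep) if dep in relative_ids else None
        if antecedent is None:
            enhanced.append((head, dep, label))
        else:
            enhanced.append((antecedent, dep, "ref"))   # pronoun -> antecedent
            enhanced.append((head, antecedent, label))  # antecedent takes the role
    return enhanced


# "the boy who lived"
basic = [(2, 1, "det"), (0, 2, "root"), (4, 3, "nsubj"), (2, 4, "acl:relcl")]
print(enhance_relcl(basic, {3}))
```

In the result, boy (2) governs lived (4) via acl:relcl while lived governs boy via nsubj, which is the cycle mentioned above.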
Adverbial relativizers receive the same treatment.
# visual-style 5 3 advmod color:red
1 the the DET DT Definite=Def|PronType=Art 2 det _ _
2 episode episode NOUN NN Number=Sing 0 root _ _
3 where where ADV WRB PronType=Rel 5 advmod _ _
4 Monica Monica PROPN NNP Number=Sing 5 nsubj _ _
5 sings sing VERB VBZ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 2 acl:relcl _ _
|
# visual-style 2 3 ref color:blue
# visual-style 5 2 obl color:blue
1 the the DET DT Definite=Def|PronType=Art 2 det _ _
2 episode episode NOUN NN Number=Sing 0 root 5:obl _
3 where where ADV WRB PronType=Rel 2 ref _ _
4 Monica Monica PROPN NNP Number=Sing 5 nsubj _ _
5 sings sing VERB VBZ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 2 acl:relcl _ _
|
The enhanced relations include deep syntactic relations. Therefore, in case-marking languages the enhanced dependencies
may link verb dependents that are not in the expected morphological case, required by surface syntax. In the following
Czech example, the relative modifier phrase v němž “in which” is obligatorily in the locative case form
(Case=Loc). If it were a main clause, the referent dům “house” would have to be in locative too: v domě
“in house”. However, here it is in the nominative (Case=Nom), and the enhanced dependency obl going to a nominative
dependent is something we would not expect to see, given the morpho-syntactic rules of the language.
# visual-style 5 4 obl color:red
1 dům house NOUN _ Animacy=Inan|Case=Nom|Gender=Masc|Number=Sing 0 root _ _
2 , , PUNCT _ _ 5 punct _ _
3 v in ADP _ _ 4 case _ _
4 němž that PRON _ Case=Loc|Gender=Masc|Number=Sing|PronType=Rel 5 obl _ _
5 žijeme live VERB _ Aspect=Imp|Mood=Ind|Number=Plur|Person=1|Tense=Pres|VerbForm=Fin|Voice=Act 1 acl:relcl _ _
|
# visual-style 5 1 obl color:blue
# visual-style 1 4 ref color:blue
1 dům house NOUN _ Animacy=Inan|Case=Nom|Gender=Masc|Number=Sing 0 root 5:obl _
2 , , PUNCT _ _ 5 punct _ _
3 v in ADP _ _ 4 case _ _
4 němž that PRON _ Case=Loc|Gender=Masc|Number=Sing|PronType=Rel 1 ref _ _
5 žijeme live VERB _ Aspect=Imp|Mood=Ind|Number=Plur|Person=1|Tense=Pres|VerbForm=Fin|Voice=Act 1 acl:relcl _ _
|
The relative element does not always depend directly on the predicate of the relative clause. It may be embedded deeper as in the following example.
# visual-style 5 4 det color:red
1 muž man NOUN _ Animacy=Anim|Case=Nom|Gender=Masc|Number=Sing 0 root _ _
2 , , PUNCT _ _ 6 punct _ _
3 v in ADP _ _ 5 case _ _
4 jehož whose DET _ Gender[psor]=Masc|Number[psor]=Plur|Poss=Yes|PronType=Rel 5 det _ _
5 domě house NOUN _ Animacy=Inan|Case=Loc|Gender=Masc|Number=Sing 6 obl _ _
6 žijeme live VERB _ Aspect=Imp|Mood=Ind|Number=Plur|Person=1|Tense=Pres|VerbForm=Fin|Voice=Act 1 acl:relcl _ _
|
# visual-style 5 1 nmod color:blue
# visual-style 1 4 ref color:blue
1 muž man NOUN _ Animacy=Anim|Case=Nom|Gender=Masc|Number=Sing 0 root 5:nmod _
2 , , PUNCT _ _ 6 punct _ _
3 v in ADP _ _ 5 case _ _
4 jehož whose DET _ Gender[psor]=Masc|Number[psor]=Plur|Poss=Yes|PronType=Rel 1 ref _ _
5 domě house NOUN _ Animacy=Inan|Case=Loc|Gender=Masc|Number=Sing 6 obl _ _
6 žijeme live VERB _ Aspect=Imp|Mood=Ind|Number=Plur|Person=1|Tense=Pres|VerbForm=Fin|Voice=Act 1 acl:relcl _ _
|
If the relative clause has a nominal predicate, the relative pronoun may occupy the head position within the clause.
Unlike most relative clauses, here the parent of the relative pronoun in the basic tree is not inside the relative
clause, and its antecedent will not have an additional enhanced relation attaching it to a (non-existent) parent in
the relative clause. Instead, we add an nsubj relation from the antecedent to the nsubj of the relative clause
(and remove the corresponding nsubj relation between the relative pronoun and the subject). The acl:relcl relation should
remain the same as in basic dependencies.
# visual-style 5 6 nsubj color:red
1 He he PRON _ Case=Nom|Gender=Masc|Number=Sing|Person=3|PronType=Prs 2 nsubj _ _
2 became become VERB _ Mood=Ind|Tense=Past|VerbForm=Fin 0 root _ _
3 chairman chairman NOUN _ Number=Sing 2 xcomp _ SpaceAfter=No
4 , , PUNCT _ _ 5 punct _ _
5 which which PRON _ PronType=Rel 3 acl:relcl _ _
6 he he PRON _ Case=Nom|Gender=Masc|Number=Sing|Person=3|PronType=Prs 5 nsubj _ _
7 still still ADV _ _ 5 advmod _ _
8 is be AUX _ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 5 cop _ SpaceAfter=No
9 . . PUNCT _ _ 2 punct _ _
|
# visual-style 3 6 nsubj color:blue
# visual-style 3 5 ref color:blue
1 He he PRON _ Case=Nom|Gender=Masc|Number=Sing|Person=3|PronType=Prs 2 nsubj _ _
2 became become VERB _ Mood=Ind|Tense=Past|VerbForm=Fin 0 root _ _
3 chairman chairman NOUN _ Number=Sing 2 xcomp _ SpaceAfter=No
4 , , PUNCT _ _ 5 punct _ _
5 which which PRON _ PronType=Rel 3 acl:relcl 3:ref _
6 he he PRON _ Case=Nom|Gender=Masc|Number=Sing|Person=3|PronType=Prs 3 nsubj _ _
7 still still ADV _ _ 5 advmod _ _
8 is be AUX _ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 5 cop _ SpaceAfter=No
9 . . PUNCT _ _ 2 punct _ _
|
Case Information
Adding prepositions (or case information) to the relation name of non-core dependents often makes it possible to disambiguate their
semantic role. We therefore augment certain relation labels with the case information of the modifier.
The augmented relations are nmod, acl, obl and advcl; if it makes sense in the language, some core relations may also be
augmented: obj, iobj, ccomp.
Case information may be represented by the lemma of an adposition attached via a case relation.
For clauses, the corresponding information may be represented by the lemma of a mark dependent instead.
Case information may also be represented by the value of the morphological feature Case.
In some languages, there is both the adposition and the morphological case, and their combination must be reflected in the enhanced relation.
In a similar manner, enhanced UD graphs also contain conj relations that are augmented with their coordinating conjunction.
This makes the type of coordination between two phrases more explicit, which is particularly useful in phrases with multiple
coordinating conjunctions.
The following formal rules apply (copied from the summary at the beginning of this page):
- Adposition or conjunction that occurs as a case, mark or cc dependent of the node whose relation to its parent is being enhanced. Note that this is the only part where non-ASCII letters are permitted within the enhanced relation label. The word should be normalized (lowercased, no typos), i.e., in general we take its lemma. However, if the case/mark dependent is a fixed multi-word expression, the lemma of the expression is not necessarily composed of lemmas of the individual member words. For instance, the string representing the English expression “As Opposed To” is as_opposed_to. That is, the casing is normalized from “As” to “as” etc., but “opposed” is not replaced by its lemma “oppose” because the expression is fixed. Similarly, grammaticalized deverbal connectives such as “regarding” may in some languages (if required by the language-specific guidelines) still be tagged VERB, despite being attached as case, and their lemma will thus be verbal (“regard”); nevertheless, the corresponding deprel extension should be the grammaticalized form, i.e., “regarding”. Language-specific guidelines may also specify that certain synonyms (e.g., “toward” and “towards”) be mapped to the same enhanced label, despite having different lemmas. We use the underscore character (“_”) to connect member words. The same approach can also be taken when a node has multiple case markers that are not annotated as a fixed expression, e.g., out_of for “out of business”.
  - Multiple case or mark nodes may occur even if they do not form a fixed expression. For example, a type of adverbial clause in Dutch uses two markers om and te, the first one roughly corresponding to English “so that”, the second one being an infinitive marker. The incoming dependency of the subordinate clause will then be labeled advcl:om_te.
  - Case markers may be coordinated, as in they transport goods to and from Prague. Here there are two different relations between the verb and the nominal: obl:to and obl:from. Both will be added to the enhanced graph.
- Morphological case of the node whose relation to its parent is being enhanced. The value corresponds to the value of the Case feature but it is lowercased (e.g., gen instead of Gen). Unlike in morphological features, multivalues with comma (Case=Acc,Dat) are not allowed. Case information in enhanced relations must be fully disambiguated.
  - In certain languages and situations, the morphological case is combined with a lexical case marker (adposition). This is particularly useful if adpositions in the language select a subset of the morphological cases available and if the same adposition may have different meanings with different morphological cases.
  - It may happen that two adpositions are coordinated, each selects a different morphological case and the noun can satisfy only one of the case requirements. For instance, [cs] Lidé se rozutekli před a během útoku. “People ran away before and during the attack.” The first preposition requires instrumental, the second requires genitive, the noun is in genitive. However, the relations in the enhanced graph should be obl:před:ins and obl:během:gen. The first relation should indicate instrumental despite the fact that the surface form of the noun in the current sentence is not instrumental, and its morphological feature is Case=Gen. The relation obl:před:gen does not exist in the language and has no meaning. (Note however that instrumental is not the only option with this preposition; accusative is also possible, and obl:před:acc does not mean the same thing as obl:před:ins.)
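The formal rules above amount to joining up to three pieces with colons. The sketch below shows one possible way to assemble such a label. The function and parameter names are hypothetical, and the caller is assumed to supply already normalized marker strings (for instance the form of a fixed expression rather than the lemmas of its members).

```python
def enhanced_label(deprel, markers=(), morph_case=None):
    """Build an enhanced relation label from its optional sections.

    deprel     -- basic relation, possibly with subtype, e.g. 'obl:tmod'
    markers    -- normalized forms of the case/mark/cc dependents, in order
    morph_case -- a single value of the Case feature, e.g. 'Gen'
    """
    parts = [deprel]
    if markers:
        # Member words of multi-word markers are joined with underscores.
        parts.append("_".join(m.lower() for m in markers))
    if morph_case:
        if "," in morph_case:
            raise ValueError("Case must be fully disambiguated: " + morph_case)
        parts.append(morph_case.lower())
    return ":".join(parts)


print(enhanced_label("nmod", ["on"]))                       # nmod:on
print(enhanced_label("obl", ["auf"], "Dat"))                # obl:auf:dat
print(enhanced_label("obl:tmod", ["В", "течение"], "Gen"))  # obl:tmod:в_течение:gen
print(enhanced_label("advcl", ["om", "te"]))                # advcl:om_te
```

For coordinated adpositions such as před a během, the function would be called once per adposition, each time with the case that the adposition selects rather than the case found on the noun.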
# visual-style 2 5 nmod color:red
1 the _ _ _ _ 2 det _ _
2 house _ _ _ _ 0 root _ _
3 on _ _ _ _ 5 case _ _
4 the _ _ _ _ 5 det _ _
5 hill _ _ _ _ 2 nmod _ _
|
# visual-style 2 5 nmod:on color:blue
1 the _ _ _ _ 2 det _ _
2 house _ _ _ _ 0 root _ _
3 on _ _ _ _ 5 case _ _
4 the _ _ _ _ 5 det _ _
5 hill _ _ _ _ 2 nmod:on _ _
|
# visual-style 2 5 obl color:red
# visual-style 2 7 advcl color:red
1 He _ _ _ _ 2 nsubj _ _
2 went _ _ _ _ 0 root _ _
3 to _ _ _ _ 5 case _ _
4 the _ _ _ _ 5 det _ _
5 dinner _ _ _ _ 2 obl _ _
6 after _ _ _ _ 7 mark _ _
7 leaving _ _ _ _ 2 advcl _ _
8 work _ _ _ _ 7 obj _ _
9 . _ _ _ _ 2 punct _ _
|
# visual-style 2 5 obl:to color:blue
# visual-style 2 7 advcl:after color:blue
1 He _ _ _ _ 2 nsubj _ _
2 went _ _ _ _ 0 root _ _
3 to _ _ _ _ 5 case _ _
4 the _ _ _ _ 5 det _ _
5 dinner _ _ _ _ 2 obl:to _ _
6 after _ _ _ _ 7 mark _ _
7 leaving _ _ _ _ 2 advcl:after _ _
8 work _ _ _ _ 7 obj _ _
9 . _ _ _ _ 2 punct _ _
|
# visual-style 2 4 nmod color:red
# text = the destruction of the city
1 die the DET _ Case=Nom 2 det _ _
2 Zerstörung destruction NOUN _ Case=Nom 0 root _ _
3 der the DET _ Case=Gen 4 det _ _
4 Stadt city NOUN _ Case=Gen 2 nmod _ _
|
# visual-style 2 4 nmod:gen color:blue
# text = the destruction of the city
1 die the DET _ Case=Nom 2 det _ _
2 Zerstörung destruction NOUN _ Case=Nom 0 root _ _
3 der the DET _ Case=Gen 4 det _ _
4 Stadt city NOUN _ Case=Gen 2 nmod:gen _ _
|
# visual-style 2 5 obl color:red
# text = He sits on the floor
1 Er he PRON _ Case=Nom 2 nsubj _ _
2 sitzt sits VERB _ _ 0 root _ _
3 auf on ADP _ _ 5 case _ _
4 dem the DET _ Case=Dat 5 det _ _
5 Boden floor NOUN _ Case=Dat 2 obl _ SpaceAfter=No
6 . . PUNCT _ _ 2 punct _ _
|
# visual-style 2 5 obl:auf:dat color:blue
# text = He sits on the floor
1 Er he PRON _ Case=Nom 2 nsubj _ _
2 sitzt sits VERB _ _ 0 root _ _
3 auf on ADP _ _ 5 case _ _
4 dem the DET _ Case=Dat 5 det _ _
5 Boden floor NOUN _ Case=Dat 2 obl:auf:dat _ SpaceAfter=No
6 . . PUNCT _ _ 2 punct _ _
|
# visual-style 2 6 obl color:red
# text = He sits down on the floor
1 Er he PRON _ Case=Nom 2 nsubj _ _
2 setzt sets VERB _ _ 0 root _ _
3 sich himself PRON _ Case=Acc 2 expl:pv _ _
4 auf on ADP _ _ 6 case _ _
5 den the DET _ Case=Acc 6 det _ _
6 Boden floor NOUN _ Case=Acc 2 obl _ SpaceAfter=No
7 . . PUNCT _ _ 2 punct _ _
|
# visual-style 2 6 obl:auf:acc color:blue
# text = He sits down on the floor
1 Er he PRON _ Case=Nom 2 nsubj _ _
2 setzt sets VERB _ _ 0 root _ _
3 sich himself PRON _ Case=Acc 2 expl:pv _ _
4 auf on ADP _ _ 6 case _ _
5 den the DET _ Case=Acc 6 det _ _
6 Boden floor NOUN _ Case=Acc 2 obl:auf:acc _ SpaceAfter=No
7 . . PUNCT _ _ 2 punct _ _
|
# visual-style 5 4 obl:tmod color:red
# visual-style 6 7 nmod color:red
# text = For a long time he studied the Maya language.
1 В In ADP _ _ 4 case _ _
2 течение duration NOUN _ Case=Loc 1 fixed _ _
3 долгого long ADJ _ Case=Gen 4 amod _ _
4 времени time NOUN _ Case=Gen 5 obl:tmod _ _
5 изучал studied VERB _ _ 0 root _ _
6 язык language NOUN _ Case=Acc 5 obj _ _
7 майя Maya PROPN _ Case=Gen 6 nmod _ SpaceAfter=No
8 . . PUNCT _ _ 5 punct _ _
|
# visual-style 5 4 obl:tmod:в_течение:gen color:blue
# visual-style 6 7 nmod:gen color:blue
# text = For a long time he studied the Maya language.
1 В In ADP _ _ 4 case _ _
2 течение duration NOUN _ Case=Loc 1 fixed _ _
3 долгого long ADJ _ Case=Gen 4 amod _ _
4 времени time NOUN _ Case=Gen 5 obl:tmod:в_течение:gen _ _
5 изучал studied VERB _ _ 0 root _ _
6 язык language NOUN _ Case=Acc 5 obj _ _
7 майя Maya PROPN _ Case=Gen 6 nmod:gen _ SpaceAfter=No
8 . . PUNCT _ _ 5 punct _ _
|
# visual-style 3 7 obl color:red
# visual-style 4 6 conj color:red
# text = Lidé se rozutekli před a během útoku.
1 Lidé People NOUN _ Case=Nom 3 nsubj _ _
2 se themselves PRON _ Case=Acc 3 expl:pv _ _
3 rozutekli scattered VERB _ _ 0 root _ _
4 před before ADP _ Case=Ins 7 case _ _
5 a and CCONJ _ _ 6 cc _ _
6 během during ADP _ Case=Gen 4 conj _ _
7 útoku attack NOUN _ Case=Gen 3 obl _ SpaceAfter=No
8 . . PUNCT _ _ 3 punct _ _
|
# visual-style 3 7 obl:během:gen color:blue
# visual-style 3 7 obl:před:ins color:blue
# visual-style 4 6 conj:a color:blue
# text = Lidé se rozutekli před a během útoku.
1 Lidé People NOUN _ Case=Nom 3 nsubj _ _
2 se themselves PRON _ Case=Acc 3 expl:pv _ _
3 rozutekli scattered VERB _ _ 0 root _ _
4 před before ADP _ Case=Ins 7 case _ _
5 a and CCONJ _ _ 6 cc _ _
6 během during ADP _ Case=Gen 4 conj:a _ _
7 útoku attack NOUN _ Case=Gen 3 obl:během:gen 3:obl:před:ins SpaceAfter=No
8 . . PUNCT _ _ 3 punct _ _
|
# visual-style 1 3 conj color:red
# visual-style 1 6 conj color:red
1 apples _ _ _ _ 0 root _ _
2 and _ _ _ _ 3 cc _ _
3 bananas _ _ _ _ 1 conj _ SpaceAfter=No
4 , _ _ _ _ 6 punct _ _
5 or _ _ _ _ 6 cc _ _
6 oranges _ _ _ _ 1 conj _ _
|
# visual-style 1 3 conj:and color:blue
# visual-style 1 6 conj:or color:blue
1 apples _ _ _ _ 0 root _ _
2 and _ _ _ _ 3 cc _ _
3 bananas _ _ _ _ 1 conj:and _ SpaceAfter=No
4 , _ _ _ _ 6 punct _ _
5 or _ _ _ _ 6 cc _ _
6 oranges _ _ _ _ 1 conj:or _ _
|
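The augmentation of conj relations in the last example can be sketched as follows. The code assumes (head, dependent, label) triples plus a mapping from word IDs to lemmas; the function name is hypothetical. Conjuncts without a cc dependent of their own are left unchanged, which is a choice made for this sketch only.

```python
def augment_conj(edges, lemmas):
    """Append the lemma of a conjunct's cc dependent to its conj relation."""
    # Map each conjunct to the lemma of the conjunction attached to it.
    cc_of = {head: lemmas[dep] for head, dep, label in edges if label == "cc"}
    result = []
    for head, dep, label in edges:
        if label == "conj" and dep in cc_of:
            label = "conj:" + cc_of[dep]
        result.append((head, dep, label))
    return result


# "apples and bananas, or oranges"
basic = [(0, 1, "root"), (3, 2, "cc"), (1, 3, "conj"),
         (6, 4, "punct"), (6, 5, "cc"), (1, 6, "conj")]
lemmas = {2: "and", 5: "or"}
for edge in augment_conj(basic, lemmas):
    print(edge)
```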
Additional enhancements
Some postprocessing steps, such as demoting light nouns that behave like quantificational determiners (as described, for example, in Schuster and Manning (2016)), can improve the usability of the dependency graphs for downstream applications. However, as most of these additions are highly language-specific, we do not provide any universal guidelines for such a representation. Anything beyond the above additions is not part of the UD standard and should not be added to the officially released treebanks.
DZ: Here are some additional thoughts on things that are not part of the officially approved guidelines but I think that they should be considered for addition in the future (based on experience with the treebanks that already contain some enhanced annotation).
- While individual enhancement types are optional, once a particular enhancement type is annotated somewhere in the corpus, the authors should annotate it everywhere in the corpus. This cannot be checked automatically for some enhancement types, but obviously the user will then assume that non-presence of the annotation in a sentence means that the phenomenon does not occur there.
- It would be useful if one could recognize from the enhanced relation type what type of enhancement it represents. (Some relations may be a result of two enhancement types combined.) The Stanford Enhancer does this at least for the controlled subjects (generating nsubj:xsubj, nsubj:pass:xsubj, csubj:xsubj, or csubj:pass:xsubj for the new enhanced relation).