
This page pertains to UD version 2.

UD German HDT

Language: German (code: de)
Family: IE

This treebank has been part of Universal Dependencies since the UD v2.4 release.

The following people have contributed to making this treebank part of UD: Emanuel Borges Völker, Felix Hennig, Arne Köhn, Maximilian Wendt, Verena Blaschke, Nina Böbel, Leonie Weissweiler.

Repository: UD_German-HDT
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.15

License: CC BY-SA 4.0

Genre: news, nonfiction, web

Questions, comments? General annotation questions (either German-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on Github. If you want to collaborate, please contact [nina • boebel (æt) hhu • de]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.

Annotation Source
Lemmas annotated manually in non-UD style, automatically converted to UD
UPOS annotated manually in non-UD style, automatically converted to UD
XPOS assigned by a program, with some manual corrections, but not a full manual verification
Features annotated manually in non-UD style, automatically converted to UD
Relations annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion

Description

UD German-HDT is a conversion of the Hamburg Dependency Treebank (HDT), which was created at the University of Hamburg through manual annotation. The annotation was largely interleaved with the development of a standard for the morphological and syntactic annotation of sentences and of a constraint-based parser.

The Hamburg Dependency Treebank consists of 261,821 sentences (4.8M tokens). All sentences were taken from articles published on the German news site heise.de between 1996 and 2001. The content of the articles ranges from formulaic periodic updates (new BIOS revisions, processor models, quarterly earnings of tech companies) through features on general trends in the hardware and software market to broad coverage of social, legal and political issues in cyberspace, sometimes in the form of extensive weekly editorials.

For UD_German-HDT, 206,794 sentences (3.8M tokens) from the original HDT were converted with TrUDucer, a treebank conversion tool created by Felix Hennig and extended by Maximilian Wendt and Emanuel Borges Völker. The conversion reaches an accuracy of 97%, measured against a manually converted subset of the treebank. Information that UD requires but the original annotation does not encode was supplied from an external data source (Wiktionary) and by manual input from annotators.
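To make the 97% figure concrete, the following is a minimal sketch of one way such an accuracy could be computed: the automatically converted trees are compared word by word with a manually converted gold version of the same sentences. It assumes that "accuracy" means the share of words whose HEAD and DEPREL both match (a labeled attachment score); the metric actually used for HDT may be defined differently. The function names and the example sentence are hypothetical.

```python
from typing import List, Tuple

# One sentence = list of (HEAD, DEPREL) pairs, one per syntactic word.
Sentence = List[Tuple[int, str]]


def read_heads_and_labels(conllu_text: str) -> List[Sentence]:
    """Collect (HEAD, DEPREL) per word from CoNLL-U text.

    Comment lines, multiword-token ranges (e.g. "1-2") and empty nodes
    (e.g. "1.1") are skipped; a blank line ends a sentence.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for line in conllu_text.splitlines():
        if not line.strip():
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue
        current.append((int(cols[6]), cols[7]))  # HEAD and DEPREL columns
    if current:
        sentences.append(current)
    return sentences


def labeled_accuracy(gold_text: str, converted_text: str) -> float:
    """Share of words whose HEAD and DEPREL both match the gold conversion
    (hypothetical metric choice: a labeled attachment score)."""
    gold = read_heads_and_labels(gold_text)
    converted = read_heads_and_labels(converted_text)
    if len(gold) != len(converted):
        raise ValueError("both files must contain the same sentences")
    total = correct = 0
    for gold_sent, conv_sent in zip(gold, converted):
        if len(gold_sent) != len(conv_sent):
            raise ValueError("sentences must share the same tokenization")
        for gold_word, conv_word in zip(gold_sent, conv_sent):
            total += 1
            if gold_word == conv_word:
                correct += 1
    return correct / total if total else 0.0


def row(idx: int, form: str, upos: str, head: int, rel: str) -> str:
    """Build one 10-column CoNLL-U line; unused columns are '_'."""
    return "\t".join([str(idx), form, "_", upos, "_", "_", str(head), rel, "_", "_"])


# Invented example: the converter mislabels "Heise" as obj instead of nsubj.
gold = "\n".join([row(1, "Heise", "PROPN", 2, "nsubj"),
                  row(2, "berichtet", "VERB", 0, "root"),
                  row(3, ".", "PUNCT", 2, "punct")]) + "\n"
converted = gold.replace("nsubj", "obj")
print(f"{labeled_accuracy(gold, converted):.3f}")  # 2 of 3 words match
```

Requiring identical tokenization keeps the sketch simple; a real evaluation against a manually converted subset would also have to decide how to handle tokenization differences.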

Acknowledgments

The following people worked on the conversion:

The following people are working on error correction:

References

If you use this treebank, please cite the following paper, describing the conversion of the HDT to UD:

@inproceedings{borges-volker-etal-2019-hdt,
title = "{HDT}-{UD}: A very large {U}niversal {D}ependencies Treebank for {G}erman",
author = {Borges V{\"o}lker, Emanuel and Wendt, Maximilian and Hennig, Felix and K{\"o}hn, Arne},
editor = "Rademaker, Alexandre and Tyers, Francis",
booktitle = "Proceedings of the Third Workshop on Universal Dependencies (UDW, SyntaxFest 2019)",
month = aug,
year = "2019",
address = "Paris, France",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/W19-8006",
doi = "10.18653/v1/W19-8006",
pages = "46--57",
}

The TrUDucer paper describing the formalism behind the conversion:

@inproceedings{hennig-kohn-2017-dependency,
title = "Dependency Tree Transformation with Tree Transducers",
author = {Hennig, Felix and K{\"o}hn, Arne},
editor = "de Marneffe, Marie-Catherine and Nivre, Joakim and Schuster, Sebastian",
booktitle = "Proceedings of the {N}o{D}a{L}i{D}a 2017 Workshop on Universal Dependencies ({UDW} 2017)",
month = may,
year = "2017",
address = "Gothenburg, Sweden",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/W17-0407",
pages = "58--66",
}

The annotation guidelines of the original HDT:

@article{foth2006umfassende,
title={Eine umfassende Constraint-Dependenz-Grammatik des Deutschen},
author={Foth, Kilian A},
year={2006},
publisher={Fachbereich Informatik}
}

Software

TrUDucer, the software used to convert the HDT. It comes with a pipeline that replicates the conversion of the HDT.

jwcdg, the successor of the parser used for initial automatic annotation of the HDT. It contains the lexicon with the relevant morpho-syntactic features annotated.

Statistics of UD German HDT

POS Tags

ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB, X
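Statistics like the tag inventory above come from the UPOS column (the fourth column) of the treebank's CoNLL-U files. A minimal sketch of counting UPOS tags and checking them against the listed inventory follows; the sample sentence is invented, and reading an actual treebank file (whose name is not given here) is left to the caller.

```python
from collections import Counter

# The 16 UPOS tags listed for this treebank.
HDT_UPOS = {"ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN",
            "NUM", "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "VERB", "X"}


def count_upos(conllu_text: str) -> Counter:
    """Count UPOS tags (column 4) over all syntactic words in CoNLL-U text."""
    counts: Counter = Counter()
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():  # skip ranges like "1-2" and empty nodes like "1.1"
            continue
        counts[cols[3]] += 1
    return counts


# Invented example sentence: "Der Chip ist schnell ."
words = [("Der", "DET"), ("Chip", "NOUN"), ("ist", "AUX"),
         ("schnell", "ADJ"), (".", "PUNCT")]
sample = "\n".join("\t".join([str(i), form, "_", upos] + ["_"] * 6)
                   for i, (form, upos) in enumerate(words, start=1))
counts = count_upos(sample)
print(counts.most_common())
unexpected = set(counts) - HDT_UPOS  # tags outside the documented inventory
print(unexpected)
```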

Features

Abbr, AdpType, Aspect, Case, ConjType, Definite, Degree, Foreign, Gender, Gender[psor], Hyph, Mood, Number, Number[psor], NumType, PartType, Person, Polarity, Polite, Poss, PronType, PunctType, Reflex, Tense, Typo, Variant, VerbForm, VerbType
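Two features in this list, Gender[psor] and Number[psor], are layered: the bracketed part says the feature describes the possessor rather than the word itself. The sketch below parses a CoNLL-U FEATS value (pairs separated by "|", multiple values separated by ",") and splits layered names. The FEATS string in the example is constructed by hand for illustration and is not copied from the treebank.

```python
from typing import Dict, Optional, Tuple


def parse_feats(feats: str) -> Dict[str, Tuple[str, ...]]:
    """Parse a CoNLL-U FEATS value ("Name=Value|Name=Value,Value" or "_")."""
    if feats == "_":
        return {}
    parsed: Dict[str, Tuple[str, ...]] = {}
    for pair in feats.split("|"):
        name, _, values = pair.partition("=")
        parsed[name] = tuple(values.split(","))  # multiple values are comma-separated
    return parsed


def split_layer(name: str) -> Tuple[str, Optional[str]]:
    """Split a layered feature name such as "Gender[psor]" into ("Gender", "psor")."""
    if name.endswith("]") and "[" in name:
        base, layer = name[:-1].split("[", 1)
        return base, layer
    return name, None


# Illustrative FEATS string for a possessive like "seine" in "seine Mutter"
# (constructed by hand, not copied from the treebank).
feats = parse_feats("Case=Nom|Gender=Fem|Gender[psor]=Masc,Neut|Number=Sing|"
                    "Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs")
for name, values in feats.items():
    base, layer = split_layer(name)
    print(base, layer, values)
```

Keeping values as tuples preserves multi-valued features such as Masc,Neut instead of forcing a single value.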

Relations

acl, acl:relcl, advcl, advcl:relcl, advmod, amod, appos, aux, aux:pass, case, cc, ccomp, compound, compound:prt, conj, cop, csubj, csubj:pass, dep, det, det:poss, discourse, expl, expl:pv, fixed, flat, flat:name, mark, nmod, nmod:poss, nsubj, nsubj:pass, nummod, obj, obl, obl:arg, orphan, parataxis, punct, reparandum, root, vocative, xcomp
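Several labels above are subtyped relations: a universal relation followed by a colon and a subtype, as in compound:prt or obl:arg. A short sketch that splits each label and groups the list by universal relation, showing which relations this treebank uses with subtypes:

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# The 43 relation labels listed for this treebank.
HDT_DEPRELS = """acl acl:relcl advcl advcl:relcl advmod amod appos aux aux:pass
case cc ccomp compound compound:prt conj cop csubj csubj:pass dep det det:poss
discourse expl expl:pv fixed flat flat:name mark nmod nmod:poss nsubj nsubj:pass
nummod obj obl obl:arg orphan parataxis punct reparandum root vocative xcomp""".split()


def split_deprel(label: str) -> Tuple[str, Optional[str]]:
    """Split "compound:prt" into ("compound", "prt"); plain labels get subtype None."""
    universal, sep, subtype = label.partition(":")
    return universal, (subtype if sep else None)


def group_by_universal(labels: List[str]) -> Dict[str, List[str]]:
    """Group labels under their universal relation, keeping the input order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for label in labels:
        universal, _ = split_deprel(label)
        groups[universal].append(label)
    return dict(groups)


groups = group_by_universal(HDT_DEPRELS)
for universal, labels in groups.items():
    if len(labels) > 1:  # only relations that occur with a subtype
        print(universal, labels)
```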

Tokenization and Word Segmentation

Morphology

Tags

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

Syntax

Auxiliary Verbs and Copula

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).
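The restriction above can be expressed as a filter over CoNLL-U: keep only edges whose head has UPOS VERB and whose dependent is a noun or pronoun, then count the relation labels. The sketch below assumes "nouns or pronouns" means UPOS NOUN and PRON; whether PROPN is included on this page is not stated, so the set is a parameter. The helper names and example sentences are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def verb_nominal_relations(conllu_text: str,
                           child_upos=frozenset({"NOUN", "PRON"})) -> Counter:
    """Count DEPREL labels on edges whose head is a VERB and whose dependent
    has one of the UPOS tags in child_upos."""
    counts: Counter = Counter()
    for block in conllu_text.strip().split("\n\n"):  # blank line separates sentences
        words: Dict[int, List[str]] = {}
        for line in block.splitlines():
            if not line or line.startswith("#"):
                continue
            cols = line.split("\t")
            if cols[0].isdigit():  # skip multiword ranges and empty nodes
                words[int(cols[0])] = cols
        for cols in words.values():
            head = int(cols[6])
            if head == 0 or cols[3] not in child_upos:
                continue
            if words[head][3] == "VERB":
                counts[cols[7]] += 1
    return counts


def sentence(rows) -> str:
    """Build CoNLL-U lines from (form, upos, head, deprel) tuples."""
    return "\n".join(
        "\t".join([str(i), form, "_", upos, "_", "_", str(head), rel, "_", "_"])
        for i, (form, upos, head, rel) in enumerate(rows, start=1))


# Two invented sentences: "Er kauft den Chip ." and "Sie lachen ."
text = "\n\n".join([
    sentence([("Er", "PRON", 2, "nsubj"), ("kauft", "VERB", 0, "root"),
              ("den", "DET", 4, "det"), ("Chip", "NOUN", 2, "obj"),
              (".", "PUNCT", 2, "punct")]),
    sentence([("Sie", "PRON", 2, "nsubj"), ("lachen", "VERB", 0, "root"),
              (".", "PUNCT", 2, "punct")]),
])
counts = verb_nominal_relations(text)
print(counts)
```

Note that in UD, copular clauses are headed by the predicate rather than the copula, so subjects of copular clauses will not appear in these counts unless the predicate itself is a verb.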

Reflexive Verbs

Verbs with Reflexive Core Objects

Relations Overview