UD Luxembourgish LuxBank
Language: Luxembourgish (code: lb)
Family: IE
This treebank has been part of Universal Dependencies since the UD v2.14 release.
The following people have contributed to making this treebank part of UD: Alistair Plum, Christoph Purschke, Caroline Döhmer, Anne-Marie Lutgen, Emilia Milano.
Repository: UD_Luxembourgish-LuxBank
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.15
License: CC BY-SA 4.0
Genre: grammar-examples
Questions, comments? General annotation questions (either Luxembourgish-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on Github. If you want to collaborate, please contact [alistair • plum (æt) uni • lu]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.
Annotation | Source |
---|---|
Lemmas | annotated manually |
UPOS | annotated manually, natively in UD style |
XPOS | not available |
Features | annotated manually, natively in UD style |
Relations | annotated manually, natively in UD style |
Description
The LuxBank corpus is the first treebank for Luxembourgish. It currently consists of the translated Cairo CICLing examples, which serve as the initial test set. The corpus will be extended with examples from a national dataset and with texts from various domains, including but not limited to news articles, encyclopaedic articles and literary examples.
Acknowledgments
The initial Cairo CICLing examples were translated into Luxembourgish by Christoph Purschke and Caroline Döhmer. The initial set was annotated by Anne-Marie Lutgen, Emilia Milano and Caroline Döhmer, who also created the first set of guidelines for annotating Luxembourgish.
Statistics of UD Luxembourgish LuxBank
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – NOUN – PART – PRON – PUNCT – SCONJ – VERB
Features
Relations
acl – advcl – advmod – amod – appos – aux – case – cc – ccomp – conj – cop – dep – det – det:poss – expl:pv – flat – iobj – mark – nmod – nmod:poss – nsubj – obj – obl – orphan – punct – root – vocative – xcomp
Tokenization and Word Segmentation
- This corpus contains 20 sentences, 204 tokens and 206 syntactic words.
- This corpus contains 35 tokens (17%) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 1 type of word that contains both letters and punctuation. Example: d'
- This corpus contains 2 multi-word tokens. On average, one multi-word token consists of 2.00 syntactic words.
- There are 2 types of multi-word tokens. Examples: opzehale, vum.
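The relationship between the counts above can be checked by hand: each of the 2 multi-word tokens is one surface token that expands to 2 syntactic words, so 204 tokens yield 204 + 2 = 206 words. The following is a minimal sketch of how such counts could be derived from a CoNLL-U file. The sample sentence is invented for illustration and is not taken from LuxBank, and the function name `count_units` is hypothetical; the sketch assumes the standard 10-column CoNLL-U layout with `SpaceAfter=No` in the MISC column.

```python
# Toy CoNLL-U rows, invented for illustration; NOT taken from LuxBank.
# Columns are written space-separated here and converted to tabs below.
ROWS = [
    "# text = Hie kënnt vum Haus.",
    "1 Hie hien PRON _ _ 2 nsubj _ _",
    "2 kënnt kommen VERB _ _ 0 root _ _",
    "3-4 vum _ _ _ _ _ _ _ _",
    "3 vun vun ADP _ _ 5 case _ _",
    "4 dem den DET _ _ 5 det _ _",
    "5 Haus Haus NOUN _ _ 2 obl _ SpaceAfter=No",
    "6 . . PUNCT _ _ 2 punct _ _",
]
SAMPLE = "\n".join(r if r.startswith("#") else r.replace(" ", "\t") for r in ROWS)


def count_units(conllu: str) -> dict:
    """Count surface tokens, syntactic words and multi-word tokens."""
    tokens = words = mwts = no_space = 0
    covered_until = 0  # last word ID covered by the current multi-word token
    for line in conllu.splitlines():
        if not line.strip():
            covered_until = 0  # blank line: a new sentence starts
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        tok_id, misc = cols[0], cols[9]
        if "." in tok_id:  # empty node, neither token nor word
            continue
        if "-" in tok_id:  # multi-word token range such as 3-4
            covered_until = int(tok_id.split("-")[1])
            mwts += 1
            tokens += 1
            is_token = True
        else:
            words += 1
            # A word is a surface token only if no range line covers it.
            is_token = int(tok_id) > covered_until
            if is_token:
                tokens += 1
        if is_token and "SpaceAfter=No" in misc.split("|"):
            no_space += 1
    return {"tokens": tokens, "words": words,
            "multiword_tokens": mwts, "no_space_after": no_space}


print(count_units(SAMPLE))
```

On the toy sentence, the contraction `vum` is one token but two words, which is the same effect that makes the word count of the corpus exceed its token count by 2.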
Morphology
Tags
- This corpus uses 12 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, NOUN, PART, PRON, PUNCT, SCONJ, VERB
- This corpus does not use the following tags: PROPN, NUM, INTJ, SYM, X
- This corpus contains 5 word types tagged as particles (PART): net, op, s, un, ze
- This corpus contains 9 lemmas tagged as pronouns (PRON): du, ech, en, et, hien, sech, si, wat, wien
- This corpus contains 12 lemmas tagged as determiners (DET): d', deen, dem, den, däin, dësen, en, hir, iergendeen, keng, mäin, säin
- Out of the above, 1 lemma occurred sometimes as PRON and sometimes as DET: en
- This corpus contains 5 lemmas tagged as auxiliaries (AUX): ginn, hunn, kënnen, sinn, sollen
- Out of the above, 1 lemma occurred sometimes as AUX and sometimes as VERB: hunn
- This corpus does not use the VerbForm feature.
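The tag inventory figures above follow from simple set operations. As a small sketch, using only the tag and lemma lists given in this section plus the 17 universal POS tags of UD v2 (the variable names are illustrative):

```python
# The 17 universal POS tags defined by UD v2.
ALL_UPOS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}

# Tags listed above as used in this corpus.
USED_UPOS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "NOUN", "PART",
    "PRON", "PUNCT", "SCONJ", "VERB",
}

# Lemma lists copied from the PRON and DET bullets above.
PRON_LEMMAS = {"du", "ech", "en", "et", "hien", "sech", "si", "wat", "wien"}
DET_LEMMAS = {
    "d'", "deen", "dem", "den", "däin", "dësen", "en", "hir",
    "iergendeen", "keng", "mäin", "säin",
}

unused = sorted(ALL_UPOS - USED_UPOS)
ambiguous = sorted(PRON_LEMMAS & DET_LEMMAS)

print(len(USED_UPOS), "of", len(ALL_UPOS), "tags used")
print("unused:", unused)
print("PRON and DET:", ambiguous)
```

The same intersection could be computed for AUX and VERB, but the list of VERB lemmas is not given here, so it is left out of the sketch.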
Nominal Features
Degree and Polarity
Verbal Features
Pronouns, Determiners, Quantifiers
Other Features
Syntax
Auxiliary Verbs and Copula
- This corpus uses 1 lemma as a copula (cop): sinn.
- This corpus uses 5 lemmas as auxiliaries (aux). Examples: hunn, kënnen, ginn, sinn, sollen.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
  - VERB--NOUN (5)
  - VERB--PRON (17)
- obj
  - VERB--NOUN (10)
  - VERB--PRON (3)
- iobj
  - VERB--NOUN-ADP(zu) (1)
  - VERB--PRON (1)
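One plausible way to arrive at patterns such as `VERB--NOUN-ADP(zu)` is to look at every `nsubj`, `obj` or `iobj` dependent that is a noun or pronoun with a verbal head, and to append the lemma of any adposition attached to it via `case`. The sketch below makes exactly that assumption; it is not the script that produced the counts above. The rows are invented for illustration and are not taken from LuxBank, and `core_argument_patterns` is a hypothetical name.

```python
from collections import Counter

# Toy dependency rows, invented for illustration; NOT taken from LuxBank.
# Each row: (id, lemma, upos, head, deprel)
ROWS = [
    (1, "hien", "PRON", 2, "nsubj"),
    (2, "ginn", "VERB", 0, "root"),
    (3, "d'", "DET", 4, "det"),
    (4, "Buch", "NOUN", 2, "obj"),
    (5, "zu", "ADP", 6, "case"),
    (6, "Kand", "NOUN", 2, "iobj"),
]

CORE = ("nsubj", "obj", "iobj")


def core_argument_patterns(rows):
    """Count (relation, parent--child pattern) pairs for one sentence."""
    by_id = {row[0]: row for row in rows}
    # Lemma of the adposition attached to each word via `case`, if any.
    case_of = {head: lemma for _, lemma, upos, head, deprel in rows
               if deprel == "case" and upos == "ADP"}
    counts = Counter()
    for wid, _, upos, head, deprel in rows:
        if deprel not in CORE or upos not in ("NOUN", "PRON"):
            continue
        parent = by_id.get(head)
        if parent is None or parent[2] != "VERB":
            continue  # only verbal parents are considered
        pattern = f"{parent[2]}--{upos}"
        if wid in case_of:
            pattern += f"-ADP({case_of[wid]})"
        counts[(deprel, pattern)] += 1
    return counts


for (deprel, pattern), n in sorted(core_argument_patterns(ROWS).items()):
    print(deprel, pattern, n)
```

Summing the published counts per relation gives 22 nsubj, 13 obj and 2 iobj pairs of this kind in the corpus.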
Reflexive Verbs
- This corpus contains 1 lemma that occurs at least once with an expl:pv child. Example: ëmaarmen sech
Relations Overview
- This corpus uses 3 relation subtypes: det:poss, expl:pv, nmod:poss
- The following main type is never used alone; it is always subtyped: expl
- The following 11 relation types are not used in this corpus at all: csubj, dislocated, discourse, nummod, clf, fixed, compound, list, parataxis, goeswith, reparandum
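The three figures in this overview can be reproduced from the relation list under "Relations" together with the 37 universal relations of UD v2. A short sketch, with illustrative variable names:

```python
# The 37 universal dependency relations defined by UD v2.
UD_RELATIONS = {
    "acl", "advcl", "advmod", "amod", "appos", "aux", "case", "cc", "ccomp",
    "clf", "compound", "conj", "cop", "csubj", "dep", "det", "discourse",
    "dislocated", "expl", "fixed", "flat", "goeswith", "iobj", "list", "mark",
    "nmod", "nsubj", "nummod", "obj", "obl", "orphan", "parataxis", "punct",
    "reparandum", "root", "vocative", "xcomp",
}

# Relation labels listed for this corpus, including subtypes.
USED_LABELS = {
    "acl", "advcl", "advmod", "amod", "appos", "aux", "case", "cc", "ccomp",
    "conj", "cop", "dep", "det", "det:poss", "expl:pv", "flat", "iobj",
    "mark", "nmod", "nmod:poss", "nsubj", "obj", "obl", "orphan", "punct",
    "root", "vocative", "xcomp",
}

# Main type = the part of the label before the colon.
main_types = {label.split(":")[0] for label in USED_LABELS}

subtypes = sorted(label for label in USED_LABELS if ":" in label)
always_subtyped = sorted(m for m in main_types if m not in USED_LABELS)
unused = sorted(UD_RELATIONS - main_types)

print("subtypes:", subtypes)
print("always subtyped:", always_subtyped)
print("unused:", unused)
```

The corpus uses 26 of the 37 main types, which leaves the 11 unused relations listed above.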