UD Tswana Popapolelo
Language: Tswana (code: tn)
Family: Niger-Congo
This treebank has been part of Universal Dependencies since the UD v2.14 release.
The following people have contributed to making this treebank part of UD: Ansu Berg, Roald Eiselen, Tanja Gaustad, Rigardt Pretorius.
Repository: UD_Tswana-Popapolelo
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.15
License: CC BY-SA 4.0
Genre: grammar-examples
Questions, comments? General annotation questions (either Tswana-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [tanja • gaustad (æt) nwu • ac • za]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.
Annotation | Source |
---|---|
Lemmas | not available |
UPOS | annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion |
XPOS | assigned by a program, with some manual corrections, but not a full manual verification |
Features | assigned by a program, with some manual corrections, but not a full manual verification |
Relations | annotated manually, natively in UD style |
Description
UD Tswana-Popapolelo is a translation of the 20 Cairo CICLing sentences (https://github.com/UniversalDependencies/cairo), annotated with XPOS, UPOS and dependency relations.
There are 20 translated sentences with a total of 234 tokens.
The entire treebank is labeled as a test set because of its small size. If it is used for training in future research, users should employ ten-fold cross-validation.
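The ten-fold cross-validation recommended above can be sketched as follows. This is a minimal illustration, not part of the treebank's tooling: the function name `ten_fold_splits`, the fixed seed, and the use of sentence indices 0 to 19 in place of real sentence IDs are all assumptions made for the example.

```python
import random

def ten_fold_splits(n_sentences, k=10, seed=0):
    """Yield (train_ids, test_ids) pairs for k-fold cross-validation.

    Hypothetical helper: sentences are identified by their index only.
    """
    ids = list(range(n_sentences))
    random.Random(seed).shuffle(ids)          # reproducible shuffle
    folds = [ids[i::k] for i in range(k)]     # k roughly equal folds
    for i, test_ids in enumerate(folds):
        train_ids = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield sorted(train_ids), sorted(test_ids)

# With 20 sentences, each of the 10 rounds holds out 2 sentences
# and trains on the remaining 18.
splits = list(ten_fold_splits(20))
for train_ids, test_ids in splits:
    print(len(train_ids), len(test_ids))
```

Every sentence is held out exactly once, so scores averaged over the ten rounds cover the whole treebank.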
Acknowledgments
Translations and initial annotations were performed by Ansu Berg, Rigardt Pretorius, Kevin Mavalela, and Kaboentle Maibi.
Statistics of UD Tswana Popapolelo
POS Tags
ADJ – ADV – AUX – CCONJ – NOUN – PART – PRON – PROPN – PUNCT – SCONJ – VERB
Features
Relations
advcl – advmod – amod – appos – aux – case – cc – ccomp – compound – conj – cop – discourse – expl – fixed – flat – iobj – mark – nmod – nsubj – nsubj:pass – obj – obl – obl:agent – obl:lmod – obl:tmod – orphan – punct – root – vocative – xcomp
Tokenization and Word Segmentation
- This corpus contains 20 sentences and 214 tokens.
- This corpus contains 23 tokens (11%) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus does not contain words that contain both letters and punctuation.
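Counts like the ones above can be recomputed from a CoNLL-U file. The sketch below assumes the standard ten-column CoNLL-U layout with `SpaceAfter=No` recorded in the MISC column. The function name `token_stats` and the toy sample (placeholder forms such as `tok1`, not real Tswana data) are invented for illustration.

```python
# Toy two-sentence sample in CoNLL-U format (placeholder forms, not treebank data).
SAMPLE = "\n".join([
    "# sent_id = toy-1",
    "1\ttok1\t_\tNOUN\t_\t_\t2\tnsubj\t_\t_",
    "2\ttok2\t_\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No",
    "3\t.\t_\tPUNCT\t_\t_\t2\tpunct\t_\t_",
    "",
    "# sent_id = toy-2",
    "1\ttok3\t_\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No",
    "2\t.\t_\tPUNCT\t_\t_\t1\tpunct\t_\t_",
    "",
])

def token_stats(conllu_text):
    """Return (sentences, tokens, tokens not followed by a space)."""
    sentences = tokens = no_space = 0
    in_sentence = False
    for line in conllu_text.splitlines():
        line = line.strip()
        if not line:                 # a blank line ends the current sentence
            in_sentence = False
            continue
        if line.startswith("#"):     # comment / metadata line
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue
        # Skip multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 1.1).
        if "-" in cols[0] or "." in cols[0]:
            continue
        if not in_sentence:
            sentences += 1
            in_sentence = True
        tokens += 1
        if "SpaceAfter=No" in cols[9].split("|"):
            no_space += 1
    return sentences, tokens, no_space

print(token_stats(SAMPLE))
```

To run it on the treebank itself, read the downloaded `.conllu` file into a string and pass that string to `token_stats`.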
Morphology
Tags
- This corpus uses 11 UPOS tags out of 17 possible: ADJ, ADV, AUX, CCONJ, NOUN, PART, PRON, PROPN, PUNCT, SCONJ, VERB
- This corpus does not use the following tags: DET, NUM, ADP, INTJ, SYM, X
- This corpus contains 11 word types tagged as particles (PART): A, e, ga, go, ka, ke, kwa, le, mo, wa, ya
- This corpus contains 1 lemma tagged as pronoun (PRON): _
- This corpus contains 0 lemmas tagged as determiners (DET).
- This corpus contains 1 lemma tagged as auxiliary (AUX): _
  - Out of the above, 1 lemma occurred sometimes as AUX and sometimes as VERB: _
- This corpus does not use the VerbForm feature.
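The list of unused tags follows from comparing the tags observed in the corpus with the 17 tags of the UPOS inventory. A small sketch of that comparison, where the variable names are illustrative and the `used` set is copied from the list above:

```python
# The 17 universal part-of-speech tags defined by UD.
ALL_UPOS = [
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
]

# Tags reported as used in this corpus.
used = {
    "ADJ", "ADV", "AUX", "CCONJ", "NOUN", "PART",
    "PRON", "PROPN", "PUNCT", "SCONJ", "VERB",
}

unused = [tag for tag in ALL_UPOS if tag not in used]
print(len(used), "used;", len(unused), "unused:", ", ".join(unused))
```

The 11 used tags and the 6 unused ones (DET, NUM, ADP, INTJ, SYM, X) together account for all 17.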
Nominal Features
Degree and Polarity
- Neg
  - ADV: ga, a, se
Verbal Features
Pronouns, Determiners, Quantifiers
Other Features
- NounClass
  - Bantu1
    - NOUN: Moagisani, Mosetsana, Rre, monna, morwarraagwe
    - PART: wa
    - PRON: o, a
  - Bantu14
    - ADJ: botoka
    - NOUN: bohibidu
  - Bantu15
    - PRON: go
  - Bantu17
    - PART: ga
  - Bantu2
    - PRON: ba
  - Bantu3
    - NOUN: moriri, mošate
    - PART: wa
  - Bantu5
    - ADJ: lengwe
    - NOUN: lekwalo, lebaka, lebelo, legora, letlhabaphefo, letsatsi
    - PRON: le, leo, lê
  - Bantu7
    - NOUN: selefera
  - Bantu9
    - ADJ: kgolo, nnye
    - NOUN: koloi, baesekele, boronse, gauta, kakanyo, naga, phaposing, pula, tsala
    - PART: e, ya
    - PRON: e, epe
Syntax
Auxiliary Verbs and Copula
- This corpus uses 1 lemma as a copula (cop). Examples: _.
- This corpus uses 1 lemma as an auxiliary (aux). Examples: _.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
  - VERB--NOUN (5)
  - VERB--PRON (14)
- obj
  - VERB--NOUN (8)
- iobj
  - VERB--NOUN (1)
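The parent/child counts above can be reproduced by pairing each token's UPOS tag with that of its head. The sketch below assumes standard ten-column CoNLL-U input. The function name `verb_argument_pairs` and the toy sentence (placeholder forms, not treebank data) are invented for illustration, and the function counts every relation from a VERB parent to a NOUN or PRON child, so the result may need filtering to nsubj, obj and iobj.

```python
from collections import Counter

# Toy one-sentence sample (placeholder forms, not treebank data).
SAMPLE = "\n".join([
    "# sent_id = toy-1",
    "1\ttok1\t_\tPRON\t_\t_\t2\tnsubj\t_\t_",
    "2\ttok2\t_\tVERB\t_\t_\t0\troot\t_\t_",
    "3\ttok3\t_\tNOUN\t_\t_\t2\tobj\t_\t_",
    "4\t.\t_\tPUNCT\t_\t_\t2\tpunct\t_\t_",
    "",
])

def verb_argument_pairs(conllu_text):
    """Count (deprel, 'VERB--CHILD') pairs where the child is NOUN or PRON."""
    counts = Counter()
    sentence = {}  # token id -> (upos, head id, deprel)
    # The extra "" guarantees the last sentence is flushed.
    for line in conllu_text.splitlines() + [""]:
        line = line.strip()
        if line.startswith("#"):
            continue
        if line:
            cols = line.split("\t")
            # Keep plain word lines only (skip ranges and empty nodes).
            if len(cols) == 10 and cols[0].isdigit():
                sentence[cols[0]] = (cols[3], cols[6], cols[7])
            continue
        # Blank line: the sentence is complete, so resolve heads.
        for upos, head, deprel in sentence.values():
            parent = sentence.get(head)  # None for the root (head "0")
            if parent and parent[0] == "VERB" and upos in ("NOUN", "PRON"):
                counts[(deprel, f"VERB--{upos}")] += 1
        sentence = {}
    return counts

pairs = verb_argument_pairs(SAMPLE)
for (deprel, pair), n in sorted(pairs.items()):
    print(deprel, pair, n)
```

Heads are resolved only after a whole sentence has been read, because a dependent can precede its head in the file.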
Relations Overview
- This corpus uses 4 relation subtypes: nsubj:pass, obl:agent, obl:lmod, obl:tmod
- The following 11 relation types are not used in this corpus at all: csubj, dislocated, nummod, acl, det, clf, list, parataxis, goeswith, reparandum, dep
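The two figures above can be checked against the relation labels listed earlier on this page: a subtype is any label containing a colon, and the unused types are the universal relations whose base label never occurs. A sketch, with the `used_labels` list copied from the Relations list and variable names chosen for illustration:

```python
# The 37 universal dependency relations defined by UD.
UNIVERSAL = (
    "acl advcl advmod amod appos aux case cc ccomp clf compound conj cop "
    "csubj dep det discourse dislocated expl fixed flat goeswith iobj list "
    "mark nmod nsubj nummod obj obl orphan parataxis punct reparandum root "
    "vocative xcomp"
).split()

# Relation labels reported as used in this corpus.
used_labels = (
    "advcl advmod amod appos aux case cc ccomp compound conj cop discourse "
    "expl fixed flat iobj mark nmod nsubj nsubj:pass obj obl obl:agent "
    "obl:lmod obl:tmod orphan punct root vocative xcomp"
).split()

subtypes = [label for label in used_labels if ":" in label]
base_types = {label.split(":")[0] for label in used_labels}
unused = [rel for rel in UNIVERSAL if rel not in base_types]

print(len(subtypes), "subtypes:", ", ".join(subtypes))
print(len(unused), "unused:", ", ".join(unused))
```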