UD for Bambara
Tokenization and Word Segmentation
- In general, words are delimited by whitespace characters. The exceptions are described below.
- According to typographical rules, many punctuation marks are attached to a neighboring word. These are by default tokenized as separate tokens (words).
Exceptions:
- The apostrophe may be part of the word: k’, n’, y’, b’, t’
- The hyphen does not split a reduplicated word into multiple tokens: kelen-kelen
- For more details, see tokenization.
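The rules above can be sketched as a small tokenizer. This is only an illustration, not the procedure used to build the treebank. It assumes that the five elided forms listed above are a closed set, that both the straight and the typographic apostrophe occur, and that a hyphenated form counts as reduplicated only when all its parts are identical. The names `tokenize`, `ELIDED` and `_split_hyphens` are hypothetical.

```python
import re

# Elided forms named in the text; treating them as a closed set is an assumption.
ELIDED = {"k", "n", "y", "b", "t"}
APOSTROPHES = ("'", "\u2019")

# A word (letters plus combining tone marks), optionally hyphenated,
# or else any single non-whitespace character (punctuation).
PIECE = re.compile(r"[\w\u0300-\u036f]+(?:-[\w\u0300-\u036f]+)*|\S")


def _split_hyphens(piece):
    """Keep a reduplicated form whole; otherwise split around each hyphen."""
    parts = piece.split("-")
    if len(parts) > 1 and len({p.lower() for p in parts}) == 1:
        return [piece]
    out = []
    for j, part in enumerate(parts):
        if j:
            out.append("-")
        out.append(part)
    return out


def tokenize(text):
    tokens = []
    for chunk in text.split():  # whitespace delimits words by default
        pieces = []
        for match in PIECE.findall(chunk):
            pieces.extend(_split_hyphens(match))
        i = 0
        while i < len(pieces):
            cur = pieces[i]
            nxt = pieces[i + 1] if i + 1 < len(pieces) else None
            if cur.lower() in ELIDED and nxt in APOSTROPHES:
                tokens.append(cur + nxt)  # apostrophe stays with the elided word
                i += 2
            else:
                tokens.append(cur)  # ordinary word or detached punctuation
                i += 1
    return tokens
```

With these assumptions, `tokenize("kelen-kelen, k\u2019")` keeps the reduplicated form and the elided form intact and splits off only the comma.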
Morphology
Tags
This is an overview only. For more detailed discussion and examples, see the list of Bambara POS tags and Bambara features.
- Bambara uses the 17 universal POS categories, including particles (PART), although the data does not currently contain any examples of the SYM category.
- Auxiliaries are used regularly in Bambara clauses. They appear between the subject nominal and the verb (or, if an object is present, between subject and object). There is no designated copula; placing one of the regular auxiliaries between two nominals creates a nonverbal clause.
  - tùn marks the periphrastic past tense.
  - bɛ́na (or na) is the future affirmative auxiliary.
  - tɛ́na is the future negative auxiliary.
  - bɛ is the imperfect affirmative auxiliary.
  - tɛ is the imperfect negative auxiliary.
  - ye marks the periphrastic perfective aspect of transitive verbs.
  - ma is the perfect negative auxiliary.
  - mána marks the conditional mood.
  - kàna marks the negative imperative (prohibitive).
  - ka is used in multiple contexts, marking the infinitive or subjunctive but also the optative and the simple affirmative.
- Verbs with modal meaning are not considered auxiliary in Bambara.
- There are two main (de)verbal forms, distinguished by the UPOS tag and the value of the VerbForm feature:
Features
- The two main values of the Number feature are Sing and Plur. The singular is annotated only for pronouns. Nouns and other nominals have either no Number feature, or Number=Plur if the plural suffix -w is present.
- Verbal features such as Tense, Aspect, Mood and Polarity are typically provided by auxiliaries (AUX) and annotated on them.
- PronType applies to pronouns (PRON), determiners (DET), and to the interrogative adverb (ADV) min “where”.
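The Number rule above can be written as a short decision function. This is a minimal sketch, assuming the caller already knows whether the plural suffix -w is present and, for pronouns, which number the form expresses. The function name and its parameters are hypothetical and do not belong to any UD tool.

```python
def number_feature(upos, has_plural_suffix=False, pronoun_number=None):
    """Return the Number feature string for a word, or None if it has none.

    upos              -- universal POS tag of the word
    has_plural_suffix -- True if the plural suffix -w is present (supplied by the caller)
    pronoun_number    -- "Sing" or "Plur" for pronouns, if known (supplied by the caller)
    """
    if upos == "PRON":
        # Pronouns are the only words for which the singular is annotated.
        return f"Number={pronoun_number}" if pronoun_number else None
    # Nouns and other nominals: plural only, and only when the suffix is present.
    return "Number=Plur" if has_plural_suffix else None
```

For example, a noun without the suffix yields `None`, while the same noun with the suffix yields `"Number=Plur"`.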
Syntax
- The default word order is subject – auxiliary – object – verb – oblique modifiers. In nonverbal clauses, the auxiliary occurs between the subject and the nonverbal predicate.
- Nominals may occur with postpositions but the core arguments (subject, object) are usually bare noun phrases.
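The default order can be illustrated with a function that arranges the parts of a clause. This is a sketch only. The function name `linearize` and the placeholder strings are hypothetical, and real clauses may deviate from the default order.

```python
def linearize(subject, aux, predicate, obj=None, obliques=()):
    """Arrange clause parts as subject, auxiliary, object, predicate, obliques.

    In a verbal clause the predicate is the verb. In a nonverbal clause
    it is the nonverbal predicate, and there is no object.
    """
    clause = [subject, aux]
    if obj is not None:
        clause.append(obj)  # the object, if present, precedes the verb
    clause.append(predicate)
    clause.extend(obliques)  # oblique modifiers follow the verb
    return clause
```

Calling `linearize("SUBJ", "AUX", "VERB", obj="OBJ", obliques=["OBL"])` places the auxiliary between the subject and the object, as described above.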
Treebanks
There is 1 Bambara UD treebank: