UD for Xavante 
Tokenization and Word Segmentation
- Xavante uses all 17 UPOS tags.
- Tokenization and segmentation in Xavante are not straightforward, since the available descriptions do not agree with one another.
- In general, words are delimited by whitespace characters. Exceptions are described below.
- According to typographical rules, many punctuation marks are attached to a neighboring word. We always tokenize them as separate tokens (words).
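The whitespace and punctuation rules above can be sketched in a few lines of Python. This is only an illustration: the `PUNCT` set and the `tokenize` helper are assumptions introduced here, not the treebank's actual tokenizer or punctuation inventory. The apostrophe is deliberately left out of the set, on the assumption that it may belong to a word rather than act as punctuation.

```python
import re

# Hypothetical punctuation inventory, for illustration only.
# The apostrophe is excluded on the assumption that it can be word-internal.
PUNCT = r'[.,;:!?()"«»…]'


def tokenize(text):
    """Split on whitespace, then detach punctuation marks as separate tokens."""
    tokens = []
    for chunk in text.split():
        # The capturing group keeps each punctuation mark in the result.
        tokens.extend(t for t in re.split(f"({PUNCT})", chunk) if t)
    return tokens


print(tokenize("e wa?"))
```

Hyphens are not split here, so a segmented form such as pi-pi stays a single token.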
Mapping UPOS to XPOS in Xavante
ADJ | adj |
ADV | adv |
INTJ | intj |
NOUN | n |
PROPN | ppn |
VERB | v, vi, vt |
ADP | pp |
AUX | aux |
CCONJ | cc |
DET | det |
NUM | num |
PART | pcl |
PRON | pro |
SCONJ | sc |
PUNCT | punct |
SYM | sym |
X | x |
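For scripts that need to convert between the two tag sets, the table above can be written as a lookup structure. The following is a minimal sketch; the names `UPOS_TO_XPOS`, `XPOS_TO_UPOS` and `upos_for` are illustrative and do not come from any treebank tooling.

```python
# UPOS tag -> list of Xavante XPOS tags, copied from the table above.
UPOS_TO_XPOS = {
    "ADJ": ["adj"],
    "ADV": ["adv"],
    "INTJ": ["intj"],
    "NOUN": ["n"],
    "PROPN": ["ppn"],
    "VERB": ["v", "vi", "vt"],
    "ADP": ["pp"],
    "AUX": ["aux"],
    "CCONJ": ["cc"],
    "DET": ["det"],
    "NUM": ["num"],
    "PART": ["pcl"],
    "PRON": ["pro"],
    "SCONJ": ["sc"],
    "PUNCT": ["punct"],
    "SYM": ["sym"],
    "X": ["x"],
}

# Inverse lookup: every XPOS tag belongs to exactly one UPOS tag.
XPOS_TO_UPOS = {x: u for u, xs in UPOS_TO_XPOS.items() for x in xs}


def upos_for(xpos):
    """Return the UPOS tag for a Xavante XPOS tag (raises KeyError if unknown)."""
    return XPOS_TO_UPOS[xpos]


print(upos_for("vt"))
```

Only VERB has more than one XPOS tag, so the inverse mapping is unambiguous.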
- Xavante (typological profile)
- Nominal words (NOUN, PROPN and PRON) are not marked for gender, plural or animacy.
- There are no classifiers.
- The two main values of the Number feature are Sing and Plur.
- Plurality is expressed through numerals, particles or reduplication.
Person indexes
Xavante has a complicated system of indexation, using many different sets of markers. These are given in the tables below.
Xavante has postpositions.
- Nominal Reduplication: it marks plurality in nouns, as in pi-pi ‘feet’. It can be monosyllabic (involving the first or the second syllable of the stem) or disyllabic. Reduplication is associated with a feature-value pair.
- Augmentative and diminutive: the diminutive morpheme is -tin and the augmentative is -atʃo (Augm). These morphemes refer to the size of something or the expansion of an event (whether it is big or not).
- Some verbs occur with the morpheme -ka, which is a transitivizer. The corresponding feature takes the value YES when the morpheme is present. Verbs that may or may not combine with this morpheme either take no overt object or require two obligatory arguments.
- Verbal Reduplication: it has an aspectual function in verbs, as in ãbi-bi ‘to pull successively’. As with nouns, it can be monosyllabic (involving the first or the second syllable of the stem) or disyllabic. Reduplication is associated with a feature-value pair.
- Nominalization: there are two productive nominalizer affixes in this language: -ap and i- (Nmzr=Obj).
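The two reduplication examples given above, pi-pi and ãbi-bi, suggest a rough string check for hyphen-segmented forms. The sketch below is a heuristic under strong assumptions: it compares orthographic segments rather than real syllables, the function name `find_reduplication` is hypothetical, and it can report false positives whenever an affix happens to match the edge of a stem.

```python
def find_reduplication(form):
    """Return (base, copy) pairs of adjacent segments where one segment
    repeats the beginning or the end of its neighbour."""
    # Drop empty segments so forms with a trailing hyphen do not match trivially.
    segments = [s for s in form.split("-") if s]
    pairs = []
    for left, right in zip(segments, segments[1:]):
        if left.startswith(right) or left.endswith(right):
            pairs.append((left, right))
        elif right.startswith(left) or right.endswith(left):
            pairs.append((right, left))
    return pairs


print(find_reduplication("pi-pi"))
print(find_reduplication("ãbi-bi"))
```

A check like this could flag candidate tokens for manual review; it is not a substitute for the morphological analysis.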
Xavante wh-words are built from words such as wa ‘who’, marĩ ‘what’ (man’s speech), tiha ‘what’ (woman’s speech), mamɛ ‘where’, mahãta ‘where is’, and momo ‘where to’. In questions, these are preceded by the particle e, which indicates that the speaker is requesting new information.
who | e wa |
what (man) | e marĩ |
what (woman) | e tiha |
why (man) | e marĩ bə |
why (woman) | e tiha bə |
where | e mamɛ |
where to | e momo |
where (is) | e mahãta |
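The pattern in the table can be summarized as: the particle e precedes the wh-word, and the ‘why’ forms add bə after the speaker-specific ‘what’ form. A minimal sketch follows; the names `WH_BASE`, `question_form` and `why_form` are illustrative only, and the forms are copied from the table.

```python
# Base wh-words, keyed by the English glosses used in the table above.
WH_BASE = {
    "who": "wa",
    "what (man)": "marĩ",
    "what (woman)": "tiha",
    "where": "mamɛ",
    "where to": "momo",
    "where (is)": "mahãta",
}


def question_form(meaning):
    """Prefix the interrogative particle e to the base wh-word."""
    return f"e {WH_BASE[meaning]}"


def why_form(speaker):
    """Build the 'why' form from the speaker-specific 'what' form plus bə."""
    return question_form(f"what ({speaker})") + " bə"


print(question_form("who"))
print(why_form("woman"))
```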
There are N Xavante UD treebanks:
Instruction: Treebank-specific pages are generated automatically from the README file in the treebank repository and from the data in the latest release. Link to the respective *-index.html page in the treebanks folder, using the language code and the treebank code in the file name.