UD for Xavante
Tokenization and Word Segmentation
- Xavante uses all 17 UPOS.
- Tokenization and segmentation in Xavante are not straightforward, since the available descriptions do not agree with one another.
- In general, words are delimited by whitespace characters. Description of exceptions follows.
- According to typographical rules, many punctuation marks are attached to a neighboring word. We always tokenize them as separate tokens (words).
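The whitespace and punctuation rules above can be sketched as follows. This is a minimal illustration, not the treebank's actual tokenizer: the function names are hypothetical, "punctuation" is assumed to mean any character in a Unicode `P*` category, and only marks at the edges of a whitespace-delimited chunk are detached, so word-internal apostrophes and hyphens stay inside the word. A word-final apostrophe would be detached by this heuristic, which may not be what the orthography requires.

```python
import unicodedata


def is_punct(ch: str) -> bool:
    """Assumption: punctuation = any character in a Unicode P* category."""
    return unicodedata.category(ch).startswith("P")


def tokenize(text: str) -> list[str]:
    """Split on whitespace, then detach punctuation at chunk edges."""
    tokens: list[str] = []
    for chunk in text.split():
        start, end = 0, len(chunk)
        # Find the first and last non-punctuation characters of the chunk.
        while start < end and is_punct(chunk[start]):
            start += 1
        while end > start and is_punct(chunk[end - 1]):
            end -= 1
        tokens.extend(chunk[:start])        # each leading mark is its own token
        if start < end:
            tokens.append(chunk[start:end])  # the word itself, internal marks kept
        tokens.extend(chunk[end:])          # each trailing mark is its own token
    return tokens


print(tokenize("e marĩ?"))
```

Here the question mark attached to the last word comes out as a separate token, while the two words are delimited by whitespace.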
Mapping UPOS to XPOS in Xavante
UPOS | XPOS |
---|---|
ADJ | adj |
ADV | adv |
INTJ | intj |
NOUN | n |
PROPN | ppn |
VERB | v, vi, vt |
ADP | pp |
AUX | aux |
CCONJ | cc |
DET | det |
NUM | num |
PART | pcl |
PRON | pro |
SCONJ | sc |
PUNCT | punct |
SYM | sym |
X | x |
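For scripts that convert or validate annotations, the table above can be held as a simple lookup. The sketch below copies the table verbatim; the names `UPOS_TO_XPOS`, `XPOS_TO_UPOS` and `upos_for` are hypothetical and not part of any UD tool.

```python
# UPOS -> list of XPOS tags, copied from the mapping table.
UPOS_TO_XPOS = {
    "ADJ": ["adj"],
    "ADV": ["adv"],
    "INTJ": ["intj"],
    "NOUN": ["n"],
    "PROPN": ["ppn"],
    "VERB": ["v", "vi", "vt"],
    "ADP": ["pp"],
    "AUX": ["aux"],
    "CCONJ": ["cc"],
    "DET": ["det"],
    "NUM": ["num"],
    "PART": ["pcl"],
    "PRON": ["pro"],
    "SCONJ": ["sc"],
    "PUNCT": ["punct"],
    "SYM": ["sym"],
    "X": ["x"],
}

# Reverse index: every XPOS tag belongs to exactly one UPOS tag in the table.
XPOS_TO_UPOS = {
    xpos: upos for upos, xpos_list in UPOS_TO_XPOS.items() for xpos in xpos_list
}


def upos_for(xpos: str) -> str:
    """Return the UPOS tag for a language-specific XPOS tag (KeyError if unknown)."""
    return XPOS_TO_UPOS[xpos]


print(upos_for("vt"))
```

Because VERB is the only UPOS tag with more than one XPOS value, the reverse lookup is unambiguous.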
Morphology
Tags
- Xavante (typological profile)
Features
NOMINAL FEATURE
- Nominal words, NOUN, PROPN and PRON, are not marked for Gender, plural or animacy.
- There are no classifiers.
- The two main values of the Number feature are Sing and Plur.
- The notion of plural is expressed through numerals, particles or through reduplication:
Person indexes
Xavante has a complicated system of indexation, using many different sets of markers. These are given in the tables below.
- Xavante has postpositions.
- Nominal Reduplication: it marks plurality in nouns, as in pi-pi ‘feet’. It can be monosyllabic (involving the first or the second syllable of the stem) or disyllabic. Reduplication is associated with the feature-value `Redup`.
- Augmentative and diminutive: the diminutive morpheme is -tin (`Dim`) and the augmentative is -atʃo (`Augm`). These morphemes refer to the size of something or the expansion of an event (whether it is big or not).
VERBAL FEATURE
- Some verbs occur with the morpheme -ka, which is a transitivizer. This feature (`Trans`) takes the value `Yes` when the morpheme is present. Verbs that may or may not combine with this morpheme either take no overt object or require two obligatory arguments.
- Verbal Reduplication: it has an aspectual function in verbs, as in ãbi-bi ‘to pull successively’. As with nouns, it can be monosyllabic (involving the first or the second syllable of the stem) or disyllabic. Reduplication is associated with the feature-value `Redup`.
- Nominalization: there are two productive nominalizer affixes in this language: -ap (`Nmzr=Circ`) and i- (`Nmzr=Obj`).
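In CoNLL-U files, features such as those above go into the FEATS column as `Name=Value` pairs joined by `|` and sorted alphabetically by feature name, with `_` for an empty set. The helper below is a minimal sketch of that serialization; `format_feats` is a hypothetical name, the value spelling `Yes` for `Trans` is assumed from common UD practice, and the feature combination in the example is invented purely to show the ordering.

```python
def format_feats(feats: dict[str, str]) -> str:
    """Serialize a feature dict into a CoNLL-U FEATS string."""
    if not feats:
        return "_"  # CoNLL-U placeholder for "no features"
    # Sort case-insensitively by feature name, as the CoNLL-U format requires.
    pairs = sorted(feats.items(), key=lambda kv: kv[0].lower())
    return "|".join(f"{name}={value}" for name, value in pairs)


# Hypothetical combination, used only to illustrate the ordering.
print(format_feats({"Trans": "Yes", "Nmzr": "Circ"}))
```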
Wh-words
Xavante wh-words are built from words such as wa ‘who’, marĩ ‘what’ (man’s speech), tiha ‘what’ (woman’s speech), mamɛ ‘where’, mahãta ‘where is’, and momo ‘where to’. In questions, these are preceded by the particle e, which indicates that the speaker is requesting new information.
English | Xavante |
---|---|
who | e wa |
what (man) | e marĩ |
what (woman) | e tiha |
why (man) | e marĩ bə |
why (woman) | e tiha bə |
where | e mamɛ |
where to | e momo |
where (is) | e mahãta |
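The pattern in the table, the particle e followed by a wh-word, can be expressed as a small lookup. This is only an illustration of the table: the dictionary keys are English glosses chosen here, and `WH_WORDS`, `QUESTION_PARTICLE` and `question_form` are hypothetical names.

```python
QUESTION_PARTICLE = "e"

# Wh-words without the particle, keyed by an English gloss (keys are illustrative).
WH_WORDS = {
    "who": "wa",
    "what (man)": "marĩ",
    "what (woman)": "tiha",
    "why (man)": "marĩ bə",
    "why (woman)": "tiha bə",
    "where": "mamɛ",
    "where to": "momo",
    "where (is)": "mahãta",
}


def question_form(gloss: str) -> str:
    """Prefix the wh-word for `gloss` with the question particle."""
    return f"{QUESTION_PARTICLE} {WH_WORDS[gloss]}"


print(question_form("who"))
```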
Syntax
Treebanks
There are N Xavante UD treebanks:
Instruction: Treebank-specific pages are generated automatically from the README file in the treebank repository and from the data in the latest release. Link to the respective `*-index.html` page in the `treebanks` folder, using the language code and the treebank code in the file name.