
This page pertains to UD version 2.

UD for Zaar

Tokenization and Word Segmentation

(Northwest) Gbaya is an isolating, tonal language with little morphology and no agreement. It makes little use of derivation but relies heavily on compounding, which is marked in writing by a hyphen between the components (e.g., gɛ̰̀ɛ̰́-fìò ‘ceremony sp.’). The hyphen is also used in adjective-adverbs with a reduplicated structure (e.g., bàɗàm-bàɗàm ‘irregularly arranged’).
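Because compounds and reduplicated adjective-adverbs share the same hyphen, a simple heuristic can tell them apart by checking whether the two halves are identical. The sketch below is illustrative only: the function name classify_hyphenated is hypothetical, and it assumes that structural reduplication is always written as two identical hyphen-separated halves (partial reduplication would be misclassified as a compound).

```python
import unicodedata


def classify_hyphenated(form: str) -> str:
    """Heuristically classify a written form as simple, reduplicated, or compound.

    Assumes (hypothetically) that structural reduplication is always written
    as two identical hyphen-separated halves, and that any other hyphenated
    form is a compound.
    """
    # Normalize so that precomposed and combining tone marks compare equal.
    form = unicodedata.normalize("NFC", form)
    parts = form.split("-")
    if len(parts) == 1:
        return "simple"
    if len(parts) == 2 and parts[0] == parts[1]:
        return "reduplicated"
    return "compound"


for word in ["gɛ̰̀ɛ̰́-fìò", "bàɗàm-bàɗàm", "ʔám"]:
    print(word, classify_hyphenated(word))
```

NFC normalization is applied first because tone and creaky-voice diacritics can be encoded either precomposed or as combining marks, and the equality test would otherwise fail on visually identical halves.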

Personal pronouns have clitic forms, which were transcribed in the original corpus with a preceding ‘=’ (for instance, 1SG is transcribed as ʔám, =ám, or =m). For consistency, all personal pronouns are treated alike, even those whose form does not vary (thus, 2SG is transcribed as both mɛ́ and =mɛ́). We have attached them to their host (for instance, the verb of which they are the direct object, or the preposition that introduces them). Personal pronouns are the only clitics, and the only words in which (for some of them) the bound form differs from the free-standing form.
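Given the ‘=’ transcription convention, a token can be separated into its host and its pronominal clitics by splitting on that sign. This is a minimal sketch, not the treebank's own tokenizer: the function split_clitics is hypothetical, and "HOST" stands in for any host word, since no host forms are given here.

```python
from typing import List, Tuple


def split_clitics(token: str) -> Tuple[str, List[str]]:
    """Split a transcribed token into its host and any '='-marked clitics.

    Follows the convention that a clitic pronoun is written with a
    preceding '='. A token that is itself a bare clitic (e.g. '=ám')
    gets an empty host.
    """
    host, *clitics = token.split("=")
    # Keep the '=' on each clitic so it matches the corpus transcription.
    return host, ["=" + c for c in clitics]


# "HOST" is a hypothetical placeholder for any host word (verb, preposition, ...).
print(split_clitics("HOST=ám"))  # host with the 1SG clitic
print(split_clitics("mɛ́"))       # free-standing 2SG pronoun
print(split_clitics("=m"))       # bare clitic form of 1SG
```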

Tokens therefore consist of simple words, compounds, or structurally reduplicated words, possibly with attached pronominal clitics.

Morphology

The Gbaya treebank comes from an oral corpus that was interlinearized and glossed at the morpheme level using Toolbox and ELAN; part of this corpus (4.5 hours) was used to build a corpus-based grammar of Gbaya. Some grammatical morphemes, especially TAM markers and genitive markers (borne by the governing noun), are realized by tones. The original morpheme segmentation is kept in the feature Mseg, and the corresponding glosses in MGloss. The language-specific tagset is the original annotation, based on an extended version of the Leipzig Glossing Rules (available here). The treebank was originally annotated in mSUD.
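The Mseg and MGloss values can be read back from a CoNLL-U file. The sketch below assumes, as a hypothesis not stated above, that these are stored as attributes in the MISC column (the tenth column) under exactly those names; the token line is invented for illustration and is not taken from the corpus.

```python
from typing import Dict, Optional, Tuple


def parse_misc(misc: str) -> Dict[str, str]:
    """Parse a CoNLL-U MISC column ('Key=Value|Key=Value' or '_') into a dict."""
    if misc == "_":
        return {}
    attrs = {}
    for item in misc.split("|"):
        # Split on the first '=' only, so values such as '=ám' survive intact.
        key, _, value = item.partition("=")
        attrs[key] = value
    return attrs


def morph_info(conllu_line: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Return (FORM, Mseg, MGloss) for one CoNLL-U token line."""
    cols = conllu_line.rstrip("\n").split("\t")
    form, misc = cols[1], cols[9]
    attrs = parse_misc(misc)
    return form, attrs.get("Mseg"), attrs.get("MGloss")


# Hypothetical token line; the feature values are illustrative, not corpus data.
line = "1\tʔám\t_\tPRON\t_\tNumber=Sing|Person=1\t0\troot\t_\tMseg=ʔám|MGloss=1SG"
print(morph_info(line))
```

Splitting each attribute on its first ‘=’ only matters here because clitic segments themselves begin with ‘=’.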

Tags


Features

The Gbaya treebank uses 11 universal features (Aspect, Case, Degree, Number, Person, Polarity, Polite, Poss, PronType, VerbForm, Voice).

Three language-specific features have been added to the scheme (Logophoric, Reported, and Topicalization).
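The feature inventory above can be turned into a quick consistency check on the FEATS column. This is a hedged sketch with a hypothetical function name (unknown_features); it checks feature names only, not their values (the values of the language-specific features are not listed here), and it is no substitute for the official UD validation.

```python
from typing import List

# The 11 universal features listed above.
UNIVERSAL_FEATURES = {
    "Aspect", "Case", "Degree", "Number", "Person", "Polarity",
    "Polite", "Poss", "PronType", "VerbForm", "Voice",
}
# The 3 language-specific features added for Gbaya.
LANGUAGE_SPECIFIC_FEATURES = {"Logophoric", "Reported", "Topicalization"}
ALLOWED_FEATURES = UNIVERSAL_FEATURES | LANGUAGE_SPECIFIC_FEATURES


def unknown_features(feats: str) -> List[str]:
    """Return feature names in a CoNLL-U FEATS column that are not in the inventory."""
    if feats == "_":
        return []
    names = [pair.split("=", 1)[0] for pair in feats.split("|")]
    return [name for name in names if name not in ALLOWED_FEATURES]


print(unknown_features("Number=Sing|Person=1"))
print(unknown_features("Person=1|Tense=Past"))
```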

Syntax

The dependency analysis is a conversion of the manual annotation, made in mSUD format. For more information, see the SUD guidelines.