
This page pertains to UD version 2.

UD Pesh ChibErgIS

Language: Pesh (code: pay)
Family: Chibchan

This treebank has been part of Universal Dependencies since the UD v2.15 release.

The following people have contributed to making this treebank part of UD: Natalia Cáceres Arandia, Claudine Chamoreau, Sylvain Kahane, Bruno Guillaume.

Repository: UD_Pesh-ChibErgIS
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.15

License: CC BY-SA 4.0

Genre: spoken

Questions, comments? General annotation questions (either Pesh-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [natalia • caceres • arandia (æt) cnrs • fr]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.

Annotation Source
Lemmas annotated manually
UPOS annotated manually, natively in UD style
XPOS not available
Features annotated manually, natively in UD style
Relations annotated manually, natively in UD style

Description

A Universal Dependencies corpus for Pesh (also known as Paya), a member of the Chibchan language family. The language has about 500 speakers in Honduras.

The treebank is an automatic conversion of SUD_Pesh-ChibErgIS, which is itself an automatic conversion of mSUD_Pesh-ChibErgIS. The mSUD treebank was extracted from the interlinearized corpus in Flex format prepared by Claudine Chamoreau and Natalia Cáceres, which in turn extends an oral corpus documented by Claudine Chamoreau (https://www.elararchive.org/dk0392).

Metadata

Sentences are annotated with the following metadata: speaker_id (which identifies the turn of speech)
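As an illustration of how this metadata could be used, here is a minimal Python sketch that counts sentences per speaker_id value. It assumes the metadata is stored as a standard CoNLL-U sentence-level comment of the form `# speaker_id = VALUE`; the function name `sentences_per_speaker`, the sample text, and the placeholder forms `w1`, `w2`, `w3` are hypothetical and are not taken from the treebank.

```python
from collections import Counter


def sentences_per_speaker(conllu_text):
    """Count sentences for each speaker_id value found in comment lines.

    Assumes one '# speaker_id = VALUE' comment per sentence (an assumption
    about the file layout, not a documented guarantee).
    """
    counts = Counter()
    for line in conllu_text.splitlines():
        if line.startswith("#") and "=" in line:
            key, value = line[1:].split("=", 1)
            if key.strip() == "speaker_id":
                counts[value.strip()] += 1
    return counts


# Hypothetical sample: placeholder forms, not real Pesh data.
SPEAKER_SAMPLE = (
    "# sent_id = demo-1\n"
    "# speaker_id = A\n"
    "1\tw1\t_\tINTJ\t_\t_\t0\troot\t_\t_\n"
    "\n"
    "# sent_id = demo-2\n"
    "# speaker_id = B\n"
    "1\tw2\t_\tINTJ\t_\t_\t0\troot\t_\t_\n"
    "\n"
    "# sent_id = demo-3\n"
    "# speaker_id = A\n"
    "1\tw3\t_\tINTJ\t_\t_\t0\troot\t_\t_\n"
    "\n"
)

print(dict(sentences_per_speaker(SPEAKER_SAMPLE)))
```

To run it on real data, read the text of a .conllu file from the repository and pass it to the function in place of the sample.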

Structure

This version of the treebank provides a dependency annotation of the first four files of the original corpus.

The original data are spoken data, which were first segmented into words with concatenated clitics, then interlinearized and glossed in Flex with clitics as separate tokens. Tokens comprise words and affixes (preceded by a “=” sign).

UD_Pesh-ChibErgIS contains 2,507 tokens in 307 sentences.
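Size figures of this kind can be recomputed from the CoNLL-U files. The following is a minimal Python sketch under stated assumptions: it counts a sentence for each blank-line-separated block that has at least one token line, and counts only lines whose ID is a plain integer (multiword-token ranges such as `1-2` and empty nodes such as `1.1` are skipped). Whether this matches the counting convention behind the published figures is not confirmed here. The function name `count_conllu` and the sample are hypothetical.

```python
def count_conllu(conllu_text):
    """Return (sentences, words) for a CoNLL-U string.

    Only lines with a plain integer ID are counted as words; comment
    lines, ranges like '1-2' and empty nodes like '1.1' are ignored.
    """
    sentences = 0
    words = 0
    in_sentence = False
    for line in conllu_text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current sentence block.
            in_sentence = False
            continue
        if line.startswith("#"):
            continue
        if not in_sentence:
            sentences += 1
            in_sentence = True
        token_id = line.split("\t", 1)[0]
        if token_id.isdigit():
            words += 1
    return sentences, words


# Hypothetical sample: placeholder forms, not real Pesh data.
COUNT_SAMPLE = (
    "# sent_id = demo-1\n"
    "1\tw1\t_\tNOUN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tw2\t_\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
    "# sent_id = demo-2\n"
    "1\tw3\t_\tINTJ\t_\t_\t0\troot\t_\t_\n"
    "\n"
)

print(count_conllu(COUNT_SAMPLE))  # (2, 3)
```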

References

Acknowledgments

This treebank was produced as part of the ChibErgIS and Autogramm ANR projects. With special thanks to Bruno Guillaume for the conversion from SUD to UD, Sylvain Kahane, Christian Chanard, Uyên-To Rabier and Aleksandra Miletic.

Statistics of UD Pesh ChibErgIS

POS Tags

ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB, X

Features

AdvType, Animacy, Case, Clusivity, Person, PronType, VerbForm, Voice

Relations

acl, acl:relcl, advcl, advmod, advmod:lmod, appos, aux, case, cc, ccomp, compound, compound:lvc, compound:svc, conj, cop, csubj, dep, dep:conj, det, discourse, dislocated, mark, nmod, nsubj, nsubj:outer, nsubj:pass, nummod, obj, obl, obl:arg, obl:mod, orphan, parataxis, punct, reparandum, root, vocative, xcomp

Tokenization and Word Segmentation

Morphology

Tags

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

Syntax

Auxiliary Verbs and Copula

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).
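A selection of this kind can be sketched in a few lines of Python. The example below is an illustration only, not the script used to produce the official statistics: it tallies the dependency relation of every token tagged NOUN or PRON whose head is tagged VERB. Treating PROPN as a noun is left as an option through the hypothetical `child_upos` parameter; the function name `verb_nominal_relations` and the sample are likewise hypothetical.

```python
from collections import Counter


def verb_nominal_relations(conllu_text, child_upos=("NOUN", "PRON")):
    """Count deprels of nominal children whose head is a VERB."""
    counts = Counter()
    sentence = []
    # The extra "" guarantees the last sentence is flushed.
    for line in conllu_text.splitlines() + [""]:
        if line.strip() == "":
            upos_by_id = {cols[0]: cols[3] for cols in sentence}
            for cols in sentence:
                head_upos = upos_by_id.get(cols[6])  # None for head 0 (root)
                if cols[3] in child_upos and head_upos == "VERB":
                    counts[cols[7]] += 1
            sentence = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            # Keep only regular word lines with the ten CoNLL-U columns.
            if len(cols) == 10 and cols[0].isdigit():
                sentence.append(cols)
    return counts


# Hypothetical sample: placeholder forms, not real Pesh data.
RELATION_SAMPLE = (
    "# sent_id = demo-1\n"
    "1\tw1\t_\tNOUN\t_\t_\t3\tnsubj\t_\t_\n"
    "2\tw2\t_\tPRON\t_\t_\t3\tobj\t_\t_\n"
    "3\tw3\t_\tVERB\t_\t_\t0\troot\t_\t_\n"
    "4\tw4\t_\tADV\t_\t_\t3\tadvmod\t_\t_\n"
    "\n"
    "# sent_id = demo-2\n"
    "1\tw5\t_\tNOUN\t_\t_\t0\troot\t_\t_\n"
    "2\tw6\t_\tNOUN\t_\t_\t1\tnmod\t_\t_\n"
    "\n"
)

print(dict(verb_nominal_relations(RELATION_SAMPLE)))
```

In the sample, the adverb is skipped because it is not a nominal child, and the second sentence contributes nothing because neither noun has a verbal head.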

Relations Overview