This page pertains to UD version 2.

UD Pesh ChibErgIS

Language: Pesh (code: pay)
Family: Chibchan

This treebank has been part of Universal Dependencies since the UD v2.15 release.

The following people have contributed to making this treebank part of UD: Natalia Cáceres Arandia, Claudine Chamoreau, Sylvain Kahane, Bruno Guillaume.

Repository: UD_Pesh-ChibErgIS
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.15

License: CC BY-SA 4.0

Genre: spoken

Questions, comments? General annotation questions (either Pesh-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on Github. If you want to collaborate, please contact [natalia • caceres • arandia (æt) cnrs • fr]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.

Annotation	Source
Lemmas	annotated manually
UPOS	annotated manually, natively in UD style
XPOS	not available
Features	annotated manually, natively in UD style
Relations	annotated manually, natively in UD style

Description

A Universal Dependencies corpus for Pesh (also known as Paya), a member of the Chibchan language family. Pesh has about 500 speakers in Honduras.

The treebank is an automatic conversion of SUD_Pesh-ChibErgIS, which is itself an automatic conversion of mSUD_Pesh-ChibErgIS. The mSUD treebank was extracted from Claudine Chamoreau and Natalia Cáceres's interlinearized corpus in Flex format, which in turn extends an oral corpus documented by Claudine Chamoreau (https://www.elararchive.org/dk0392).

Sentences are annotated with the following metadata:

speaker_id: (which identifies the turn of speech)
sent_timecode: (which will enable playback of the sentence)
morphemic_text: (original segmentation of the text into morphemes)
text: (lexical tokenization)
text_en: (English interpretation)
text_phrase-gls-de: (original id)
text_phrase-gls-es: (Spanish interpretation)
text_phrase-gls-it: (IPA transcription)
text_phrase-gls-pro: (prosodic transcription)
text_phrase-gls-tl: (original comments in Flex)
text_phrase-gls-wg: (original word-gloss in Flex)
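
In CoNLL-U files, sentence-level metadata such as the keys listed above is stored on comment lines of the form `# key = value` placed before the token lines. The following is a minimal sketch of reading those lines, assuming the treebank follows that convention. The function name `read_sentence_metadata` and the sample values are hypothetical and are not taken from the treebank.

```python
def read_sentence_metadata(block: str) -> dict[str, str]:
    """Collect `# key = value` comment lines from one CoNLL-U sentence block."""
    metadata = {}
    for line in block.splitlines():
        if not line.startswith("#"):
            continue  # token lines carry no sentence-level metadata
        # Split on the first "=" only, so values containing "=" stay intact.
        key, sep, value = line[1:].partition("=")
        if sep:  # ignore comments that are not key/value pairs
            metadata[key.strip()] = value.strip()
    return metadata


# Invented sample: the values are placeholders, not real corpus data.
sample = "\n".join([
    "# sent_id = example_1",
    "# speaker_id = A",
    "# text = placeholder text",
    "# text_en = placeholder translation",
    "1\tplaceholder\t_\tX\t_\t_\t0\troot\t_\t_",
])
print(read_sentence_metadata(sample))
```

Splitting on the first "=" only matters here because values such as `text` may themselves contain tokens written with "=".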

Structure

This version of the treebank contains the dependency annotation of the first four files of the original corpus.

The original data are spoken. They were first segmented into words with concatenated clitics, then interlinearized and glossed in Flex with clitics as separate tokens. Tokens comprise words and affixes (the latter preceded by a “=” sign).
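
The relation between the two segmentations can be sketched as follows: tokens written with a leading "=" are regrouped with the free token that precedes them, which approximates the original word-level segmentation. This is only an illustration under the assumption that the "=" prefix is the sole marker of a bound token. The helper names are hypothetical, and the sample sequence is invented from forms listed on this page rather than copied from a corpus sentence.

```python
def is_bound_token(form: str) -> bool:
    """True for tokens written with a leading "=" (for example "=ma")."""
    return form.startswith("=") and len(form) > 1


def group_with_host(forms: list[str]) -> list[list[str]]:
    """Group each bound token with the closest preceding free token."""
    groups: list[list[str]] = []
    for form in forms:
        if is_bound_token(form) and groups:
            groups[-1].append(form)  # attach to the current host
        else:
            groups.append([form])  # a free token starts a new group
    return groups


# Invented sequence, for illustration only.
tokens = ["ũtas", "=ma", "kaya"]
for group in group_with_host(tokens):
    print("".join(group))
```

A bound token with no preceding host is kept as its own group instead of being dropped.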

UD_Pesh-ChibErgIS contains 2,507 tokens in 307 sentences.

References

Chamoreau, Claudine. 2015. A cross-varietal documentation and description of Pesh, a Chibchan language of Honduras. Endangered Languages Archive. Handle: http://hdl.handle.net/2196/00-0000-0000-000F-BF49-B

Acknowledgments

This treebank was produced as part of the ChibErgIS and Autogramm ANR projects. Special thanks go to Bruno Guillaume for the conversion from SUD to UD, and to Sylvain Kahane, Christian Chanard, Uyên-To Rabier and Aleksandra Miletic.

Statistics of UD Pesh ChibErgIS

POS Tags

ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – VERB – X

Features

AdvType – Animacy – Case – Clusivity – Person – PronType – VerbForm – Voice

Relations

acl – acl:relcl – advcl – advmod – advmod:lmod – appos – aux – case – cc – ccomp – compound – compound:lvc – compound:svc – conj – cop – csubj – dep – dep:conj – det – discourse – dislocated – mark – nmod – nsubj – nsubj:outer – nsubj:pass – nummod – obj – obl – obl:arg – obl:mod – orphan – parataxis – punct – reparandum – root – vocative – xcomp

Tokenization and Word Segmentation

This corpus contains 307 sentences and 2508 tokens.

All tokens in this corpus are followed by a space.

This corpus does not contain words with spaces.
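
Counts of this kind can be recomputed from a CoNLL-U file. The sketch below is one possible way to do it, not the script that generated this page. It assumes the standard ten tab-separated columns, counts only lines with a plain integer ID, and treats `SpaceAfter=No` in the MISC column as the marker of a token not followed by a space. The function name and the sample text are hypothetical.

```python
def tokenization_stats(conllu_text: str) -> dict[str, int]:
    """Count sentences, integer-ID token lines, and spacing properties."""
    stats = {"sentences": 0, "tokens": 0, "no_space_after": 0, "words_with_spaces": 0}
    in_sentence = False
    for line in conllu_text.splitlines():
        if not line.strip():
            in_sentence = False  # a blank line closes the current sentence
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip range lines (1-2) and empty nodes (1.1)
        if not in_sentence:
            stats["sentences"] += 1
            in_sentence = True
        stats["tokens"] += 1
        if "SpaceAfter=No" in cols[9].split("|"):
            stats["no_space_after"] += 1
        if " " in cols[1]:
            stats["words_with_spaces"] += 1
    return stats


# Invented two-sentence sample with placeholder forms.
sample = (
    "# sent_id = s1\n"
    "1\ta\t_\tX\t_\t_\t0\troot\t_\t_\n"
    "2\t=b\t_\tX\t_\t_\t1\tdep\t_\t_\n"
    "\n"
    "# sent_id = s2\n"
    "1\tc\t_\tX\t_\t_\t0\troot\t_\t_\n"
)
print(tokenization_stats(sample))
```

For a corpus where every token is followed by a space and no word contains a space, the last two counters would both be zero.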

This corpus contains 10 types of words that contain both letters and punctuation. Examples: San.Esteban, akasteʃk(w)a, amaspariʃkaw(a), kapaʃbar(w)a, ke,, nãpar(w)a, sukuher(w)a, tarkasakw(a), teʔkertVw(a), yãhaw(a)
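
A check for word types that mix letters and punctuation can be sketched with Unicode general categories. This assumes that "letter" means any category starting with `L` and "punctuation" any category starting with `P`, which may differ from the definition used to produce the figure above. Under this assumption the "=" sign is a math symbol rather than punctuation, so forms such as "=ma" are not flagged. The function name is hypothetical.

```python
import unicodedata


def has_letters_and_punctuation(form: str) -> bool:
    """True if the form contains at least one letter and one punctuation mark."""
    categories = [unicodedata.category(ch) for ch in form]
    has_letter = any(cat.startswith("L") for cat in categories)
    has_punct = any(cat.startswith("P") for cat in categories)
    return has_letter and has_punct


# Forms taken from the examples and feature lists on this page.
for form in ["San.Esteban", "akasteʃk(w)a", "ke,", "taarwã", "=ma"]:
    print(form, has_letters_and_punctuation(form))
```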

Morphology

Nominal Features

Animacy

Hum
- NOUN: taarwã

Case

Abs
- ADP: =ra, =ro
- SCONJ: =ro

Erg
- ADP: =ya

Nom
- ADP: =ma

Degree and Polarity

Verbal Features

Voice

Appl
- AUX: akatʃaitVri, akatʃaui, takatʃai, takatʃaii, takatʃauwa, takatʃawa, ũtakatʃaitVi
- VERB: artʃuiʃkari, artʃuiʃatVri, tarwarkuh, akasteʃkawa, akastok, arkapriʃi, artapuki, artʃuiʃbartVi, akaporki, akasteʃ
- VERB-Inf: artʃuiʃ

Cau
- VERB: ũkawa, ũweerwa, ũwarahparh

Mid
- VERB: taõʃi, atʃi, taiʃkari, taõʃ, taõʃkerwa, taõʃki, apastVpi, apiʃki, atuhwa, atuhweʃkwa

Rcp
- VERB: apuru, tVkaeri, tVkairi

Pronouns, Determiners, Quantifiers

PronType

Int
- ADP: =kanki
- PART: =kanka

Person

1
- AUX: tʃatVpa, =bartVwa
- VERB: piãpa, kaporpa, akonapa, artʃuiʃpa, kapai, kawiʃpa, kawiʃpai, paspa, peʔpa, piʃpa

2
- AUX: =rya
- VERB: kaya, takaya

3
- VERB: kawiʃkawa

Other Features

AdvType
- Ideoph
  - ADV: tõʃ, kluk, roh, teʔne, tukuluk

Clusivity
- Ex
  - AUX: tʃabaruri, =barwa, =bari, tʃaberuri, tʃaberwa, =bartVwa, ũtakatʃaitVi
  - NOUN: ũtaoryah, ũtayãha, ũtakaki, ũtaoryaha, ũtasira, ũtasuwa, ũtasãma
  - PART: ũtanĩhã
  - PRON: ũtas
  - VERB: tiʃbarwa, artʃuiʃbartVi, atʃahbari, kapaʃbarwa, artʃuiʃbarwa, kabarwa, kakoyoʃbari, kapaʃbar(w)a, kapaʃbari, kapaʃbarpi
- In
  - NOUN: patatiʃta, patasaʔa, patasã, pataya, patayãha, patayãhha, pataĩ
  - VERB: ãparh, amaskapiwa, amasparwa, akatipari, amaspari, iʃparwa, kapari, kaparwa, masperwa, nãapi
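
Listings like the ones above can be derived from the FEATS column of a CoNLL-U file, where features are written as `Name=Value` pairs separated by `|` and `_` marks an empty set. The sketch below groups word forms by feature, value and UPOS tag. It is an approximation of how such an index could be built, not the tool that produced this page, and the function names are hypothetical. The sample rows pair forms with features as they appear in the listings above.

```python
from collections import defaultdict


def parse_feats(feats: str) -> dict[str, str]:
    """Turn a FEATS string such as "Clusivity=Ex|Person=1" into a dict."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def forms_by_feature(rows):
    """Map (feature, value, upos) to the distinct forms that carry it.

    `rows` is an iterable of (form, upos, feats) triples.
    """
    index = defaultdict(list)
    for form, upos, feats in rows:
        for name, value in parse_feats(feats).items():
            key = (name, value, upos)
            if form not in index[key]:  # keep each form once, in first-seen order
                index[key].append(form)
    return dict(index)


rows = [
    ("=bartVwa", "AUX", "Clusivity=Ex|Person=1"),
    ("ũtas", "PRON", "Clusivity=Ex"),
    ("=ma", "ADP", "Case=Nom"),
]
for key, forms in forms_by_feature(rows).items():
    print(key, forms)
```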

Syntax

Auxiliary Verbs and Copula

This corpus uses 2 lemmas as copulas (cop). Examples: r, _.

This corpus uses 3 lemmas as auxiliaries (aux). Examples: tʃa, ak, r.

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).
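
Patterns of the form `VERB--PRON-ADP(=ma)` combine the UPOS tag of the parent, the UPOS tag of the child, and any case markers attached to the child. The sketch below shows one way such patterns could be counted. It is a guess at the procedure, not the script behind this page: it assumes that markers are dependents of the child with the relation `case`, it uses the word form inside the parentheses, and it does not reproduce suffixes such as `-Inf`. All function names are hypothetical and the sample sentence is invented.

```python
from collections import Counter


def parse_rows(block: str) -> list[dict]:
    """Read the integer-ID token lines of one CoNLL-U sentence."""
    rows = []
    for line in block.splitlines():
        cols = line.split("\t")
        if len(cols) == 10 and cols[0].isdigit() and cols[6].isdigit():
            rows.append({"id": int(cols[0]), "form": cols[1], "upos": cols[3],
                         "head": int(cols[6]), "deprel": cols[7]})
    return rows


def argument_patterns(rows: list[dict], relation: str) -> Counter:
    """Count parent--child patterns for one relation (verb parent, nominal child)."""
    by_id = {row["id"]: row for row in rows}
    counts = Counter()
    for child in rows:
        head = by_id.get(child["head"])  # None when the head is the root (0)
        if child["deprel"] != relation or head is None:
            continue
        if head["upos"] != "VERB" or child["upos"] not in ("NOUN", "PRON"):
            continue
        # Case markers are looked up among the child's own dependents.
        markers = [f"-{r['upos']}({r['form']})" for r in rows
                   if r["head"] == child["id"] and r["deprel"] == "case"]
        counts[f"{head['upos']}--{child['upos']}{''.join(markers)}"] += 1
    return counts


# Invented sentence built from forms listed on this page.
sentence = (
    "1\tũtas\t_\tPRON\t_\t_\t3\tnsubj\t_\t_\n"
    "2\t=ma\t_\tADP\t_\t_\t1\tcase\t_\t_\n"
    "3\tkaya\t_\tVERB\t_\t_\t0\troot\t_\t_"
)
print(argument_patterns(parse_rows(sentence), "nsubj"))
```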

nsubj
- VERB--NOUN (25)
- VERB--NOUN-ADP(=ma) (7)
- VERB--NOUN-ADP(=mã) (1)
- VERB--NOUN-ADP(=ya) (5)
- VERB--PRON (14)
- VERB--PRON-ADP(=ma) (7)
- VERB--PRON-ADP(=ma)-ADP(=ma) (1)
- VERB--PRON-ADP(=mã) (2)

obj
- VERB--NOUN (53)
- VERB--NOUN-ADP(=ma) (5)
- VERB--NOUN-ADP(=ra) (2)
- VERB--NOUN-ADP(=yo) (1)
- VERB--PRON (13)
- VERB--PRON-ADP(=ken) (1)
- VERB--PRON-ADP(=ma) (4)
- VERB--PRON-ADP(=ra) (4)
- VERB-Inf--PRON (1)

iobj

Relations Overview

This corpus uses 9 relation subtypes: acl:relcl, advmod:lmod, compound:lvc, compound:svc, dep:conj, nsubj:outer, nsubj:pass, obl:arg, obl:mod
The following 8 relation types are not used in this corpus at all: iobj, expl, amod, clf, fixed, flat, list, goeswith
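
The two summaries above can be recomputed from the list of relations used in the corpus. The sketch below takes the relation labels from the Relations section of this page and compares their base types with the 37 universal relations of UD v2. The function name `relations_overview` is hypothetical.

```python
# The 37 universal dependency relations defined in UD v2.
UNIVERSAL_RELATIONS = set(
    "acl advcl advmod amod appos aux case cc ccomp clf compound conj cop csubj "
    "dep det discourse dislocated expl fixed flat goeswith iobj list mark nmod "
    "nsubj nummod obj obl orphan parataxis punct reparandum root vocative xcomp".split()
)

# Relation labels listed in the Relations section of this page.
USED_RELATIONS = (
    "acl acl:relcl advcl advmod advmod:lmod appos aux case cc ccomp compound "
    "compound:lvc compound:svc conj cop csubj dep dep:conj det discourse "
    "dislocated mark nmod nsubj nsubj:outer nsubj:pass nummod obj obl obl:arg "
    "obl:mod orphan parataxis punct reparandum root vocative xcomp".split()
)


def relations_overview(used: list[str]) -> tuple[list[str], list[str]]:
    """Return (subtypes used, universal relations never used)."""
    subtypes = sorted(rel for rel in used if ":" in rel)
    base_types = {rel.split(":")[0] for rel in used}  # strip the subtype part
    unused = sorted(UNIVERSAL_RELATIONS - base_types)
    return subtypes, unused


subtypes, unused = relations_overview(USED_RELATIONS)
print(len(subtypes), subtypes)
print(len(unused), unused)
```

Run on the labels above, this yields the same 9 subtypes and the same 8 unused relation types as the overview.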