UD Cebuano GJA
Language: Cebuano (code: ceb)
Family: Austronesian
This treebank has been part of Universal Dependencies since the UD v2.10 release.
The following people have contributed to making this treebank part of UD: Glyd Aranes.
Repository: UD_Cebuano-GJA
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.15
License: CC BY-SA 4.0
Genre: grammar-examples
Questions, comments? General annotation questions (either Cebuano-specific or cross-linguistic) can be raised in the main UD issue tracker. Bugs in this treebank can be reported in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [glyd • aranes (æt) gmail • com]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.
Annotation | Source |
---|---|
Lemmas | annotated manually |
UPOS | annotated manually, natively in UD style |
XPOS | not available |
Features | annotated manually, natively in UD style |
Relations | annotated manually, natively in UD style |
Description
UD_Cebuano-GJA is a collection of annotated Cebuano sample sentences randomly taken from three different sources: community-contributed samples from the website Tatoeba, a Cebuano grammar book by Bunye & Yap (1971), and Tanangkinsing’s reference grammar of Cebuano (2011). This project is currently a work in progress.
This treebank is composed of 188 sample sentences from three sources: 58 sentences from Bunye and Yap’s book, 46 sentences from Tanangkinsing’s two volumes, and 84 sentences taken from Tatoeba. Annotation was done manually by Glyd Jun Arañes as part of his MA thesis in Linguistic Data Sciences at the University of Eastern Finland.
For suggestions on the treebank, you can contact Glyd through this email: glyd.aranes@gmail.com
Acknowledgments
References
- Bunye, M. & Yap, E. (1971). Cebuano for beginners. University of Hawaii Press: Honolulu, USA. ISBN: 9780824879778
- Tanangkinsing, M. (2011). A functional reference grammar of Cebuano: from a discourse perspective. Vol 1 & 2. Lambert Academic Publishing: Saarbrücken, Germany. ISBN: 978-3-8465-1024-7 / 978-3-8465-9150-5
- Tatoeba (2022). Sentences in Cebuano. From: https://tatoeba.org/en/sentences/show_all_in/ceb/none
Statistics of UD Cebuano GJA
POS Tags
ADJ – ADP – ADV – CCONJ – DET – INTJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – VERB
Features
Aspect – Case – Clusivity – Definitizer – Degree – Deixis – Foreign – Gender – Mood – Neutral – Number – PartType – Person – Polarity – PronType – Voice
Relations
acl – advcl – advmod – amod – appos – case – cc – ccomp – compound – compound:redup – conj – csubj – det – discourse – fixed – flat – mark – nmod – nsubj – nummod – obj – obl – parataxis – punct – root – vocative
Tokenization and Word Segmentation
- This corpus contains 197 sentences, 1278 tokens and 1377 syntactic words.
- This corpus contains 215 tokens (17%) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 22 types of words that contain both letters and punctuation. Examples: -ng, -g, bag-o, pag-abot, Gi-hold, Mag-unsa, Maka-iningles, Nag-unsa, Pag-ayo, gitudloa-, himo-an, ing-ana, ju-, kanus-a, mag-idad, naga-ulan, nagtan-aw, nawad-an, nitan-aw, panghuna-huna, second-hand, ting-init
- This corpus contains 97 multi-word tokens. On average, one multi-word token consists of 2.02 syntactic words.
- There are 51 types of multi-word tokens. Examples: akong, iyang, imong, Nganong, among, Daghang, anang, bag-ong, ilang, inyong, kag, kanang, katong, kog, silang, siyag, Ayawg, Karong, Maayong, Mura'g, Pag-ayo-ayo, Pasayloang, Siging, Totong, adis-adis, bang, batang, gitudloag, gustog, igsoong, imohang, inusarang, iyahang, jung, kadugayg, kong, labing, mig, ming, mipasiugdang, mong, nagtuong, nakog, nang, natawhang, ning, pagpatakag, pang, rag, rang.
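The counts above can be recomputed from a CoNLL-U file. The following is a minimal sketch, assuming the standard 10-column CoNLL-U layout; the function name `tokenization_stats` and the toy sentence are hypothetical, and the annotation in the sample is illustrative only (it is not copied from the treebank).

```python
# Toy CoNLL-U fragment. The analysis shown here is an assumption made for
# illustration; it is not taken from the actual UD_Cebuano-GJA data.
SAMPLE = (
    "# text = Akong amiga.\n"
    "1-2\tAkong\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "1\tAko\tako\tPRON\t_\t_\t3\tnmod\t_\t_\n"
    "2\t-ng\t-ng\tPART\t_\t_\t1\tcase\t_\t_\n"
    "3\tamiga\tamiga\tNOUN\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_\n"
    "\n"
)


def tokenization_stats(conllu: str) -> dict:
    """Count sentences, surface tokens, syntactic words and multi-word tokens."""
    stats = {"sentences": 0, "tokens": 0, "words": 0,
             "mwts": 0, "no_space_after": 0}
    covered_until = 0    # last word ID covered by the current multi-word token
    in_sentence = False
    for line in conllu.splitlines():
        if not line.strip():          # a blank line ends the sentence
            in_sentence = False
            covered_until = 0
            continue
        if line.startswith("#"):      # sentence-level comment
            continue
        cols = line.split("\t")
        if not in_sentence:
            in_sentence = True
            stats["sentences"] += 1
        token_id, misc = cols[0], cols[9]
        no_space = "SpaceAfter=No" in misc.split("|")
        if "-" in token_id:           # multi-word token range such as "1-2"
            stats["mwts"] += 1
            stats["tokens"] += 1
            stats["no_space_after"] += no_space
            covered_until = int(token_id.split("-")[1])
        elif "." in token_id:         # empty node (enhanced graph only), skip
            continue
        else:                         # ordinary syntactic word
            stats["words"] += 1
            if int(token_id) > covered_until:
                stats["tokens"] += 1  # the word is also a surface token
                stats["no_space_after"] += no_space
    return stats


print(tokenization_stats(SAMPLE))
```

The figures reported above are consistent with one another under this counting scheme: 1377 words minus 1278 tokens leaves 99 extra words, so the 97 multi-word tokens cover 97 + 99 = 196 words, or about 2.02 words each.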
Morphology
Tags
- This corpus uses 14 UPOS tags out of 17 possible: ADJ, ADP, ADV, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB
- This corpus does not use the following tags: AUX, SYM, X
- This corpus contains 31 word types tagged as particles (PART): -g, -ng, Ambot, Bitaw, ayaw, ba, di, diay, dili, gani, gud, gyud, intawon, ju-, jud, ka, kay, kuno, lagi, lang, man, mas, may, na, nga, nuon, pa, pay, ra, usa, wala
- This corpus contains 18 lemmas tagged as pronouns (PRON): ako, ana, ani, asa, ikaw, imo, inyo, iya, kadto, kami, kamo, kana, kini, kinsa, kita, sila, siya, unsa
- This corpus contains 2 lemmas tagged as determiners (DET): daghan, mga
- This corpus contains no lemmas tagged as auxiliaries (AUX).
- This corpus does not use the VerbForm feature.
Nominal Features
- Gender
  - Fem
    - ADJ: gwapa, Guwapa
    - NOUN: babaye, amiga
    - PROPN: Mary, Ditang, Maria, Alicia, Ana, Carmen, Inday, Josie
  - Masc
    - ADJ: gwapo
    - NOUN: lalake, tatay
    - PROPN: Tom, Pedro, Juan, Adot, Lito, Rolando, Ruben, Tonying, Toto, Undo
- Number
  - Dual
    - PRON: taka, tika
  - Plur
    - DET: mga
    - NOUN: kaislahan
    - PRON: mi, sila, amo, ila, inyo, kamiy, miy, ta
  - Sing
    - PRON: ko, siya, ka, ako, iya, imo, nako, nimo, imoha, koy
- Case
  - Dat
    - ADP: sa, kang, ug, -g
  - Gen
    - ADP: ni, -g, ug, sa, og
    - PRON: siya, iya, imo, nako, ko, nimo, amo, imoha, ila, inyo
  - Nom
    - ADP: ang, si, -g, -ng
    - PRON: ko, ka, ako, mi, sila, koy, ikaw, taka, tika, Akoy
Degree and Polarity
- Degree
  - Pos
    - ADJ: Maayo, adis, bag-o, hawod, taas, gwapa, Buutan, Dako, Duol, Hapit
  - Sup
    - ADJ: pinakataas
- Polarity
  - Neg
    - ADV: Wala
    - PART: dili, wala, ayaw, di, Ambot
    - VERB: wala
Verbal Features
- Aspect
  - Imp
    - VERB: moadto, ganahan, tabangan, Kinahanglan, Liguon, Silhigan, kasabot, Gigutom, Giuhaw, Limpyohan
  - Perf
    - VERB: gibuhat, Gitawag, gihigugma, Gi-hold, Gibilanggo, Gibutang, Gidungog, Gikinahanglan, Gikuha, Gimaotan
  - Prosp
    - VERB: masayod, Maka-iningles, Mamalit, Nakahukom, Nakalimtan, Nakamedalya, Nasuka, himo-an, mahimo, mahimuot
- Mood
  - Imp
    - VERB: Ipalit, Padayona, Pag-ayo, Pagdali, Pakitaa, ipasa, kaguol, pagpataka, paliti, patoo
  - Ind
    - VERB: moadto, ganahan, gibuhat, tabangan, Gitawag, Kinahanglan, Liguon, Silhigan, gihigugma, kasabot
  - Pot
    - VERB: Maka-iningles, Nakabantay, Nakahukom, Nakalimtan, Nakamedalya, Nasuka, himo-an, mahimo, mainitan, makahinapon
- Voice
  - Act
    - VERB: moadto, masayod, Mag-unsa, Magpabilin, Magtigom, Maka-iningles, Mamalit, Moabot, Mobalik, Mohulat
  - Ifoc
    - VERB: Ipalit, Pakitaa, ipasa
  - Lfoc
    - VERB: Limpyohan, himo-an, paliti
  - Pass
    - VERB: ganahan, gibuhat, tabangan, Gitawag, Kinahanglan, Liguon, Silhigan, gihigugma, kasabot, Gi-hold
Pronouns, Determiners, Quantifiers
- PronType
  - Dem
    - ADV: Niay, Toa, Tua, nia, to
    - PRON: kini, ana, kana, Kato, ani, ato, to
  - Int
    - ADV: Ngano, Asa, Pilay, Tagpila
    - PRON: Unsa, Kinsa, Unsay
  - Prs
    - PRON: ko, siya, ka, ako, iya, imo, mi, nako, nimo, sila
- Person
  - 1
    - PRON: ko, ako, mi, nako, amo, koy, taka, tika, Akoy, kamiy
  - 2
    - PRON: ka, imo, nimo, imoha, ikaw, inyo, mo
  - 3
    - PRON: siya, iya, sila, ila, siyay, iyaha, niya
Other Features
- Clusivity
  - Ex
    - PRON: mi, amo, kamiy, miy
  - In
    - PRON: taka, tika, ta
- Definitizer
  - Yes
    - ADJ: kadugaya, Samoka
    - NOUN: lenggwahea
- Deixis
  - Med
    - PRON: ana, kana
  - Prox
    - ADV: Ari, Dira, dinhi, Niay, nia
    - PRON: kini, ni, ana, ani
  - Remt
    - ADV: didto, Tua, Toa, to
    - PRON: kato, ato, to
- Foreign
  - Yes
    - NOUN: frog, orange, pet
    - NUM: thirty
- Neutral
  - Yes
    - ADV: Pilay, nay, Niay
    - NUM: Duhay
    - PART: may, pay
    - PRON: Unsay, koy, siyay, Akoy, kamiy, miy
    - VERB: Nagkitaay
- PartType
  - Int
    - PART: ba
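Listings of this kind (feature, then value, then part of speech, then word forms) can be built from the FEATS column of a CoNLL-U file. The sketch below assumes the standard `Name=Value|Name=Value` encoding; the helper names and the feature strings in `TOY_WORDS` are hypothetical, chosen only to be compatible with the listings above and not copied from the treebank.

```python
from collections import defaultdict


def parse_feats(feats: str) -> dict:
    """Turn a FEATS string such as 'Case=Nom|Number=Sing' into a dict."""
    if feats in ("", "_"):            # "_" marks an empty FEATS column
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


# (form, UPOS, FEATS) triples: assumed values for illustration only.
TOY_WORDS = [
    ("ko", "PRON", "Case=Nom|Number=Sing|Person=1|PronType=Prs"),
    ("mi", "PRON", "Case=Nom|Clusivity=Ex|Number=Plur|Person=1|PronType=Prs"),
    ("ba", "PART", "PartType=Int"),
    ("mga", "DET", "Number=Plur"),
]


def index_by_feature(words):
    """Group word forms as feature -> value -> UPOS -> [forms]."""
    index = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for form, upos, feats in words:
        for name, value in parse_feats(feats).items():
            forms = index[name][value][upos]
            if form not in forms:     # list each form only once
                forms.append(form)
    return index


index = index_by_feature(TOY_WORDS)
for name in sorted(index):
    print(name)
    for value in sorted(index[name]):
        print("  " + value)
        for upos in sorted(index[name][value]):
            print("    " + upos + ": " + ", ".join(index[name][value][upos]))
```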
Syntax
Auxiliary Verbs and Copula
- This corpus does not contain copulas.
- This corpus does not contain auxiliaries.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
  - VERB--NOUN (3)
  - VERB--NOUN-ADP(ang) (26)
  - VERB--PRON (3)
  - VERB--PRON-Gen (26)
  - VERB--PRON-Nom (58)
- obj
  - VERB--NOUN (13)
  - VERB--NOUN-ADP(sa) (20)
  - VERB--NOUN-ADP(ug) (16)
  - VERB--PRON-Gen (7)
  - VERB--PRON-Nom (5)
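One plausible way to derive labels such as `VERB--NOUN-ADP(ang)` and `VERB--PRON-Nom` is sketched below. It is an assumption about how the patterns are formed, not the actual script behind these statistics: an argument is labeled by its case-marking adposition if it has one, otherwise by its `Case` feature. The two toy sentences are constructed from word forms listed on this page and are not treebank sentences; the function name `argument_patterns` is hypothetical.

```python
from collections import Counter

# Each token: (id, form, upos, feats, head, deprel). Hypothetical toy data.
TOY_SENTENCES = [
    [
        (1, "Moadto", "VERB", "", 0, "root"),
        (2, "ko", "PRON", "Case=Nom", 1, "nsubj"),
    ],
    [
        (1, "Gitawag", "VERB", "", 0, "root"),
        (2, "ang", "ADP", "Case=Nom", 3, "case"),
        (3, "babaye", "NOUN", "", 1, "nsubj"),
    ],
]


def argument_patterns(sentences):
    """Count (relation, pattern) pairs for NOUN/PRON arguments of verbs."""
    counts = Counter()
    for sent in sentences:
        by_id = {tok[0]: tok for tok in sent}
        for tok_id, form, upos, feats, head, deprel in sent:
            if deprel not in ("nsubj", "obj") or upos not in ("NOUN", "PRON"):
                continue
            parent = by_id.get(head)
            if parent is None or parent[2] != "VERB":
                continue              # only verb parents are considered
            label = "VERB--" + upos
            # Adpositions attached to this argument with the "case" relation.
            markers = [t[1].lower() for t in sent
                       if t[4] == tok_id and t[5] == "case" and t[2] == "ADP"]
            case = dict(
                kv.split("=", 1) for kv in feats.split("|") if "=" in kv
            ).get("Case")
            if markers:
                label += "-ADP(" + markers[0] + ")"
            elif case:
                label += "-" + case
            counts[(deprel, label)] += 1
    return counts


for (deprel, label), n in sorted(argument_patterns(TOY_SENTENCES).items()):
    print(deprel, label, "(" + str(n) + ")")
```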
Relations Overview
- This corpus uses 1 relation subtype: compound:redup
- The following 12 relation types are not used in this corpus at all: iobj, xcomp, expl, dislocated, aux, cop, clf, list, orphan, goeswith, reparandum, dep
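The two statements above can be checked with a set difference. This sketch takes the 37 universal relations of UD v2 and the relation list given in the Relations section of this page; the variable names are illustrative.

```python
# The 37 universal dependency relations defined in UD v2.
UNIVERSAL_RELATIONS = set(
    "acl advcl advmod amod appos aux case cc ccomp clf compound conj cop "
    "csubj dep det discourse dislocated expl fixed flat goeswith iobj list "
    "mark nmod nsubj nummod obj obl orphan parataxis punct reparandum root "
    "vocative xcomp".split()
)

# Relations listed for this treebank in the Relations section.
USED = set(
    "acl advcl advmod amod appos case cc ccomp compound compound:redup conj "
    "csubj det discourse fixed flat mark nmod nsubj nummod obj obl parataxis "
    "punct root vocative".split()
)

subtypes = sorted(rel for rel in USED if ":" in rel)
base_types = {rel.split(":")[0] for rel in USED}   # strip any subtype suffix
unused = sorted(UNIVERSAL_RELATIONS - base_types)

print(len(subtypes), subtypes)
print(len(unused), unused)
```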