UD Karo TuDeT
Language: Karo (code: arr)
Family: Tupian
This treebank has been part of Universal Dependencies since the UD v2.9 release.
The following people have contributed to making this treebank part of UD: Fabrício Ferraz Gerardi.
Repository: UD_Karo-TuDeT
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.15
License: CC BY-SA 4.0
Genre: nonfiction, news
Questions, comments? General annotation questions (either Karo-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on Github. If you want to collaborate, please contact [fabricio • gerardi (æt) uni-tuebingen • de]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.
Annotation | Source |
---|---|
Lemmas | annotated manually in non-UD style, automatically converted to UD |
UPOS | annotated manually in non-UD style, automatically converted to UD |
XPOS | annotated manually |
Features | annotated manually in non-UD style, automatically converted to UD |
Relations | annotated manually in non-UD style, automatically converted to UD |
Description
UD_Karo-TuDeT is a collection of annotated sentences in Karo. The sentences stem from the only grammatical description of the language (Gabas, 1999) and from the sentences in the Karo-Portuguese dictionary co-authored by Gabas (Gavião and Gabas, 2007). The treebank is part of TuLaR (Tupían Language Resources). The project is a work in progress, and the treebank is updated regularly. Sentence annotation and documentation by Fabrício Ferraz Gerardi.
Text sources
- Gabas Jr., Nilson (1999) A Grammar of Karo, Tupi (Brazil). University of California. Unpublished PhD thesis.
- Gavião, Sebastião and Gabas Jr., Nilson (2007) Dicionário Karo - Português. Privately published.
Acknowledgments
The development of this treebank is supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant agreement No. 834050).
Statistics of UD Karo TuDeT
POS Tags
ADJ – ADP – ADV – AUX – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – VERB – X
Features
Aspect – Case – Clas – Clusivity – Corf – Decl – Evident – Int – Mood – Nomzr – Number – Person – Polarity – PronType – Redup – Reflex – Tense – VerbForm – Voice
Relations
acl – advcl – advmod – amod – appos – aux – case – ccomp – clf – compound – conj – cop – dep – det – discourse – dislocated – mark – nmod – nsubj – nummod – obj – obl – parataxis – punct – root – xcomp
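The three inventories above (UPOS tags, feature names, relations) can in principle be recomputed from the released CoNLL-U files, where each word line has ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). The following is a minimal sketch, not the script that generated this page. The helpers `row`, `word_lines` and `inventories` are hypothetical, and the two-sentence `SAMPLE` uses invented placeholder forms and lemmas, not treebank content.

```python
# Sketch: recompute UPOS, feature-name and DEPREL inventories from CoNLL-U text.
# SAMPLE is invented toy data (placeholder forms/lemmas), not taken from the treebank.


def row(*cols):
    """Join the ten CoNLL-U columns of one word line with tabs."""
    return "\t".join(cols)


SAMPLE = "\n".join([
    "# sent_id = toy-1",
    row("1", "w1", "l1", "PRON", "_", "Number=Sing|Person=1", "2", "nsubj", "_", "_"),
    row("2", "w2", "l2", "VERB", "_", "VerbForm=Ger", "0", "root", "_", "_"),
    row("3", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"),
    "",
    "# sent_id = toy-2",
    row("1", "w3", "l3", "NOUN", "_", "Case=Dat", "2", "obl", "_", "_"),
    row("2", "w4", "l4", "VERB", "_", "Decl=Assert", "0", "root", "_", "_"),
    "",
])


def word_lines(text):
    """Yield the columns of each word line, skipping comments, blank lines,
    multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 1.1)."""
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue
        yield cols


def inventories(text):
    """Return sorted lists of UPOS tags, feature names and DEPREL values."""
    upos, feats, rels = set(), set(), set()
    for cols in word_lines(text):
        upos.add(cols[3])
        if cols[5] != "_":
            # FEATS looks like "Name=Value|Name=Value"; keep only the names.
            feats.update(pair.split("=", 1)[0] for pair in cols[5].split("|"))
        rels.add(cols[7])
    return sorted(upos), sorted(feats), sorted(rels)
```

On the toy sample, `inventories(SAMPLE)` gives four UPOS tags, five feature names and four relations. Relation subtypes, if present, are kept as they appear rather than merged into their base relation.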
Tokenization and Word Segmentation
- This corpus contains 674 sentences and 2319 tokens.
- All tokens in this corpus are followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 32 types of words that contain both letters and punctuation. Examples: iʔke], kanãp], [ar, [at, [õn, nãt], (aʔkəy), [agóaʔpət, [amãn, [maʔpəy, [maʔwɨt, [mãygãra, [noorawa, [ocaʔyõk, [oken, [okera, [owepaka, [owəy, [péŋ, [towéya, [war, [yét, [yét], [ẽn, cú], otoy], owẽ], péŋ], ráwrem], toʔwa], wa], yat]
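The four tokenization figures above (sentences, tokens, spacing, forms that mix letters and punctuation) can be approximated with a single pass over CoNLL-U text. The following is a minimal sketch under these assumptions. Surface tokens are counted as multiword ranges plus words not covered by a range. A missing space is read from `SpaceAfter=No` in MISC. "Punctuation" means any Unicode category starting with `P`. The helpers `row`, `has_letter_and_punct` and `token_stats` are hypothetical, and the toy sentences are invented. Only the forms `[okera` and `wa]` are copied from the list above.

```python
import unicodedata

# Sketch: sentence/token counts, SpaceAfter=No, and forms mixing letters with punctuation.
# The toy sentences are invented; only "[okera" and "wa]" come from the list above.


def row(tid, form, misc="_"):
    """Build a 10-column CoNLL-U line; columns this sketch ignores are '_'."""
    return "\t".join([tid, form] + ["_"] * 7 + [misc])


SAMPLE = "\n".join([
    "# sent_id = toy-1",
    row("1", "[okera"),
    row("2", "wa]"),
    row("3", "."),
    "",
    "# sent_id = toy-2",
    row("1-2", "w3"),  # multiword token covering words 1 and 2
    row("1", "w3a"),
    row("2", "w3b"),
    row("3", "w4", "SpaceAfter=No"),
    row("4", "."),
    "",
])


def has_letter_and_punct(form):
    """True if the form has at least one letter and one Unicode punctuation character."""
    has_letter = any(ch.isalpha() for ch in form)
    has_punct = any(unicodedata.category(ch).startswith("P") for ch in form)
    return has_letter and has_punct


def token_stats(text):
    """Return (sentences, surface tokens, tokens with SpaceAfter=No, sorted mixed forms)."""
    sentences = tokens = no_space = 0
    mixed = set()
    in_sentence = False
    covered_until = 0  # last word ID covered by the current multiword token
    for line in text.splitlines():
        if not line.strip():  # a blank line closes a sentence
            sentences += int(in_sentence)
            in_sentence, covered_until = False, 0
            continue
        if line.startswith("#"):
            continue
        in_sentence = True
        cols = line.split("\t")
        tid = cols[0]
        if "." in tid:  # empty node: not a surface token
            continue
        if "-" in tid:
            covered_until = int(tid.split("-")[1])
        elif int(tid) <= covered_until:  # syntactic word inside a multiword token
            continue
        tokens += 1
        if "SpaceAfter=No" in cols[9].split("|"):
            no_space += 1
        if has_letter_and_punct(cols[1]):
            mixed.add(cols[1])
    sentences += int(in_sentence)  # the text may not end with a blank line
    return sentences, tokens, no_space, sorted(mixed)
```

Run on the full treebank, the first two values should be comparable to the 674 sentences and 2319 tokens reported above, provided the page counts tokens the same way. A third value of 0 would match the statement that every token is followed by a space.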
Morphology
Tags
- This corpus uses 13 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB, X
- This corpus does not use the following tags: DET, CCONJ, INTJ, SYM
- This corpus contains 50 word types tagged as particles (PART): ahyə, aket, at, bap, co, coke, gán, gəy, iga, igã, iʔke, iʔke], iʔkõna, iʔkɨy, kanã, kokoãm, koãm, kán, kõam, manã, memã, menə, mã, mãm, nap, nyahmãm, nyar, nyat, nã, nãnin, pe, peʔ, pə, rab, rah, rap, rə, tah, tap, taykir, taykit, to, topə, tə, yahmãm, yar, yat, õn, ŋán, ʔaʔ
- This corpus contains 24 lemmas tagged as pronouns (PRON): at, aʔ, er, et, iʔat, iʔtə, iʔyat, karoat, kaʔto, kəy, kɨgomət, nãn, nõ, pagon, tabat, tap, toat, war, wat, yét, õn, ŋa, ŋaat, ẽn
- This corpus contains 0 lemmas tagged as determiners (DET).
- This corpus contains 4 lemmas tagged as auxiliaries (AUX): kap, nã, waʔye, ʔe
- Out of the above, 1 lemma occurred sometimes as AUX and sometimes as VERB: ʔe
- There is 1 (de)verbal form:
- Ger
- AUX: toʔwa, wa, ʔa, nã, a, eʔa, karoʔwa, roʔwa, toʔwa], wa]
- VERB: toba, wĩa, cára, okera, tokera, aʔwĩa, roba, taʔwara, tiga, ya
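The lemma counts above, the AUX/VERB overlap (ʔe) and the gerund forms per UPOS can all be collected in one pass. The following is a minimal sketch. `row`, `word_lines` and `lemma_stats` are hypothetical helpers. The lemmas ʔe and kap are taken from the lists above, while the forms `f1`..`f4` and the lemma `l2` are placeholders. Columns the sketch does not read are filled with `_`.

```python
from collections import defaultdict

# Sketch: lemma sets per UPOS, lemmas used both as AUX and VERB, gerund forms per UPOS.
# Forms f1..f4 and lemma l2 are placeholders; only ʔe and kap come from the lists above.


def row(tid, form, lemma, upos, feats="_"):
    """Build a 10-column CoNLL-U line; unused columns are '_'."""
    return "\t".join([tid, form, lemma, upos, "_", feats, "_", "_", "_", "_"])


SAMPLE = "\n".join([
    row("1", "f1", "ʔe", "AUX"),
    row("2", "f2", "l2", "VERB", "VerbForm=Ger"),
    "",
    row("1", "f3", "ʔe", "VERB"),
    row("2", "f4", "kap", "AUX", "Person=1"),
    "",
])


def word_lines(text):
    """Yield columns of word lines (no comments, ranges or empty nodes)."""
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" not in cols[0] and "." not in cols[0]:
            yield cols


def lemma_stats(text):
    """Return (lemmas per UPOS, lemmas seen as both AUX and VERB, Ger forms per UPOS)."""
    lemmas = defaultdict(set)      # UPOS -> set of lemmas
    ger_forms = defaultdict(list)  # UPOS -> gerund forms in first-seen order
    for cols in word_lines(text):
        form, lemma, upos, feats = cols[1], cols[2], cols[3], cols[5]
        lemmas[upos].add(lemma)
        if "VerbForm=Ger" in feats.split("|") and form not in ger_forms[upos]:
            ger_forms[upos].append(form)
    aux_and_verb = sorted(lemmas["AUX"] & lemmas["VERB"])
    return dict(lemmas), aux_and_verb, dict(ger_forms)
```

With the toy data, the overlap list contains only ʔe, mirroring the one lemma reported above. The page's own lists are ordered differently (apparently by frequency), so only set membership should be compared.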
Nominal Features
- Plur
- ADJ: ʔaʔtoʔ, peroʔ, pɨʔtoʔ
- ADP: garokõna
- AUX: iʔkap, iʔkay, karoʔwa, reʔkay, ye
- AUX-Ger: karoʔwa
- NOUN: piro, cibekonnoʔ, inãwroʔ, kaʔaroʔ, káramnoʔ, naʔwəyroʔ, pɨk=toʔ, tuŋnoʔ, abagon, inakároʔ
- PART: to, iʔkõna
- PRON: kaʔto, iʔtə, iʔyat, tap, gaʔto, iʔat, iʔnõ, karoat, tabat, wat
- VERB: aʔwĩa, imaterãn, inaʔwara, iʔkəga, iʔpéya, karokéran, karokérara, karorocapét, karowéya, karowɨya
- VERB-Ger: aʔwĩa, inaʔwara, iʔkəga, iʔpéya, karokérara, karowéya, karowɨya, taptoba, tenaʔwara
- Sing
- ADJ: aʔpap, aʔkɨrɨk, aʔtarap, aʔwák, owicorop
- ADP: okõna, okəy, aʔkəy, ekəy, (aʔkəy), abihmãm, apik, omãmkəy, owihmãm, owikop
- AUX: wet, at, okay, aʔnãn, aʔkay, wep, aʔwaʔye, wa, ap, ʔa
- AUX-Ger: wa, ʔa, a, eʔa, wa]
- NOUN: opábe, oyãy, aʔwero, opá, owirup, acagá, aʔcet, aʔcey, aʔcot, aʔkun
- PART: õn
- PRON: õn, at, wat, ŋa, ẽn, et, war, ar, toat, [ar
- VERB: aʔtoy, aʔwĩn, otoy, aʔken, oken, okera, oyaʔwan, eken, ekerap, owakán
- VERB-Ger: okera, ekera, aʔpɨya, aʔwĩa, eyaʔwara, owɨya, oyaʔwara, [okera, [owepaka, aʔkera
- Abe
- ADP: owikop
- Abl
- ADP: ʔay
- All
- ADP: aʔpik, apik
- PRON: ŋapik
- Com
- ADP: pihmãm, bihmãm, mihmãm, abihmãm, owihmãm
- PART: rap, tap, nap, rab, rah, tah
- VERB: nakəga, ragahmõm, atati, aʔrati, nakõy, natia, orabitẽy, orakət, rabitẽy, raken
- VERB-Ger: nakəga, natia, rakəga, ŋaramãya
- Dat
- ADP: kəy, okəy, aʔkəy, ekəy, gəy, (aʔkəy), kokəy, omãmkəy, tomãmkəy
- PART: gəy
- PRON: ŋakəy, aʔkəy, ekəy
- Disp
- ADP: ʔerem
- Ine
- ADP: bət
- Ins
- ADP: mã
- PART: mã
- Loc
- ADP: peʔ
- Sim
- ADP: ŋõm, kõm
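In the lists above, each feature value is followed by UPOS keys, and each key lists at most ten forms. A `-Ger` key marks words that also carry VerbForm=Ger. Such words appear to be listed under both the plain UPOS and UPOS-Ger; for example, aʔwĩa is listed under both VERB and VERB-Ger for Plur. The order looks frequency-based. The following sketch reproduces that layout under this reading, which is an assumption about the page generator rather than its documented behaviour. `parse_feats` and `forms_by_value` are hypothetical helpers, and the data is invented.

```python
from collections import Counter, defaultdict

# Sketch: group forms by feature value and UPOS (plus UPOS-Ger), most frequent first.
# Assumed layout, inferred from the statistics page; toy data is invented.


def row(tid, form, upos, feats):
    """Build a 10-column CoNLL-U line; unused columns are '_'."""
    return "\t".join([tid, form, "_", upos, "_", feats, "_", "_", "_", "_"])


SAMPLE = "\n".join([
    row("1", "f1", "VERB", "Number=Plur|VerbForm=Ger"),
    row("2", "f2", "NOUN", "Number=Plur"),
    row("3", "f3", "NOUN", "Number=Sing"),
    "",
    row("1", "f2", "NOUN", "Number=Plur"),
    row("2", "f4", "NOUN", "Number=Plur"),
    row("3", "f5", "PRON", "Case=Dat"),
    "",
])


def word_lines(text):
    """Yield columns of word lines (no comments, ranges or empty nodes)."""
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" not in cols[0] and "." not in cols[0]:
            yield cols


def parse_feats(feats):
    """'Number=Sing|Person=1' -> {'Number': 'Sing', 'Person': '1'}; '_' -> {}."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def forms_by_value(text, feature, top=10):
    """Map each value of `feature` to {UPOS key: up to `top` most frequent forms}."""
    counts = defaultdict(Counter)  # (value, UPOS key) -> form frequencies
    for cols in word_lines(text):
        form, upos, feats = cols[1], cols[3], parse_feats(cols[5])
        if feature not in feats:
            continue
        keys = [upos]
        if feats.get("VerbForm") == "Ger":
            keys.append(upos + "-Ger")  # gerunds are listed under both keys
        for value in feats[feature].split(","):  # UD allows comma-separated values
            for key in keys:
                counts[(value, key)][form] += 1
    result = defaultdict(dict)
    for (value, key), counter in sorted(counts.items()):
        # most_common keeps first-seen order among forms with equal counts
        result[value][key] = [form for form, _ in counter.most_common(top)]
    return dict(result)
```

For example, `forms_by_value(SAMPLE, "Number")` lists `f2` before `f4` under Plur/NOUN because `f2` occurs twice.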
Degree and Polarity
- Neg
- PART: iʔke, taykit
Verbal Features
- Iter
- VERB: kãykãy, púŋpúŋ, cokcok, cutcut, tuytuy, uéué, wenwen, weriweri
- Opt
- VERB: abeʔmẽn, abeʔnan, abeʔnoy, abeʔŋət, beʔnogat, beʔŋət, meʔnẽy, meʔŋen, meʔŋət, oweʔŋen
- Fut
- AUX: okay, aʔkay, gay, ikap, kay, okap, ekab, ekap, ekay, iʔkap
- PART: yat, iga, nyat, yar, nyar
- Past
- PART: co
- RPast
- PART: gán, kán, ŋán
- Cau
- VERB: amaken, mawɨya, amabaraʔkət, amagahmõm, amapəri, amati, emaberopit, emakət, imaterãn, imatẽran
- VERB-Ger: mawɨya, macahmərəba, makɨra, mawɨga
- Pass
- VERB: abebeʔtɨra, abegahmõm, bekɨga, bemeŋãn, bewĩa, memaʔwaba, towewĩa
- VERB-Ger: abebeʔtɨra, bekɨga, bewĩa, memaʔwaba, towewĩa
- Fh
- PART: nãnin, menə, aket, topə, igã, iʔkɨy, coke, pə, manã, memã
- Nfh
- PART: tə, menə, coke, igã, rə
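Some forms appear under more than one value of the same feature. For instance, menə, coke and igã are listed under both Fh and Nfh. This may reflect genuine context-dependent annotation or an inconsistency, and only inspection of the sentences can tell. The following is a minimal sketch that surfaces such forms for review. `forms_with_several_values` is a hypothetical helper. The particle forms are copied from the lists above, but the toy sentences that contain them are invented.

```python
from collections import defaultdict

# Sketch: find forms annotated with more than one value of a given feature.
# Forms menə, pə, tə come from the lists above; the toy sentences are invented.


def row(tid, form, feats):
    """Build a 10-column CoNLL-U line for a PART; unused columns are '_'."""
    return "\t".join([tid, form, "_", "PART", "_", feats, "_", "_", "_", "_"])


SAMPLE = "\n".join([
    row("1", "menə", "Evident=Fh"),
    row("2", "pə", "Evident=Fh"),
    "",
    row("1", "menə", "Evident=Nfh"),
    row("2", "tə", "Evident=Nfh"),
    "",
])


def word_lines(text):
    """Yield columns of word lines (no comments, ranges or empty nodes)."""
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" not in cols[0] and "." not in cols[0]:
            yield cols


def parse_feats(feats):
    """'Evident=Fh' -> {'Evident': 'Fh'}; '_' -> {}."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def forms_with_several_values(text, feature):
    """Return {form: sorted values} for forms seen with 2+ values of `feature`."""
    values = defaultdict(set)  # form -> values of `feature` seen on that form
    for cols in word_lines(text):
        feats = parse_feats(cols[5])
        if feature in feats:
            values[cols[1]].update(feats[feature].split(","))
    return {form: sorted(vals) for form, vals in values.items() if len(vals) > 1}
```

On the toy data, only menə is reported. On the full treebank the same call with `"Evident"` would be expected to return at least the three forms shared by the Fh and Nfh lists.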
Pronouns, Determiners, Quantifiers
- Dem
- PRON: yét, [yét, [yét]
- Emp
- ADP: tokõna, okõna, garokõna, rokõna
- PRON: at
- Prs
- PART: at, õn
- PRON: õn, at, wat, ŋa, ẽn, toat, et, ar, war, [ar
- Yes
- ADP: omãmkəy, tomãmkəy
- VERB: omãmnoy, tomãmwĩn
- 0
- VERB-Ger: bekɨga, bewĩa
- 1
- ADJ: owicorop
- ADP: okõna, okəy, omãmkəy, owihmãm, owikop
- AUX: wet, okay, wep, wa, okap, we, web, wer, at, iʔkap
- AUX-Ger: wa, wa]
- NOUN: opábe, oyãy, opá, owirup, iʔca, iʔwirup, ocagəpto, ocagəptoʔ, ocorah, ocãp
- PART: iʔkõna, õn
- PRON: õn, wat, war, [õn, iʔtə, iʔyat, [war, iʔat, iʔnõ, owagon
- VERB: oken, otoy, okera, oyaʔwan, owakán, owɨya, oyaʔwara, [ocaʔyõk, [oken, [okera
- VERB-Ger: okera, owɨya, oyaʔwara, [okera, [owepaka, inaʔwara, iʔkəga, iʔpéya, o=kera, ocaʔyõga
- 2
- ADP: ekəy, garokõna
- AUX: ʔa, a, ap, ekab, ekap, ekay, ep, eʔa, karoʔwa, wet
- AUX-Ger: ʔa, a, eʔa, karoʔwa
- NOUN: ecáp, ekap, enaká, epábeʔ, ewirup
- PRON: ẽn, et, kaʔto, er, [ẽn, at, ekəy, gaʔto, karoat
- VERB: eken, ekerap, ekera, epɨy, ewét, eyaʔwara, amaken, ebeʔcɨn, ebiaʔan, ecapét
- VERB-Ger: ekera, eyaʔwara, ekɨga, ewɨya, karokérara, karowéya, karowɨya
- 3
- ADJ: aʔpap, aʔkɨrɨk, aʔtarap, aʔwák
- ADP: tokõna, aʔkəy, rokõna, (aʔkəy), abihmãm, apik, okõna, tomãmkəy
- AUX: toʔwa, at, aʔnãn, aʔkay, aʔwaʔye, ap, roʔwa, toʔwa], ŋaap, ŋanãn
- AUX-Ger: toʔwa, roʔwa, toʔwa]
- NOUN: aʔwero, abagon, acagá, aʔcet, aʔcey, aʔcot, aʔkun, aʔpábe, tanaká, tocit
- PRON: at, ŋa, toat, ar, [ar, [at, tap, ŋakəy, aʔkəy, tabat
- VERB: aʔtoy, aʔwĩn, aʔken, aʔwĩa, ayaʔwan, aʔtoba, aʔtop, tokera, abakán, aʔpɨya
- VERB-Ger: aʔwĩa, tokera, aʔpɨya, totia, [towéya, abebeʔtɨra, aʔkera, aʔkɨga, aʔpẽya, aʔtoba
- 3Imp
- AUX: ye, ikap, yet
- NOUN: ipá, ibeon, inãk, iyãy, icagá, icapop, icey, icáp, icãp, inakároʔ
- VERB: iket, itop, ikérat, iwĩ, ibaʔpat, ibetõ, icapé, icát, iyaʔwat, ibeʔtɨn
Other Features
- Clas
- Bds
- ADJ: maʔ
- Bss
- ADJ: gap, kap
- Ccv
- ADJ: káʔ, gáʔ, ká
- Cylb
- ADJ: bap, pap, map, pab, bab, bah
- Cylm
- ADJ: ʔɨp, ɨp
- Cyls
- ADJ: pɨʔ, bɨʔ, pɨʔtoʔ
- Fem
- AUX: ŋaap, ŋanãn, ŋaʔet
- PRON: ŋa, ŋakəy, ŋaat, ŋapik
- VERB: ŋabean, ŋaken, ŋaramãya, ŋaʔóa
- VERB-Ger: ŋaramãya, ŋaʔóa
- Flat
- ADJ: beʔ, meʔ, peʔ, peroʔ
- NOUN: be
- Rd
- ADJ: ʔa, ʔaʔ, aʔ, ʔaʔtoʔ
- Tflat
- ADJ: cɨʔ, beʔ, meʔ
- X
- ADJ: nãʔ
- Clusivity
- Ex
- AUX: reʔkay
- VERB-Ger: tenaʔwara
- In
- AUX: iʔkap, ye
- NOUN: iʔca, iʔwirup
- PART: iʔkõna
- PRON: iʔtə, iʔat, iʔyat
- VERB: inaʔwara, iʔpéya, teʔyoy
- VERB-Ger: inaʔwara, iʔpéya
- Corf
- Yes
- ADP: tokõna, rokõna, tomãmkəy
- AUX-Ger: toʔwa, roʔwa, toʔwa]
- NOUN: tocit, tomanẽ, towagon, towirap, towirup, toyãy
- PRON: toat
- VERB: tokera, totia, [towéya, tocaropaba, tocitóga, tokəga, tomãmwĩn, topaba, towaʔpara, towecɨra
- VERB-Ger: tokera, totia, [towéya, tocaropaba, tocitóga, tokəga, topaba, towaʔpara, towecɨra, towenaoba
- Decl
- Assert
- AUX: ʔet, nãn, wet, at, okay, aʔnãn, aʔkay, gay, kay, wer
- VERB: wĩn, aʔtoy, ʔɨy, wɨy, aʔwĩn, ken, toy, otoy, yaʔwan, aʔken
- AssertFoc
- AUX: ʔep, wep, ap, okap, web, ekab, ekap, ep, ikap, iʔkap
- VERB: ekerap, wĩm, arap, aʔtop, kotigap, roy, top, yowarap, ʔop, aʔwĩm
- Int
- Yes
- ADV: kõm
- AUX: nãn
- PART: ahyə
- PRON: kɨgomət, nãn
- Nomzr
- Circ
- ADP: kokəy
- PART: kanã
- VERB: gotoy
- Redup
- Yes
- NOUN: cácá
- VERB: kãykãy, púŋpúŋ, uéué, wenwen, cokcok, cutcut, tuytuy, weriweri
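Every form listed under Redup=Yes above (cácá, kãykãy, púŋpúŋ, uéué, wenwen, cokcok, cutcut, tuytuy, weriweri) is a full copy of a base, for example wenwen = wen + wen. A simple consistency check could therefore flag Redup=Yes tokens that lack this shape. The following is a minimal sketch, assuming only total reduplication matters. Partial reduplication would need a different test, and unmarked forms that happen to consist of two identical halves would still need manual review. `is_full_reduplication` is a hypothetical helper. NFC normalization is applied on the assumption that the data might mix precomposed and combining diacritics.

```python
import unicodedata


def is_full_reduplication(form):
    """True if `form` consists of two identical halves, e.g. 'wenwen' = 'wen' + 'wen'.

    Assumption: only total reduplication is tested; partial copies return False.
    """
    s = unicodedata.normalize("NFC", form)  # unify precomposed/combining diacritics
    half, rest = divmod(len(s), 2)
    return rest == 0 and half > 0 and s[:half] == s[half:]


# Forms listed under Redup=Yes in the statistics above.
REDUP_FORMS = ["cácá", "kãykãy", "púŋpúŋ", "uéué", "wenwen",
               "cokcok", "cutcut", "tuytuy", "weriweri"]

# Redup=Yes forms that are NOT full reduplications (candidates for review).
not_full = [f for f in REDUP_FORMS if not is_full_reduplication(f)]
```

For the forms listed above, `not_full` is empty. Over a whole treebank, the same predicate could be applied to every token carrying Redup=Yes.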
Syntax
Auxiliary Verbs and Copula
- This corpus uses 2 lemmas as copulas (cop). Examples: nã, ʔe.
- This corpus uses 4 lemmas as auxiliaries (aux). Examples: ʔe, kap, waʔye, nã.
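The copula and auxiliary lemma sets above are defined by the DEPREL column, not by UPOS, so ʔe can count both as a copula and as an auxiliary. The following is a minimal sketch, assuming relation subtypes (e.g. `aux:pass` in general UD usage) should be folded into their base relation. `lemmas_by_deprel` is a hypothetical helper. The lemmas nã, ʔe and kap are taken from the lines above, while the forms and the toy sentence are placeholders.

```python
# Sketch: collect the lemmas attached as cop and aux (subtypes folded into the base).
# Lemmas nã, ʔe, kap come from the text above; forms f1..f5 and l5 are placeholders.


def row(tid, form, lemma, deprel):
    """Build a 10-column CoNLL-U line; unused columns are '_'."""
    return "\t".join([tid, form, lemma, "_", "_", "_", "_", deprel, "_", "_"])


SAMPLE = "\n".join([
    row("1", "f1", "nã", "cop"),
    row("2", "f2", "ʔe", "aux"),
    row("3", "f3", "ʔe", "cop"),
    row("4", "f4", "kap", "aux"),
    row("5", "f5", "l5", "root"),
    "",
])


def word_lines(text):
    """Yield columns of word lines (no comments, ranges or empty nodes)."""
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" not in cols[0] and "." not in cols[0]:
            yield cols


def lemmas_by_deprel(text, deprels=("cop", "aux")):
    """Return {deprel: sorted lemmas} for the requested base relations."""
    found = {rel: set() for rel in deprels}
    for cols in word_lines(text):
        base = cols[7].split(":")[0]  # drop a subtype such as aux:pass
        if base in found:
            found[base].add(cols[2])
    return {rel: sorted(lems) for rel, lems in found.items()}
```

The result is sorted by code point, so it will not necessarily follow the order of the examples given above.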
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
- VERB--NOUN (75)
- VERB--NOUN-ADP(tap) (3)
- VERB--NOUN-ADP(tap)-ADP(tap) (1)
- VERB--PRON (143)
- VERB-Ger--NOUN (22)
- VERB-Ger--PRON (6)
- obj
- VERB--NOUN (90)
- VERB--NOUN-ADP(kəy) (1)
- VERB--NOUN-ADP(peʔ) (2)
- VERB--PRON (2)
- VERB--PRON-ADP(tap) (1)
- VERB--PRON-Dat (2)
- VERB-Ger--NOUN (37)
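The pattern labels above appear to combine several parts. The parent's UPOS comes first, with `-Ger` for gerunds, followed by `--` and the child's UPOS. The child's Case value is appended if present (as in `PRON-Dat`). Each ADP attached to the child as `case` is appended in the form `ADP(lemma)`. The following sketch counts such labels for nsubj and obj. It is a reading of the format, not the page generator's code: in particular, placing the Case value before the ADP parts when both occur is an assumption. `sentences` and `argument_patterns` are hypothetical helpers. The lemma `tap` is taken from the patterns above, and everything else in the toy data is invented.

```python
from collections import Counter, defaultdict

# Sketch: count VERB->NOUN/PRON patterns such as "VERB--NOUN-ADP(tap)" per relation.
# Label format is inferred from the statistics page; toy data is invented.


def row(tid, lemma, upos, feats, head, deprel):
    """Build a CoNLL-U line; FORM repeats LEMMA, unused columns are '_'."""
    return "\t".join([tid, lemma, lemma, upos, "_", feats, head, deprel, "_", "_"])


SAMPLE = "\n".join([
    row("1", "l1", "NOUN", "_", "3", "nsubj"),
    row("2", "tap", "ADP", "_", "1", "case"),
    row("3", "l3", "VERB", "_", "0", "root"),
    row("4", "l4", "NOUN", "_", "3", "obj"),
    "",
    row("1", "l5", "PRON", "Case=Dat", "2", "obj"),
    row("2", "l6", "VERB", "VerbForm=Ger", "0", "root"),
    row("3", "l7", "PRON", "_", "2", "nsubj"),
    "",
])


def sentences(text):
    """Group word lines into sentences (lists of column lists)."""
    sent = []
    for line in text.splitlines():
        if not line.strip():
            if sent:
                yield sent
            sent = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            if "-" not in cols[0] and "." not in cols[0]:
                sent.append(cols)
    if sent:
        yield sent


def parse_feats(feats):
    """'Case=Dat' -> {'Case': 'Dat'}; '_' -> {}."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def argument_patterns(text, relations=("nsubj", "obj")):
    """Return {relation: {pattern label: count}} for VERB parents and NOUN/PRON children."""
    counts = defaultdict(Counter)
    for sent in sentences(text):
        by_id = {cols[0]: cols for cols in sent}
        for cols in sent:
            rel = cols[7].split(":")[0]
            if rel not in relations or cols[6] == "0":
                continue
            head = by_id[cols[6]]
            if head[3] != "VERB" or cols[3] not in ("NOUN", "PRON"):
                continue
            is_ger = parse_feats(head[5]).get("VerbForm") == "Ger"
            left = head[3] + ("-Ger" if is_ger else "")
            right = cols[3]
            case = parse_feats(cols[5]).get("Case")
            if case:  # assumed position: Case value before any ADP parts
                right += "-" + case
            for dep in sent:  # case-marking adpositions attached to the child
                if dep[6] == cols[0] and dep[7].split(":")[0] == "case" and dep[3] == "ADP":
                    right += "-ADP(" + dep[2] + ")"
            counts[rel][left + "--" + right] += 1
    return {rel: dict(counter) for rel, counter in counts.items()}
```

Applied to the full treebank, the counts could be compared with the figures listed above to check whether this reading of the label format holds.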