UD Akuntsu TuDeT
Language: Akuntsu (code: aqz)
Family: Tupian
This treebank has been part of Universal Dependencies since the UD v2.7 release.
The following people have contributed to making this treebank part of UD: Carolina Aragon, Fabrício Ferraz Gerardi.
Repository: UD_Akuntsu-TuDeT
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.15
License: CC BY-SA 4.0
Genre: nonfiction, news
Questions, comments? General annotation questions (either Akuntsu-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [fabricio • gerardi (æt) uni-tuebingen • de]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.
Annotation | Source |
---|---|
Lemmas | annotated manually in non-UD style, automatically converted to UD |
UPOS | annotated manually in non-UD style, automatically converted to UD |
XPOS | annotated manually |
Features | annotated manually in non-UD style, automatically converted to UD |
Relations | annotated manually in non-UD style, automatically converted to UD |
Description
UD_Akuntsu-TuDeT is a collection of annotated sentences in Akuntsú. The sentences stem from the grammatical description by Aragon (2014) and from Aragon’s fieldwork. The treebank is part of TuLaR (Tupían Language Resources). The project is a work in progress, and the treebank is updated on a regular basis. Sentence annotation and documentation were carried out by Carolina Aragon, Fabrício Ferraz Gerardi and Luana dos Santos.
Text sources
- Aragon, Carolina (2018) *Variações estilísticas e sociais no discurso dos falantes Akuntsú*. Revista Polifonia, v. 25, 90-103.
- Aragon, Carolina (2018) *Posposições e marcadores oblíquos em Akuntsú (Tupí)*. Revista Brasileira de Linguística Antropológica, v. 10, 47-57.
- Aragon, Carolina (2015) *Considerações sobre os ideofones e seu uso em Akuntsú*. Revista de Letras (Taguatinga), v. 8, 1-13.
- Aragon, Carolina (2014) *A Grammar of Akuntsú, a Tupian language*. Unpublished PhD dissertation, University of Hawaii.
- Aragon, Carolina (2008) *Fonologia e aspectos morfológicos e sintáticos da língua Akuntsú*. Unpublished master's thesis, Universidade de Brasília.
Acknowledgments
The development of this treebank is supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant agreement No. 834050).
Statistics of UD Akuntsu TuDeT
POS Tags
ADJ – ADP – ADV – AUX – DET – INTJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – VERB
Features
Aspect – Case – Clusivity – Deixis – Determ – Foc – Mood – Nomzr – Number – NumType – Obl – Person – Person[psor] – Person[subj] – Polarity – PronType – Redup – Reflex – Rel – Tense – Trans – Tv – Voice
Relations
advcl – advmod – amod – appos – aux – case – ccomp – conj – dep – det – discourse – dislocated – iobj – nmod – nsubj – nummod – obj – obl – parataxis – punct – root – xcomp
Tokenization and Word Segmentation
- This corpus contains 343 sentences, 1449 tokens and 1468 syntactic words.
- All tokens in this corpus are followed by a space.
- This corpus does not contain words with spaces.
- This corpus does not contain words that contain both letters and punctuation.
- This corpus contains 19 multi-word tokens. On average, one multi-word token consists of 2.00 syntactic words.
- There are 15 types of multi-word tokens. Examples: ino, kɨrom, menerom, apiteperom, ataperom, ekerom, etom, iteterom, itʃoberom, jãjerom, kitʃetom, korom, mepiterom, taɨperom, tʃetom.
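The token and word counts above are tied together by the multi-word tokens (MWTs): an MWT is one token but expands into several syntactic words, so 1449 - 19 + 19 × 2 = 1468. The following Python sketch shows how such counts can be obtained from CoNLL-U, assuming the standard layout in which an MWT is a range line such as `2-3` followed by the lines of the words it covers. It is an illustration, not the script that generated these statistics; the function names are hypothetical, and the toy sentence uses placeholder forms rather than Akuntsu data.

```python
def count_units(conllu: str) -> dict:
    """Count sentences, tokens, syntactic words and multi-word tokens (MWTs)
    in CoNLL-U text. Comment lines and empty nodes (IDs like "3.1") are skipped."""
    counts = {"sentences": 0, "tokens": 0, "words": 0, "mwts": 0}
    covered_until = 0      # last word ID covered by the most recent MWT range
    in_sentence = False
    for line in conllu.splitlines():
        if not line.strip():               # a blank line ends a sentence
            if in_sentence:
                counts["sentences"] += 1
            in_sentence, covered_until = False, 0
            continue
        if line.startswith("#"):
            continue
        in_sentence = True
        word_id = line.split("\t", 1)[0]
        if "." in word_id:                 # empty node: neither token nor word
            continue
        if "-" in word_id:                 # MWT range line, e.g. "2-3"
            counts["mwts"] += 1
            counts["tokens"] += 1
            covered_until = int(word_id.split("-")[1])
            continue
        counts["words"] += 1
        if int(word_id) > covered_until:   # a word outside any MWT is also a token
            counts["tokens"] += 1
    if in_sentence:                        # the text may not end with a blank line
        counts["sentences"] += 1
    return counts


def words_from_tokens(tokens: int, mwts: int, words_per_mwt: int) -> int:
    """Each MWT is one token but expands into `words_per_mwt` syntactic words."""
    return tokens - mwts + mwts * words_per_mwt


# Hypothetical toy sentence with placeholder forms; "BC" is one token, two words.
TOY = "\n".join([
    "# text = A BC D",
    "\t".join(["1", "A"] + ["_"] * 8),
    "\t".join(["2-3", "BC"] + ["_"] * 8),
    "\t".join(["2", "B"] + ["_"] * 8),
    "\t".join(["3", "C"] + ["_"] * 8),
    "\t".join(["4", "D"] + ["_"] * 8),
]) + "\n\n"

print(count_units(TOY))                # {'sentences': 1, 'tokens': 3, 'words': 4, 'mwts': 1}
print(words_from_tokens(1449, 19, 2))  # 1468
```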
Morphology
Tags
- This corpus uses 13 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, VERB
- This corpus does not use the following tags: SCONJ, CCONJ, SYM, X
- This corpus contains 10 word types tagged as particles (PART): a, ana, ekwa, kom, mã, ne, pe, te, tea, ãka
- This corpus contains 13 lemmas tagged as pronouns (PRON): arop, bõ, e, en, erẽ, i, kitʃe, no, on, orẽ, tara, te, ẽrom
- This corpus contains 9 lemmas tagged as determiners (DET): atʃo, eme, jõ, jẽ, jẽrom, ke, ta, ẽ, ẽrom
- Out of the above, 1 lemma occurred sometimes as PRON and sometimes as DET: ẽrom
- This corpus contains 7 lemmas tagged as auxiliaries (AUX): am, jã, ka, ko, piro, toa, tʃe
- Out of the above, 5 lemmas occurred sometimes as AUX and sometimes as VERB: am, jã, ka, ko, tʃe
- This corpus does not use the VerbForm feature.
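The tag and lemma figures above are simple set operations over the 17 UD v2 UPOS tags and the lemma lists on this page. The sketch below reproduces two of them from those lists. It normalizes strings to Unicode NFC as a precaution, on the assumption that spellings such as ẽ may be stored either precomposed or with a combining tilde; the helper name `nfc` is hypothetical.

```python
import unicodedata


def nfc(items):
    """Normalize to NFC so precomposed and combining-tilde spellings compare equal."""
    return {unicodedata.normalize("NFC", s) for s in items}


# The 17 universal POS tags defined by UD v2.
UD_UPOS = {"ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
           "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X"}

# Tag and lemma lists copied from the statistics above.
USED_UPOS = {"ADJ", "ADP", "ADV", "AUX", "DET", "INTJ", "NOUN", "NUM",
             "PART", "PRON", "PROPN", "PUNCT", "VERB"}
PRON_LEMMAS = nfc(["arop", "bõ", "e", "en", "erẽ", "i", "kitʃe", "no",
                   "on", "orẽ", "tara", "te", "ẽrom"])
DET_LEMMAS = nfc(["atʃo", "eme", "jõ", "jẽ", "jẽrom", "ke", "ta", "ẽ", "ẽrom"])

unused = sorted(UD_UPOS - USED_UPOS)   # tags never used in the corpus
both = PRON_LEMMAS & DET_LEMMAS        # lemmas tagged both PRON and DET

print(len(USED_UPOS), unused)  # 13 ['CCONJ', 'SCONJ', 'SYM', 'X']
print(both)                    # {'ẽrom'}
```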
Nominal Features
- Number
  - Plur
    - VERB: kitʃet
  - Sing
    - NOUN: ikɨp
    - PRON: en, on, erẽ, erẽbõ, orẽ, te, ebõ, enõ
    - VERB: imaã, oewɨbɨka, oirika, ojã
- Case
  - Abl
    - NOUN: atʃiri, kɨrẽri, piri, tawtʃeri
    - PRON: aroperi, ẽromri
  - All
    - DET: jẽbõ, kebõ
    - NOUN: tabɨtõ, kɨrẽbõ, ɨkɨbõ, ekõ, kirẽbõ, kojõpebõ, kojõpibõ, pabapebõ, pibõ, tekõ
    - PRON: erẽbõ, enõ, tebõ
    - PROPN: Kanibõ
  - Dat
    - DET: kebõ
    - PRON: orẽbõ, ebõ
  - Loc
    - ADP: etʃe
    - NOUN: ɨkɨpe, eanampe
  - Tra
    - NOUN: kiakopna, kwena, menna, nakona, pitoana, takɨrapna, tatona, tawpɨkna, tawtʃena, emenna
    - PART: ana
Degree and Polarity
- Polarity
  - Neg
    - ADV: nom, nõm, erom, rom, om
Verbal Features
- Aspect
  - Hab
    - VERB: oetara, koara, mira, etʃetara, kietara, kitʃetara, oamõjara, teipara
  - Iter
    - VERB: ikiramkwatekwa
- Mood
  - Ind
    - VERB: kietara, kitʃetara
- Tense
  - Fut
    - PART: kom
- Voice
  - Cau
    - VERB: mõatʃoa
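The feature lists in this part of the page group word forms by feature, value and UPOS. A minimal sketch of how such an index can be built from the FORM, UPOS and FEATS columns of CoNLL-U follows. The input rows reuse forms listed in this section, assuming that Iter, Fut and Cau are values of the UD features Aspect, Tense and Voice; their FEATS strings are cut down to that single feature, so they are illustrative rather than the treebank's full annotation. The function name `feature_index` is hypothetical.

```python
from collections import defaultdict


def feature_index(rows):
    """Map feature -> value -> UPOS -> list of distinct forms.

    `rows` is an iterable of (form, upos, feats) triples, i.e. the CoNLL-U
    columns FORM, UPOS and FEATS; FEATS is "_" or "Name=Val|Name=Val,Val".
    """
    index = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for form, upos, feats in rows:
        if feats == "_":
            continue
        for pair in feats.split("|"):
            name, _, values = pair.partition("=")
            for value in values.split(","):    # CoNLL-U separates multi-values with ","
                forms = index[name][value][upos]
                if form not in forms:          # keep each form once, in first-seen order
                    forms.append(form)
    return index


# Illustrative rows: forms from the lists above, FEATS reduced to one feature each.
ROWS = [
    ("kom", "PART", "Tense=Fut"),
    ("mõatʃoa", "VERB", "Voice=Cau"),
    ("ikiramkwatekwa", "VERB", "Aspect=Iter"),
    ("kom", "PART", "Tense=Fut"),   # a repeated token is listed only once
]

idx = feature_index(ROWS)
print(idx["Tense"]["Fut"]["PART"])    # ['kom']
print(idx["Aspect"]["Iter"]["VERB"])  # ['ikiramkwatekwa']
```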
Pronouns, Determiners, Quantifiers
- PronType
  - Emp
    - PRON: erẽ, orẽ
  - Ind
    - PRON: no
  - Prs
    - PRON: en, on, orẽbõ, erẽbõ, kitʃe, te, enõ
- NumType
  - Card
    - NUM: tɨrɨ, kɨte, tɨɾɨ, tɨrɨtɨrɨtɨrɨ
- Reflex
  - Yes
    - AUX: tejã
    - NOUN: pe, jen, po, teten, epo, opo, teatap, teimaj, teimi, teip
    - PRON: tebõ
    - VERB: teita, teeta, teimaj, teipara, tekwata, teakata, teaota, teera, teipa, tejã
- Person
  - 1
    - AUX: ojã, otoa, kitoa
    - NOUN: omepit, oike, otʃipap, oatap, oko, okɨp, okɨpi, otak, opo, itet
    - PRON: on, orẽbõ, kitʃe, orẽ
    - VERB: oerekwa, oetara, oamõja, opera, opip, otʃeta, otʃoa, itet, kietara, kipera
  - 2
    - AUX: ejã, eko, etoa
    - NOUN: epo, eape, eboro, ekem, ekoro, epi, eti, eanampe, eiat, emenna
    - PRON: en, erẽbõ, erẽ, on, ebõ, enõ
    - VERB: koara, eeta, eneme, epekã, eata, eerekkwa, eimi, eipa, etʃera, etʃetara
  - 3
    - AUX: iko, iam, tejã, tejãkwa
    - NOUN: ikɨp, iatap, iiw, imen, imepit, iten, itoap, itʃobe, itʃoke, tajtʃi
    - PRON: i, te, tebõ
    - VERB: teita, ikora, iat, ikoa, taot, teeta, iata, iekɨj, ijã, ikɨta
Other Features
- Clusivity
  - In
    - AUX: kitoa
    - PRON: kitʃe
    - VERB: kietara, kipera, kitʃet, kitʃetara
- Deixis
  - Dist
    - DET: ke, jẽrom, ta, tarom, ẽrom, kebõ
  - Prox
    - DET: jẽ, ẽ, eme, kebõ, jẽbõ
- Determ
  - Yes
    - NOUN: eot
- Foc
  - Yes
    - PART: ne
- Nomzr
  - Circ
    - NOUN: atʃoap, nĩap, parãap, tʃogaap, pajapna
  - Obj
    - NOUN: imi, iõ, imokwa, itʃopa, oiko, eiat, oiat, teimaj, teimi
    - VERB: iko, eimi, oiko
- Obl
  - Yes
    - NOUN: atitipe, kɨppe, pagoppe, oikepe
- Person[psor]
  - 1
    - NOUN: oiat
  - 2
    - NOUN: epi
  - 3
    - NOUN: teten
- Person[subj]
  - 2
    - NOUN: epi
- Redup
  - Yes
    - NOUN: kapakapa
    - NUM: tɨrɨtɨrɨtɨrɨ
    - VERB: kõjkõjkõj, nininia
- Rel
  - Cont
    - NOUN: tek, tep, tet, tanam, takɨma, tokwaj, otek, tekõ, tepna, tetna
    - VERB: itet, itetkwa
- Trans
  - Yes
    - NOUN: pɨtka, ipitka, iɨka
    - VERB: erekka, jãjka, pɨtka, tʃãka, oerekwa, amkwa, apeka, atabaka, buhka, erekkwa
- Tv
  - Yes
    - VERB: ata, koa, tʃopa, mia, nia, atʃoa, ikoa, koara, oetara, õa
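Person[psor] and Person[subj] are layered features: the base feature is Person, and the layer in square brackets says which participant the value refers to (possessor or subject). A minimal parsing sketch follows; the example string is a FEATS fragment for *epi*, reduced to the two layered features under which it is listed in this section (its full annotation may contain more). The function `parse_feats` is a hypothetical helper, not part of any UD tool.

```python
import re

# Feature name, optionally followed by a layer in square brackets.
LAYERED = re.compile(r"^([A-Za-z0-9]+)(?:\[([a-z0-9]+)\])?$")


def parse_feats(feats: str) -> dict:
    """Parse a CoNLL-U FEATS string into {(feature, layer): [values]}.

    Layered names such as "Person[psor]" keep their layer ("psor");
    plain names get layer None. "_" means no features.
    """
    parsed = {}
    if feats == "_":
        return parsed
    for pair in feats.split("|"):
        name, _, values = pair.partition("=")
        match = LAYERED.match(name)
        if match is None:
            raise ValueError(f"malformed feature name: {name!r}")
        parsed[(match.group(1), match.group(2))] = values.split(",")
    return parsed


print(parse_feats("Person[psor]=2|Person[subj]=2"))
# {('Person', 'psor'): ['2'], ('Person', 'subj'): ['2']}
```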
Syntax
Auxiliary Verbs and Copula
- This corpus does not contain copulas.
- This corpus uses 7 lemmas as auxiliaries (aux). Examples: ko, tʃe, jã, ka, toa, am, piro.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
  - VERB--NOUN (52)
  - VERB--PRON (51)
  - VERB--PRON-All (4)
  - VERB--PRON-Dat (2)
- obj
  - VERB--NOUN (122)
  - VERB--NOUN-ADP(pabape) (1)
  - VERB--NOUN-All (2)
  - VERB--PRON (1)
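Labels such as `VERB--PRON-All` or `VERB--NOUN-ADP(pabape)` combine the parent's UPOS, the child's UPOS, the child's Case value and the lemma of any adposition attached to the child as `case`. The sketch below approximates that labelling for nsubj and obj dependents, assuming each word is a dict holding CoNLL-U columns. The script that generates the UD statistics pages may differ in details; the function names are hypothetical and the toy sentence uses placeholder lemmas, not Akuntsu data.

```python
from collections import Counter


def feats_dict(feats: str) -> dict:
    """'_' or 'A=x|B=y' -> {'A': 'x', 'B': 'y'} (multi-values kept as one string)."""
    return {} if feats == "_" else dict(p.split("=", 1) for p in feats.split("|"))


def argument_patterns(sentence, relations=("nsubj", "obj")):
    """Count (relation, label) pairs for NOUN/PRON children of VERB parents."""
    by_id = {w["id"]: w for w in sentence}
    counts = Counter()
    for w in sentence:
        rel = w["deprel"].split(":")[0]        # ignore relation subtypes
        head = by_id.get(w["head"])
        if rel not in relations or w["upos"] not in ("NOUN", "PRON"):
            continue
        if head is None or head["upos"] != "VERB":
            continue
        label = f"VERB--{w['upos']}"
        case = feats_dict(w["feats"]).get("Case")
        if case:
            label += f"-{case}"
        for c in sentence:                      # case-marking adpositions of the child
            if c["head"] == w["id"] and c["deprel"] == "case" and c["upos"] == "ADP":
                label += f"-ADP({c['lemma']})"
        counts[(rel, label)] += 1
    return counts


# Hypothetical toy sentence with placeholder lemmas.
SENT = [
    {"id": 1, "lemma": "n1", "upos": "NOUN", "head": 3, "deprel": "nsubj", "feats": "_"},
    {"id": 2, "lemma": "p1", "upos": "PRON", "head": 3, "deprel": "obj", "feats": "Case=All"},
    {"id": 3, "lemma": "v1", "upos": "VERB", "head": 0, "deprel": "root", "feats": "_"},
    {"id": 4, "lemma": "n2", "upos": "NOUN", "head": 3, "deprel": "obj", "feats": "_"},
    {"id": 5, "lemma": "a1", "upos": "ADP", "head": 4, "deprel": "case", "feats": "_"},
]

for (rel, label), n in sorted(argument_patterns(SENT).items()):
    print(rel, label, n)
```

Run on the toy sentence, this prints one count each for `nsubj VERB--NOUN`, `obj VERB--NOUN-ADP(a1)` and `obj VERB--PRON-All`.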
Verbs with Reflexive Core Objects
- This corpus contains 5 lemmas that occur at least once with a reflexive core object (obj or iobj). Examples: at tekɨjt, ka epo, poro jen, tʃoga opo, õkwa po
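A minimal sketch of how the reflexive-object statistic could be computed, assuming that a reflexive core object is an obj or iobj dependent whose FEATS contain Reflex=Yes and whose parent is a VERB. The helper name and the toy sentence below are hypothetical; the toy uses placeholder lemmas rather than Akuntsu data.

```python
def reflexive_object_verbs(sentence):
    """Return sorted lemmas of verbs with an obj or iobj child marked Reflex=Yes.

    Each word is a dict with 'id', 'lemma', 'upos', 'head', 'deprel', 'feats'.
    """
    by_id = {w["id"]: w for w in sentence}
    lemmas = set()
    for w in sentence:
        if w["deprel"].split(":")[0] not in ("obj", "iobj"):
            continue
        feats = {} if w["feats"] == "_" else dict(
            p.split("=", 1) for p in w["feats"].split("|"))
        head = by_id.get(w["head"])
        if feats.get("Reflex") == "Yes" and head is not None and head["upos"] == "VERB":
            lemmas.add(head["lemma"])
    return sorted(lemmas)


# Hypothetical toy sentence: only v1 has a reflexive object.
SENT = [
    {"id": 1, "lemma": "r1", "upos": "NOUN", "head": 2, "deprel": "obj", "feats": "Reflex=Yes"},
    {"id": 2, "lemma": "v1", "upos": "VERB", "head": 0, "deprel": "root", "feats": "_"},
    {"id": 3, "lemma": "n1", "upos": "NOUN", "head": 4, "deprel": "obj", "feats": "_"},
    {"id": 4, "lemma": "v2", "upos": "VERB", "head": 2, "deprel": "parataxis", "feats": "_"},
]

print(reflexive_object_verbs(SENT))  # ['v1']
```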