UD Cappadocian AMGiC
Language: Cappadocian (code: cpg)
Family: IE
This treebank has been part of Universal Dependencies since the UD v2.8 release.
The following people have contributed to making this treebank part of UD: Konstantinos Sampanis, Prokopis Prokopidis, Furkan Akkurt.
Repository: UD_Cappadocian-AMGiC
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.15
License: CC BY-SA 4.0
Genre: nonfiction, news
Questions, comments? General annotation questions (either Cappadocian-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on Github. If you want to collaborate, please contact [konstantinos • sampanis (æt) yahoo • com]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.
Annotation | Source |
---|---|
Lemmas | annotated manually in non-UD style, automatically converted to UD |
UPOS | annotated manually in non-UD style, automatically converted to UD |
XPOS | annotated manually |
Features | annotated manually in non-UD style, automatically converted to UD |
Relations | annotated manually in non-UD style, automatically converted to UD |
Description
The “Asia Minor Greek in Contact” treebank (AMGiC, UD_AMGiC) is compiled from sentences exhibiting contact-induced morphosyntactic phenomena (CIMSP) that result from contact between Greek and Turkish varieties in Anatolia and adjacent regions. The sentences are drawn from Asia Minor Greek (AMG) dialectal sources. In addition to the UD analysis, the AMGiC treebank provides information on the sociolinguistic context in which CIMSP arise.
AMGiC is a UD treebank dealing with cases of Contact-Induced Morphosyntactic Phenomena (CIMSP) in Inner Asia Minor Greek (AMG) that emerged under the influence of Turkish. Inner AMG comprises several interrelated but clearly distinct Cappadocian subdialects as well as the varieties of Silliot and Pharasiot (cf. Manolessou 2019). Cappadocian Greek (CG), Silliot and Pharasiot are in fact classified as distinct dialects (cf. Janse 2020: 203). Given, however, that the ISO 639-3 code used for AMGiC is cpg, i.e. “Cappadocian Greek”, we employ CG as a pars pro toto designation for all Inner AMG varieties.
Apart from the annotation, AMGiC offers a detailed metadata section in which CIMSP are tagged (cf. Sampanis & Prokopidis 2021). The current version of AMGiC (as of UD v2.15) is the first batch of the treebank and contains CIMSP attested in Silliot. Future versions of AMGiC will include CG and Pharasiot as well.
Acknowledgments
…
Statistics of UD Cappadocian AMGiC
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – VERB – X
Features
Aspect – Case – Clitic – Definite – Gender – Mood – Number – NumType – PartType – Person – Polarity – Poss – PronType – Tense – VerbForm – Voice
Relations
acl – advcl – advmod – advmod:emph – amod – appos – aux – aux:q – case – cc – ccomp – conj – cop – csubj – det – expl – iobj – mark – nmod – nsubj – nummod – obj – obl – parataxis – punct – root – vocative – xcomp
Tokenization and Word Segmentation
- This corpus contains 36 sentences, 450 tokens and 451 syntactic words.
- This corpus contains 83 tokens (18%) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 15 types of words that contain both letters and punctuation. Examples: 'ne, apés', m', s', t'emélia, 'ni, Ksevasám', dilimléisam', kiriós', op', put', yüsártsisam', és'kam', és'kin, ípsam'
- This corpus contains 1 multi-word token. On average, one multi-word token consists of 2.00 syntactic words.
- There is 1 type of multi-word token. Example: stu.
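The counts in this section follow from how CoNLL-U files encode tokens and words: a multi-word token occupies a range line (ID such as `2-3`) followed by its syntactic words, and a missing following space is recorded as `SpaceAfter=No` in the MISC column. The sketch below shows one way such counts could be computed. The function name `corpus_counts` and the sample sentences are invented for illustration and are not taken from AMGiC; this is not the script used to generate the statistics on this page.

```python
# Toy CoNLL-U input, invented for illustration (not taken from AMGiC).
# Columns: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC.
SAMPLE = "\n".join([
    "# text = Vengo del mar.",
    "1\tVengo\tvenir\tVERB\t_\t_\t0\troot\t_\t_",
    "2-3\tdel\t_\t_\t_\t_\t_\t_\t_\t_",
    "2\tde\tde\tADP\t_\t_\t4\tcase\t_\t_",
    "3\tel\tel\tDET\t_\t_\t4\tdet\t_\t_",
    "4\tmar\tmar\tNOUN\t_\t_\t1\tobl\t_\tSpaceAfter=No",
    "5\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_",
    "",
    "# text = Hola.",
    "1\tHola\thola\tINTJ\t_\t_\t0\troot\t_\tSpaceAfter=No",
    "2\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_",
    "",
])


def corpus_counts(conllu):
    """Count sentences, surface tokens, syntactic words and multi-word tokens."""
    counts = {"sentences": 0, "tokens": 0, "words": 0, "mwts": 0, "no_space_after": 0}
    in_sentence = False
    covered_until = 0  # highest word ID covered by the latest multi-word token
    for line in conllu.splitlines():
        if not line.strip():  # a blank line ends the sentence
            in_sentence = False
            covered_until = 0
            continue
        if line.startswith("#"):  # sentence-level comment
            continue
        if not in_sentence:
            counts["sentences"] += 1
            in_sentence = True
        cols = line.split("\t")
        tok_id, misc = cols[0], cols[9]
        if "." in tok_id:  # empty node (enhanced dependencies), not a word
            continue
        no_space = int("SpaceAfter=No" in misc.split("|"))
        if "-" in tok_id:  # range line: one surface token spanning several words
            counts["mwts"] += 1
            counts["tokens"] += 1
            counts["no_space_after"] += no_space
            covered_until = int(tok_id.split("-")[1])
            continue
        counts["words"] += 1
        if int(tok_id) > covered_until:  # word is its own surface token
            counts["tokens"] += 1
            counts["no_space_after"] += no_space
    return counts


print(corpus_counts(SAMPLE))
```

The same relationship holds for the figures reported above: 451 syntactic words, of which 2 belong to the single multi-word token, give 451 - 2 + 1 = 450 tokens, and 83 of 450 tokens is roughly 18%.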
Morphology
Tags
- This corpus uses 15 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB, X
- This corpus does not use the following tags: INTJ, SYM
- This corpus contains 3 word types tagged as particles (PART): mi, re, ren
- This corpus contains 9 lemmas tagged as pronouns (PRON): (e)tútus, (e)γó, _, kínus, ne, ra, ro, su, táre
- This corpus contains 8 lemmas tagged as determiners (DET): (o), (ο), tiás, tu, téna, xer, énas, ís
- This corpus contains 4 lemmas tagged as auxiliaries (AUX): mi, na, se, ímu
- Out of the above, 1 lemma occurred sometimes as AUX and sometimes as VERB: ímu
- There are 2 (de)verbal forms (VerbForm):
  - Fin
    - AUX: 'ne, ne, éni, 'ni, ísu
    - VERB: laí, qazánǰisi, éršiti, éxu, írtis, Ksevasám', Rotá, baγərtzísi, báris, dilimléisam'
  - Part
    - VERB: kimizméni
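The tag inventory reported at the top of this list (15 of 17 UPOS tags used, INTJ and SYM unused) is a simple set difference against the universal tag set. A minimal sketch, using the tag list given on this page; the variable names are illustrative only:

```python
# The 17 universal part-of-speech tags defined by UD v2.
UPOS_TAGS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}

# Tags listed on this page as occurring in the treebank.
used = set("ADJ ADP ADV AUX CCONJ DET NOUN NUM PART PRON PROPN PUNCT SCONJ VERB X".split())

unused = sorted(UPOS_TAGS - used)
print(f"{len(used)} of {len(UPOS_TAGS)} tags used; unused: {unused}")
# prints: 15 of 17 tags used; unused: ['INTJ', 'SYM']
```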
Nominal Features
- Gender
  - Fem
    - ADJ: kalí, meγáli, yerasméni
    - DET: čin, či, tes
    - NOUN: kóri, enéka, mána, enékan, góri, iméra, iréan, klišás, ksíla, líres
    - NUM: Tris, triz
    - PRON: či, čis, ǰis, ǰi
    - VERB-Part: kimizméni
  - Masc
    - ADJ: A, fikirsúzis
    - DET: tu, tus
    - NOUN: pará, staxtiǰís, vaván, Aγás, Mándis, Qujumǰís, Vavás, gjavúriri, kefálin, kukuniós
    - PRON: du, tútus, tu, kínus, su, tus, tútunu
    - PROPN: Yóryis
  - Neut
    - ADJ: polá, bašká, kaló, xošá, úla
    - DET: ta, tu, t, éna, da, tiyá, téna, čin
    - NOUN: pará, psomí, t'emélia, Psémata, alísia, kalaǰí, korítsi, küréi, limóri, ombrín
    - PRON: da, ta, dha, Τúta
- Number
  - Plur
    - ADJ: polá, úla
    - DET: ta, tes, tus, čin
    - NOUN: pará, Psémata, alísia, gjavúriri, ksíla, líres, méres, rúxa
    - NUM: Tris, triz
    - PRON: mas, tun, tus, Τúta
    - VERB-Fin: Ksevasám', dilimléisam', ipúmi, kasinonǰískaši, yüsártsisam', és'kam', ípsam'
  - Sing
    - ADJ: ko, A, bašká, fikirsúzis, kalí, kaló, meγáli, xošá, yerasméni
    - AUX-Fin: 'ne, ne, éni, 'ni, ísu
    - DET: tu, čin, t, éna, či, da, ta, tiyá, téna
    - NOUN: kóri, pará, psomí, enéka, mána, staxtiǰís, t'emélia, vaván, Aγás, Mándis
    - PRON: du, su, tu, da, mu, či, ta, tútus, čis, s'
    - PROPN: Yóryis
    - VERB-Fin: laí, qazánǰisi, éršiti, éxu, írtis, Rotá, baγərtzísi, báris, düšünǰísu, eleísis
    - VERB-Part: kimizméni
- Case
  - Acc
    - ADJ: polá, bašká, úla
    - DET: tu, ta, čin, t, či, da, tes, tiyá, tus, téna
    - NOUN: pará, psomí, vaván, Psémata, alísia, enéka, enékan, gjavúriri, góri, iméra
    - NUM: Tris, triz
    - PRON: da, ta, či, dha, m, s', su, séna, tun, tus
    - VERB-Part: kimizméni
  - Gen
    - DET: tu
    - NOUN: klišás, zuliás
    - PRON: du, tu, mu, su, čis, mas, ǰis, s', tútunu, či
  - Nom
    - ADJ: kalí, kaló, xošá, yerasméni
    - DET: éna
    - NOUN: kóri, staxtiǰís, t'emélia, Aγás, Mándis, Qujumǰís, Vavás, enéka, korítsi, kukuniós
    - PRON: tútus, ši, kínus, su, či, Τúta, γo
  - Voc
    - ADJ: A, fikirsúzis
    - NOUN: mána, ádras, ǰaním
    - PROPN: Yóryis
- Definite
  - Def
    - DET: tu, ta, čin, t, či, da, tes, tus
  - Ind
    - DET: éna, téna
Degree and Polarity
- Polarity
  - Neg
    - PART: re
Verbal Features
- Aspect
  - Imp
    - AUX-Fin: éni, 'ne, 'ni, ne
    - VERB-Fin: laí, éxu, Rotá, eršinónǰiska, eršístiniz, filáto, kasinonǰískaši, kimáti, kupaná, léi
  - Perf
    - VERB-Fin: qazánǰisi, írtis, Ksevasám', baγərtzísi, báris, dilimléisam', düšünǰísu, eleísis, forósu, fáγu
    - VERB-Part: kimizméni
- Mood
  - Imp
    - VERB-Fin: pe, skáma, ápar
  - Ind
    - AUX-Fin: 'ne, ne, éni, 'ni, ísu
    - VERB-Fin: laí, qazánǰisi, éršiti, éxu, írtis, Ksevasám', Rotá, dilimléisam', eleísis, eršinónǰiska
  - Sub
    - VERB-Fin: baγərtzísi, báris, düšünǰísu, forósu, fáγu, galatzépši, ipúmi, kiriós', pis, píši
- Tense
  - Fut
    - VERB-Fin: eleísis, pári, páru, vlépis
  - Past
    - VERB-Fin: qazánǰisi, írtis, Ksevasám', dilimléisam', eršinónǰiska, kasinonǰískaši, píki, skótisa, skótisi, xásis
  - Pres
    - AUX-Fin: 'ne, ne, éni, 'ni, ísu
    - VERB-Fin: laí, éršiti, éxu, Rotá, düšünǰísu, eršístiniz, filáto, forósu, fáγu, ipúmi
- Voice
  - Act
    - VERB-Fin: laí, qazánǰisi, éxu, Ksevasám', Rotá, baγərtzísi, báris, dilimléisam', düšünǰísu, eleísis
  - Pass
    - AUX-Fin: 'ne, 'ni, éni
    - VERB-Fin: éršiti, írtis, eršinónǰiska, eršístiniz, kasinonǰískaši, kimáti, zirmónisin
    - VERB-Part: kimizméni
Pronouns, Determiners, Quantifiers
- PronType
  - Art
    - DET: tu, ta, čin, t, éna, či, da, tes, tus, téna
  - Dem
    - PRON: tútus, kínus, tútunu, Τúta
  - Ind
    - PRON: táre
  - Prs
    - PRON: du, su, tu, da, mu, či, ta, čis, mas, s'
- NumType
  - Card
    - NUM: Tris, triz
- Poss
  - Yes
    - PRON: du, mu, su, čis, mas, tu, ǰis
- Person
  - 1
    - PRON: mu, mas, m, ši, γo
    - VERB-Fin: éxu, Ksevasám', dilimléisam', düšünǰísu, filáto, forósu, fáγu, ipúmi, páru, skótisa
  - 2
    - AUX-Fin: ísu
    - PRON: su, s', séna, ši
    - VERB-Fin: báris, eleísis, pe, pis, skáma, sorí, séliz, tránis, vlépis, xásis
  - 3
    - AUX-Fin: 'ne, ne, éni, 'ni
    - PRON: du, tu, da, či, ta, tútus, čis, ǰis, dha, kínus
    - VERB-Fin: laí, qazánǰisi, éršiti, Rotá, baγərtzísi, eršinónǰiska, eršístiniz, galatzépši, kasinonǰískaši, kimáti
Other Features
- Clitic
  - Yes
    - PRON: du, tu, da, mu, su, či, ta, čis, dha, m
- PartType
  - Neg
    - PART: re
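The listings in the Morphology section group word forms by feature value and by part of speech, with finite and participial verb forms labelled `VERB-Fin`, `VERB-Part` and so on. The sketch below shows how such a grouping could be derived from the FEATS column of a CoNLL-U file. The function names are hypothetical, and the three rows are pieced together from the listings on this page rather than copied from the treebank files, so the actual FEATS strings may differ.

```python
from collections import defaultdict


def parse_feats(feats):
    """Turn 'Case=Acc|Gender=Fem' into a dict; '_' means no features."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def group_forms(rows):
    """Map 'Feature=Value' -> category -> list of distinct word forms."""
    groups = defaultdict(lambda: defaultdict(list))
    for form, upos, feats in rows:
        parsed = parse_feats(feats)
        verb_form = parsed.get("VerbForm")
        for name, value in parsed.items():
            # Under VerbForm itself the category is the bare UPOS tag;
            # elsewhere verbal forms are labelled UPOS-VerbForm.
            if name == "VerbForm" or verb_form is None:
                category = upos
            else:
                category = f"{upos}-{verb_form}"
            forms = groups[f"{name}={value}"][category]
            if form not in forms:
                forms.append(form)
    return groups


# Illustrative rows (form, UPOS, FEATS), reconstructed from this page.
rows = [
    ("kimizméni", "VERB", "Aspect=Perf|Case=Acc|Gender=Fem|Number=Sing|VerbForm=Part|Voice=Pass"),
    ("Yóryis", "PROPN", "Case=Voc|Gender=Masc|Number=Sing"),
    ("re", "PART", "PartType=Neg|Polarity=Neg"),
]

groups = group_forms(rows)
for key in sorted(groups):
    print(key, dict(groups[key]))
```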
Syntax
Auxiliary Verbs and Copula
- This corpus uses 1 lemma as a copula (cop): ímu.
- This corpus uses 2 lemmas as auxiliaries (aux). Examples: na, se.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
  - VERB--NOUN-Nom (5)
  - VERB-Fin--NOUN-Nom (12)
  - VERB-Fin--PRON-Nom (7)
- obj
  - VERB--NOUN (1)
  - VERB--NOUN-Acc (3)
  - VERB--NOUN-Nom (1)
  - VERB--PRON (1)
  - VERB--PRON-Acc (1)
  - VERB-Fin--NOUN (1)
  - VERB-Fin--NOUN-Acc (23)
  - VERB-Fin--PRON-Acc (12)
  - VERB-Fin--PRON-Gen (1)
- iobj
  - VERB-Fin--NOUN-Acc (1)
  - VERB-Fin--NOUN-Acc-ADP(s(e)) (1)
  - VERB-Fin--PRON-Acc (1)
  - VERB-Fin--PRON-Gen (3)
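The patterns above pair a verbal parent (labelled with its VerbForm, if any) with a nominal or pronominal child (labelled with its Case, if any). A minimal sketch of how such patterns could be counted follows. It assumes each word is represented as a dictionary with `id`, `upos`, `feats`, `head` and `deprel` keys, restricts itself to nsubj, obj and iobj, and omits the adposition suffix seen in patterns such as `NOUN-Acc-ADP(s(e))`. The toy sentence and all names are invented for illustration.

```python
from collections import Counter

CORE_RELATIONS = {"nsubj", "obj", "iobj"}


def label(word, feature):
    """Return 'UPOS-Value' if the word carries the feature, else just 'UPOS'."""
    value = word["feats"].get(feature)
    return f"{word['upos']}-{value}" if value else word["upos"]


def argument_patterns(sentence):
    """Count (relation, 'PARENT--CHILD') patterns for verb-headed core arguments."""
    by_id = {w["id"]: w for w in sentence}
    patterns = Counter()
    for child in sentence:
        parent = by_id.get(child["head"])
        if parent is None or parent["upos"] != "VERB":
            continue
        if child["upos"] not in {"NOUN", "PRON"}:
            continue
        if child["deprel"] not in CORE_RELATIONS:
            continue
        pattern = f"{label(parent, 'VerbForm')}--{label(child, 'Case')}"
        patterns[(child["deprel"], pattern)] += 1
    return patterns


# Toy sentence, invented for illustration (not from the treebank).
sentence = [
    {"id": 1, "upos": "PRON", "feats": {"Case": "Nom"}, "head": 2, "deprel": "nsubj"},
    {"id": 2, "upos": "VERB", "feats": {"VerbForm": "Fin"}, "head": 0, "deprel": "root"},
    {"id": 3, "upos": "NOUN", "feats": {"Case": "Acc"}, "head": 2, "deprel": "obj"},
    {"id": 4, "upos": "PUNCT", "feats": {}, "head": 2, "deprel": "punct"},
]

for (relation, pattern), count in sorted(argument_patterns(sentence).items()):
    print(f"{relation}: {pattern} ({count})")
```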
Relations Overview
- This corpus uses 2 relation subtypes: advmod:emph, aux:q
- The following 11 relation types are not used in this corpus at all: dislocated, discourse, clf, fixed, flat, compound, list, orphan, goeswith, reparandum, dep
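The two statements above can be checked against the relation list given earlier on this page: subtypes are the labels containing a colon, and the unused relations are the universal relations whose base type never occurs. A short sketch, with illustrative variable names:

```python
# The 37 universal dependency relations defined by UD v2.
UNIVERSAL_RELATIONS = set("""
acl advcl advmod amod appos aux case cc ccomp clf compound conj cop csubj
dep det discourse dislocated expl fixed flat goeswith iobj list mark nmod
nsubj nummod obj obl orphan parataxis punct reparandum root vocative xcomp
""".split())

# Relations listed on this page as occurring in the treebank.
used = set("""
acl advcl advmod advmod:emph amod appos aux aux:q case cc ccomp conj cop
csubj det expl iobj mark nmod nsubj nummod obj obl parataxis punct root
vocative xcomp
""".split())

subtypes = sorted(r for r in used if ":" in r)
base_types = {r.split(":", 1)[0] for r in used}  # strip the subtype suffix
unused = sorted(UNIVERSAL_RELATIONS - base_types)

print(len(subtypes), "subtypes:", subtypes)
print(len(unused), "unused:", unused)
```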