UD Gheg GPS
Language: Gheg (code: aln)
Family: IE
This treebank has been part of Universal Dependencies since the UD v2.11 release.
The following people have contributed to making this treebank part of UD: Christian Ebert, Artan Islamaj, Adrian Kuqi, Barbara Sonnenhauser, Paul Widmer, Magdalena Plamada.
Repository: UD_Gheg-GPS
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.15
License: CC BY-SA 4.0
Genre: spoken
Questions, comments? General annotation questions (either Gheg-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on Github. If you want to collaborate, please contact [christiangeorg • ebert (æt) uzh • ch, barbara • sonnenhauser (æt) uzh • ch, paul • widmer (æt) uzh • ch]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.
Annotation | Source |
---|---|
Lemmas | annotated manually |
UPOS | annotated manually, natively in UD style |
XPOS | not available |
Features | annotated manually, natively in UD style |
Relations | annotated manually, natively in UD style |
Description
UD Gheg Pear Stories (GPS) contains renarrations of Wallace Chafe’s Pear Stories video (pearstories.org) by heritage speakers of Gheg Albanian living in Switzerland and speakers from Prishtina.
UD Gheg GPS contains 966 sentences from 64 recordings of Gheg speakers re-narrating the Pear Stories video. Data collection was part of a larger project that took place from May 2019 to July 2022 in Zurich, Prishtina and Munich. Only recordings from Prishtina and Zurich were included in the treebank. Speakers of three different generations were interviewed, with ages ranging from 10 to 67. Sentence ids encode the location (P for Prishtina, Z for Zurich), the generation (G1, G2, G3) and a unique speaker id, all separated by hyphens, followed by an underscore and the sentence number, which starts at 1 for each interview. Due to the multilingual setting, the treebank contains many instances of code-switching (mostly Swiss German). It also exhibits characteristics of (semi-)spontaneous speech, such as disfluencies and corrections.
The treebank contains about 16k tokens and has not been split into training and test sets.
Acknowledgments
- Artan Islamaj, Adrian Kuqi: Annotation
- Christian Ebert: Treebank construction, validation and annotation supervision
- Barbara Sonnenhauser, Paul Widmer: Project supervision
- Magdalena Plamada: Technical support
The project was funded by the SNSF grant No. 100015L_182126/1.
Statistics of UD Gheg GPS
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PART – PRON – PUNCT – SCONJ – VERB – X
Features
Case – Definite – Degree – Foreign – Gender – Mood – Number – NumType – Person – Polarity – PronType – Reflex – Tense – VerbForm – Voice
Relations
acl – advcl – advmod – amod – appos – aux – aux:part – case – cc – ccomp – conj – cop – csubj – dep – det – discourse – dislocated – expl – fixed – flat – iobj – list – mark – nmod – nsubj – nsubj:outer – nummod – obj – obl – orphan – parataxis – punct – reparandum – root – xcomp
Tokenization and Word Segmentation
- This corpus contains 966 sentences and 15990 tokens.
- This corpus contains 2 tokens (0%) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 962 types of words that contain both letters and punctuation. Examples: edhe:, da:rdha, ë:hm, da:rdhat, ni:, dardha:, pa:, bicikle:t, dardha:t, rru:gës, fmi:, tani:, ëh:, da:rdh, tri:, dhe:, hy:p, i:, ra:, me:, shpo:rt, aty:, ata:, ka:, u:, ë:h, ai:, rru:gën, ato:, ka:n, to:k, ëdhe:, a:, dhi:, kalo:jn, mle:dh, shku:, dja:l, dja:lit, e:, kape:lën, që:, tje:r, ë:, dja:li, kape:la, ma:rr, njeri:, qe:, tho:n
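The counts above can in principle be recomputed from the CoNLL-U release. The sketch below is an approximation, not the script that generated this page: it counts every line with an integer ID as a token (which matches the token count only if there are no multiword tokens, an assumption here), reads `SpaceAfter=No` from the MISC column, and treats any character whose Unicode category starts with `P` (such as the colon in `da:rdha`) as punctuation. The function name `conllu_token_stats` and the sample sentence are invented for illustration.

```python
import unicodedata


def conllu_token_stats(conllu: str) -> dict:
    """Approximate sentence/token statistics for a CoNLL-U string (hypothetical helper)."""
    sentences = tokens = no_space = 0
    mixed_types = set()
    in_sentence = False
    for line in conllu.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if in_sentence:
                sentences += 1
                in_sentence = False
            continue
        if line.startswith("#"):
            continue  # comments such as sent_id and text
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword ranges ("3-4") and empty nodes ("5.1")
        in_sentence = True
        tokens += 1
        form = cols[1]
        misc = cols[9] if len(cols) > 9 else "_"
        if "SpaceAfter=No" in misc.split("|"):
            no_space += 1
        has_letter = any(ch.isalpha() for ch in form)
        has_punct = any(unicodedata.category(ch).startswith("P") for ch in form)
        if has_letter and has_punct:
            mixed_types.add(form)
    if in_sentence:
        sentences += 1  # input without a trailing blank line
    return {"sentences": sentences, "tokens": tokens,
            "no_space_after": no_space, "mixed_types": mixed_types}


# Invented one-sentence sample using forms listed above (tab-separated, 10 columns).
sample = (
    "# sent_id = Z-G2-14_1\n"
    "1\tdja:li\t_\tNOUN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tra\t_\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t_\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
)
stats = conllu_token_stats(sample)
print(stats["sentences"], stats["tokens"], stats["no_space_after"], sorted(stats["mixed_types"]))
# 1 3 1 ['dja:li']
```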
Morphology
Tags
- This corpus uses 15 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PUNCT, SCONJ, VERB, X
- This corpus does not use the following tags: PROPN, SYM
- This corpus contains 44 word types tagged as particles (PART): /m, a, ani, as, de, do, do:, faleminderit, falemnderit, far, farë, jo, jo:, m, m:, ma, ma:, maj, me, me/, me:, mi, mos, mu, më, nu:k, nuk, oke, oke:, okej, p, pe, po, po:, s, t, te, thanks, ti, ti:, tu, ty, të, të:
- This corpus contains 178 lemmas tagged as pronouns (PRON): _, a, ai, aj, ajo, ajë, anjo, asajna, asgjith, asgjo, asgjë, asi, askush, asni, asnjë, asxhë, at, ata, ataa, atan, ati, atie, atin, atina, atine, atinve, ato, atu, aty, atyna, atyne, atynve, atyve, atë, au, av, cili, cilin, cilla, cillat, cilli, cillt, cilët, dikon, dikun, dikush, dikë, diqka, disa, dishka, diçka, do, donjë, e, em, fare, githa, gjith, gjithshka, gjithë, i, ifar, ijit, j, ja, ju, ka, kejt, kerkush, ket, ki, kja, kjo, kof, krejt, krejte, ksaj, ksi, kta, kti, ktij, ktija, ktina, ktinve, kto, ktyne, ktynve, ktë, ku, kurfar, kurgjo, kurkush, kurr, kurxhi, kurxho, kush, ky, kyve, kyy, kët, këta, me, mu, më, naj, nata, ne, njana, njanen, njani, njeni, njenën, njona, njonen, njoni, njonën, një, njëri, qa, qaj, qasaj, qasishne, qat, qata, qato, qe, qekjo, qet, qfar, qi, qiket, qisi, qita, qito, qka, qysh, që, qësaj, sajna, se, secili, secilit, secillin, sen, sha, shka, si, sicili, sicilin, sicilli, sicillit, ti, tina, tjer, tjert, tonve, tyne, tynve, tyre, tyrë, të, u, um, un, une, unë, v, vet, vetat, vete, veti, vetë, vët, çat, çato, çka, çysh, ë
- This corpus contains 6 lemmas tagged as determiners (DET): a, e, i, një, së, të
- Out of the above, 5 lemmas occurred sometimes as PRON and sometimes as DET: a, e, i, një, të
- This corpus contains 8 lemmas tagged as auxiliaries (AUX): _, do, duke, jam, kam, po, tu, u
- Out of the above, 5 lemmas occurred sometimes as AUX and sometimes as VERB: _, duke, jam, kam, po
- There are 3 (de)verbal forms:
- VerbForm=Fin
- AUX: ka, kan, u, osht, ishte, jan, kishte, o, jon, ke
- VERB: do, kalon, sheh, merr, pa, dha, di, erdh, vjen, kaloj
- VerbForm=Inf
- VERB: thon, mledh, tho:n, marr, mar, than, mle:dh, kqyr, qu, shku
- VerbForm=Part
- AUX: kon, ko:n
- VERB: mledh, pa, shku, ardh, kon, mar, pa:, vjedh, kan, ra
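Overlap figures such as "lemmas that occurred sometimes as PRON and sometimes as DET" amount to grouping UPOS tags by lemma. The Python sketch below shows that grouping on a CoNLL-U string; it is an illustration under the assumption of standard 10-column CoNLL-U input, not the code behind this page, and the function name and sample fragment are hypothetical.

```python
from collections import defaultdict


def lemmas_with_both(conllu: str, upos_a: str, upos_b: str) -> set:
    """Return lemmas that occur at least once with each of two UPOS tags."""
    tags_by_lemma = defaultdict(set)
    for line in conllu.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip sentence breaks and comments
        cols = line.split("\t")
        if len(cols) < 4 or not cols[0].isdigit():
            continue  # skip multiword ranges and empty nodes
        tags_by_lemma[cols[2]].add(cols[3])
    return {lemma for lemma, tags in tags_by_lemma.items() if {upos_a, upos_b} <= tags}


# Invented fragment: lemma "i" is once a DET and once a PRON
# (other lemmas are left unannotated as "_").
sample = (
    "1\ti\ti\tDET\t_\t_\t2\tdet\t_\t_\n"
    "2\tdja:lit\t_\tNOUN\t_\t_\t0\troot\t_\t_\n"
    "\n"
    "1\ti\ti\tPRON\t_\t_\t2\tobj\t_\t_\n"
    "2\tpa\t_\tVERB\t_\t_\t0\troot\t_\t_\n"
)
print(lemmas_with_both(sample, "PRON", "DET"))  # {'i'}
```

The same call with `"AUX", "VERB"` would yield the AUX/VERB overlap.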
Nominal Features
- Gender=Fem
- ADJ: tjeter, mbu:shura, vogël, mush, njejtën, re:, vogel, bardh, bishtale:ca, bu:kur
- DET: e, t, të
- NOUN: da:rdha, dardha, dardhat, da:rdhat, korp, dardh, bicikell, dardha:, tok, dardha:t
- NUM: tri, tri:, dy:, treta, tretën, tretës
- PRON: ato, at, ajo, ato:, kjo, asaj, kto, gjitha, njanen, a:t
- Gender=Masc
- ADJ: tjer, tje:r, tjeter, tjetër, vogël, tjetri, vjeter, vogel, ri, sjellshëm
- DET: i, e, t, të, /i, i:, te, të:, ë
- NOUN: djem, djali, djal, djemt, njeri, kapuqin, burr, bujk, burri, fmi:
- NUM: tre, tre:, treve, tret, /tre, tra
- PRON: aj, ata, ky, kta, at, ati, ai, ata:, ai:, ki
- Number=Plur
- ADJ: tjer, tje:r, mbu:shura, sjellshëm, bishtale:ca, gabu:ara, grabitur, hu:ja, lidht, mbushun
- AUX-Fin: kan, jan, jon, ishin, ka:n, kishin, ju:n, jun, ka, ken
- DET: t, e, të, i, te, i/, i:, ë
- NOUN: da:rdha, dardha, dardhat, da:rdhat, djem, djemt, dardha:, dardha:t, fmi:, dardhët
- NUM: treve, tret
- PRON: i, ata, ato, kta, u, ata:, ato:, i:, atyne, j
- VERB-Fin: kalojn, kalo:jn, erdhen, shkun, ndimojn, ndihmo:jn, ndimun, pa:n, kalun, kan
- Number=Sing
- ADJ: tjeter, tjetër, vogël, vogel, mbu:shur, ri, tje:r, vjeter, majt, njejtën
- AUX-Fin: ka, u, osht, ishte, kishte, o, ke, kish, ka:, kum
- DET: e, i, t, të, e:, së, të:, /i, e/, i:
- NOUN: djali, korp, djal, dardh, bicikell, njeri, tok, bicikle:t, kapuqin, rru:gës
- NUM: tretën, tretës
- PRON: e, aj, i, j, a, at, ky, m, ajo, ati
- VERB-Fin: do, kalon, merr, pa, sheh, dha, di, erdh, vjen, kaloj
- Case=Abl
- ADJ: tjeter, kaotike, tjetres
- DET: së
- NOUN: rru:gës, lisit, rruges, biciklete, biciklles, dardhës, pemes, rrugës, bicikletes, da:rdhes
- PRON: ati, ty:re, asa:jna, asajna, atina, aty:nve, atyne, atynve, kësa:j, qasaj
- Case=Acc
- ADJ: tjeter, tje:r, vogël, gjermanisht, mbu:shura, njejtën, tje:tër, tjetër, bishtale:ca, buku:r
- DET: e, t, të, e:, te, të:, e/, i, i/, i:
- NOUN: da:rdha, dardha, dardhat, da:rdhat, bicikell, dardh, korp, tok, dardha:, bicikle:t
- NUM: tretën
- PRON: e, i, a, at, ato, ata, u, a:, i:, ato:
- Case=Dat
- ADJ: elementa:re, shku:rt, tjer, tjetrin, tjetrit
- DET: t, i, të, e, së
- NOUN: djalit, dja:lit, djemve, bicikles, da:rdhave, filmit, gurit, moshës, biciklles, biqikletës
- PRON: i, j, m, ati, e, ati:, kti, atyne, atina, i:
- Case=Gen
- DET: e, t, i, ë
- NOUN: dardhes, kohës, rru:gës, da:rdhave, dardha:ve, dardhës, dja:lit, kohes:, naty:rës, shpo:rteve
- PRON: tij, ksi, ti, ti:j, veten, ijit, tina, tina:, ty:re, tyne
- Case=Nom
- ADJ: tjer, tjeter, tjetër, tje:r, vogël, vogel, ri, vjeter, anonym, mbu:shur
- DET: i, e, t, të, /i, e:, i:, te
- NOUN: djem, djali, djal, djemt, njeri, fmi:, burr, bujk, burri, da:rdhat
- PRON: aj, ata, ky, kta, ajo, ato, ai, ai:, ata:, ki
- Definite=Def
- ADJ: tjetri
- NOUN: dardhat, da:rdhat, djali, djemt, dardha:t, kapuqin, rru:gës, burri, kapelen, rru:gën
- Definite=Ind
- ADJ: tjeter, tje:trën, tjera:, tjetri, tjetrin, tjetrit, tjetër
- DET: një
- NOUN: da:rdha, dardha, djem, korp, djal, dardh, dardha:, bicikell, tok, bicikle:t
Degree and Polarity
- Degree=Cmp
- ADJ: herët, larti, ngat, shum, vo:gël
- ADV: von
- Degree=Sup
- ADJ: pakti
- Polarity=Neg
- PART: nuk, s, mos, nu:k
Verbal Features
- Mood=Adm
- AUX-Fin: paska
- Mood=Ind
- AUX-Fin: ka, kan, u, osht, ishte, jan, kishte, o, jon, ke
- VERB-Fin: do, kalon, sheh, merr, pa, dha, di, erdh, vjen, kaloj
- Mood=Sub
- AUX-Fin: je:n, ki:sha:, kën
- VERB-Fin: shfrytzoj, akuzoj, binte, bëntë, fshihnin, japish, je:t, jep, jet, ket
- Tense=Imp
- AUX-Fin: ishte, kishte, kish, ishin, kishin, ish, ishte:, kishe, i:shte, ishtë
- VERB-Fin: ishte, kishte, mungonte, mushte, dinin, hanin, kalo:nin, kalonte, kishin, kishtë
- Tense=Past
- AUX-Fin: ke, ka, ken, ka:, ke:n, pat, pata
- AUX-Part: kon
- VERB-Fin: pa, dha, erdh, kaloj, shkoj, pash, mur, ra, erdhen, pa:
- VERB-Part: kon
- Tense=Pres
- AUX-Fin: ka, kan, u, osht, jan, o, jon, ka:n, kum, ka:
- VERB-Fin: do, kalon, sheh, merr, di, vjen, jep, osht, she, bjen
- Voice=Act
- AUX-Fin: ka, kan, u, osht, ishte, jan, kishte, o, jon, ke
- VERB-Fin: do, kalon, sheh, merr, pa, dha, di, vjen, erdh, kaloj
- Voice=Pass
- VERB-Fin: shihet, shihen, shifet, thohet, vilen
Pronouns, Determiners, Quantifiers
- PronType=Dem
- PRON: aj, at, ata, ky, ato, kta, ati, ato:, ajo, kjo
- PronType=Int
- PRON: qka, qysh, kush, qa, qfar, ku, qka:, shka, qfa:r, si
- PronType=Neg
- PRON: asni
- PronType=Prs
- PRON: aj, ata, vet, ai, ai:, ajo, veti, ata:, ti, tij
- PronType=Rel
- PRON: qe, që, qi, cili, që:, qe:, qysh, /sha, cili:, cilli
- NumType=Card
- NUM: tre, tri, ni, dy, tri:, një, nja, njo, nji, dy:
- NumType=Ord
- NUM: tret, treve, dhe:ta, dyt, dyten, nodhe:ta, pa:r, pa:ri, tre:t, treten
- Reflex=Yes
- AUX: u, u:, o
- PRON: vet, veti, tij, ti:j, ve:te, ti, ve:t, ve:ti, atina, mu
- Person=1
- AUX-Fin: kum, kom, jam, ki:sha:, ko:m, ku:m, pata
- PRON: m, une, mu, un, une:, um, u:ne, un:, ai, i
- VERB-Fin: di, pash, besoj, di:, pa:sh, beso:j, muj, doket, kupto:va, kuptoj
- Person=2
- AUX-Fin: ishte, ka, keni, ki
- PRON: ti, aj, ja, t
- VERB-Fin: din, di, pa, ishe, japish, kalum, len, merre, mungon, the
- Person=3
- AUX-Fin: ka, kan, u, osht, ishte, jan, kishte, o, jon, ke
- PRON: i, e, j, a, aj, ata, u, i:, ai, ai:
- VERB-Fin: do, kalon, sheh, dha, vjen, erdh, kaloj, merr, pa, jep
Other Features
- Foreign=Yes
- ADP: am, in, mit, ufm, vo, vom
- ADV: ebe, fertig, lot, wi:ter
- AUX: hend, isch
- CCONJ: a, und
- DET: a
- NOUN: birne, kappe, baum, belo:nig, bode, jungs, korbe, lichtung, ma, menschlikeit
- NUM: zwei
- PART: thanks
- PRON: ihm, er, sich, sini
- VERB: abecheit, gholfe, helfe:, inegfahre, pfeife, verteilt
Syntax
Auxiliary Verbs and Copula
- This corpus uses 1 lemma as copula (cop). Example: jam.
- This corpus uses 3 lemmas as auxiliaries (aux). Examples: kam, jam, _.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
- VERB--NOUN (1)
- VERB--NOUN-Nom (1)
- VERB--PRON (1)
- VERB-Fin--NOUN (3)
- VERB-Fin--NOUN-Acc (1)
- VERB-Fin--NOUN-Dat (1)
- VERB-Fin--NOUN-Nom (297)
- VERB-Fin--NOUN-Nom-ADP(me) (2)
- VERB-Fin--PRON (79)
- VERB-Fin--PRON-Acc (1)
- VERB-Fin--PRON-Dat (13)
- VERB-Fin--PRON-Nom (253)
- VERB-Inf--NOUN-Nom (1)
- VERB-Inf--PRON-Nom (1)
- VERB-Part--NOUN (2)
- VERB-Part--NOUN-Acc (10)
- VERB-Part--NOUN-Nom (144)
- VERB-Part--NOUN-Nom-ADP(te)-ADP(te) (1)
- VERB-Part--PRON (48)
- VERB-Part--PRON-Dat (2)
- VERB-Part--PRON-Nom (78)
- obj
- VERB--PRON (2)
- VERB-Fin--NOUN-Abl (5)
- VERB-Fin--NOUN-Acc (369)
- VERB-Fin--NOUN-Dat (17)
- VERB-Fin--NOUN-Gen (1)
- VERB-Fin--NOUN-Nom (86)
- VERB-Fin--NOUN-Nom-ADP(nga) (1)
- VERB-Fin--PRON (45)
- VERB-Fin--PRON-Acc (242)
- VERB-Fin--PRON-Dat (71)
- VERB-Fin--PRON-Nom (8)
- VERB-Inf--NOUN-Abl (1)
- VERB-Inf--NOUN-Acc (42)
- VERB-Inf--NOUN-Nom (1)
- VERB-Inf--PRON (3)
- VERB-Inf--PRON-Acc (11)
- VERB-Inf--PRON-Dat (4)
- VERB-Part--NOUN (3)
- VERB-Part--NOUN-Abl (5)
- VERB-Part--NOUN-Acc (234)
- VERB-Part--NOUN-Acc-ADP(n) (1)
- VERB-Part--NOUN-Dat (15)
- VERB-Part--NOUN-Nom (45)
- VERB-Part--PRON (26)
- VERB-Part--PRON-Acc (109)
- VERB-Part--PRON-Dat (26)
- VERB-Part--PRON-Nom (4)
- iobj
- VERB-Fin--NOUN-Dat (17)
- VERB-Fin--PRON (3)
- VERB-Fin--PRON-Acc (3)
- VERB-Fin--PRON-Dat (142)
- VERB-Inf--PRON-Acc (1)
- VERB-Inf--PRON-Dat (4)
- VERB-Part--NOUN-Dat (11)
- VERB-Part--PRON (1)
- VERB-Part--PRON-Acc (1)
- VERB-Part--PRON-Dat (56)
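The labels in the list above appear to be built from the parent's UPOS and VerbForm, then (after `--`) the child's UPOS and Case, with `ADP(...)` appended for each case-marking dependent. The Python sketch below rebuilds counts of that shape. This reading of the labels is an assumption, and so are two details the page does not state: the sketch puts the lemma (not the word form) inside `ADP(...)`, and it matches relation names exactly, so subtypes such as `nsubj:outer` are left out. All function names and the sample are hypothetical.

```python
from collections import Counter, defaultdict

CORE_RELATIONS = {"nsubj", "obj", "iobj"}


def parse_feats(feats: str) -> dict:
    """Turn a FEATS column such as 'Case=Nom|Number=Sing' into a dict."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def label(upos: str, value) -> str:
    """Append a feature value to a UPOS tag when the value is present."""
    return upos if value is None else f"{upos}-{value}"


def sentence_signatures(rows):
    by_id = {row[0]: row for row in rows}
    # Lemmas of `case` dependents per head, in word order (assumed source of ADP(...)).
    case_lemmas = defaultdict(list)
    for row in rows:
        if row[7] == "case":
            case_lemmas[row[6]].append(row[2])
    for row in rows:
        head = by_id.get(row[6])
        if row[7] not in CORE_RELATIONS or head is None:
            continue
        if head[3] != "VERB" or row[3] not in ("NOUN", "PRON"):
            continue  # only verb parents with noun or pronoun children
        parent = label("VERB", parse_feats(head[5]).get("VerbForm"))
        child = label(row[3], parse_feats(row[5]).get("Case"))
        child += "".join(f"-ADP({lemma})" for lemma in case_lemmas[row[0]])
        yield row[7], f"{parent}--{child}"


def core_signatures(conllu: str) -> Counter:
    counts = Counter()
    rows = []
    for line in conllu.splitlines() + [""]:  # trailing "" flushes the last sentence
        if line.startswith("#"):
            continue
        if line.strip():
            cols = line.split("\t")
            if cols[0].isdigit():
                rows.append(cols)
            continue
        counts.update(sentence_signatures(rows))
        rows = []
    return counts


# Invented sentence: the obl child is ignored because only core relations are counted.
sample = (
    "1\tdja:li\t_\tNOUN\t_\tCase=Nom\t2\tnsubj\t_\t_\n"
    "2\tmerr\t_\tVERB\t_\tVerbForm=Fin\t0\troot\t_\t_\n"
    "3\tdardha\t_\tNOUN\t_\tCase=Acc\t2\tobj\t_\t_\n"
    "4\tnga\tnga\tADP\t_\t_\t5\tcase\t_\t_\n"
    "5\ttoka\t_\tNOUN\t_\tCase=Abl\t2\tobl\t_\t_\n"
)
print(sorted(core_signatures(sample).items()))
# [(('nsubj', 'VERB-Fin--NOUN-Nom'), 1), (('obj', 'VERB-Fin--NOUN-Acc'), 1)]
```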
Verbs with Reflexive Core Objects
- This corpus contains 1 lemma that occurs at least once with a reflexive core object (obj or iobj). Example: shoh (with the reflexive pronoun vet)
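The reflexive-object check can be sketched the same way. The Python version below assumes that reflexive pronouns carry `Reflex=Yes` in FEATS (as `vet` does in the feature lists above) and that the head must be a VERB; it returns verb/reflexive lemma pairs, which is one plausible reading of the example given for `shoh` and `vet`. The function name and the sample are hypothetical.

```python
def reflexive_core_objects(conllu: str) -> set:
    """Return (verb lemma, reflexive lemma) pairs for obj/iobj children marked Reflex=Yes."""
    pairs = set()
    rows = []
    for line in conllu.splitlines() + [""]:  # trailing "" flushes the last sentence
        if line.startswith("#"):
            continue
        if line.strip():
            cols = line.split("\t")
            if cols[0].isdigit():
                rows.append(cols)
            continue
        by_id = {row[0]: row for row in rows}
        for row in rows:
            head = by_id.get(row[6])
            if (
                row[7] in ("obj", "iobj")
                and "Reflex=Yes" in row[5].split("|")
                and head is not None
                and head[3] == "VERB"
            ):
                pairs.add((head[2], row[2]))
        rows = []
    return pairs


# Invented fragment: form "pa" (lemma shoh) with the reflexive object "veten" (lemma vet).
sample = (
    "1\tpa\tshoh\tVERB\t_\tVerbForm=Fin\t0\troot\t_\t_\n"
    "2\tveten\tvet\tPRON\t_\tReflex=Yes\t1\tobj\t_\t_\n"
)
print(reflexive_core_objects(sample))  # {('shoh', 'vet')}
```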