UD Beja Autogramm
Language: Beja (code: bej)
Family: Afro-Asiatic
This treebank has been part of Universal Dependencies since the UD v2.8 release.
The following people have contributed to making this treebank part of UD: Martine Vanhove, Rayan Ziane, Sylvain Kahane, Bruno Guillaume.
Repository: UD_Beja-Autogramm
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.15
License: CC BY-SA 4.0
Genre: spoken
Questions, comments? General annotation questions (either Beja-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on Github. If you want to collaborate, please contact [martine • vanhove (æt) cnrs • fr; sylvain (æt) kahane • fr]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.
Annotation | Source |
---|---|
Lemmas | not available |
UPOS | annotated manually in non-UD style, automatically converted to UD |
XPOS | annotated manually |
Features | annotated manually in non-UD style, automatically converted to UD |
Relations | annotated manually in non-UD style, automatically converted to UD |
Description
A Universal Dependencies corpus for Beja, a language of the North Cushitic branch of the Afro-Asiatic phylum, mainly spoken in Sudan, Egypt and Eritrea.
The treebank is an automatic conversion of the mSUD_Beja-Autogramm treebank, which was extracted from Martine Vanhove’s corpus in ELAN format (https://corpafroas.huma-num.fr/Archives/corpus.php).
Sentences are annotated with the following metadata:
- sent_id: indicates the source file and the segmentation identifier in the source file
- text: lexical tokenization
- text_en: English interpretation
- phonetic_text
- sound_url
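In CoNLL-U files, sentence-level metadata such as the fields above is stored in comment lines of the form `# key = value` that precede the token lines. The following minimal Python sketch assumes that convention and collects those fields for one sentence. `read_metadata` and the sample block, including its `DEMO_file_12` identifier and example.org URL, are hypothetical and are not taken from the treebank.

```python
SAMPLE = """\
# sent_id = DEMO_file_12
# text = placeholder
# text_en = placeholder translation
# phonetic_text = placeholder
# sound_url = https://example.org/demo.wav
1\tplaceholder\t_\tNOUN\t_\t_\t0\troot\t_\t_
"""


def read_metadata(block: str) -> dict:
    """Collect the '# key = value' comment lines of one CoNLL-U sentence block."""
    meta = {}
    for line in block.splitlines():
        if not line.startswith("#"):
            continue  # token lines carry no sentence metadata
        key, sep, value = line[1:].partition("=")
        if sep:  # skip bare comments that have no '=' separator
            meta[key.strip()] = value.strip()
    return meta


print(read_metadata(SAMPLE)["text_en"])  # prints "placeholder translation"
```

Only the first `=` is used as the separator, so a value that itself contains `=` is kept intact.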
Acknowledgments
This treebank was built in collaboration by Martine Vanhove, Rayan Ziane and Sylvain Kahane. Thanks to Bruno Guillaume for the conversion to UD and for his help with finalization.
Statistics of UD Beja Autogramm
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – VERB – X
Features
Aspect – Case – Definite – Degree – Deixis – ExtPos – Foreign – Gender – Mood – Number – PartType – Person – Polarity – Polite – Poss – PronType – Reflex – VerbClass – VerbForm – VerbType – Voice
Relations
acl – acl:relcl – advcl – advmod – amod – appos – aux – cc – ccomp – compound:svc – cop – dep – dep:comp – dep:conj – dep:flat – dep:redup – det – discourse – dislocated – dislocated:mod – dislocated:obj – dislocated:subj – fixed – iobj – nmod – nmod:poss – nsubj – nsubj:outer – nummod – obj – obl – obl:arg – obl:mod – parataxis – parataxis:insert – parataxis:mod – parataxis:parenth – punct – reparandum – root – vocative – xcomp
Tokenization and Word Segmentation
- This corpus contains 763 sentences and 11951 tokens.
- All tokens in this corpus are followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 47 types of words that contain both letters and punctuation. Examples: {laughter}, -t, e#, a#, aː#, dh#, -a, -aː, -b, aa#, aaa#, aaaj#, aga#, ago#, am#, aw#, aːn#, eː#, fira#, firar#, ha#, hahadn#, har#, ifi#, igam#, istaːf#, kaː#, rif#, sima#, simas#, ti#, tʔanoː#, uʔeː#, w#, wauːːː#, weːːː#, {noise}, əddew#, əgəg#, əl#, ət#, ʃibi#, ʃɛːka#, ʃʔibaː#, ʔa#, ʔo#, ʔuweːːːh#
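A check similar to the last figure can be approximated with Unicode general categories. This minimal sketch assumes that "letter" means a category starting with L and "punctuation" a category starting with P. That criterion is an assumption and is not necessarily the exact rule used to compute the statistics above. Under it, the length mark ː (U+02D0, category Lm) counts as a letter. By contrast, = is a math symbol (Sm), so clitic forms such as =hoːk are not flagged. The helper names are hypothetical.

```python
import unicodedata


def has_letter_and_punct(form: str) -> bool:
    """True if a form mixes Unicode letters (L*) with punctuation (P*)."""
    cats = [unicodedata.category(ch) for ch in form]
    return any(c.startswith("L") for c in cats) and any(c.startswith("P") for c in cats)


def mixed_word_types(forms):
    """Distinct forms that mix letters and punctuation, in code-point order."""
    return sorted({f for f in forms if has_letter_and_punct(f)})


forms = ["{laughter}", "-t", "aː#", "aː#", "aː", "=hoːk"]
print(mixed_word_types(forms))  # ['-t', 'aː#', '{laughter}']
```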
Morphology
Tags
- This corpus uses 16 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB, X
- This corpus does not use the following tags: SYM
- This corpus contains 41 word types tagged as particles (PART): =hoːk, =j, =ja, =jaː, =jaːj, =jeːt, =na, =ni, aflaːn, ajwa, akoː, areː, ba=, bak, bass, baː=, beːntoːj, bi=, geː, geːn, han, handeː, hasara, haːjloː, iː, ja, jaː, ka=, ki=, malia, mhasi, miʃi, nuːn, ontʔa, ontʔabit, ottʔa, taktak, tʔa, xalaːs, ʃaːwi, ʔaʃaj
- This corpus contains 1 lemma tagged as pronouns (PRON): _ (lemmas are not available, so every token carries the CoNLL-U placeholder _)
- This corpus contains 1 lemma tagged as determiners (DET): _
- Out of the above, 1 lemma occurred sometimes as PRON and sometimes as DET: _
- This corpus contains 1 lemma tagged as auxiliaries (AUX): _
- Out of the above, 1 lemma occurred sometimes as AUX and sometimes as VERB: _
- There is 1 (de)verbal form (VerbForm):
- Conv
- AUX: dʔiːt
- VERB: diːt, hajiːd, diːtiːt
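Lemmas are not available (see the annotation table), so every token has the CoNLL-U placeholder `_` in its LEMMA column. This is why each lemma count above collapses to the single lemma `_`. The sketch below illustrates both that effect and the check for unused UPOS tags. It uses hypothetical helper names and one made-up token for each tag that this corpus uses.

```python
UD_UPOS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}


def lemmas_by_upos(tokens):
    """Map each UPOS tag to the set of LEMMA values it occurs with."""
    index = {}
    for lemma, upos in tokens:
        index.setdefault(upos, set()).add(lemma)
    return index


# Hypothetical tokens: one per tag used in this corpus, all with the
# placeholder lemma "_" because the treebank has no lemmas.
used_tags = ["ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
             "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "VERB", "X"]
index = lemmas_by_upos(("_", tag) for tag in used_tags)

print(sorted(UD_UPOS - set(index)))  # ['SYM']
print(index["PRON"] & index["DET"])  # {'_'}
```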
Nominal Features
- Gender
- Fem
- ADJ: kʷaɖaːɖat
- AUX: tiki, taki, tidʔi, tindi, tirib, tiːha, tki
- DET: =t, ti=, t=, toː=, tuː=, oːt, eːt, toːt, teː=, tuːt
- NOUN: naː, lhaweː, na, takat, giɖʔa, sala, mʔari, ʃiha, ʔabaː, ʔaba
- NUM: gaːt
- PRON: ti=, -t, t=, ambataː, =oːki, imbateː, ombatoː
- SCONJ: =eːt, =jeːt, =t, ti=
- VERB: tindi, tini, tidi, tiːd, tiːfi, akai, ʔeːta, tikati, ʔeːti, ʔabkin
- Masc
- ADJ: nifri, sasuːbajaːb
- AUX: =wa, iki, ihi, indi, hijaː, irib, iːkti, dʔijaːb, idi, ini
- DET: i=, oː=, w=, uː=, oːn, uːn, j=, =b, eːn, aː=
- INTJ: jhaː
- NOUN: tak, doːr, mhiːn, jhaːm, mijʔat, dhaj, kʷiːkʷʔaːj, gaw, haˈwaːd, ʔar
- PART: ʔaʃaj
- PRON: wi=, baruːk, umbaruːk, i=, w=, baroːk, baruː, baraː, barhi, barijoːk
- SCONJ: =eːb, =jeːb, =b, ji=
- VERB: indi, ini, jʔi, iːfi, iːbri, ikati, jʔiːni, id, isni, iːkti
- Number
- Coll
- NOUN: dhaj, ʤaːntaːji, waːw
- Plur
- ADJ: raw, dabaloːjaː, naʃʃalama, daːwliloːja
- ADP: =eːb, =eː, =jeː, =jeːb, =eːt, =jeːt
- AUX: =a, nijad, ijajna, =jaː, eːdna, idʔana, ijadna, iːdna, ndi, ikatin
- DET: eːn, aː=, aːn, eː=, eːt, teː=, j=, taː=, aːt, -aː
- NOUN: ʔar, jam, kam, nda, iːjʔaː, wari, ʔarit, far, ʔaraw, fatiːra
- NUM: gali
- PART: malia
- PRON: =eː, =oːn, =uːn, hinin, =aː, =hoːn, =eːk, =jeː, =aːn, =oːkna
- SCONJ: ji=
- VERB: eːn, jʔeːn, akeːna, eːdna, hiːna, jʔeːna, ijadna, imoːra~rimna, iːfiina, jʔaʃiʃn
- X: malaːʔik
- Sing
- ADP: =iːb, =iː, =iːt, =i
- AUX: =u, =i, andi, aki, =wa, iki, ihi, indi, adi, =ju
- DET: oː=, w=, uː=, oːn, uːn, toː=, tuː=, beːn, oːt, toːt
- INTJ: jhaː
- NOUN: mbʔi
- PART: =hoːk
- PRON: =heːb, =i, =oː, ani, =hoːk, =oːk, aneːb, =joː, wi=, =ji
- SCONJ: =jeːb, =eːb
- VERB: indi, ini, jʔi, tindi, iːfi, iːbri, ani, ikati, rhan, tini
- Case
- Abl
- ADP: =iː, hoːj, hoːs, =eː, =jeː
- PRON: hoːka, =eːsoː, =iːsi, =iːsiː, =iːsoːn, =jeːsoːn, =siːsi, =iːsoː, =iːsoːk, =saj
- Acc
- DET: oː=, oːn, =b, toː=, eːn, eː=, oːt, eːt, toːt, teː=
- PART: =hoːk
- PRON: =oː, =eː, =heːb, =hoːk, =i, =oːk, =oːn, aneːb, =joː, =hoːn
- SCONJ: =b
- Com
- ADP: haːj
- Dat
- PRON: hoː
- Dis
- ADP: =ka
- Gen
- ADP: =i, =ji, =eː, =jeː
- DET: oːnaːj, baliːnaːj
- PRON: =iji, =ijoː, =ihi, =iheː, =ji, =eːhi, =eːheː, =hi, =ijoːk, =joː
- Loc
- ADP: hoːj, =iːb, =eːb, =jeːb
- Nom
- DET: uː=, uːn, aː=, aːn, tuː=, beːn, tuːt, taː=, uːt, aːt
- PRON: ani, =i, =uːn, =aː, hinin, baruːk, umbaruːk, =aːn, =uːk, =ji
- Voc
- ADP: =i, =aj
- INTJ: jhaː
- Definite
- Def
- DET: i=, oː=, w=, ti=, uː=, t=, j=, toː=, aː=, tuː=
- PRON: ti=, i=, w=, t=
- SCONJ: ti=
- Ind
- DET: =t, =b
- PRON: -t
- SCONJ: =t
Degree and Polarity
- Degree
- Cmp
- ADP: =ka
- Dim
- ADJ: daːwliloːja
- DET: =t
- NOUN: liːgamanaː
- VERB: daːwliiseːtiːt, liːgami
- Equ
- ADP: =eːt, =jeːt
- Polarity
- Neg
- AUX: arib, idi, irib, aki, tirib
- PART: baː=, ka=, ki=
- VERB: aakaj, akaːj, ibarin, idʔiːn, tkatiːm, tʔam
Verbal Features
- Aspect
- Aor
- AUX: iːdna, iːha, iːdn, tiːha, tiːjha
- VERB: iːfi, iːbri, tiːd, tiːfi, iːd, iːkti, iːfiina, ʔeːti, iːdn, hiːn
- Imp
- AUX: andi, dannʔi, nijad, ijajna, indi, eːdna, ijadna, iniːn, aniːw, idʔana
- PART: ka=, ki=
- VERB: indi, tindi, ikati, eːdna, manri, eːbi, dannʔi, ijadna, imoːra~rimna, tikati
- Perf
- AUX: akajeː, aki, iki, ihi, adi, irib, iːkti, ani, arib, ini
- VERB: eːn, ini, ani, tini, tidi, id, isni, tifirʔa, akan, idi
- Mood
- Imp
- PART: baː=
- VERB: hi
- Opt
- AUX: ba=, bi=, idi, idiː, kaːj, tdiː
- PART: bi=, ba=
- VERB: aakaj, ibarin, akaːj, amraːj, idʔiːn, nhaː, tdiːn, thiːw, tkatiːm, tkaːj
- Pot
- AUX: ʔeːnaj
- Voice
- Mid
- VERB: tifirʔa, ameːsa~sʔeː, asʔa, ikan, ʔagar, agam, akan, eːstʔi, agar, akteːn
- Pass
- VERB: agam
Pronouns, Determiners, Quantifiers
- PronType
- Dem
- DET: oːn, uːn, eːn, aːn, beːn, oːt, eːt, toːt, tuːt, beːt
- PRON: beːn, oːn
- Int
- ADV: naːnaːt
- PRON: naːn, naː, ʔaːw
- Rel
- PRON: ti=, wi=, i=, w=, t=, ji=, wʔi=
- SCONJ: =eːb, =eː, =i, =jeːb, =jeː, =eːt, =jeːt, =ji, =t, =b
- Poss
- Yes
- PRON: =i, =eː, =oː, =oːk, =oːn, =hi, =joː, =uːn, =aː, =eːk
- Reflex
- Yes
- PRON: kna, nafs
- Person
- 1
- AUX: =u, =i, =a, ʔeːnaj
- PRON: =heːb, =i, ani, =oːn, aneːb, =oː, =eː, =ji, =uːn, hinin
- VERB: ʔagar, dannʔi, hagil, hagit, haːra~riw, manri, ʔanbiːk
- 2
- AUX: =wa
- PART: =hoːk
- PRON: =hoːk, =oːk, =eːk, baruːk, umbaruːk, =uːk, hoːk, baroːk, =oːkna, barijoːk
- VERB: danri, fanrʔi, ʃanbiːb
- 3
- AUX: =u, =i, =a, =ju, =jaː, =ji, dannʔi
- PRON: =oː, =eː, =joː, =aː, =hi, =jeː, =ijoː, =ihi, =uː, baruː
- VERB: eːn, manri, ʔeːja, ʔeːta, dannʔi, ʃanbiːb, ʔeːti, fanrʔi, danri, eːjawna
- Polite
- Form
- PRON: =uːn, =aːn, =hoːn, =oːn
Other Features
- Deixis
- Prox
- DET: oːn, uːn, eːn, aːn, oːt, eːt, toːt, tuːt, oːnaːj, uːt
- PRON: oːn
- Remt
- DET: beːn, beːt
- PRON: beːn
- ExtPos
- ADV
- NOUN: doːr
- INTJ
- INTJ: iraːnaj
- PART: jaː
- PRON
- ADP: hoːj
- SCONJ
- SCONJ: =eːb, =eːt, =jeːb, =jeːt
- Foreign
- Yes
- ADJ: faːdi
- ADP: sabbiː
- ADV: ʤuwwaːb, bass
- CCONJ: laːkin
- NOUN: maːl, hajawaːna, ʔarabijaːj, balad, bani, gahwat, saːri, xawaːʤa, dawaːhi, mawaːʔid
- PART: bass, nuːn
- PROPN: muːna, ʔaːdam
- VERB: aːmaliːna, gʷadaaman, gʷadaamani, gʷadaameːtiːt, gʷadaami, gʷadaamiːni, ikfilna, massalan, rajjhamjaːni, xatarta
- X: alla, issalaːt, siddik, ssalaːm, wa, ʕalehu, XXX, ajwa, alamdulilla, astoːfor
- PartType
- Int
- ADV: kak, han, naːnaːt
- CCONJ: han
- VERB: keːjaːn
- VerbClass
- 1
- AUX: iki, aki, nʔati, tiki, ani, anʔa, idʔana, ini, inʔakʷn, taki
- VERB: eːn, indi, ini, iːfi, tindi, diːtiːt, akajeː, difeː, iːbri, akeːna
- VERB-Conv: diːt, diːtiːt, hajiːd
- 2
- VERB: jʔeːtiːt, jʔi, jʔeːn, hiːreːreː, ɖaːbeː, ʔiːbaːbeː, ɖaːbeːti, rhan, jʔiːni, jʔan
- VERB-Conv: hajiːd
- VerbType
- Cop
- AUX: =u, =i, =a, =wa, =ju, =jaː, =ji
- Light
- AUX: diːtiːt, ani, indi, ini
- VERB: diːtiːt, aka, ikatina, isiːsjoːdin
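The feature listings above group example word forms by Feature=Value and UPOS. The following minimal sketch shows how such an index could be built from (form, UPOS, FEATS) rows. Several details are assumptions: the cap of ten examples per UPOS, the first-seen ordering, and the helper names. The rows are also simplified. Each form/feature pairing appears in the listings above, but the complete FEATS strings are invented for illustration.

```python
from collections import defaultdict


def parse_feats(feats: str):
    """Split a CoNLL-U FEATS value such as 'A=x|B=y,z' into (name, value) pairs."""
    if feats == "_":
        return []
    pairs = []
    for item in feats.split("|"):
        name, _, values = item.partition("=")
        pairs.extend((name, value) for value in values.split(","))
    return pairs


def feature_index(rows, limit=10):
    """Index distinct forms by 'Feature=Value' and UPOS, keeping first-seen order."""
    index = defaultdict(lambda: defaultdict(list))
    for form, upos, feats in rows:
        for name, value in parse_feats(feats):
            forms = index[f"{name}={value}"][upos]
            if form not in forms and len(forms) < limit:
                forms.append(form)
    return index


# Simplified, partly invented rows (form, UPOS, FEATS).
rows = [
    ("tak", "NOUN", "Gender=Masc"),
    ("doːr", "NOUN", "Gender=Masc"),
    ("takat", "NOUN", "Gender=Fem"),
    ("=t", "DET", "Definite=Ind|Gender=Fem"),
]
index = feature_index(rows)
print(index["Gender=Masc"]["NOUN"])  # ['tak', 'doːr']
print(index["Definite=Ind"]["DET"])  # ['=t']
```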
Syntax
Auxiliary Verbs and Copula
- This corpus uses 1 lemma as copula (cop). Example: _.
- This corpus uses 1 lemma as auxiliary (aux). Example: _.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
- VERB--NOUN (320)
- VERB--PRON (14)
- VERB--PRON-Acc (1)
- VERB--PRON-Nom (71)
- obj
- VERB--NOUN (545)
- VERB--PRON (111)
- VERB--PRON-Acc (111)
- VERB--PRON-Nom (8)
- iobj
- VERB--PRON (26)
- VERB--PRON-Acc (2)
- VERB--PRON-Dat (3)
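The counts above pair a relation with a VERB--child label. Children that carry a Case feature appear with the case value appended, as in VERB--PRON-Nom, and in the listing only pronouns do so. The sketch below reproduces that labelling scheme as inferred from the listing. It is an assumption, not the script that produced these numbers. It counts only the bare nsubj, obj and iobj labels, so subtypes such as nsubj:outer are ignored. The four-token sentence is hypothetical.

```python
from collections import Counter

ARG_RELS = {"nsubj", "obj", "iobj"}


def case_of(feats: str):
    """Return the Case value from a FEATS string, or None if absent."""
    for item in feats.split("|"):
        name, _, value = item.partition("=")
        if name == "Case":
            return value
    return None


def argument_counts(sentence):
    """Count VERB -> NOUN/PRON argument relations in one tree.

    sentence: list of (id, upos, feats, head, deprel) tuples.
    """
    upos_by_id = {tok[0]: tok[1] for tok in sentence}
    counts = Counter()
    for _id, upos, feats, head, deprel in sentence:
        if deprel not in ARG_RELS or upos not in {"NOUN", "PRON"}:
            continue
        if upos_by_id.get(head) != "VERB":
            continue  # only verbal heads are considered
        label = f"VERB--{upos}"
        case = case_of(feats)
        if case:
            label += f"-{case}"
        counts[(deprel, label)] += 1
    return counts


# Hypothetical tree: token 3 is the VERB root.
sentence = [
    (1, "PRON", "Case=Nom", 3, "nsubj"),
    (2, "NOUN", "_", 3, "obj"),
    (3, "VERB", "_", 0, "root"),
    (4, "PRON", "_", 3, "iobj"),
]
print(argument_counts(sentence)[("nsubj", "VERB--PRON-Nom")])  # 1
```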
Verbs with Reflexive Core Objects
- This corpus contains 1 lemma that occurs at least once with a reflexive core object (obj or iobj). Example: _ kna (verb lemma _, reflexive object kna)
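One way to compute the reflexive-object statistic is to find obj or iobj dependents whose FEATS include Reflex=Yes and report the lemma of their VERB head. The sketch below does this. The helper name and the two-token tree are hypothetical. kna is one of the Reflex=Yes pronouns listed in the features section, and the verb lemma is _ because lemmas are not available.

```python
def reflexive_objects(sentence):
    """Return (verb lemma, object form) pairs whose obj/iobj carries Reflex=Yes."""
    by_id = {tok["id"]: tok for tok in sentence}
    pairs = []
    for tok in sentence:
        if tok["deprel"] not in {"obj", "iobj"}:
            continue
        if "Reflex=Yes" not in tok["feats"].split("|"):
            continue
        head = by_id.get(tok["head"])
        if head is not None and head["upos"] == "VERB":
            pairs.append((head["lemma"], tok["form"]))
    return pairs


# Hypothetical two-token tree; "VERB_FORM" is a placeholder, not Beja data.
sentence = [
    {"id": 1, "form": "kna", "lemma": "_", "upos": "PRON",
     "feats": "Reflex=Yes", "head": 2, "deprel": "obj"},
    {"id": 2, "form": "VERB_FORM", "lemma": "_", "upos": "VERB",
     "feats": "_", "head": 0, "deprel": "root"},
]
print(reflexive_objects(sentence))  # [('_', 'kna')]
```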
Relations Overview
- This corpus uses 16 relation subtypes: acl:relcl, compound:svc, dep:comp, dep:conj, dep:flat, dep:redup, dislocated:mod, dislocated:obj, dislocated:subj, nmod:poss, nsubj:outer, obl:arg, obl:mod, parataxis:insert, parataxis:mod, parataxis:parenth
- The following main type is never used alone; it is always subtyped: compound
- The following 10 relation types are not used in this corpus at all: csubj, expl, mark, clf, case, conj, flat, list, orphan, goeswith
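The three summary lines above follow from comparing the labels actually used with the 37 universal UD relation types. The following minimal sketch makes that comparison. The function name is hypothetical, and USED is copied from the Relations list near the top of this page.

```python
UD_RELATIONS = {
    "acl", "advcl", "advmod", "amod", "appos", "aux", "case", "cc", "ccomp",
    "clf", "compound", "conj", "cop", "csubj", "dep", "det", "discourse",
    "dislocated", "expl", "fixed", "flat", "goeswith", "iobj", "list", "mark",
    "nmod", "nsubj", "nummod", "obj", "obl", "orphan", "parataxis", "punct",
    "reparandum", "root", "vocative", "xcomp",
}

# Relation labels copied from the Relations list of this treebank.
USED = """acl acl:relcl advcl advmod amod appos aux cc ccomp compound:svc cop
dep dep:comp dep:conj dep:flat dep:redup det discourse dislocated
dislocated:mod dislocated:obj dislocated:subj fixed iobj nmod nmod:poss
nsubj nsubj:outer nummod obj obl obl:arg obl:mod parataxis parataxis:insert
parataxis:mod parataxis:parenth punct reparandum root vocative xcomp""".split()


def relation_overview(used):
    """Return (subtypes, base types used only with a subtype, unused base types)."""
    subtypes = sorted(r for r in used if ":" in r)
    alone = {r for r in used if ":" not in r}
    subtyped_bases = {r.split(":")[0] for r in subtypes}
    always_subtyped = sorted(subtyped_bases - alone)
    unused = sorted(UD_RELATIONS - alone - subtyped_bases)
    return subtypes, always_subtyped, unused


subtypes, always_subtyped, unused = relation_overview(USED)
print(len(subtypes), always_subtyped)  # 16 ['compound']
print(unused)  # ['case', 'clf', 'conj', 'csubj', 'expl', 'flat', 'goeswith', 'list', 'mark', 'orphan']
```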