
This page pertains to UD version 2.

UD Egyptian UJaen

Language: Egyptian (code: egy)
Family: Afro-Asiatic

This treebank has been part of Universal Dependencies since the UD v2.14 release.

The following people have contributed to making this treebank part of UD: Roberto Antonio Díaz Hernández.

Repository: UD_Egyptian-UJaen
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.15

License: CC BY-SA 4.0

Genre: bible, fiction, nonfiction, government

Questions, comments? General annotation questions (either Egyptian-specific or cross-linguistic) can be raised in the main UD issue tracker. Bugs in this treebank can be reported in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [radiaz (æt) ujaen • es]. Development of the treebank happens directly in the UD repository, so bug fixes may be submitted as pull requests against the dev branch.

Annotation Source
Lemmas annotated manually
UPOS annotated manually, natively in UD style
XPOS not available
Features annotated manually, natively in UD style
Relations annotated manually, natively in UD style

Description

Egyptian-UJaen is the first dependency treebank created for the morphosyntactic annotation of pre-Coptic Egyptian. In its current state (UD v2.15), it consists of 1,573 sentences and 14,650 words manually annotated from texts written in Old Egyptian, mainly from the Pyramid Texts.
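Sentence and word counts like those above can be recomputed from the treebank files. The following is a minimal sketch, assuming the data is in the standard CoNLL-U format used by UD releases (ten tab-separated columns per token line, `#` comment lines, blank lines between sentences). The function name and the sample data are hypothetical; the forms `w1` to `w4` are placeholders, not Egyptian words.

```python
def count_sentences_and_words(conllu_text):
    """Count sentences and syntactic words in CoNLL-U formatted text.

    Only lines with a plain integer ID count as words. Multiword-token
    ranges (e.g. "1-2") and empty nodes (e.g. "1.1") are skipped.
    """
    sentences = 0
    words = 0
    in_sentence = False
    for line in conllu_text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            in_sentence = False
            continue
        if line.startswith("#"):
            # Sentence-level comment (sent_id, text, ...).
            continue
        if not in_sentence:
            sentences += 1
            in_sentence = True
        token_id = line.split("\t")[0]
        if token_id.isdigit():
            words += 1
    return sentences, words


# Made-up sample: two sentences, the second with a multiword-token range.
SAMPLE = (
    "# sent_id = demo-1\n"
    "1\tw1\tw1\tVERB\t_\t_\t0\troot\t_\t_\n"
    "2\tw2\tw2\tNOUN\t_\t_\t1\tnsubj\t_\t_\n"
    "\n"
    "# sent_id = demo-2\n"
    "1-2\tw3w4\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "1\tw3\tw3\tNOUN\t_\t_\t0\troot\t_\t_\n"
    "2\tw4\tw4\tPRON\t_\t_\t1\tnmod\t_\t_\n"
    "\n"
)

print(count_sentences_and_words(SAMPLE))
```

To run it on the real data, read a `.conllu` file from the repository into a string and pass it to the function. The file names are not given on this page, so check the repository for them.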

The Egyptian-UJaen treebank (henceforth EUJA treebank) contains a corpus of Egyptian texts manually annotated at the University of Jaén following the Tübingen transcription system (see below). It aims to contribute to the Universal Dependencies (UD) project and to the PARSEME corpora of multiword expressions, so that Egyptian morphosyntactic features can be compared with those of other languages. The EUJA treebank was first released in UD 2.14 with 5,515 words and 707 sentences. That release contained Old Egyptian multiword expressions and sentences from the Pyramid Texts (see the list of sources below). The systematic annotation of the Pyramid Texts begins with EUJA-44. The Unas Pyramid Texts were annotated for UD release 2.15.

The treebank will contain texts from various historical stages: Old Egyptian, Middle Egyptian, Late Egyptian and Demotic. For an overall description of these linguistic stages, see the Language Page for Egyptian and the bibliography below.

Acknowledgments

I thank Agata Savary (UniDive/PARSEME), Daniel Zeman (UniDive/UD) and Marco Carlo Passarotti (CIRCSE) for introducing me to computational linguistics.

Statistics of UD Egyptian UJaen

POS Tags

ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB, X
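The tag inventory above can be compared with the tags that actually occur in a file. The sketch below tallies the UPOS column (the fourth column in CoNLL-U) and reports any tag not in the list on this page. The helper names and the sample lines are hypothetical, and the `SYM` token is included only to show what an unlisted tag looks like.

```python
from collections import Counter

# UPOS tags listed on this page for the treebank.
LISTED_UPOS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN",
    "NUM", "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "VERB", "X",
}


def upos_counts(conllu_text):
    """Tally UPOS tags over syntactic words in CoNLL-U text."""
    counts = Counter()
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip malformed lines, multiword-token ranges and empty nodes.
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        counts[cols[3]] += 1
    return counts


def unlisted_tags(counts):
    """Return the tags in `counts` that are not in LISTED_UPOS, sorted."""
    return sorted(tag for tag in counts if tag not in LISTED_UPOS)


# Made-up sample with placeholder forms.
UPOS_SAMPLE = (
    "# sent_id = demo-1\n"
    "1\tw1\tw1\tVERB\t_\t_\t0\troot\t_\t_\n"
    "2\tw2\tw2\tNOUN\t_\t_\t1\tnsubj\t_\t_\n"
    "3\tw3\tw3\tNOUN\t_\t_\t1\tobj\t_\t_\n"
    "4\tw4\tw4\tSYM\t_\t_\t1\tdep\t_\t_\n"
    "\n"
)

tag_counts = upos_counts(UPOS_SAMPLE)
print(tag_counts.most_common())
print(unlisted_tags(tag_counts))
```

The same approach works for the relation inventory by reading the eighth column (DEPREL) instead of the fourth.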

Features

AdvType, Aspect, Case, Definite, Foreign, Gender, Mood, Number, PartType, Person, Polarity, Poss, PronType, Reflex, Tense, Typo, VerbForm, VerbType, Voice

Relations

acl, acl:relcl, advcl, advmod, amod, appos, aux, case, cc, ccomp, compound, conj, cop, csubj, csubj:outer, csubj:pass, dep, det, discourse, dislocated, expl, expl:pv, fixed, flat, iobj, mark, nmod, nsubj, nsubj:outer, nsubj:pass, nummod, obj, obl, obl:agent, orphan, parataxis, punct, root, vocative, xcomp

Tokenization and Word Segmentation

Morphology

Tags