UD for Assyrian
Tokenization and Word Segmentation
- Words are generally delimited by whitespace or punctuation.
- Punctuation marks are written attached to the neighboring word, but we always tokenize them as separate tokens.
- Coordinating conjunctions and prepositions are separated from the words that follow them in a sentence.
- Multiword tokens are not used in Assyrian.
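The whitespace and punctuation rules above can be sketched as a small tokenizer. This is only an illustration under stated assumptions, not the preprocessing actually used for the treebank: the names `is_punct` and `tokenize` are hypothetical, "punctuation" is assumed to mean any character in a Unicode `P*` category, and the separation of conjunctions and prepositions from the following word is not attempted because it requires lexical knowledge.

```python
import unicodedata


def is_punct(ch):
    """Assumption: treat every Unicode P* category character as punctuation."""
    return unicodedata.category(ch).startswith("P")


def tokenize(text):
    """Split on whitespace, then detach each punctuation mark as its own token."""
    tokens = []
    for chunk in text.split():
        current = ""
        for ch in chunk:
            if is_punct(ch):
                # Close the word collected so far, then emit the mark separately.
                if current:
                    tokens.append(current)
                    current = ""
                tokens.append(ch)
            else:
                # Letters and combining vowel marks stay inside the word.
                current += ch
        if current:
            tokens.append(current)
    return tokens
```

Because combining marks are not in a `P*` category, vowel points stay attached to their base letters. Word-internal punctuation would also be split off by this sketch, which may or may not match the treebank's practice.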
Morphology
Tags
- The tags NUM, INTJ, SYM, and X are not used.
- In this Assyrian treebank, 13 universal tags have been used.
- Certain words like “ܒܜ , ܩܡ ,ܟܝ , ܚܘܫ , ܫܘܩ” are tagged as PART and have a dependency relation as aux. Together with the following VERB, these words change the verb tense.
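The tag inventory described above can be written down as a set difference and used for a quick sanity check. The sketch below assumes the standard 17-tag UD inventory; the helper name `unexpected_tags` is hypothetical and not part of any UD tool.

```python
# The 17 universal POS tags defined by UD.
UPOS_ALL = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}

# Tags listed on this page as not used in the Assyrian treebank.
UPOS_UNUSED = {"NUM", "INTJ", "SYM", "X"}

# The remaining 13 tags.
UPOS_USED = UPOS_ALL - UPOS_UNUSED


def unexpected_tags(tags):
    """Return, in sorted order, the tags that fall outside the 13 used ones."""
    return sorted(set(tags) - UPOS_USED)
```

For example, `unexpected_tags(["NOUN", "NUM", "VERB", "X"])` reports `NUM` and `X`.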
Features
- Nominal words (NOUN, PROPN and PRON) have an inherent Gender feature with values Masc or Fem.
- Number has 2 possible values: Sing and Plur.
- Verbs inflect for Gender, Number, Person, Tense and Mood. There are two types of verb forms (VerbForm): the finite verb (Fin) and the participle (Part).
- Voice is marked only for passive forms; we do not use Voice=Act.
- PronType is used with pronouns (PRON) and determiners (DET).
- The Poss feature marks possessive personal pronouns.
- Person is a lexical feature of personal pronouns (PRON) and has three values, 1, 2 and 3.
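The value restrictions listed above can be checked mechanically on a CoNLL-U FEATS string. The following is a minimal sketch, assuming single-valued features written as `Name=Value` pairs joined by `|`; the names `ALLOWED`, `parse_feats` and `check_feats` are hypothetical, and features whose values are not enumerated on this page (for example Tense and Mood) are deliberately left unchecked.

```python
# Values stated on this page; other features are not constrained here.
ALLOWED = {
    "Gender": {"Masc", "Fem"},
    "Number": {"Sing", "Plur"},
    "Person": {"1", "2", "3"},
    "VerbForm": {"Fin", "Part"},
}


def parse_feats(feats):
    """Parse a FEATS string such as 'Gender=Masc|Number=Sing' into a dict."""
    if not feats or feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def check_feats(feats):
    """Return a list of human-readable problems found in one FEATS string."""
    problems = []
    for name, value in parse_feats(feats).items():
        if name == "Voice" and value == "Act":
            # Voice is marked only for passive forms.
            problems.append("Voice=Act is not used")
        elif name in ALLOWED and value not in ALLOWED[name]:
            problems.append(f"{name}={value} is not a listed value")
    return problems
```

Comma-separated multi-values, which CoNLL-U permits, are not handled by this sketch and would be reported as unlisted values.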
Syntax
Core Arguments, Oblique Arguments and Adjuncts
- There are no morphological cases.
- Nominal subject is a bare noun phrase. It typically precedes the verb. Its Person, Number and Gender are cross-referenced by the verb.
- Nominal object is a bare noun phrase or a prepositional phrase. It typically follows the verb.
Non-verbal Clauses
- The copula verb ܗܵܘܹܐ (be) is used in
Relations Overview
- The following relation subtypes are used in Assyrian:
- The following relation types are not used in Assyrian at all: expl, dislocated, vocative, appos, nummod, clf, fixed, flat, list, orphan, goeswith, reparandum, dep
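The list of unused relations lends itself to a simple consistency check over the DEPREL column. This sketch assumes that a subtype such as `flat:name` counts as an instance of its base relation; the helper names `base_relation` and `unused_relations_found` are hypothetical.

```python
# Relation types listed on this page as not used in Assyrian.
UNUSED_RELATIONS = {
    "expl", "dislocated", "vocative", "appos", "nummod", "clf", "fixed",
    "flat", "list", "orphan", "goeswith", "reparandum", "dep",
}


def base_relation(deprel):
    """Strip an optional subtype: 'flat:name' becomes 'flat'."""
    return deprel.split(":", 1)[0]


def unused_relations_found(deprels):
    """Return, sorted, the supposedly unused base relations that occur."""
    return sorted({base_relation(d) for d in deprels} & UNUSED_RELATIONS)
```

An empty result means none of the listed relations occurs in the input.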
Treebanks
There is only one Assyrian UD treebank at present: