UD for Ancient Hebrew 
Tokenization and Word Segmentation
No tokens in the Ancient Hebrew treebank should contain whitespace. The following are made into separate tokens:
- Prepositions (ב, כ, ל, מ)
- Possessive and object pronouns (ני, נו, ו, ם, …)
  - The corresponding independent pronoun is used as the lemma
- Conjunction ו
- Definite determiner ה
  - This includes ה when it appears as demonstrative agreement on adjectives, participles, and demonstrative determiners
  - Since the text includes vowel diacritics, ה is included as a token even when it does not correspond to a full character in the consonantal text
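The no-whitespace rule stated at the top of this section can be checked mechanically. The following is a minimal sketch, assuming the treebank is stored in the standard ten-column, tab-separated CoNLL-U format; the function name and the sample data are hypothetical and do not come from the treebank.

```python
def forms_with_whitespace(conllu_text):
    """Return (line_number, form) pairs whose FORM column contains whitespace."""
    offenders = []
    for line_number, line in enumerate(conllu_text.splitlines(), start=1):
        # Skip sentence separators (blank lines) and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.split("\t")
        if len(columns) != 10:
            continue  # not a well-formed token line; out of scope for this check
        form = columns[1]  # FORM is the second CoNLL-U column
        if any(ch.isspace() for ch in form):
            offenders.append((line_number, form))
    return offenders


# Hypothetical sample: the second token breaks the rule.
sample = (
    "# sent_id = hypothetical-1\n"
    "1\tab\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "2\tc d\t_\t_\t_\t_\t_\t_\t_\t_\n"
)
print(forms_with_whitespace(sample))  # [(3, 'c d')]
```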
Morphology
Tags
All tags are used except X and SYM. AUX is used for the copula היה.
The positive and negative existentials ישׁ and אין are tagged VERB.
Participles are tagged either VERB or NOUN. A participle that has arguments or obliques is tagged VERB; one that has none is tagged NOUN if it participates in a nominal phrase.
The correspondences between XPOS (BHSA feature sp) and UPOS are listed below.
Rows prefixed with → indicate that the part of speech tag’s correspondence is conditioned by the BHSA lexical set feature.
BHSA tag | BHSA name | UPOS | Notes |
---|---|---|---|
adjv | adjective | ADJ | Also NOUN in certain situations |
→ ordn | ordinal | NUM | |
advb | adverb | ADV | |
art | article | DET | Also SCONJ in certain situations |
conj | conjunction | CCONJ or SCONJ | |
inrg | interrogative particle | ADV or PART | |
intj | interjection | INTJ | |
nega | negative particle | ADV | |
nmpr | proper noun | PROPN | |
prde | demonstrative pronoun | PRON | |
prep | preposition | ADP | |
prin | interrogative pronoun | PRON | |
prn | pronominal suffix | PRON | Tag added in conversion process |
prps | personal pronoun | PRON | |
punct | punctuation | PUNCT | Tag added in conversion process |
subs | noun | NOUN | |
→ card | cardinal | NUM | |
→ nmcp | copulative noun | VERB | These are the existential verbs |
→ padv | potential adverb | ADV | Sometimes |
→ ppre | potential preposition | ADP | Sometimes |
verb | verb | VERB | Also NOUN in certain situations |
→ vbcp | copulative verb | AUX | |
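The table above can be read as a lookup from the BHSA part of speech, optionally refined by the lexical set, to one or more candidate UPOS tags. The sketch below is only an illustration of that reading, not the conversion actually used for the treebank. All names are hypothetical. It assumes that each → row refines the nearest unprefixed row above it, and that "Sometimes" means the tag otherwise falls back to NOUN. The "in certain situations" alternatives in the Notes column depend on context and are not modeled.

```python
# Candidate UPOS tags keyed by the BHSA part-of-speech value (unprefixed rows).
SP_TO_UPOS = {
    "adjv": ("ADJ",),
    "advb": ("ADV",),
    "art": ("DET",),
    "conj": ("CCONJ", "SCONJ"),
    "inrg": ("ADV", "PART"),
    "intj": ("INTJ",),
    "nega": ("ADV",),
    "nmpr": ("PROPN",),
    "prde": ("PRON",),
    "prep": ("ADP",),
    "prin": ("PRON",),
    "prn": ("PRON",),
    "prps": ("PRON",),
    "punct": ("PUNCT",),
    "subs": ("NOUN",),
    "verb": ("VERB",),
}

# Refinements keyed by (part of speech, lexical set), taken from the → rows.
# ASSUMPTION: "Sometimes" is modeled as "this tag or the parent row's NOUN".
LEXICAL_SET_OVERRIDES = {
    ("adjv", "ordn"): ("NUM",),
    ("subs", "card"): ("NUM",),
    ("subs", "nmcp"): ("VERB",),
    ("subs", "padv"): ("ADV", "NOUN"),
    ("subs", "ppre"): ("ADP", "NOUN"),
    ("verb", "vbcp"): ("AUX",),
}


def candidate_upos(sp, lexical_set=None):
    """Return the UPOS candidates for a BHSA part of speech and lexical set."""
    override = LEXICAL_SET_OVERRIDES.get((sp, lexical_set))
    if override is not None:
        return override
    return SP_TO_UPOS[sp]


print(candidate_upos("subs", "card"))  # ('NUM',)
print(candidate_upos("conj"))          # ('CCONJ', 'SCONJ')
```

Returning a tuple of candidates rather than a single tag keeps the ambiguous rows (conj, inrg, padv, ppre) visible to the caller, which would still need contextual rules to pick one.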
Features
The following universal features are in use:
- Aspect: AUX, VERB
- ExtPos
- Gender: ADJ, AUX, NOUN, PRON, VERB
- Mood: VERB
- Number: NOUN, PRON, ADJ, VERB
- NumType: NUM
- Person: AUX, VERB
- Polarity: ADV
- PronType: PRON
- Tense: VERB
- VerbForm: NOUN, VERB
- Voice: VERB
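As a rough illustration, the list above can be turned into a lookup that flags a feature appearing on a part of speech not listed for it. This is a sketch under assumptions: the source does not say the list is exhaustive, the function name is hypothetical, and ExtPos is left unrestricted because no parts of speech are listed for it. The input is a CoNLL-U FEATS string; the feature values in the usage lines are standard UD values chosen for illustration only.

```python
# Parts of speech listed for each universal feature in the list above.
# ExtPos is omitted on purpose: no parts of speech are listed for it.
DOCUMENTED_UPOS = {
    "Aspect": {"AUX", "VERB"},
    "Gender": {"ADJ", "AUX", "NOUN", "PRON", "VERB"},
    "Mood": {"VERB"},
    "Number": {"NOUN", "PRON", "ADJ", "VERB"},
    "NumType": {"NUM"},
    "Person": {"AUX", "VERB"},
    "Polarity": {"ADV"},
    "PronType": {"PRON"},
    "Tense": {"VERB"},
    "VerbForm": {"NOUN", "VERB"},
    "Voice": {"VERB"},
}


def undocumented_features(upos, feats):
    """Return feature names in a FEATS string that are not listed for this UPOS."""
    if feats == "_":  # CoNLL-U uses "_" for an empty FEATS column
        return []
    names = [pair.split("=", 1)[0] for pair in feats.split("|")]
    return [
        name
        for name in names
        if name in DOCUMENTED_UPOS and upos not in DOCUMENTED_UPOS[name]
    ]


print(undocumented_features("NOUN", "Gender=Masc|Number=Sing"))  # []
print(undocumented_features("ADV", "Polarity=Neg|Tense=Past"))   # ['Tense']
```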
The following language-specific features are in use:
The following MISC features are present:
- Gloss
  - Currently taken from the BHSA gloss feature
- LId[SDBH]
  - ID of the (mostly) disambiguated word in MARBLE’s Semantic Dictionary of Biblical Hebrew
- LId[Strongs]
  - Number of the word root in Strong’s Concordance
  - The values come from the MACULA corpus, which assigns non-numeric values to function words (conjunctions, prefixed prepositions) which are not listed in the original concordance
- Ref
  - The values are formatted as BOOK_CHAPTER.VERSE, e.g. GEN_1.1
  - The book abbreviations are listed below
- Ref[BHSA]
  - The numeric ID of the word in the BHSA corpus
- Ref[MACULA]
  - The ID of the word in the MACULA corpus
- SpaceAfter=No
- Translit
  - The value of this field follows the Library of Congress romanization standard
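The MISC features above can be read out of a token's MISC column with a few lines of code. This is a minimal sketch, assuming the standard CoNLL-U convention of `|`-separated `Key=Value` pairs with `_` for an empty column; the function name is hypothetical, and the `Ref[BHSA]` value in the usage line is made up for illustration.

```python
def parse_misc(misc):
    """Split a CoNLL-U MISC column into a dict of attribute names and values."""
    if misc == "_":  # "_" marks an empty MISC column
        return {}
    attributes = {}
    for item in misc.split("|"):
        # Split on the first "=" only, so values may themselves contain "=".
        key, _, value = item.partition("=")
        attributes[key] = value
    return attributes


# Hypothetical MISC value; bracketed keys such as Ref[BHSA] need no special handling.
misc = parse_misc("Ref=GEN_1.1|Ref[BHSA]=1|SpaceAfter=No")
print(misc["Ref"])         # GEN_1.1
print(misc["SpaceAfter"])  # No
```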
Book | Ref Abbreviation |
---|---|
Genesis | GEN |
Exodus | EXO |
Leviticus | LEV |
Numbers | NUM |
Deuteronomy | DEU |
Joshua | JOS |
Judges | JDG |
Ruth | RUT |
1 Samuel | 1SA |
2 Samuel | 2SA |
1 Kings | 1KI |
2 Kings | 2KI |
1 Chronicles | 1CH |
2 Chronicles | 2CH |
Ezra | EZR |
Nehemiah | NEH |
Esther | EST |
Job | JOB |
Psalms | PSA |
Proverbs | PRO |
Ecclesiastes | ECC |
Song of Songs | SOS |
Isaiah | ISA |
Jeremiah | JER |
Lamentations | LAM |
Ezekiel | EZK |
Daniel | DAN |
Hosea | HOS |
Joel | JOL |
Amos | AMO |
Obadiah | OBA |
Jonah | JON |
Micah | MIC |
Nahum | NAM |
Habakkuk | HAB |
Zephaniah | ZEP |
Haggai | HAG |
Zechariah | ZEC |
Malachi | MAL |
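A Ref value in the BOOK_CHAPTER.VERSE format can be split into its parts as sketched below. The function and pattern names are hypothetical. The pattern assumes that every book abbreviation is exactly three characters drawn from uppercase letters and digits, which holds for the table above but is not stated as a rule in the source.

```python
import re

# ASSUMPTION: three-character book code, then "_", chapter, ".", verse.
REF_PATTERN = re.compile(r"([0-9A-Z]{3})_(\d+)\.(\d+)")


def parse_ref(ref):
    """Split a Ref value such as 'GEN_1.1' into (book, chapter, verse)."""
    match = REF_PATTERN.fullmatch(ref)
    if match is None:
        raise ValueError(f"unrecognized Ref value: {ref!r}")
    book, chapter, verse = match.groups()
    return book, int(chapter), int(verse)


print(parse_ref("GEN_1.1"))   # ('GEN', 1, 1)
print(parse_ref("1SA_17.4"))  # ('1SA', 17, 4)
```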
Syntax
The subtypes acl:relcl, compound:smixut, nmod:poss, nsubj:outer, and obl:npmod are used. The relation compound is currently unused.
The relations iobj and clf are unused.
The relations list, goeswith, reparandum, and dep are currently unused, but may be used in future.
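The relations described above as unused can be checked against the DEPREL values of a parsed file. The following is a hedged sketch with hypothetical names. It compares whole labels, so the subtype compound:smixut is not mistaken for the unused bare relation compound.

```python
# Relations the text above describes as unused (bare labels only).
UNUSED_RELATIONS = {
    "compound", "iobj", "clf", "list", "goeswith", "reparandum", "dep",
}


def unexpected_relations(deprels):
    """Return the sorted set of DEPREL values that the guidelines list as unused."""
    # Exact string comparison: "compound:smixut" is a different label from "compound".
    return sorted({rel for rel in deprels if rel in UNUSED_RELATIONS})


# Hypothetical DEPREL values collected from one sentence.
print(unexpected_relations(["nsubj", "compound:smixut", "iobj", "root", "iobj"]))
# ['iobj']
```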
Treebanks
There is 1 Ancient Hebrew UD treebank: