UD for Balochi
Balochi is a dialect continuum, and until recently the language was rarely written. As a result, no written standard seems to have enough prestige to prevail across the vast territory where Balochi is spoken. One standard was proposed by Jahani (2019); note, however, that the texts in our data follow a different orthography. Unless specified otherwise, our data represent southeastern (Pakistani) Balochi.
Tokenization and Word Segmentation
- Punctuation is written adjacent to a neighboring word, as in English and many other languages. In the annotation, punctuation symbols are separate tokens; that is, words are delimited by spaces or punctuation.
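The splitting rule above can be sketched in a few lines of Python. This is a minimal illustration, not the tokenizer used to produce the data: the function names are hypothetical, and the sketch assumes that every character in a Unicode punctuation category is a token of its own, so a sequence such as `...` would come out as three tokens. It is meant for text in the Arabic script; the apostrophes used in the Latin transliteration on this page would also be split off as punctuation.

```python
import unicodedata


def is_punctuation(ch: str) -> bool:
    # Unicode general categories starting with "P" cover punctuation,
    # including Arabic-script marks such as the Arabic question mark.
    return unicodedata.category(ch).startswith("P")


def tokenize(text: str) -> list[str]:
    """Split on whitespace, then detach every punctuation character."""
    tokens = []
    for chunk in text.split():
        word = ""
        for ch in chunk:
            if is_punctuation(ch):
                if word:
                    tokens.append(word)
                    word = ""
                tokens.append(ch)
            else:
                word += ch
        if word:
            tokens.append(word)
    return tokens


# The final full stop becomes a token of its own.
print(tokenize("man kandagā āñ."))
```

Vowel signs in the Arabic script are combining marks rather than punctuation, so they stay attached to their word under this rule.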
Morphology
Some morphemes that are treated as bound morphemes in the literature are in fact written as separate words under the orthography employed in our data. This applies both to the case suffixes of nouns and to the conjugation suffixes of verbs.
Nominal Features
There is no grammatically relevant gender.
According to Jahani and Korn (2009, p. 652), Balochi nouns have five cases, termed direct, oblique, object, genitive, and vocative. We map the first three to different names in the UD terminology. Under the orthography used in our data, case suffixes are written as separate words, so they are analyzed as postpositions (ADP). The Case feature is annotated on the postposition that contributes the case, not on the noun itself.
The direct case roughly corresponds to the nominative in UD. It is the simple uninflected noun. It is used for the subject of all intransitive verbs and of transitive verbs in the present and future. Balochi has split ergativity, like Indian languages: transitive verbs in the past tense follow ergative alignment, meaning that the object rather than the subject appears in the direct case there. In our orthography, the direct case means that there is no postposition, hence Case=Nom is not annotated anywhere.
The oblique case is marked by the postposition ءَ ‘a. It is used as the accusative in the present and future, and as the ergative subject in the past tense. It is also placed between the noun and some more specific postpositions. We annotate it Case=Erg.
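The interaction of tense and transitivity described for the direct and oblique cases can be summarized as a small decision function. This is only a sketch of the alignment pattern stated above; the function name and the tense labels are assumptions made for the illustration, and the sketch ignores the object case, the genitive, and the vocative.

```python
from typing import Optional, Tuple

TENSES = {"present", "future", "past"}  # illustrative labels, not UD values


def core_case_marking(transitive: bool, tense: str) -> Tuple[str, Optional[str]]:
    """Return the (subject, object) case names for a clause.

    The names "direct" and "oblique" follow the terminology of the text;
    None means that the clause has no object.
    """
    if tense not in TENSES:
        raise ValueError(f"unknown tense label: {tense!r}")
    if not transitive:
        # Intransitive subjects are always in the direct case.
        return ("direct", None)
    if tense == "past":
        # Ergative alignment: oblique subject, direct object.
        return ("oblique", "direct")
    # Present and future: direct subject, oblique object.
    return ("direct", "oblique")
```

For example, `core_case_marking(True, "past")` yields an oblique subject and a direct object, which is the ergative pattern.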
In ditransitive clauses, the object case marks the recipient, i.e., it corresponds to the dative (Case=Dat). Its morpheme is را rā and it may be combined with the oblique morpheme ءَ ‘a.
The genitive morpheme is ءِ ‘i and it is also written separately. We annotate it Case=Gen.
The vocative is unmarked in the singular.
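Since the case markers are separate tokens, the mapping from marker to Case value described above can be written as a lookup table. The following is a rough sketch under that assumption: the names `CASE_POSTPOSITIONS` and `tag_case_markers` are hypothetical, and a lookup by word form alone is naive, because it cannot tell a case marker from a homograph.

```python
# Case postpositions as written in the data. The keys use escapes so that
# the hamza + vowel-sign sequences stay unambiguous.
CASE_POSTPOSITIONS = {
    "\u0621\u064e": "Case=Erg",  # hamza + fatha, 'a (oblique)
    "\u0631\u0627": "Case=Dat",  # reh + alef, rā (object)
    "\u0621\u0650": "Case=Gen",  # hamza + kasra, 'i (genitive)
}


def tag_case_markers(forms):
    """Return (form, upos, feats) triples; "_" marks values left open."""
    rows = []
    for form in forms:
        feats = CASE_POSTPOSITIONS.get(form)
        if feats is None:
            # Not a known case marker: this sketch makes no claim about it.
            rows.append((form, "_", "_"))
        else:
            rows.append((form, "ADP", feats))
    return rows


# A host word followed by the oblique and the object marker.
for row in tag_case_markers(["آئی", "\u0621\u064e", "\u0631\u0627"]):
    print(row)
```

The host word keeps no Case feature of its own, in line with the decision to annotate Case on the postposition.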
Nominal words can appear in two Number forms, singular (Sing) and plural (Plur). However, the number inflection is fused with the case inflection, that is, plural marking would be part of the case postposition, and there is no number distinction in the direct (nominative) case.
Pronouns
Personal pronouns exist in the first and the second person. Distal demonstratives are used instead of personal pronouns in the third person. The reflexive pronoun is wat.
من | man | I | Number=Sing|Person=1|PronType=Prs |
تو | tō | you | Number=Sing|Person=2|PronType=Prs |
ما | mā | we | Number=Plur|Person=1|PronType=Prs |
شُما | šumā | you | Number=Plur|Person=2|PronType=Prs |
آ | ā | he/she/it/they/that/those | Deixis=Remt|PronType=Dem |
اے | ē | this/these | Deixis=Prox|PronType=Dem |
وت | wat | oneself | PronType=Prs|Reflex=Yes |
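The feature strings in the last column use the usual UD notation: `Name=Value` pairs joined by `|` and listed in alphabetical order of the feature name. A minimal sketch for reading and writing such strings is shown below; the helper names `parse_feats` and `format_feats` are hypothetical and not part of any UD tool.

```python
def parse_feats(feats: str) -> dict:
    """Turn 'Number=Sing|Person=1' into {'Number': 'Sing', 'Person': '1'}."""
    if feats == "_":
        # The underscore stands for an empty feature set.
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def format_feats(feats: dict) -> str:
    """Serialize features sorted by name, ignoring letter case."""
    if not feats:
        return "_"
    ordered = sorted(feats.items(), key=lambda item: item[0].lower())
    return "|".join(f"{name}={value}" for name, value in ordered)


first_singular = parse_feats("Number=Sing|Person=1|PronType=Prs")
print(first_singular["Person"])
print(format_feats({"Reflex": "Yes", "PronType": "Prs"}))
```

Sorting on output means that features can be added in any order without breaking the expected serialization.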
Possessive pronouns are generally the personal pronouns with the genitive suffix -ī; but unlike nouns, they are written together with the suffix as one word. We treat them as distinct lexemes with their own lemma and with the Poss=Yes feature, not as genitive forms of the non-possessive personal pronouns.
The forms ending in -ī are used attributively; there are also predicative forms with an additional -g.
TODO: Is there a feature we can use to distinguish the predicative form?
At least for the distal demonstrative, the genitive/possessive form is also used before the oblique case marker. For example, آئی ءَ ā’ī ‘a (áiá) is the oblique/accusative/ergative case; آئی ءَ را ā’ī ‘a rā (áiárá) is the object/dative case of “that”.
منی | manī | my | Number=Sing|Person=1|Poss=Yes|PronType=Prs |
منیگ | manīg | mine | Number=Sing|Person=1|Poss=Yes|PronType=Prs |
تئی | ta'ī | your | Number=Sing|Person=2|Poss=Yes|PronType=Prs |
تئیگ | ta'īg | yours | Number=Sing|Person=2|Poss=Yes|PronType=Prs |
آئی | ā'ī | his/her/its/their/of that/of those | Deixis=Remt|Poss=Yes|PronType=Dem |
وتی | watī | one's own | Poss=Yes|PronType=Prs|Reflex=Yes |
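In the tables on this page, the features of each possessive pronoun are those of the corresponding personal, demonstrative, or reflexive pronoun plus Poss=Yes. The relationship can be sketched as follows; the function name `possessive_feats` is hypothetical, and the sketch says nothing about lemmas or about the attributive versus predicative distinction, which remains open.

```python
def possessive_feats(base_feats: str) -> str:
    """Add Poss=Yes to a feature string and keep the features sorted."""
    feats = dict(pair.split("=", 1) for pair in base_feats.split("|"))
    feats["Poss"] = "Yes"
    return "|".join(f"{name}={value}" for name, value in sorted(feats.items()))


# man 'I' -> manī 'my'
print(possessive_feats("Number=Sing|Person=1|PronType=Prs"))
# wat 'oneself' -> watī "one's own"
print(possessive_feats("PronType=Prs|Reflex=Yes"))
```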
As in English, the reciprocal pronoun is composed of two words. TODO: What to do with it? Do the words also occur independently?
یکے دومی | yakē dōmī | each other | PronType=Rcp |
Interrogative pronouns.
کئیا | ka'iyā | who | PronType=Int |
چے | čē | what | PronType=Int |
Possibly an indefinite article: یے yē (at least that was the gloss assigned by the Balochi teacher). It would apply to the preceding nominal.
Verbal Features
Under the orthography of our data, the conjugation suffixes of Balochi verbs are written as separate words and are therefore analyzed as auxiliaries that follow the main verb. Example:
کندگ | kandag | to laugh / laughing | VerbForm=Inf |
من کندگا آں | man kandagā āñ | I am laughing | The auxiliary is Number=Sing|Person=1. The main verb should probably be some non-finite form, maybe a participle. And maybe a progressive participle (I saw a similar form glossed as "progressive aspect of verb".) |
تو کندگا ئے | tō kandagā ay | you (Sing) are laughing | |
آ کندگا اِنت | ā kandagā int | he is laughing | |
ما کندگا اِن | mā kandagā in | we are laughing | |
شُما کندگا اِت | šumā kandagā it | you (Plur) are laughing | |
آ کندگا اَنت | ā kandagā ant | they are laughing | |
تو کند اِت | tō kand it | you (Sing) laughed | |
تو کند اِتگ | tō kand itag | you (Sing) have laughed | |
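The present-tense rows of the table can be collected into a small paradigm in transliteration. This is a sketch based only on the six forms listed above. The names `PRESENT_AUX` and `progressive` are hypothetical, and the feature values for cells other than the first person singular are inferred from the glosses rather than stated in the table.

```python
# Present-tense auxiliaries from the table, keyed by (Person, Number).
PRESENT_AUX = {
    (1, "Sing"): "āñ",
    (2, "Sing"): "ay",
    (3, "Sing"): "int",
    (1, "Plur"): "in",
    (2, "Plur"): "it",
    (3, "Plur"): "ant",
}


def progressive(verb_form: str, person: int, number: str):
    """Return the two-word phrase and the features of the auxiliary."""
    aux = PRESENT_AUX[(person, number)]
    feats = f"Number={number}|Person={person}"
    return f"{verb_form} {aux}", feats


for (person, number) in PRESENT_AUX:
    print(progressive("kandagā", person, number))
```

Person and number sit on the auxiliary, while the main verb form stays the same in all six cells.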
The infinitive can be used and inflected as a verbal noun.
The auxiliary forms are similar or identical to the copula that is used with non-verbal predicates.
Present-tense auxiliaries from the data:
- اِنت int (3rd person Sing; this form is also the copula “is”)
- اَنت ant (3rd person Plur; but the context could also have been past rather than present)
Past-tense auxiliaries from the data:
- اِت it (3rd person Sing)
- کت kt (3rd person Sing)
- کُت kut (is it the same as kt or not?)
- جت jt (perhaps this is not an auxiliary? It was in the causative sentence.)
Adpositions
Besides the three case morphemes that were mentioned above (and that are considered mere suffixes by some authors), there are also “ordinary” adpositions.
چہ | čh | from (Case=Abl)? |
Coordinating Conjunctions
ءُ | 'u | and |
یا | yā | or |
بلے | balē | but |
Particles
The negative particle is نہ nah.
There is a particle at the end of the sentence: ئے ē (perhaps a question particle? Or is it in fact the auxiliary ay, 2nd person singular, see above?). There is also ئِے yie (the same or not?).
Tags
*
Instruction: Specify any unused tags. Explain what words are tagged as PART. Describe how the AUX-VERB and DET-PRON distinctions are drawn, and specify whether there are (de)verbal forms tagged as ADJ, ADV or NOUN. Include links to language-specific tag definitions if any.
Features
*
Instruction: Describe inherent and inflectional features for major word classes (at least NOUN and VERB). Describe other noteworthy features. Include links to language-specific feature definitions if any.
Syntax
*
Instruction: Give criteria for identifying core arguments (subjects and objects), and describe the range of copula constructions in nonverbal clauses. List all subtype relations used. Include links to language-specific relations definitions if any.
Treebanks
There are N Balochi UD treebanks:
Instruction: Treebank-specific pages are generated automatically from the README file in the treebank repository and from the data in the latest release. Link to the respective *-index.html page in the treebanks folder, using the language code and the treebank code in the file name.
References
- Carina Jahani, Agnes Korn (2009). Balochi. In: Gernot Windfuhr (ed.) The Iranian Languages, pp. 634–692. Routledge, Oxon, UK. ISBN 978-0-415-62235-6.
- Carina Jahani (2019). A Grammar of Modern Standard Balochi. In: Acta Universitatis Upsaliensis. Studia Iranica Upsaliensia 36. 292 pp. Uppsala, Sweden. ISSN 1100-326X. ISBN 978-91-513-0820-3.