UD for Pashto
For the principles of transliteration of Pashto used in UD see the Transliteration page.
Tokenization and Word Segmentation
- Words are delimited by whitespace and punctuation. No exceptions have been established yet.
- Multiword tokens are used in several cases:
  - Partially separable verbs (see below) in forms where the two parts are joined (بندوم bandawë́m “I close” → بند band + کوم kawë́m).
  - Separated verb prefixes joined with the negative particle (به ونه خورم bë wënë́ xorëm “I will not eat” → و wë + نه në).
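
The two multiword-token cases above can be sketched as CoNLL-U lines. This is a minimal illustration, not a reference implementation: the helper name `mwt_lines` and the token positions are assumptions made for the example, and every column other than ID and FORM is left as `_` because no annotation is attempted here.

```python
def mwt_lines(start_id, surface, parts):
    """Return CoNLL-U lines for one multiword token and its syntactic words.

    Only ID and FORM are filled in; the remaining eight columns are "_"
    (hypothetical helper, written for illustration only).
    """
    end_id = start_id + len(parts) - 1
    blank = ["_"] * 8  # LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC
    # Range line for the surface token, e.g. "1-2".
    lines = ["\t".join([f"{start_id}-{end_id}", surface] + blank)]
    # One line per syntactic word covered by the range.
    for offset, form in enumerate(parts):
        lines.append("\t".join([str(start_id + offset), form] + blank))
    return lines


# بندوم bandawë́m -> بند band + کوم kawë́m, assumed to be the first token.
close_lines = mwt_lines(1, "بندوم", ["بند", "کوم"])

# In به ونه خورم the second surface token ونه splits into و wë + نه në.
neg_lines = mwt_lines(2, "ونه", ["و", "نه"])

print("\n".join(close_lines + neg_lines))
```

The range line carries the surface form as written, while the following lines carry the syntactic words that receive lemmas, tags and dependency relations.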
Morphology
Lemmatization
- The direct (nominative) singular form, masculine where applicable, is used as the lemma for nominals.
- The infinitive (in the direct case) is used as the lemma for verbs, with these exceptions:
  - The verb “to be”, used as copula and as auxiliary verb for perfect tenses, has no infinitive, so the first person singular present form یم yëm “I am” is used instead.
  - The existential word شته šta “there is / there are”, tagged as VERB, has only one form, so that form is used as the lemma.
Tags
- Pashto uses all 17 universal tags.
- Several words are tagged as PART:
  - Negative particles نه në and مه ma “no/not” and the affirmative particle هو ho “yes”.
  - Modal particles: باید bấyad (necessity “must / have to”).
- Only the verb یم yëm “to be” (used as copula and as auxiliary verb for perfect tenses) and some uses of the verb کېدل kedë́l “to become” (when used as auxiliary verb for the passive voice or potential forms) are tagged as AUX. Modal verbs, such as غوښتل ġox̌të́l “to want to”, are tagged as VERB (in addition, some modal meanings are expressed by modal particles or by the potential verb forms mentioned above). Light verbs are also tagged as VERB, with the nominal part depending on them via the `xcomp` relation.
- Pronouns that depend on nouns and behave like their attributes (some of them even agree with the nouns in number, case and gender) are tagged as DET; possessive pronouns are mostly treated this way. Pronouns used independently (often as arguments of a verb) are tagged as PRON. This includes, for example, relative pronouns and non-possessive personal pronouns. Various interrogative, demonstrative or indefinite pronouns can be tagged either way depending on the context. Enclitic weak pronouns, used as unstressed core arguments or as alternative possessive pronouns, are always tagged as PRON, even when marking possession, because they do not stand in an attributive relation to the noun: they follow the noun, while all other pronouns tagged as DET precede it. Directional pronouns, which merge with several prepositions, are split off from them and tagged as PRON.
- Deverbal forms such as infinitives and participles (which sometimes behave like verbal nouns and verbal adjectives) are usually tagged as VERB. Only nouns and adjectives that were originally derived from infinitives or participles, but are now clearly perceived as nouns and adjectives, are tagged as NOUN and ADJ.
- Adjectives and adverbs derived from adjectives often have the same form. Their tagging as ADJ or ADV depends on the context.
Features
- There are three VerbForm values used in Pashto: finite `Fin`, infinitive `Inf` and participle `Part`.
- An important feature of Pashto verbs is Aspect, which strictly divides verb forms into imperfect `Imp` and perfect `Perf`.
- Finite verb forms inflect for the Mood feature, with the values indicative `Ind`, imperative `Imp`, subjunctive `Sub` and potential `Pot`.
- Finite verb forms conjugate for the Tense feature, taking the value present `Pres`, past `Past` or future `Fut`.
- Finite verb forms also inflect for the Person feature with the usual three values; Person is also an inherent feature of many personal pronouns.
- Generally, all inflectional parts of speech inflect for Number, taking the value singular `Sing` or plural `Plur`. Infinitives always behave like plurals, so they are not tagged for number. Non-past finite verb forms do not have the Number feature in the third person, since the forms for both numbers are always identical.
- Nominals, participles and infinitives inflect for the Case feature. Five cases are tagged in UD for Pashto: direct (marked as nominative) `Nom`, oblique (marked as accusative) `Acc`, locative `Loc`, ablative `Abl` and vocative `Voc`.
- Nouns and some pronouns have an inherent Gender feature with two possible values: masculine `Masc` and feminine `Fem`. Adjectives, other pronouns and participles inflect for gender in order to agree with nouns. Finite verb forms inflect for gender only in the third person of past forms (both singular and plural).
- Verbs and pronouns use the Variant feature to distinguish between various forms, denoted long `Long`, short `Short`, weak `Weak` and directional `Dir`.
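
The feature inventory listed above can be written down as a small lookup table and used to spot unexpected values in a CoNLL-U FEATS string. This is only a sketch: the names `PASHTO_FEATURES` and `unknown_features` are hypothetical, the Person values `1`, `2`, `3` are assumed from the usual UD convention, and the table covers only the features named in this section, so a full annotation scheme may well use others.

```python
# Feature values named in this section (Person values assumed to be 1, 2, 3).
PASHTO_FEATURES = {
    "VerbForm": {"Fin", "Inf", "Part"},
    "Aspect": {"Imp", "Perf"},
    "Mood": {"Ind", "Imp", "Sub", "Pot"},
    "Tense": {"Pres", "Past", "Fut"},
    "Person": {"1", "2", "3"},
    "Number": {"Sing", "Plur"},
    "Case": {"Nom", "Acc", "Loc", "Abl", "Voc"},
    "Gender": {"Masc", "Fem"},
    "Variant": {"Long", "Short", "Weak", "Dir"},
}


def unknown_features(feats):
    """Return the Feature=Value pairs of a FEATS string not listed above."""
    if feats == "_":  # "_" means the token has no features
        return []
    problems = []
    for pair in feats.split("|"):
        name, _, value = pair.partition("=")
        if value not in PASHTO_FEATURES.get(name, set()):
            problems.append(pair)
    return problems


print(unknown_features("Case=Nom|Gender=Masc|Number=Sing"))
print(unknown_features("Case=Gen|Number=Sing"))
```

A pair is reported either because the feature name is not in the table or because the value is not one of those listed for it.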
Syntax
Core Arguments
- Core arguments (subjects and objects) in Pashto are mostly nouns, pronouns or infinitives (behaving like verbal nouns) in either the bare direct case `Nom` or the bare oblique case `Acc`. The choice of case depends on the inherent transitivity of the verb and on the voice and tense used (a phenomenon called split ergativity, which also occurs in other Indo-Iranian languages).
- The only argument (i.e. the subject) of an intransitive verb, or of a transitive verb used in the passive voice, is always in the direct case `Nom`.
- For transitive verbs in the active voice, the following holds:
  - The subject in non-past tenses is always in the direct case `Nom`.
  - The subject in past tenses is always in the oblique case `Acc`.
  - The object in all tenses is almost always in the direct case `Nom`.
    - The only exceptions are personal pronouns of the first and second person singular in non-past tenses, where the oblique forms ما mâ “me”, تا tâ “you” are used instead of the direct زۀ zë “I”, تۀ të “you”.
  - The subject usually comes before the object regardless of case.
  - The verb agrees in `Person` and `Number` (and also `Gender` in the third person of past tenses) with the subject in non-past tenses and with the object in past tenses.

| | intransitive: nsubj | transitive: nsubj | transitive: obj |
|---|---|---|---|
| non-past | direct | direct | direct * |
| past | direct | oblique | direct |

Note (*): the exceptions are ما mâ and تا tâ (see above).
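
The case assignment and agreement pattern summarized above can be expressed as a short function. This is a sketch under stated assumptions: the function name `core_argument_marking` and its parameters are hypothetical, passive clauses are expected to be passed in as intransitive, and agreement is returned only for transitive active clauses because that is the only case the rules above spell out.

```python
def core_argument_marking(transitive, past, obj_is_1sg_or_2sg_pronoun=False):
    """Return the expected Case values for nsubj and obj.

    For transitive active clauses the result also names the argument
    the verb agrees with (hypothetical helper, for illustration only).
    """
    if not transitive:
        # Intransitive verbs (and passives) have a single direct-case argument.
        return {"nsubj": "Nom"}
    marking = {
        "nsubj": "Acc" if past else "Nom",         # oblique subject in past tenses
        "obj": "Nom",                              # object is normally direct
        "agrees_with": "obj" if past else "nsubj", # agreement switches in the past
    }
    if obj_is_1sg_or_2sg_pronoun and not past:
        # Exception: oblique forms ما mâ / تا tâ as objects in non-past tenses.
        marking["obj"] = "Acc"
    return marking


print(core_argument_marking(transitive=True, past=False))
print(core_argument_marking(transitive=True, past=True))
```

Keeping the pronoun exception as an explicit flag mirrors the table footnote: it only changes the object case in non-past tenses and leaves the past-tense pattern untouched.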
Non-verbal Clauses
- The copula verb یم yëm “be” (or more precisely “I am”) is used in most non-verbal clauses.
- The nominal part of the predicate is usually in the direct case `Nom`.
- In existential clauses the word شته šta “there is / there are” is used, but it is tagged as VERB.
Relations Overview
- The following relation subtypes are used in Pashto:
- aux:pass
- nsubj:pass
- compound:prt
- orphan:nsubjobj
Treebanks
There is currently no Pashto UD treebank.