UD for Gwich’in
Tokenization and Word Segmentation
- In general, words are delimited by whitespace characters.
Morphology
Tags
- Gwich’in uses 15 universal POS categories, including:
- ADJ tsal, k’eejit, tthoo, etc.
- ADP hàa, shàa, zhìt, shizhìt, etc.
- ADV gwintsàl, oondàk, yeendàk, dą̀į’, etc.
- CCONJ hàa, ts’à’, gàa
- DET yagha’, aii, zhìk, izhìk, etc.
- INTJ Àąhą’, etc.
- NOUN vadzaih, shidink’ee, shee’ii, kwaiitryah, etc.
- NUM ch’ìhłak, dǫǫ, etc.
- PART nąįį, kwàa
- PRON shįį, jidìi, etc.
- PROPN Vashrąįį K’ǫǫ, Tsiigehtchic, Dr. Burke, etc.
- PUNCT ., ?, ,, etc.
- SCONJ jì, geh’àn, aii, etc.
- VERB ihtsàl, giyahąąh’yaa, vintł’ihihtin, etc.
- X is used for three words whose category could not be determined at this time.
- ADJ is used for adjective enclitics following NOUN.
- Verbal adjectives are tagged as VERB.
- Free-standing personal pronouns are rare in the data; there is only one instance (shįį).
- Interrogative pronouns are tagged PRON.
- All words that take verbal inflection are tagged as VERB at this time. This includes words whose English equivalents would be tagged AUX.
- PART is used for words that denote negation (kwàa) or plurality (nąįį).
- AUX and SYM are not used at this time.
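The tag inventory above can be turned into a simple consistency check. The following is a minimal sketch, not part of the treebank tooling: the names USED_UPOS and unexpected_upos are hypothetical, and the set is copied from the list in this section.

```python
# UPOS tags that this documentation says are used in the Gwich'in data.
# AUX and SYM are deliberately absent because they are not used at this time.
USED_UPOS = {
    "ADJ", "ADP", "ADV", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "VERB", "X",
}


def unexpected_upos(tags):
    """Return the tags, in input order, that fall outside USED_UPOS."""
    return [tag for tag in tags if tag not in USED_UPOS]


# Hypothetical tag sequence: AUX is flagged because it is not used.
print(unexpected_upos(["NOUN", "AUX", "VERB"]))
```

A check like this would only flag departures from the conventions described here; it says nothing about whether an individual tag was chosen correctly.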
Features
- Gwich’in is a polysynthetic, primarily prefixing, head-final language.
- Features are not provided at this time. Morpheme information can be found for most words in the MISC column (Gloss=, MSeg=, MGloss=). However, language-specific features may be needed for the following:
- Each VERB takes one of four voice/valency markers known as its classifier. The proposed language-specific feature Classifier would have the values Ø, L, Ł, or D, consistent with the Athabascan literature. Note that this use of the term classifier differs from its use in descriptions of Chinese.
- Gwich’in and other Athabascan languages also have a noun classification system. However, this information is encoded on the VERB. These so-called classificatory VERB stems belong to one of nine stem classes. The proposed language-specific feature StemClass would have the values 1 through 9 for the following stem classes defined by Bushey (2021): stick-like, food, cloth-like, plural or rope-like, animate or dead, open container, sack of granules or enclosed/sheathed, compact, and deteriorated.
- Pronominal subject marking is mandatory on the VERB. Third-person pronominal object marking is mandatory (but assumed for other persons) when no free noun phrase is expressed.
- Inalienable nouns (NOUN) like body-parts and kinship terms must be marked with a possessor. Alienable nouns (NOUN) may or may not be marked with a possessor.
- It has been suggested that Northern Athabascan languages allow noun incorporation, but the status of noun incorporation in Gwich’in is unclear at this time. The word meaning palm in possessed form appears to be (at least partially) incorporated to denote pronominal recipients in ditransitive verbs. This is how the equivalent of English pronominal indirect objects is expressed. Whether other noun objects can be incorporated warrants further exploration.
- Postpositions (ADP) must be inflected for person and number with human objects. If there is an areal object, the postposition takes the prefix gwï-. The language-specific feature-value Areal=Yes may be needed for this.
- Spatial adverbs known as directionals encode three pieces of information: the direction itself (up/down, upstream/downstream, inland/upland, etc.), the orientation (allative, punctual, areal, ablative), and distance from the speaker (short, long, very long, straight). Language-specific features may be needed for these.
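Since features are not provided, morpheme information has to be read from the MISC column. The sketch below shows one way to do that, assuming the standard CoNLL-U convention that MISC holds Key=Value pairs separated by "|". The function names are hypothetical, the hyphen as morpheme delimiter inside MSeg and MGloss is an assumption, and the sample string uses placeholder values rather than real Gwich’in data.

```python
def parse_misc(misc):
    """Split a CoNLL-U MISC field into a dict of key/value pairs."""
    if misc == "_":  # "_" marks an empty field in CoNLL-U
        return {}
    pairs = {}
    for item in misc.split("|"):
        key, sep, value = item.partition("=")
        pairs[key] = value if sep else None
    return pairs


def align_morphemes(misc, delimiter="-"):
    """Pair each MSeg segment with its MGloss gloss.

    The delimiter is an assumption; adjust it to the treebank's convention.
    """
    fields = parse_misc(misc)
    if not fields.get("MSeg") or not fields.get("MGloss"):
        return []
    segments = fields["MSeg"].split(delimiter)
    glosses = fields["MGloss"].split(delimiter)
    if len(segments) != len(glosses):
        raise ValueError("MSeg and MGloss have different lengths")
    return list(zip(segments, glosses))


# Placeholder MISC value, not taken from the treebank.
SAMPLE_MISC = "Gloss=GLOSS|MSeg=m1-m2-stem|MGloss=G1-G2-STEM"
print(align_morphemes(SAMPLE_MISC))
```

Aligned pairs of this kind could serve as a starting point for deriving proposed features such as Classifier or StemClass, though any such mapping would depend on glossing conventions not described here.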
Syntax
- Because Gwich’in is polysynthetic, ideas that many languages express with multiple words are often expressed in Gwich’in with one.
- The verb occupies the final position in a clause, and the verb stem occupies the final position in a verb word. Verb prefixes provide other relevant information and, together with the verb stem, form a verb word and often a complete sentence.
- Whether null arguments exist in Athabascan is debated. Pronominal subject and object markers on the verb are often treated as core arguments, with free noun phrases treated as adjuncts. However, this treebank follows current UD guidelines, which treat words as the basic elements connected by dependency relations. Free-standing nominals (subject, object, and indirect object) are therefore treated as core arguments when expressed and take the dependency relations nsubj, obj, and iobj, respectively. When free noun phrases do not occur, nominal information is expressed solely on the verb and can only be annotated in the FEATS or MISC columns.
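To illustrate the treatment of free-standing nominals, here is a minimal sketch that collects nsubj, obj, and iobj dependents per head from a CoNLL-U sentence. It assumes the standard ten tab-separated columns; the function name is hypothetical, and the sample sentence uses placeholder forms (w1, w2, w3) in a verb-final order rather than real Gwich’in data.

```python
CORE_RELATIONS = ("nsubj", "obj", "iobj")


def core_arguments(conllu_sentence):
    """Map each head ID to the (deprel, form) pairs of its core arguments."""
    by_head = {}
    for line in conllu_sentence.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        token_id, form, head, deprel = cols[0], cols[1], cols[6], cols[7]
        if not token_id.isdigit():
            continue  # skip multiword-token ranges and empty nodes
        if deprel in CORE_RELATIONS:
            by_head.setdefault(int(head), []).append((deprel, form))
    return by_head


# Placeholder sentence: two free nominals followed by a clause-final verb.
ROWS = [
    ["1", "w1", "_", "NOUN", "_", "_", "3", "nsubj", "_", "_"],
    ["2", "w2", "_", "NOUN", "_", "_", "3", "obj", "_", "_"],
    ["3", "w3", "_", "VERB", "_", "_", "0", "root", "_", "_"],
]
SAMPLE = "\n".join("\t".join(row) for row in ROWS)
print(core_arguments(SAMPLE))
```

A verb with no entry in the result has no free-standing core arguments, which is the case where argument information would have to come from the FEATS or MISC columns.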
Treebanks
There is one Gwich’in UD treebank: