UD for Ligurian
Tokenization and Word Segmentation
- In general, words are delimited by whitespace.
- Punctuation marks are treated as separate tokens, with a few exceptions:
- Apostrophes indicate elision, and are attached to the neighbouring word that underwent elision. Most commonly, this occurs with determiners: l’erboo = l’ erboo, ‘n’atra = ‘n’ atra.
- Numerical expressions are treated as single words, e.g. 12:45, 2.5%.
- Abbreviations are treated as single words and may include punctuation, e.g. ecc., s.r.l.
- Multi-word tokens occur in the following two cases:
- Contractions of prepositions and definite articles: inta = inte a, in scê = in sce e, pe-o = pe o, do = de o.
- Contractions of verbs with clitics: veddilo = vedde lo, dâghela = dâ ghe la, aveine = avei ne.
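
The segmentation rules above can be approximated in a few lines of Python. This is only an illustrative sketch, not the tokenizer used for the treebank: the names `tokenize`, `expand`, `ABBREVIATIONS` and `MULTIWORD` are hypothetical, and the lookup tables contain nothing beyond the single-token examples cited in this section.

```python
import re

# Abbreviations kept as single tokens (only the examples from the text).
ABBREVIATIONS = {"ecc.", "s.r.l."}

# Multi-word tokens and their syntactic words (only the examples from the text;
# "in scê" is left out because it spans whitespace).
MULTIWORD = {
    "inta": ["inte", "a"],
    "pe-o": ["pe", "o"],
    "do": ["de", "o"],
    "veddilo": ["vedde", "lo"],
    "dâghela": ["dâ", "ghe", "la"],
    "aveine": ["avei", "ne"],
}

# Digits joined by ':', '.' or ',' with an optional percent sign (12:45, 2.5%).
NUMERIC = re.compile(r"\d+(?:[:.,]\d+)*%?")
# Letters (optionally hyphenated) with an elision apostrophe on either side.
WORD = re.compile(r"[‘']?[^\W\d_]+(?:-[^\W\d_]+)*[’']?")


def tokenize(text):
    """Split on whitespace, then peel off punctuation inside each chunk."""
    tokens = []
    for chunk in text.split():
        if chunk.lower() in ABBREVIATIONS:
            tokens.append(chunk)
            continue
        pos = 0
        while pos < len(chunk):
            match = NUMERIC.match(chunk, pos) or WORD.match(chunk, pos)
            if match:
                tokens.append(match.group())
                pos = match.end()
            else:
                # Any other character is a punctuation token of its own.
                tokens.append(chunk[pos])
                pos += 1
    return tokens


def expand(token):
    """Return the syntactic words of a token (the token itself if not listed)."""
    return MULTIWORD.get(token.lower(), [token])
```

A blind table lookup such as `expand` would over-apply to homographs, so a real pipeline would need context to decide when a form like do is actually a contraction.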
Morphology
Tags
- Ligurian uses all 17 universal POS tags.
- The only word tagged as PART is the euphonic particle l’, used in the case of clitic doubling when the verb starts with a vowel: a l’ammia, o l’existe.
- Ligurian auxiliary verbs, tagged AUX, are as follows:
- Ëse and stâ, functioning as copulas: stanni ben!, Zena a l’é unna çittæ.
- Ëse and vegnî, the passive auxiliaries: i libbri en stæti traduti, a vegnià castigâ.
- Ëse and avei, the tense auxiliaries: l’ò scrito, emmo cantou.
- The modals dovei (necessitative), poei and savei (potential), voei (desiderative).
- The tag DET is used for articles (un amigo, unn’amiga, i amixi, tutti i amixi) as well as for adjectives playing the role of a determiner: demonstratives (sto libbro), exclamatives (che mâ de pê!), indefinites (un atro pâ de maneghe), interrogatives (che tipo de persoña a l’é?), negatives (nisciun aggiutto), possessives (mæ moæ), totals (tutto o mondo).
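
The closed classes described above lend themselves to a simple consistency check. The sketch below is a hypothetical helper, not part of any UD tool: `check_tags` and its input format of (form, lemma, UPOS) triples are assumptions, as is the choice of l’ as the lemma of the euphonic particle.

```python
# Lemmas that may carry the AUX tag, as listed in this section.
AUX_LEMMAS = {"ëse", "stâ", "vegnî", "avei", "dovei", "poei", "savei", "voei"}
# Assumption: the euphonic particle is lemmatized as its surface form.
PART_LEMMAS = {"l’"}


def check_tags(tokens):
    """Return (form, message) pairs for AUX or PART tokens with unlisted lemmas."""
    problems = []
    for form, lemma, upos in tokens:
        if upos == "AUX" and lemma.lower() not in AUX_LEMMAS:
            problems.append((form, "AUX lemma not in the list"))
        elif upos == "PART" and lemma not in PART_LEMMAS:
            problems.append((form, "PART lemma not in the list"))
    return problems


# Made-up input: the first two triples follow the text, the third is a deliberate error.
sample = [("l’", "l’", "PART"), ("é", "ëse", "AUX"), ("ben", "ben", "AUX")]
problems = check_tags(sample)
```

With this input, only the last triple is reported.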
Features
- NOUNs inflect for Gender (Masc or Fem) and Number (Sing or Plur).
- VERBs can inflect for Mood, Tense, Person and Gender.
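
In CoNLL-U files, features are written in the FEATS column as `Name=Value` pairs joined by `|`, with `_` for an empty set. As a minimal sketch, the noun inventory stated above can be checked as follows; `parse_feats` and `bad_noun_feats` are hypothetical helper names, and only the Gender and Number values named in this section are encoded.

```python
# Values that NOUNs may take, according to this section.
NOUN_VALUES = {"Gender": {"Masc", "Fem"}, "Number": {"Sing", "Plur"}}


def parse_feats(feats):
    """Turn a FEATS string such as 'Gender=Fem|Number=Sing' into a dict."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def bad_noun_feats(feats):
    """Return the Gender/Number features whose value is not listed for NOUNs."""
    parsed = parse_feats(feats)
    return {
        name: value
        for name, value in parsed.items()
        if name in NOUN_VALUES and value not in NOUN_VALUES[name]
    }
```

Features outside the two listed names are ignored rather than rejected, because this section does not say whether nouns can carry others.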
Syntax
- Ligurian is an SVO language, meaning that subjects (nsubj) are typically pre-verbal, while objects (obj) are usually post-verbal.
- Nominal subjects (nsubj) and direct nominal objects (obj) are bare noun phrases without adpositions.
- The iobj relation is only used for dative pronominal clitic complements: o ghe dixe, o me piaxe, etc. When the indirect object is realized as a prepositional phrase, it is labeled as obl.
- The following subtype relations are used:
- expl:pv, for expletive or pleonastic nominals used with pronominal verbs: anâsene, ësighe, etc.
- expl:impers, for impersonal verbs: se capisce, se sente unna voxe.
- acl:relcl, for relative clauses.
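
Two of the conventions above can be tested mechanically: iobj is reserved for pronominal clitics, and only the three listed subtypes are expected. The following sketch assumes rows of (form, UPOS, DEPREL) and that clitic pronouns are tagged PRON; `check_deprels` is a hypothetical name, and the sample rows are invented for illustration.

```python
# Subtyped relations listed in this section.
ALLOWED_SUBTYPES = {"expl:pv", "expl:impers", "acl:relcl"}


def check_deprels(rows):
    """Return (form, message) pairs for relations that break the conventions."""
    problems = []
    for form, upos, deprel in rows:
        if ":" in deprel and deprel not in ALLOWED_SUBTYPES:
            problems.append((form, "unlisted subtype " + deprel))
        # Assumption: dative clitics are tagged PRON.
        if deprel == "iobj" and upos != "PRON":
            problems.append((form, "iobj on a non-pronominal word"))
    return problems


# Invented rows: the last two are deliberate errors.
rows = [
    ("ghe", "PRON", "iobj"),
    ("dixe", "VERB", "root"),
    ("amigo", "NOUN", "iobj"),
    ("se", "PRON", "expl:pass"),
]
problems = check_deprels(rows)
```

The check cannot tell a dative clitic from other pronouns, so it only catches the clearest violations.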
Treebanks
There is 1 Ligurian UD treebank: