UD Skolt Sami Giellagas
Language: Skolt Sami (code: sms)
Family: Uralic
This treebank has been part of Universal Dependencies since the UD v2.5 release.
The following people have contributed to making this treebank part of UD: Jack Rueter, Markus Juutinen, Francis Tyers, Tommi A Pirinen, Mika Hämäläinen.
Repository: UD_Skolt_Sami-Giellagas
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.15
License: CC BY-SA 4.0
Genre: nonfiction, news, spoken
Questions, comments? General annotation questions (either Skolt Sami-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on Github. If you want to collaborate, please contact [rueter • jack (æt) gmail • com]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.
Annotation | Source |
---|---|
Lemmas | annotated manually in non-UD style, automatically converted to UD |
UPOS | annotated manually in non-UD style, automatically converted to UD |
XPOS | annotated manually |
Features | annotated manually in non-UD style, automatically converted to UD |
Relations | annotated manually in non-UD style, automatically converted to UD |
Description
The UD Skolt Sami Giellagas treebank is based almost entirely on spoken Skolt Sami corpora.
UD Skolt Sami is the original annotation (CoNLL-U) for texts in the Skolt Sami language. It originally consisted of twenty sentences (http://ilazki.thinkgeek.co.uk/brat/#/uralic/sms) translated by Hilkka Fofonoff from Finnish texts and annotated with UD 1. dependencies. Subsequent sentences come from the Giellagas Corpus of Spoken Saami Languages of the University of Oulu, Finland, which in part includes research materials transferred from the Institute for the Languages of Finland (Kotimaisten kielten keskus, «Kotus»).
Treebank sentences whose text id begins with [kotus-skak2010] originate from the publication Sääʹmǩiõll, äʹrbbǩiõll. Its publisher, the Institute for the Languages of Finland (Kotimaisten kielten keskus), has granted written permission to include them in the treebank. Cite the original publication when using the treebank (see the References section below).
https://github.com/rueter/erme-ud-skolt-sami
Acknowledgments
The original annotations were made by Jack Rueter at the University of Helsinki and Markus Juutinen at the Giellagas Institute (University of Oulu, Finland), with the linguistic consultation of Merja Fofonoff and Eino Koponen. They used morphological tools developed in a project funded by the Kone Foundation «Language Programme»: «Skolt Sami Revitalization through Intelligent Computer-assisted Language Learning means and the development of guidelines for transferring these methods to other threatened languages» (2015–2018). The tools used have been facilitated through the open-source Giella infrastructure at the Norwegian Arctic University in Tromsø.
Work with the Skolt Sami treebank builds upon previous experience with the UD_Erzya-JR treebank as well as growing discussions with Francis Tyers, Tommi Pirinen, Jonathan Washington, Mika Hämäläinen and Niko Partanen. Without the Skolt Sami speakers and writers themselves, however, we would be nowhere…
References
- Markus Juutinen. 2023: Koltansaamen kielikontaktit, Vähemmistökieli muuttuvassa kieliympäristössä. Oulun yliopiston tutkijakoulu; Oulun yliopisto, Humanistinen tiedekunta, Giellagas-Instituutti.
- Eino Koponen, Jouni Moshnikoff & Satu Moshnikoff. 2010: Sääʹmǩiõll, äʹrbbǩiõll. Helsinki: (Kotimaisten kielten keskus) Institute for the Languages of Finland. Dommjânnmlaž ǩiõli tuʹtǩǩeemkõõskõs. Online publications of the Institute for the Languages of Finland, 14. ISSN 1796-041X. URL: http://scripta.kotus.fi/www/verkkojulkaisut/julk14/
- Pekka Sammallahti, Jouni Moshnikoff. 1991: Suomi-Koltansaame sanakirja / Lääʹdd-sääʹm sääʹnnǩeʹrjj [Finnish-Skolt Sámi Dictionary]. Girjegiisá Oy. Ohcejohka.
- Satu da Jouni Moshnikoff, Eino Koponen, Miika Lehtinen. 2020: Sääʹmǩiõl ǩiõllvueʹppes / Koltansaamen kielioppi. Sääʹmteʹǧǧ-Saamelaiskäräjät.
Statistics of UD Skolt Sami Giellagas
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – VERB – X
Features
AdpType – AdvType – Animacy – Aspect – Case – Clitic – Connegative – Degree – Derivation – Mood – NameType – Number – Number[psor] – NumType – Person – Person[psor] – Polarity – PronType – Reflex – Tense – Typo – VerbForm – Voice
Relations
acl – acl:relcl – advcl – advcl:tcl – advmod – advmod:deg – advmod:eval – advmod:foc – advmod:lmod – advmod:mmod – advmod:neg – advmod:tmod – amod – appos – aux – aux:nec – aux:tense – case – cc – cc:preconj – ccomp – conj – cop – dep – det – discourse – dislocated – expl – fixed – flat:name – goeswith – mark – nmod – nmod:poss – nsubj – nsubj:cop – nsubj:pass – nummod – obj – obl – obl:agent – obl:lmod – obl:tmod – orphan – parataxis – punct – reparandum – root – vocative – xcomp
Tokenization and Word Segmentation
- This corpus contains 250 sentences and 2961 tokens.
- This corpus contains 678 tokens (23%) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 27 types of words that contain both letters and punctuation. Examples: tõt-i, i-ǥõl, dõõzz-e, koll’je, leäk-a, lij-a, nuʹt-i, tok-i, võʹll’ji, Haaʹlääk-a, Ij-ǥo, Jeänn’jam, aalmi-han, eʹpet-i, jeänn’jad, jiânnai-a, koozz-a, miõtt-talkknid, nåkkam-a, olgglab-a, puk-i, pâi-i, ton-õs, tõzz-e, vueʹljžiǩ-a, võʹll’jam, Šurr-a
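The tokenization figures above can be recomputed from a CoNLL-U file. The following is a minimal sketch, not the script that generated this page. It assumes that every integer-ID line counts as one token (which matches the token count only when the file has no multiword tokens), that a missing space is recorded as `SpaceAfter=No` in the MISC column, and that "letters" and "punctuation" mean the Unicode general categories `L*` and `P*`. The sample text is a toy assembled from word forms listed on this page; its sentences and annotation are invented for illustration.

```python
import unicodedata

# Toy CoNLL-U text; invented for illustration, NOT copied from the treebank.
SAMPLE = (
    "# sent_id = toy-1\n"
    "1\tTon\t_\tPRON\t_\t_\t3\tnsubj:cop\t_\t_\n"
    "2\tleäk-a\t_\tAUX\t_\t_\t3\tcop\t_\t_\n"
    "3\tpõõrtâst\t_\tNOUN\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "4\t?\t_\tPUNCT\t_\t_\t3\tpunct\t_\t_\n"
    "\n"
    "# sent_id = toy-2\n"
    "1\tSon\t_\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tlij\t_\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t_\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
)


def read_sentences(conllu_text):
    """Yield each sentence as a list of 10-column rows (integer IDs only)."""
    sentence = []
    for line in conllu_text.splitlines():
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            # Skip multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 1.1).
            if len(cols) == 10 and cols[0].isdigit():
                sentence.append(cols)
    if sentence:
        yield sentence


def has_letter_and_punct(form):
    """True if the form mixes Unicode letters (L*) and punctuation (P*)."""
    cats = [unicodedata.category(ch) for ch in form]
    return any(c.startswith("L") for c in cats) and any(c.startswith("P") for c in cats)


def tokenization_stats(conllu_text):
    sentences = tokens = no_space = 0
    mixed = set()
    for sentence in read_sentences(conllu_text):
        sentences += 1
        for cols in sentence:
            tokens += 1
            if "SpaceAfter=No" in cols[9].split("|"):
                no_space += 1
            if has_letter_and_punct(cols[1]):
                mixed.add(cols[1])
    return {
        "sentences": sentences,
        "tokens": tokens,
        "no_space_after": no_space,
        "mixed_types": sorted(mixed),
    }


print(tokenization_stats(SAMPLE))
```

The 23% quoted above is the ratio 678 / 2961 rounded to a whole percent.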
Morphology
Tags
- This corpus uses 16 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB, X
- This corpus does not use the following tags: SYM
- This corpus contains 7 word types tagged as particles (PART): deʹbe, gõs, ni, tâma, veʹt, äʹn, še
- This corpus contains 20 lemmas tagged as pronouns (PRON): dât, dõõt, jeeʹres, jiânnai, jiõčč, kååʹtt, mii, mon, mâid, mâiʹd, måttam, nuʹbb, nåkkam, puk, son, ton, tut, tät, tõt, ǩii
- This corpus contains 7 lemmas tagged as determiners (DET): jeeʹres·årra, jäänab, mäŋgg, måkam, tok, tät, tõt
- Out of the above, 2 lemmas occurred sometimes as PRON and sometimes as DET: tät, tõt
- This corpus contains 6 lemmas tagged as auxiliaries (AUX): feʹrttjed, i-ǥõl, ij, iʹlla, leeʹd, õlggâd
- Out of the above, 3 lemmas occurred sometimes as AUX and sometimes as VERB: iʹlla, leeʹd, õlggâd
- There are 4 (de)verbal forms:
- Ger
- VERB: vaʹlljeeǩâni
- Inf
- AUX: leeʹd
- VERB: jieʹlled, kooǯǯted, väʹldded, hiâvted, jååʹtted, mõõnnâd, mainsted, njuiʹǩǩeed, ǩiččâd, jälsted
- Part
- AUX: leämmaž, õlggâm
- VERB: teâđstam, välddam, kaunnâm, košklõõvvâm, koǯstam, koǯǯâm, čõnnum, šõddâm, hoʹhssjam, jeällam
- Vnoun
- VERB: puälddmõõžž
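The tag inventory and the PRON/DET and AUX/VERB overlaps listed above can be derived by grouping lemmas by UPOS tag and intersecting the groups. The sketch below assumes the standard 10-column CoNLL-U layout (LEMMA in column 3, UPOS in column 4). The sample consists of isolated toy token lines using forms and lemmas mentioned on this page, with the syntax columns left blank; it is not treebank data.

```python
from collections import defaultdict

# The 17 universal POS tags of UD v2.
UPOS_TAGS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}

# Toy token lines (hypothetical annotation, for illustration only).
SAMPLE = (
    "1\ttõt\ttõt\tPRON\t_\t_\t_\t_\t_\t_\n"
    "2\ttõn\ttõt\tDET\t_\t_\t_\t_\t_\t_\n"
    "3\tlij\tleeʹd\tAUX\t_\t_\t_\t_\t_\t_\n"
    "4\tleäi\tleeʹd\tVERB\t_\t_\t_\t_\t_\t_\n"
    "5\tson\tson\tPRON\t_\t_\t_\t_\t_\t_\n"
)


def lemmas_by_upos(conllu_text):
    """Map each UPOS tag to the set of lemmas that occur with it."""
    table = defaultdict(set)
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        if len(cols) == 10 and cols[0].isdigit():
            table[cols[3]].add(cols[2])
    return dict(table)


def shared_lemmas(table, tag_a, tag_b):
    """Lemmas that occur under both tags, e.g. PRON and DET."""
    return sorted(table.get(tag_a, set()) & table.get(tag_b, set()))


table = lemmas_by_upos(SAMPLE)
unused_tags = sorted(UPOS_TAGS - set(table))
print(shared_lemmas(table, "PRON", "DET"))
print(shared_lemmas(table, "AUX", "VERB"))
```

Run over the full treebank, `unused_tags` would be expected to contain only `SYM` if the statistics above still hold.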
Nominal Features
- Animacy
- Hum
- NOUN: ooumaž, nijdd, ääkka, eččad, niõđ, kaavân, kåʹddpäärnaž, päʹrnn, ääkkaž, jeäʹnn
- PROPN: Peter, Mary, Brown, Jane, Smith
- Number
- Dual
- PRON: suännast, muännaid, suäna
- Plur
- AUX: liâ, jeäʹp, jeäʹla, jiâ, Jeäʹled, lee, leʹjje
- DET: tok
- NOUN: muõrid, kooʹddid, kåʹllkåʹđđnjõõzzid, oummu, peästtõõǥǥ, päärna, aaiʹtid, jurddi, järraz, kaappi
- PRON: miʹjjid, seeʹst, tõid, tuk, dõõk, jiijj, mij, måttam, sij, tõk
- VERB: jälste, ceäʹlǩǩe, mõõnnâd, puõʹtte, räʹjje, vaʹldde, kâʹčče, mõʹnne, vuejjle, aʹhtte
- Sing
- ADJ: jõnn, nuuʹbb, nuʹbb, tiõrvâs, jåʹttel, kuuʹǩǩ, lääʹđesmiõllsab, muʹvddem, occkaž, veeʹres
- ADV: mâʹst, mõõzz
- AUX: lij, ij, leäi, jiõk, õõk, jiõm, leäk, leäm, iʹlla, iʹlleäk
- DET: tõn, tõt, Tät, määŋg
- NOUN: ooumaž, tueʹllj, mieʹccest, heävaš, nijdd, stäʹlmmstääll, ääkka, eččad, niõđ, niõđâž
- NUM: kueʹhtt, õhtt, čiččâm, kooum, kooumâst, kuâhttlovitt, vitmlo
- PRON: son, tõt, tõn, ton, mon, suu, mii, muu, puk, dõõt
- PROPN: Peter, Mary, Brown, Franskkjânnam, Jane, Pariizzâst, Smith
- VERB: ceälkk, mõõni, puõʹđi, šõõddi, vaaʹldi, ǩieʹzzi, leäi, lij, vuõʹlji, ǩiõzzam
- VERB-Vnoun: puälddmõõžž
- Case
- Abe
- NOUN: čääʹʒʒtää
- VERB-Ger: vaʹlljeeǩâni
- Acc
- ADJ: kuuʹǩǩ, nuuʹbb
- DET: määŋg
- NOUN: tueʹllj, niõđ, muõrid, čääʹʒʒ, kåʹddtueʹllj, kooʹddid, kåʹllkåʹđđnjõõzzid, peäʹl, vuâra, ääušas
- NUM: kooum
- PRON: tõn, muu, mâiʹd, suu, tõid, miʹjjid, tuu, Tän, jiijjâs, muännaid
- VERB-Vnoun: puälddmõõžž
- Com
- NOUN: mannuin, peeiʹvin, heäppšines, jieʹlličaaʹʒʒin, kaarbivuiʹm, paaʹrnines, peeʹlljin
- PRON: mõin, tõin
- Ess
- NOUN: kämmǥižžen, triâŋggân, kaavân, kueʹllen, näuʹdden, heäppšen, låʹdden, ooumžen, säldten
- NUM: koummân
- VERB: vueʹtǩǩmen, håiddmen, jieʹllmen, jååʹttmen, viikkmen
- Gen
- ADJ: nuuʹbb
- DET: tõn
- NOUN: ääkka, heäppaž, suõv, tueʹllj, kuäʹđ, kämmǥa, ääiʹj, Peter, Siõm, caar
- NUM: kooum
- PRON: tõn, suu, dõõn, tuu, Mij, mõõn, nuuʹbb, tän
- PROPN: Franskkjânnam
- Ill
- ADJ: jõnn
- ADP: luzz, årra, räjja
- ADV: koozz, koozz-a, mõõzz
- NOUN: pärnna, kuätta, põʹrtte, äitta, Pella, aaiʹtid, ekka, heävašstallju, kuättses, kuäʹttnjälmma
- PRON: miʹjjid, muʹnne, tõid, ǩeäzz
- Loc
- ADV: koʹst, mâʹst
- NOUN: mieʹccest, oummust, ääiʹtest, luândstes, lõõnjâst, põõrtâst, tuõddrest, vueiʹvvgåårdest, škooulâst, ǩeeʹrjteeʹjest
- NUM: kooumâst
- PRON: suʹst, seeʹst, dââʹst, suännast, muʹst, tõʹst
- PROPN: Pariizzâst
- Nom
- ADJ: nuʹbb, tiõrvâs, jåʹttel, lääʹđesmiõllsab, muʹvddem, occkaž, veeʹres, šurr, šuurab
- DET: tõt, Tät, tok
- NOUN: ooumaž, heävaš, nijdd, stäʹlmmstääll, eččad, niõđâž, Peʹll, källsaž, tieʹrmes, triâŋgg
- NUM: kueʹhtt, õhtt, čiččâm, kuâhttlovitt, vitmlo
- PRON: son, tõt, ton, mon, mii, puk, dõõt, tõt-i, kååʹtt, nåkkam
- PROPN: Peter, Mary, Brown, Jane, Smith
- VERB: älgg
- Par
- NOUN: eeʹǩǩed
Degree and Polarity
- Degree
- Cmp
- ADJ: lääʹđesmiõllsab, šuurab
- Dim
- NOUN: vuõddjez
- Polarity
- Neg
- AUX: ij, jiõk, jiõm, i-ǥõl, jeäʹp, iʹlla, iʹlleäk, jiâ, Ij-ǥo, Jeäʹled
- INTJ: ij
- PART: ni
- VERB: iʹlla
Verbal Features
- Aspect
- Perf
- AUX-Part: leämmaž, õlggâm
- VERB-Part: teâđstam, välddam, kaunnâm, košklõõvvâm, koǯstam, koǯǯâm, čõnnum, šõddâm, hoʹhssjam, jeällam
- Mood
- Cnd
- VERB: õõlǥči, kååddče, leʹčče, piâzzčiǩ, siltteʹče, vuäđče
- Imp
- AUX: Jeäʹled
- VERB: mõõnnâd, tiâr, mõõnsââʹst, puäʹđ, kueʹst, kulddâl, säärn, vueiʹn, vueiʹt, Ääʹved
- Ind
- AUX: lij, ij, leäi, jiõk, liâ, õõk, jiõm, õõlǥ, leäk, jeäʹp
- VERB: ceälkk, mõõni, puõʹđi, šõõddi, vaaʹldi, jälste, ǩieʹzzi, leäi, lij, vuõʹlji
- Pot
- AUX: leežž
- VERB: leežž, Mõõnžiǩ, kooʹddže, poouǩeškueʹđež, ǩiʹcstež, aaudže, kâssneškueʹđež, piijže, piijžik, piijžiǩ
- Tense
- Past
- AUX: leäi, leʹjjiǩ, feʹrttji, leʹjje, leʹjjem
- VERB: mõõni, puõʹđi, šõõddi, vaaʹldi, jälste, ǩieʹzzi, leäi, vuõʹlji, vuejai, lueʹšti
- Pres
- AUX: lij, liâ, õõk, õõlǥ, leäk, leäkku, leäm, iʹlla, iʹlleäk, jeäʹla
- VERB: ceälkk, lij, ǩiõzzam, ceäʹlǩǩe, jåått, mâânn, pohtt, räʹjje, vuâlgg, puätt
- Voice
- Pass
- VERB-Part: čõnnum, tuåimtum
Pronouns, Determiners, Quantifiers
- PronType
- Dem
- DET: tõn, tõt, Tät
- PRON: tõt, tõn, tõt-i, tõid, tuk, tut, dõõn, dõõt, tän, tät
- Int
- ADV: Koozz
- PRON: mâiʹd, Mii, ǩii
- Prs
- PRON: son, ton, mon, suu, muu, miʹjjid, suʹst, seeʹst, tuu, jiijj
- Rel
- ADV: mâʹst, mõõzz
- PRON: mii, kååʹtt, mõin, mõõn, ǩeäzz
- Tot
- PRON: puk
- NumType
- Card
- NUM: kueʹhtt, õhtt, čiččâm, kooum, kooumâst, koummân, kuâhttlovitt, vitmlo
- Ord
- ADJ: kuälmad, nuuʹbb
- Reflex
- Yes
- PRON: jiijj, jiõčč, jiijjâs
- Person
- 1
- AUX: jiõm, jeäʹp, leäm, leʹjjem
- PRON: mon, muu, miʹjjid, mij, muännaid, muʹnne, muʹst
- VERB: ǩiõzzam, vääldam, čuõlmmääm, vuâlǥam, roʹttjam, vuõʹlǧǧem, Joordam, jieʹlim, jieʹllem, piâzzam
- 2
- AUX: jiõk, õõk, leäk, leʹjjiǩ, Jeäʹled, leäk-a
- PRON: ton, tuu, tij, ton-õs
- VERB: mõõnnâd, tiâr, mõõnsââʹst, puäʹđ, Mõõnžiǩ, puõʹttiǩ, Haaʹlääk-a, joordak, kooʹddid, kueʹst
- 3
- AUX: lij, ij, leäi, liâ, iʹlla, iʹlleäk, jeäʹla, jiâ, lij-a, Ij-ǥo
- PRON: son, suu, suʹst, seeʹst, sij, suännast, jiijjâs, suäna
- VERB: ceälkk, mõõni, puõʹđi, šõõddi, vaaʹldi, jälste, ǩieʹzzi, leäi, lij, vuõʹlji
- Number[psor]
- Plur
- NOUN: eeʹjjed, jällmõõžžâz, neävveez, vuõddjez
- Sing
- ADP: piirâs
- NOUN: eččad, jeäʹnnes, villjâs, vuäʹbbes, ääušas, triâŋgâs, Jeänn’jam, eččan, heäppšines, jeännam
Other Features
- AdpType
- Post
- ADP: ool, luzz, âʹlnn, årra, mieʹldd, räjja
- Prep
- ADP: pâʹjjel, Rââst, pirr, čõõđ
- AdvType
- Tim
- ADV: âʹtte, ååʹn, mâŋŋa, teä, eʹpet, kuuʹǩǩ, teʹl, vuõššân, Eiʹdde, jo
- SCONJ: Gu
- Clitic
- AddI
- ADV: tok-i, nuʹt-i, pâi-i, eʹpet-i
- PRON: tõt-i, puk-i
- Han
- NOUN: aalmi-han
- Os
- PRON: ton-õs
- QstA
- ADV: olgglab-a, koozz-a
- AUX: lij-a, leäk-a
- NOUN: Šurr-a
- PRON: jiânnai-a
- VERB: Haaʹlääk-a, leäk-a, vueʹljžiǩ-a
- Connegative
- Yes
- AUX: õõlǥ, leäkku
- VERB: kuâddam, kueʹst, piâzz, tieʹđ, šõõdd, cieʹlǩ, kaaun, kooǯǯtam, kååddče, kååʹdd
- Derivation
- Dimin
- NOUN: äʹrbbaaušu
- InchL
- VERB: dõnnõǥškuätt, kâssneškueʹđež, lueʹšttleškuätt, lueʹštškueʹtted, säärntõlškuõʹđi, vuõddmâstõlškuõʹđi, šõllneškuõʹđež
- VERB-Inf: lueʹštškueʹtted
- Men
- VERB: vueʹtǩǩmen, håiddmen, jieʹllmen, jååʹttmen, viikkmen
- NameType
- Geo
- PROPN: Franskkjânnam, Pariizzâst
- Giv
- PROPN: Peter, Mary, Jane
- Prs
- PROPN: Laurikainen
- Sur
- PROPN: Brown, Smith
- Person[psor]
- 1
- NOUN: Jeänn’jam, eččan, jeännam, vuäbbam
- 2
- NOUN: eččad, eeʹjjed, jeänn’jad, juâlǥad
- 3
- ADP: piirâs
- NOUN: jeäʹnnes, villjâs, vuäʹbbes, ääušas, triâŋgâs, heäppšines, jueʹljes, jällmõõžžâz, kuättses, kuõjâs
- Typo
- Yes
- ADV: nuʹt, nuʹt-i
- NOUN: Tuõddâr, villjâs, Čääʹʒʒid
- VERB: piijžik
Syntax
Auxiliary Verbs and Copula
- This corpus uses 2 lemmas as copulas (cop). Examples: leeʹd, iʹlla.
- This corpus uses 6 lemmas as auxiliaries (aux). Examples: ij, leeʹd, õlggâd, i-ǥõl, feʹrttjed, iʹlla.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
- VERB--NOUN-Acc (3)
- VERB--NOUN-Gen (1)
- VERB--NOUN-Ill (1)
- VERB--NOUN-Nom (99)
- VERB--NOUN-Par (1)
- VERB--PRON-Gen (1)
- VERB--PRON-Nom (123)
- VERB-Inf--NOUN-Nom (1)
- VERB-Inf--PRON-Nom (3)
- VERB-Part--NOUN-Nom (9)
- VERB-Part--PRON-Nom (11)
- obj
- VERB--NOUN-Acc (65)
- VERB--NOUN-Acc-ADP(rââst) (1)
- VERB--NOUN-Gen (2)
- VERB--NOUN-Ill (1)
- VERB--NOUN-Nom (7)
- VERB--PRON (7)
- VERB--PRON-Acc (25)
- VERB--PRON-Gen (1)
- VERB--PRON-Nom (2)
- VERB-Inf--NOUN-Acc (6)
- VERB-Inf--NOUN-Ill (1)
- VERB-Inf--PRON-Acc (4)
- VERB-Inf--PRON-Nom (1)
- VERB-Part--NOUN-Acc (9)
- VERB-Part--PRON (1)
- VERB-Part--PRON-Acc (3)
- VERB-Vnoun--NOUN-Gen (1)
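Pattern labels such as `VERB-Inf--NOUN-Acc` can be approximated by pairing each `nsubj` or `obj` child with its verbal parent. The sketch below rests on assumptions about the labelling rules, which this page does not spell out: the parent label carries its `VerbForm` value unless that value is `Fin` or absent, the child label carries its `Case` value when present, and relation labels are matched exactly (so `nsubj:pass` is not folded into `nsubj`). The adposition suffix seen in `VERB--NOUN-Acc-ADP(rââst)` is not reproduced. The toy sentence uses word forms from this page with a hypothetical annotation.

```python
from collections import Counter

# Toy rows: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC.
# Invented for illustration; not copied from the treebank.
TOY = [
    line.split("\t")
    for line in (
        "1\tSon\tson\tPRON\t_\tCase=Nom|Number=Sing\t2\tnsubj\t_\t_",
        "2\tvaaʹldi\t_\tVERB\t_\tMood=Ind|Tense=Past\t0\troot\t_\t_",
        "3\ttueʹllj\t_\tNOUN\t_\tCase=Acc|Number=Sing\t2\tobj\t_\t_",
    )
]


def parse_feats(feats):
    """Turn 'Case=Nom|Number=Sing' into a dict; '_' means no features."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def label(row):
    """Build labels such as 'VERB-Inf' (parent) or 'NOUN-Acc' (child)."""
    upos, feats = row[3], parse_feats(row[5])
    if upos == "VERB":
        verb_form = feats.get("VerbForm")
        # Assumption: finite verbs are printed without a VerbForm suffix.
        return f"{upos}-{verb_form}" if verb_form and verb_form != "Fin" else upos
    return f"{upos}-{feats['Case']}" if "Case" in feats else upos


def argument_patterns(rows, relations=("nsubj", "obj")):
    """Count (relation, 'PARENT--CHILD') patterns for verb parents."""
    by_id = {row[0]: row for row in rows}
    counts = Counter()
    for row in rows:
        head = by_id.get(row[6])
        if row[7] not in relations or head is None:
            continue
        if head[3] == "VERB" and row[3] in ("NOUN", "PRON"):
            counts[(row[7], f"{label(head)}--{label(row)}")] += 1
    return counts


for (relation, pattern), n in sorted(argument_patterns(TOY).items()):
    print(relation, pattern, n)
```

Summing such counters over all sentences of the treebank would give a table shaped like the one above, although the exact numbers depend on the labelling assumptions.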
Relations Overview
- This corpus uses 19 relation subtypes: acl:relcl, advcl:tcl, advmod:deg, advmod:eval, advmod:foc, advmod:lmod, advmod:mmod, advmod:neg, advmod:tmod, aux:nec, aux:tense, cc:preconj, flat:name, nmod:poss, nsubj:cop, nsubj:pass, obl:agent, obl:lmod, obl:tmod
- The following main type is never used alone; it is always subtyped: flat
- The following 5 relation types are not used in this corpus at all: iobj, csubj, clf, compound, list
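The three statements above follow mechanically from the relation inventory listed under Relations. As a small consistency check, the sketch below recomputes them from that list and from the 37 universal relations of UD v2. The function name `relation_overview` is introduced here for illustration only.

```python
# The 37 universal dependency relations of UD v2.
UNIVERSAL = set(
    "acl advcl advmod amod appos aux case cc ccomp clf compound conj cop "
    "csubj dep det discourse dislocated expl fixed flat goeswith iobj list "
    "mark nmod nsubj nummod obj obl orphan parataxis punct reparandum root "
    "vocative xcomp".split()
)

# Relation labels listed for this treebank under "Relations".
USED = (
    "acl acl:relcl advcl advcl:tcl advmod advmod:deg advmod:eval advmod:foc "
    "advmod:lmod advmod:mmod advmod:neg advmod:tmod amod appos aux aux:nec "
    "aux:tense case cc cc:preconj ccomp conj cop dep det discourse dislocated "
    "expl fixed flat:name goeswith mark nmod nmod:poss nsubj nsubj:cop "
    "nsubj:pass nummod obj obl obl:agent obl:lmod obl:tmod orphan parataxis "
    "punct reparandum root vocative xcomp".split()
)


def relation_overview(used, universal=UNIVERSAL):
    """Return (subtypes, main types never used alone, unused universal types)."""
    subtypes = sorted(rel for rel in used if ":" in rel)
    used_alone = {rel for rel in used if ":" not in rel}
    subtyped_main = {rel.split(":", 1)[0] for rel in subtypes}
    always_subtyped = sorted(subtyped_main - used_alone)
    unused = sorted(universal - used_alone - subtyped_main)
    return subtypes, always_subtyped, unused


subtypes, always_subtyped, unused = relation_overview(USED)
print(len(subtypes), always_subtyped, unused)
# prints: 19 ['flat'] ['clf', 'compound', 'csubj', 'iobj', 'list']
```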