
This page pertains to UD version 2.

Treebank Statistics: UD_Karelian-KKPP: POS Tags: NOUN

There are 359 NOUN lemmas (38%), 551 NOUN types (39%) and 839 NOUN tokens (27%). Out of 14 observed tags, the rank of NOUN is: 1 in number of lemmas, 1 in number of types and 1 in number of tokens.

The 10 most frequent NOUN lemmas: ihmini, lapši, mua, poika, kulttuuri, muamo, peli, aktijo, roveh, tunti

The 10 most frequent NOUN types: muamo, muan, poika, tunti, ihmisie, vuotena, kulttuurien, lapšien, ropehet, aktijo

The most frequent ambiguous lemmas (5 listed): ruado (NOUN 4, VERB 1), työ (NOUN 2, PRON 2), Kalevala-seikkailu#peli (NOUN 1, PROPN 1), juuri (ADV 1, NOUN 1), šilta (ADV 1, NOUN 1)

The most frequent ambiguous types (1 listed): šeikkailupelie (NOUN 1, X 1)

Morphology

The form / lemma ratio of NOUN is 1.534819 (the average of all parts of speech is 1.495298).
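The NOUN ratio above is consistent with dividing the number of NOUN types (551) by the number of NOUN lemmas (359) reported at the top of this page. The following is a minimal sketch of that arithmetic; the function name `form_lemma_ratio` is hypothetical and is not part of any UD tool.

```python
def form_lemma_ratio(n_forms: int, n_lemmas: int) -> float:
    """Average number of distinct word forms per lemma."""
    if n_lemmas <= 0:
        raise ValueError("n_lemmas must be positive")
    return n_forms / n_lemmas


# Counts taken from this page: 551 NOUN types, 359 NOUN lemmas.
ratio = form_lemma_ratio(551, 359)
print(f"{ratio:.6f}")  # prints 1.534819
```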

The 1st highest number of forms (8) was observed with the lemma “ihmini”: ihmini, ihmiseh, ihmisen, ihmiset, ihmisie, ihmisien, ihmisillä, ihmisissä.

The 2nd highest number of forms (7) was observed with the lemma “pereh”: Pereh, perehellä, perehen, perehenä, perehie, perehillä, perehtä.

The 3rd highest number of forms (7) was observed with the lemma “seikkailu#peli”: Šeikkailupelit, šeikkailupeli, šeikkailupelie, šeikkailupelih, šeikkailupelijä, šeikkailupelilöistä, šeikkailupelissä.

NOUN occurs with 5 features: Case (837; 100% instances), Number (837; 100% instances), Person[psor] (4; 0% instances), Abbr (2; 0% instances), Number[psor] (2; 0% instances)

NOUN occurs with 19 feature-value pairs: Abbr=Yes, Case=Abe, Case=Abl, Case=Ade, Case=Com, Case=Ela, Case=Ess, Case=Gen, Case=Ill, Case=Ine, Case=Ins, Case=Nom, Case=Par, Case=Tra, Number=Plur, Number=Sing, Number[psor]=Sing, Person[psor]=2, Person[psor]=3

NOUN occurs with 25 feature combinations. The most frequent feature combination is Case=Gen|Number=Sing (146 tokens). Examples: muan, karjalan, muajilman, pelin, pojan, -projektin, ihmisen, järještön, keškukšen, luonnon

Relations

NOUN nodes are attached to their parents using 21 different relations: obl (238; 28% instances), obj (173; 21% instances), nmod:poss (117; 14% instances), conj (93; 11% instances), nsubj (85; 10% instances), root (26; 3% instances), nmod (20; 2% instances), nsubj:cop (20; 2% instances), compound (19; 2% instances), flat:name (13; 2% instances), parataxis (10; 1% instances), appos (8; 1% instances), case (5; 1% instances), advcl (4; 0% instances), orphan (2; 0% instances), acl:relcl (1; 0% instances), amod (1; 0% instances), discourse (1; 0% instances), flat (1; 0% instances), nmod:gsubj (1; 0% instances), xcomp (1; 0% instances)
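The percentages in this list appear to be each relation's count divided by the 839 NOUN tokens, rounded to the nearest whole percent, which is why small counts show as 0%. A short sketch under that assumption, using a subset of the counts above (the variable names are illustrative only):

```python
# Subset of the relation counts listed above for NOUN nodes.
counts = {
    "obl": 238,
    "obj": 173,
    "nmod:poss": 117,
    "flat:name": 13,
    "case": 5,
    "advcl": 4,
}
total = 839  # total NOUN tokens reported on this page

# Share of each relation as a whole-number percentage.
shares = {rel: round(100 * n / total) for rel, n in counts.items()}

for rel, pct in shares.items():
    print(f"{rel} ({counts[rel]}; {pct}% instances)")
```

With these inputs the computed shares are 28, 21, 14, 2, 1 and 0, matching the figures in the list.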

Parents of NOUN nodes fall into 10 categories (9 parts of speech plus the artificial root, which has no tag): VERB (460; 55% instances), NOUN (260; 31% instances), root with no tag (26; 3% instances), PROPN (23; 3% instances), ADJ (21; 3% instances), AUX (18; 2% instances), PRON (16; 2% instances), ADV (7; 1% instances), ADP (4; 0% instances), NUM (4; 0% instances)

302 (36%) NOUN nodes are leaves.

340 (41%) NOUN nodes have one child.

112 (13%) NOUN nodes have two children.

85 (10%) NOUN nodes have three or more children.

The highest child degree of a NOUN node is 9.
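The child-degree figures above (leaves, one child, two children, and so on) can be recomputed from a CoNLL-U file by counting, for every NOUN token, how many other tokens name it as their head. The sketch below assumes the standard 10-column CoNLL-U layout; the function `child_degrees` and the placeholder sentence are invented for illustration and are not taken from the treebank.

```python
from collections import Counter

# Toy CoNLL-U sentence with placeholder forms (not real treebank data).
SAMPLE = """\
# text = placeholder
1\ttok1\ttok1\tADJ\t_\t_\t2\tamod\t_\t_
2\ttok2\ttok2\tNOUN\t_\t_\t3\tnsubj\t_\t_
3\ttok3\ttok3\tVERB\t_\t_\t0\troot\t_\t_
4\ttok4\ttok4\tNOUN\t_\t_\t3\tobj\t_\t_
5\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_
"""


def child_degrees(conllu: str, upos: str = "NOUN") -> Counter:
    """Map 'number of children' -> 'number of nodes with that UPOS'."""
    hist = Counter()
    sentence = []  # (id, upos, head) for each basic token
    # The extra "" guarantees the last sentence is flushed.
    for line in conllu.splitlines() + [""]:
        if line.startswith("#"):
            continue  # sentence-level comment
        if not line.strip():
            if sentence:
                n_children = Counter(head for _, _, head in sentence)
                for tok_id, tag, _ in sentence:
                    if tag == upos:
                        hist[n_children[tok_id]] += 1
                sentence = []
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        sentence.append((cols[0], cols[3], cols[6]))
    return hist


hist = child_degrees(SAMPLE)
print(dict(hist))
```

In the toy sentence one NOUN has a single child and the other is a leaf, so the histogram has one node at degree 1 and one at degree 0.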

Children of NOUN nodes are attached using 28 different relations: nmod:poss (203; 22% instances), amod (153; 17% instances), punct (114; 13% instances), conj (106; 12% instances), cc (54; 6% instances), nummod (33; 4% instances), det (32; 4% instances), case (29; 3% instances), nmod (28; 3% instances), cop (27; 3% instances), compound (19; 2% instances), acl:relcl (17; 2% instances), advmod (16; 2% instances), nsubj:cop (16; 2% instances), appos (13; 1% instances), flat:name (13; 1% instances), obl (13; 1% instances), obj (5; 1% instances), mark (4; 0% instances), acl (3; 0% instances), xcomp (3; 0% instances), aux (2; 0% instances), ccomp (2; 0% instances), parataxis (2; 0% instances), advcl (1; 0% instances), cop:own (1; 0% instances), nsubj (1; 0% instances), vocative (1; 0% instances)

Children of NOUN nodes belong to 13 different parts of speech: NOUN (260; 29% instances), ADJ (163; 18% instances), PUNCT (114; 13% instances), PRON (86; 9% instances), PROPN (86; 9% instances), CCONJ (54; 6% instances), NUM (36; 4% instances), AUX (32; 4% instances), ADP (29; 3% instances), VERB (29; 3% instances), ADV (16; 2% instances), SCONJ (3; 0% instances), X (3; 0% instances)