
This page pertains to UD version 2.

Treebank Statistics: UD_Russian-GSD: POS Tags: NUM

There are 676 NUM lemmas (4%), 723 NUM types (2%) and 2103 NUM tokens (2%). Out of 16 observed tags, NUM ranks 6th in number of lemmas, 6th in number of types and 9th in number of tokens.

The 10 most frequent NUM lemmas: один, два, 2, несколько, три, 1, 10, четыре, 4, 3

The 10 most frequent NUM types: 2, два, 1, один, несколько, 10, двух, 4, три, 3

The 10 most frequent ambiguous lemmas: один (NUM 178, DET 8), 2 (NUM 68, ADJ 9), несколько (NUM 68, ADV 5), 1 (NUM 49, ADJ 33), 10 (NUM 43, ADJ 14), 4 (NUM 36, ADJ 14), 3 (NUM 31, ADJ 8), 5 (NUM 29, ADJ 9), много (NUM 29, ADV 9), 6 (NUM 28, ADJ 11)

The 10 most frequent ambiguous types: 2 (NUM 67, ADJ 9), 1 (NUM 49, ADJ 33), один (NUM 40, DET 3), несколько (NUM 41, ADV 5), 10 (NUM 43, ADJ 14), 4 (NUM 36, ADJ 14), 3 (NUM 31, ADJ 8), 5 (NUM 29, ADJ 9), 6 (NUM 28, ADJ 11), 20 (NUM 26, ADJ 12)

Morphology

The form / lemma ratio of NUM is 1.069527 (the average of all parts of speech is 1.598617).
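The reported ratio is consistent with dividing the number of NUM types by the number of NUM lemmas given above. The following minimal sketch reproduces it from those two counts; that the page computes the ratio exactly this way is an assumption, and the variable names are illustrative.

```python
# Counts taken from the NUM summary on this page.
num_lemmas = 676
num_types = 723

# Assumption: "form / lemma ratio" means distinct word forms (types) per lemma.
ratio = num_types / num_lemmas
print(f"{ratio:.6f}")  # prints 1.069527
```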

The highest number of forms (10) was observed with the lemma “один”: один, одна, одним, одних, одно, одного, одной, одном, одному, одну.

The second-highest number of forms (5) was observed with the lemma “два”: два, две, двум, двумя, двух.

The third-highest number of forms (5) was observed with the lemma “много”: более, больше, многим, многих, много.

NUM occurs with 6 features: Case (2025; 96% instances), NumType (2004; 95% instances), Animacy (1014; 48% instances), Gender (607; 29% instances), Number (328; 16% instances), Typo (1; 0% instances)

NUM occurs with 15 feature-value pairs: Animacy=Anim, Animacy=Inan, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Gender=Fem, Gender=Masc, Gender=Neut, NumType=Card, Number=Plur, Number=Sing, Typo=Yes

NUM occurs with 90 feature combinations. The most frequent feature combination is Case=Nom|NumType=Card (464 tokens). Examples: 10, 5, 0, 16, 15, 20, 6, 11, 13, 12
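Counts like the feature and feature-combination figures above can be derived from the FEATS column of a CoNLL-U file. The sketch below is one possible way to do it, not the script that generated this page; the function name `count_num_features` and the sample rows are hypothetical, and the sample is not taken from UD_Russian-GSD.

```python
from collections import Counter

def count_num_features(conllu_text, upos="NUM"):
    """Count feature names and full FEATS strings for tokens with the given UPOS."""
    features = Counter()
    combinations = Counter()
    total = 0
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        # Skip malformed lines, multiword token ranges (1-2) and empty nodes (1.1).
        if len(cols) != 10 or "-" in cols[0] or "." in cols[0]:
            continue
        if cols[3] != upos:
            continue
        total += 1
        feats = cols[5]
        combinations[feats] += 1
        if feats != "_":
            for pair in feats.split("|"):
                name, _, _value = pair.partition("=")
                features[name] += 1
    return features, combinations, total

# Hypothetical sample rows in CoNLL-U column order.
rows = [
    ["1", "10", "10", "NUM", "_", "Case=Nom|NumType=Card", "2", "nummod:gov", "_", "_"],
    ["2", "лет", "год", "NOUN", "_", "Animacy=Inan|Case=Gen|Gender=Masc|Number=Plur", "0", "root", "_", "_"],
    ["3", "2", "2", "NUM", "_", "_", "2", "list", "_", "_"],
]
sample = "\n".join("\t".join(r) for r in rows)

features, combinations, total = count_num_features(sample)
for name, count in features.most_common():
    print(f"{name} ({count}; {round(100 * count / total)}% instances)")
```

Percentages are taken relative to all NUM tokens, including those with no features, which is why a feature can cover fewer than 100% of instances.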

Relations

NUM nodes are attached to their parents using 21 different relations: nummod:gov (882; 42% instances), nummod (552; 26% instances), nummod:entity (151; 7% instances), appos (77; 4% instances), nmod (76; 4% instances), root (65; 3% instances), conj (58; 3% instances), compound (49; 2% instances), obl (43; 2% instances), amod (28; 1% instances), list (27; 1% instances), nsubj (24; 1% instances), obj (23; 1% instances), parataxis (21; 1% instances), xcomp (15; 1% instances), nsubj:pass (5; 0% instances), iobj (2; 0% instances), orphan (2; 0% instances), acl (1; 0% instances), ccomp (1; 0% instances), flat:foreign (1; 0% instances)

Parents of NUM nodes belong to 13 different parts of speech: NOUN (1526; 73% instances), NUM (130; 6% instances), VERB (117; 6% instances), SYM (82; 4% instances), PROPN (78; 4% instances), no tag, i.e. the artificial root node (65; 3% instances), X (55; 3% instances), ADJ (40; 2% instances), PRON (3; 0% instances), ADP (2; 0% instances), ADV (2; 0% instances), PART (2; 0% instances), DET (1; 0% instances)
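The relation and parent statistics above can be sketched from the HEAD and DEPREL columns of CoNLL-U data. In the example below, a NUM token attached as root has HEAD 0 and therefore no parent token, so it is counted under an empty tag; this lines up with the 65 root attachments in the relation list. The function name `attachment_counts` and both sample sentences are hypothetical.

```python
from collections import Counter

def attachment_counts(rows, upos="NUM"):
    """Count DEPREL and parent UPOS for tokens of one sentence with the given UPOS."""
    # Index UPOS by token ID so that HEAD values can be resolved to a parent tag.
    pos_by_id = {row[0]: row[3] for row in rows}
    relations, parent_pos = Counter(), Counter()
    for row in rows:
        if row[3] != upos:
            continue
        head, deprel = row[6], row[7]
        relations[deprel] += 1
        # HEAD 0 is the artificial root, which has no UPOS tag.
        parent_pos[pos_by_id.get(head, "")] += 1
    return relations, parent_pos

# Hypothetical sentences, not taken from UD_Russian-GSD.
sentence_a = [
    ["1", "две", "два", "NUM", "_", "Case=Nom|Gender=Fem|NumType=Card", "2", "nummod:gov", "_", "_"],
    ["2", "книги", "книга", "NOUN", "_", "Animacy=Inan|Case=Gen|Gender=Fem|Number=Sing", "0", "root", "_", "_"],
]
sentence_b = [
    ["1", "5", "5", "NUM", "_", "_", "0", "root", "_", "_"],
]

relations, parents = Counter(), Counter()
for sentence in (sentence_a, sentence_b):
    rel, par = attachment_counts(sentence)
    relations.update(rel)
    parents.update(par)

print(relations)
print(parents)
```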

1520 (72%) NUM nodes are leaves.

383 (18%) NUM nodes have one child.

79 (4%) NUM nodes have two children.

121 (6%) NUM nodes have three or more children.

The highest child degree of a NUM node is 7.
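The child-degree figures above amount to a histogram of how many dependents each NUM node has. The following is a minimal sketch of that computation on a hypothetical five-token sentence, followed by a consistency check of the four counts reported on this page against the NUM token total; the function name `child_degree_histogram` is illustrative.

```python
from collections import Counter

def child_degree_histogram(tokens, upos="NUM"):
    """tokens: list of (id, upos, head). Returns Counter mapping degree -> number of nodes."""
    # Number of children per head ID; IDs with no children default to 0.
    children = Counter(head for _tok_id, _pos, head in tokens)
    return Counter(children[tok_id] for tok_id, pos, _head in tokens if pos == upos)

# Hypothetical sentence: token 1 is a leaf NUM, token 4 is a NUM with two children.
tokens = [
    ("1", "NUM", "2"),
    ("2", "NOUN", "0"),
    ("3", "PUNCT", "4"),
    ("4", "NUM", "2"),
    ("5", "ADP", "4"),
]
histogram = child_degree_histogram(tokens)
print(histogram[0], histogram[2])  # prints 1 1

# Counts reported on this page: leaves, one child, two children, three or more.
page_counts = {"0": 1520, "1": 383, "2": 79, "3+": 121}
print(sum(page_counts.values()))  # prints 2103, the number of NUM tokens
```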

Children of NUM nodes are attached using 24 different relations: punct (391; 39% instances), nmod (179; 18% instances), advmod (89; 9% instances), case (83; 8% instances), nsubj (68; 7% instances), conj (49; 5% instances), cc (29; 3% instances), list (20; 2% instances), appos (19; 2% instances), compound (11; 1% instances), parataxis (10; 1% instances), nummod (9; 1% instances), orphan (9; 1% instances), cop (8; 1% instances), nummod:gov (3; 0% instances), amod (2; 0% instances), dep (2; 0% instances), det (2; 0% instances), nummod:entity (2; 0% instances), acl (1; 0% instances), advcl (1; 0% instances), flat (1; 0% instances), goeswith (1; 0% instances), obl (1; 0% instances)

Children of NUM nodes belong to 16 different parts of speech: PUNCT (391; 39% instances), NOUN (206; 21% instances), NUM (130; 13% instances), ADP (74; 7% instances), ADV (62; 6% instances), PART (28; 3% instances), CCONJ (25; 3% instances), SYM (16; 2% instances), PRON (10; 1% instances), PROPN (10; 1% instances), ADJ (9; 1% instances), VERB (9; 1% instances), AUX (8; 1% instances), X (6; 1% instances), DET (5; 1% instances), SCONJ (1; 0% instances)