This page pertains to UD version 2.

Treebank Statistics: UD_Russian-GSD: POS Tags: NOUN

There are 6103 NOUN lemmas (32%), 11249 NOUN types (37%) and 26814 NOUN tokens (27%). Out of 16 observed tags, the rank of NOUN is: 1 in number of lemmas, 1 in number of types and 1 in number of tokens.

The 10 most frequent NOUN lemmas: год, время, человек, город, часть, район, область, состав, население, река

The 10 most frequent NOUN types: года, году, время, области, лет, человек, войны, реки, год, км

The 10 most frequent ambiguous lemmas: союз (NOUN 25, PROPN 1), м (NOUN 24, ADV 1, X 1), no (NOUN 21, X 1), тысяча (NOUN 21, NUM 14), ученый (NOUN 19, ADJ 3), б (NOUN 11, ADJ 1), другой (ADJ 103, NOUN 11), русский (ADJ 57, NOUN 11), а (CCONJ 275, NOUN 10, X 3), x (NOUN 6, ADJ 2, X 1)

The 10 most frequent ambiguous types: мм (NOUN 27, ADJ 1), имени (NOUN 22, ADP 4), No (NOUN 21, X 1), дома (NOUN 13, ADV 3), основном (NOUN 16, ADJ 1), начала (NOUN 12, VERB 5), б (NOUN 6, ADJ 1), типа (ADP 14, NOUN 11), а (CCONJ 261, X 3, NOUN 1), начало (NOUN 9, VERB 1)

Morphology

The form / lemma ratio of NOUN is 1.843192 (the average of all parts of speech is 1.598617).
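The reported ratio is consistent with dividing the number of distinct NOUN types by the number of distinct NOUN lemmas given at the top of this page. The sketch below reproduces it under that assumption; the function name `form_lemma_ratio` is illustrative and is not part of any UD tool.

```python
# Counts copied from the NOUN summary on this page.
NOUN_TYPES = 11249   # distinct word forms tagged NOUN
NOUN_LEMMAS = 6103   # distinct lemmas tagged NOUN


def form_lemma_ratio(types: int, lemmas: int) -> float:
    """Average number of distinct forms per lemma (assumed definition)."""
    if lemmas <= 0:
        raise ValueError("lemma count must be positive")
    return types / lemmas


ratio = form_lemma_ratio(NOUN_TYPES, NOUN_LEMMAS)
print(f"{ratio:.6f}")
```

A value above the all-tags average of 1.598617 indicates that nouns in this treebank show more inflectional variety per lemma than the typical part of speech.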

The highest number of forms (12) was observed with the lemma “год”: г., гг., год, года, годам, годами, годах, годов, годом, году, годы, лет.

The 2nd highest number of forms (11) was observed with the lemma “человек”: людей, люди, людьми, людям, чел, чел., человек, человека, человеке, человеком, человеку.

The 3rd highest number of forms (10) was observed with the lemma “актер”: актер, актера, актеров, актеры, актёр, актёра, актёрами, актёров, актёром, актёры.

NOUN occurs with 7 features: Case (26755; 100% instances), Number (26755; 100% instances), Animacy (26754; 100% instances), Gender (26754; 100% instances), Abbr (6; 0% instances), Foreign (3; 0% instances), Typo (2; 0% instances)

NOUN occurs with 18 feature-value pairs: Abbr=Yes, Animacy=Anim, Animacy=Inan, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Case=Par, Case=Voc, Foreign=Yes, Gender=Fem, Gender=Masc, Gender=Neut, Number=Plur, Number=Sing, Typo=Yes

NOUN occurs with 78 feature combinations. The most frequent feature combination is Animacy=Inan|Case=Gen|Gender=Masc|Number=Sing (2968 tokens). Examples: года, города, века, мира, декабря, района, сентября, января, марта, июня
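A feature combination such as the one above follows the CoNLL-U FEATS convention: `Name=Value` pairs joined by `|`, with a lone `_` standing for no features. As a minimal sketch, the hypothetical helper `parse_feats` splits such a string into its pairs.

```python
from typing import Dict


def parse_feats(feats: str) -> Dict[str, str]:
    """Split a CoNLL-U FEATS string into a feature -> value mapping."""
    if feats == "_":          # underscore marks an empty feature set
        return {}
    # Split each pair on the first "=" only, so values are kept intact.
    return dict(pair.split("=", 1) for pair in feats.split("|"))


# The most frequent NOUN feature combination reported on this page.
combo = "Animacy=Inan|Case=Gen|Gender=Masc|Number=Sing"
parsed = parse_feats(combo)
print(parsed)
```

Counting distinct keys across all NOUN tokens would give the number of features, distinct key/value pairs the number of feature-value pairs, and distinct whole strings the number of feature combinations.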

Relations

NOUN nodes are attached to their parents using 30 different relations: nmod (9082; 34% instances), obl (5744; 21% instances), nsubj (3132; 12% instances), obj (2451; 9% instances), conj (2263; 8% instances), appos (968; 4% instances), root (753; 3% instances), nsubj:pass (557; 2% instances), iobj (448; 2% instances), flat (373; 1% instances), xcomp (275; 1% instances), obl:agent (183; 1% instances), parataxis (171; 1% instances), fixed (115; 0% instances), orphan (80; 0% instances), list (51; 0% instances), nummod:gov (45; 0% instances), acl:relcl (23; 0% instances), ccomp (23; 0% instances), flat:foreign (18; 0% instances), compound (17; 0% instances), acl (16; 0% instances), advcl (7; 0% instances), amod (7; 0% instances), nummod (4; 0% instances), dep (2; 0% instances), flat:name (2; 0% instances), vocative (2; 0% instances), case (1; 0% instances), dislocated (1; 0% instances)
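A tally like the one above can be approximated by reading the UPOS and DEPREL columns of a CoNLL-U file. The sketch below is not the script that generated this page: `noun_relations` is a hypothetical helper, and the three-token sentence is invented for illustration rather than taken from the treebank.

```python
from collections import Counter

# Invented sample sentence in CoNLL-U layout (10 tab-separated columns).
SAMPLE = "\n".join("\t".join(row) for row in [
    ["1", "жители", "житель", "NOUN", "_", "_", "3", "nsubj", "_", "_"],
    ["2", "города", "город", "NOUN", "_", "_", "1", "nmod", "_", "_"],
    ["3", "уехали", "уехать", "VERB", "_", "_", "0", "root", "_", "_"],
])


def noun_relations(conllu: str) -> Counter:
    """Count the DEPREL labels that attach NOUN nodes to their parents."""
    counts = Counter()
    for line in conllu.splitlines():
        if not line or line.startswith("#"):   # blank lines and comments
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():               # skip ranges (1-2) and empty nodes (1.1)
            continue
        if cols[3] == "NOUN":                   # column 4 is UPOS
            counts[cols[7]] += 1                # column 8 is DEPREL
    return counts


print(noun_relations(SAMPLE).most_common())
```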

Parents of NOUN nodes belong to 13 different parts of speech: VERB (12029; 45% instances), NOUN (11972; 45% instances), ADJ (1060; 4% instances), [root, no parent] (753; 3% instances), PROPN (415; 2% instances), NUM (206; 1% instances), ADV (106; 0% instances), ADP (99; 0% instances), X (67; 0% instances), SYM (47; 0% instances), PRON (32; 0% instances), DET (26; 0% instances), PART (2; 0% instances)

3732 (14%) NOUN nodes are leaves.

9122 (34%) NOUN nodes have one child.

8328 (31%) NOUN nodes have two children.

5632 (21%) NOUN nodes have three or more children.

The highest child degree of a NOUN node is 30.
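The four child-count figures above partition the NOUN tokens, so they should add up to the 26814 tokens reported at the top of the page. The short check below confirms this and recomputes the percentages, assuming they were rounded to the nearest whole number; the bucket labels are illustrative.

```python
# Counts copied from the child-degree lines on this page.
buckets = {
    "leaves": 3732,
    "one child": 9122,
    "two children": 8328,
    "three or more children": 5632,
}

total = sum(buckets.values())

# Percentage of NOUN nodes in each bucket, rounded to whole percent.
shares = {label: round(100 * count / total) for label, count in buckets.items()}

for label, pct in shares.items():
    print(f"{label}: {buckets[label]} ({pct}%)")
print("total NOUN nodes:", total)
```

The same rounding explains why low-count entries elsewhere on this page appear as 0% and why 26755 of 26814 appears as 100%.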

Children of NOUN nodes are attached using 38 different relations: nmod (10548; 22% instances), amod (9942; 21% instances), case (8579; 18% instances), punct (5916; 13% instances), appos (2412; 5% instances), conj (2196; 5% instances), cc (1297; 3% instances), det (1292; 3% instances), acl (995; 2% instances), nummod:gov (857; 2% instances), nsubj (664; 1% instances), acl:relcl (506; 1% instances), nummod (486; 1% instances), advmod (381; 1% instances), parataxis (223; 0% instances), cop (119; 0% instances), orphan (80; 0% instances), compound (76; 0% instances), nummod:entity (62; 0% instances), obl (47; 0% instances), mark (39; 0% instances), list (36; 0% instances), advcl (33; 0% instances), iobj (31; 0% instances), expl (23; 0% instances), fixed (20; 0% instances), dep (17; 0% instances), obl:agent (14; 0% instances), ccomp (12; 0% instances), flat:foreign (8; 0% instances), aux:pass (2; 0% instances), flat:name (2; 0% instances), goeswith (2; 0% instances), aux (1; 0% instances), discourse (1; 0% instances), obj (1; 0% instances), vocative (1; 0% instances), xcomp (1; 0% instances)

Children of NOUN nodes belong to 16 different parts of speech: NOUN (11972; 26% instances), ADJ (10251; 22% instances), ADP (8514; 18% instances), PUNCT (5916; 13% instances), PROPN (3007; 6% instances), VERB (1660; 4% instances), NUM (1526; 3% instances), DET (1396; 3% instances), CCONJ (1271; 3% instances), X (452; 1% instances), ADV (289; 1% instances), PRON (214; 0% instances), PART (209; 0% instances), AUX (122; 0% instances), SYM (64; 0% instances), SCONJ (59; 0% instances)