
This page pertains to UD version 2.

Treebank Statistics: UD_Old_East_Slavic-RNC: POS Tags: NOUN

There are 3533 NOUN lemmas (27%), 10095 NOUN types (32%) and 37787 NOUN tokens (22%). Out of 17 observed tags, the rank of NOUN is: 1 in number of lemmas, 1 in number of types and 1 in number of tokens.

The 10 most frequent NOUN lemmas: государь, человѣкъ, князь, день, деревня, царь, земля, годъ, четь, грамота

The 10 most frequent NOUN types: день, г., чети, государь, людей, году, князя, князь, государю, весу

The 10 most frequent ambiguous lemmas: верхъ (NOUN 155, ADP 4), гора (NOUN 54, PROPN 1), патриархъ (NOUN 42, ADV 1), поганый (NOUN 36, ADJ 3), онъ (PRON 1369, NOUN 27), зло (NOUN 23, ADV 1), дѣловой (ADJ 32, NOUN 17), святой (ADJ 238, NOUN 17), кровь (NOUN 15, X 1), добро (NOUN 12, PART 2)

The 10 most frequent ambiguous types: села (NOUN 74, VERB 2), верхъ (NOUN 30, ADP 4), де (PART 423, NOUN 27), стану (NOUN 27, VERB 4), княже (NOUN 20, ADJ 1), межу (NOUN 20, ADP 9), с. (NOUN 15, DET 6, PROPN 1), трети (NOUN 15, ADJ 7), ели (NOUN 11, VERB 1), зла (NOUN 11, ADJ 2)

Morphology

The form / lemma ratio of NOUN is 2.857345 (the average of all parts of speech is 2.481645).
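The reported ratio is consistent with dividing the number of NOUN types by the number of NOUN lemmas given above (10095 / 3533). The following is a minimal sketch of how such counts could be reproduced from a CoNLL-U file. The function name `pos_counts` is hypothetical, and the sketch assumes that forms are compared exactly as written (no case folding or normalization), which may differ from how the published statistics were generated.

```python
def pos_counts(conllu_text, upos):
    """Return (lemmas, types, tokens) for one UPOS tag in CoNLL-U text."""
    lemmas, forms, tokens = set(), set(), 0
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip sentence separators and comment lines
        cols = line.split("\t")
        # Skip multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 1.1).
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        if cols[3] == upos:
            tokens += 1
            forms.add(cols[1])   # FORM column
            lemmas.add(cols[2])  # LEMMA column
    return len(lemmas), len(forms), tokens


# Arithmetic check using the figures reported on this page.
ratio = 10095 / 3533
print(f"{ratio:.6f}")  # prints 2.857345
```

Calling `pos_counts(text, "NOUN")` on the full treebank text would return the three counts. Whether they match the page exactly depends on the normalization assumption stated above.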

The 1st highest number of forms (61) was observed with the lemma “человѣкъ”: [людей], люд[и], людеи, людей, людем, людемъ, людемь, людех, людехъ, людеі, люди, людие, людими, людми, людмі, людьи, людьми, людям, людямъ, людяхъ, людєм, людіе, людѣй, людѣхъ, ч, ч(е)л(о)в(е)к, ч(е)л(о)в(е)ка, ч(е)л(о)в(е)ком, ч(е)л(о)в(е)ку, ч., ч[е]л[о]в[е]ка, ч[е]л[о]в[е]къ, ч[е]л[о]в[ѣ]къ, ч[е]л[овѣ]къ, челвка, человек, человека, человеком, человекомъ, человеку, человекъ, человекы, человекѣ, человецы, человѣка, человѣкамъ, человѣки, человѣкомъ, человѣку, человѣкъ, человѣкѣ, человѣцы, челѡвѣкъ, члавкꙋ, члвка, члвках, члвкъ, члвкꙋ, члкомъ, члкъ, члкꙋ.

The 2nd highest number of forms (55) was observed with the lemma “крестьянинъ”: кр(е)стьян, кр(е)стьяне, кр(е)стьянин, кр(е)стьяном, кр(е)стьяны, кр(ес)тьян, крести[я]н, крестия[ни]на, крестиян, крестиянин, крестиянина, крестиянином, крестиянину, крестияном, крестияны, крестияня, крестьанинѣ, крестьян, крестьянам, крестьянами, крестьяне, крестьянемъ, крестьянех, крестьянин, крестьянина, крестьянину, крестьянинъ, крестьянинѣ, крестьяннна, крестьяном, крестьяномъ, крестьянъ, крестьяны, крестьяня, крестьяням, крестьянѣхъ, крестьянѧ, крестьѧянина, крестьꙗнина, кресьянин, кресяны, кристьянъ, кристіянъ, кристіяны, крстьян, крстьяне, крстьянин, крстьянина, крстьяном, крстьянъ, крѣсяня, хрестьяны, християном, христьяне, христьяны.

The 3rd highest number of forms (46) was observed with the lemma “князь”: [князя, кн[ѧ]з[е]и, кн[ѧ]з[е]мъ, кн[ѧ]зе, кн[ѧ]зе[и], кн[ѧ]зеи, кн[ѧ]зем, кн[ѧ]земъ, кн[ѧ]зи, кн[ѧ]зь, кн[ѧ]зю, кн[ѧ]зѣ, кн[ѧ]зѧ, кнези, кнз, кнзь, кнзю, кнзя, кня[зь], княже, княжми, княз, князе, князеи, князей, князем, княземъ, князех, князи, князми, князь, князьям, князю, князя, князѣ, кнѕю, кнѕя, кнѧз[и]], кнѧз[ь], кнѧз[ѧ], кнѧзем, кнѧзю, кнѧзѧ, кн҃зꙗ, кнꙗз[ь], кънязю.

NOUN occurs with 7 features: Case (37229; 99% instances), Gender (37229; 99% instances), Number (37229; 99% instances), Animacy (982; 3% instances), Abbr (718; 2% instances), Typo (10; 0% instances), InflClass (4; 0% instances)

NOUN occurs with 18 feature-value pairs: Abbr=Yes, Animacy=Anim, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Case=Voc, Gender=Fem, Gender=Masc, Gender=Neut, InflClass=Ind, Number=Count, Number=Dual, Number=Plur, Number=Sing, Typo=Yes

NOUN occurs with 98 feature combinations. The most frequent feature combination is Case=Gen|Gender=Masc|Number=Sing (3930 tokens). Examples: князя, весу, году, государя, царя, воза, монастыря, города, отца, ноября
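Feature combinations like the one above can be tallied directly from the FEATS column of a CoNLL-U file. The sketch below is illustrative only: `feature_combinations` is a hypothetical helper, and it assumes that each distinct FEATS string (including the empty value `_`) counts as one combination, which may not match how the published figure of 98 was computed.

```python
from collections import Counter


def feature_combinations(conllu_text, upos):
    """Count the distinct FEATS strings seen on tokens tagged `upos`."""
    combos = Counter()
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip sentence separators and comment lines
        cols = line.split("\t")
        # Skip multiword-token ranges and empty nodes.
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        if cols[3] == upos:
            combos[cols[5]] += 1  # FEATS column, e.g. "Case=Gen|Gender=Masc|Number=Sing"
    return combos
```

`feature_combinations(text, "NOUN").most_common(1)` would then give the most frequent combination and its token count, and `len(...)` of the result gives the number of combinations under the assumption above.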

Relations

NOUN nodes are attached to their parents using 36 different relations: obl (6988; 18% instances), conj (6634; 18% instances), nmod (6410; 17% instances), nsubj (4218; 11% instances), obj (3992; 11% instances), appos (2587; 7% instances), iobj (1415; 4% instances), root (1233; 3% instances), obl:tmod (1090; 3% instances), nsubj:pass (881; 2% instances), vocative (802; 2% instances), orphan (398; 1% instances), parataxis (320; 1% instances), flat:name (148; 0% instances), xcomp (110; 0% instances), acl:relcl (108; 0% instances), advcl (80; 0% instances), list (74; 0% instances), obl:agent (65; 0% instances), dislocated (60; 0% instances), ccomp (31; 0% instances), acl (28; 0% instances), compound (28; 0% instances), nummod:gov (21; 0% instances), flat (10; 0% instances), parataxis:discourse (10; 0% instances), nummod (9; 0% instances), fixed (8; 0% instances), nsubj:outer (7; 0% instances), amod (6; 0% instances), obl:depict (4; 0% instances), reparandum (4; 0% instances), csubj (3; 0% instances), case (2; 0% instances), dep (2; 0% instances), discourse (1; 0% instances)

Parents of NOUN nodes belong to 14 different parts of speech: VERB (17758; 47% instances), NOUN (14453; 38% instances), PROPN (1506; 4% instances), PRON (1344; 4% instances), the artificial root, which has no part of speech (1233; 3% instances), ADJ (873; 2% instances), ADV (239; 1% instances), DET (194; 1% instances), NUM (103; 0% instances), AUX (50; 0% instances), PART (18; 0% instances), ADP (14; 0% instances), SYM (1; 0% instances), X (1; 0% instances)

5214 (14%) NOUN nodes are leaves.

11512 (30%) NOUN nodes have one child.

10638 (28%) NOUN nodes have two children.

10423 (28%) NOUN nodes have three or more children.

The highest child degree of a NOUN node is 24.
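The four child-count figures above partition the NOUN tokens: 5214 + 11512 + 10638 + 10423 = 37787. As a rough sketch, such a breakdown could be computed from the ID and HEAD columns of a CoNLL-U file as follows. The name `child_degree_buckets` is hypothetical, and the sketch assumes that only basic dependencies (the HEAD column) are counted and that multiword-token ranges and empty nodes are ignored.

```python
from collections import Counter


def child_degree_buckets(conllu_text, upos):
    """Count nodes tagged `upos` by number of children: 0, 1, 2, or 3 (= three or more)."""
    buckets = Counter()
    rows = []  # (id, upos, head) for the sentence currently being read

    def flush():
        # Number of children per node = how often its ID appears as a HEAD.
        n_children = Counter(head for _, _, head in rows)
        for tok_id, tag, _ in rows:
            if tag == upos:
                buckets[min(n_children[tok_id], 3)] += 1
        rows.clear()

    for line in conllu_text.splitlines():
        if not line.strip():
            flush()  # a blank line ends a sentence
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) == 10 and cols[0].isdigit():
            rows.append((cols[0], cols[3], cols[6]))
    flush()  # handle a final sentence with no trailing blank line
    return buckets
```

In the result, key `0` corresponds to leaves and key `3` to nodes with three or more children.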

Children of NOUN nodes are attached using 43 different relations: case (13280; 18% instances), amod (11014; 15% instances), punct (9387; 13% instances), det (7582; 10% instances), nmod (6849; 9% instances), conj (6499; 9% instances), cc (5774; 8% instances), appos (4525; 6% instances), nummod:gov (2540; 3% instances), nsubj (1105; 2% instances), advmod (737; 1% instances), nummod (570; 1% instances), acl:relcl (520; 1% instances), orphan (463; 1% instances), acl (424; 1% instances), obl (398; 1% instances), cop (306; 0% instances), dep (245; 0% instances), parataxis (236; 0% instances), iobj (139; 0% instances), mark (127; 0% instances), discourse (86; 0% instances), advcl (76; 0% instances), compound (68; 0% instances), vocative (68; 0% instances), list (51; 0% instances), obj (24; 0% instances), flat:name (22; 0% instances), obl:tmod (21; 0% instances), aux (20; 0% instances), dislocated (16; 0% instances), ccomp (12; 0% instances), csubj (9; 0% instances), flat (8; 0% instances), obl:float (7; 0% instances), parataxis:discourse (7; 0% instances), expl (6; 0% instances), nsubj:pass (4; 0% instances), reparandum (4; 0% instances), xcomp (4; 0% instances), nsubj:outer (3; 0% instances), obl:depict (2; 0% instances), fixed (1; 0% instances)

Children of NOUN nodes belong to 17 different parts of speech: NOUN (14453; 20% instances), ADP (13209; 18% instances), ADJ (11275; 15% instances), PUNCT (9387; 13% instances), DET (6754; 9% instances), CCONJ (5770; 8% instances), PROPN (4232; 6% instances), NUM (3192; 4% instances), PRON (1821; 2% instances), VERB (1423; 2% instances), PART (556; 1% instances), ADV (377; 1% instances), AUX (330; 0% instances), X (253; 0% instances), SCONJ (186; 0% instances), INTJ (19; 0% instances), SYM (2; 0% instances)