
This page pertains to UD version 2.

Treebank Statistics: UD_Czech-PDT: POS Tags: NOUN

There are 9039 NOUN lemmas (33% of all lemmas), 18229 NOUN types (34% of all types) and 83173 NOUN tokens (25% of all tokens). Out of 17 observed tags, NOUN ranks first in number of lemmas, in number of types and in number of tokens.
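The three counts above can be reproduced from the treebank's CoNLL-U files roughly as follows. This is a minimal sketch, not the script that generated this page. It assumes that a "type" is a distinct, case-sensitive word form and that multiword-token range lines and empty nodes are skipped. `read_tokens`, `pos_counts` and the two-sentence sample are hypothetical.

```python
# Hypothetical sketch: count lemmas, types and tokens for one UPOS tag.

def read_tokens(conllu_text):
    """Return (form, lemma, upos) triples for the syntactic words in CoNLL-U text."""
    tokens = []
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank sentence separators and comment lines
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # multiword-token ranges and empty nodes are not counted as words
        tokens.append((cols[1], cols[2], cols[3]))
    return tokens

def pos_counts(tokens, upos):
    """Return (distinct lemmas, distinct forms, tokens) for one UPOS tag."""
    selected = [(form, lemma) for form, lemma, tag in tokens if tag == upos]
    lemmas = {lemma for _, lemma in selected}
    types = {form for form, _ in selected}  # case-sensitive forms
    return len(lemmas), len(types), len(selected)

# Hypothetical sample; columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
rows = [
    "1 Vláda vláda NOUN _ _ 2 nsubj _ _",
    "2 schválila schválit VERB _ _ 0 root _ _",
    "3 zákon zákon NOUN _ _ 2 obj _ _",
    "",
    "1 Zákony zákon NOUN _ _ 2 nsubj _ _",
    "2 platí platit VERB _ _ 0 root _ _",
]
SAMPLE = "\n".join("\t".join(r.split()) for r in rows)  # spaces -> tabs, blank line kept

tokens = read_tokens(SAMPLE)
print(pos_counts(tokens, "NOUN"))  # (2, 3, 3)
```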

The 10 most frequent NOUN lemmas: rok, strana, léta, cena, firma, doba, vláda, zákon, společnost, země

The 10 most frequent NOUN types: p, let, roku, korun, roce, Kč, r, strany, firmy, případě
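Both top-10 lists are frequency rankings over the NOUN tokens, which a `Counter` expresses directly. The sketch assumes the (form, lemma, UPOS) triples have already been read from the treebank; the sample triples and `top_n` are illustrative only. `Counter.most_common` breaks ties by first occurrence, which may not be how this page orders equally frequent items.

```python
from collections import Counter

# Hypothetical (form, lemma, upos) triples; in practice taken from the FORM, LEMMA and UPOS columns.
tokens = [
    ("roku", "rok", "NOUN"), ("roce", "rok", "NOUN"), ("rok", "rok", "NOUN"),
    ("strany", "strana", "NOUN"), ("strany", "strana", "NOUN"),
    ("cena", "cena", "NOUN"), ("a", "a", "CCONJ"),
]

def top_n(tokens, upos, n=10):
    """Return the n most frequent lemmas and the n most frequent forms (types) for one UPOS."""
    selected = [(form, lemma) for form, lemma, tag in tokens if tag == upos]
    lemma_freq = Counter(lemma for _, lemma in selected)
    form_freq = Counter(form for form, _ in selected)
    return ([lemma for lemma, _ in lemma_freq.most_common(n)],
            [form for form, _ in form_freq.most_common(n)])

lemmas, forms = top_n(tokens, "NOUN", n=3)
print(lemmas)  # ['rok', 'strana', 'cena']
print(forms)   # ['strany', 'roku', 'roce']
```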

The 10 most frequent ambiguous lemmas: bod (NOUN 338, PROPN 1), stát (VERB 344, NOUN 328), den (NOUN 272, X 1), místo (NOUN 222, ADP 45, ADV 6), a (CCONJ 7162, NOUN 17, X 6), teplo (NOUN 91, ADV 1), pravda (NOUN 69, PART 2), s (ADP 2504, NOUN 27, X 10, PART 6), růst (NOUN 60, VERB 26), x (NOUN 32, SYM 19)

The 10 most frequent ambiguous types: p (NOUN 163, ADJ 2), s (ADP 1960, NOUN 72, X 10, PART 6), a (CCONJ 6945, ADJ 32, NOUN 17, X 6), září (NOUN 102, VERB 2), j (NOUN 9, ADJ 1), bod (NOUN 87, PROPN 1), stát (NOUN 75, VERB 50), den (NOUN 70, X 1), místo (NOUN 69, ADP 34, ADV 4), x (NOUN 32, SYM 19)
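An ambiguous lemma or type, as used above, is one that occurs both as NOUN and with at least one other UPOS tag. A sketch of that check follows, assuming pre-read (form, lemma, UPOS) triples. `ambiguous` and the sample are hypothetical, and the sketch does not try to reproduce the order in which this page lists the items.

```python
from collections import Counter, defaultdict

# Hypothetical (form, lemma, upos) triples.
tokens = [
    ("stát", "stát", "VERB"), ("stát", "stát", "NOUN"), ("státu", "stát", "NOUN"),
    ("místo", "místo", "NOUN"), ("místo", "místo", "ADP"),
    ("rok", "rok", "NOUN"),
]

def ambiguous(tokens, key_index, upos="NOUN"):
    """Map each lemma (key_index=1) or form (key_index=0) seen with upos
    and at least one other tag to its per-tag counts."""
    tags = defaultdict(Counter)
    for token in tokens:
        tags[token[key_index]][token[2]] += 1
    return {key: dict(counts) for key, counts in tags.items()
            if upos in counts and len(counts) > 1}

print(ambiguous(tokens, key_index=1))  # lemmas
print(ambiguous(tokens, key_index=0))  # types
```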

Morphology

The form / lemma ratio of NOUN is 2.016705 (the average over all parts of speech is 1.961704).
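The reported ratio agrees with dividing the number of NOUN types by the number of NOUN lemmas given at the top of this page (18229 / 9039). Reading the ratio as types per lemma is an inference from those numbers, not something the page states. A quick check:

```python
# Counts copied from the NOUN summary at the top of this page.
noun_types = 18229   # distinct NOUN word forms
noun_lemmas = 9039   # distinct NOUN lemmas

ratio = noun_types / noun_lemmas
print(f"{ratio:.6f}")  # 2.016705
```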

The highest number of forms (11) was observed with the lemma “strana”: s, str, stran, strana, stranami, stranou, stranu, strany, stranách, stranám, straně.

The 2nd highest number of forms (10) was observed with the lemma “hodina”: Hodina, h, hod, hodin, hodinami, hodinou, hodinu, hodiny, hodinách, hodině.

The 3rd highest number of forms (10) was observed with the lemma “ministr”: ministr, ministra, ministrem, ministrovi, ministru, ministry, ministrů, ministrům, ministře, ministři.
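Ranking lemmas by how many distinct forms they take amounts to grouping forms under their lemma. The lists above appear to treat capitalized variants (such as “Hodina”) as separate forms and to sort forms by code point, so the sketch does the same. `forms_per_lemma` and the sample triples are hypothetical.

```python
from collections import defaultdict

# Hypothetical (form, lemma, upos) triples.
tokens = [
    ("hodina", "hodina", "NOUN"), ("hodin", "hodina", "NOUN"), ("hod", "hodina", "NOUN"),
    ("Hodina", "hodina", "NOUN"), ("ministr", "ministr", "NOUN"),
    ("ministra", "ministr", "NOUN"), ("ministr", "ministr", "NOUN"),
]

def forms_per_lemma(tokens, upos="NOUN"):
    """Return (lemma, sorted distinct forms) pairs, lemmas with the most forms first."""
    forms = defaultdict(set)
    for form, lemma, tag in tokens:
        if tag == upos:
            forms[lemma].add(form)  # case-sensitive: "Hodina" and "hodina" are distinct
    return sorted(((lemma, sorted(fs)) for lemma, fs in forms.items()),
                  key=lambda item: len(item[1]), reverse=True)

for lemma, fs in forms_per_lemma(tokens):
    print(lemma, len(fs), ", ".join(fs))
```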

NOUN occurs with 9 features: Gender (79711; 96% of instances), Case (78979; 95% of instances), Number (78979; 95% of instances), Animacy (34831; 42% of instances), VerbForm (5750; 7% of instances), Abbr (4056; 5% of instances), Style (80; 0% of instances), Typo (18; 0% of instances), Foreign (1; 0% of instances)
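Each feature count above is the number of NOUN tokens whose FEATS column mentions that feature, and the percentage is relative to all NOUN tokens (79711 / 83173 ≈ 96% for Gender). A sketch over a hypothetical FEATS column follows; `feature_counts` is an illustrative name. Python's `round` rounds exact halves to even, so a borderline percentage could differ by one point from the page's own rounding.

```python
from collections import Counter

# Hypothetical FEATS values of four NOUN tokens ("_" means no features).
noun_feats = [
    "Case=Gen|Gender=Fem|Number=Sing",
    "Animacy=Inan|Case=Nom|Gender=Masc|Number=Sing",
    "Abbr=Yes|Gender=Fem",
    "_",
]

def feature_counts(feats_column):
    """Return {feature: (tokens carrying it, rounded % of all tokens)}, most frequent first."""
    counts = Counter()
    for feats in feats_column:
        if feats == "_":
            continue
        for pair in feats.split("|"):
            name = pair.partition("=")[0]  # a multi-value item like "Case=Acc,Gen" still counts once
            counts[name] += 1
    total = len(feats_column)
    return {name: (n, round(100 * n / total)) for name, n in counts.most_common()}

print(feature_counts(noun_feats))
```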

NOUN occurs with 23 feature-value pairs: Abbr=Yes, Animacy=Anim, Animacy=Inan, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Case=Voc, Foreign=Yes, Gender=Fem, Gender=Masc, Gender=Neut, Number=Dual, Number=Plur, Number=Sing, Style=Coll, Style=Expr, Style=Slng, Style=Vrnc, Typo=Yes, VerbForm=Vnoun

NOUN occurs with 130 feature combinations. The most frequent feature combination is Case=Gen|Gender=Fem|Number=Sing (6423 tokens). Examples: strany, práce, vlády, společnosti, firmy, republiky, rady, přímky, doby, obrany
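Feature-value pairs and feature combinations can be collected from the same column: the pairs are the individual `Feature=Value` items, and a combination is the whole FEATS string of a token. Because the CoNLL-U format lists features in alphabetical order, identical bundles compare equal as strings. This sketch assumes that tokens without features (`_`) are not counted as a combination, which this page does not specify; the data and `feature_inventory` are hypothetical.

```python
from collections import Counter

# Hypothetical FEATS values of NOUN tokens.
noun_feats = [
    "Case=Gen|Gender=Fem|Number=Sing",
    "Case=Gen|Gender=Fem|Number=Sing",
    "Case=Nom|Gender=Fem|Number=Plur",
    "Abbr=Yes",
    "_",
]

def feature_inventory(feats_column):
    """Return the sorted Feature=Value pairs and a Counter of whole FEATS strings."""
    pairs = set()
    combos = Counter()
    for feats in feats_column:
        if feats == "_":
            continue  # assumption: an empty bundle is not a combination
        combos[feats] += 1
        pairs.update(feats.split("|"))
    return sorted(pairs), combos

pairs, combos = feature_inventory(noun_feats)
print(len(pairs), pairs)
print(len(combos), combos.most_common(1))
```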

Relations

NOUN nodes are attached to their parents using 27 different relations: nmod (27075; 33% of instances), obl (14235; 17% of instances), nsubj (13058; 16% of instances), obj (9252; 11% of instances), conj (5858; 7% of instances), obl:arg (5217; 6% of instances), root (2762; 3% of instances), nsubj:pass (1419; 2% of instances), appos (1018; 1% of instances), dep (914; 1% of instances), fixed (493; 1% of instances), xcomp (470; 1% of instances), advcl (344; 0% of instances), orphan (298; 0% of instances), ccomp (202; 0% of instances), acl:relcl (154; 0% of instances), case (139; 0% of instances), acl (88; 0% of instances), iobj (61; 0% of instances), csubj (32; 0% of instances), flat (29; 0% of instances), parataxis (27; 0% of instances), vocative (18; 0% of instances), csubj:pass (5; 0% of instances), advmod (3; 0% of instances), amod (1; 0% of instances), discourse (1; 0% of instances)
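The relation counts above add up to 83173, the NOUN token count, since every node has exactly one incoming relation (`root` included). Below is a sketch of the tally, assuming sentences have been reduced to hypothetical (id, UPOS, head, deprel) tuples, with head 0 denoting the artificial root as in CoNLL-U.

```python
from collections import Counter

# Hypothetical sentences as (id, upos, head, deprel) tuples; head 0 is the artificial root.
sentences = [
    [(1, "NOUN", 2, "nsubj"), (2, "VERB", 0, "root"), (3, "NOUN", 2, "obj"), (4, "NOUN", 3, "nmod")],
    [(1, "NOUN", 0, "root"), (2, "ADJ", 1, "amod"), (3, "NOUN", 1, "nmod")],
]

def relation_profile(sentences, upos="NOUN"):
    """List (relation, count, rounded %) for nodes of one UPOS, most frequent first."""
    counts = Counter(deprel for sent in sentences
                     for _, tag, _, deprel in sent if tag == upos)
    total = sum(counts.values())
    return [(rel, n, round(100 * n / total)) for rel, n in counts.most_common()]

print(relation_profile(sentences))
# [('nmod', 2, 40), ('nsubj', 1, 20), ('obj', 1, 20), ('root', 1, 20)]
```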

Parents of NOUN nodes fall into 17 categories, 16 parts of speech plus the artificial root node: VERB (35924; 43% of instances), NOUN (33274; 40% of instances), ADJ (6318; 8% of instances), the artificial root (2762; 3% of instances), PROPN (1459; 2% of instances), NUM (980; 1% of instances), ADV (912; 1% of instances), ADP (497; 1% of instances), DET (355; 0% of instances), PRON (218; 0% of instances), AUX (155; 0% of instances), SYM (128; 0% of instances), X (116; 0% of instances), PART (62; 0% of instances), CCONJ (10; 0% of instances), INTJ (2; 0% of instances), SCONJ (1; 0% of instances)
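Finding each parent's part of speech needs a per-sentence lookup from node ID to UPOS. The artificial root (head 0) has no UPOS, so this sketch files those attachments under a placeholder label `ROOT`, chosen here and not a UD tag; the 2762 root parents above match the 2762 `root` attachments in the relation list. The data and function names are hypothetical.

```python
from collections import Counter

# Hypothetical sentences as (id, upos, head, deprel) tuples; head 0 is the artificial root.
sentences = [
    [(1, "NOUN", 2, "nsubj"), (2, "VERB", 0, "root"), (3, "NOUN", 2, "obj"), (4, "NOUN", 3, "nmod")],
    [(1, "NOUN", 0, "root"), (2, "ADJ", 1, "amod"), (3, "NOUN", 1, "nmod")],
]

def parent_pos(sentences, upos="NOUN", root_label="ROOT"):
    """Count the UPOS of the parent of every node tagged upos.

    Head 0 has no entry in the lookup, so it is counted under root_label,
    a placeholder for this sketch rather than a UD tag.
    """
    counts = Counter()
    for sent in sentences:
        tag_of = {tid: tag for tid, tag, _, _ in sent}  # node id -> UPOS within this sentence
        for _, tag, head, _ in sent:
            if tag == upos:
                counts[tag_of.get(head, root_label)] += 1
    return counts

print(parent_pos(sentences).most_common())
```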

13297 (16%) NOUN nodes are leaves.

28956 (35%) NOUN nodes have one child.

24606 (30%) NOUN nodes have two children.

16314 (20%) NOUN nodes have three or more children.

The highest child degree of a NOUN node is 17.
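The four degree buckets above sum to 83173, again the NOUN token count. A node's degree is the number of tokens in its sentence that name it as their head; this sketch counts all dependents, punctuation included, as the child-relation list below does. It uses the same hypothetical (id, UPOS, head, deprel) tuples, and `degree_buckets` is an illustrative name.

```python
from collections import Counter

# Hypothetical sentences as (id, upos, head, deprel) tuples; head 0 is the artificial root.
sentences = [
    [(1, "NOUN", 2, "nsubj"), (2, "VERB", 0, "root"), (3, "NOUN", 2, "obj"), (4, "NOUN", 3, "nmod")],
    [(1, "NOUN", 0, "root"), (2, "ADJ", 1, "amod"), (3, "NOUN", 1, "nmod")],
]

def degree_buckets(sentences, upos="NOUN"):
    """Bucket nodes of one UPOS by child count (bucket 3 means three or more) and track the maximum."""
    buckets = Counter()
    max_degree = 0
    for sent in sentences:
        n_children = Counter(head for _, _, head, _ in sent)  # node id -> number of dependents
        for tid, tag, _, _ in sent:
            if tag == upos:
                degree = n_children[tid]
                buckets[min(degree, 3)] += 1
                max_degree = max(max_degree, degree)
    return dict(buckets), max_degree

print(degree_buckets(sentences))  # ({0: 3, 1: 1, 2: 1}, 2)
```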

Children of NOUN nodes are attached using 36 different relations: amod (33105; 24% of instances), nmod (29721; 21% of instances), case (25090; 18% of instances), punct (9915; 7% of instances), det (6573; 5% of instances), conj (5678; 4% of instances), cc (4480; 3% of instances), nummod (3670; 3% of instances), advmod:emph (3051; 2% of instances), acl:relcl (2878; 2% of instances), cop (2305; 2% of instances), flat (2288; 2% of instances), nsubj (1814; 1% of instances), nummod:gov (1790; 1% of instances), mark (1199; 1% of instances), appos (1028; 1% of instances), acl (982; 1% of instances), dep (953; 1% of instances), advmod (607; 0% of instances), obl (498; 0% of instances), xcomp (344; 0% of instances), orphan (235; 0% of instances), det:numgov (214; 0% of instances), csubj (184; 0% of instances), det:nummod (115; 0% of instances), advcl (99; 0% of instances), aux (84; 0% of instances), parataxis (77; 0% of instances), obl:arg (27; 0% of instances), discourse (15; 0% of instances), fixed (13; 0% of instances), ccomp (10; 0% of instances), obj (8; 0% of instances), flat:foreign (6; 0% of instances), vocative (2; 0% of instances), expl:pv (1; 0% of instances)

Children of NOUN nodes belong to 17 different parts of speech: ADJ (33662; 24% of instances), NOUN (33274; 24% of instances), ADP (24868; 18% of instances), PUNCT (9915; 7% of instances), DET (7582; 5% of instances), PROPN (6881; 5% of instances), NUM (5816; 4% of instances), CCONJ (5203; 4% of instances), VERB (3978; 3% of instances), AUX (2444; 2% of instances), ADV (2297; 2% of instances), SCONJ (1214; 1% of instances), PART (980; 1% of instances), X (510; 0% of instances), PRON (372; 0% of instances), SYM (61; 0% of instances), INTJ (2; 0% of instances)
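Both child lists above total 139059 dependents, and the 33274 NOUN children equal the 33274 NOUN parents reported earlier, since each noun-to-noun edge is counted once from each side. A sketch that collects both lists in one pass follows, with the same hypothetical (id, UPOS, head, deprel) tuples; `child_profile` is an illustrative name.

```python
from collections import Counter

# Hypothetical sentences as (id, upos, head, deprel) tuples; head 0 is the artificial root.
sentences = [
    [(1, "NOUN", 2, "nsubj"), (2, "VERB", 0, "root"), (3, "NOUN", 2, "obj"), (4, "NOUN", 3, "nmod")],
    [(1, "NOUN", 0, "root"), (2, "ADJ", 1, "amod"), (3, "NOUN", 1, "nmod")],
]

def child_profile(sentences, upos="NOUN"):
    """Count the relations and UPOS tags of all dependents whose parent is tagged upos."""
    relations, child_tags = Counter(), Counter()
    for sent in sentences:
        tag_of = {tid: tag for tid, tag, _, _ in sent}
        for _, tag, head, deprel in sent:
            if tag_of.get(head) == upos:  # head 0 maps to None, so the root never matches
                relations[deprel] += 1
                child_tags[tag] += 1
    return relations, child_tags

relations, child_tags = child_profile(sentences)
print(relations.most_common())   # [('nmod', 2), ('amod', 1)]
print(child_tags.most_common())  # [('NOUN', 2), ('ADJ', 1)]
```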