
This page pertains to UD version 2.

Treebank Statistics: UD_Czech-PDT: POS Tags: NUM

There are 1238 NUM lemmas (5%), 1324 NUM types (2%) and 9258 NUM tokens (3%). Out of 17 observed tags, the rank of NUM is: 6 in number of lemmas, 6 in number of types and 12 in number of tokens.

The 10 most frequent NUM lemmas: jeden, dva, 1, tři, tisíc, 2, oba, 3, milión, čtyři

The 10 most frequent NUM types: 1, 2, 3, tisíc, tři, dva, dvě, 4, 10, jeden

The most frequent ambiguous lemmas (only 3 observed): I (NUM 20, NOUN 13, X 2), V (NOUN 49, NUM 5), XX (NOUN 1, NUM 1)

The most frequent ambiguous types (only 7 observed): tří (NUM 44, ADJ 1), jednou (ADV 32, NUM 32), mil (NUM 27, NOUN 2), I (CCONJ 91, NUM 20, NOUN 13, X 2), set (NUM 20, X 4, ADJ 1, NOUN 1), V (ADP 797, NOUN 49, NUM 5), XX (NOUN 1, NUM 1)

Morphology

The form / lemma ratio of NUM is 1.069467 (the average of all parts of speech is 1.961704).
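The reported ratio appears to be the number of distinct NUM types divided by the number of distinct NUM lemmas, since the counts given at the top of this page reproduce it. A minimal check, assuming that reading is correct (the variable names are illustrative):

```python
# Counts taken from the summary at the top of this page.
num_types = 1324   # distinct NUM word forms
num_lemmas = 1238  # distinct NUM lemmas

# Assumed definition: forms per lemma = types / lemmas.
ratio = num_types / num_lemmas
print(f"{ratio:.6f}")
```

A value this close to 1 means that most NUM lemmas occur in a single form, which fits the large share of digit tokens.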

The 1st highest number of forms (10) was observed with the lemma “jeden”: jeden, jedna, jedno, jednoho, jednom, jednomu, jednou, jednu, jedné, jedním.

The 2nd highest number of forms (8) was observed with the lemma “miliarda”: miliard, miliarda, miliardami, miliardou, miliardu, miliardy, miliardách, mld.

The 3rd highest number of forms (7) was observed with the lemma “tisíc”: tis, tisíc, tisíce, tisíci, tisících, tisíců, tisícům.

NUM occurs with 7 features: NumType (9258; 100% instances), NumForm (8530; 92% instances), Case (2903; 31% instances), Number (2903; 31% instances), Gender (1691; 18% instances), Animacy (586; 6% instances), Abbr (45; 0% instances)

NUM occurs with 24 feature-value pairs: Abbr=Yes, Animacy=Anim, Animacy=Inan, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Case=Voc, Gender=Fem, Gender=Fem,Neut, Gender=Masc, Gender=Masc,Neut, Gender=Neut, NumForm=Digit, NumForm=Roman, NumForm=Word, NumType=Card, NumType=Frac, NumType=Sets, Number=Dual, Number=Plur, Number=Sing

NUM occurs with 70 feature combinations. The most frequent feature combination is NumForm=Digit|NumType=Card (6033 tokens). Examples: 1, 2, 3, 4, 10, 5, 1992, 6, 1993, 15
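The three feature statistics above (features, feature-value pairs, feature combinations) can all be derived from the FEATS column of a CoNLL-U file. The following is only a sketch of that counting, not the script that generated this page; the `feats_column` list is hypothetical sample data rather than values read from UD_Czech-PDT.

```python
from collections import Counter

# Hypothetical FEATS values for a few NUM tokens (illustrative only).
feats_column = [
    "NumForm=Digit|NumType=Card",
    "NumForm=Digit|NumType=Card",
    "Case=Nom|NumForm=Word|NumType=Card|Number=Plur",
]

feature_counts = Counter()             # e.g. NumType -> tokens carrying it
pair_counts = Counter()                # e.g. NumType=Card -> tokens carrying it
combo_counts = Counter(feats_column)   # whole FEATS string -> tokens

for feats in feats_column:
    if feats == "_":                   # "_" marks a token without features
        continue
    for pair in feats.split("|"):
        name, _value = pair.split("=", 1)
        feature_counts[name] += 1
        pair_counts[pair] += 1

print(feature_counts.most_common())
print(len(pair_counts), "feature-value pairs")
print(combo_counts.most_common(1))
```

Counting whole FEATS strings separately from individual pairs is what makes the number of combinations (70) larger than the number of feature-value pairs (24).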

Relations

NUM nodes are attached to their parents using 25 different relations: nummod (4024; 43% instances), nummod:gov (1846; 20% instances), conj (812; 9% instances), compound (753; 8% instances), dep (446; 5% instances), obl (437; 5% instances), root (272; 3% instances), obj (203; 2% instances), nsubj (152; 2% instances), orphan (81; 1% instances), obl:arg (77; 1% instances), appos (46; 0% instances), nmod (30; 0% instances), nsubj:pass (24; 0% instances), xcomp (17; 0% instances), advcl (13; 0% instances), flat (11; 0% instances), ccomp (5; 0% instances), acl (2; 0% instances), acl:relcl (2; 0% instances), csubj (1; 0% instances), csubj:pass (1; 0% instances), iobj (1; 0% instances), mark (1; 0% instances), vocative (1; 0% instances)

Parents of NUM nodes belong to 13 different parts of speech: NOUN (5816; 63% instances), NUM (1493; 16% instances), VERB (711; 8% instances), PROPN (378; 4% instances), [artificial root, no POS] (272; 3% instances), ADJ (200; 2% instances), SYM (119; 1% instances), ADV (98; 1% instances), X (87; 1% instances), PRON (43; 0% instances), AUX (20; 0% instances), DET (19; 0% instances), CCONJ (2; 0% instances)

4812 (52%) NUM nodes are leaves.

3041 (33%) NUM nodes have one child.

845 (9%) NUM nodes have two children.

560 (6%) NUM nodes have three or more children.

The highest child degree of a NUM node is 27.
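The child-degree figures above can be reproduced by counting, for each NUM node, how many other nodes in the same sentence name it in their HEAD column. The sketch below assumes the standard ten-column, tab-separated CoNLL-U layout; `child_degrees` is a hypothetical helper and `SAMPLE` is a toy fragment, not data from UD_Czech-PDT.

```python
from collections import Counter

# Toy CoNLL-U fragment (hypothetical, for illustration only).
SAMPLE = """\
1\tdva\tdva\tNUM\t_\t_\t2\tnummod\t_\t_
2\tdomy\tdům\tNOUN\t_\t_\t0\troot\t_\t_
3\t,\t,\tPUNCT\t_\t_\t4\tpunct\t_\t_
4\ttři\ttři\tNUM\t_\t_\t2\tconj\t_\t_
"""

def child_degrees(conllu_text, upos="NUM"):
    """Map child degree -> number of nodes with the given UPOS tag."""
    degrees = Counter()
    for block in conllu_text.strip().split("\n\n"):
        rows = [line.split("\t") for line in block.splitlines()
                if line and not line.startswith("#")]
        # Skip multiword-token ranges ("1-2") and empty nodes ("1.1").
        rows = [r for r in rows if r[0].isdigit()]
        # HEAD column (index 6) -> number of dependents of that node.
        children = Counter(r[6] for r in rows)
        for r in rows:
            if r[3] == upos:
                degrees[children[r[0]]] += 1
    return degrees

print(dict(child_degrees(SAMPLE)))
```

In the toy sentence, "dva" is a leaf and "tři" has one child (the comma). On this page the four reported degree classes sum to the token total: 4812 + 3041 + 845 + 560 = 9258.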

Children of NUM nodes are attached using 30 different relations: punct (2255; 33% instances), nmod (931; 14% instances), conj (802; 12% instances), compound (753; 11% instances), case (513; 7% instances), advmod:emph (453; 7% instances), cc (286; 4% instances), dep (196; 3% instances), amod (124; 2% instances), mark (96; 1% instances), cop (95; 1% instances), nsubj (83; 1% instances), advmod (70; 1% instances), appos (55; 1% instances), orphan (41; 1% instances), obl (25; 0% instances), flat (23; 0% instances), acl:relcl (19; 0% instances), det (15; 0% instances), parataxis (10; 0% instances), xcomp (9; 0% instances), advcl (8; 0% instances), csubj (5; 0% instances), det:nummod (5; 0% instances), obj (3; 0% instances), obl:arg (3; 0% instances), acl (2; 0% instances), aux (2; 0% instances), discourse (2; 0% instances), fixed (1; 0% instances)

Children of NUM nodes belong to 16 different parts of speech: PUNCT (2255; 33% instances), NUM (1493; 22% instances), NOUN (980; 14% instances), ADP (511; 7% instances), ADV (340; 5% instances), CCONJ (281; 4% instances), SYM (232; 3% instances), PART (188; 3% instances), ADJ (173; 3% instances), AUX (97; 1% instances), SCONJ (97; 1% instances), PROPN (79; 1% instances), VERB (57; 1% instances), DET (45; 1% instances), PRON (35; 1% instances), X (22; 0% instances)