
This page pertains to UD version 2.

Treebank Statistics: UD_Romanian-SiMoNERo: POS Tags: NUM

There are 915 NUM lemmas (8%), 918 NUM types (5%) and 4605 NUM tokens (3%). Out of 16 observed tags, the rank of NUM is: 3 in number of lemmas, 4 in number of types and 10 in number of tokens.

The 10 most frequent NUM lemmas: 2, 1, doi, 3, 4, 5, 30, 10, 20, 6

The 10 most frequent NUM types: 2, 1, două, 3, 4, 5, 30, 10, 20, 6

The 10 most frequent ambiguous lemmas: primul (NUM 50, ADJ 1), i (NOUN 4, NUM 2), ultimul (NUM 10, ADJ 2), V (NUM 6, NOUN 5), l (NOUN 20, NUM 3), milion (NUM 3, NOUN 1), 43160 (NOUN 1, NUM 1), unul (PRON 66, NUM 1)

The 10 most frequent ambiguous types: primul (NUM 44, ADJ 1), i (PRON 11, NOUN 4, NUM 2, AUX 1), ultimul (NUM 9, ADJ 2), V (NUM 6, NOUN 5), nouă (ADJ 8, NUM 4, PRON 1), l (NOUN 21, PRON 6, NUM 3), milioane (NUM 3, NOUN 1), 43160 (NOUN 1, NUM 1), unul (PRON 33, NUM 1)

Morphology

The form / lemma ratio of NUM is 1.003279 (the average of all parts of speech is 1.666637).
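The reported ratio is consistent with dividing the number of NUM types by the number of NUM lemmas given above (918 and 915). The sketch below only reproduces that arithmetic; the function name is hypothetical, and the assumption that the page computes the ratio this way is inferred from the numbers, not stated by the source.

```python
def form_lemma_ratio(n_types: int, n_lemmas: int) -> float:
    """Distinct word forms (types) per distinct lemma (assumed definition)."""
    if n_lemmas <= 0:
        raise ValueError("n_lemmas must be positive")
    return n_types / n_lemmas


# Counts taken from the NUM summary on this page.
ratio = form_lemma_ratio(918, 915)
print(f"{ratio:.6f}")  # 1.003279
```

A ratio this close to 1 means almost every NUM lemma appears in a single form, which fits a class dominated by digit strings.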

The 1st highest number of forms (2) was observed with the lemma “5”: 5, 5-.

The 2nd highest number of forms (2) was observed with the lemma “I”: I, I-.

The 3rd highest number of forms (2) was observed with the lemma “doi”: doi, două.

NUM occurs with 7 features: NumType (4603; 100% instances), Number (4603; 100% instances), NumForm (4568; 99% instances), Gender (416; 9% instances), Case (237; 5% instances), Definite (208; 5% instances), PronType (35; 1% instances)

NUM occurs with 17 feature-value pairs: Case=Acc,Nom, Case=Gen, Case=Nom, Definite=Def, Definite=Ind, Gender=Fem, Gender=Masc, NumForm=Combi, NumForm=Digit, NumForm=Roman, NumForm=Word, NumType=Card, NumType=Frac, NumType=Ord, Number=Plur, Number=Sing, PronType=Tot

NUM occurs with 27 feature combinations. The most frequent feature combination is Number=Sing|NumForm=Digit|NumType=Card (3663 tokens). Examples: 2, 1, 3, 4, 5, 30, 10, 20, 6, 15

Relations

NUM nodes are attached to their parents using 19 different relations: nummod (3155; 69% instances), parataxis (808; 18% instances), conj (406; 9% instances), nsubj (93; 2% instances), root (36; 1% instances), appos (20; 0% instances), fixed (16; 0% instances), nsubj:pass (13; 0% instances), nmod (10; 0% instances), obj (9; 0% instances), obl (8; 0% instances), acl (5; 0% instances), amod (5; 0% instances), compound (5; 0% instances), xcomp (5; 0% instances), advcl (3; 0% instances), ccomp (3; 0% instances), flat (3; 0% instances), obl:pmod (2; 0% instances)
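Counts like these can be recomputed from the treebank's CoNLL-U files by tallying the DEPREL column for every token whose UPOS is NUM. The following is a minimal sketch using only the standard library; the function name and the two-token sample are illustrative and not taken from the treebank.

```python
from collections import Counter


def relation_counts(conllu_text: str, upos: str = "NUM") -> Counter:
    """Count DEPREL values for tokens with the given UPOS in CoNLL-U text."""
    counts = Counter()
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank separators and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges and empty nodes
        if cols[3] == upos:
            counts[cols[7]] += 1  # column 8 (index 7) is DEPREL
    return counts


# Invented sample sentence, for illustration only.
SAMPLE = (
    "# text = două cărți\n"
    "1\tdouă\tdoi\tNUM\t_\t_\t2\tnummod\t_\t_\n"
    "2\tcărți\tcarte\tNOUN\t_\t_\t0\troot\t_\t_\n"
)

print(relation_counts(SAMPLE))  # Counter({'nummod': 1})
```

Passing a different `upos` value gives the same breakdown for another part of speech.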

Parents of NUM nodes belong to 13 different parts of speech: NOUN (2900; 63% instances), VERB (809; 18% instances), NUM (535; 12% instances), ADJ (164; 4% instances), PROPN (42; 1% instances), ADV (36; 1% instances), no parent, i.e. root (36; 1% instances), X (34; 1% instances), ADP (32; 1% instances), PRON (8; 0% instances), AUX (6; 0% instances), DET (2; 0% instances), CCONJ (1; 0% instances)

1906 (41%) NUM nodes are leaves.

1086 (24%) NUM nodes have one child.

1194 (26%) NUM nodes have two children.

419 (9%) NUM nodes have three or more children.

The highest child degree of a NUM node is 9.
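The four child-degree buckets above sum to the 4605 NUM tokens, and the percentages follow from rounding each share to the nearest whole percent. A short sketch of that check, with a hypothetical helper name and bucket labels chosen for illustration:

```python
def degree_shares(buckets: dict[str, int]) -> dict[str, int]:
    """Return each bucket's share of the total, as a rounded percentage."""
    total = sum(buckets.values())
    return {label: round(100 * count / total) for label, count in buckets.items()}


# Counts copied from the child-degree lines on this page.
buckets = {"0": 1906, "1": 1086, "2": 1194, "3+": 419}

print(sum(buckets.values()))   # 4605
print(degree_shares(buckets))  # {'0': 41, '1': 24, '2': 26, '3+': 9}
```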

Children of NUM nodes are attached using 21 different relations: punct (2722; 54% instances), case (872; 17% instances), conj (399; 8% instances), nmod (255; 5% instances), advmod (217; 4% instances), cc (150; 3% instances), nummod (143; 3% instances), det (93; 2% instances), cop (52; 1% instances), nsubj (48; 1% instances), parataxis (27; 1% instances), aux (18; 0% instances), appos (11; 0% instances), advcl (9; 0% instances), acl (8; 0% instances), compound (8; 0% instances), mark (6; 0% instances), obl (4; 0% instances), amod (3; 0% instances), cc:preconj (1; 0% instances), flat (1; 0% instances)

Children of NUM nodes belong to 15 different parts of speech: PUNCT (2722; 54% instances), ADP (868; 17% instances), NUM (535; 11% instances), NOUN (308; 6% instances), ADV (201; 4% instances), CCONJ (146; 3% instances), DET (110; 2% instances), AUX (70; 1% instances), PRON (36; 1% instances), VERB (22; 0% instances), SCONJ (11; 0% instances), ADJ (8; 0% instances), X (5; 0% instances), PROPN (3; 0% instances), PART (2; 0% instances)