This page pertains to UD version 2.

Treebank Statistics: UD_Portuguese-DANTEStocks: POS Tags: NUM

There are 2375 NUM lemmas (26%), 2379 NUM types (21%) and 5021 NUM tokens (6%). Out of 16 observed tags, the rank of NUM is: 1 in number of lemmas, 1 in number of types and 7 in number of tokens.
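Counts of this kind can be reproduced from a CoNLL-U file. The following is a minimal sketch, not the script that generated this page. It assumes the standard 10-column CoNLL-U layout (FORM in column 2, LEMMA in column 3, UPOS in column 4) and treats types as case-sensitive distinct word forms, which is an assumption. The function name `count_upos` and the sample rows are hypothetical.

```python
def count_upos(conllu_text, upos="NUM"):
    """Return (lemmas, types, tokens) counts for one UPOS tag."""
    lemmas, types, tokens = set(), set(), 0
    for line in conllu_text.splitlines():
        # Skip blank lines (sentence breaks) and comment lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue
        tok_id = cols[0]
        # Skip multiword token ranges (e.g. 1-2) and empty nodes (e.g. 1.1).
        if "-" in tok_id or "." in tok_id:
            continue
        if cols[3] == upos:
            tokens += 1
            types.add(cols[1])   # FORM
            lemmas.add(cols[2])  # LEMMA
    return len(lemmas), len(types), tokens


# Hypothetical sample sentence, built row by row to keep the tabs explicit.
SAMPLE = "\n".join("\t".join(row) for row in [
    ["1", "duas", "dois", "NUM", "_", "Gender=Fem|NumType=Card", "2", "nummod", "_", "_"],
    ["2", "ações", "ação", "NOUN", "_", "_", "0", "root", "_", "_"],
    ["3", "e", "e", "CCONJ", "_", "_", "5", "cc", "_", "_"],
    ["4", "dois", "dois", "NUM", "_", "Gender=Masc|NumType=Card", "5", "nummod", "_", "_"],
    ["5", "fundos", "fundo", "NOUN", "_", "_", "2", "conj", "_", "_"],
])

print(count_upos(SAMPLE))  # (1, 2, 2)
```

In the sample, the two forms "duas" and "dois" share the lemma "dois", so the sketch reports one lemma, two types and two tokens.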

The 10 most frequent NUM lemmas: 13, 5, 3, 10, 15, um, 2, 1, 4, 31/12/2013

The 10 most frequent NUM types: 13, 5, 3, 10, 15, 2, 1, 4, 31/12/2013, 6

The 10 most frequent ambiguous lemmas: um (DET 381, NUM 45, PRON 11, NOUN 1), 4 (NUM 42, PROPN 2), 12/2013 (NUM 3, NOUN 1), 26/3/2014 (NUM 3, SYM 1), 1/2 (NOUN 1, NUM 1), 28/03/2014 (NOUN 1, NUM 1)

The 10 most frequent ambiguous types: 4 (NUM 42, PROPN 2), 6 (NUM 39, X 2), um (DET 182, NUM 26, PRON 4), uma (DET 180, NUM 14, PRON 6), 12/2013 (NUM 3, NOUN 1), 26/3/2014 (NUM 3, SYM 1), 64 (NUM 2, X 1), 1/2 (NOUN 1, NUM 1), 28/03/2014 (NOUN 1, NUM 1)

Morphology

The form / lemma ratio of NUM is 1.001684 (the average of all parts of speech is 1.238049).
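The reported ratio is consistent with dividing the number of NUM types by the number of NUM lemmas given on this page (2379 and 2375). A quick check, assuming that this is how the ratio is defined:

```python
num_types = 2379   # NUM types reported on this page
num_lemmas = 2375  # NUM lemmas reported on this page

ratio = num_types / num_lemmas
print(f"{ratio:.6f}")  # 1.001684
```

A value this close to 1 means that almost every NUM lemma appears in a single form.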

The 1st highest number of forms (2) was observed with the lemma “cinco”: cinco, cindo.

The 2nd highest number of forms (2) was observed with the lemma “dois”: dois, duas.

The 3rd highest number of forms (2) was observed with the lemma “um”: um, uma.

NUM occurs with 4 features: NumType (5018; 100% instances), Gender (61; 1% instances), Typo (3; 0% instances), Number (1; 0% instances)

NUM occurs with 5 feature-value pairs: Gender=Fem, Gender=Masc, NumType=Card, Number=Sing, Typo=Yes

NUM occurs with 6 feature combinations. The most frequent feature combination is NumType=Card (4955 tokens). Examples: 13, 5, 3, 10, 15, 2, 1, 4, 31/12/2013, 6

Relations

NUM nodes are attached to their parents using 21 different relations: nummod (2290; 46% instances), nmod (1826; 36% instances), obl (326; 6% instances), conj (325; 6% instances), parataxis (71; 1% instances), obj (52; 1% instances), appos (29; 1% instances), list (24; 0% instances), nsubj (22; 0% instances), root (19; 0% instances), flat (14; 0% instances), advcl (4; 0% instances), acl (3; 0% instances), flat:name (3; 0% instances), reparandum (3; 0% instances), ccomp (2; 0% instances), ccomp:speech (2; 0% instances), discourse (2; 0% instances), nsubj:pass (2; 0% instances), acl:relcl (1; 0% instances), orphan (1; 0% instances)

Parents of NUM nodes belong to 12 different parts of speech: SYM (1691; 34% instances), NOUN (1381; 28% instances), PROPN (1174; 23% instances), VERB (376; 7% instances), NUM (296; 6% instances), ADV (45; 1% instances), X (21; 0% instances), no parent, i.e. the NUM node is the root (19; 0% instances), ADJ (12; 0% instances), PRON (4; 0% instances), AUX (1; 0% instances), INTJ (1; 0% instances)
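One entry in the list above has no part of speech; its count (19) matches the 19 NUM nodes attached with the root relation, which have no parent word. The sketch below shows how such a tally could be produced for a single sentence. It assumes 10-column CoNLL-U rows (UPOS in column 4, HEAD in column 7); the function name `parent_upos_counts`, the label `(none)` and the sample rows are hypothetical.

```python
from collections import Counter


def parent_upos_counts(rows, upos="NUM"):
    """Tally the UPOS of each parent for tokens tagged `upos`.

    `rows` holds the 10-column CoNLL-U token rows of ONE sentence.
    """
    upos_by_id = {r[0]: r[3] for r in rows}
    counts = Counter()
    for r in rows:
        if r[3] != upos:
            continue
        head = r[6]
        # HEAD 0 marks the root of the sentence, which has no parent word.
        parent = "(none)" if head == "0" else upos_by_id[head]
        counts[parent] += 1
    return counts


# Hypothetical sentence: "13 , 5", with the first number as root.
rows = [
    ["1", "13", "13", "NUM", "_", "NumType=Card", "0", "root", "_", "_"],
    ["2", ",", ",", "PUNCT", "_", "_", "3", "punct", "_", "_"],
    ["3", "5", "5", "NUM", "_", "NumType=Card", "1", "conj", "_", "_"],
]

print(dict(parent_upos_counts(rows)))  # {'(none)': 1, 'NUM': 1}
```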

3023 (60%) NUM nodes are leaves.

1744 (35%) NUM nodes have one child.

166 (3%) NUM nodes have two children.

88 (2%) NUM nodes have three or more children.

The highest child degree of a NUM node is 8.
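The four child-degree counts above partition the NUM nodes, so they should add up to the 5021 NUM tokens reported at the top of the page. A short check of the sum and of the rounded percentages:

```python
# Child-degree counts for NUM nodes, as reported on this page.
degree_counts = {
    "leaves": 3023,
    "one child": 1744,
    "two children": 166,
    "three or more children": 88,
}

total = sum(degree_counts.values())
print(total)  # 5021

# Share of each group, rounded to a whole percent.
percentages = {label: round(100 * count / total)
               for label, count in degree_counts.items()}
print(percentages)
# {'leaves': 60, 'one child': 35, 'two children': 3, 'three or more children': 2}
```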

Children of NUM nodes are attached using 25 different relations: advmod (719; 30% instances), case (556; 23% instances), punct (327; 13% instances), conj (269; 11% instances), cc (226; 9% instances), nmod (78; 3% instances), det (74; 3% instances), nsubj (37; 2% instances), cop (36; 1% instances), parataxis (35; 1% instances), list (23; 1% instances), flat (14; 1% instances), mark (6; 0% instances), appos (5; 0% instances), vocative (5; 0% instances), acl (4; 0% instances), acl:relcl (4; 0% instances), reparandum (4; 0% instances), amod (3; 0% instances), advcl (2; 0% instances), discourse (2; 0% instances), dep (1; 0% instances), goeswith (1; 0% instances), nummod (1; 0% instances), obl (1; 0% instances)

Children of NUM nodes belong to 15 different parts of speech: SYM (592; 24% instances), ADP (559; 23% instances), PUNCT (327; 13% instances), NUM (296; 12% instances), CCONJ (223; 9% instances), ADV (147; 6% instances), NOUN (88; 4% instances), DET (74; 3% instances), PROPN (40; 2% instances), AUX (36; 1% instances), VERB (27; 1% instances), X (9; 0% instances), PRON (6; 0% instances), SCONJ (6; 0% instances), ADJ (3; 0% instances)