
This page pertains to UD version 2.

Treebank Statistics: UD_Estonian-EWT: POS Tags: X

There are 62 X lemmas (1%), 124 X types (1%) and 185 X tokens (0%). Out of 17 observed tags, the rank of X is: 11 in number of lemmas, 12 in number of types and 16 in number of tokens.
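The three counts above can be recomputed from a treebank in CoNLL-U format. The following is a minimal sketch, not the script that generated this page: the function name `upos_counts` and the toy `SAMPLE` data are hypothetical, and it assumes that types are distinguished case-sensitively (as the type list below suggests, but this is an assumption).

```python
def upos_counts(conllu_text, upos):
    """Return (n_lemmas, n_types, n_tokens) for one UPOS tag."""
    lemmas, types, tokens = set(), set(), 0
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        # Keep regular words only: multiword ranges ("1-2") and
        # empty nodes ("1.1") do not have a purely numeric ID.
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        form, lemma, tag = cols[1], cols[2], cols[3]
        if tag == upos:
            lemmas.add(lemma)
            types.add(form)  # assumption: case-sensitive word forms
            tokens += 1
    return len(lemmas), len(types), tokens


# Hypothetical toy data, not taken from UD_Estonian-EWT.
SAMPLE = "\n".join([
    "# text = toy example",
    "1\tto\tto\tX\t_\t_\t0\troot\t_\t_",
    "2\tTo\tto\tX\t_\t_\t1\tflat\t_\t_",
    "3\tno\tno\tINTJ\t_\t_\t1\tdiscourse\t_\t_",
    "4\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_",
])

print(upos_counts(SAMPLE, "X"))  # (1, 2, 2)
```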

The 10 most frequent X lemmas: _, to, enthusiasistic, no, offtopic, Nooot, da, know, nõu, offence

The 10 most frequent X types: to, 000, s, u, Enthusiasistic, ga, no, offtopic, NOOOT, a

The 10 most frequent ambiguous lemmas: _ (X 91, PUNCT 3), to (X 21, ADP 6, PART 2), no (INTJ 61, X 3, ADV 1, DET 1), offtopic (X 3, NOUN 1), nõu (NOUN 9, X 2), u (ADV 3, NOUN 2, X 2), I (ADJ 3, PRON 2, X 1), NB (PROPN 1, X 1), b (NOUN 7, X 1), imo (ADV 7, X 1)

The 10 most frequent ambiguous types: to (X 20, ADP 6, PART 2), 000 (X 13, NUM 1), u (X 4, NOUN 2, ADV 1), no (INTJ 33, X 2, ADV 1, DET 1), a (NOUN 16, ADV 9, CCONJ 4, DET 2, X 2, PROPN 1), olla (AUX 89, VERB 18, X 2, ADV 1), st (ADV 8, X 2), tehas (X 2, NOUN 1), - (PUNCT 325, X 1), 3 (NUM 57, ADJ 1, X 1)

Morphology

The form / lemma ratio of X is 2.000000 (the average of all parts of speech is 1.733702).
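The reported ratio is consistent with dividing the number of X types (124) by the number of X lemmas (62). The sketch below only reproduces that arithmetic; the helper name `form_lemma_ratio` is hypothetical, and the exact definition used by the generating script (for example, whether forms are counted per lemma) is an assumption.

```python
def form_lemma_ratio(n_types, n_lemmas):
    """Average number of distinct forms per lemma (assumed definition)."""
    return n_types / n_lemmas


# Counts for X as reported on this page.
print(f"{form_lemma_ratio(124, 62):.6f}")  # 2.000000
```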

The 1st highest number of forms (64) was observed with the lemma “_”: +++, -, -1, -dega, -e, -ga, 000, 02, 3, 300, 472, AT, a, aastasele, aegaset, arvuti, de, desid, eestist, füüsikat, ga, gravitatsioonist, homme, hot.ee, itaaliast, karantiin, keemikud, kingades, konkurent, kord, korealane, koreas, kraadise, kõik, meeskonnal, mõistusele, n, ne, olla, osa, panek, parameeter, pealt, refereri, relva, s, sama, sele, seni, sest, sinane, st, sõbralik, tasandil, teadus, tehas, tehnoloogia, tulesid, täht, u, valdkonnas, versiooni, vähem, üks.

The 2nd highest number of forms (1) was observed with the lemma “**”: **.

The 3rd highest number of forms (1) was observed with the lemma “E”: E.

X occurs with 2 features: Foreign (58; 31% instances), Abbr (5; 3% instances)

X occurs with 2 feature-value pairs: Abbr=Yes, Foreign=Yes

X occurs with 3 feature combinations. The most frequent feature combination is _ (122 tokens). Examples: to, 000, s, ga, NOOOT, a, olla, st, tehas, u

Relations

X nodes are attached to their parents using 13 different relations: goeswith (91; 49% instances), flat:foreign (29; 16% instances), dep (19; 10% instances), parataxis (13; 7% instances), root (11; 6% instances), flat (8; 4% instances), appos (4; 2% instances), discourse (4; 2% instances), nsubj (2; 1% instances), ccomp (1; 1% instances), conj (1; 1% instances), nmod (1; 1% instances), nsubj:cop (1; 1% instances)
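As a sanity check, the percentages above can be reproduced from the raw counts, which sum to the 185 X tokens. This is a small sketch assuming each percentage is the count divided by the total and rounded to the nearest integer; the variable names are illustrative.

```python
# Relation counts for X nodes, copied from the list above.
relations = {
    "goeswith": 91, "flat:foreign": 29, "dep": 19, "parataxis": 13,
    "root": 11, "flat": 8, "appos": 4, "discourse": 4, "nsubj": 2,
    "ccomp": 1, "conj": 1, "nmod": 1, "nsubj:cop": 1,
}

total = sum(relations.values())  # number of X tokens

# Share of each relation, rounded to a whole percent (assumed rounding).
shares = {rel: round(100 * n / total) for rel, n in relations.items()}

print(total)                               # 185
print(shares["goeswith"], shares["root"])  # 49 6
```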

Parents of X nodes belong to 11 different parts of speech: NOUN (54; 29% instances), X (35; 19% instances), NUM (22; 12% instances), VERB (21; 11% instances), PROPN (16; 9% instances), ADV (11; 6% instances), [root, no POS tag] (11; 6% instances), ADJ (9; 5% instances), PRON (4; 2% instances), CCONJ (1; 1% instances), DET (1; 1% instances)

151 (82%) X nodes are leaves.

13 (7%) X nodes have one child.

7 (4%) X nodes have two children.

14 (8%) X nodes have three or more children.

The highest child degree of an X node is 7.
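The leaf and child-degree figures above can be derived by counting, for every X node, how many other nodes name it as their head. The following is a minimal sketch under that assumption; the function name `child_degree_buckets`, the `(id, upos, head)` tuple layout, and the toy sentence are hypothetical and not taken from the treebank.

```python
from collections import Counter


def child_degree_buckets(sentences, upos):
    """Bucket nodes with the given UPOS by number of children.

    Keys: 0 = leaf, 1 = one child, 2 = two children, 3 = three or more.
    Each sentence is a list of (id, upos, head) tuples.
    """
    buckets = Counter()
    for sent in sentences:
        # Count how many nodes point to each head ID within the sentence.
        n_children = Counter(head for _, _, head in sent)
        for node_id, tag, _ in sent:
            if tag == upos:
                buckets[min(n_children[node_id], 3)] += 1
    return buckets


# Hypothetical toy sentence: node 1 has two children, node 2 has one,
# node 4 is a leaf.
toy = [[(1, "X", 0), (2, "X", 1), (3, "PUNCT", 1), (4, "X", 2)]]
buckets = child_degree_buckets(toy, "X")
print(buckets[0], buckets[1], buckets[2], buckets[3])  # 1 1 1 0
```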

Children of X nodes are attached using 11 different relations: punct (32; 38% instances), flat:foreign (29; 34% instances), flat (13; 15% instances), advmod (3; 4% instances), conj (2; 2% instances), advcl (1; 1% instances), cc (1; 1% instances), dep (1; 1% instances), mark (1; 1% instances), nmod (1; 1% instances), vocative (1; 1% instances)

Children of X nodes belong to 9 different parts of speech: X (35; 41% instances), PUNCT (32; 38% instances), NOUN (5; 6% instances), PROPN (5; 6% instances), ADV (3; 4% instances), VERB (2; 2% instances), ADJ (1; 1% instances), CCONJ (1; 1% instances), SCONJ (1; 1% instances)