Treebank Statistics: UD_Belarusian-HSE: POS Tags: X
There are 4885 X lemmas (16%), 4933 X types (9%) and 11432 X tokens (4%).
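Counts of this kind can be reproduced from the treebank's CoNLL-U files. The following is a minimal sketch, not the script that generated this page: the function name `upos_counts` and the four-row sample are hypothetical, and it assumes that types are case-sensitive word forms and that multiword-token ranges and empty nodes are skipped.

```python
def upos_counts(conllu_text, upos="X"):
    """Return (distinct lemmas, distinct forms, tokens) for one UPOS tag."""
    lemmas, types, tokens = set(), set(), 0
    for line in conllu_text.splitlines():
        # Skip blank sentence separators and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Keep only regular word lines: 10 columns and a plain integer ID
        # (this drops multiword ranges like "1-2" and empty nodes like "1.1").
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        if cols[3] == upos:  # column 4 is UPOS
            tokens += 1
            types.add(cols[1])   # column 2 is FORM
            lemmas.add(cols[2])  # column 3 is LEMMA
    return len(lemmas), len(types), tokens


# Hypothetical sample sentence, built with explicit tabs.
rows = [
    ("1", "live", "live", "X", "_", "Foreign=Yes", "0", "root", "_", "_"),
    ("2", "TV", "TV", "X", "_", "Foreign=Yes", "1", "flat:foreign", "_", "_"),
    ("3", "tv", "TV", "X", "_", "Foreign=Yes", "1", "flat:foreign", "_", "_"),
    ("4", ".", ".", "PUNCT", "_", "_", "1", "punct", "_", "_"),
]
sample = "# sent_id = demo-1\n" + "\n".join("\t".join(r) for r in rows) + "\n"

print(upos_counts(sample))  # (2, 3, 3)
```

On the sample, the two forms "TV" and "tv" share one lemma, so the sketch reports 2 lemmas, 3 types and 3 tokens.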
Out of 17 observed tags, the rank of X is: 2 in number of lemmas, 5 in number of types and 7 in number of tokens.
The 10 most frequent X lemmas: </a>, , , a, href=, <, BelarusDocs, live, Akute, </em>
The 10 most frequent X types: </a>, , , a, href=, <, BelarusDocs, live, ,
The 10 most frequent ambiguous lemmas: </a> (X 2126, SYM 83), </em> (X 696, SYM 7), </strong> (X 694, SYM 5), </em> (X 53, SYM 1), <a_href=”tut.by”> (X 2, SYM 1), Facebook (X 24, NOUN 1), art (X 16, NOUN 4), TV (X 16, NOUN 5), а (CCONJ 1177, ADP 190, X 11, NOUN 2, INTJ 1), :B:N: (X 14, PROPN 1)
The 10 most frequent ambiguous types: </a> (X 2126, SYM 83), </em> (X 696, SYM 7), </strong> (X 694, SYM 5), </em> (X 53, SYM 1), <a_href=”tut.by”> (X 2, SYM 1), TV (X 16, NOUN 5), а (CCONJ 731, ADP 173, X 9, NOUN 1), :B:N: (X 15, PROPN 1), ART (X 15, NOUN 4), TNT (X 12, PROPN 1)
Morphology
The form / lemma ratio of X is 1.009826 (the average of all parts of speech is 1.756638).
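The page does not state how this ratio is computed. As a quick check, and only as an assumption about the method, dividing the number of X types by the number of X lemmas reported above reproduces the figure:

```python
# Figures reported on this page for the X tag.
x_lemmas = 4885
x_types = 4933

# Assumption: form / lemma ratio = distinct forms divided by distinct lemmas.
ratio = x_types / x_lemmas
print(f"{ratio:.6f}")  # 1.009826
```

A ratio this close to 1 means that almost every X lemma appears in a single form, which is what one would expect for markup fragments and foreign-language material.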
The 1st highest number of forms (3) was observed with the lemma “ка”: ка, ку, кі.
The 2nd highest number of forms (3) was observed with the lemma “русский”: русский, русского, русском.
The 3rd highest number of forms (3) was observed with the lemma “у”: y, Ў, У.
X occurs with 1 feature: Foreign (3352; 29% instances)
X occurs with 1 feature-value pair: Foreign=Yes
X occurs with 2 feature combinations.
The most frequent feature combination is _ (8080 tokens).
Examples: </a>, , , href=, <, , , tut.by, <a_href=”tut.by”>, <a_href=”symbal.by”>
Relations
X nodes are attached to their parents using 25 different relations: dep (5727; 50% instances), parataxis (1460; 13% instances), appos (1038; 9% instances), flat:foreign (1019; 9% instances), root (533; 5% instances), nmod (381; 3% instances), conj (356; 3% instances), list (312; 3% instances), obl (224; 2% instances), nsubj (198; 2% instances), obj (64; 1% instances), flat:name (39; 0% instances), flat (19; 0% instances), compound (17; 0% instances), iobj (15; 0% instances), nsubj:pass (7; 0% instances), amod (5; 0% instances), case (5; 0% instances), acl (3; 0% instances), xcomp (3; 0% instances), goeswith (2; 0% instances), orphan (2; 0% instances), advcl (1; 0% instances), dislocated (1; 0% instances), fixed (1; 0% instances)
Parents of X nodes belong to 16 different parts of speech: X (5510; 48% instances), NOUN (2370; 21% instances), VERB (1950; 17% instances), (533; 5% instances), PROPN (435; 4% instances), ADV (201; 2% instances), ADJ (184; 2% instances), NUM (139; 1% instances), SYM (62; 1% instances), PRON (26; 0% instances), DET (14; 0% instances), AUX (3; 0% instances), INTJ (2; 0% instances), ADP (1; 0% instances), PART (1; 0% instances), PUNCT (1; 0% instances)
7474 (65%) X nodes are leaves.
854 (7%) X nodes have one child.
1145 (10%) X nodes have two children.
1959 (17%) X nodes have three or more children.
The highest child degree of an X node is 16.
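The child-degree figures above can be derived from the HEAD column of each tree. The sketch below is illustrative only: the function name `child_degree_buckets`, the `(id, upos, head)` tuple layout and the sample tree are assumptions made for this example, not part of the treebank tooling.

```python
from collections import Counter


def child_degree_buckets(sentence, upos="X"):
    """Bucket nodes of one UPOS tag by how many children they have.

    `sentence` is a list of (id, upos, head) tuples for a single tree,
    where head 0 marks the root.
    """
    # Number of children per node = how often its ID occurs as a head.
    n_children = Counter(head for _, _, head in sentence)
    buckets = Counter()
    for node_id, tag, _ in sentence:
        if tag != upos:
            continue
        degree = n_children[node_id]
        if degree == 0:
            buckets["leaf"] += 1
        elif degree == 1:
            buckets["one"] += 1
        elif degree == 2:
            buckets["two"] += 1
        else:
            buckets["three_or_more"] += 1
    return buckets


# Hypothetical tree: node 1 has three children, node 2 has one.
tree = [(1, "X", 0), (2, "X", 1), (3, "X", 1), (4, "PUNCT", 1), (5, "X", 2)]
result = child_degree_buckets(tree)
print(dict(result))

# Sanity check on the figures reported above: the four buckets
# add up to the total number of X tokens.
reported = {"leaf": 7474, "one": 854, "two": 1145, "three_or_more": 1959}
print(sum(reported.values()))  # 11432
```

Summing the four reported buckets gives 11432, which matches the X token count given at the top of the page.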
Children of X nodes are attached using 31 different relations: dep (4082; 39% instances), punct (3286; 31% instances), flat:foreign (969; 9% instances), case (386; 4% instances), conj (366; 3% instances), list (355; 3% instances), parataxis (271; 3% instances), appos (213; 2% instances), nmod (155; 1% instances), amod (151; 1% instances), cc (109; 1% instances), flat:name (44; 0% instances), nsubj (37; 0% instances), advmod (35; 0% instances), det (28; 0% instances), flat (20; 0% instances), acl (13; 0% instances), compound (12; 0% instances), acl:relcl (11; 0% instances), discourse (9; 0% instances), advcl (3; 0% instances), csubj (3; 0% instances), nummod (3; 0% instances), nummod:gov (3; 0% instances), obl (3; 0% instances), expl (2; 0% instances), iobj (2; 0% instances), mark (2; 0% instances), orphan (2; 0% instances), aux (1; 0% instances), xcomp (1; 0% instances)
Children of X nodes belong to 16 different parts of speech: X (5510; 52% instances), PUNCT (3286; 31% instances), SYM (396; 4% instances), ADP (379; 4% instances), NOUN (266; 3% instances), ADJ (206; 2% instances), NUM (129; 1% instances), PROPN (125; 1% instances), CCONJ (106; 1% instances), VERB (67; 1% instances), DET (42; 0% instances), ADV (32; 0% instances), PART (17; 0% instances), PRON (10; 0% instances), SCONJ (5; 0% instances), AUX (1; 0% instances)