Treebank Statistics: UD_Czech-PDT: POS Tags: PROPN
There are 4725 PROPN lemmas (17%), 6531 PROPN types (12%) and 15741 PROPN tokens (5%).
Out of 17 observed tags, the rank of PROPN is: 3 in number of lemmas, 4 in number of types and 7 in number of tokens.
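As a rough illustration of how counts and ranks of this kind could be derived, the following Python sketch tallies lemmas, types and tokens per UPOS tag from CoNLL-U text and ranks the tags. The names `upos_counts`, `rank_of` and `SAMPLE` are hypothetical, the sample sentences are a toy and not taken from the treebank, and treating types as case-sensitive distinct word forms is an assumption that may differ from how the figures above were produced.

```python
from collections import defaultdict

def upos_counts(conllu_text):
    """Return {upos: (n_lemmas, n_types, n_tokens)} for CoNLL-U text."""
    lemmas = defaultdict(set)
    types = defaultdict(set)
    tokens = defaultdict(int)
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        form, lemma, upos = cols[1], cols[2], cols[3]
        lemmas[upos].add(lemma)
        types[upos].add(form)  # assumption: forms are compared case-sensitively
        tokens[upos] += 1
    return {t: (len(lemmas[t]), len(types[t]), tokens[t]) for t in tokens}

def rank_of(counts, upos, index):
    """1-based rank of `upos` by column `index` (0 lemmas, 1 types, 2 tokens)."""
    ordered = sorted(counts, key=lambda t: counts[t][index], reverse=True)
    return ordered.index(upos) + 1

# Toy sample (hypothetical, not taken from the treebank).
SAMPLE = "\n".join([
    "# text = Jan žije v Praze .",
    "1\tJan\tJan\tPROPN\t_\t_\t2\tnsubj\t_\t_",
    "2\tžije\tžít\tVERB\t_\t_\t0\troot\t_\t_",
    "3\tv\tv\tADP\t_\t_\t4\tcase\t_\t_",
    "4\tPraze\tPraha\tPROPN\t_\t_\t2\tobl\t_\t_",
    "5\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_",
    "",
    "# text = Praha je město .",
    "1\tPraha\tPraha\tPROPN\t_\t_\t3\tnsubj\t_\t_",
    "2\tje\tbýt\tAUX\t_\t_\t3\tcop\t_\t_",
    "3\tměsto\tměsto\tNOUN\t_\t_\t0\troot\t_\t_",
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_",
])

counts = upos_counts(SAMPLE)
print(counts["PROPN"])              # (lemmas, types, tokens) for PROPN
print(rank_of(counts, "PROPN", 2))  # rank of PROPN by token count
```

In the toy sample, Praha and Praze are two types of the single lemma Praha, which is the same lemma/type distinction the figures above rely on.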
The 10 most frequent PROPN lemmas: Praha, ČR, Německo, ODS, Evropa, LN, Jan, Jiří, Brno, Slovensko
The 10 most frequent PROPN types: Praha, ČR, ODS, Praze, LN, USA, Jiří, Jan, OSN, Václav
The 10 most frequent ambiguous lemmas: Washington (PROPN 24, X 1), Fischer (PROPN 16, X 1), York (X 20, PROPN 16), Bohemia (PROPN 15, X 2), Brod (PROPN 9, X 1), Panton (PROPN 9, X 1), Benetton (PROPN 8, X 1), Inkatha (PROPN 8, X 1), Albert (PROPN 7, X 1), Ford (PROPN 7, X 1)
The 10 most frequent ambiguous types: Plzeň (PROPN 22, NOUN 2), Nováček (PROPN 15, NOUN 1), Maďarsko (PROPN 14, ADJ 1), Bohemia (PROPN 13, X 2), C (NOUN 23, PROPN 12), Fischer (PROPN 11, X 1), Plzni (PROPN 9, NOUN 1), Škoda (PROPN 9, NOUN 4), Albert (PROPN 7, X 1), Benetton (PROPN 6, X 1)
Examples for the ambiguous type Benetton:
- PROPN 6: Podnikavý Benetton
- X 1: Nejnovější série šokujících fotografií , které se začnou objevovat s malým nápisem United Colours of Benetton , přinese například pohled na prázdné elektrické křeslo v newyorském vězení , albínskou černošskou holčičku mezi černými kamarádkami v Jižní Africe a muže zatýkaného agenty KGB .
Morphology
The form / lemma ratio of PROPN is 1.382222 (the average of all parts of speech is 1.961704).
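The reported ratio is consistent with dividing the number of PROPN types by the number of PROPN lemmas given at the top of the page (6531 / 4725). The short Python check below reproduces it; the helper `form_lemma_ratio` is hypothetical, and the assumption that the statistic is defined as distinct forms per lemma is inferred from the numbers, not stated by the source.

```python
from collections import defaultdict

def form_lemma_ratio(pairs):
    """Average number of distinct forms per lemma over (form, lemma) pairs."""
    forms = defaultdict(set)
    for form, lemma in pairs:
        forms[lemma].add(form)
    return sum(len(f) for f in forms.values()) / len(forms)

# Check against the page's own totals: 6531 PROPN types, 4725 PROPN lemmas.
ratio = 6531 / 4725
print(f"{ratio:.6f}")  # 1.382222

# Toy input built from two lemmas and their forms as listed on this page.
cech = ["Čech", "Čecha", "Čechem", "Čechy", "Čechů", "Čechům", "Češi", "Češích"]
kanada = ["KANADA", "Kan", "Kanada", "Kanadou", "Kanadu", "Kanady", "Kanadě"]
pairs = [(f, "Čech") for f in cech] + [(f, "Kanada") for f in kanada]
print(form_lemma_ratio(pairs))  # 7.5, i.e. (8 + 7) / 2
```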
The 1st highest number of forms (8) was observed with the lemma “Američan”: Američan, Američana, Američanem, Američani, Američany, Američané, Američanů, Američanům.
The 2nd highest number of forms (8) was observed with the lemma “Čech”: Čech, Čecha, Čechem, Čechy, Čechů, Čechům, Češi, Češích.
The 3rd highest number of forms (7) was observed with the lemma “Kanada”: KANADA, Kan, Kanada, Kanadou, Kanadu, Kanady, Kanadě.
PROPN occurs with 8 features: NameType (15741; 100% instances), Gender (14282; 91% instances), Case (13840; 88% instances), Number (13840; 88% instances), Animacy (9109; 58% instances), Abbr (1457; 9% instances), Typo (14; 0% instances), Style (12; 0% instances)
PROPN occurs with 26 feature-value pairs: Abbr=Yes, Animacy=Anim, Animacy=Inan, Case=Acc, Case=Dat, Case=Gen, Case=Ins, Case=Loc, Case=Nom, Case=Voc, Gender=Fem, Gender=Masc, Gender=Neut, NameType=Geo, NameType=Geo,Giv, NameType=Geo,Giv,Oth, NameType=Geo,Oth, NameType=Giv, NameType=Giv,Nat, NameType=Giv,Oth, NameType=Nat, NameType=Oth, Number=Plur, Number=Sing, Style=Coll, Typo=Yes
PROPN occurs with 169 feature combinations.
The most frequent feature combination is Animacy=Anim|Case=Nom|Gender=Masc|NameType=Giv|Number=Sing (4546 tokens).
Examples: Jiří, Jan, Václav, Vladimír, Klaus, Petr, Pavel, Josef, John, Havel
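The three feature counts above (features, feature-value pairs, feature combinations) can be read as three tallies over the FEATS column of the PROPN tokens. A minimal Python sketch follows, assuming FEATS strings in the usual CoNLL-U `Name=Value|Name=Value` form; the function name `feature_stats` and the toy input are hypothetical. A multi-valued feature such as NameType=Geo,Giv is kept as a single pair, which matches how the pairs are listed above.

```python
from collections import Counter

def feature_stats(feats_column):
    """Count features, feature-value pairs and whole combinations."""
    features = Counter()
    pairs = Counter()
    combos = Counter()
    for feats in feats_column:
        combos[feats] += 1
        if feats == "_":
            continue  # "_" marks a token without any features
        for pair in feats.split("|"):
            name, _value = pair.split("=", 1)
            features[name] += 1  # one count per token carrying the feature
            pairs[pair] += 1     # multi-values like Geo,Giv stay one pair
    return features, pairs, combos

# Toy FEATS values (hypothetical, not actual treebank rows).
FEATS = [
    "Animacy=Anim|Case=Nom|Gender=Masc|NameType=Giv|Number=Sing",
    "Animacy=Anim|Case=Nom|Gender=Masc|NameType=Giv|Number=Sing",
    "Case=Loc|Gender=Fem|NameType=Geo|Number=Sing",
    "Abbr=Yes|NameType=Oth",
]

features, pairs, combos = feature_stats(FEATS)
print(len(features), len(pairs), len(combos))
print(combos.most_common(1))
```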
Relations
PROPN nodes are attached to their parents using 22 different relations: nmod (4418; 28% instances), flat (4004; 25% instances), nsubj (2090; 13% instances), conj (1488; 9% instances), obl (1262; 8% instances), root (1041; 7% instances), dep (576; 4% instances), obl:arg (257; 2% instances), obj (211; 1% instances), appos (177; 1% instances), orphan (72; 0% instances), nsubj:pass (50; 0% instances), iobj (47; 0% instances), advcl (20; 0% instances), xcomp (8; 0% instances), ccomp (6; 0% instances), vocative (5; 0% instances), acl (3; 0% instances), acl:relcl (3; 0% instances), amod (1; 0% instances), csubj (1; 0% instances), parataxis (1; 0% instances)
Parents of PROPN nodes belong to 15 different parts of speech: NOUN (6881; 44% instances), PROPN (3715; 24% instances), VERB (3393; 22% instances), no part of speech, i.e. the artificial root (1041; 7% instances), ADJ (424; 3% instances), NUM (79; 1% instances), ADV (75; 0% instances), X (69; 0% instances), DET (31; 0% instances), PRON (15; 0% instances), AUX (8; 0% instances), PART (6; 0% instances), ADP (2; 0% instances), CCONJ (1; 0% instances), SYM (1; 0% instances)
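The parent distribution can be sketched as a lookup of each PROPN token's HEAD within its sentence. In the Python sketch below, tokens whose HEAD is 0 have no parent token, so they are counted under a placeholder label; this is presumably why one entry in the list above (1041 instances, the same count as the root relation) carries no part-of-speech tag. The function name `parent_upos_counts`, the `"ROOT"` label and the toy sentences are all hypothetical.

```python
from collections import Counter

def parent_upos_counts(sentences, child_upos="PROPN"):
    """sentences: list of sentences, each a list of (id, upos, head) tuples."""
    counts = Counter()
    for sent in sentences:
        upos_by_id = {tok_id: upos for tok_id, upos, _head in sent}
        for _tok_id, upos, head in sent:
            if upos != child_upos:
                continue
            # HEAD 0 is the artificial root and has no UPOS of its own;
            # "ROOT" is a placeholder label chosen for this sketch.
            counts[upos_by_id.get(head, "ROOT")] += 1
    return counts

# Toy sentences as (id, upos, head) tuples (hypothetical).
SENTENCES = [
    [(1, "PROPN", 2), (2, "VERB", 0), (3, "ADP", 4), (4, "PROPN", 2), (5, "PUNCT", 2)],
    [(1, "PROPN", 0), (2, "PROPN", 1)],  # a bare two-part name
]

print(parent_upos_counts(SENTENCES))
```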
7453 (47%) PROPN nodes are leaves.
5061 (32%) PROPN nodes have one child.
1813 (12%) PROPN nodes have two children.
1414 (9%) PROPN nodes have three or more children.
The highest child degree of a PROPN node is 29.
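The four child-degree counts add up to the 15741 PROPN tokens, and the percentages follow from that total. The Python sketch below first checks this arithmetic and then shows one way such buckets could be computed from HEAD values; `child_degree_buckets` and the toy sentences are hypothetical, and bucket 3 stands for "three or more children".

```python
from collections import Counter

def child_degree_buckets(sentences, upos="PROPN"):
    """Return (buckets, max_degree); sentences hold (id, upos, head) tuples."""
    buckets = Counter()
    max_degree = 0
    for sent in sentences:
        # Number of children per token id = how often the id occurs as a HEAD.
        n_children = Counter(head for _id, _upos, head in sent)
        for tok_id, tok_upos, _head in sent:
            if tok_upos != upos:
                continue
            degree = n_children[tok_id]
            max_degree = max(max_degree, degree)
            buckets[min(degree, 3)] += 1  # bucket 3 means "three or more"
    return buckets, max_degree

# Consistency check on the figures reported above.
reported = {0: 7453, 1: 5061, 2: 1813, 3: 1414}
total = sum(reported.values())
shares = {k: round(100 * v / total) for k, v in reported.items()}
print(total, shares)  # 15741 {0: 47, 1: 32, 2: 12, 3: 9}

# Toy sentences as (id, upos, head) tuples (hypothetical).
TOY = [
    [(1, "PROPN", 2), (2, "VERB", 0), (3, "ADP", 4), (4, "PROPN", 2), (5, "PUNCT", 2)],
    [(1, "PROPN", 0), (2, "PROPN", 1)],
]
print(child_degree_buckets(TOY))
```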
Children of PROPN nodes are attached using 28 different relations: case (3220; 22% instances), punct (3055; 21% instances), flat (1742; 12% instances), conj (1602; 11% instances), nmod (1289; 9% instances), amod (802; 6% instances), cc (720; 5% instances), dep (589; 4% instances), appos (249; 2% instances), advmod:emph (237; 2% instances), acl:relcl (232; 2% instances), nummod (179; 1% instances), xcomp (75; 1% instances), cop (61; 0% instances), mark (53; 0% instances), orphan (53; 0% instances), nsubj (52; 0% instances), obl (30; 0% instances), advmod (22; 0% instances), parataxis (21; 0% instances), nummod:gov (19; 0% instances), det (18; 0% instances), acl (7; 0% instances), advcl (3; 0% instances), det:numgov (2; 0% instances), ccomp (1; 0% instances), csubj (1; 0% instances), expl:pv (1; 0% instances)
Children of PROPN nodes belong to 16 different parts of speech: PROPN (3715; 26% instances), ADP (3202; 22% instances), PUNCT (3055; 21% instances), NOUN (1459; 10% instances), ADJ (866; 6% instances), CCONJ (779; 5% instances), NUM (378; 3% instances), VERB (309; 2% instances), ADV (159; 1% instances), X (148; 1% instances), PART (89; 1% instances), AUX (62; 0% instances), SCONJ (53; 0% instances), DET (47; 0% instances), PRON (8; 0% instances), SYM (6; 0% instances)