UD_Tagalog-TRG
|
UD_Tagalog-Ugnayan
|
Tokenization and Word Segmentation
|
Tokenization and Word Segmentation
|
- This corpus contains 128 sentences and 734 tokens.
|
- This corpus contains 94 sentences, 1011 tokens and 1097 syntactic words.
|
- This corpus contains 133 tokens (18%) that are not followed by a space.
|
- This corpus contains 154 tokens (15%) that are not followed by a space.
|
- This corpus does not contain words with spaces.
|
- This corpus does not contain words with spaces.
|
- This corpus does not contain words that contain both letters and punctuation.
|
- This corpus contains 9 types of words that contain both letters and punctuation. Examples: -ng, 't, 'y, Isa-isa, Kapaki-pakinabang, Paulit-ulit, nag-aalaga, nag-isip, nag-party
|
|
- This corpus contains 86 multi-word tokens. On average, one multi-word token consists of 2.00 syntactic words.
- There are 49 types of multi-word tokens. Examples: maraming, isang, niyang, ilang, ibang, kakaibang, koryenteng, akong, aleng, dalang, kaniyang, lawang, pang, pinakamahabang, siyang, taong, Anong, Hilagang, Mukhang, Nagmukhang, Puting, Sandaling, bansang, bolang, daang, ding, estudyanteng, inyong, itong, kabutihang, kaming, kasamang, kilalang, limang, mahabang, malaking, malalaking, mapigilang, mong, natitirang, ngayong, ninyong, nitong, pangunahing, payasong, pinakamalaking, probinsyang, sinabing, turistang.
|
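The tokenization figures above can be cross-checked with a little arithmetic. The sketch below is only an illustration: the dictionary keys and helper names are hypothetical, and it assumes that a multi-word token counts once as a token while each of its parts counts as a syntactic word.

```python
# Figures copied from the tokenization statistics above.
TRG = {"tokens": 734, "no_space_after": 133}
UGNAYAN = {
    "tokens": 1011,
    "words": 1097,
    "no_space_after": 154,
    "mwt_count": 86,
    "mwt_avg_size": 2.00,
}


def percent(part, whole):
    """Share of `part` in `whole`, rounded to a whole percent."""
    return round(100 * part / whole)


def words_from_tokens(tokens, mwt_count, mwt_avg_size):
    """Assumption: every multi-word token adds (size - 1) extra syntactic words."""
    return tokens + round(mwt_count * (mwt_avg_size - 1))


trg_pct = percent(TRG["no_space_after"], TRG["tokens"])
ugnayan_pct = percent(UGNAYAN["no_space_after"], UGNAYAN["tokens"])
ugnayan_words = words_from_tokens(
    UGNAYAN["tokens"], UGNAYAN["mwt_count"], UGNAYAN["mwt_avg_size"]
)

print(trg_pct, ugnayan_pct, ugnayan_words)
```

Under that assumption, the 86 multi-word tokens account exactly for the difference between the token and word counts reported for UD_Tagalog-Ugnayan.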
Morphology
Tags
- This corpus uses 13 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, DET, INTJ, NOUN, PART, PRON, PROPN, PUNCT, SCONJ, VERB
- This corpus does not use the following tags: NUM, CCONJ, SYM, X
|
Morphology
Tags
- This corpus uses 14 UPOS tags out of 17 possible: ADJ, ADP, ADV, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, VERB
- This corpus does not use the following tags: AUX, SYM, X
|
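The unused-tag lines follow from a set difference against the 17 universal POS tags. A minimal sketch, with the tag lists copied from the lines above and variable names chosen only for illustration:

```python
# The 17 universal POS tags of UD v2.
UPOS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}

# Tags listed as used by each corpus.
trg_used = {
    "ADJ", "ADP", "ADV", "AUX", "DET", "INTJ", "NOUN", "PART", "PRON",
    "PROPN", "PUNCT", "SCONJ", "VERB",
}
ugnayan_used = {
    "ADJ", "ADP", "ADV", "CCONJ", "DET", "INTJ", "NOUN", "NUM", "PART",
    "PRON", "PROPN", "PUNCT", "SCONJ", "VERB",
}

trg_unused = UPOS - trg_used
ugnayan_unused = UPOS - ugnayan_used
# Tags that one corpus uses and the other does not.
only_in_one = trg_used ^ ugnayan_used

print(sorted(trg_unused), sorted(ugnayan_unused), sorted(only_in_one))
```

The symmetric difference makes the disagreement between the two treebanks explicit: AUX appears only in UD_Tagalog-TRG, while CCONJ and NUM appear only in UD_Tagalog-Ugnayan.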
- This corpus contains 7 word types tagged as particles (PART): ano, ba, bang, daw, hindi, kaya, sana
|
- This corpus contains 10 word types tagged as particles (PART): 'y, -ng, ay, ba, e, hindi, na, nang, ng, po
|
- This corpus contains 9 lemmas tagged as pronouns (PRON): ako, ikaw, ito, iyan, kayo, sarili, sila, sino, siya
|
- This corpus contains 13 lemmas tagged as pronouns (PRON): akin, ako, ano, ikaw, inyo, ito, kami, kayo, ko, lahat, sila, siya, tayo
|
- This corpus contains 3 lemmas tagged as determiners (DET): lahat, mga, mismo
|
- This corpus contains 4 lemmas tagged as determiners (DET): bawat, ilan, marami, mga
|
|
|
- This corpus contains 1 lemma tagged as an auxiliary (AUX): huwag
|
- This corpus contains no lemmas tagged as auxiliaries (AUX).
|
|
|
- This corpus does not use the VerbForm feature.
|
- This corpus does not use the VerbForm feature.
|
Nominal Features
|
Nominal Features
|
|
|
- Gender
  - Fem
    - ADJ: Komika
    - NOUN: Biyuda, maestra
    - PROPN: Linda, Maria, Mary, Rosa
|
|
  - Masc
    - DET: mismo
    - NOUN: Biyudo, maestro
    - PROPN: Juan, Pedro, John, Bill
|
|
|
|
|
|
|
|
- Number
  - Plur
    - DET: mga
    - PRON: nila, sila, kayong, natin
|
|
  - Sing
    - PRON: ka, ko, niya, siya, kaniyang, kong, siyang, ako, Iyan, ito
|
|
|
|
- Case
  - Dat
    - ADP: sa, kay
    - PRON: kaniyang, kaniya
|
|
  - Gen
    - ADP: ng, ni
    - PRON: ko, niya, kong, nila, mo, natin, niyang
|
|
  - Nom
    - ADP: ang, si
    - PRON: ka, siya, siyang, ako, sila, Iyan, Sino, ito, kayong, kita
|
|
|
|
|
|
Degree and Polarity
|
Degree and Polarity
|
|
|
- Degree
  - Pos
    - ADJ: Mabuti, Malapit, bago, Interesante, Maganda, Masagwa, Matalino, Matamis, Napakataas, Pagod
|
|
|
|
- Polarity
  - Neg
    - AUX: Huwag
    - PART: hindi
    - VERB: Wala, Walang
|
|
  - Pos
    - INTJ: Hindi, Oo
    - VERB: Mayroon, May, Mayroong
|
|
|
|
Verbal Features
|
Verbal Features
|
|
|
- Aspect
  - Hab
    - VERB: Gusto, Kailangan, Ayaw
|
|
  - Imp
    - VERB: nagluluto, nagtatrabaho, Umuulan, nagiisa, pumapasok, sumasayaw, tumatanda, Binabasa, Bumabasa, Dumarating
|
|
  - Perf
    - VERB: nagluto, nakita, Inahit, Nagatubili, dumating, tumanggap, yumaman, Binalikan, Binigyan, Binili
|
|
  - Prog
    - VERB: darating, Aalisan, Aalisin, Ipagaalis, Magaalis, Susulat, Susulatin
|
|
|
|
|
|
- Mood
  - Imp
    - AUX: Huwag
    - VERB: Walisan, Bigyan, Magbigay
|
|
  - Ind
    - VERB: nagluluto, nagluto, nagtatrabaho, darating, nakita, Inahit, Nagatubili, Umuulan, dumating, nagiisa
|
|
|
|
|
|
|
|
- Voice
  - Act
    - VERB: nagluluto, nagluto, nagtatrabaho, darating, Nagatubili, Umuulan, dumating, nagiisa, pumapasok, sumasayaw
|
|
|
|
|
|
  - Lfoc
    - VERB: Aalisan, Bigyan, Binalikan, Binigyan, Sinalpok, Tinakasan
|
|
  - Pass
    - VERB: Gusto, nakita, Inahit, Walisan, Aalisin, Ayaw, Binabasa, Binili, Binisita, Ginising
|
|
|
|
Pronouns, Determiners, Quantifiers
|
Pronouns, Determiners, Quantifiers
|
|
|
- PronType
  - Dem
    - ADV: na, roon
    - PRON: Iyan, ito
|
|
|
|
|
|
  - Int
    - ADJ: Napakaano
    - ADV: nasaan, Bakit
    - PRON: Sino
    - VERB: Naano, Nagano
|
|
  - Prs
    - PRON: ka, ko, niya, siya, kaniyang, sarili, kong, nila, siyang, ako
|
|
|
|
|
|
|
|
|
|
|
|
|
|
- Person
  - 1
    - PRON: ko, kong, ako, kita, natin
|
|
|
|
  - 3
    - PRON: niya, siya, kaniyang, nila, siyang, sila, kaniya, niyang
|
|
|
|
|
|
|
|
Other Features
|
Other Features
|
|
|
|
|
|
|
- Link
  - Yes
    - ADJ: Bagong
    - NOUN: batang, diyaryong, lalaking
    - PART: bang
    - PRON: kaniyang, kong, siyang, kayong, niyang, sariling
    - PROPN: Juan
    - VERB: Mayroong, Walang
|
|
- PartType
  - Des
  - Int
    - PART: ba, ano, kaya, bang
  - Nfh
|
|
Syntax
Auxiliary Verbs and Copula
- This corpus does not contain copulas.
|
Syntax
Auxiliary Verbs and Copula
- This corpus does not contain copulas.
|
- This corpus uses 1 lemma as an auxiliary (aux): huwag.
|
- This corpus does not contain auxiliaries.
|
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
  - VERB--NOUN (1)
  - VERB--NOUN-ADP(ang) (30)
  - VERB--PRON-ADP(ang) (1)
  - VERB--PRON-Gen (3)
  - VERB--PRON-Nom (14)
|
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
  - VERB--NOUN (1)
  - VERB--NOUN-ADP(ang) (24)
  - VERB--PRON (25)
|
- obj
  - VERB--NOUN (5)
  - VERB--NOUN-ADP(ng) (21)
  - VERB--PRON-ADP(ang) (1)
  - VERB--PRON-Gen (1)
  - VERB--PRON-Nom (1)
|
- obj
  - VERB--NOUN (13)
  - VERB--NOUN-ADP(ng) (17)
  - VERB--PRON (19)
|
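The raw counts for nsubj and obj are easier to compare across the two corpora as shares of each relation's total. A rough sketch, with the counts copied from the lists above and the helper name `share` purely hypothetical:

```python
# Pattern counts for verb-headed nsubj and obj, copied from the lists above.
COUNTS = {
    "TRG": {
        "nsubj": {"VERB--NOUN": 1, "VERB--NOUN-ADP(ang)": 30,
                  "VERB--PRON-ADP(ang)": 1, "VERB--PRON-Gen": 3,
                  "VERB--PRON-Nom": 14},
        "obj": {"VERB--NOUN": 5, "VERB--NOUN-ADP(ng)": 21,
                "VERB--PRON-ADP(ang)": 1, "VERB--PRON-Gen": 1,
                "VERB--PRON-Nom": 1},
    },
    "Ugnayan": {
        "nsubj": {"VERB--NOUN": 1, "VERB--NOUN-ADP(ang)": 24,
                  "VERB--PRON": 25},
        "obj": {"VERB--NOUN": 13, "VERB--NOUN-ADP(ng)": 17,
                "VERB--PRON": 19},
    },
}


def share(corpus, relation, pattern):
    """Percentage of `relation` instances in `corpus` that match `pattern`."""
    patterns = COUNTS[corpus][relation]
    return round(100 * patterns[pattern] / sum(patterns.values()))


for corpus in COUNTS:
    print(corpus,
          "ang-marked noun subjects:", share(corpus, "nsubj", "VERB--NOUN-ADP(ang)"),
          "ng-marked noun objects:", share(corpus, "obj", "VERB--NOUN-ADP(ng)"))
```

These shares only cover verb-noun and verb-pronoun pairs, as stated above, so they say nothing about subjects or objects of other parts of speech.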
|
|
|
|
|
|
Verbs with Reflexive Core Objects
- This corpus contains 1 lemma that occurs at least once with a reflexive core object (obj or iobj). Example: ahit sarili
|
|
Relations Overview
- This corpus uses 9 relation subtypes: acl:relcl, compound:redup, iobj:patient, nmod:poss, nsubj:bfoc, nsubj:ifoc, nsubj:lfoc, nsubj:pass, obj:agent
- The following 2 main types are never used alone; they are always subtyped: acl, compound
- The following 18 relation types are not used in this corpus at all: vocative, expl, dislocated, discourse, cop, appos, nummod, amod, clf, conj, cc, fixed, list, parataxis, orphan, goeswith, reparandum, dep
|
Relations Overview
- This corpus does not use relation subtypes.
- The following 13 relation types are not used in this corpus at all: iobj, csubj, expl, dislocated, aux, cop, appos, clf, list, orphan, goeswith, reparandum, dep
|
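The lists of unused relations can be turned into the sets of relations each corpus does use. The sketch below assumes the inventory of 37 universal relations from UD v2; the variable names are illustrative only.

```python
# The 37 universal dependency relations of UD v2 (assumed inventory).
UNIVERSAL = set("""
acl advcl advmod amod appos aux case cc ccomp clf compound conj cop csubj
dep det discourse dislocated expl fixed flat goeswith iobj list mark nmod
nsubj nummod obj obl orphan parataxis punct reparandum root vocative xcomp
""".split())

# Relations reported above as not used at all in each corpus.
trg_unused = set("""
vocative expl dislocated discourse cop appos nummod amod clf conj cc fixed
list parataxis orphan goeswith reparandum dep
""".split())
ugnayan_unused = set("""
iobj csubj expl dislocated aux cop appos clf list orphan goeswith
reparandum dep
""".split())

trg_used = UNIVERSAL - trg_unused          # includes acl and compound, which only occur subtyped
ugnayan_used = UNIVERSAL - ugnayan_unused
unused_in_both = trg_unused & ugnayan_unused

print(len(trg_used), len(ugnayan_used), sorted(unused_in_both))
```

Under that assumption, UD_Tagalog-TRG uses 19 of the main relation types and UD_Tagalog-Ugnayan uses 24, and ten relations are absent from both.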