UD Welsh CCG
Language: Welsh (code: cy)
Family: IE
This treebank has been part of Universal Dependencies since the UD v2.4 release.
The following people have contributed to making this treebank part of UD: Johannes Heinecke, Francis Tyers.
Repository: UD_Welsh-CCG
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.15
License: CC BY-SA 4.0
Genre: grammar-examples, wiki, nonfiction, fiction, news
Questions, comments? General annotation questions (either Welsh-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on Github. If you want to collaborate, please contact [johannes • heinecke (æt) orange • com]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.
Annotation | Source |
---|---|
Lemmas | annotated manually in non-UD style, automatically converted to UD |
UPOS | annotated manually in non-UD style, automatically converted to UD |
XPOS | annotated manually |
Features | annotated manually in non-UD style, automatically converted to UD |
Relations | annotated manually, natively in UD style |
Description
UD Welsh-CCG (Corpws Cystrawennol y Gymraeg) is a treebank of Welsh, annotated according to the Universal Dependencies guidelines.
Most of the annotated sentences come from the Welsh Wikipedia. Others have been taken from the Corpus of the Welsh Assembly, from websites of Welsh-speaking organisations (Cymdeithas yr Iaith Gymraeg, University of Wales), from news sources (y Golwg, local Welsh-language newspapers, BBC Cymru) and from Welsh-language blogs. A few example sentences come from Welsh grammars (Gramadeg Cymraeg Cyfoes; Gareth King, Modern Welsh).
Acknowledgments
If you use this treebank in your work, please cite:
@inproceedings{heinecke2019,
author = {Heinecke, Johannes and Tyers, Francis M.},
title = {{Development of a Universal Dependencies treebank for Welsh}},
year = {2019},
booktitle = {{Proceedings of the Celtic Language Technology Workshop}},
publisher = {European Association for Machine Translation},
address = {Dublin},
pages = {21--31},
url = {https://www.aclweb.org/anthology/W19-6904},
}
Statistics of UD Welsh CCG
POS Tags
ADJ – ADP – ADV – AUX – CCONJ – DET – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – SYM – VERB
Features
Abbr – Degree – Foreign – Gender – Mood – Mutation – Number – NumForm – NumType – Person – Poss – PronType – Tense – VerbForm
Relations
acl – acl:relcl – advcl – advmod – amod – appos – aux – case – case:pred – cc – ccomp – compound – compound:redup – conj – cop – csubj – det – discourse – expl – fixed – flat – flat:name – iobj – mark – nmod – nmod:agent – nmod:poss – nmod:redup – nsubj – nummod – obj – obl – obl:agent – orphan – parataxis – punct – root – vocative – xcomp
Tokenization and Word Segmentation
- This corpus contains 2617 sentences, 51461 tokens and 52308 syntactic words.
- This corpus contains 7193 tokens (14%) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 136 types of words that contain both letters and punctuation. Examples: 'r, 'n, 'i, 'w, 'u, 'ch, 'na, ar-lein, 'm, 'di, 'ma, de-orllewin, pêl-droed, Covid-19, de-ddwyrain, e-bost, 'dyn, Glan-llyn, Is-Ganghellor, Lan-llyn, Morris-Jones, ail-agor, budd-daliadau, byd-eang, cyd-destunau, cyd-fynd, ddi-waith, en-suite, ga', gogledd-orllewin, hyd-ddi, ma', nghyd-destun, ry'n, 'S, 'cello, 'mynadd, 'nafu, 'th, Anne-Marie, Budd-dal, D-Day, Ddraenen-wen, Dw', Eingl-Sacsoniaid, Hanner-wir, Is-gangellor, Mhen-y-bont, Nglan-llyn, Ngwaelod-y-garth
- This corpus contains 834 multi-word tokens. On average, one multi-word token consists of 2.02 syntactic words.
- There are 161 types of multi-word tokens. Examples: roedd, rydym, does, iddo, rydw, dwi, rydyn, iddi, rwy, roeddwn, ati, doedd, ganddo, gennym, rydych, dyw, iddynt, amdano, dydw, iddyn, dydy, rwyf, dydych, ohonom, roedden, wrthi, ynddi, ynddo, amdani, ichi, arni, arno, arnom, rydan, Roeddet, Rwyt, arnaf, arnoch, dydi, dydyn, ganddi, ganddynt, gennych, inni, mono, ohoni, ohono, roeddech, ato, gennyf.
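In CoNLL-U, a multi-word token such as iddo is written as one range line (e.g. 4-5) followed by one line per syntactic word, so a multi-word token of n words adds n - 1 to the word count. Hence the average length is (words - tokens + MWTs) / MWTs = (52308 - 51461 + 834) / 834 ≈ 2.02, and 7193 / 51461 ≈ 14% of tokens carry SpaceAfter=No. The Python sketch below recounts these figures from a CoNLL-U string; it assumes well-formed input, and the toy sentence (including the split of iddo into i + o) is illustrative rather than copied from the treebank.

```python
from dataclasses import dataclass


@dataclass
class TokenStats:
    sentences: int = 0
    tokens: int = 0          # surface tokens; a multi-word token counts once
    words: int = 0           # syntactic words (lines with an integer ID)
    mwts: int = 0            # multi-word tokens (range IDs such as 4-5)
    no_space_after: int = 0  # tokens whose MISC column contains SpaceAfter=No


def count_tokens(conllu: str) -> TokenStats:
    """Recount sentences, tokens, syntactic words and MWTs in a CoNLL-U string."""
    stats = TokenStats()
    covered_until = 0   # last word ID belonging to the current multi-word token
    in_sentence = False
    for line in conllu.splitlines():
        if not line.strip():                  # a blank line ends a sentence
            if in_sentence:
                stats.sentences += 1
            in_sentence, covered_until = False, 0
            continue
        if line.startswith("#"):              # sentence-level comments
            continue
        cols = line.split("\t")
        in_sentence = True
        space_after_no = "SpaceAfter=No" in cols[9].split("|")
        if "-" in cols[0]:                    # multi-word token range, e.g. 4-5
            stats.mwts += 1
            stats.tokens += 1
            stats.no_space_after += int(space_after_no)
            covered_until = int(cols[0].split("-")[1])
        elif "." in cols[0]:                  # empty node: neither token nor word
            continue
        else:
            stats.words += 1
            if int(cols[0]) > covered_until:  # word outside any MWT is its own token
                stats.tokens += 1
                stats.no_space_after += int(space_after_no)
    if in_sentence:                           # input without a final blank line
        stats.sentences += 1
    return stats


def avg_mwt_length(tokens: int, words: int, mwts: int) -> float:
    """Average syntactic words per MWT: each MWT adds (length - 1) words."""
    return (words - tokens + mwts) / mwts


def row(id_: str, form: str, misc: str = "_") -> str:
    """Build a 10-column CoNLL-U line with only ID, FORM and MISC filled in."""
    return "\t".join([id_, form] + ["_"] * 7 + [misc])


# Toy sentence "Rhoddais y llyfr iddo." ("I gave the book to him").
SAMPLE = "\n".join([
    "# text = Rhoddais y llyfr iddo.",
    row("1", "Rhoddais"),
    row("2", "y"),
    row("3", "llyfr"),
    row("4-5", "iddo", "SpaceAfter=No"),
    row("4", "i"),
    row("5", "o"),
    row("6", "."),
    "",
])

print(count_tokens(SAMPLE))
# → TokenStats(sentences=1, tokens=5, words=6, mwts=1, no_space_after=1)
print(round(avg_mwt_length(51461, 52308, 834), 2))  # figures from this page → 2.02
print(f"{7193 / 51461:.0%}")                          # → 14%
```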
Morphology
Tags
- This corpus uses 15 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, SYM, VERB
- This corpus does not use the following tags: INTJ, X
- This corpus contains 17 word types tagged as particles (PART): 'm, 'n, 'na, Ai, a, ddim, dim, fe, mi, na, nac, nad, ni, nid, y, yn, yr
- This corpus contains 22 lemmas tagged as pronouns (PRON): a, ai, chi, e, ef, hi, hon, hun, hwn, hwy, hyn, i, naill, neb, ni, pawb, peth, pwy, rhai, rhain, sawl, ti
- This corpus contains 4 lemmas tagged as determiners (DET): The, an, pa, y
- This corpus contains 7 lemmas tagged as auxiliaries (AUX): am, ar, bod, heb, newydd, wedi, yn
- Of these, 1 lemma occurred sometimes as AUX and sometimes as VERB: bod
- There are 3 (de)verbal forms:
- Fin
- AUX: mae, yw, oedd, oes, fydd, bydd, bu, fu, ydy, ydych
- VERB: mae, oedd, bydd, fydd, ydw, ydych, ydym, yw, ydyn, oes
- FinRel
- AUX: sy, sydd, ydw
- VERB: sy, sydd, allem, maent
- Vnoun
- AUX: bod, fod, mod
- NOUN: bod, cael, fod, gael, mynd, dod, wneud, ddod, fynd, gwneud
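Whether a lemma such as bod is used both as AUX and as VERB can be checked by collecting, for each UPOS tag, the set of lemmas it occurs with and intersecting the AUX and VERB sets. A minimal Python sketch, assuming the data are already loaded into a CoNLL-U string; the helper names are hypothetical and the toy rows are not taken from the treebank:

```python
from collections import defaultdict


def iter_words(conllu: str):
    """Yield (form, lemma, upos, feats) for each syntactic word in a CoNLL-U string.

    Comments, multi-word token ranges (e.g. 4-5) and empty nodes (e.g. 3.1)
    are skipped, so every syntactic word is seen exactly once.
    """
    for line in conllu.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue
        yield cols[1], cols[2], cols[3], cols[5]


def lemmas_in_both(conllu: str, first: str = "AUX", second: str = "VERB") -> list[str]:
    """Lemmas that occur with UPOS `first` somewhere and with `second` elsewhere."""
    lemmas = defaultdict(set)  # UPOS -> set of lemmas seen with it
    for _form, lemma, upos, _feats in iter_words(conllu):
        lemmas[upos].add(lemma)
    return sorted(lemmas[first] & lemmas[second])


def row(*cols: str) -> str:
    """Pad the given leading columns with '_' to a 10-column CoNLL-U line."""
    return "\t".join(list(cols) + ["_"] * (10 - len(cols)))


# Toy rows (not a real sentence): ID, FORM, LEMMA, UPOS.
TOY = "\n".join([
    row("1", "mae", "bod", "AUX"),
    row("2", "wedi", "wedi", "AUX"),
    row("3", "oedd", "bod", "VERB"),
    row("4", "mynd", "mynd", "NOUN"),
])

print(lemmas_in_both(TOY))  # → ['bod']
```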
Nominal Features
- Gender=Fem
- ADJ: leol, fechan, ariannol, drydedd, werdd, Chernyweg, Genedlaethol, Gymraeg, Saesneg, Wen
- NOUN: iaith, Gymraeg, ysgol, Eisteddfod, rhan, llywodraeth, ystod, addysg, ardal, wythnos
- NOUN-Vnoun: Ramadeg, agor
- NUM: ddwy, tair, pedair, dwy, dair, thair, bedair, dyw, phedair
- PRON: hi, ei, hon, 'i, 'w, honno, 'u, hunain, Rhain, hithau
- PROPN: Cymru, Nghymru, Gymru, Wyddfa, Gwynedd, DU, Ffrainc, Siân, Nghaernarfon, Loegr
- Gender=Masc
- ADJ: unrhyw, bach, Ewropeaidd, arbennig, gyflym, blynyddol, brif, ddiweddar, eang, academaidd
- NOUN: ôl, nifer, gwaith, gyfer, mwyn, cyngor, rhaid, angen, byd, mis
- NOUN-Vnoun: Digwydd, esgyn, isio, sillafu, sôn, teledu
- NUM: ddau, dau, tri, dri, bedwar, pedwar, 4, 52, bymtheg, dair
- PRON: ei, e, 'i, hwn, o, 'w, fo, hwnnw, ef, fe
- PROPN: Eryri, Gwynedd, Môn, Bangor, UE, Dafydd, BBC, Dewi, Llanberis, Thomas
- Number=Plur
- ADJ: eraill, arbenigol, Rhanbarthol, bychain, cryfion, gwledig, gwylltion, ifainc, llai, prysuraf
- AUX-Fin: ydych, ydyn, ydym, ydynt, byddwch, buoch, Byddech, Maen, Ydach, bydda
- NOUN: plant, ysgolion, blynyddoedd, aelodau, myfyrwyr, tai, disgyblion, gwasanaethau, llyfrau, blant
- PRON: eu, ni, chi, ein, nhw, hyn, eich, hynny, hwy, 'u
- PROPN: Blaenau, Cymry, Dolgellau, Pryderi, Wyn, Appalachians, Gaerdydd, Iseldiroedd, Pererinion, YesCymru
- VERB-Fin: ydych, ydym, ydyn, maen, gallwch, oedden, byddwch, maent, cewch, dewch
- VERB-FinRel: allem, maent
- Number=Sing
- ADJ: unrhyw, bach, leol, barod, Ewropeaidd, arbennig, gyflym, Gymreig, blynyddol, brif
- AUX-Fin: mae, yw, oedd, oes, fydd, bydd, bu, fu, ydy, byddai
- AUX-FinRel: sy, sydd, ydw
- AUX-Vnoun: bod, fod, mod
- NOUN: bod, cael, ôl, iaith, fod, Gymraeg, gael, ysgol, mynd, nifer
- NOUN-Vnoun: bod, cael, fod, gael, mynd, dod, wneud, ddod, fynd, gwneud
- PRON: ei, i, hi, e, fy, ti, 'i, 'w, o, fi
- PROPN: Cymru, Bangor, Nghymru, Gymru, Gwynedd, Eryri, Wyddfa, Ffestiniog, Môn, DU
- VERB-Fin: mae, oedd, bydd, fydd, ydw, yw, oes, bu, cafodd, dw
- VERB-FinRel: sy, sydd
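Each list above pairs a feature value with a UPOS tag, optionally suffixed with the word's VerbForm (NOUN-Vnoun, AUX-Fin), and shows frequent word forms. The Python sketch below builds such a table from (form, UPOS, FEATS) triples. It is an approximation, not the generator used for this page, which evidently aggregates differently in places (its plain NOUN lists also contain verbal nouns such as bod and cael); the feature bundles in the toy data are illustrative.

```python
from collections import Counter, defaultdict


def parse_feats(feats: str) -> dict[str, str]:
    """Turn a FEATS value such as 'Gender=Fem|Number=Sing' into a dict."""
    if feats == "_":
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


def feature_table(words, feature: str, top: int = 10) -> dict[str, dict[str, list[str]]]:
    """Map each value of `feature` to {UPOS[-VerbForm]: most frequent forms}."""
    counts = defaultdict(lambda: defaultdict(Counter))
    for form, upos, feats in words:
        fd = parse_feats(feats)
        if feature not in fd:
            continue
        # e.g. NOUN-Vnoun for verbal nouns, plain NOUN otherwise
        key = f"{upos}-{fd['VerbForm']}" if "VerbForm" in fd else upos
        counts[fd[feature]][key][form] += 1
    return {
        value: {key: [f for f, _ in c.most_common(top)] for key, c in by_key.items()}
        for value, by_key in counts.items()
    }


# Toy (form, UPOS, FEATS) triples; not copied from the treebank.
TOY = [
    ("ysgol", "NOUN", "Gender=Fem|Number=Sing"),
    ("ysgol", "NOUN", "Gender=Fem|Number=Sing"),
    ("iaith", "NOUN", "Gender=Fem|Number=Sing"),
    ("cael", "NOUN", "Number=Sing|VerbForm=Vnoun"),
    ("plant", "NOUN", "Number=Plur"),
    ("mae", "AUX", "Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin"),
]

print(feature_table(TOY, "Number"))
print(feature_table(TOY, "Gender"))
```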
Degree and Polarity
- Degree=Cmp
- ADJ: mwy, fwy, well, uwch, bellach, nes, gwell, hŷn, ehangach, pellach
- Degree=Equ
- ADJ: ogystal, cystal, belled, rhated, gryfed, gynted, gystal
- Degree=Pos
- ADJ: Cymraeg, newydd, lleol, pob, arall, Gymraeg, mawr, holl, bob, prif
- Degree=Sup
- ADJ: mwyaf, nesaf, uchaf, cyntaf, diwethaf, fwyaf, gorau, olaf, gyntaf, gwaethaf
Verbal Features
- Mood=Cnd
- AUX-Fin: byddai, fyddai, Byddech, bysa
- VERB-Fin: byddai, dylai, fyddai, ddylai, allai, gallai, gellid, arferai, byddech, adnabyddid
- Mood=Imp
- VERB-Fin: peidiwch, dewch, Paid, cofiwch, ewch, Edrychwn, Rho, Ymunwch, cewch, cysylltwch
- Mood=Ind
- AUX-Fin: mae, yw, oedd, oes, fydd, bydd, bu, fu, ydy, ydych
- AUX-FinRel: sy, sydd, ydw
- VERB-Fin: mae, oedd, bydd, fydd, ydw, ydych, ydym, yw, ydyn, oes
- VERB-FinRel: sy, sydd, allem, maent
- Mood=Sub
- AUX-Fin: baech, fo
- VERB-Fin: gweler, fo, bai, bo'n, boed, bof, sylwer, ystyrier
- Tense=Fut
- AUX-Fin: fydd, bydd, byddwch, Byddaf, Byddan, Byddi, bydda, byddant
- VERB-Fin: bydd, fydd, gall, ceir, gallwch, byddwch, gellir, ddaw, cewch, byddaf
- VERB-FinRel: allem
- Tense=Imp
- AUX-Fin: oedd, byddai, oeddwn, Oeddet, fyddai, Byddech, baent, oeddech, oeddem, ydoedd
- VERB-Fin: oedd, byddai, oeddwn, dylai, meddai, fyddai, ddylai, oedden, oeddet, allai
- Tense=Past
- AUX-Fin: bu, fu, Buodd, Bûm, buoch, Byddant, fo
- VERB-Fin: bu, cafodd, fu, ddaeth, ddaru, aeth, daeth, sefydlwyd, dywedodd, wnaeth
- Tense=Pqp
- AUX-Fin: baswn, bawn
- VERB-Fin: baswn, Basai, Fasai, Gallasai, ddylase, ddylaswn, dylasai, dylaswn
- Tense=Pres
- AUX-Fin: mae, yw, oes, ydy, ydych, ydi, ydyn, dw, ydw, ydym
- AUX-FinRel: sy, sydd, ydw
- VERB-Fin: mae, ydw, ydych, ydym, yw, ydyn, oes, dw, maen, wy
- VERB-FinRel: sy, sydd, maent
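Because value labels repeat across features (Imp is both Mood=Imp, imperative, and Tense=Imp, imperfect; Ind is both Mood=Ind and PronType=Ind), queries over the treebank should match full Feature=Value pairs rather than bare values. A minimal Python filter under that assumption; the function names are hypothetical and the toy feature bundles are chosen to agree with the lists above but are not copied from the treebank:

```python
def parse_feats(feats: str) -> dict[str, str]:
    """Turn a FEATS value such as 'Mood=Ind|Tense=Imp' into a dict."""
    if feats == "_":
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


def forms_with(words, **required: str) -> list[str]:
    """Forms whose FEATS contain every Feature=Value pair given as keyword arguments."""
    hits = []
    for form, feats in words:
        fd = parse_feats(feats)
        if all(fd.get(name) == value for name, value in required.items()):
            hits.append(form)
    return hits


# Toy (form, FEATS) pairs.
TOY = [
    ("dewch", "Mood=Imp|Number=Plur|Person=2|VerbForm=Fin"),
    ("oedd", "Mood=Ind|Number=Sing|Person=3|Tense=Imp|VerbForm=Fin"),
    ("byddai", "Mood=Cnd|Number=Sing|Person=3|Tense=Imp|VerbForm=Fin"),
]

print(forms_with(TOY, Mood="Imp"))               # imperative → ['dewch']
print(forms_with(TOY, Tense="Imp"))              # imperfect → ['oedd', 'byddai']
print(forms_with(TOY, Mood="Cnd", Tense="Imp"))  # → ['byddai']
```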
Pronouns, Determiners, Quantifiers
- PronType=Dem
- PRON: hyn, hynny, hwn, hon, hwnnw, rhain, honno, rheiny, hwnna, hynna
- PronType=Emp
- PRON: ninnau, hithau, hwythau, innau, yntau
- PronType=Ind
- PRON: rhai, pawb, bawb, rai, phawb
- PronType=Int
- PRON: pwy, sawl, Hwn, ai, Beth, naill
- PronType=Prs
- PRON: ei, i, eu, hi, ni, chi, e, ein, nhw, fy
- PronType=Rcp
- PRON: hun, hunain, hunan
- PronType=Rel
- PRON: a
- NumType=Card
- NUM: un, chwe, 4, dau, ddau, ddwy, tri, tair, pedair, 2019
- NumType=Ord
- ADJ: cyntaf, ail, gyntaf, trydydd, drydedd, 19, cynta, gynnar, 17, 18
- Poss=Yes
- PRON: ei, ein, eu, fy, eich, 'i, 'n, 'u, 'w, 'ch
- Person=0
- VERB-Fin: sefydlwyd, ceir, gellir, cafwyd, rhoddwyd, agorwyd, welir, cynhelir, gweler, cynhaliwyd
- Person=1
- AUX-Fin: dw, oeddwn, ydw, ydym, ydyn, wy, Byddaf, wyf, Bûm, dwi
- AUX-FinRel: ydw
- PRON: i, ni, ein, fy, fi, mi, 'n, 'm, ninnau, '
- VERB-Fin: ydw, ydym, dw, oeddwn, ydyn, wy, wyf, dwi, wnes, byddaf
- VERB-FinRel: allem
- Person=2
- AUX-Fin: ydych, byddwch, Oeddet, Byddi, buoch, wyt, Byddech, Ydach, baech, oeddech
- PRON: chi, ti, eich, 'ch, chdi, dy, di, 'th, chwi, d'
- VERB-Fin: ydych, wyt, gallwch, byddwch, cewch, oeddet, dewch, oeddech, peidiwch, allwch
- Person=3
- AUX-Fin: mae, yw, oedd, oes, fydd, bydd, bu, fu, ydy, byddai
- AUX-FinRel: sy, sydd
- PRON: ei, eu, hi, e, nhw, 'i, 'w, hwy, o, 'u
- VERB-Fin: mae, oedd, bydd, fydd, yw, oes, bu, cafodd, fu, byddai
- VERB-FinRel: sy, sydd, maent
Other Features
- Abbr=Yes
- ADV: ayb
- NOUN: g, b, AC, Dr, EFA, FS, MP3, Mr, yb, yp
- PROPN: BBC, UE, DU, E, J, R, T, UDA, A487, A55
- Foreign=Yes
- ADJ: Iron, pro-indy
- DET: na, The
- NOUN: Conradh, Croí, Féile, Gaeilge, Lady, Nation, caissons, emergency, irritating
- NOUN-Vnoun: irritating
- PROPN: Towers, n-Og, Bay, Electric, From, Horizon, Jubilee, New, Picnic, Tiger
- Mutation=AM
- ADJ: phob, chymdeithasol, phrif, Chernyweg, Chymraeg, Chymreig, chenedlaethol, pheryglus, phosib, chadarn
- ADP: thros, thua, chan, than, thrwy
- ADV: phryd, throsodd
- NOUN: phobl, chael, chynnal, chymorth, hannog, hiaith, phlant, chadw, chymunedau, hangen
- NOUN-Vnoun: chael, chynnal, chadw, harwain, chofio, hadeiladu, hannog, hystyried, chasglu, chreu
- NUM: hugain, thair, chan, phedair, thri
- PRON: e, phawb
- PROPN: Chynllaith, Phenfro, Phwllheli, Chaerwys, Chymru, Threwen
- SCONJ: phan
- VERB-Fin: chafodd, chaiff, cheir, Chadwa, Chlywais, Chlywodd, Chrediff, Chymrodd, Phrynan, chafwyd
- Mutation=NM
- ADJ: mhob
- AUX-Vnoun: mod
- DET: mha
- NOUN: mlynedd, mod, mae, Mhrifysgol, nghanolfan, nhŷ, mro, ne, nghanol, nghyfnod
- NOUN-Vnoun: mod, ngeni, nal, ngalw, nghefnogi, ngorfodi, nharo, nhynnu
- PROPN: Nghymru, Mangor, Nghaerdydd, Ngwynedd, Mlaenau, Nghaernarfon, Mhatagonia, Nghaerfyrddin, Nhregaron, Mangladesh
- VERB-Fin: mreuddwyd
- Mutation=SM
- ADJ: Gymraeg, bob, genedlaethol, fawr, dda, ogystal, wahanol, fwy, leol, bwysig
- ADP: dan, fewn, ledled, drwy
- ADV: bynnag, ddoe, draw, gyntaf, ddigon, gynt, drachefn, drannoeth, drennydd, drosodd
- AUX-Fin: fydd, fu, fyddai, fo
- AUX-Vnoun: fod
- DET: ba
- NOUN: fod, Gymraeg, gael, gyfer, wneud, ddod, fynd, bobl, beth, weld
- NOUN-Vnoun: fod, gael, wneud, ddod, fynd, weld, roi, greu, ddefnyddio, gynnwys
- NUM: ddau, ddwy, bymtheg, dair, dri, bedair, bum, bump, filiwn, bedwar
- PART: ddim
- PRON: bawb, rai, Beth, chi, e, hwy, i, ni
- PROPN: Gymru, Gaerdydd, DU, Fangor, Fôn, Gaerfyrddin, Brydain, Gaernarfon, Drawsfynydd, Feirionydd
- VERB-Fin: fydd, fu, ddaeth, ddaru, ddylai, wnaeth, gafodd, ddaw, wnes, allai
- VERB-FinRel: allem
- NumForm=Combi
- NUM: 16.5m, 17.8, 1974-1996, 2021-22, 25.8, 3.3, 3.30, 4-8, 5.9m, 5m
- NumForm=Digit
- NUM: 4, 2019, 10, 200, 2020, 50, 500, 7, 100, 11
- NumForm=Roman
- NUM: I, VIII
- NumForm=Word
- NUM: un, chwe, dau, ddau, ddwy, tri, tair, pedair, dwy, bymtheg
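The Mutation feature marks Welsh initial consonant mutations: SM (soft), NM (nasal) and AM (aspirate). The PROPN examples line up with their radical forms (Cymru: Gymru, Nghymru, Chymru; Bangor: Fangor, Mangor), and vowel-initial forms with h- (hiaith, hannog) are listed under AM, so the treebank evidently files h-prothesis there too. The Python sketch below applies the standard textbook mutation tables to a radical form. It is not the treebank's conversion code, the function and table names are hypothetical, and it ignores lexical exceptions and trigger-specific restrictions.

```python
# Radical initial -> mutated initial (standard Welsh grammar tables, simplified).
SOFT = {"ll": "l", "rh": "r", "p": "b", "t": "d", "c": "g",
        "b": "f", "d": "dd", "g": "", "m": "f"}
NASAL = {"p": "mh", "t": "nh", "c": "ngh", "b": "m", "d": "n", "g": "ng"}
ASPIRATE = {"p": "ph", "t": "th", "c": "ch"}
TABLES = {"SM": SOFT, "NM": NASAL, "AM": ASPIRATE}

# Digraphs that begin with a mutable letter but are left unchanged.
UNMUTATED = ("ch", "ph", "th", "dd", "ff", "ng")
# Initials that take h- under AM (w and y count as vowels here).
VOWELS = set("aeiouwyâêîôûŵŷ")


def mutate(radical: str, mutation: str) -> str:
    """Apply a Welsh initial mutation ('SM', 'NM' or 'AM') to a radical form.

    AM on a vowel-initial word adds h-, mirroring forms such as 'hiaith' that this
    treebank lists under Mutation=AM. Unknown mutation labels raise KeyError.
    """
    table = TABLES[mutation]
    if not radical:
        return radical
    lower = radical.lower()
    if lower.startswith(UNMUTATED):
        return radical
    if mutation == "AM" and lower[0] in VOWELS:
        result = "h" + lower[0] + radical[1:]
    else:
        result = radical  # unchanged if no table entry matches
        # try two-letter initials (ll, rh) before single letters
        for initial in sorted(table, key=len, reverse=True):
            if lower.startswith(initial):
                result = table[initial] + radical[len(initial):]
                break
    if radical[0].isupper():  # keep the capital of proper nouns
        result = result[:1].upper() + result[1:]
    return result


for radical in ("Cymru", "Bangor", "Caerdydd"):
    print(radical, [mutate(radical, m) for m in ("SM", "NM", "AM")])
```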
Syntax
Auxiliary Verbs and Copula
- This corpus uses 1 lemma as a copula (cop): bod.
- This corpus uses 7 lemmas as auxiliaries (aux). Examples: yn, wedi, ar, am, newydd, heb, bod.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
- VERB-Fin--NOUN (899)
- VERB-Fin--NOUN-ADP(dros) (1)
- VERB-Fin--NOUN-ADP(gan) (1)
- VERB-Fin--NOUN-ADP(i) (3)
- VERB-Fin--NOUN-ADP(o) (3)
- VERB-Fin--PRON (663)
- VERB-Fin--PRON-ADP(gan) (2)
- VERB-FinRel--NOUN (8)
- obj
- VERB-Fin--NOUN (333)
- VERB-Fin--NOUN-ADP(yn) (1)
- VERB-Fin--PRON (100)
- VERB-FinRel--NOUN (1)
- VERB-FinRel--PRON (1)
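The counts above pair a verbal head (its UPOS plus VerbForm) with a nominal dependent, optionally followed by the ADP attached to that dependent as case. The Python sketch below reconstructs such patterns for one sentence under two assumptions not stated on this page: relation subtypes are folded into their base relation, and the preposition is identified by its lemma rather than its form. The Word class, the function names and the toy sentence (with abbreviated feature bundles) are hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Word:
    id: int
    form: str
    lemma: str
    upos: str
    feats: str   # raw FEATS column, e.g. "Tense=Past|VerbForm=Fin"
    head: int    # 0 for the root
    deprel: str


def with_verbform(word: Word) -> str:
    """UPOS, suffixed with the VerbForm value when there is one (e.g. VERB-Fin)."""
    for item in word.feats.split("|"):
        if item.startswith("VerbForm="):
            return f"{word.upos}-{item.split('=', 1)[1]}"
    return word.upos


def argument_patterns(sentence: list[Word], relations=("nsubj", "obj")):
    """Count VERB -> NOUN/PRON edges per relation, noting ADP case dependents."""
    by_id = {w.id: w for w in sentence}
    counts = defaultdict(Counter)
    for w in sentence:
        rel = w.deprel.split(":")[0]   # fold subtypes into the base relation
        parent = by_id.get(w.head)     # None for the root (head 0)
        if rel not in relations or parent is None:
            continue
        if parent.upos != "VERB" or w.upos not in ("NOUN", "PRON"):
            continue
        child = w.upos
        for dep in sentence:           # ADP case markers attached to the argument
            if dep.head == w.id and dep.upos == "ADP" and dep.deprel.split(":")[0] == "case":
                child += f"-ADP({dep.lemma})"
        counts[rel][f"{with_verbform(parent)}--{child}"] += 1
    return counts


# Toy sentence "Darllenodd y dyn y llyfr." ("The man read the book.")
SENT = [
    Word(1, "Darllenodd", "darllen", "VERB", "Tense=Past|VerbForm=Fin", 0, "root"),
    Word(2, "y", "y", "DET", "_", 3, "det"),
    Word(3, "dyn", "dyn", "NOUN", "Number=Sing", 1, "nsubj"),
    Word(4, "y", "y", "DET", "_", 5, "det"),
    Word(5, "llyfr", "llyfr", "NOUN", "Number=Sing", 1, "obj"),
    Word(6, ".", ".", "PUNCT", "_", 1, "punct"),
]

for rel, patterns in argument_patterns(SENT).items():
    print(rel, dict(patterns))
```

Adding the per-sentence counters over a whole corpus would give totals of the kind listed above, though not necessarily identical ones, since the page's own selection rules are not documented here.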
Relations Overview
- This corpus uses 8 relation subtypes: acl:relcl, case:pred, compound:redup, flat:name, nmod:agent, nmod:poss, nmod:redup, obl:agent
- The following 6 relation types are not used in this corpus at all: dislocated, clf, list, goeswith, reparandum, dep
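The two figures above can be derived by comparing the labels in the Relations list with the 37 universal relations of UD v2: labels containing a colon are subtypes, and the base relations that never occur form the difference. A small Python check, with the relation inventory typed in by hand and the variable names chosen for illustration:

```python
# The 37 universal dependency relations of UD v2 (subtypes not included).
UD_RELATIONS = set("""
acl advcl advmod amod appos aux case cc ccomp clf compound conj cop csubj dep
det discourse dislocated expl fixed flat goeswith iobj list mark nmod nsubj
nummod obj obl orphan parataxis punct reparandum root vocative xcomp
""".split())

# Labels exactly as listed under "Relations" on this page.
USED = ("acl – acl:relcl – advcl – advmod – amod – appos – aux – case – case:pred – "
        "cc – ccomp – compound – compound:redup – conj – cop – csubj – det – discourse – "
        "expl – fixed – flat – flat:name – iobj – mark – nmod – nmod:agent – nmod:poss – "
        "nmod:redup – nsubj – nummod – obj – obl – obl:agent – orphan – parataxis – "
        "punct – root – vocative – xcomp").split(" – ")


def relation_overview(labels):
    """Return (subtyped labels, universal relations never used as a base)."""
    subtypes = [label for label in labels if ":" in label]
    used_bases = {label.split(":")[0] for label in labels}
    unused = sorted(UD_RELATIONS - used_bases)
    return subtypes, unused


subtypes, unused = relation_overview(USED)
print(len(subtypes), subtypes)  # → 8 subtypes
print(len(unused), unused)      # → 6 unused relations
```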