
This page pertains to UD version 2.

UD Irish TwittIrish

Language: Irish (code: ga)
Family: IE

This treebank has been part of Universal Dependencies since the UD v2.8 release.

The following people have contributed to making this treebank part of UD: Lauren Cassidy, Teresa Lynn, Jennifer Foster, Sarah McGuinness.

Repository: UD_Irish-TwittIrish
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.15

License: CC BY-SA 4.0

Genre: social

Questions, comments? General annotation questions (either Irish-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on Github. If you want to collaborate, please contact [lauren • cassidy (æt) adaptcentre • ie; teresa • lynn (æt) adaptcentre • ie]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.

Annotation Source
Lemmas annotated manually
UPOS annotated manually, natively in UD style
XPOS not available
Features not available
Relations annotated manually, natively in UD style

Description

A Universal Dependencies treebank of 2596 tweets in modern Irish.

The TwittIrish treebank contains 2596 Irish language tweets from two corpora: 1297 tweets from the New Twitter Corpus [NTC] and 1299 tweets from the Lynn Twitter Corpus [LTC].

Irish language tweets were identified by Kevin Scannell as part of the Indigenous Tweets website project http://indigenoustweets.com/. Non-Irish tweets were filtered out using a simple character-trigram language identifier.
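The page does not describe the internals of the identifier that was used. The following is a minimal sketch of the general technique it names, and it rests on several assumptions. It builds one add-one-smoothed character-trigram model per language and picks the language that gives the tweet the highest log-probability. The class name `TrigramLanguageID`, the helper `trigrams`, and the toy training sentences are all hypothetical and were invented for illustration. This is not the Indigenous Tweets tool itself.

```python
from collections import Counter
import math


def trigrams(text):
    """Return a Counter of character trigrams, padding the text with spaces at both edges."""
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


class TrigramLanguageID:
    """Hypothetical toy identifier: one add-one-smoothed trigram model per language."""

    def __init__(self, training_texts):
        # training_texts: dict mapping a language code to a list of sample strings
        self.models = {}
        for lang, texts in training_texts.items():
            counts = Counter()
            for t in texts:
                counts.update(trigrams(t))
            self.models[lang] = counts
        # Vocabulary shared by all models, used for add-one smoothing
        self.vocab = set()
        for counts in self.models.values():
            self.vocab.update(counts)

    def score(self, text, lang):
        """Log-probability of the text's trigrams under one language's model."""
        counts = self.models[lang]
        total = sum(counts.values())
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen trigrams
        return sum(n * math.log((counts[g] + 1) / (total + v))
                   for g, n in trigrams(text).items())

    def identify(self, text):
        """Return the language code whose model scores the text highest."""
        return max(self.models, key=lambda lang: self.score(text, lang))


# Toy training data, invented for illustration only.
lid = TrigramLanguageID({
    "ga": ["Tá mé ag dul go dtí an siopa inniu", "Conas atá tú? Tá an aimsir go hálainn"],
    "en": ["I am going to the shop today", "How are you? The weather is lovely"],
})
print(lid.identify("Tá an siopa dúnta inniu"))
```

A real filter would be trained on far more text. It would also need some handling for very short or code-mixed tweets, where the trigram evidence is thin.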

The conversion from the LTC annotation scheme to the UD annotation scheme was designed by Lauren Cassidy as part of a PhD project supervised by Dr. Teresa Lynn and Dr. Jennifer Foster at Dublin City University, Ireland. The conversion was performed automatically and then reviewed manually, in consultation with other researchers working on UD annotation of user-generated content (Sanguinetti et al., 2020).

Trees were parsed automatically by a parser trained on the Irish UD Treebank [IUDT] (Lynn and Foster, 2016), and the parses were then reviewed manually. The IUDT is available at https://github.com/UniversalDependencies/UD_Irish-IDT.

Acknowledgments

We wish to thank all of the contributors to the IUDT annotation, Kevin Scannell for providing data and linguistic advice, and James Barry for improving the accuracy of automatic parsing by experimenting with different models.

The creation of the TwittIrish treebank (2019-2023) was funded by the Irish Government's Department of Tourism, Culture, Arts, Gaeltacht, Sport and Media under the GaelTech project.

This research is partially supported by Science Foundation Ireland through the ADAPT Centre for Digital Content Technology. The ADAPT Centre for Digital Content Technology is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund.

References

Statistics of UD Irish TwittIrish

POS Tags

ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, SYM, VERB, X

Features

Relations

acl, acl:relcl, advcl, advmod, amod, appos, aux, case, case:voc, cc, ccomp, compound, compound:prt, conj, cop, csubj, csubj:cleft, csubj:cop, dep, det, det:poss, discourse, discourse:emo, expl, fixed, flat, flat:foreign, flat:name, goeswith, iobj, list, mark, mark:prt, nmod, nmod:poss, nmod:tmod, nsubj, nsubj:outer, nummod, obj, obl, obl:prep, obl:tmod, orphan, parataxis, parataxis:hashtag, parataxis:rt, parataxis:sentence, parataxis:url, punct, reparandum, root, vocative, vocative:mention, xcomp, xcomp:pred
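Several labels in this inventory follow the UD convention of a universal relation, then a colon, then a subtype (for example `parataxis:hashtag` or `vocative:mention`). The sketch below groups the inventory by its universal base relation, which makes the subtypes easy to see at a glance. The helper names `split_deprel` and `group_by_base` are hypothetical.

```python
# The relation labels listed for UD Irish TwittIrish.
RELATIONS = """acl acl:relcl advcl advmod amod appos aux case case:voc cc ccomp
compound compound:prt conj cop csubj csubj:cleft csubj:cop dep det det:poss
discourse discourse:emo expl fixed flat flat:foreign flat:name goeswith iobj
list mark mark:prt nmod nmod:poss nmod:tmod nsubj nsubj:outer nummod obj obl
obl:prep obl:tmod orphan parataxis parataxis:hashtag parataxis:rt
parataxis:sentence parataxis:url punct reparandum root vocative
vocative:mention xcomp xcomp:pred""".split()


def split_deprel(label):
    """Split 'base:subtype' into (base, subtype); subtype is None when absent."""
    base, _, subtype = label.partition(":")
    return base, (subtype or None)


def group_by_base(labels):
    """Map each universal base relation to the sorted list of subtypes used with it."""
    groups = {}
    for label in labels:
        base, subtype = split_deprel(label)
        subtypes = groups.setdefault(base, [])  # base relations without subtypes still get a key
        if subtype is not None:
            subtypes.append(subtype)
    return {base: sorted(subs) for base, subs in groups.items()}


# Print only the base relations that carry at least one subtype.
for base, subs in group_by_base(RELATIONS).items():
    if subs:
        print(f"{base}: {', '.join(subs)}")
```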

Tokenization and Word Segmentation

Morphology

Tags

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

Syntax

Auxiliary Verbs and Copula

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).
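The page does not say how these figures are computed. The following is a minimal sketch, assuming the treebank is read as plain CoNLL-U text. It tallies the relation labels on arcs whose head has UPOS `VERB` and whose dependent has UPOS `NOUN` or `PRON`, using the 4th (UPOS), 7th (HEAD) and 8th (DEPREL) CoNLL-U columns. The function names and the one-sentence sample are hypothetical, and the sample is not taken from the treebank.

```python
from collections import Counter

NOMINAL_UPOS = {"NOUN", "PRON"}


def _arcs_in_sentence(rows):
    """Yield DEPREL for each arc from a VERB head to a NOUN/PRON dependent."""
    upos_by_id = {row[0]: row[3] for row in rows}  # ID column -> UPOS column
    for row in rows:
        head, deprel = row[6], row[7]
        if upos_by_id.get(head) == "VERB" and row[3] in NOMINAL_UPOS:
            yield deprel


def verb_nominal_relations(conllu_text):
    """Count relations between verb heads and noun/pronoun dependents in CoNLL-U text."""
    counts = Counter()
    sentence = []
    # Appending an empty line guarantees that the last sentence is flushed.
    for line in conllu_text.splitlines() + [""]:
        if not line.strip():
            counts.update(_arcs_in_sentence(sentence))
            sentence = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            # Keep basic word lines only; skip ranges ("1-2") and empty nodes ("1.1").
            if cols[0].isdigit():
                sentence.append(cols)
    return counts


# Invented one-sentence example ("I saw the cat"), not taken from the treebank.
SAMPLE = "# text = Chonaic mé an cat .\n" + "\n".join("\t".join(r) for r in [
    ["1", "Chonaic", "feic", "VERB", "_", "_", "0", "root", "_", "_"],
    ["2", "mé", "mé", "PRON", "_", "_", "1", "nsubj", "_", "_"],
    ["3", "an", "an", "DET", "_", "_", "4", "det", "_", "_"],
    ["4", "cat", "cat", "NOUN", "_", "_", "1", "obj", "_", "_"],
    ["5", ".", ".", "PUNCT", "_", "_", "1", "punct", "_", "_"],
])

print(verb_nominal_relations(SAMPLE))
```

Applied to a full treebank file, the same function would return one count per relation label, subtypes included. Those counts could then be grouped by base relation if needed.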

Relations Overview