
This page pertains to UD version 2.

UD Galician TreeGal

Language: Galician (code: gl)
Family: IE

This treebank has been part of Universal Dependencies since the UD v1.4 release.

The following people have contributed to making this treebank part of UD: Marcos Garcia, Xulia Sánchez-Rodríguez.

Repository: UD_Galician-TreeGal
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.15

License: LGPL-LR

Genre: news

Questions, comments? General annotation questions (either Galician-specific or cross-linguistic) can be raised in the main UD issue tracker. Bugs in this treebank can be reported in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [marcos • garcia • gonzalez (æt) usc • gal]. Development of the treebank happens outside the UD repository, so any bug must be fixed either in the original data source or in the conversion procedure. Do not submit pull requests against the UD repository.

Annotation Source
Lemmas: annotated manually
UPOS: annotated manually, natively in UD style
XPOS: annotated manually
Features: annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion
Relations: annotated manually, natively in UD style

Description

Galician-TreeGal is a treebank for Galician developed at the LyS Group (Universidade da Coruña) and at CiTIUS (Universidade de Santiago de Compostela).

The resource derives from a subset (called xeral) of the XIADA corpus (v2.6), created at the Centro Ramón Piñeiro para a Investigación en Humanidades (http://corpus.cirp.es/xiada/).

All information except the syntactic annotation was semi-automatically converted to UD from the original resource. The dependency labels were assigned using cross-lingual parsing techniques and then manually corrected by a linguist (see the references for more information). At the end of this process, several corrections were made to bring the annotation in line with the UD guidelines.

Galician-TreeGal v0.42 contains 1000 sentences of the xeral corpus (~25k tokens), divided into a 60-40 train-test split.
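As a rough illustration of the arithmetic, a 60-40 split of 1000 sentences gives 600 training and 400 test sentences. The sketch below assumes the split is taken in corpus order, which this page does not state; the function name `split_train_test` and the placeholder sentence IDs are hypothetical.

```python
def split_train_test(sentences, train_percent=60):
    """Split a list of sentences into train and test parts, preserving order.

    Assumption: the split follows corpus order. The page only states the
    60-40 proportion, not how sentences were assigned to each part.
    """
    cut = len(sentences) * train_percent // 100
    return sentences[:cut], sentences[cut:]


# Placeholder IDs standing in for the 1000 sentences of the treebank.
sentences = [f"sent-{i}" for i in range(1, 1001)]
train, test = split_train_test(sentences)
print(len(train), len(test))
```

Integer arithmetic is used for the cut point so that the sizes do not depend on floating-point rounding.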

Acknowledgments

Statistics of UD Galician TreeGal

POS Tags

ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PRON, PROPN, PUNCT, SCONJ, SYM, VERB, X

Features

AdpType, Case, Clitic, Definite, Degree, Foreign, Gender, Mood, Number, Number[psor], NumType, Person, Polarity, Poss, PronType, Tense, VerbForm

Relations

acl, advcl, advmod, amod, appos, aux, aux:pass, case, cc, ccomp, compound, conj, cop, csubj, dep, det, discourse, expl, fixed, flat:foreign, flat:name, iobj, list, mark, nmod, nsubj, nsubj:pass, nummod, obj, obl, orphan, parataxis, punct, root, vocative, xcomp
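Inventories like the three above can be gathered from the UPOS, FEATS and DEPREL columns of a CoNLL-U file. The following is a minimal sketch, not the script used to generate this page; `collect_inventories` is a hypothetical helper, and the sample sentence is made up for illustration rather than taken from the treebank.

```python
from collections import Counter


def collect_inventories(conllu_text):
    """Count UPOS tags, feature names and relation labels in CoNLL-U text."""
    upos, feats, rels = Counter(), Counter(), Counter()
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # blank separator lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        upos[cols[3]] += 1
        rels[cols[7]] += 1
        if cols[5] != "_":
            for pair in cols[5].split("|"):
                feats[pair.split("=", 1)[0]] += 1
    return upos, feats, rels


# Hypothetical sample sentence ("O neno durme."), not from the treebank.
rows = [
    ["1", "O", "o", "DET", "_", "Definite=Def|Gender=Masc|Number=Sing|PronType=Art", "2", "det", "_", "_"],
    ["2", "neno", "neno", "NOUN", "_", "Gender=Masc|Number=Sing", "3", "nsubj", "_", "_"],
    ["3", "durme", "durmir", "VERB", "_", "Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin", "0", "root", "_", "_"],
    ["4", ".", ".", "PUNCT", "_", "_", "3", "punct", "_", "_"],
]
sample = "\n".join("\t".join(row) for row in rows) + "\n"

upos, feats, rels = collect_inventories(sample)
print(sorted(upos), sorted(feats), sorted(rels))
```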

Tokenization and Word Segmentation

Morphology

Tags

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

Syntax

Auxiliary Verbs and Copula

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).
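The restriction above can be sketched as a filter over CoNLL-U data: keep a dependency only when the head is a verb and the dependent is a noun or pronoun. This is an illustrative sketch, not the procedure behind the published statistics. It assumes that "verb" means UPOS `VERB` only (excluding `AUX`) and that "nouns or pronouns" means `NOUN` and `PRON`; `PROPN` can be added through the `child_upos` parameter. The helper name and the sample sentence are hypothetical.

```python
def verb_nominal_relations(conllu_text, child_upos=("NOUN", "PRON")):
    """Return (relation, verb form, child form) for verb-parent, nominal-child pairs."""
    results = []
    for block in conllu_text.strip().split("\n\n"):  # one block per sentence
        words = {}
        for line in block.splitlines():
            cols = line.split("\t")
            if line.startswith("#") or len(cols) != 10 or not cols[0].isdigit():
                continue  # comments, multiword-token ranges, empty nodes
            words[cols[0]] = cols
        for cols in words.values():
            head = words.get(cols[6])  # None for the root (HEAD = 0)
            if head and head[3] == "VERB" and cols[3] in child_upos:
                results.append((cols[7], head[1], cols[1]))
    return results


# Hypothetical sample sentence ("O neno durme."), not from the treebank.
rows = [
    ["1", "O", "o", "DET", "_", "_", "2", "det", "_", "_"],
    ["2", "neno", "neno", "NOUN", "_", "_", "3", "nsubj", "_", "_"],
    ["3", "durme", "durmir", "VERB", "_", "_", "0", "root", "_", "_"],
    ["4", ".", ".", "PUNCT", "_", "_", "3", "punct", "_", "_"],
]
sample = "\n".join("\t".join(row) for row in rows) + "\n"

pairs = verb_nominal_relations(sample)
print(pairs)
```

In the sample, only the `nsubj` link from the verb to the noun passes the filter; the determiner attaches to a noun, and the punctuation mark is not a nominal.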

Relations Overview