
This page pertains to UD version 2.

UD Gujarati GujTB

Language: Gujarati (code: gu)
Family: Indo-European

This treebank has been part of Universal Dependencies since the UD v2.14 release.

The following people have contributed to making this treebank part of UD: Maitrey Mehta, Mayank Jobanputra.

Repository: UD_Gujarati-GujTB
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.15

License: CC BY-SA 4.0

Genre: grammar-examples

Questions, comments? General annotation questions (either Gujarati-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [maitrey (æt) cs • utah • edu]. Development of the treebank happens directly in the UD repository, so you may submit bug fixes as pull requests against the dev branch.

Annotation Source
Lemmas annotated manually
UPOS annotated manually, natively in UD style
XPOS not available
Features annotated manually, natively in UD style
Relations annotated manually, natively in UD style

Description

GujTB is an in-progress treebank of Gujarati (an Indo-Aryan language) in Gujarati script.

The treebank currently comprises 187 sentences, of which 100 are doubly annotated by the authors. We plan to update the treebank with fuller morphological annotations and features in an upcoming release.

Acknowledgments

References

Please cite the following paper if you use this treebank in your research:

@inproceedings{jobanputra-etal-2024-universal,
title = "A {U}niversal {D}ependencies Treebank for {G}ujarati",
author = {Jobanputra, Mayank and
Mehta, Maitrey and
{\c{C}}{\"o}ltekin, {\c{C}}a{\u{g}}r{\i}},
editor = {Bhatia, Archna and
Bouma, Gosse and
Do{\u{g}}ru{\"o}z, A. Seza and
Evang, Kilian and
Garcia, Marcos and
Giouli, Voula and
Han, Lifeng and
Nivre, Joakim and
Rademaker, Alexandre},
booktitle = "Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024",
month = may,
year = "2024",
address = "Torino, Italia",
publisher = "ELRA and ICCL",
url = "https://aclanthology.org/2024.mwe-1.9",
pages = "56--62",
abstract = "The Universal Dependencies (UD) project has presented itself as a valuable platform to develop various resources for the languages of the world. We present and release a sample treebank for the Indo-Aryan language of Gujarati {--} a widely spoken language with little linguistic resources. This treebank is the first labeled dataset for dependency parsing in the language and the script (the Gujarati script). The treebank contains 187 part-of-speech and dependency annotated sentences from diverse genres. We discuss various idiosyncratic examples, annotation choices and present an elaborate corpus along with agreement statistics. We see this work as a valuable resource and a stepping stone for research in Gujarati Computational Linguistics.",
}

Statistics of UD Gujarati GujTB

POS Tags

ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, SYM, VERB, X

Features

Case, Clusivity, Gender, Mood, Number, Polite, Typo, VerbType

Relations

acl, acl:relcl, advcl, advcl:relcl, advmod, amod, appos, aux, case, cc, cc:preconj, ccomp, compound, compound:lvc, compound:svc, conj, cop, dep, det, discourse, dislocated, fixed, flat, goeswith, iobj, mark, nmod, nmod:poss, nmod:tmod, nsubj, nsubj:pass, nummod, obj, obl, obl:agent, obl:tmod, orphan, parataxis, punct, reparandum, root, vocative, xcomp
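
Inventories like the ones above can be recomputed from a treebank's CoNLL-U file. The following is a minimal sketch, assuming the standard ten-column, tab-separated CoNLL-U layout. The function name `inventory` is hypothetical, and the three-token sample uses placeholder word forms invented for illustration; it is not taken from GujTB.

```python
from collections import Counter

# Invented three-token sample in CoNLL-U format (NOT taken from GujTB).
SAMPLE = "\n".join([
    "# sent_id = demo-1",
    "\t".join(["1", "w1", "w1", "PRON", "_", "Case=Nom|Number=Sing", "2", "nsubj", "_", "_"]),
    "\t".join(["2", "w2", "w2", "VERB", "_", "Number=Sing", "0", "root", "_", "_"]),
    "\t".join(["3", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"]),
    "",
])

def inventory(conllu_text):
    """Count UPOS tags, feature names, and relations in CoNLL-U text."""
    upos, feats, rels = Counter(), Counter(), Counter()
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        upos[cols[3]] += 1   # column 4: UPOS
        rels[cols[7]] += 1   # column 8: DEPREL
        if cols[5] != "_":   # column 6: FEATS, e.g. "Case=Nom|Number=Sing"
            for pair in cols[5].split("|"):
                feats[pair.split("=", 1)[0]] += 1
    return upos, feats, rels

upos, feats, rels = inventory(SAMPLE)
print(sorted(upos))   # distinct UPOS tags seen
print(sorted(feats))  # distinct feature names seen
print(sorted(rels))   # distinct relations seen
```

To run this on the real data, one would read the treebank's `.conllu` file from the UD_Gujarati-GujTB repository into a string and pass it to `inventory`.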

Tokenization and Word Segmentation

Morphology

Tags

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

Syntax

Auxiliary Verbs and Copula

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).
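
The restriction stated above (verb as parent, noun or pronoun as child) can be sketched as a filter over CoNLL-U text. This is an illustrative sketch only, not the script used to produce the UD statistics. The function name `verb_nominal_relations` is hypothetical, counting PROPN alongside NOUN and PRON is an assumption, and the sample sentence uses invented placeholder tokens rather than GujTB data.

```python
from collections import Counter

# Invented four-token sample in CoNLL-U format (NOT taken from GujTB).
SAMPLE = "\n".join([
    "# sent_id = demo-2",
    "\t".join(["1", "w1", "w1", "PRON", "_", "_", "3", "nsubj", "_", "_"]),
    "\t".join(["2", "w2", "w2", "NOUN", "_", "_", "3", "obj", "_", "_"]),
    "\t".join(["3", "w3", "w3", "VERB", "_", "_", "0", "root", "_", "_"]),
    "\t".join(["4", ".", ".", "PUNCT", "_", "_", "3", "punct", "_", "_"]),
    "",
])

def verb_nominal_relations(conllu_text, child_tags=("NOUN", "PROPN", "PRON")):
    """Count relations whose parent is a VERB and whose child is nominal.

    Including PROPN in child_tags is an assumption made for this sketch.
    """
    counts = Counter()
    sentence = []
    # The extra "" guarantees the last sentence is flushed.
    for line in conllu_text.splitlines() + [""]:
        if line.startswith("#"):
            continue  # sentence-level comment
        if line:
            cols = line.split("\t")
            if len(cols) == 10 and cols[0].isdigit():
                sentence.append(cols)  # regular token line
            continue
        # Blank line: the sentence is complete, so heads can be resolved.
        upos_by_id = {cols[0]: cols[3] for cols in sentence}
        for cols in sentence:
            if cols[3] in child_tags and upos_by_id.get(cols[6]) == "VERB":
                counts[cols[7]] += 1
        sentence = []
    return counts

counts = verb_nominal_relations(SAMPLE)
print(counts)
```

Tokens are buffered per sentence because a child's head may appear later in the sentence than the child itself, so the parent's tag can only be looked up once the whole sentence has been read.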

Relations Overview