
This page pertains to UD version 2.

UD Turkish German SAGT

Language: Turkish German (code: qtd)
Family: Code switching

This treebank has been part of Universal Dependencies since the UD v2.7 release.

The following people have contributed to making this treebank part of UD: Özlem Çetinoğlu, Çağrı Çöltekin.

Repository: UD_Turkish_German-SAGT
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.15

License: CC BY-NC-SA 4.0

Genre: spoken

Questions, comments? General annotation questions (either Turkish German-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [ozlem (æt) ims • uni-stuttgart • de]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.

Annotation Source
Lemmas annotated manually
UPOS annotated manually, natively in UD style
XPOS not available
Features annotated manually, natively in UD style
Relations annotated manually, natively in UD style

Description

UD Turkish-German SAGT is a Turkish-German code-switching treebank developed as part of the SAGT project.

The treebank consists of bilingual conversation transcriptions annotated with several layers: language IDs, lemmas, POS tags, morphological features, and dependency relations. Language IDs are described below. The rest of the annotations follow the Universal Dependencies annotation scheme, and the conventions used in monolingual Turkish and German treebanks.
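As a rough illustration of how the language ID layer could be inspected alongside the standard UD columns, the sketch below counts tokens per language ID in CoNLL-U text. It assumes that language IDs are stored in the MISC column under a key named `LangID`, with values such as `TR`, `DE`, and `OTHER`; neither the key nor the values are confirmed on this page, and the sample sentence is invented for the example rather than taken from the treebank.

```python
from collections import Counter

# Toy CoNLL-U fragment invented for illustration; it is not treebank data.
# The MISC key "LangID" and its values are assumptions, not confirmed here.
SAMPLE = """\
# sent_id = toy-1
1\tIch\tich\tPRON\t_\t_\t2\tnsubj\t_\tLangID=DE
2\tkomme\tkommen\tVERB\t_\t_\t0\troot\t_\tLangID=DE
3\tyarın\tyarın\tADV\t_\t_\t2\tadvmod\t_\tLangID=TR
4\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\tLangID=OTHER
"""


def misc_value(misc, key):
    """Return the value stored under `key` in a MISC field, or None."""
    if misc == "_":
        return None
    for item in misc.split("|"):
        name, sep, value = item.partition("=")
        if sep and name == key:
            return value
    return None


def count_language_ids(conllu_text, key="LangID"):
    """Count syntactic words per language ID (hypothetical MISC key)."""
    counts = Counter()
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed CoNLL-U token line
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword token ranges and empty nodes
        counts[misc_value(cols[9], key)] += 1
    return counts


counts = count_language_ids(SAMPLE)
```

To run this on real data, pass the contents of one of the treebank's `.conllu` files instead of `SAMPLE`, and adjust `key` if the files use a different attribute name.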

There are 48 distinct conversations from 17 participants. The majority of the speakers are university students, hence the most frequent age range is 18–25. Common conversation themes include studies, work, travel, future plans, and free-time activities such as sports, books, and TV.

The audio recordings that accompany the transcriptions are also available as a speech corpus under a separate licence. Please contact ozlem@ims.uni-stuttgart.de for further information.

Acknowledgments

The treebank development is funded by DFG via project CE 326/1-1 “Computational Structural Analysis of German-Turkish Code-Switching”. We thank Cansu Turgut, Reha Sakızlı, Semanur Ceylan, and Sevde Ceylan for data collection and annotation.

References

For the treebank and speech collection:

Çetinoğlu, Özlem and Çağrı Çöltekin (2022). “Two languages, one treebank: building a Turkish–German code-switching treebank and its challenges”. In: Language Resources and Evaluation, pp. 1–35. issn: 1574-020X.

https://link.springer.com/content/pdf/10.1007/s10579-021-09573-1.pdf

@article{cetinoglu2022,
year = {2022},
title = {{Two languages, one treebank: building a Turkish–German code-switching treebank and its challenges}},
author = {Çetinoğlu, Özlem and Çöltekin, Çağrı},
journal = {Language Resources and Evaluation},
issn = {1574-020X},
doi = {10.1007/s10579-021-09573-1},
pages = {1--35}
}

For the treebank:

https://www.aclweb.org/anthology/W19-7809.pdf

@inproceedings{cetinoglu2019,
title = "Challenges of Annotating a Code-Switching Treebank",
author = {{\c{C}}etino{\u{g}}lu, {\"O}zlem and
{\c{C}}{\"o}ltekin, {\c{C}}a{\u{g}}r{\i}},
booktitle = "Proceedings of the 18th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2019)",
OPTmonth = aug,
year = "2019",
address = "Paris, France",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/W19-7809",
doi = "10.18653/v1/W19-7809",
pages = "82--90",
}

For language IDs:

http://www.lrec-conf.org/proceedings/lrec2016/pdf/1151_Paper.pdf

@InProceedings{cetinoglu2016,
author = {Özlem Çetinoğlu},
title = {A {Turkish}-German Code-Switching Corpus},
booktitle = {Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)},
year = {2016},
location = {Portorož, Slovenia},
pages = {23--28},
editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Sara Goggi and Marko Grobelnik and Bente Maegaard and Joseph Mariani and Helene Mazo and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
publisher = {European Language Resources Association (ELRA)},
address = {Paris, France},
isbn = {978-2-9517408-9-1},
}

Statistics of UD Turkish German SAGT

POS Tags

ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, SYM, VERB, X

Features

Aspect, Case, Definite, Evident, Foreign, Gender, Mood, Number, Number[psor], NumType, Person, Person[psor], Polarity, Poss, PronType, Reflex, Tense, Typo, VerbForm, Voice

Relations

acl, advcl, advmod, advmod:emph, amod, appos, appos:trans, aux, aux:pass, aux:q, case, cc, ccomp, compound, compound:lvc, compound:prt, compound:redup, conj, cop, csubj, det, discourse, dislocated, expl, expl:pv, fixed, flat, iobj, mark, nmod, nsubj, nsubj:outer, nsubj:pass, nummod, obj, obl, orphan, parataxis, parataxis:discourse, parataxis:trans, punct, reparandum, root, vocative, xcomp

Tokenization and Word Segmentation

Morphology

Tags

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

Syntax

Auxiliary Verbs and Copula

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).
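The restriction above can be sketched as a filter over CoNLL-U token lines: keep a relation only when the parent's UPOS is VERB and the child's UPOS is a noun or pronoun tag. The function name, the default child tag set (NOUN and PRON, without PROPN), and the sample sentence are assumptions made for illustration; this is not the script that produces the UD statistics.

```python
from collections import Counter

# Toy CoNLL-U sentence invented for illustration; it is not treebank data.
SAMPLE = """\
# sent_id = toy-2
1\tO\to\tPRON\t_\t_\t3\tnsubj\t_\t_
2\tkitabı\tkitap\tNOUN\t_\tCase=Acc\t3\tobj\t_\t_
3\tokudu\toku\tVERB\t_\t_\t0\troot\t_\t_
4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_
"""


def verb_nominal_relations(conllu_text, child_upos=("NOUN", "PRON")):
    """Count relations whose parent is a VERB and whose child is nominal.

    Whether PROPN belongs in `child_upos` is left to the caller.
    """
    counts = Counter()
    for block in conllu_text.strip().split("\n\n"):
        rows = []
        for line in block.splitlines():
            if not line or line.startswith("#"):
                continue  # skip comments
            cols = line.split("\t")
            # Keep regular words only (no ranges like 1-2, no empty nodes).
            if len(cols) == 10 and cols[0].isdigit():
                rows.append(cols)
        upos_by_id = {cols[0]: cols[3] for cols in rows}
        for cols in rows:
            head, deprel = cols[6], cols[7]
            # The root has head "0", which is never in the map.
            if upos_by_id.get(head) == "VERB" and cols[3] in child_upos:
                counts[deprel] += 1
    return counts


relations = verb_nominal_relations(SAMPLE)
```

In the toy sentence, the pronoun subject and the noun object are counted, while the punctuation mark and the root verb itself are not.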

Reflexive Verbs

Verbs with Reflexive Core Objects

Relations Overview