
This page pertains to UD version 2.

UD Italian ParTUT

Language: Italian (code: it)
Family: Indo-European, Romance

This treebank has been part of Universal Dependencies since the UD v2.0 release.

The following people have contributed to making this treebank part of UD: Cristina Bosco, Manuela Sanguinetti.

Repository: UD_Italian-ParTUT
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.13

License: CC BY-NC-SA 4.0

Genre: legal, news, wiki

Questions, comments? General annotation questions (either Italian-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [msanguin (æt) di • unito • it]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.

Annotation Source
Lemmas: annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion
UPOS: annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion
XPOS: annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion
Features: annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion
Relations: annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion

Description

UD_Italian-ParTUT is a conversion of a multilingual parallel treebank developed at the University of Turin. It covers a variety of text genres, including talks, legal texts and Wikipedia articles, among others.

UD_Italian-ParTUT data is derived from the existing parallel treebank Par(allel)TUT.

ParTUT is a morpho-syntactically annotated collection of Italian/French/English parallel sentences. It includes texts from different sources, representing different genres and domains, and is released in several formats.

ParTUT comprises approximately 167,000 tokens, with an average of 2,100 sentences per language. The texts currently available in the collection were gathered from a large number of sources and domains.

ParTUT data can be downloaded here and here.

NOTE: While the Italian section of ParTUT is already included in UD_Italian, UD_Italian-ParTUT comprises just those sentences having a 1:1 correspondence with their English and French counterparts.

Acknowledgments

We are deeply grateful to Project Syndicate© for letting us download and exploit their articles as text material, under the terms of educational use.

Statistics of UD Italian ParTUT

POS Tags

ADJ – ADP – ADV – AUX – CCONJ – DET – NOUN – NUM – PRON – PROPN – PUNCT – SCONJ – SYM – VERB – X

Features

Clitic – Definite – Degree – Foreign – Gender – Mood – Number – NumType – Person – Poss – PronType – Reflex – Tense – VerbForm

Relations

acl – acl:relcl – advcl – advmod – amod – appos – aux – aux:pass – case – cc – ccomp – compound – conj – cop – csubj – csubj:pass – dep – det – det:poss – det:predet – discourse – expl – expl:impers – expl:pass – fixed – flat – flat:foreign – flat:name – iobj – mark – nmod – nsubj – nsubj:pass – nummod – obj – obl – obl:agent – orphan – parataxis – punct – root – vocative – xcomp
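The three inventories above (POS tags, features, relations) can be recomputed from the treebank's CoNLL-U files. The following is a minimal sketch, not the script used to build this page: the function name `inventories` and the `SAMPLE` sentence are hypothetical, and the sample was written by hand for illustration rather than taken from the treebank. It relies only on the standard CoNLL-U layout of ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC).

```python
from collections import Counter

# Hand-written illustrative sentence (an assumption, not treebank data).
SAMPLE = """\
# text = Il gatto dorme.
1\tIl\til\tDET\tRD\tDefinite=Def|Gender=Masc|Number=Sing|PronType=Art\t2\tdet\t_\t_
2\tgatto\tgatto\tNOUN\tS\tGender=Masc|Number=Sing\t3\tnsubj\t_\t_
3\tdorme\tdormire\tVERB\tV\tMood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin\t0\troot\t_\tSpaceAfter=No
4\t.\t.\tPUNCT\tFS\t_\t3\tpunct\t_\t_
"""


def inventories(conllu_text):
    """Count UPOS tags, feature names and relations in CoNLL-U text."""
    upos, feats, rels = Counter(), Counter(), Counter()
    for line in conllu_text.splitlines():
        # Skip blank sentence separators and comment lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Keep only syntactic words: multiword-token ranges ("1-2")
        # and empty nodes ("1.1") have non-integer IDs.
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        upos[cols[3]] += 1
        rels[cols[7]] += 1
        if cols[5] != "_":
            for pair in cols[5].split("|"):
                name, _, _value = pair.partition("=")
                feats[name] += 1
    return upos, feats, rels


upos, feats, rels = inventories(SAMPLE)
print(sorted(upos), sorted(feats), sorted(rels))
```

To run it on real data, the text of a `.conllu` file from the UD_Italian-ParTUT repository can be read into a string and passed in place of `SAMPLE`.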

Tokenization and Word Segmentation

Morphology

Tags

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

Syntax

Auxiliary Verbs and Copula

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).
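The restriction to verb parents with noun or pronoun children can be sketched as a filter over CoNLL-U sentences. This is an illustrative assumption about how such a count might be made, not the procedure behind this page: the function name `verb_nominal_relations` and the `SAMPLE` sentence are hypothetical, and whether proper nouns (PROPN) count as nouns here is not stated in the text, so the child tags are left as a parameter defaulting to NOUN and PRON.

```python
from collections import Counter

# Hand-written illustrative sentence (an assumption, not treebank data).
SAMPLE = """\
# text = Il gatto lo vede.
1\tIl\til\tDET\tRD\t_\t2\tdet\t_\t_
2\tgatto\tgatto\tNOUN\tS\t_\t4\tnsubj\t_\t_
3\tlo\tlo\tPRON\tPC\t_\t4\tobj\t_\t_
4\tvede\tvedere\tVERB\tV\t_\t0\troot\t_\tSpaceAfter=No
5\t.\t.\tPUNCT\tFS\t_\t4\tpunct\t_\t_
"""


def verb_nominal_relations(conllu_text, child_upos=("NOUN", "PRON")):
    """Count relations whose parent is a VERB and whose child has a UPOS in child_upos."""
    counts = Counter()
    # Sentences in CoNLL-U are separated by blank lines.
    for block in conllu_text.strip().split("\n\n"):
        words = {}
        for line in block.splitlines():
            if line.startswith("#"):
                continue
            cols = line.split("\t")
            # Ignore multiword-token ranges and empty nodes.
            if len(cols) == 10 and cols[0].isdigit():
                words[cols[0]] = cols
        for cols in words.values():
            parent = words.get(cols[6])  # HEAD "0" (root) has no parent word
            if parent is not None and parent[3] == "VERB" and cols[3] in child_upos:
                counts[cols[7]] += 1
    return counts


print(verb_nominal_relations(SAMPLE))
```

Passing `child_upos=("NOUN", "PRON", "PROPN")` would include proper nouns as well, if that reading of "nouns" is preferred.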

Reflexive Passive

Verbs with Reflexive Core Objects

Relations Overview