
This page pertains to UD version 2.


UD Italian VIT

Language: Italian (code: it)
Family: IE

This treebank has been part of Universal Dependencies since the UD v2.4 release.

The following people have contributed to making this treebank part of UD: Fabio Tamburini, Maria Simi, Cristina Bosco.

Repository: UD_Italian-VIT
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.15

License: CC BY-NC-SA 3.0

Genre: nonfiction, news

Questions, comments? General annotation questions (either Italian-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on Github. If you want to collaborate, please contact [simi (æt) di • unipi • it]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.

Annotation	Source
Lemmas	annotated manually in non-UD style, automatically converted to UD
UPOS	annotated manually in non-UD style, automatically converted to UD
XPOS	annotated manually
Features	annotated manually in non-UD style, automatically converted to UD
Relations	annotated manually in non-UD style, automatically converted to UD

Description

The UD_Italian-VIT corpus was obtained by conversion from VIT (Venice Italian Treebank), developed at the Laboratory of Computational Linguistics of the Università Ca’ Foscari in Venice (Delmonte et al. 2007; Delmonte 2009; http://rondelmo.it/resource/VIT/Browser-VIT/index.htm).

The VIT is the result of a collaboration among people working at the Laboratory of Computational Linguistics (LCL) of the University of Venice between 1995 and 2005. It stems partly from annotation work and partly from the development of a lexicon, a morphological analyzer, a tagger and a deep parser of Italian. All these resources were ready by the beginning of the 1990s, when the LCL became involved in the first Italian national projects.

VIT originated as a constituency-based treebank following the theoretical framework described in Delmonte et al. (2007), and was later converted into a dependency representation in CoNLL-X format (Delmonte 2009). The annotation follows general X-bar criteria, with 29 constituency labels and 102 PoS tags. For machine-learning purposes, VIT is also available in a broad annotation version with 10 constituency labels and 22 PoS tags. The format is plain text with square bracketing; a UPenn-style version, readable by the open-source query language, has also been provided. The VIT contains about 272,000 words distributed over six different domains, which makes it particularly relevant for the study of the structure of Italian. It includes linguistic material of diverse nature, extracted from six text genres: news (170,000 words), bureaucratic (20,000 words), political (40,000 words), economic and financial (12,000 words), scientific (20,000 words) and literary (10,000 words) (Delmonte et al. 2007). In addition, some 60,000 tokens of spoken dialogue in different Italian varieties were annotated.
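The genre figures quoted in this paragraph can be checked against the stated total with a few lines of Python. The dictionary below simply restates the rounded counts from Delmonte et al. (2007) as given here; the helper names are illustrative only.

```python
# Rounded per-genre word counts as quoted above (Delmonte et al. 2007).
GENRE_WORDS = {
    "news": 170_000,
    "bureaucratic": 20_000,
    "political": 40_000,
    "economic and financial": 12_000,
    "scientific": 20_000,
    "literary": 10_000,
}


def total_words(counts: dict[str, int]) -> int:
    """Sum the per-genre word counts."""
    return sum(counts.values())


def genre_shares(counts: dict[str, int]) -> dict[str, float]:
    """Percentage of the total contributed by each genre, to one decimal."""
    total = total_words(counts)
    return {genre: round(100 * n / total, 1) for genre, n in counts.items()}
```

With these figures, `total_words(GENRE_WORDS)` gives 272,000 over six genres, and news alone accounts for 62.5% of the written material.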

As with the other Italian treebanks, the UD version of VIT was obtained by first converting it to MIDT+, an enriched version of the MIDT (Merged Italian Dependency Treebank) scheme (Bosco, Montemagni, Simi 2012), and then performing a further conversion step from MIDT+ to UD v2. The conversion was followed by a series of semi-automatic harmonization steps to compensate for differences in how the target annotation scheme is used with respect to the other Italian treebanks. The split into training, development and test sets keeps the original sequence as far as possible while respecting the proportions indicated in the guidelines (80%, 10%, 10%).
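An order-preserving 80/10/10 split of the kind described above can be sketched as cutting the sentence list into three contiguous slices. This is a minimal illustration, not the script used for UD_Italian-VIT; the function name is hypothetical, and the real split may deviate slightly from these exact proportions, which is why the text says "as far as possible".

```python
def contiguous_split(sentences, ratios=(0.8, 0.1, 0.1)):
    """Split a sequence into train/dev/test slices without reordering it.

    Hypothetical helper: sizes are rounded from the ratios and the test
    set takes whatever remains, so no sentence is lost or duplicated.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    n = len(sentences)
    n_train = round(n * ratios[0])
    n_dev = round(n * ratios[1])
    train = sentences[:n_train]
    dev = sentences[n_train:n_train + n_dev]
    test = sentences[n_train + n_dev:]
    return train, dev, test
```

Applied to a list of 10,087 items (the sentence count reported in the statistics below), this yields slices of 8,070, 1,009 and 1,008 items; these are illustrative sizes, not the actual file sizes of the treebank.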

Acknowledgments

We are indebted to Rodolfo Delmonte and his collaborators, Antonella Bristot and Sara Tonelli, for the initial work on the VIT treebank. We also acknowledge the contribution of Linda Alfieri and Elzara Khaialieva to the implementation of the conversion from VIT to MIDT+, which consisted of setting up the automatic conversion rules and checking the treebank manually.

References

Alfieri L., Tamburini F. (2016). (Almost) Automatic Conversion of the Venice Italian Treebank into the Merged Italian Dependency Treebank Format. In Proc. of the Third Italian Conference on Computational Linguistics (CLiC-IT 2016), Napoli, 5-6 December 2016, pp. 19-23.
Bosco C., Montemagni S., Simi M. (2012). Harmonization and Merging of two Italian Dependency Treebanks. In Proc. of the LREC 2012 Workshop on Language Resource Merging, Istanbul, May 2012, ELRA, pp. 23-30.
Delmonte R., Bristot A., Tonelli S. (2007). VIT - Venice Italian Treebank: Syntactic and Quantitative Features. In Proc. of the Sixth International Workshop on Treebanks and Linguistic Theories.
Delmonte R. (2009). Treebanking in VIT: from Phrase Structure to Dependency Representation. In Sergei Nirenburg (ed.), Language Engineering for Lesser-Studied Languages, IOS Press, Amsterdam, The Netherlands, pp. 51-81.
Tamburini F. (2017). Semgrex-Plus: a Tool for Automatic Dependency-Graph Rewriting. In Proc. of the Fourth International Conference on Dependency Linguistics (Depling 2017), Pisa, 18-20 September 2017, pp. 248-254.

Statistics of UD Italian VIT

POS Tags

ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – SYM – VERB – X

Features

Clitic – Definite – Degree – Foreign – Gender – Mood – Number – NumType – Person – Polarity – Poss – PronType – Tense – VerbForm
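In CoNLL-U files the features listed above are stored in the FEATS column as pipe-separated `Name=Value` pairs, with `_` for an empty set. The sketch below parses that column. It also builds labels such as `VERB-Part` or `NOUN-Fin`, which appear in the lists that follow; reading those labels as UPOS joined with the VerbForm value is an assumption on our part, and both helper names are hypothetical.

```python
def parse_feats(feats: str) -> dict[str, str]:
    """Parse a CoNLL-U FEATS value such as 'Gender=Fem|Number=Sing'."""
    if feats == "_":
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


def pos_label(upos: str, feats: dict[str, str]) -> str:
    """Join UPOS with VerbForm when present (assumed labelling), e.g. 'VERB-Part'."""
    verbform = feats.get("VerbForm")
    return f"{upos}-{verbform}" if verbform else upos
```

For example, a participle such as "prevista" with `Gender=Fem|Number=Sing|Tense=Past|VerbForm=Part` would be labelled `VERB-Part`, while a noun without VerbForm keeps its plain UPOS.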

Relations

acl – acl:relcl – advcl – advmod – amod – appos – aux – aux:pass – case – cc – ccomp – compound – conj – cop – csubj – dep – det – det:poss – det:predet – discourse – dislocated – expl – expl:impers – expl:pass – fixed – flat – flat:foreign – flat:name – iobj – list – mark – nmod – nsubj – nsubj:pass – nummod – obj – obl – obl:agent – orphan – parataxis – punct – root – vocative – xcomp

Tokenization and Word Segmentation

This corpus contains 10087 sentences, 259625 tokens and 280153 syntactic words.

This corpus contains 37134 tokens (14%) that are not followed by a space.
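In CoNLL-U, a token not followed by a space carries `SpaceAfter=No` in its MISC column, and a multi-word token is represented by a range line (e.g. `1-2`) whose words follow on their own lines. The sketch below counts surface tokens on that basis; it is an illustrative reimplementation under these assumptions, not the script that generated this page.

```python
def count_no_space_tokens(conllu_text: str) -> tuple[int, int]:
    """Return (surface tokens marked SpaceAfter=No, all surface tokens)."""
    no_space = total = 0
    inside_until = 0  # last word ID covered by the current multi-word token
    for line in conllu_text.splitlines():
        if not line:
            inside_until = 0  # blank line ends a sentence; word IDs restart at 1
            continue
        if line.startswith("#"):
            continue  # sentence-level comments
        cols = line.split("\t")
        tok_id, misc = cols[0], cols[9]
        if "." in tok_id:
            continue  # empty nodes are not surface tokens
        if "-" in tok_id:
            # Range line such as "1-2": count it once and skip its words below.
            inside_until = int(tok_id.split("-")[1])
        elif int(tok_id) <= inside_until:
            continue  # word already counted through its range line
        total += 1
        if "SpaceAfter=No" in misc.split("|"):
            no_space += 1
    return no_space, total


def percent(part: int, whole: int) -> int:
    """Whole-number percentage, as used in the summary line above."""
    return round(100 * part / whole)
```

With the totals reported here, `percent(37134, 259625)` gives 14, matching the 14% above.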

This corpus does not contain words with spaces.

This corpus contains 93 types of words that contain both letters and punctuation. Examples: l', d', n., un', art., c', quest', s', vent', L., anch', quell', trent', tutt', all', com', /ter, c-c, n', cos', dell', g-1, quarant', 14/a, baby-sitter, centro-sinistra, dev', g/1, joint-venture, po', senz', 1990-EQU-100, Banfield-Tripcovich, Bèghin-Say, Lehnigk-Emden, Sant', bloc-notes, h-1, mezz', nient', null', qual', sessant', 's, /bis, 108/a, 12-mo, 38-ma, 5/h9/051, 500-EQU-250
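A word "contains both letters and punctuation" can be tested character by character. The sketch below treats any character in a Unicode punctuation category (those starting with `P`) as punctuation; whether the page's statistics use exactly this definition (for instance, how symbols such as `+` are handled) is an assumption, and the helper names are hypothetical.

```python
import unicodedata


def has_letter_and_punct(word: str) -> bool:
    """True if the word mixes at least one letter with at least one punctuation mark."""
    has_letter = any(ch.isalpha() for ch in word)
    # Unicode punctuation categories (Pc, Pd, Ps, Pe, Pi, Pf, Po) all start with "P".
    has_punct = any(unicodedata.category(ch).startswith("P") for ch in word)
    return has_letter and has_punct


def mixed_word_types(words) -> list[str]:
    """Distinct word forms (types) that mix letters and punctuation, sorted."""
    return sorted({w for w in words if has_letter_and_punct(w)})
```

Under this definition "l'", "14/a" and "baby-sitter" qualify, while a purely numeric range such as "1975/1983" does not, since it contains no letter.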

This corpus contains 20518 multi-word tokens. On average, one multi-word token consists of 2.00 syntactic words.
There are 574 types of multi-word tokens. Examples: del, della, al, dei, dell', delle, nel, alla, all', nella, ai, dal, degli, dalla, alle, sul, dall', nell', sulla, nei, nelle, agli, dello, sui, dalle, dai, negli, sulle, sull', dagli, allo, nello, sugli, dallo, col, sullo, farsi, essersi, farlo, misurarsi, coi, darsi, trovarsi, disporsi, impegnarsi, porsi, averla, battersi, confrontarsi, diffondersi.
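The token, word and multi-word token counts above are linked: an ordinary token is exactly one syntactic word, while a multi-word token of k words is one token but k words. Hence words minus tokens equals (words inside multi-word tokens) minus (number of multi-word tokens). The sketch below, assuming all three figures come from the same release, recovers the reported average.

```python
def words_inside_mwts(n_tokens: int, n_words: int, n_mwts: int) -> int:
    """Total syntactic words covered by multi-word tokens (MWTs).

    n_words = (n_tokens - n_mwts) * 1 + words_inside_mwts, rearranged.
    """
    return n_words - n_tokens + n_mwts


def mean_mwt_length(n_tokens: int, n_words: int, n_mwts: int) -> float:
    """Average number of syntactic words per multi-word token."""
    return words_inside_mwts(n_tokens, n_words, n_mwts) / n_mwts
```

With 259,625 tokens, 280,153 words and 20,518 multi-word tokens, the multi-word tokens cover 41,046 words, an average of about 2.0005, which rounds to the 2.00 reported.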

Morphology

Nominal Features

Gender

Fem
- ADJ: altre, nuova, italiana, altra, nuove, politica, stessa, pubblica, economica, politiche
- ADJ-Part: illegittima, morta
- ADV: estremamente, inizialmente, costantemente, normalmente, celermente, contrariamente, lungamente, solamente, una, Molte
- AUX-Part: stata, state, dovuta, fatta, voluta
- CCONJ: essa
- DET: la, le, una, un', sua, questa, tutte, queste, sue, quella
- NOUN: società, attività, parte, legge, titolarità, provincia, città, sede, domanda, gestione
- NUM: un', terza, una, mezza
- PRON: quella, la, quelle, le, una, essa, questa, queste, altra, esse
- PRON-Part: adattate
- PUNCT: le
- VERB: prevista, indicate, presentata, comprese, effettuata, fatta, richiesta, data, previste, richieste
- VERB-Part: prevista, indicate, presentata, comprese, effettuata, fatta, richiesta, data, previste, richieste
- X: area, deregulation, mountain

Masc
- ADJ: altri, nuovo, economico, stesso, nuovi, scorso, altro, finanziario, ultimo, italiano
- ADJ-Part: abilitati, sommato
- ADP: dietro, per, ne, niente, rispetto
- ADV: volta, molto, poco, fa, lungo, troppo, no, seguito, casual, dietro
- AUX: stato, stati, potuto, dovuto, voluto, fatto, dovuti, essere, voluti
- AUX-Part: stato, stati, potuto, dovuto, voluto, fatto, dovuti, voluti
- CCONJ: altro, caso, quanto
- DET: il, i, un, gli, lo, questo, suo, tutti, questi, uno
- INTJ: ok
- NOUN: anni, miliardi, anno, posti, presidente, punto, governo, stato, gruppo, lavoro
- NUM: miliardi, milioni, un, primi, terzi, bis, rientro, uno
- PRON: lo, quello, quale, quelli, quanto, questo, tutti, gli, li, lui
- SCONJ: addebitati
- VERB: fatto, detto, approvato, previsto, avuto, previsti, deciso, ottenuto, visto, dato
- VERB-Part: fatto, detto, approvato, previsto, avuto, previsti, deciso, ottenuto, visto, chiesto
- X: local, network, personal, show, word-processing

Number

Plur
- ADJ: grandi, altri, sociali, altre, disponibili, nuovi, seguenti, titolari, nuove, internazionali
- ADJ-Part: abilitati
- ADP: quali, ne, per
- ADV: Molte, inesigibili, infine, soli, altri, prese, prossimi, semi, volte
- AUX: hanno, sono, stati, possono, devono, saranno, erano, siano, abbiamo, siamo
- AUX-Fin: hanno, sono, possono, devono, saranno, erano, siano, abbiamo, siamo, vengono
- AUX-Part: stati, state, dovuti, voluti
- CCONJ-Fin: pesino
- DET: i, le, gli, loro, tutti, questi, tutte, suoi, tali, queste
- NOUN: anni, miliardi, insegnanti, posti, trasferimenti, docenti, servizi, giorni, milioni, lire
- NOUN-Part: controllanti
- NUM: miliardi, milioni, primi, terzi
- PRON: c', quelli, quali, ci, quelle, tutti, li, noi, loro, essi
- PRON-Part: adattate
- PUNCT: le
- SCONJ: addebitati
- VERB: hanno, derivanti, sono, previsti, fanno, provenienti, effettuati, disposti, aventi, compresi
- VERB-Fin: hanno, sono, fanno, costituiscono, vanno, dicono, restano, abbiamo, rappresentano, considerano
- VERB-Part: previsti, effettuati, disposti, compresi, indicati, assegnati, iscritti, indicate, trasferiti, comprese

Sing
- ADJ: precedente, grande, presente, netto, generale, nazionale, sociale, possibile, finanziaria, civile
- ADJ-Part: illegittima, morta, sommato
- ADP: stante, Per, niente, rispetto
- ADV: pò, molto, poco, troppo, generale, ogni, nulla, quanto, seguito, una
- AUX-Fin: è, ha, sono, era, sarà, deve, può, sia, aveva, ho
- AUX-Part: stato, stata, potuto, dovuto, voluto, fatto, dovuta, fatta, voluta
- CCONJ: altro, caso, essa, quanto
- DET: il, la, l', un, una, lo, questo, un', sua, suo
- NOUN: anno, parte, legge, presidente, governo, stato, gruppo, provincia, lavoro, trasferimento
- NOUN-Fin: dice, vedo
- NOUN-Part: redigente, cauzionante
- NUM: un', terza, un, una, mezza, rientro, uno
- PRON: lo, quello, mi, quella, quale, la, quanto, questo, l', io
- SCONJ: come, cosa, quando
- VERB: è, ha, fatto, fa, dice, detto, approvato, scade, previsto, sembra
- VERB-Fin: è, ha, fa, dice, scade, sembra, va, tratta, prevede, spiega
- VERB-Part: fatto, detto, approvato, previsto, avuto, deciso, ottenuto, visto, chiesto, chiuso
- X: area, deregulation, local

Definite

Def
- DET: il, la, l', i, le, gli, lo, un, the

Ind
- DET: un, una, un', uno, delle
- NUM: uno
- PRON: altro, Tutti, altri, ognuna, qualcosa, una

Degree and Polarity

Degree

Abs
- ADJ: altissimo, altissima, gravissima, lunghissimo, bellissima, biondissima, brevissimo, difficilissima, durissimo, gravissimi
- ADV: benissimo, moltissimo, pochissimo
- DET: moltissime, moltissimi

Cmp
- ADJ: maggiore, superiore, maggior, maggiori, inferiore, minori, superiori, migliore, minore, migliori

Polarity

Neg
- ADV: no
- INTJ: no
- NOUN: no

Pos
- ADV: sì
- INTJ: sì

Verbal Features

Mood

Cnd
- AUX-Fin: potrebbe, sarebbe, dovrebbe, avrebbe, potrebbero, sarebbero, dovrebbero, avrebbero, avrei, vorrebbe
- VERB-Fin: sarebbe, avrebbe, andrebbe, farebbe, deriverebbe, direi, significherebbe, verrebbe, andresti, consentirebbe

Imp
- VERB-Fin: Cessate, leggi, Ascolta, Finiamola, Inviate, Lasciatemi, Mandateci, Rassegnamo, Riparliamo, Ripetiamo

Ind
- AUX-Fin: è, ha, sono, hanno, era, sarà, deve, può, sia, aveva
- CCONJ-Fin: pesino
- NOUN-Fin: dice, vedo
- VERB-Fin: è, ha, fa, hanno, dice, sono, scade, sembra, va, tratta

Sub
- AUX-Fin: fosse, abbia, abbiano, fossero, avesse, dovesse, dovessero, potesse, avessero, sia
- VERB-Fin: abbiano, abbia, fosse, sappiano, aprisse, avessi, avessimo, fossero, mancasse, ponesse

Tense

Fut
- AUX-Fin: sarà, saranno, dovrà, potranno, potrà, verrà, verranno, dovranno, potrò, avranno
- VERB-Fin: avrà, saranno, farà, andrà, darà, sarà, partirà, servirà, vedremo, avverrà

Imp
- AUX-Fin: era, aveva, erano, fosse, avevano, poteva, avevo, fossero, doveva, avesse
- VERB-Fin: era, aveva, sembrava, faceva, andava, sapeva, stava, avevano, erano, avevo

Past
- ADJ-Part: abilitati, illegittima, morta, sommato
- AUX-Fin: fu, venne, furono, vennero, dovette, dovettero, ebbi, fece, fui, potè
- AUX-Part: stato, stata, stati, state, potuto, dovuto, voluto, fatto, dovuta, fatta
- PRON-Part: adattate
- VERB-Fin: disse, ebbe, fece, chiese, cominciò, divenne, prese, rispose, portò, rimase
- VERB-Part: fatto, detto, approvato, previsto, avuto, previsti, deciso, ottenuto, visto, chiesto

Pres
- AUX-Fin: è, ha, sono, hanno, deve, può, sia, possono, devono, ho
- CCONJ-Fin: pesino
- NOUN-Fin: dice, vedo
- NOUN-Part: redigente, controllanti, cauzionante
- VERB-Fin: è, ha, fa, hanno, dice, sono, scade, sembra, va, tratta
- VERB-Part: spettante, spettanti, crescenti, caratterizzante, paralizzanti, assordanti, aventi, coabitante, crescente, delegittimanti

Pronouns, Determiners, Quantifiers

PronType

Art
- DET: il, la, l', i, le, un, gli, una, lo, un'
- NUM: uno
- PRON: altro, Tutti, altri, ognuna, qualcosa, una

Dem
- ADJ: altro, dato, tali
- DET: questo, questa, questi, tale, tali, queste, quest', quella, quel, quei
- PRON: quello, ciò, quella, quelli, questo, quelle, questi, questa, queste, coloro

Exc
- DET: che, quanta, quante

Ind
- ADJ: altro, altra, mezzo, troppi
- ADV: meno
- DET: tutti, ogni, tutte, qualche, alcuni, più, tutto, alcune, tutta, pochi
- PRON: tutti, più, tutto, uno, nessuno, altro, una, altri, nulla, niente
- VERB: vale

Int
- ADV: Perché
- DET: che, quale, quali, qual, quante, quanto, quanti
- PRON: chi, perché, dove, quando, come, cosa, che, quanto, quale, qual

Neg
- ADV: non, mai, né, neppure, no, neanche, nemmeno, mica, Niente, certamente

Prs
- DET: sua, suo, loro, nostro, suoi, sue, mia, nostra, mio, propria
- PRON: si, ci, lo, c', ne, mi, la, l', vi, io
- PRON-Part: adattate
- PUNCT: :

Rel
- AUX-Part: stata
- DET: cui, Quanta, quanti
- PRON: che, cui, dove, chi, quale, quali, quanto, quando, quanti, ove
- SCONJ: che

Tot
- DET: tutti, tutto, molta, tutta

NumType

Card
- ADJ: prima
- NOUN: 6
- NUM: due, tre, cento, 15, 1, 5, 1973, 2, 20, 30

Ord
- ADJ: primo, seconda, prima, secondo, terzo, prime, primi, quarto, quinto, II

Range
- NUM: 1975/1983, 1984/85, 1981/83, 24/25, 0,7-0,8, 1964/73, 1964/74, 1974/83, 1975/84, 1983/84

Poss

Yes
- ADJ: stessa, stesso
- DET: sua, suo, loro, nostro, suoi, sue, mia, nostra, mio, propria
- PRON: tuo, sua, mio, essa, nostra, suo, suoi, che, loro, nostro
- PRON-Part: adattate

Person

1
- AUX-Fin: sono, sia, ho, abbiamo, siamo, possa, avevo, stiamo, dobbiamo, avrei
- NOUN-Fin: vedo
- PRON: c', mi, ci, io, noi, me, ce
- VERB-Fin: credo, abbiamo, so, veda, penso, sia, ho, vedremo, sento, avevo

2
- AUX-Fin: hai, state, sei, avete, stai, siete, volete, vorresti, volevi, vuoi
- PRON: ti, vi, voi, te, tu
- VERB-Fin: vai, mangi, hai, preferisci, andate, fai, vieni, pensi, andresti, metti

3
- AUX-Fin: è, ha, hanno, sono, era, sarà, deve, può, aveva, possono
- CCONJ-Fin: pesino
- NOUN-Fin: dice
- PRON: si, lo, la, l', gli, li, lui, le, loro, essi
- VERB-Fin: è, ha, fa, hanno, dice, sono, scade, sembra, va, tratta

Other Features

Clitic
- Yes
  - PRON: si, ci, lo, c', ne, mi, la, l', vi, gli
  - PUNCT: :

Foreign
- Yes
  - ADJ: scientific
  - ADP: of
  - NOUN: revolutions, structure
  - X: joint, venture, station, work, baby, cd, sitter, personal, computer, condicio

Syntax

Auxiliary Verbs and Copula

This corpus uses 1 lemma as copula (cop). Example: essere.

This corpus uses 10 lemmas as auxiliaries (aux). Examples: avere, essere, potere, dovere, volere, stare, fare, andare, venire, sapere.
This corpus uses 9 lemmas as passive auxiliaries (aux:pass). Examples: essere, venire, andare, avere, dovere, potere, stare, fare, volere.
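Counts like these amount to collecting the distinct lemmas that attach to their head with a given relation. A minimal sketch over CoNLL-U text, assuming the standard 10-column layout (LEMMA in column 3, DEPREL in column 8); the function name is hypothetical and the page's own generator may differ in detail.

```python
from collections import defaultdict


def lemmas_by_deprel(conllu_text: str, deprels=("cop", "aux", "aux:pass")) -> dict[str, list[str]]:
    """Map each requested relation to the sorted distinct lemmas attached with it."""
    found = defaultdict(set)
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multi-word ranges ("1-2") and empty nodes ("1.1")
        lemma, deprel = cols[2], cols[7]
        if deprel in deprels:
            found[deprel].add(lemma)
    return {rel: sorted(found[rel]) for rel in deprels}
```

The number of lemmas per relation is then simply the length of each list.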

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).

nsubj
- VERB--NOUN (4)
- VERB--PRON (5)
- VERB-Fin--NOUN (2816)
- VERB-Fin--PRON (1665)
- VERB-Ger--NOUN (42)
- VERB-Ger--PRON (32)
- VERB-Inf--NOUN (408)
- VERB-Inf--PRON (179)
- VERB-Part--NOUN (1238)
- VERB-Part--PRON (536)

obj
- VERB--NOUN (91)
- VERB--PRON (5)
- VERB-Fin--NOUN (2245)
- VERB-Fin--PRON (544)
- VERB-Ger--NOUN (370)
- VERB-Ger--PRON (44)
- VERB-Inf--NOUN (2215)
- VERB-Inf--PRON (263)
- VERB-Part--NOUN (1267)
- VERB-Part--PRON (335)

iobj
- VERB-Fin--PRON (199)
- VERB-Ger--PRON (21)
- VERB-Inf--NOUN (1)
- VERB-Inf--PRON (84)
- VERB-Part--PRON (114)
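The counts above restrict attention to verb parents and noun or pronoun children. A sketch of how such a table could be computed from CoNLL-U, under two assumptions we have not verified against the page's generator: parent labels are UPOS joined with VerbForm (so a finite verb is `VERB-Fin`, a verb without VerbForm is plain `VERB`), and only the exact relation names are matched, so `nsubj:pass` is not folded into `nsubj`. All function names are hypothetical.

```python
from collections import Counter


def read_sentences(conllu_text: str):
    """Yield each sentence as a list of 10-column rows (plain word lines only)."""
    sentence = []
    for line in conllu_text.splitlines():
        if not line:
            if sentence:
                yield sentence
            sentence = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            if cols[0].isdigit():  # skip multi-word ranges and empty nodes
                sentence.append(cols)
    if sentence:
        yield sentence


def pos_label(cols) -> str:
    """UPOS plus VerbForm, e.g. 'VERB-Fin' (assumed to match the page's labels)."""
    feats = {} if cols[5] == "_" else dict(f.split("=", 1) for f in cols[5].split("|"))
    verbform = feats.get("VerbForm")
    return f"{cols[3]}-{verbform}" if verbform else cols[3]


def argument_counts(conllu_text: str, relations=("nsubj", "obj", "iobj")) -> Counter:
    """Count (relation, 'PARENT--CHILD') pairs for verb parents and NOUN/PRON children."""
    counts = Counter()
    for sentence in read_sentences(conllu_text):
        by_id = {cols[0]: cols for cols in sentence}
        for cols in sentence:
            deprel, parent = cols[7], by_id.get(cols[6])
            if deprel not in relations or parent is None:
                continue  # other relations, or the root (head 0)
            if parent[3] == "VERB" and cols[3] in ("NOUN", "PRON"):
                counts[(deprel, f"{pos_label(parent)}--{cols[3]}")] += 1
    return counts
```

For "Il governo approva la legge", this would record one `nsubj` and one `obj` under `VERB-Fin--NOUN`.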

Reflexive Passive

This corpus contains 1 lemma that occurs at least once with an expl:pass child. Example: determinare si
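The example "determinare si" appears to pair the lemma of the head with the lemma of its expl:pass dependent; that reading is our assumption. A minimal sketch that collects such pairs from CoNLL-U text, with a hypothetical function name:

```python
def expl_pass_pairs(conllu_text: str) -> list[tuple[str, str]]:
    """Sorted distinct (head lemma, dependent lemma) pairs linked by expl:pass."""
    pairs = set()
    sentence: list[list[str]] = []
    for line in conllu_text.splitlines() + [""]:  # trailing "" flushes the last sentence
        if line.startswith("#"):
            continue
        if line:
            cols = line.split("\t")
            if cols[0].isdigit():  # ignore multi-word ranges and empty nodes
                sentence.append(cols)
            continue
        # Blank line: the sentence is complete, so heads can be resolved by ID.
        by_id = {cols[0]: cols for cols in sentence}
        for cols in sentence:
            head = by_id.get(cols[6])
            if cols[7] == "expl:pass" and head is not None:
                pairs.add((head[2], cols[2]))
        sentence = []
    return sorted(pairs)
```

The number of distinct head lemmas is then `len({head for head, _ in expl_pass_pairs(text)})`.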

Relations Overview

This corpus uses 10 relation subtypes: acl:relcl, aux:pass, det:poss, det:predet, expl:impers, expl:pass, flat:foreign, flat:name, nsubj:pass, obl:agent
The following 3 relation types are not used in this corpus at all: clf, goeswith, reparandum
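Both lines follow from comparing the corpus's relation labels with the 37 universal relation types of UD v2: a subtype is written as `type:subtype`, and a universal type counts as used if it appears bare or with any subtype. A short sketch of that comparison (names are illustrative):

```python
# The 37 universal dependency relation types of UD v2.
UD_V2_RELATIONS = frozenset("""
acl advcl advmod amod appos aux case cc ccomp clf compound conj cop csubj
dep det discourse dislocated expl fixed flat goeswith iobj list mark nmod
nsubj nummod obj obl orphan parataxis punct reparandum root vocative xcomp
""".split())


def relation_overview(labels) -> tuple[list[str], list[str]]:
    """Return (sorted subtyped labels, sorted universal types never used)."""
    labels = set(labels)
    subtypes = sorted(label for label in labels if ":" in label)
    used = {label.split(":", 1)[0] for label in labels}
    unused = sorted(UD_V2_RELATIONS - used)
    return subtypes, unused
```

Fed the relation list given earlier on this page, it returns the 10 subtypes listed and the unused types clf, goeswith and reparandum.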