
This page pertains to UD version 2.

History of the UD Project

The UD annotation scheme is based on an evolution of (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008). The first attempt to combine Stanford dependencies and Google universal tags into a universal annotation scheme was the Universal Dependency Treebank (UDT) project (McDonald et al., 2013), which released treebanks for 6 languages in 2013 and 11 languages in 2014. The first proposal for incorporating morphology was made by Tsarfaty (2013). The second version of HamleDT (Rosa et al., 2014) provided Stanford/Google annotation for 30 languages in 2014. This was followed by the development of universal Stanford dependencies (USD) (de Marneffe et al., 2014). The new Universal Dependencies is the result of merging all these initiatives into a single coherent framework, which included designing a revised version of the CoNLL-X format, called CoNLL-U.

The first version of the UD guidelines was released in October 2014. Going beyond earlier efforts, UD POS categories have substantive definitions and are not just equivalence classes of categories in underlying language-particular treebanks; UD morphological features aim to provide a basic set of the features that are most crucial for analysis and are widespread across languages; and UD dependencies emphasize grammatical relations common to many grammatical frameworks. That is, they are centrally organized around notions of subject, object, clausal complement, noun determiner, noun modifier, etc. The goal of the new universal version was to add or refine relations to better accommodate the grammatical structures of typologically different languages and to clean up some of the quirkier and more English-specific features of earlier proposals. The second version of the UD guidelines was released in December 2016. There have been gradual improvements to the guidelines in the 2020s. Publications have detailed the version 2 linguistic framework (de Marneffe et al., 2021) and surveyed the project’s data releases (Nivre et al., 2016, 2020).


Selected Publications

There are now hundreds of publications on various aspects and uses of UD, ranging from papers on the issues involved in constructing UD treebanks for particular languages to approaches to multilingual syntactic parsing to crosslinguistic psycholinguistic studies enabled by the common representation of UD. A few key references are listed below. Many other publications about UD can be found in the references of these papers; via the Universal Dependencies Workshops (proceedings on the ACL Anthology) and other events; or by searching for “Universal Dependencies” on Google Scholar.

Recent Overviews

Earlier Publications

2019

2018

2017

2016

2015

2014

2013 and before