
This page pertains to UD version 2.

Introduction

Universal Dependencies (UD) is a project that is developing cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on an evolution of (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008). The general philosophy is to provide a universal inventory of categories and guidelines to facilitate consistent annotation of similar constructions across languages, while allowing language-specific extensions when necessary.

This is illustrated in the following parallel examples from English, Bulgarian, Czech and Swedish, where the main grammatical relations involving a passive verb, a nominal subject and an oblique agent are the same, but where the concrete grammatical realization varies.

# visual-style 4 2 nsubj:pass	color:blue
# visual-style 4 7 obl	color:blue
1	The	the	DET	_	Definite=Def|PronType=Art	2	det	_	_
2	dog	dog	NOUN	_	Gender=Neut|Number=Sing	4	nsubj:pass	_	_
3	was	be	AUX	_	Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin	4	aux:pass	_	_
4	chased	chase	VERB	_	Tense=Past|VerbForm=Part	0	root	_	_
5	by	by	ADP	_	_	7	case	_	_
6	the	the	DET	_	Definite=Def|PronType=Art	7	det	_	_
7	cat	cat	NOUN	_	Gender=Neut|Number=Sing	4	obl	_	_
8	.	.	PUNCT	_	_	4	punct	_	_

# visual-style 3 1 nsubj:pass	color:blue
# visual-style 3 5 obl	color:blue
1	Кучето	куче	NOUN	_	Definite=Def|Gender=Neut|Number=Sing	3	nsubj:pass	_	_
2	се	се	PRON	_	Case=Acc|PronType=Prs|Reflex=Yes	3	expl:pass	_	_
3	преследваше	преследвам	VERB	_	Aspect=Imp|Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin	0	root	_	_
4	от	от	ADP	_	_	5	case	_	_
5	котката	котка	NOUN	_	Definite=Def|Gender=Fem|Number=Sing	3	obl	_	_
6	.	.	PUNCT	_	_	3	punct	_	_

# visual-style 3 1 nsubj:pass	color:blue
# visual-style 3 4 obl	color:blue
1	Pes	pes	NOUN	_	Animacy=Anim|Case=Nom|Gender=Masc|Number=Sing	3	nsubj:pass	_	_
2	byl	být	AUX	_	Aspect=Imp|Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part|Voice=Act	3	aux:pass	_	_
3	honěn	honit	VERB	_	Aspect=Imp|Gender=Masc|Number=Sing|VerbForm=Part|Voice=Pass	0	root	_	_
4	kočkou	kočka	NOUN	_	Case=Ins|Gender=Fem|Number=Sing	3	obl	_	_
5	.	.	PUNCT	_	_	3	punct	_	_

# visual-style 2 1 nsubj:pass	color:blue
# visual-style 2 4 obl	color:blue
1	Hunden	hund	NOUN	_	Definite=Def	2	nsubj:pass	_	_
2	jagades	jaga	VERB	_	Tense=Past|Voice=Pass	0	root	_	_
3	av	av	ADP	_	_	4	case	_	_
4	katten	katt	NOUN	_	Definite=Def	2	obl	_	_
5	.	.	PUNCT	_	_	2	punct	_	_
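
The parallelism in the examples above can also be checked programmatically. The following is a minimal sketch, not part of any official UD tooling: it assumes plain tab-separated CoNLL-U input with ten columns per token, and the names `Token`, `parse_conllu` and `passive_frame` are hypothetical helpers introduced here for illustration. It reads the Czech and Swedish sentences and collects the dependents of the root that carry the `nsubj:pass` and `obl` relations.

```python
from typing import NamedTuple


class Token(NamedTuple):
    """A subset of the ten CoNLL-U columns (hypothetical helper type)."""
    id: int
    form: str
    lemma: str
    upos: str
    feats: dict
    head: int
    deprel: str


def parse_conllu(block: str) -> list:
    """Parse one sentence; skip comments, multiword ranges and empty nodes."""
    tokens = []
    for line in block.strip().splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        feats = {}
        if cols[5] != "_":
            feats = dict(pair.split("=", 1) for pair in cols[5].split("|"))
        tokens.append(Token(int(cols[0]), cols[1], cols[2], cols[3],
                            feats, int(cols[6]), cols[7]))
    return tokens


def passive_frame(tokens: list) -> dict:
    """Return the root form plus its nsubj:pass and obl dependents."""
    root = next(t for t in tokens if t.head == 0)
    frame = {"root": root.form}
    for t in tokens:
        if t.head == root.id and t.deprel in ("nsubj:pass", "obl"):
            frame[t.deprel] = t.form
    return frame


SWEDISH = "\n".join([
    "1\tHunden\thund\tNOUN\t_\tDefinite=Def\t2\tnsubj:pass\t_\t_",
    "2\tjagades\tjaga\tVERB\t_\tTense=Past|Voice=Pass\t0\troot\t_\t_",
    "3\tav\tav\tADP\t_\t_\t4\tcase\t_\t_",
    "4\tkatten\tkatt\tNOUN\t_\tDefinite=Def\t2\tobl\t_\t_",
    "5\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_",
])

CZECH = "\n".join([
    "1\tPes\tpes\tNOUN\t_\tAnimacy=Anim|Case=Nom|Gender=Masc|Number=Sing\t3\tnsubj:pass\t_\t_",
    "2\tbyl\tbýt\tAUX\t_\tAspect=Imp|Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part|Voice=Act\t3\taux:pass\t_\t_",
    "3\thoněn\thonit\tVERB\t_\tAspect=Imp|Gender=Masc|Number=Sing|VerbForm=Part|Voice=Pass\t0\troot\t_\t_",
    "4\tkočkou\tkočka\tNOUN\t_\tCase=Ins|Gender=Fem|Number=Sing\t3\tobl\t_\t_",
    "5\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_",
])

swedish_frame = passive_frame(parse_conllu(SWEDISH))
czech_frame = passive_frame(parse_conllu(CZECH))
print(swedish_frame)
print(czech_frame)
```

Both sentences yield a frame with the same three keys (`root`, `nsubj:pass`, `obl`), even though Swedish marks the passive morphologically on the verb and uses a preposition for the agent, while Czech uses an auxiliary and an instrumental case form.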

What is needed for UD to be successful?

The secret to understanding the design and current success of UD is to realize that the design is a very subtle compromise between approximately 6 things:

  1. UD needs to be satisfactory on linguistic analysis grounds for individual languages.
  2. UD needs to be good for linguistic typology, i.e., providing a suitable basis for bringing out parallelism across languages and language families.
  3. UD must be suitable for rapid, consistent annotation by a human annotator.
  4. UD must be easily comprehended and used by a non-linguist, whether a language learner or an engineer with prosaic needs for language processing. We refer to this as seeking a habitable design, and it leads us to favor traditional grammar notions and terminology.
  5. UD must be suitable for computer parsing with high accuracy.
  6. UD must support downstream language understanding tasks well (relation extraction, reading comprehension, machine translation, …).

It’s easy to come up with a proposal that improves UD on one of these dimensions. The interesting and difficult part is to improve UD while remaining sensitive to all these dimensions.

Project organization

UD is an open collaboration with many project members. The administrative structure is kept to a minimum and currently consists of the following:

List of contributors

History and publications