
This page pertains to UD version 2.

UD Portuguese DANTEStocks

Language: Portuguese (code: pt)
Family: IE

This treebank has been part of Universal Dependencies since the UD v2.15 release.

The following people have contributed to making this treebank part of UD: Ariani Di Felippo, Norton Trevisan Roman, Thiago Alexandre Salgueiro Pardo, Bryan Khelven da Silva Barbosa, Maria das Graças Volpe Nunes.

Repository: UD_Portuguese-DANTEStocks
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.15

License: CC BY 4.0

Genre: social

Questions, comments? General annotation questions (either Portuguese-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [ariani (æt) ufscar • br, norton (æt) usp • br, bryankhelven (æt) ieee • org]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.

Annotation Source
Lemmas annotated manually
UPOS annotated manually, natively in UD style
XPOS not available
Features annotated manually, natively in UD style
Relations annotated manually, natively in UD style

Description

DANTEStocks (Di Felippo et al., 2024) is a collection of Brazilian Portuguese tweets from the stock market domain. It is part of Porttinari (“PORTuguese Treebank”), which is intended to become a large multigenre treebank for Portuguese (Pardo et al., 2021) following the “Universal Dependencies” framework (de Marneffe et al., 2021).

The corpus consists of 4,042 tweets and 80,997 tokens. For UD annotation, the entire tweet was taken as the basic unit of analysis: tweets were not segmented into smaller units such as sentences, clauses, or phrases. The tweets were also not normalized, so they retain the phenomena typical of social media text in general and of Twitter in particular. Morphosyntactic (Silva et al., 2021; Di Felippo et al., 2023) and syntactic (Di Felippo et al., 2024) annotations were carried out through alternating steps of automatic processing and manual revision. DANTEStocks and related information can be accessed at the Poetisa Project.
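Because each tweet is one unit of analysis, every tree in the treebank's CoNLL-U files (the standard UD distribution format) corresponds to a whole tweet. The following is a minimal sketch of how trees and syntactic words could be counted in such a file. The function name `count_units`, the helper `row`, and the two-tweet sample are invented for illustration and are not taken from the treebank. The page does not say whether its token figure counts syntactic words or surface tokens, so the sketch counts syntactic words and skips multiword-token ranges and empty nodes.

```python
def row(fields):
    """Turn a space-separated toy line into a tab-separated CoNLL-U line."""
    return "\t".join(fields.split())


def count_units(conllu_text):
    """Return (number of trees, number of syntactic words) in a CoNLL-U string."""
    n_trees = 0
    n_words = 0
    in_tree = False
    for line in conllu_text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current tree.
            in_tree = False
            continue
        if line.startswith("#"):
            # Comment lines such as "# text = ..." carry metadata only.
            continue
        token_id = line.split("\t")[0]
        # Skip multiword-token ranges ("2-3") and empty nodes ("3.1").
        if "-" in token_id or "." in token_id:
            continue
        if not in_tree:
            n_trees += 1
            in_tree = True
        n_words += 1
    return n_trees, n_words


# Invented two-tweet sample (NOT real treebank data).
SAMPLE = "\n".join([
    "# sent_id = toy-1",
    "# text = PETR4 subiu",
    row("1 PETR4 PETR4 PROPN _ _ 2 nsubj _ _"),
    row("2 subiu subir VERB _ _ 0 root _ _"),
    "",
    "# sent_id = toy-2",
    "# text = Comprei na alta",
    row("1 Comprei comprar VERB _ _ 0 root _ _"),
    row("2-3 na _ _ _ _ _ _ _ _"),
    row("2 em em ADP _ _ 4 case _ _"),
    row("3 a o DET _ _ 4 det _ _"),
    row("4 alta alta NOUN _ _ 1 obl _ _"),
])

print(count_units(SAMPLE))  # (2, 6)
```

Run on the real files, the first number should match the tweet count if every tweet is indeed stored as a single tree; the second may differ from the reported token count depending on how contractions are counted.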

Acknowledgments

This work was carried out at the Center for Artificial Intelligence of the University of São Paulo (C4AI - c4ai.inova.usp.br), with support from the São Paulo Research Foundation (FAPESP grant #2019/07665-4) and from the IBM Corporation. The project was also supported by the Ministry of Science, Technology, and Innovation, with resources of Law N. 8.248, of October 23, 1991, within the scope of PPI-SOFTEX, coordinated by Softex and published as Residence in TIC 13, DOU 01245.010222/2022-44.

Statistics of UD Portuguese DANTEStocks

POS Tags

ADJ – ADP – ADV – AUX – CCONJ – DET – INTJ – NOUN – NUM – PRON – PROPN – PUNCT – SCONJ – SYM – VERB – X

Features

Abbr – Case – Definite – Foreign – Gender – Mood – Number – NumType – Person – Poss – PronType – Tense – Typo – VerbForm – Voice

Relations

acl – acl:relcl – advcl – advmod – amod – appos – aux – aux:pass – case – cc – ccomp – ccomp:speech – conj – cop – csubj – dep – det – discourse – dislocated – expl – fixed – flat – flat:foreign – flat:name – goeswith – iobj – list – mark – nmod – nmod:tmod – nsubj – nsubj:outer – nsubj:pass – nummod – obj – obl – obl:agent – orphan – parataxis – punct – reparandum – root – vocative – xcomp

Tokenization and Word Segmentation

Morphology

Tags

Nominal Features

Degree and Polarity

Verbal Features

Pronouns, Determiners, Quantifiers

Other Features

Syntax

Auxiliary Verbs and Copula

Core Arguments, Oblique Arguments and Adjuncts

Here we consider only relations between verbs (parent) and nouns or pronouns (child).
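The restriction above can be sketched as a filter over CoNLL-U trees: keep a dependency only when the parent's UPOS is VERB and the child's UPOS is NOUN or PRON, then tally the relation labels. This is an illustrative sketch, not the script used to generate the statistics. The names `read_trees` and `verb_nominal_relations` and the toy sample are hypothetical, and whether the original statistics also include PROPN children is not stated here, so the child tag set is a parameter.

```python
from collections import Counter


def row(fields):
    """Turn a space-separated toy line into a tab-separated CoNLL-U line."""
    return "\t".join(fields.split())


def read_trees(conllu_text):
    """Yield each tree as a list of dicts with id, upos, head and deprel."""
    tree = []
    for line in conllu_text.splitlines():
        line = line.strip()
        if not line:
            if tree:
                yield tree
                tree = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip multiword-token ranges ("2-3") and empty nodes ("3.1").
        if "-" in cols[0] or "." in cols[0]:
            continue
        tree.append({"id": cols[0], "upos": cols[3],
                     "head": cols[6], "deprel": cols[7]})
    if tree:
        yield tree


def verb_nominal_relations(conllu_text, child_upos=("NOUN", "PRON")):
    """Count deprels where the parent is a VERB and the child is in child_upos."""
    counts = Counter()
    for tree in read_trees(conllu_text):
        upos_by_id = {w["id"]: w["upos"] for w in tree}
        for w in tree:
            # The root has head "0", which is not in upos_by_id, so it is skipped.
            if w["upos"] in child_upos and upos_by_id.get(w["head"]) == "VERB":
                counts[w["deprel"]] += 1
    return counts


# Invented sample (NOT real treebank data).
SAMPLE = "\n".join([
    "# text = Ela vendeu ações",
    row("1 Ela ela PRON _ _ 2 nsubj _ _"),
    row("2 vendeu vender VERB _ _ 0 root _ _"),
    row("3 ações ação NOUN _ _ 2 obj _ _"),
    "",
    "# text = mercado caiu hoje",
    row("1 mercado mercado NOUN _ _ 2 nsubj _ _"),
    row("2 caiu cair VERB _ _ 0 root _ _"),
    row("3 hoje hoje ADV _ _ 2 advmod _ _"),
])

print(verb_nominal_relations(SAMPLE))
```

In the sample, the adverb `hoje` is excluded because its UPOS is neither NOUN nor PRON, even though its parent is a verb.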

Relations Overview