UD for Classical Chinese
Tokenization and Word Segmentation
Classical Chinese texts have no spaces or punctuation between words or sentences. Every word consists of a single character, except for a small number of nouns and proper nouns.
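The segmentation rule above can be sketched as a longest-match lookup against a list of multi-character words, falling back to one character per word. This is a minimal illustration only: the function name `segment` and the `multi_char_words` lexicon are hypothetical, and the actual treebanks may segment a given text differently.

```python
def segment(text, multi_char_words=frozenset()):
    """Split text into single-character words, keeping listed words whole.

    `multi_char_words` is a hypothetical lexicon of multi-character
    (mostly proper) nouns; it is not taken from any treebank.
    """
    max_len = max((len(w) for w in multi_char_words), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest listed word first (length >= 2).
        for length in range(min(max_len, len(text) - i), 1, -1):
            if text[i:i + length] in multi_char_words:
                tokens.append(text[i:i + length])
                i += length
                break
        else:
            # No listed word starts here: emit a single character.
            tokens.append(text[i])
            i += 1
    return tokens
```

With an empty lexicon every character becomes its own word; passing a lexicon such as `{"孟子"}` keeps that entry as one token.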
Morphology
Tags
The predicate-object-final structure of very early Chinese texts had only three categories of words: predicate, object, and final. In our linguistic model we tentatively call them “verb”, “noun”, and “particle”, respectively. Some words were specialised as verbs and some as nouns, but by around the Zhou (周) dynasty most words were used in two or three of these categories.
In that era we can already observe very early modifier usages of verbs. Several verbs became specialised as adverbial modifiers, which later gave rise to adverbs. Between verbs and adverbs, auxiliary verbs were almost entirely specialised for auxiliary use, but were occasionally used as ordinary verbs. Adjectival usages of verbs had not yet become specialised as adjectives in that era; on the other hand, some verb usages gave rise to prepositions.
For POS tagging of Classical Chinese texts in UD, we use VERB, ADV, AUX, ADP, and SCONJ to fill the UPOS field of each verb-origin word, following the overview of modifier usages given above. For noun-origin words we use NOUN, PROPN, PRON, NUM, and ADV (noun-origin adverbs, including 何), categorising them from a rather present-day point of view. For particle-origin words we use PART, CCONJ, and INTJ, in keeping with the UD v2 guidelines. We rarely use SYM, and do not use ADJ, DET, PUNCT, or X.
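The tag inventory described above can be written down as a small lookup table. The following is a sketch for illustration only; the names `UPOS_BY_ORIGIN`, `upos_status`, and `origins_of` are hypothetical and do not come from any UD tool.

```python
# UPOS tags grouped by the historical origin of the word, as described above.
UPOS_BY_ORIGIN = {
    "verb": ["VERB", "ADV", "AUX", "ADP", "SCONJ"],
    "noun": ["NOUN", "PROPN", "PRON", "NUM", "ADV"],
    "particle": ["PART", "CCONJ", "INTJ"],
}
RARE_UPOS = {"SYM"}                       # rarely used
UNUSED_UPOS = {"ADJ", "DET", "PUNCT", "X"}  # not used

def origins_of(tag):
    """Return the origins (verb, noun, particle) that can yield this tag."""
    return [origin for origin, tags in UPOS_BY_ORIGIN.items() if tag in tags]

def upos_status(tag):
    """Classify a UPOS tag as 'used', 'rare', or 'unused' in this scheme."""
    if origins_of(tag):
        return "used"
    if tag in RARE_UPOS:
        return "rare"
    if tag in UNUSED_UPOS:
        return "unused"
    raise ValueError(f"not a UD v2 UPOS tag: {tag!r}")
```

Note that ADV is the only tag reachable from two origins, so `origins_of("ADV")` returns both `"verb"` and `"noun"`.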
Features
- NameType=Sur, Giv, Prs, Nat, or Geo for PROPN.
- Case=Loc or Tem, or NounType=Class, for NOUN.
- PronType=Prs with Person=1, 2, or 3, or Reflex=Yes, for personal PRON. PronType=Dem for demonstrative PRON. PronType=Int for interrogative PRON. NumType=Ord for zodiac NUM.
- Polarity=Neg, or Degree=Pos, Equ, Sup, or Cmp, for VERB and ADV.
- AdvType=Deg, Tim, or Cau, with Aspect=Perf or Tense=Past, Pres, or Fut, for ADV.
- Mood=Pot, Nec, or Des, or Voice=Pass, for AUX. VerbType=Cop for copular use of verb (its UPOS is changed into AUX). VerbForm=Part for adjective use of VERB. VerbForm=Conv for adverbial use of verb (its UPOS is changed into ADV).
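As a rough illustration, the feature inventory above can be encoded as a table keyed by UPOS and used to flag feature-value pairs in a CoNLL-U FEATS string that this page does not list. This is a sketch, not the official UD validator: the names `ALLOWED` and `unexpected_feats` are hypothetical, the table covers only what is listed on this page, and co-occurrence constraints (for example Person only with PronType=Prs) are not checked.

```python
# Feature values listed on this page, keyed by UPOS and feature name.
DEGREE = {"Pos", "Equ", "Sup", "Cmp"}
ALLOWED = {
    "PROPN": {"NameType": {"Sur", "Giv", "Prs", "Nat", "Geo"}},
    "NOUN": {"Case": {"Loc", "Tem"}, "NounType": {"Class"}},
    "PRON": {"PronType": {"Prs", "Dem", "Int"},
             "Person": {"1", "2", "3"}, "Reflex": {"Yes"}},
    "NUM": {"NumType": {"Ord"}},
    "VERB": {"Polarity": {"Neg"}, "Degree": DEGREE, "VerbForm": {"Part"}},
    "ADV": {"Polarity": {"Neg"}, "Degree": DEGREE,
            "AdvType": {"Deg", "Tim", "Cau"}, "Aspect": {"Perf"},
            "Tense": {"Past", "Pres", "Fut"}, "VerbForm": {"Conv"}},
    "AUX": {"Mood": {"Pot", "Nec", "Des"}, "Voice": {"Pass"},
            "VerbType": {"Cop"}},
}

def unexpected_feats(upos, feats):
    """Return the Feature=Value pairs in `feats` not listed for `upos`.

    `feats` is a CoNLL-U FEATS string such as "Person=1|PronType=Prs",
    or "_" when the token has no features.
    """
    if feats == "_":
        return []
    allowed = ALLOWED.get(upos, {})
    bad = []
    for pair in feats.split("|"):
        name, _, value = pair.partition("=")
        if value not in allowed.get(name, set()):
            bad.append(pair)
    return bad
```

An empty result means every pair appears in the table for that UPOS; it does not mean the annotation is otherwise valid.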
Syntax
- discourse:sp to annotate the final sentence particles in the predicate-object-final structure.
- nsubj:pass to annotate passive subjects.
- nsubj:outer and csubj:outer to annotate subjects for predicate clauses.
- obl:tmod to annotate temporal oblique nominals.
- obl:lmod to annotate locational oblique nominals.
- compound:redup (left-to-right) to annotate reduplicated compounds.
- flat:vv (left-to-right) to annotate serial verbs (rather exocentric).
- flat:foreign (left-to-right) to annotate foreign words.
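The relation subtypes above can be checked mechanically. The sketch below splits a DEPREL value into its universal relation and subtype, and flags tokens whose "left-to-right" relation points the wrong way. It assumes that "left-to-right" means the head precedes its dependent, which is an interpretation rather than something this page states; all names (`SUBTYPES`, `LEFT_TO_RIGHT`, `split_deprel`, `is_listed_subtype`, `direction_violations`) are hypothetical.

```python
# Relation subtypes listed on this page, keyed by universal relation.
SUBTYPES = {
    "discourse": {"sp"},
    "nsubj": {"pass", "outer"},
    "csubj": {"outer"},
    "obl": {"tmod", "lmod"},
    "compound": {"redup"},
    "flat": {"vv", "foreign"},
}
# Relations marked "(left-to-right)" above.
LEFT_TO_RIGHT = {"compound:redup", "flat:vv", "flat:foreign"}

def split_deprel(deprel):
    """Split 'obl:tmod' into ('obl', 'tmod'); subtype is None if absent."""
    base, _, subtype = deprel.partition(":")
    return base, subtype or None

def is_listed_subtype(deprel):
    """True if deprel carries a subtype that this page lists."""
    base, subtype = split_deprel(deprel)
    return subtype is not None and subtype in SUBTYPES.get(base, set())

def direction_violations(tokens):
    """Return IDs of tokens whose left-to-right relation has a head to the right.

    `tokens` is an iterable of (id, head, deprel) integer/string triples,
    as found in the ID, HEAD, and DEPREL columns of a CoNLL-U sentence.
    """
    return [tid for tid, head, deprel in tokens
            if deprel in LEFT_TO_RIGHT and head > tid]
```

For example, a token with ID 2, HEAD 1, and DEPREL flat:vv passes the check, while a token with ID 1, HEAD 2, and DEPREL compound:redup would be reported.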
Treebanks
There are two Classical Chinese UD treebanks:
- UD_Classical_Chinese-Kyoto (implemented in UD-Kanbun and SuPar-Kanbun)
- UD_Classical_Chinese-TueCL
References
- Koichi Yasuoka: Universal Dependencies Treebank of the Four Books in Classical Chinese, DADH2019: 10th International Conference of Digital Archives and Digital Humanities (December 2019), pp.20-28.