UD Classical Chinese Kyoto
Language: Classical Chinese (code: lzh)
Family: Sino-Tibetan
This treebank has been part of Universal Dependencies since the UD v2.4 release.
The following people have contributed to making this treebank part of UD: Koichi Yasuoka, Christian Wittern, Tomohiko Morioka, Takumi Ikeda, Naoki Yamazaki, Yoshihiro Nikaido, Shingo Suzuki, Shigeki Moro, Yuan Li, Hiroyuki Shirasu, Kazunori Fujita.
Repository: UD_Classical_Chinese-Kyoto
Search this treebank on-line: PML-TQ
Download all treebanks: UD 2.15
License: PD
Genre: nonfiction, poetry
Questions, comments? General annotation questions (either Classical Chinese-specific or cross-linguistic) can be raised in the main UD issue tracker. You can report bugs in this treebank in the treebank-specific issue tracker on GitHub. If you want to collaborate, please contact [yasuoka (æt) kanji • zinbun • kyoto-u • ac • jp]. Development of the treebank happens outside the UD repository. If there are bugs, either the original data source or the conversion procedure must be fixed. Do not submit pull requests against the UD repository.
Annotation | Source |
---|---|
Lemmas | annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion |
UPOS | annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion |
XPOS | annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion |
Features | annotated manually in non-UD style, automatically converted to UD, with some manual corrections of the conversion |
Relations | annotated manually, natively in UD style |
Description
Classical Chinese Universal Dependencies treebank, annotated and converted by the Institute for Research in Humanities, Kyoto University.
The treebank is drawn from the full texts of 論語, 孟子, 禮記, 十八史略, 楚辭, 戰國策, and others. Classical Chinese was written without spaces or punctuation between words or sentences, so the treebank files contain neither. The texts are divided among the files as follows:
- lzh_kyoto-ud-test.conllu
  - 學而篇第一 為政篇第二 and 八佾篇第三 from 論語
  - 梁惠王上 and 梁惠王下 from 孟子
  - 中庸 from 禮記
  - 春秋戰國 from 十八史略
  - 離騷 from 楚辭
  - 摩訶般若波羅蜜大明呪經
  - 東周 from 戰國策
- lzh_kyoto-ud-dev.conllu
  - 顏淵篇第十二 子路篇第十三 and 憲問篇第十四 from 論語
  - 告子上 and 告子下 from 孟子
  - 大學 from 禮記
  - 唐 from 十八史略
  - 遠遊 from 楚辭
  - 金剛般若波羅蜜經
  - 西周 from 戰國策
- lzh_kyoto-ud-train.conllu
  - 論語 (except for 學而篇第一 為政篇第二 八佾篇第三 顏淵篇第十二 子路篇第十三 憲問篇第十四)
  - 孟子 (except for 梁惠王上 梁惠王下 告子上 告子下)
  - 禮記 (except for 中庸 大學)
  - 十八史略 (except for 春秋戰國 唐)
  - 九歌 天問 九章 卜居 漁父 九辯 and 招魂 from 楚辭
  - 唐詩三百首
  - 佛說阿彌陀經
  - 戰國策 (except for 東周 西周)
Acknowledgments
Statistics of UD Classical Chinese Kyoto
POS Tags
ADP – ADV – AUX – CCONJ – INTJ – NOUN – NUM – PART – PRON – PROPN – PUNCT – SCONJ – SYM – VERB
Features
AdvType – Aspect – Case – Degree – Mood – NameType – NounType – NumType – Person – Polarity – PronType – Reflex – Tense – VerbForm – VerbType – Voice
Relations
acl – advcl – advmod – amod – appos – aux – case – cc – ccomp – clf – compound – compound:redup – conj – cop – csubj – csubj:outer – csubj:pass – det – discourse – discourse:sp – dislocated – expl – fixed – flat – flat:foreign – flat:vv – iobj – list – mark – nmod – nsubj – nsubj:outer – nsubj:pass – nummod – obj – obl – obl:lmod – obl:tmod – orphan – parataxis – root – vocative – xcomp
Tokenization and Word Segmentation
- This corpus contains 86239 sentences and 433168 tokens.
- This corpus contains 428559 tokens (99%) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus does not contain words that contain both letters and punctuation.
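The counts above can in principle be reproduced from the CoNLL-U files themselves. The following is a minimal sketch, not the script that generated this page: it assumes the standard 10-column CoNLL-U layout, treats every integer-ID row as one token (reasonable here, since no multiword tokens are reported), and runs on a tiny hand-made sample whose annotation is illustrative only and not copied from the treebank. The names `row`, `SAMPLE`, and `corpus_stats` are hypothetical.

```python
def row(idx, form, upos, head, deprel, misc="SpaceAfter=No"):
    """Build one tab-separated CoNLL-U token line (XPOS, FEATS, DEPS left as '_')."""
    return "\t".join(
        [str(idx), form, form, upos, "_", "_", str(head), deprel, "_", misc]
    )


# Hand-made illustrative sample (NOT taken from the treebank files).
SAMPLE = "\n".join([
    "# text = 學而時習之",
    row(1, "學", "VERB", 0, "root"),
    row(2, "而", "CCONJ", 4, "cc"),
    row(3, "時", "NOUN", 4, "obl:tmod"),
    row(4, "習", "VERB", 1, "conj"),
    row(5, "之", "PRON", 4, "obj", misc="_"),
    "",
    "# text = 不亦說乎",
    row(1, "不", "ADV", 3, "advmod"),
    row(2, "亦", "ADV", 3, "advmod"),
    row(3, "說", "VERB", 0, "root"),
    row(4, "乎", "PART", 3, "discourse:sp"),
    "",
])


def corpus_stats(conllu_text):
    """Count sentences, tokens, and tokens marked SpaceAfter=No."""
    sentences = tokens = no_space = 0
    in_sentence = False
    for line in conllu_text.splitlines():
        if not line.strip():
            in_sentence = False          # a blank line ends the sentence
            continue
        if line.startswith("#"):
            continue                     # sentence-level comment
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue                     # skip ranges (1-2) and empty nodes (1.1)
        if not in_sentence:
            sentences += 1
            in_sentence = True
        tokens += 1
        if "SpaceAfter=No" in cols[9].split("|"):
            no_space += 1
    return {"sentences": sentences, "tokens": tokens, "no_space": no_space}


stats = corpus_stats(SAMPLE)
share = 100 * stats["no_space"] // stats["tokens"]
print(stats, f"{share}% not followed by a space")
```

To apply the same function to a real split, read one of the `.conllu` files listed in the Description as UTF-8 text and pass its contents to `corpus_stats`.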
Morphology
Tags
- This corpus uses 14 UPOS tags out of 17 possible: ADP, ADV, AUX, CCONJ, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, SYM, VERB
- This corpus does not use the following tags: ADJ, DET, X
- This corpus contains 45 word types tagged as particles (PART): 不, 乎, 也, 于, 云, 些, 以, 來, 侯, 兮, 其, 只, 哉, 夫, 如, 子, 寧, 居, 已, 思, 所, 攸, 斯, 歟, 止, 焉, 然, 爾, 甫, 疇, 矣, 者, 而, 耳, 耶, 聿, 與, 若, 蓋, 記, 諸, 載, 逝, 邪, 馨
- This corpus contains 53 lemmas tagged as pronouns (PRON): 乃, 之, 予, 云, 他, 伊, 何, 佗, 余, 僕, 公, 其, 卬, 厥, 吾, 夫, 奚, 女, 子, 孤, 孰, 它, 安, 害, 己, 彼, 惟, 我, 或, 按, 斯, 是, 時, 曷, 朕, 某, 此, 汝, 焉, 爰, 爾, 瑕, 甚, 維, 而, 自, 若, 茲, 言, 許, 誰, 諸, 輩
- This corpus contains no lemmas tagged as determiners (DET).
- This corpus contains 16 lemmas tagged as auxiliaries (AUX): 儀, 可, 堪, 宜, 得, 應, 敢, 是, 欲, 爲, 肯, 能, 被, 見, 足, 須
- Out of the above, 14 lemmas occurred sometimes as AUX and sometimes as VERB: 可, 堪, 宜, 得, 應, 敢, 是, 欲, 爲, 肯, 被, 見, 足, 須
- There are 2 (de)verbal forms:
- Conv
- ADV: 以, 大, 因, 無, 然, 始, 當, 獨, 親, 盡
- Part
- VERB: 大, 太, 主, 寡, 皇, 小, 有, 長, 使, 以
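The tag inventory reported under Tags can be cross-checked with a simple set difference. This sketch takes the 14 tags listed on this page and compares them with the 17 universal POS tags defined by UD; the names `UD_UPOS`, `USED_UPOS`, and `unused_upos` are hypothetical and chosen only for illustration.

```python
# The 17 universal POS tags defined by Universal Dependencies v2.
UD_UPOS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}

# The tags this page lists as used in the corpus.
USED_UPOS = {
    "ADP", "ADV", "AUX", "CCONJ", "INTJ", "NOUN", "NUM", "PART",
    "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB",
}


def unused_upos(used, inventory=UD_UPOS):
    """Return the universal tags that never occur, in alphabetical order."""
    return sorted(inventory - used)


unused = unused_upos(USED_UPOS)
print(f"{len(USED_UPOS)} of {len(UD_UPOS)} tags used; unused: {unused}")
```

When working from the files rather than from this page, `USED_UPOS` would be collected from the fourth column (UPOS) of every token line.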
Nominal Features
- Loc
- NOUN: 天, 下, 上, 地, 中, 先, 東, 位, 州, 西
- PROPN: 秦, 齊, 魏, 楚, 趙, 韓, 燕, 周, 宋, 晉
- Tem
- NOUN: 年, 今, 日, 後, 時, 月, 古, 初, 世, 夜
Degree and Polarity
- Cmp
- ADV: 滋, 良, 動, 愈, 益, 差, 浸, 質
- Equ
- ADP: 如
- ADV-Conv: 如, 若, 猶, 奈, 區, 恰
- VERB: 如, 若, 猶, 奈, 柰, 云, 區, 如來, 亞, 如耳
- VERB-Part: 若, 猶, 亞, 區, 如, 柰
- Pos
- ADV: 大, 然, 獨, 凡, 甚, 善, 多, 難, 誠, 久
- ADV-Conv: 大, 然, 獨, 凡, 甚, 善, 多, 難, 誠, 久
- NOUN: 良
- VERB: 大, 太, 重, 善, 明, 同, 多, 可, 然, 和
- VERB-Part: 大, 太, 寡, 皇, 小, 明, 庶, 賢, 長, 強
- Sup
- ADV: 實, 最, 頗, 愼, 已, 酷, 了, 報, 寔, 慎
- Neg
- ADV: 不, 非, 未, 弗, 莫, 無, 毋, 勿, 匪, 微
- ADV-Conv: 無, 微, 罔, 靡, 末
- VERB: 無, 微, 靡, 末, 罔
- VERB-Part: 無, 微, 末, 靡
Verbal Features
- Perf
- ADV: 已, 既, 旣, 訖
- Des
- AUX: 欲, 敢, 肯
- Nec
- AUX: 宜, 應, 須, 儀
- Pot
- AUX: 可, 能, 足, 得, 堪
- Fut
- ADV: 將, 且, 預, 倡, 更, 鄉
- Past
- ADV: 嘗, 曾, 曩, 向, 鄉
- Pres
- ADV: 方, 甫, 屬
- Pass
- AUX: 被, 見
Pronouns, Determiners, Quantifiers
- Dem
- PRON: 是, 此, 彼, 斯, 某, 他, 夫, 焉, 惟, 茲
- Int
- PRON: 何, 孰, 誰, 奚, 曷, 害, 瑕, 甚
- Prs
- PRON: 之, 其, 吾, 自, 我, 或, 子, 諸, 己, 予
- Ord
- NUM: 丁, 己, 庚申, 戊, 甲, 甲子, 己卯, 戊寅, 辛, 丁丑
- Yes
- PRON: 自, 己
- 1
- PRON: 吾, 我, 予, 余, 朕, 僕, 言
- 2
- PRON: 子, 爾, 汝, 女, 而, 若
- 3
- PRON: 之, 其, 厥
Other Features
- AdvType
- Cau
- ADV: 何, 寧, 奚, 胡, 盍, 曷, 聊
- Deg
- ADV: 實, 最, 聊, 頗, 滋, 良, 粗, 愼, 已, 酷
- Tim
- ADV: 則, 乃, 遂, 將, 已, 嘗, 既, 常, 即, 尋
- NameType
- Geo
- PROPN: 長安, 邯鄲, 洛陽, 宜陽, 江, 漢, 晉陽, 郢, 沛, 山東
- Giv
- PROPN: 儀, 舜, 須菩提, 堯, 秦, 禹, 湯, 衍, 光, 茂
- Nat
- PROPN: 秦, 齊, 魏, 楚, 趙, 韓, 燕, 周, 宋, 晉
- Prs
- PROPN: 孔子, 孟子, 文, 武, 曾子, 昭, 宣, 惠, 太, 桓
- Sur
- PROPN: 張, 李, 王, 劉, 蘇, 陳, 韓, 趙, 田, 公孫
- NounType
- Clf
- NOUN: 里, 尺, 方, 寸, 畝, 升, 步, 雙, 丈, 匹
- VerbType
- Cop
- AUX: 為, 爲, 是
Syntax
Auxiliary Verbs and Copula
- This corpus uses 2 lemmas as copulas (cop). Examples: 爲、 是.
- This corpus uses 14 lemmas as auxiliaries (aux). Examples: 可、 能、 欲、 敢、 足、 得、 宜、 應、 被、 肯、 見、 須、 儀、 堪.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
- VERB--NOUN (22273)
- VERB--NOUN-ADP(之) (1417)
- VERB--NOUN-ADP(也) (15)
- VERB--NOUN-ADP(于)-ADP(之) (1)
- VERB--NOUN-ADP(兮) (9)
- VERB--NOUN-ADP(於) (2)
- VERB--NOUN-ADP(爲) (9)
- VERB--NOUN-ADP(由) (4)
- VERB--NOUN-ADP(者) (1)
- VERB--NOUN-ADP(與) (10)
- VERB--NOUN-Loc (3098)
- VERB--NOUN-Loc-ADP(之) (172)
- VERB--NOUN-Loc-ADP(也) (2)
- VERB--NOUN-Loc-ADP(爲) (1)
- VERB--NOUN-Loc-ADP(由) (1)
- VERB--NOUN-Loc-ADP(自) (1)
- VERB--NOUN-Tem (293)
- VERB--NOUN-Tem-ADP(之) (31)
- VERB--PRON (3756)
- VERB--PRON-ADP(之) (46)
- VERB--PRON-ADP(乎) (1)
- VERB--PRON-ADP(也) (1)
- VERB--PRON-ADP(爲) (1)
- VERB--PRON-ADP(與) (4)
- obj
- VERB--NOUN (28793)
- VERB--NOUN-ADP(之) (2)
- VERB--NOUN-ADP(乎) (2)
- VERB--NOUN-ADP(于) (1)
- VERB--NOUN-ADP(所) (3)
- VERB--NOUN-ADP(於) (14)
- VERB--NOUN-ADP(與) (4)
- VERB--NOUN-Loc (6724)
- VERB--NOUN-Loc-ADP(于) (2)
- VERB--NOUN-Loc-ADP(於) (3)
- VERB--NOUN-Tem (969)
- VERB--NOUN-Tem-ADP(爲) (1)
- VERB--PRON (7747)
- VERB--PRON-ADP(與) (1)
- iobj
- VERB--NOUN (390)
- VERB--NOUN-ADP(之) (6)
- VERB--NOUN-Loc (37)
- VERB--NOUN-Loc-ADP(之) (1)
- VERB--NOUN-Loc-ADP(於) (1)
- VERB--NOUN-Tem (3)
- VERB--PRON (644)
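The pattern labels in the lists above (for example `VERB--NOUN-Loc-ADP(之)`) can be approximated with a short script. The sketch below is an assumption about how such labels might be built, not the procedure actually used for this page: it takes a VERB parent and a NOUN or PRON child linked by exactly `nsubj`, `obj`, or `iobj`, appends the child's `Case` value if present, and appends `ADP(lemma)` for each dependent of the child attached as `case`. The sample sentences are hand-made and their annotation is illustrative only; `row`, `block`, `read_sentences`, `case_value`, and `core_patterns` are hypothetical names.

```python
from collections import Counter

CORE = {"nsubj", "obj", "iobj"}


def row(idx, form, upos, head, deprel, feats="_"):
    """Build one tab-separated CoNLL-U token line."""
    return "\t".join(
        [str(idx), form, form, upos, "_", feats, str(head), deprel, "_", "SpaceAfter=No"]
    )


def block(*rows):
    """Join token lines into one sentence block terminated by a blank line."""
    return "\n".join(rows) + "\n\n"


# Hand-made illustrative sentences (NOT taken from the treebank files).
SAMPLE = (
    block(row(1, "王", "NOUN", 2, "nsubj"),
          row(2, "見", "VERB", 0, "root"),
          row(3, "之", "PRON", 2, "obj", "Person=3|PronType=Prs"))
    + block(row(1, "民", "NOUN", 3, "nsubj"),
            row(2, "之", "ADP", 1, "case"),
            row(3, "歸", "VERB", 0, "root"),
            row(4, "仁", "NOUN", 3, "obj"))
    + block(row(1, "天", "NOUN", 2, "nsubj", "Case=Loc"),
            row(2, "生", "VERB", 0, "root"),
            row(3, "民", "NOUN", 2, "obj"))
)


def read_sentences(text):
    """Yield each sentence as a list of 10-column token rows."""
    sent = []
    for line in text.splitlines():
        if not line.strip():
            if sent:
                yield sent
                sent = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            if len(cols) == 10 and cols[0].isdigit():
                sent.append(cols)
    if sent:
        yield sent


def case_value(feats):
    """Return the value of the Case feature, or None if absent."""
    for pair in feats.split("|"):
        if pair.startswith("Case="):
            return pair[len("Case="):]
    return None


def core_patterns(text):
    """Count (relation, pattern) pairs for VERB parents with NOUN/PRON children."""
    counts = Counter()
    for sent in read_sentences(text):
        by_id = {cols[0]: cols for cols in sent}
        for child in sent:
            parent = by_id.get(child[6])     # None for the root (head 0)
            if (child[7] not in CORE or parent is None
                    or parent[3] != "VERB" or child[3] not in ("NOUN", "PRON")):
                continue
            pattern = "VERB--" + child[3]
            case = case_value(child[5])
            if case:
                pattern += "-" + case
            for dep in sent:                 # case markers attached to the child
                if dep[6] == child[0] and dep[7] == "case":
                    pattern += "-ADP(" + dep[2] + ")"
            counts[(child[7], pattern)] += 1
    return counts


counts = core_patterns(SAMPLE)
for (rel, pattern), n in sorted(counts.items()):
    print(rel, pattern, n)
```

Counts produced this way may differ from the figures above if the original tally treated relation subtypes or case markers differently.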
Verbs with Reflexive Core Objects
- This corpus contains 190 lemmas that occur at least once with a reflexive core object (obj or iobj). Examples: 以 自、 殺 自、 稱 自、 剄 自、 曰 自、 知 己、 稱 己、 安 自、 求 己、 以 己、 使 己、 救 己、 正 己、 脩 己、 行 己、 謂 自、 使 自、 信 自、 加 己、 即 己、 娛 自、 歸 己、 焚 自、 議 己、 責 自、 事 己、 令 自、 伐 自、 克 己、 刎 自、 刺 自、 勝 己、 危 自、 反 己、 吠 己、 在 己、 如 己、 存 自、 怨 自、 恣 自、 成 己、 戴 己、 投 自、 暴 自、 有 自、 枉 己、 棄 自、 樂 自、 殺 己、 由 己
Relations Overview
- This corpus uses 10 relation subtypes: compound:redup, csubj:outer, csubj:pass, discourse:sp, flat:foreign, flat:vv, nsubj:outer, nsubj:pass, obl:lmod, obl:tmod
- The following 4 relation types are not used in this corpus at all: goeswith, reparandum, punct, dep
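Both figures in this overview follow from the relation list given earlier on this page. As a small sketch, the code below separates subtyped labels (those containing a colon) from the list and compares the remaining base relations with the 37 universal relations of UD v2; `UD_RELATIONS` and `USED` are hypothetical variable names.

```python
# The 37 universal dependency relations defined by UD v2.
UD_RELATIONS = set("""
acl advcl advmod amod appos aux case cc ccomp clf compound conj cop csubj
dep det discourse dislocated expl fixed flat goeswith iobj list mark nmod
nsubj nummod obj obl orphan parataxis punct reparandum root vocative xcomp
""".split())

# The relation labels this page lists as used in the corpus.
USED = """
acl advcl advmod amod appos aux case cc ccomp clf compound compound:redup
conj cop csubj csubj:outer csubj:pass det discourse discourse:sp dislocated
expl fixed flat flat:foreign flat:vv iobj list mark nmod nsubj nsubj:outer
nsubj:pass nummod obj obl obl:lmod obl:tmod orphan parataxis root vocative
xcomp
""".split()

# Subtypes are written as base:subtype.
subtypes = sorted(rel for rel in USED if ":" in rel)

# Reduce every label to its base relation before comparing with the inventory.
used_base = {rel.split(":")[0] for rel in USED}
unused = sorted(UD_RELATIONS - used_base)

print(len(subtypes), "subtypes:", ", ".join(subtypes))
print(len(unused), "unused:", ", ".join(unused))
```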