Tokenization and Word Segmentation
- This corpus contains 86239 sentences and 433168 tokens.
- This corpus contains 100 sentences and 648 tokens.
- This corpus contains 428559 tokens (99%) that are not followed by a space.
- This corpus contains 648 tokens (100%) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus does not contain words with spaces.
- This corpus does not contain words that contain both letters and punctuation.
- This corpus does not contain words that contain both letters and punctuation.
- This corpus uses 14 UPOS tags out of 17 possible: ADP, ADV, AUX, CCONJ, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, SYM, VERB
- This corpus does not use the following tags: ADJ, DET, X
- This corpus uses 13 UPOS tags out of 17 possible: ADJ, ADP, ADV, AUX, CCONJ, DET, NOUN, NUM, PART, PRON, PROPN, SCONJ, VERB
- This corpus does not use the following tags: INTJ, SYM, PUNCT, X
- This corpus contains 45 word types tagged as particles (PART): 不, 乎, 也, 于, 云, 些, 以, 來, 侯, 兮, 其, 只, 哉, 夫, 如, 子, 寧, 居, 已, 思, 所, 攸, 斯, 歟, 止, 焉, 然, 爾, 甫, 疇, 矣, 者, 而, 耳, 耶, 聿, 與, 若, 蓋, 記, 諸, 載, 逝, 邪, 馨
- This corpus contains 12 word types tagged as particles (PART): 乎, 也, 哉, 夫, 已, 所, 焉, 然, 矣, 者, 而, 邪
- This corpus contains 53 lemmas tagged as pronouns (PRON): 乃, 之, 予, 云, 他, 伊, 何, 佗, 余, 僕, 公, 其, 卬, 厥, 吾, 夫, 奚, 女, 子, 孤, 孰, 它, 安, 害, 己, 彼, 惟, 我, 或, 按, 斯, 是, 時, 曷, 朕, 某, 此, 汝, 焉, 爰, 爾, 瑕, 甚, 維, 而, 自, 若, 茲, 言, 許, 誰, 諸, 輩
- This corpus contains 13 lemmas tagged as pronouns (PRON): _, 之, 其, 奚, 己, 彼, 惡, 我, 斯, 是, 此, 焉, 自
- This corpus contains no lemmas tagged as determiners (DET).
- This corpus contains 2 lemmas tagged as determiners (DET): 之, 數
- Out of the above, 1 lemma occurred sometimes as PRON and sometimes as DET: 之
- This corpus contains 16 lemmas tagged as auxiliaries (AUX): 儀, 可, 堪, 宜, 得, 應, 敢, 是, 欲, 爲, 肯, 能, 被, 見, 足, 須
- This corpus contains 1 lemma tagged as an auxiliary (AUX): 爲
- Out of the above, 14 lemmas occurred sometimes as AUX and sometimes as VERB: 可, 堪, 宜, 得, 應, 敢, 是, 欲, 爲, 肯, 被, 見, 足, 須
- Out of the above, 1 lemma occurred sometimes as AUX and sometimes as VERB: 爲
- This corpus does not use the VerbForm feature.
- Conv
- ADV: 以, 大, 因, 無, 然, 始, 當, 獨, 親, 盡
- Part
- VERB: 大, 太, 主, 寡, 皇, 小, 有, 長, 使, 以
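The counts reported at the start of this section (sentences, tokens, tokens not followed by a space, UPOS tags used and unused) could be recomputed from a treebank file along the following lines. This is a minimal sketch, not the tool that produced the figures above: it assumes the corpus is in CoNLL-U format, that "not followed by a space" corresponds to `SpaceAfter=No` in the MISC column, and that multiword token ranges and empty nodes are skipped. The names `corpus_summary`, `row` and `SAMPLE` are hypothetical, and the sample sentences are illustrative rather than taken from either corpus.

```python
from collections import Counter

# The 17 universal POS tags defined by Universal Dependencies.
UPOS_TAGS = ["ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
             "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X"]


def corpus_summary(conllu_text):
    """Count sentences, tokens, SpaceAfter=No tokens and UPOS tags in use."""
    sentences = tokens = no_space = 0
    upos_counts = Counter()
    in_sentence = False
    for line in conllu_text.splitlines():
        if not line.strip():          # a blank line ends the current sentence
            in_sentence = False
            continue
        if line.startswith("#"):      # sentence-level comment
            continue
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue                  # assumption: skip ranges (1-2) and empty nodes (1.1)
        if not in_sentence:
            sentences += 1
            in_sentence = True
        tokens += 1
        upos_counts[cols[3]] += 1
        if "SpaceAfter=No" in cols[9].split("|"):
            no_space += 1
    return {
        "sentences": sentences,
        "tokens": tokens,
        "no_space": no_space,
        "used": sorted(t for t in UPOS_TAGS if upos_counts[t]),
        "unused": sorted(t for t in UPOS_TAGS if not upos_counts[t]),
    }


def row(idx, form, upos, head, deprel, feats="_", misc="SpaceAfter=No"):
    """Build one tab-separated CoNLL-U token line (lemma set equal to form)."""
    return "\t".join([str(idx), form, form, upos, "_", feats,
                      str(head), deprel, "_", misc])


# Illustrative two-sentence sample.
SAMPLE = "\n".join([
    "# text = 子曰",
    row(1, "子", "NOUN", 2, "nsubj"),
    row(2, "曰", "VERB", 0, "root"),
    "",
    "# text = 學而習之",
    row(1, "學", "VERB", 0, "root"),
    row(2, "而", "CCONJ", 3, "cc"),
    row(3, "習", "VERB", 1, "conj"),
    row(4, "之", "PRON", 3, "obj"),
    "",
])

summary = corpus_summary(SAMPLE)
print(summary)
```

The percentage quoted for tokens not followed by a space would then be `summary["no_space"] / summary["tokens"]`.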
Nominal Features
- Loc
- NOUN: 天, 下, 上, 地, 中, 先, 東, 位, 州, 西
- PROPN: 秦, 齊, 魏, 楚, 趙, 韓, 燕, 周, 宋, 晉
- Loc
- NOUN: 天, 南, 上, 世, 下, 北, 地, 池, 海, 內
- PROPN: 楚
- Tem
- NOUN: 年, 今, 日, 後, 時, 月, 古, 初, 世, 夜
- Tem
- NOUN: 歲, 年, 後, 春, 秋, 今, 月, 古, 日, 時
Degree and Polarity
- Cmp
- ADV: 滋, 良, 動, 愈, 益, 差, 浸, 質
- Equ
- ADP: 如
- ADV-Conv: 如, 若, 猶, 奈, 區, 恰
- VERB: 如, 若, 猶, 奈, 柰, 云, 區, 如來, 亞, 如耳
- VERB-Part: 若, 猶, 亞, 區, 如, 柰
- Pos
- ADV: 大, 然, 獨, 凡, 甚, 善, 多, 難, 誠, 久
- ADV-Conv: 大, 然, 獨, 凡, 甚, 善, 多, 難, 誠, 久
- NOUN: 良
- VERB: 大, 太, 重, 善, 明, 同, 多, 可, 然, 和
- VERB-Part: 大, 太, 寡, 皇, 小, 明, 庶, 賢, 長, 強
- Pos
- NOUN: 冥, 廣, 怪, 正
- VERB: 冥, 厚, 大, 數, 然, 窮, 久, 匹, 太, 夭
- Sup
- ADV: 實, 最, 頗, 愼, 已, 酷, 了, 報, 寔, 慎
- Neg
- ADV: 不, 非, 未, 弗, 莫, 無, 毋, 勿, 匪, 微
- ADV-Conv: 無, 微, 罔, 靡, 末
- VERB: 無, 微, 靡, 末, 罔
- VERB-Part: 無, 微, 末, 靡
- Neg
- ADV: 不, 未, 无, 莫
- VERB: 无, 非
Verbal Features
Pronouns, Determiners, Quantifiers
- Dem
- PRON: 是, 此, 彼, 斯, 某, 他, 夫, 焉, 惟, 茲
- Int
- PRON: 何, 孰, 誰, 奚, 曷, 害, 瑕, 甚
- Prs
- PRON: 之, 其, 吾, 自, 我, 或, 子, 諸, 己, 予
- Ord
- NUM: 丁, 己, 庚申, 戊, 甲, 甲子, 己卯, 戊寅, 辛, 丁丑
- 1
- PRON: 吾, 我, 予, 余, 朕, 僕, 言
Other Features
- AdvType
  - Cau
  - Deg
    - ADV: 實, 最, 聊, 頗, 滋, 良, 粗, 愼, 已, 酷
  - Tim
    - ADV: 則, 乃, 遂, 將, 已, 嘗, 既, 常, 即, 尋
- NameType
  - Geo
    - PROPN: 長安, 邯鄲, 洛陽, 宜陽, 江, 漢, 晉陽, 郢, 沛, 山東
  - Giv
    - PROPN: 儀, 舜, 須菩提, 堯, 秦, 禹, 湯, 衍, 光, 茂
  - Nat
    - PROPN: 秦, 齊, 魏, 楚, 趙, 韓, 燕, 周, 宋, 晉
  - Prs
    - PROPN: 孔子, 孟子, 文, 武, 曾子, 昭, 宣, 惠, 太, 桓
  - Sur
    - PROPN: 張, 李, 王, 劉, 蘇, 陳, 韓, 趙, 田, 公孫
- NounType
  - Clf
    - NOUN: 里, 尺, 方, 寸, 畝, 升, 步, 雙, 丈, 匹
Auxiliary Verbs and Copula
- This corpus uses 2 lemmas as copulas (cop). Examples: 爲、 是.
Auxiliary Verbs and Copula
- This corpus uses 1 lemma as a copula (cop): 爲.
- This corpus uses 14 lemmas as auxiliaries (aux). Examples: 可、 能、 欲、 敢、 足、 得、 宜、 應、 被、 肯、 見、 須、 儀、 堪.
- This corpus does not contain auxiliaries.
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
  - VERB--NOUN (22273)
  - VERB--NOUN-ADP(之) (1417)
  - VERB--NOUN-ADP(也) (15)
  - VERB--NOUN-ADP(于)-ADP(之) (1)
  - VERB--NOUN-ADP(兮) (9)
  - VERB--NOUN-ADP(於) (2)
  - VERB--NOUN-ADP(爲) (9)
  - VERB--NOUN-ADP(由) (4)
  - VERB--NOUN-ADP(者) (1)
  - VERB--NOUN-ADP(與) (10)
  - VERB--NOUN-Loc (3098)
  - VERB--NOUN-Loc-ADP(之) (172)
  - VERB--NOUN-Loc-ADP(也) (2)
  - VERB--NOUN-Loc-ADP(爲) (1)
  - VERB--NOUN-Loc-ADP(由) (1)
  - VERB--NOUN-Loc-ADP(自) (1)
  - VERB--NOUN-Tem (293)
  - VERB--NOUN-Tem-ADP(之) (31)
  - VERB--PRON (3756)
  - VERB--PRON-ADP(之) (46)
  - VERB--PRON-ADP(乎) (1)
  - VERB--PRON-ADP(也) (1)
  - VERB--PRON-ADP(爲) (1)
  - VERB--PRON-ADP(與) (4)
Core Arguments, Oblique Arguments and Adjuncts
Here we consider only relations between verbs (parent) and nouns or pronouns (child).
- nsubj
  - VERB--NOUN (20)
  - VERB--NOUN-ADP(之) (2)
  - VERB--NOUN-ADP(也) (2)
  - VERB--NOUN-Loc (4)
  - VERB--NOUN-Tem (1)
  - VERB--PRON (14)
- obj
  - VERB--NOUN (28793)
  - VERB--NOUN-ADP(之) (2)
  - VERB--NOUN-ADP(乎) (2)
  - VERB--NOUN-ADP(于) (1)
  - VERB--NOUN-ADP(所) (3)
  - VERB--NOUN-ADP(於) (14)
  - VERB--NOUN-ADP(與) (4)
  - VERB--NOUN-Loc (6724)
  - VERB--NOUN-Loc-ADP(于) (2)
  - VERB--NOUN-Loc-ADP(於) (3)
  - VERB--NOUN-Tem (969)
  - VERB--NOUN-Tem-ADP(爲) (1)
  - VERB--PRON (7747)
  - VERB--PRON-ADP(與) (1)
- obj
  - VERB--NOUN (38)
  - VERB--NOUN-Loc (7)
  - VERB--NOUN-Tem (5)
  - VERB--PRON (12)
  - VERB--PRON-ADP(以) (1)
- iobj
  - VERB--NOUN (390)
  - VERB--NOUN-ADP(之) (6)
  - VERB--NOUN-Loc (37)
  - VERB--NOUN-Loc-ADP(之) (1)
  - VERB--NOUN-Loc-ADP(於) (1)
  - VERB--NOUN-Tem (3)
  - VERB--PRON (644)
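Pattern counts such as `VERB--NOUN-Loc-ADP(之)` in the lists above could be derived roughly as sketched below. This is an illustrative reconstruction under stated assumptions, not the generator's actual logic: it assumes the `Loc` and `Tem` suffixes come from the child's `Case` feature, that `ADP(…)` lists the lemmas of ADP-tagged dependents attached to the child with the `case` relation, and that only exact `nsubj`, `obj` and `iobj` labels are counted. The names `read_sentences`, `argument_patterns`, `row` and `SAMPLE` are hypothetical, and the sample data is made up for the example.

```python
from collections import Counter, defaultdict


def read_sentences(conllu_text):
    """Parse CoNLL-U text into a list of sentences (lists of token dicts)."""
    sentences, current = [], []
    for line in conllu_text.splitlines():
        if not line.strip():
            if current:
                sentences.append(current)
                current = []
            continue
        cols = line.split("\t")
        if line.startswith("#") or len(cols) != 10 or not cols[0].isdigit():
            continue  # skip comments, multiword ranges and empty nodes
        feats = dict(f.split("=", 1) for f in cols[5].split("|") if "=" in f)
        current.append({"id": int(cols[0]), "lemma": cols[2], "upos": cols[3],
                        "feats": feats, "head": int(cols[6]), "deprel": cols[7]})
    if current:
        sentences.append(current)
    return sentences


def argument_patterns(sentences, relations=("nsubj", "obj", "iobj")):
    """Count parent--child patterns for verb parents and noun/pronoun children."""
    counts = defaultdict(Counter)
    for sent in sentences:
        by_id = {t["id"]: t for t in sent}
        # Assumption: case markers are ADP tokens attached via the `case` relation.
        markers = defaultdict(list)
        for t in sent:
            if t["deprel"] == "case" and t["upos"] == "ADP":
                markers[t["head"]].append(t["lemma"])
        for t in sent:
            parent = by_id.get(t["head"])
            if t["deprel"] not in relations or parent is None:
                continue
            if parent["upos"] != "VERB" or t["upos"] not in ("NOUN", "PRON"):
                continue
            pattern = "VERB--" + t["upos"]
            if "Case" in t["feats"]:  # assumption: Loc / Tem are Case values
                pattern += "-" + t["feats"]["Case"]
            for lemma in markers[t["id"]]:
                pattern += "-ADP(%s)" % lemma
            counts[t["deprel"]][pattern] += 1
    return counts


def row(idx, form, upos, head, deprel, feats="_"):
    """Build one tab-separated CoNLL-U token line (lemma set equal to form)."""
    return "\t".join([str(idx), form, form, upos, "_", feats,
                      str(head), deprel, "_", "SpaceAfter=No"])


# Made-up sample: two short sentences.
SAMPLE = "\n".join([
    row(1, "民", "NOUN", 2, "nsubj"),
    row(2, "歸", "VERB", 0, "root"),
    row(3, "之", "PRON", 2, "obj"),
    "",
    row(1, "王", "NOUN", 3, "nsubj"),
    row(2, "之", "ADP", 1, "case"),
    row(3, "至", "VERB", 0, "root"),
    row(4, "地", "NOUN", 3, "obj", feats="Case=Loc"),
    "",
])

patterns = argument_patterns(read_sentences(SAMPLE))
for rel, counter in patterns.items():
    for pattern, n in sorted(counter.items()):
        print(rel, pattern, n)
```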
Verbs with Reflexive Core Objects
- This corpus contains 190 lemmas that occur at least once with a reflexive core object (obj or iobj). Examples: 以 自、 殺 自、 稱 自、 剄 自、 曰 自、 知 己、 稱 己、 安 自、 求 己、 以 己、 使 己、 救 己、 正 己、 脩 己、 行 己、 謂 自、 使 自、 信 自、 加 己、 即 己、 娛 自、 歸 己、 焚 自、 議 己、 責 自、 事 己、 令 自、 伐 自、 克 己、 刎 自、 刺 自、 勝 己、 危 自、 反 己、 吠 己、 在 己、 如 己、 存 自、 怨 自、 恣 自、 成 己、 戴 己、 投 自、 暴 自、 有 自、 枉 己、 棄 自、 樂 自、 殺 己、 由 己
Verbs with Reflexive Core Objects
- This corpus contains 2 lemmas that occur at least once with a reflexive core object (obj or iobj). Examples: 無 己、 視 自
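The reflexive-object counts above pair a verb lemma with the reflexive lemma it governs (for example 克 己). A minimal sketch of how such pairs might be collected follows. It assumes that reflexives are identified by the feature `Reflex=Yes` on the child and that only exact `obj` and `iobj` labels count as core objects; the names `reflexive_object_pairs`, `row` and `SAMPLE` are hypothetical, and the sample is illustrative.

```python
def reflexive_object_pairs(conllu_text):
    """Return (verb lemma, reflexive lemma) pairs for obj/iobj children."""
    pairs = set()
    sentence = []

    def flush():
        by_id = {tok["id"]: tok for tok in sentence}
        for tok in sentence:
            parent = by_id.get(tok["head"])
            if (tok["deprel"] in ("obj", "iobj") and tok["reflexive"]
                    and parent is not None and parent["upos"] == "VERB"):
                pairs.add((parent["lemma"], tok["lemma"]))
        sentence.clear()

    for line in conllu_text.splitlines():
        if not line.strip():
            flush()
            continue
        cols = line.split("\t")
        if line.startswith("#") or len(cols) != 10 or not cols[0].isdigit():
            continue  # skip comments, multiword ranges and empty nodes
        sentence.append({
            "id": int(cols[0]), "lemma": cols[2], "upos": cols[3],
            # Assumption: reflexives carry Reflex=Yes in the FEATS column.
            "reflexive": "Reflex=Yes" in cols[5].split("|"),
            "head": int(cols[6]), "deprel": cols[7],
        })
    flush()  # handle a final sentence without a trailing blank line
    return pairs


def row(idx, form, upos, head, deprel, feats="_"):
    """Build one tab-separated CoNLL-U token line (lemma set equal to form)."""
    return "\t".join([str(idx), form, form, upos, "_", feats,
                      str(head), deprel, "_", "SpaceAfter=No"])


# Illustrative sample: two reflexive objects and one ordinary pronoun object.
SAMPLE = "\n".join([
    row(1, "克", "VERB", 0, "root"),
    row(2, "己", "PRON", 1, "obj", feats="Reflex=Yes"),
    "",
    row(1, "自", "PRON", 2, "obj", feats="Reflex=Yes"),
    row(2, "殺", "VERB", 0, "root"),
    "",
    row(1, "知", "VERB", 0, "root"),
    row(2, "之", "PRON", 1, "obj"),
    "",
])

pairs = reflexive_object_pairs(SAMPLE)
verb_lemmas = {verb for verb, _ in pairs}
print(len(verb_lemmas), "lemmas:", "、 ".join(v + " " + r for v, r in sorted(pairs)))
```

The number of distinct verb lemmas, `len(verb_lemmas)`, corresponds to the lemma count given in the summary line.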
Relations Overview
- This corpus uses 10 relation subtypes: compound:redup, csubj:outer, csubj:pass, discourse:sp, flat:foreign, flat:vv, nsubj:outer, nsubj:pass, obl:lmod, obl:tmod
- The following 4 relation types are not used in this corpus at all: goeswith, reparandum, punct, dep
Relations Overview
- This corpus uses 4 relation subtypes: discourse:sp, flat:vv, obl:lmod, obl:tmod
- The following 11 relation types are not used in this corpus at all: xcomp, vocative, expl, aux, appos, list, orphan, goeswith, reparandum, punct, dep
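The two lists in each overview (relation subtypes in use, universal relation types never used) can be derived from the DEPREL column alone. The sketch below assumes that a subtype is any label containing a colon and that using a subtype such as `obl:tmod` also counts as using its base relation `obl`; whether the original report counts it that way is not stated here. `UNIVERSAL_RELATIONS` lists the 37 universal relations of Universal Dependencies v2; `relation_overview` and the sample labels are hypothetical.

```python
# The 37 universal dependency relations of Universal Dependencies v2.
UNIVERSAL_RELATIONS = [
    "acl", "advcl", "advmod", "amod", "appos", "aux", "case", "cc", "ccomp",
    "clf", "compound", "conj", "cop", "csubj", "dep", "det", "discourse",
    "dislocated", "expl", "fixed", "flat", "goeswith", "iobj", "list", "mark",
    "nmod", "nsubj", "nummod", "obj", "obl", "orphan", "parataxis", "punct",
    "reparandum", "root", "vocative", "xcomp",
]


def relation_overview(deprels):
    """Return (sorted subtypes in use, universal relations never used)."""
    labels = set(deprels)
    subtypes = sorted(label for label in labels if ":" in label)
    # Assumption: a subtype counts as a use of its base relation.
    used_base = {label.split(":", 1)[0] for label in labels}
    unused = [rel for rel in UNIVERSAL_RELATIONS if rel not in used_base]
    return subtypes, unused


# Made-up sequence of DEPREL values, as read from column 8 of a CoNLL-U file.
sample_deprels = ["nsubj", "root", "obj", "obl:tmod", "obl:lmod",
                  "flat:vv", "discourse:sp", "nsubj"]

subtypes, unused = relation_overview(sample_deprels)
print("This corpus uses %d relation subtypes: %s" % (len(subtypes), ", ".join(subtypes)))
print("Unused relation types: %s" % ", ".join(unused))
```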