Volume 13 Number 4
Reading the Quan Tang shi: Literary History, Topic Modeling, Divergence Measures
Abstract
The present paper addresses the problem of literary history as a problem of data comprehensiveness and selection, seeking not to resolve the impossibility of literary historical narrative, but to reframe it through a computational perspective. Our focus is on the Quan Tang shi 全唐詩 (Complete Tang poetry), the massive comprehensive anthology of Tang poetry that was produced at the height of the Qing dynasty (1644–1912). The sheer quantity of Tang poetry preserved in the Quan Tang shi (over 50,000 poems and poem fragments) exceeds the human-scale perspectives of close reading. To make sense of the corpus as a whole, we will show how two related forms of distant reading — topic modeling and divergence measures — allow us to reframe and rethink these literary historical questions and provide a new perspective on what it means to read Tang poetry.
I want us to see how impossible it is not to move between these poles when trying to construct literary arguments that operate at a certain level of scale (although when this shift occurs remains unclear). In particular, I want us to see the necessary integration of qualitative and quantitative reasoning, which, as I will try to show, has a fundamentally circular and therefore hermeneutic nature. [Piper 2015, 69]
I. What Do We Do When We Write Literary History?
Whatever else they have also hoped to accomplish, all literary historians have sought to represent the past and to explain it. To represent it is to tell how it was and to explain it is to state why — why literary works acquired the character they have and why the literary series evolved as it did. … Of course representation and explanation can never be complete, as literary historians and theorists have always recognized. Even if a historian knew all the relevant facts and answers, he could not crowd them into a book. The only complete literary history would be the past itself, but this would not be a history, because it would not be interpretive and explanatory.
Perkins goes on to ask “how much incompleteness is acceptable” and to note how the fact that “a literary history must be written from a point of view” (the limited perspective of the critic) inflects the explanatory argument of the narrative [Perkins 1992, 13]. These are problems that underlie all narrative historiography, since the nature of narrative historical argument involves simplification by means of representationality. That is, certain data are selected to serve as representative of the whole, and such choices are always informed by value judgments and critical interventions.
There are simply no data in literary history which are completely neutral “facts”. Value judgements are implied in the very choice of materials: in the simple preliminary distinction between books and literature, in the mere allocation of space to this or that author. Even if we grant that there are facts comparatively neutral, facts such as dates, titles, biographical events, we merely grant the possibility of compiling the annals of literature. But any question a little more advanced, even a question of textual criticism or of sources and influences, requires constant acts of judgement. [Wellek and Warren 1956, 40]
Neither Wellek and Warren nor Perkins is arguing against historiographic simplification, since literary histories, much like maps, provide us with signposts by which to navigate the practically boundless territories of literary data. Indeed, one might go even further: we rely on literary history to tell us what to read and what counts as literature; in short, to demarcate the very boundaries of the vast and otherwise amorphous realm of textuality. The map is not and should not be the territory if the map is to be of any use. That said, there is still the problem of how one makes the initial selections. To allocate space in the historical narrative to a particular author means that other authors must be neglected or reduced to passing mention. At some point the linearity of the narrative will require that data be excluded or discussed in aggregate, and this refracts the data in ways that can be obscured through the graceful telling of literary history. [2]
II. What Is the QTS?
Chapters | Contents |
1–9 | Poems by emperors, empresses and imperial members |
10–16 | Poems relating to rituals and sacrifices |
17–29 | Music Bureau poetry |
30–731 | Poems by individual poets of the Tang dynasty |
732–733 | Poems by dynastic villains and rebels |
734–766 | Poems by individual poets of the Five Dynasties Period |
767–784 | Poems by poets with partial biographical information |
785–787 | Poems without authorial attribution |
788–794 | Linked verse poems (lianju 聯句) |
795 | Incomplete poems and lines by poets not listed above |
796 | Incomplete poems and lines without authorial attribution |
797–805 | Poems by women |
806–851 | Poems by Buddhist figures |
852–859 | Poems by Daoist figures |
860–862 | Poems by (male) immortals (xian 仙) |
863 | Poems by female immortals (nüxian 女仙) |
864 | Poems by spirits (shen 神) |
865–866 | Poems by ghosts (gui 鬼) |
867 | Poems by anomalies (guai 怪) |
868 | Poems sent in dreams (meng 夢) |
869–872 | Jest and insult poems (xienüe 諧謔) |
873 | Poems inscribed on walls (tiyu 提語) and judgments (pan 判) |
874 | Songs (ge 歌) sung by groups or local communities |
875 | Prophetic verse (chenji 讖記) |
876 | Sayings in verse form (yu 語) |
877 | Proverbs and enigmatic verse (yanmi 諺謎) |
878 | Prescient popular ditties (yao 謠) |
879 | Drinking songs (jiuling 酒令) |
880 | Divination songs (zhanci 占辭) |
881 | The Mengqiu 蒙求 primer by Li Han 李瀚 |
882–888 | Poems left out of previous sections (buyi 補遺) |
889–900 | Song-lyrics (ci 詞) |
III. On Topic Modeling as Literary Methodology
It is one thing to notice patterns of vocabulary, variations in line length, or images of darkness and light; it is another thing to employ a machine that can unerringly discover every instance of such features across a massive corpus of literary texts and then present those features in a visual format entirely foreign to the original organization in which these features appear. Or rather, it is the same thing at a different scale and with expanded powers of observation. It is in such results that the critic seeks not facts, but patterns. And from pattern the critic may move to the grander rhetorical formations that constitute critical reading. [Ramsay 2011, 17]
It is important to note that Ramsay also argues (elsewhere in his work) that computer-aided reading, which he calls “algorithmic criticism”, is not a radical break from traditional modes of literary reading, since the human critic’s acts of interpretation can also be said to depend on rule-based processes (which is to say, upon algorithms), even though the critic’s manipulations and deformations of text may not be as explicit or programmatic as those of the computer. And to return to Hayles’ point, the major difference that can be realized through the computer’s algorithmic reading is simply that of the object of study, which is no longer confined to the single text or a succession of single texts, but a macroscalar aggregation of texts.
…the goal of topic modeling is to automatically discover the topics from a collection of documents. The documents themselves are observed, while the topic structure — the topics, per-document topic distributions, and the per-document per-word topic assignments — is hidden structure. The central computational problem for topic modeling is to use the observed documents to infer the hidden topic structure. This can be thought of as “reversing” the generative process — what is the hidden structure that likely generated the observed collection? [Blei 2012, 79]
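The generative process that Blei describes can be made concrete with a small simulation. What follows is a toy sketch, not the model used for the QTS: the two topics borrow a few characters from topic lists printed in this article, but the probabilities, the Dirichlet parameters, and the function names are all invented for illustration. Each simulated "poem" first draws a hidden mixture of topics, then draws a hidden topic for every character slot, and only then emits the observed character.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topics, alpha, length, rng):
    """Generate one toy document following the generative story of a topic model.

    topics: list of dicts mapping character -> probability (one dict per topic)
    alpha:  Dirichlet parameters, one per topic
    Returns (characters, topic_assignments, theta).
    """
    theta = sample_dirichlet(alpha, rng)  # hidden per-document topic mixture
    chars, assignments = [], []
    for _ in range(length):
        # hidden topic assignment for this character slot
        z = rng.choices(range(len(topics)), weights=theta)[0]
        vocab = list(topics[z].keys())
        probs = list(topics[z].values())
        # observed character, drawn from the chosen topic
        chars.append(rng.choices(vocab, weights=probs)[0])
        assignments.append(z)
    return chars, assignments, theta

# Illustrative topics only: made-up probabilities over a few characters.
TOY_TOPICS = [
    {"馬": 0.5, "車": 0.2, "騎": 0.2, "塵": 0.1},  # cf. "horses, traveling"
    {"酒": 0.5, "醉": 0.2, "杯": 0.2, "飲": 0.1},  # cf. "drinking ale"
]

rng = random.Random(0)
chars, assignments, theta = generate_document(TOY_TOPICS, [0.5, 0.5], 10, rng)
print("".join(chars), assignments, [round(t, 3) for t in theta])
```

Inference runs this story backwards: given only the characters, it estimates the mixture `theta`, the per-character assignments, and the topics themselves.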
IV. The QTS and Topic Modeling
- 馬 車 騎 行 塵 出 長 鞭 門 走 白 嘶 金 蹄 道 鞍 青 黃 馳 驅
- 酒 醉 杯 飲 一 客 酌 醒 勸 滿 尊 傾 對 笑 壺 歡 酣 送 倒 且
- 不 人 能 知 誰 生 自 有 無 來 豈 得 在 與 作 但 令 可 古 解
- 相 年 家 見 弟 同 兄 來 逢 長 少 還 自 許 多 喜 說 親 作 時
- 江 海 迢 潮 帆 吳 越 楚 孤 滄 客 去 洲 波 歸 遠 湖 浪 遞 南
- 寂 遙 寥 空 落 寞 朝 風 獨 見 中 思 林 月 清 在 寒 逍 想 招
- 夜 月 明 燈 曉 殘 漏 星 聲 更 照 暗 寒 宿 燭 火 半 光 露 鐘
- 我 不 為 此 言 爾 者 何 有 亦 如 生 人 與 得 苦 無 所 願 令
- 水 東 流 西 山 日 去 雲 空 落 歸 中 路 陵 白 復 見 長 暮 盡
- 金 玉 珠 銀 紫 錦 重 寶 黃 光 刀 環 裁 佩 衣 雙 盤 珊 龍 垂
- 蕭 風 雨 秋 暮 條 寒 雲 吹 起 獨 颯 晚 上 日 樹 葉 空 向 索
- 馬 車 騎 行 塵 出 長 鞭 門 走 白 嘶 金 蹄 道 鞍 青 黃 馳 驅 horses, traveling
- 酒 醉 杯 飲 一 客 酌 醒 勸 滿 尊 傾 對 笑 壺 歡 酣 送 倒 且 drinking ale
- 不 人 能 知 誰 生 自 有 無 來 豈 得 在 與 作 但 令 可 古 解 particles, common verbs
- 相 年 家 見 弟 同 兄 來 逢 長 少 還 自 許 多 喜 說 親 作 時 families, home
- 江 海 迢 潮 帆 吳 越 楚 孤 滄 客 去 洲 波 歸 遠 湖 浪 遞 南 southern waterscapes
- 寂 遙 寥 空 落 寞 朝 風 獨 見 中 思 林 月 清 在 寒 逍 想 招 remoteness and longing
- 夜 月 明 燈 曉 殘 漏 星 聲 更 照 暗 寒 宿 燭 火 半 光 露 鐘 night: moon and lamp
- 我 不 為 此 言 爾 者 何 有 亦 如 生 人 與 得 苦 無 所 願 令 particles, pronouns
- 水 東 流 西 山 日 去 雲 空 落 歸 中 路 陵 白 復 見 長 暮 盡 landscape: traveler
- 金 玉 珠 銀 紫 錦 重 寶 黃 光 刀 環 裁 佩 衣 雙 盤 珊 龍 垂 precious materials
- 蕭 風 雨 秋 暮 條 寒 雲 吹 起 獨 颯 晚 上 日 樹 葉 空 向 索 autumn evening scene
Word | Word Weight | Count |
馬 | 0.16849291492169124 | 2937 |
車 | 0.049968447019677585 | 871 |
騎 | 0.03568355229189375 | 622 |
行 | 0.031897194653204064 | 556 |
塵 | 0.03172508748780908 | 553 |
出 | 0.030577706385175835 | 533 |
長 | 0.027594515518329414 | 481 |
鞭 | 0.02702082496701279 | 471 |
門 | 0.025299753313062934 | 441 |
走 | 0.020595490792266653 | 359 |
白 | 0.018358097642131834 | 320 |
嘶 | 0.016694395043313638 | 291 |
金 | 0.016120704491997016 | 281 |
蹄 | 0.014973323389363778 | 261 |
道 | 0.014629109058573805 | 255 |
鞍 | 0.01359646606620389 | 237 |
青 | 0.013424358900808904 | 234 |
黃 | 0.012047501577649016 | 210 |
馳 | 0.011990132522517355 | 209 |
驅 | 0.01187539441225403 | 207 |
Phrase | Phrase Weight | Count |
車 馬 | 0.0615748963883955 | 104 |
馬 蹄 | 0.04262877442273535 | 72 |
走 馬 | 0.04085257548845471 | 69 |
馬 嘶 | 0.04026050917702783 | 68 |
驄 馬 | 0.03197158081705151 | 54 |
騎 馬 | 0.028419182948490232 | 48 |
鞍 馬 | 0.02427471876850207 | 41 |
駟 馬 | 0.020130254588513915 | 34 |
匹 馬 | 0.018354055654233273 | 31 |
駿 馬 | 0.017761989342806393 | 30 |
駑 駘 | 0.017169923031379514 | 29 |
躞 蹀 | 0.013025458851391355 | 22 |
匆 匆 | 0.012433392539964476 | 21 |
驅 馬 | 0.011841326228537596 | 20 |
騏 驥 | 0.010657193605683837 | 18 |
羸 馬 | 0.010065127294256957 | 17 |
車 騎 | 0.009473060982830076 | 16 |
馬 鞭 | 0.0076968620485494375 | 13 |
騏 驎 | 0.0076968620485494375 | 13 |
白 馬 | 0.0076968620485494375 | 13 |
嘶 馬 | 0.007104795737122558 | 12 |
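The relation between the Weight and Count columns in the two tables above can be checked with a little arithmetic. The sketch below assumes, as an inference from the printed figures and not as a documented fact, that each weight is simply the count divided by the total number of tokens (or phrases) assigned to the topic. The rows are copied from the tables; the helper name `implied_totals` is hypothetical.

```python
# (term, weight, count) rows copied from the word and phrase tables above.
word_rows = [
    ("馬", 0.16849291492169124, 2937),
    ("車", 0.049968447019677585, 871),
    ("騎", 0.03568355229189375, 622),
    ("驅", 0.01187539441225403, 207),
]
phrase_rows = [
    ("車 馬", 0.0615748963883955, 104),
    ("馬 蹄", 0.04262877442273535, 72),
    ("嘶 馬", 0.007104795737122558, 12),
]

def implied_totals(rows):
    """Return the set of totals implied by count / weight, rounded to integers.

    If weight = count / total for every row, the set has a single element.
    """
    return {round(count / weight) for _, weight, count in rows}

word_totals = implied_totals(word_rows)
phrase_totals = implied_totals(phrase_rows)
print(word_totals, phrase_totals)
```

Under this reading, every sampled row points to the same denominator: 17,431 for the word table and 1,689 for the phrase table. The weights can therefore be read as relative frequencies within the topic, so that 馬 alone accounts for roughly a sixth of the topic's tokens.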
TopicID | Percentage | Topic (first 20 terms in rank order) | Topic Label |
30 | 10.952% | 天 大 皇 四 帝 海 萬 太 功 夷 方 王 三 命 元 聖 乾 業 宗 武 | empire and sovereign power |
16 | 5.952% | 不 如 食 生 足 骨 口 死 飢 肉 眼 齒 腹 為 土 耳 力 飲 可 破 | mortality, eating and drinking |
139 | 5.238% | 塵 人 春 新 身 親 勤 鄰 殷 頻 津 為 巾 輪 濱 四 辰 辛 貧 巡 | tsyen 真 rhyme category |
7 | 3.81% | 我 不 為 此 言 爾 者 何 有 亦 如 生 人 與 得 苦 無 所 願 令 | pronouns and particles |
135 | 3.095% | 將 軍 兵 戰 旗 馬 旌 功 劍 弓 戎 營 騎 鼓 戈 羽 角 虜 射 箭 | military, armies |
110 | 3.095% | 未 自 慚 知 豈 雖 薄 終 非 已 顧 愧 負 辭 寧 猶 難 甘 心 敢 | shame, regret |
32 | 3.095% | 十 三 年 二 五 四 六 七 一 八 百 千 九 月 今 歲 前 載 老 第 | numbers, calendrical time |
15 | 3.095% | 王 漢 秦 國 宮 吳 陵 帝 楚 武 臺 陽 長 子 安 亡 苑 作 蘇 梁 | Han and pre-Han kingdoms |
49 | 2.381% | 南 北 東 西 山 風 望 國 城 日 向 斗 河 闕 起 海 長 直 復 從 | territorial space, cardinal directions |
42 | 2.381% | 庭 中 滿 洞 月 山 風 樹 裏 長 高 入 空 雲 水 煙 起 日 上 陽 | courtyard / landscape scene? |
125 | 1.667% | 門 深 開 日 閉 高 戶 滿 入 客 出 堂 朱 外 巷 院 掩 牆 庭 靜 | gates, halls, and built spaces |
124 | 1.667% | 所 言 子 豈 道 志 異 為 良 非 懷 自 徒 昔 貞 乃 可 賢 常 義 | particles / will and righteousness? |
122 | 1.667% | 魚 水 釣 下 垂 鳥 池 上 時 小 鱗 竿 驚 得 網 有 無 欲 避 坐 | fisherman, fishing |
115 | 1.667% | 劍 士 氣 生 平 感 侯 壯 橫 長 縱 交 雄 將 子 英 安 報 燕 節 | heroes, men of valor |
109 | 1.667% | 江 湘 南 楚 水 客 舟 遠 雲 孤 秋 山 瀟 雁 陽 浦 湖 去 渚 夢 | southern (Chu) riverscapes |
98 | 1.667% | 神 禮 德 惟 肅 樂 靈 降 薦 既 誠 載 以 明 昭 永 斯 福 陳 備 | ritual and ceremony |
92 | 1.667% | 龍 雷 騰 神 虎 如 驚 蛟 若 蛇 電 橫 當 鼓 倒 大 氣 怪 勢 天 | dragons, tigers, images of divine power |
87 | 1.667% | 然 化 物 無 浩 天 心 異 造 有 形 方 道 可 忽 中 變 至 窮 言 | divine creation and transformation |
83 | 1.667% | 古 草 荒 人 空 在 野 木 平 跡 遺 地 舊 原 猶 城 餘 有 無 蕪 | desolate wilderness |
66 | 1.667% | 遊 留 休 頭 秋 侯 州 收 憂 求 愁 不 丘 浮 酬 裘 子 諸 牛 羞 | ghou 尤 rhyme category |
36 | 1.667% | 天 下 太 地 子 中 一 上 海 出 道 入 平 黃 九 四 守 白 大 成 | empire and territory |
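The percentages in the table above fall on a regular grid (1.667, 2.381, 3.095, and so on), which is what one would expect if each value were a smoothed proportion of the form (n + α) / (N + Kα), where n is the number of characters in the poem assigned to a topic. The sketch below is an inference from the printed numbers, not a procedure documented in this section: the values K = 150 topics, α = 1/3 per topic, and a poem length of N = 90 characters are assumptions chosen because they reproduce the column, and the function name `smoothed_share` is hypothetical.

```python
def smoothed_share(n, doc_length, num_topics=150, alpha=1 / 3):
    """Percentage of a document attributed to a topic that holds n of its tokens,
    smoothed by a symmetric Dirichlet prior (all default values are assumptions)."""
    return 100 * (n + alpha) / (doc_length + num_topics * alpha)

# Hypothetical per-topic character counts for a 90-character poem.
counts = [15, 8, 7, 5, 4, 3, 2]
shares = [round(smoothed_share(n, 90), 3) for n in counts]
print(shares)  # [10.952, 5.952, 5.238, 3.81, 3.095, 2.381, 1.667]
```

If this reading holds, the step between adjacent values (about 0.714 percentage points) corresponds to a single character, so the long run of topics at 1.667% would each rest on only two characters of the poem. That is a reason to treat the lower rows of the table with caution.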
V. Using Topics to Think through Literary History
Document ID | Poem Title and Author | English Translation | Similarity Score |
1969 (30_1) | 《詠漢高祖》王珪 | “On Han Gaozu” by Wang Gui | 1.0 |
573 (13_68) | 《享太廟樂章。象德舞》 段文昌 | “Hymns for the Offering to the Ancestral Temple: Dance for Manifesting Virtue” by Duan Wenchang | 0.8822478125653975 |
38884 (767_29) | 《苻堅投棰》 孫元晏 | “Fu Jian Casts His Whip” by Sun Yuan’an | 0.8787439416072281 |
2751 (52_30) | 《過函谷關》 宋之問 | “Visiting Hangu Pass” by Song Zhiwen | 0.877820744700559 |
37314 (729_66) | 《二廢帝》 周曇 | “On the Two Deposed Emperors” by Zhou Tan | 0.8722429641594189 |
2009 (31_26) | 《梁郊祀樂章。慶休》 ? | “Hymns for the Liang Suburban Sacrifice: Jubilation” by unknown author | 0.872006203906182 |
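This section does not state which function produced the similarity scores in the table above. As one common way of comparing two per-document topic distributions, the following sketch derives a similarity from the Jensen-Shannon divergence, so that a document compared with itself scores 1.0, as in the first row. The choice of measure, the function names, and the three toy distributions are all assumptions made for illustration.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, and bounded by 1 when measured in bits."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def js_similarity(p, q):
    """Turn the divergence into a similarity score in [0, 1]."""
    return 1.0 - js_divergence(p, q)

# Hypothetical topic proportions for three documents (made-up numbers).
docs = {
    "query": [0.70, 0.20, 0.10],
    "near": [0.60, 0.30, 0.10],
    "far": [0.10, 0.10, 0.80],
}
ranked = sorted(
    docs,
    key=lambda name: js_similarity(docs["query"], docs[name]),
    reverse=True,
)
print(ranked)  # ['query', 'near', 'far']
```

Ranking every poem in the corpus against a chosen poem in this way would yield a list of nearest neighbours comparable in shape to the table, though the actual scores depend on the measure used.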
TopicID | Percentage | Topic (first 20 terms in rank order) | Topic Label |
30 | 10.163% | 天 大 皇 四 帝 海 萬 太 功 夷 方 王 三 命 元 聖 乾 業 宗 武 | empire and sovereign power |
87 | 5.285% | 然 化 物 無 浩 天 心 異 造 有 形 方 道 可 忽 中 變 至 窮 言 | divine creation and transformation |
98 | 4.065% | 神 禮 德 惟 肅 樂 靈 降 薦 既 誠 載 以 明 昭 永 斯 福 陳 備 | ritual and ceremony |
32 | 2.846% | 十 三 年 二 五 四 六 七 一 八 百 千 九 月 今 歲 前 載 老 第 | numbers, calendrical time |
149 | 1.626% | 忘 隱 清 心 閒 興 林 靜 勝 自 外 幽 景 愛 機 塵 坐 情 對 境 | reclusion |
131 | 1.626% | 朝 恩 主 重 詔 拜 承 明 從 臣 榮 紫 賜 門 闕 寵 官 命 列 禁 | imperial court |
126 | 1.626% | 多 過 更 深 遠 近 宜 入 偏 地 好 和 處 隨 經 移 數 重 客 高 | visiting someone |
113 | 1.626% | 惡 死 禍 者 敢 罪 危 狼 殺 受 非 防 虎 力 反 利 失 亂 命 傷 | disaster |
111 | 1.626% | 應 出 隨 朝 從 還 臨 先 行 分 迎 暫 逐 近 節 遠 待 將 旌 方 | welcoming / serving? |
109 | 1.626% | 江 湘 南 楚 水 客 舟 遠 雲 孤 秋 山 瀟 雁 陽 浦 湖 去 渚 夢 | southern (Chu) riverscapes |
108 | 1.626% | 復 念 歲 已 懷 所 窮 忽 歎 何 良 憂 未 當 豈 夕 役 終 思 歡 | remembering, mourning |
83 | 1.626% | 古 草 荒 人 空 在 野 木 平 跡 遺 地 舊 原 猶 城 餘 有 無 蕪 | desolate wilderness |
72 | 1.626% | 山 石 松 巖 溪 泉 林 幽 雲 鳥 谷 蘿 深 澗 隱 青 徑 野 下 竹 | mountain scene |
40 | 1.626% | 天 人 地 上 不 生 下 此 長 一 道 得 日 為 間 何 高 意 擾 白 | Heaven and Earth |
36 | 1.626% | 天 下 太 地 子 中 一 上 海 出 道 入 平 黃 九 四 守 白 大 成 | empire and territory |
15 | 1.626% | 王 漢 秦 國 宮 吳 陵 帝 楚 武 臺 陽 長 子 安 亡 苑 作 蘇 梁 | Han and pre-Han kingdoms |