U of T news

Computer science could shed new light on dark poem

Literary stylistic analysis gets boost from technology

Course instructor Adam Hammond (left) enlisted the help of computer science teaching assistant Julian Brooke to make computerized stylistic analysis possible. (Photo by Diana Tyszko)

Unique research that brings technology to bear on T.S. Eliot’s famous poem, The Waste Land, promises to reveal insights that human readers may not find on their own, and undergraduates are part of the project.

The students are in a new course on The Digital Text, designed and taught by Department of English course instructor Adam Hammond. Created with the support of the Faculty of Arts and Science Curriculum Renewal Initiatives Fund, the course enables students to explore many aspects of digitization and literature, including computer-assisted literary stylistic analysis, a relatively new but extremely promising area of scholarship.

Computers bring huge benefits to stylistic analysis because they are better than humans at certain things, Hammond explains. “Humans are incredible at reading. They can look at marks on a page and almost instantly tell you what these marks mean. With a bit more time, they recognize patterns of sound and symbolism, spot complex connections between texts and unpack themes. Computers are terrible readers by comparison. They have trouble telling a pronoun from a verb and they’re hopeless at insights into complex things like rhyme scheme.” But computers have an edge on humans: they read quickly, are great at counting and don't get bored.

By way of example, Hammond cites the type/token ratio, a measure of vocabulary diversity calculated by dividing the number of unique words in a text by the total number of words. A human might be able to calculate the type/token ratio of one page of prose by hand, counting the total number of words and then the unique words, ignoring repetitions, but anything more than a page would be mind-numbing, time-consuming and probably inaccurate. A computer can give an accurate ratio for an electronic text of any size in seconds.
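
To make the calculation concrete, here is a minimal sketch in Python. It is not the code used in the project. The function name type_token_ratio, the sample sentence and the simple rule for splitting text into words (lowercased runs of letters and apostrophes) are all assumptions made for illustration.

```python
import re


def type_token_ratio(text: str) -> float:
    """Return the number of unique words (types) divided by total words (tokens)."""
    # Assumption: a "word" is a run of letters or apostrophes, compared in lowercase.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0  # avoid dividing by zero on empty text
    return len(set(tokens)) / len(tokens)


# Invented sample text: 11 words in total, 8 of them unique.
sample = "the cat sat on the mat and the dog sat too"
print(type_token_ratio(sample))
```

The sample has 11 words, 8 of them distinct, so the ratio is 8/11. A real analysis would need more careful decisions about hyphens, punctuation and spelling variants.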

Hammond asked Julian Brooke, the computer science TA for The Digital Text whose research is on computational stylistics, to develop an algorithm to identify the distinct voices in The Waste Land. Brooke was keen but needed an electronic version to get started. Hammond created one, manually tagging the entire poem himself to tell the computer the exact part of speech of every individual word, a process he describes as “immensely boring and labour-intensive.”

Equipped with the file, the computer can do things that no human would have the patience to do: calculate the average number of pronouns per line and compare stanzas, identify the parts of the poem with the most varied vocabulary (in quantitative terms, those with the highest type/token ratios), and so on.
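
As a rough illustration of this kind of counting, the Python sketch below computes pronouns per line from part-of-speech tagged text. The data format (each line as a list of word and tag pairs), the tag name "PRON", the function names and the two invented sample lines are all assumptions. The article does not describe the format of Hammond's actual file.

```python
from statistics import mean

# Hypothetical tagged data: each line is a list of (word, part-of-speech) pairs.
tagged_lines = [
    [("she", "PRON"), ("gave", "VERB"), ("him", "PRON"), ("the", "DET"), ("book", "NOUN")],
    [("the", "DET"), ("rain", "NOUN"), ("fell", "VERB"), ("all", "DET"), ("night", "NOUN")],
]


def pronouns_per_line(lines, pronoun_tag="PRON"):
    """Return a list with the number of pronoun-tagged words in each line."""
    return [sum(1 for _, tag in line if tag == pronoun_tag) for line in lines]


def average_pronouns_per_line(lines, pronoun_tag="PRON"):
    """Return the mean pronoun count per line, or 0.0 if there are no lines."""
    counts = pronouns_per_line(lines, pronoun_tag)
    return mean(counts) if counts else 0.0


print(pronouns_per_line(tagged_lines))
print(average_pronouns_per_line(tagged_lines))
```

Running the same functions on the lines of each stanza separately would allow stanzas to be compared.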

“These boring statistics are the starting point for recognizing different vocal patterns in the poem,” said Hammond.

Students helped the research by marking up text to tell the computer things about specific words. Each student in the class tagged one feature of the poem: an echo, a repetition, a quotation, an onomatopoeia, and so on. Brooke now has a file with 200 tags, and is able to include new questions in his research, such as: Does this voice return in a later part of the poem? The analysis is still in its early stages, but Brooke has found that this work has given his research a productive new facet that he had not previously explored.

So what would T.S. Eliot think of all this technological examination of his work?

“We know Eliot today mostly through his poems of the 1920s like The Waste Land. But Eliot was quite desperate to stop writing poetry and to become a dramatist.  In the mid-1930s, he finally made the jump, and wrote almost exclusively in dramatic forms from then on. Also, the working title of The Waste Land was He Do the Police in Different Voices. So I think he'd be happy that we are treating his poem somewhat like a play, and -- most of all -- happy that we were encouraging other readers to look at The Waste Land as a tangle of voices.”

Visit the Digital Text website to see student annotations and some hypotheses about voice distribution.