Using the Historical Thesaurus of English, a dataset that reaches back nearly 1,000 years to the Old English period, researchers at the University of California, Berkeley, Lehigh University and the University of Toronto have developed an algorithm that models how word meanings evolve.
“If you look at the history of a word, meanings of that word tend to shift or extend over time,” says Yang Xu, an assistant professor in the department of computer science and University College’s cognitive science program. “The question is: Why is this happening? How is it happening? And whether there are computational algorithms we can leverage to make predictions about the historical development of word meanings.”
The researchers published their findings on Feb. 19 in the Proceedings of the National Academy of Sciences (PNAS).
As Xu explains, the word “face” originally referred to a body part. It then extended to senses describing facial expressions, such as “smiley face” or “funny face”. Later, its meaning extended to novel senses such as the surface of an object, as in “face of a table” or “face of a cube”. But it doesn’t end there. “Face” can also be used in more remote contexts, such as “face danger” or “face risks”, resulting in a web, or network, of meanings.
“The [algorithm’s] prediction is that a word should connect closely to related meanings in the space available – similar to finding nearest neighbours in semantic space – resulting in a chain that efficiently links novel meanings to the existing meanings of a word.”
“[What] we didn’t know from the past is how this chaining process can be implemented computationally and tested at a broad scale.”
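The chaining idea can be illustrated with a toy example. What follows is a minimal sketch, not the researchers’ model: it assumes each sense of “face” is a point in a hypothetical two-dimensional semantic space with invented coordinates, and it grows a chain by repeatedly attaching whichever unplaced sense lies nearest to any sense already in the chain (a greedy procedure similar to Prim’s minimum-spanning-tree algorithm). The function name `nearest_neighbour_chain` and the `SENSES` coordinates are made up for illustration.

```python
import math

# Hypothetical 2-D "semantic space": these coordinates are invented for
# illustration and do not come from the study's data.
SENSES = {
    "body part": (0.0, 0.0),
    "facial expression": (1.0, 0.5),
    "surface of an object": (2.5, 0.0),
    "confront (face danger)": (4.0, 1.0),
}


def nearest_neighbour_chain(seed, candidates, space):
    """Grow a chain from `seed`: at each step, add the candidate sense that is
    closest to ANY sense already in the chain, and record which sense it
    attached to. Returns (order in which senses were added, list of links)."""
    chain = [seed]
    links = []
    remaining = set(candidates)
    while remaining:
        best = None  # (distance, existing sense, new sense)
        for new in sorted(remaining):  # sorted for deterministic tie-breaking
            for old in chain:
                d = math.dist(space[new], space[old])  # Euclidean distance
                if best is None or d < best[0]:
                    best = (d, old, new)
        _, old, new = best
        chain.append(new)
        links.append((old, new))
        remaining.remove(new)
    return chain, links


chain, links = nearest_neighbour_chain(
    "body part", [s for s in SENSES if s != "body part"], SENSES
)
print(chain)
print(links)
```

With these invented coordinates, each new sense happens to attach to the one added just before it; different coordinates would produce a different web of links, so the output says nothing about the actual history of “face”.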
Xu says an ongoing study will explore the basis of chaining across a diverse array of languages, to see whether it can explain recurring patterns, such as why many languages use the same word to describe “fire” and “flame”, and to leverage current digital resources to predict word usage over time. The work could also have implications for natural language processing, such as training computers to interpret novel word usages accurately.
“A potential research direction is the machine interpretation of novel word usages such as those in non-literal expressions,” says Xu. “If I say ‘grasp’ I can refer to ‘grasping an object’ versus ‘grasping an idea’. Humans understand this usage fairly quickly, even though it appears novel. Understanding these phenomena would require the development of computational tools that would go beyond the algorithms of chaining.”
U of T’s undergraduate cognitive science program, sponsored by University College, combines cross-disciplinary studies in computer science, linguistics, philosophy, psychology and neuroscience, and challenges students with questions about the human mind and how its workings apply to machines. Xu, whose earlier graduate work looked at machine learning applications in cognitive neuroscience, is the program’s first joint research appointment, strengthening ties between computer science and cognitive science.
He joins core faculty Ana Pérez-Leroux, professor of Spanish and linguistics and the program’s director; John Vervaeke, a lecturer in the department of psychology; and James John, an assistant professor in the department of philosophy. Xu plans to teach a course on data science in the cognitive sciences.
Future research questions for Xu include how children learn language.
“I'm curious about the parallels between historical language change and child language acquisition,” says Xu. “For example, a child might use ‘bus’ to refer to a lot of things that move on the road, so I can imagine some sort of chaining mechanism going on there.”
“Whether that's necessarily the same as chaining observed in historical language change, we don't know. But I think it opens up new questions that we can explore.”