Nicholas Zammit of the University of Toronto Mississauga is aiming to shed light on Canada's economic history around the time of the First World War by drawing on mountains of government data covering everything from labour and steel to beer keg imports.
For years, Zammit had been manually entering trade volume data into spreadsheets, a format that allowed him to conduct economic analysis. The process was so cumbersome that he estimates it would have taken one researcher more than 50 years to process a small segment of his sample data.
“We’ve got the price and quantity of every good traded between Canada and every other country,” said Zammit, an assistant professor, teaching stream, in U of T Mississauga's department of economics. “But it’s a very big dataset with a lot of data points.”
Then Dev'Roux Maharaj came along. The undergraduate student in economics and political science, who worked part-time at Amazon's Mississauga operations, helped adapt a web-based machine learning tool developed by the online retail giant for its business clients to help with Zammit's research. The result was nothing short of transformational, Zammit said.
“We were going to focus on the war period, but given how successful the software is for us, we might go back to 1870,” he said. “It’s blossoming, hopefully, into multiple papers.”
Maharaj, now a research assistant on Zammit's project, had worked with Amazon’s customer service team before moving to Amazon Web Services (AWS), the cloud computing arm of the company. It was there he saw an opportunity to apply AWS technology to Zammit’s data conundrum.
The solution was Textract, an AWS tool used by organizations, such as insurance companies, to automate and standardize collection of data from forms and other documents.
Maharaj looked to apply Textract’s machine learning abilities to the information contained in the trade volume tables. He connected with the Cloud Innovation Centre at the University of British Columbia, which was working to refine Textract’s data collection capabilities, to test the technology with the trade volume information.
What used to take Zammit three years of tedious data entry can now be accomplished in four months. “We can scan 500 documents in less than 45 minutes,” Maharaj said.
With just two clicks, the research team can now upload the trade volume PDFs and convert the information for use in an Excel workbook. The process also lets the researchers filter the results easily.
“Now the data is organized in the exact format that we need it to be,” Maharaj said. “The cloud has enabled us to put this project on steroids.”
Zammit noted that the project has also created research opportunities for students to participate and gain valuable experience working with economic data.
The project's growing team of 175 volunteer research assistants manages quality assurance by conducting comparative spot checks. Maharaj estimates Textract’s accuracy rate to be between 93 and 95 per cent.
For his part, Maharaj has been able to parlay skills learned on the project into an internship with RBC.
“Making training plans, looking at Excel data and macros – it’s the same thing – process automation,” he said. “These skills are very applicable to the workforce, and we’re giving students those skills as well.”
Zammit’s current research, which focuses on trade diversion and loss in the British dominions during the First World War, draws on primary sources like the Canada trade volumes. The digitized federal government documents span nearly 100 years, from 1870 onwards.
The economic historian hopes to shed new light on how costly trade diversions or sanctions can be for countries engaged in war.
Zammit says the digital tool has reduced the cost of collecting information and increased the volume of available data, offering the researchers new opportunities to draw comparisons with modern economic phenomena.
The researchers hope to release preliminary results from their analysis early next year.