Tracking proteins using AI: U of T scientists develop deep learning algorithm

photo of protein
Yeast cells (purple) with DNA-containing nuclei (pink) and protein (green) residing in a cell's waste compartment

U of T researchers have developed a deep learning algorithm that can track proteins to help reveal what makes cells healthy and what goes wrong in disease.

From self-driving cars to computers that can diagnose cancer, artificial intelligence (AI) is shaping the world in ways that are hard to predict, but for cell biologists, the change could not have come soon enough. With new and fully automated microscopes, scientists collect reams of data faster than they can analyze it.  

The protein-tracking algorithm, Dubbed DeepLoc, can recognize patterns in the cell made by proteins better and much faster than the human eye or previous computer vision-based approaches.

“We can learn so much by looking at images of cells. How does the protein look under normal conditions? Do they look different in cells that carry genetic mutations or when we expose cells to drugs or other chemical reagents?” says Benjamin Grys, a graduate student in molecular genetics who recently co-authored a paper on the research. “People have tried to manually assess what’s going on with their data, but that takes a lot of time.”

In the cover story of the latest issue of Molecular Systems Biology, teams led by the Donnelly Centre's Brenda Andrews and Charles Boone, both professors of molecular genetics, also describe DeepLoc’s ability to process images from other labs, illustrating its potential for wider use.

“Right now, it only takes days to weeks to acquire images of cells and months to years to analyze them. Deep learning will ultimately bring the timescale of this analysis down to the same timescale as the experiments,” says Oren Kraus, a lead co-author on the paper and a graduate student co-supervised by Andrews and the Donnelly Centre's Brendan Frey, professor of electrical and computer engineering. 

Learn more about Frey's startup which uses AI to develop drugs for genetic conditions

Kraus is now working with Jimmy Ba, a graduate student of AI pioneer Geoffrey Hinton, a U of T University Professor Emeritus in computer science who is the chief scientific adviser of the newly established Vector Institute, to commercialize the method through a new startup. The goal of the startup named, Phenomic AI, is to analyse cell image-based data for pharmaceutical companies.

“In an image based drug screen, you can actually figure out how the drugs are affecting different cells based on how they look rather than some simplified parameters such as live/dead or cell size,” says Kraus. “This way you can extract a lot more information about cell state form these screens. We hope to make the early drug discovery process all the more accurate by finding more subtle effects of chemical compounds.”

Similar to other types of AI, in which computers learn to recognize patterns in data, DeepLoc was trained to recognize diverse shapes made by glowing proteins – labelled a fluorescent tag that makes them visible – in cells. But unlike computer vision that requires detailed instructions, DeepLoc learns directly from image pixel data, making it more accurate and faster.

Grys and Kraus trained DeepLoc on previously published data that shows an area in the cell occupied by more than 4,000 yeast proteins – three-quarters of all proteins in yeast. This dataset remains the most complete map showing exact position for a vast majority of proteins in any cell. When it was first released in 2015, the analysis was done with a complex computer vision and machine learning pipeline that took months to complete. DeepLoc crunched the data in a matter of hours.

DeepLoc was able to spot subtle differences between similar images. The initial analysis identified 15 different classes of proteins, each representing distinct neighbourhoods in the cell. DeepLoc identified 22 classes. It was also able to sort cells whose shape changed due to a hormone treatment, a task that the previous pipeline couldn’t complete.

Grys and Kraus were able to quickly retrain DeepLoc with images that differed from the original training set, showing that it can be used to process data from other labs. They hope that others in the field – where looking at images by eye is still the norm – will adopt their method.

“Someone with some coding experience could implement our method,” says Grys. “All they would have to do is feed in the image-training set that we’ve provided and supplement this with their own data. It takes only an hour or less to retrain DeepLoc and then begin your analysis.”

Learn more about entrepreneurship and startups at U of T

Donnelly