To speed discoveries, U of T lab launches free library of virtual, AI-calculated organic compounds

The lab of U of T's Alán Aspuru-Guzik, in collaboration with partners in academia and industry, has launched an open-access library of about 300,000 virtual, machine-learning-calculated organic compounds (photo by Johnny Guatto)

Alán Aspuru-Guzik’s research group at the University of Toronto has launched an open-access tool that promises to accelerate the discovery of new chemical reactions that underpin the development of everything from smartphones to life-saving drugs.

The free tool, called Kraken, is a library of virtual, machine-learning-calculated organic compounds: roughly 300,000 of them, with 190 descriptors each.

It was created through a collaboration between Aspuru-Guzik’s Matter Lab, the Sigman Research Group at the University of Utah, Technische Universität Berlin, the Karlsruhe Institute of Technology, the Vector Institute for Artificial Intelligence, the Center for Computer Assisted Synthesis at the University of Notre Dame, IBM Research and AstraZeneca.

“The world has no time for science as usual,” says Aspuru-Guzik, a professor in U of T’s departments of chemistry and computer science in the Faculty of Arts & Science. “Neither for science done in a silo.

“This is a collaborative effort to accelerate catalysis science that involves a very exciting team from academia and industry.”

When developing a transition-metal-catalyzed chemical reaction, a chemist must find a suitable combination of metal and ligand. Despite the innovations in computer-optimized ligand design led by the Sigman group, ligands are still typically identified by trial and error in the lab. With Kraken, however, chemists will eventually have a vast, data-rich collection at their fingertips, reducing the number of trials needed to achieve optimal results.

“It takes a long time, a lot of money and a whole lot of human resources to discover, develop and understand new catalysts and chemical reactions,” says co-lead author and Banting Postdoctoral Fellow Gabriel dos Passos Gomes. “These are some of the tools that allow molecular scientists to precisely develop materials and drugs, from the plastics in your smartphone to the probes that allowed humanity to achieve the COVID-19 vaccines at an unforeseen pace.

“This work shows how machine learning can change the field.”

The Kraken library features organophosphorus ligands, which Tobias Gensch, one of the co-lead authors of the work, described as “some of the most prevalent ligands in homogeneous catalysis.”

“We worked extremely hard to make this not only open and available to the community, but as convenient and easy to use as we possibly could,” says Gomes, who worked with computer science graduate student Théophile Gaudin in the development of the web application. “With that in mind, we created a web app where users can search for ligands and their properties in a straightforward manner.”

While 330,000 compounds will be available at launch, the team plans to create a much larger library of more than 190 million ligands. By comparison, similar libraries have been limited to hundreds of compounds, with far fewer properties.

“This is very exciting, as it shows the potential of AI for scientific research,” says Aspuru-Guzik. “In this context, the University of Toronto has launched a global initiative called the Acceleration Consortium, which hopes to bring academia, government and industry together to tackle AI-driven materials discovery.

“It is exciting to have Professor Matthew Sigman on board with the consortium and seeing results of this collaborative work come to fruition.”

In January 2022, Gomes will take on a new role as assistant professor in the departments of chemistry and chemical engineering at Carnegie Mellon University, where he aims to pioneer research on the design of catalysts and reaction discovery.

Kraken can be freely accessed online, and the preprint describing how the dataset was built and how the tool can be used for reaction optimization is available on ChemRxiv.