'It's like a rolling boil': U of T's Sanja Fidler on Toronto's hot AI scene and where research is headed in 2019

Sanja Fidler, an assistant professor of computer science at U of T Mississauga, discusses her computer vision research at Elevate Toronto's AI event in 2017 (photo by Chris Sorensen)

There are few technologies in the world hotter than artificial intelligence, or AI – as evidenced by the recent NIPS (Neural Information Processing Systems) academic conference in Montreal that sold out in less than 12 minutes. 

That’s faster than last year’s Burning Man festival.

At the same time, the list of big, multinational companies that are setting up AI research labs in Canada – and Toronto in particular – continues to grow. But where is all this research headed, and what do we non-AI experts need to know about it? 

The University of Toronto’s Sanja Fidler, a leading computer vision researcher and the head of NVIDIA’s research lab in Toronto, says progress in self-driving cars and certain health-care applications is moving “pretty fast.”

However, she also notes the problems AI researchers seek to solve are growing ever more complex, necessitating a more co-operative approach.  

“People used to just work in computer vision or image processing, but now there’s a lot of interdisciplinary work to connect these things together,” says Fidler, who is an assistant professor at U of T Mississauga’s department of mathematical and computational sciences and a faculty member at the Vector Institute for Artificial Intelligence. 

As for Toronto’s growth as a global hub for AI research and development, Fidler doesn’t see the trend slowing down any time soon.

“I don’t think we’ve hit a plateau – there’s just going to be more and more,” she says. “It’s like a rolling boil.” 

U of T News recently caught up with Fidler to find out more about her work and her thoughts on where AI is headed in 2019.

Where do you see AI research headed over the next 12 months – what are the trends you’re keeping an eye on?

Things move very quickly, so it can be hard to predict. But one of the trends I witnessed over the past year, and that I think will grow even stronger, has to do with the crossover of different fields. People used to just work in computer vision or image processing, but now there’s a lot of interdisciplinary work to connect these things together. There’s crossover between vision, IP, graphics and even program synthesis. So now you’re connecting all these very different fields. But they’re all linked with machine learning and deep learning in particular. As a result, I think you’re going to see some nice advances. 

I think we’re also going to see more progress in certain application domains. Self-driving cars and health care are both important applications, and areas where progress is moving pretty fast. People are also going to start looking more into the idea of fairness – meaning the fairness of machine learning models and the issue of training on biased datasets. 

In my domain, people have started looking into embodied agents – so not just an algorithm that only sees images, but one that exploits the fact that it’s an agent that performs actions in the world. Vision is just one of the inputs that helps it function in an environment and make predictions about various tasks. There’s actually been a lot of work in designing simulations where you can train these embodied agents – so I think this will be a new big thing, where people make more sophisticated simulators and train more sophisticated algorithms.

What’s an embodied agent?

It’s basically a robot that goes around and is actually tasked to do something. Typically, the way computer vision works is you have an image and you want to segment different objects because someone has decided this is an important task – for example, this is a car, this is something else and so on. But what we want to do in robotics is design a robot that can actually do something in the environment. So, for example, make coffee, watch TV or something along these lines. But that takes things much further because we might not know the task that will need to be completed. 

So what are the things we need to learn from each sensory input? It could be vision. It could be sound. What is necessary to enable these robots to perform these tasks? It links all these fields and adds control – how to move in this environment. A lot of computer vision people have started working in simulated environments where you use vision as a sort of auxiliary task needed to solve a more complex problem. This is key. As humans, we all get different sensory inputs and different modalities can help each other. For example, a motorbike’s sound might be more distinct and easier to recognize than the shape of a motorbike. The interactions between different modalities and fields can stitch together nicely. I think a lot of people are exploring this. 

When it comes to your own work, what are you most excited about? 

I’m actually looking into these embodied agents – including in a simulated environment for household activities. That’s one of our bigger pushes. Most current simulators focus on navigation tasks: Can you have a robot walk from a room to a TV or some other location in the environment? But what we want to do is really teach robots to do more high-level activities. So, for example, making coffee, cooking, throwing a party – something that requires more high-level planning. We’ve put quite a few resources into this. It’s been our project for the past two years. We designed this simulator for crowdsourcing information about tasks that people do in the home, and then converted that into robot language – language that a robot could understand. We then made that into a learning platform to teach robots how to do those tasks.

How close are we getting to the point where some of these AI technologies are used outside the lab?

Some applications are certainly using the technology already. For example, there are already commercial systems for music recognition that perform much better than humans. Google recently released a tool for phones that blurs the background of a photo – that’s all based around deep learning algorithms. So some applications are already there. But if you look at more complicated applications like self-driving cars, drones or general robotics – that’s still further down the road. In general, these are difficult problems that will take some time to solve.

It’s been about six months since you joined the NVIDIA research lab in Toronto. How has that impacted your research?

I really love it there. NVIDIA is a leader in GPU [graphics processing unit] technology that’s powering all the current deep-learning efforts. But it’s really invested in the importance of AI and research. I don’t know if you saw, but [NVIDIA co-founder and CEO] Jensen Huang actually came to Toronto to open the lab.

I listened to his talk. 

He’s a really smart guy. With his understanding of research and the importance of AI, it’s very nice for us to work there. The research is very open. They allow me to work on a lot of different things – and everything can become important internally because there are so many diverse interests inside of NVIDIA and so many different application domains. NVIDIA has great technology for graphics, self-driving, robotics – to name just a few examples. This kind of diverse interest really suits me because I like to work on different things. It’s also great to know that, whatever we work on, it’s potentially important down the line for products. It’s very fast-paced, which I love as well. At the same time, speaking as an academic, I like having more clarity on problems. Having that clarity has made me a better researcher, I think.

Are there any particular problems you’ve encountered at NVIDIA that you hadn’t considered before, or that have opened up new avenues for your research?

The things that I’m doing which are a little bit different, and where I think I can make a big impact inside the company, have more to do with content creation. For example: scaling up simulations that might be used for graphics or gaming. There are a lot of internal teams working on stuff like that, but I want to contribute with some AI tools. This is really not something I would be able to do at a university because I wouldn’t have the resources. So this is something I wasn’t expecting, but that I found really interesting and wanted to work on. It's very open. I just adapt my research to what’s going on. 

What’s your sense of how the AI research ecosystem is developing in Toronto? It seems like every few months there’s an announcement about a company opening a lab here.

I think it’s so exciting. Finally, the world is seeing the incredible potential we have here in Toronto. U of T is basically a pioneer in deep learning – we have Geoff Hinton and there are so many renowned faculty working on various AI fields, particularly machine learning and deep learning. Now, with the Vector Institute, there are so many students who are working in this field, too, so I think the local talent and expertise is incredible. There are always seminars trying to connect industry with academia, and incubators like CDL [Creative Destruction Lab] and the Department of Computer Science Innovation Lab [DCSIL] that can help you if you want to take that technology and turn it into a startup. There’s a lot of support for that.

I think the big companies see that talent and want to be part of it. I really think Toronto is going to be the next Silicon Valley.

Do you see that trend continuing for the foreseeable future?

Yes. I’ve seen the hiring at NVIDIA as well – so I’ve seen both sides. I think the interest is increasing. I don’t think we’ve hit a plateau – there’s just going to be more and more. It’s like a rolling boil. Some companies open labs and then more companies want to come. And then when all that takes off, more talent is going to want to come to Canada to work here because there are so many different opportunities. 

What do you wish people better understood about what you and your colleagues do?

I think everyone now realizes the potential of AI. You can see that with the media attention – it’s just increasing. Conference attendances are going crazy. So everyone understands the importance. But I still think maybe it’s still a bit over-hyped in the sense that many people think deep learning is a solution to all their problems. But to get that final performance to really solve a problem – the technology is not there yet. There are so many different things that one needs to solve: the data collection, the robotic platforms, if you’re doing something like that. Most of it is just so complex. Thinking it’s all going to be solved overnight is just hype. People need to realize the technology isn’t ready yet and won’t be ready tomorrow. There’s definitely amazing potential, but it’s going to take time.