When you read this, you’ll never think about autonomous vehicles the same way again!
I want to talk about some discussions I had with Daniela Rus, the director of our own CSAIL lab at MIT.
Highlighting some pioneering work on autonomous driving as we brainstorm, she talks about what people are already doing with Dall-E and similar tools: creating images from text.
“What else can we use text to create?” she asks, specifically mentioning text-to-drive, which is a major focus at CSAIL right now. And I’m proud of what they do!
Recently, Rus talked about how to reach these kinds of goals.
She suggests that we can build on an existing “recipe” for autonomous driving that has been developed in many countries by various companies. It combines a drive-by-wire system with sensors, perception software, and an estimation model, which she said can improve systems that are often “quite fragile” in their ability to reason.
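To make that recipe a little more concrete, here is a minimal, hypothetical sketch of the modular pipeline she describes: sensor data flows into perception, an estimator smooths the result into a vehicle state, and commands go out over drive-by-wire. None of the names, types, or numbers below come from CSAIL; they are purely illustrative.

```python
from dataclasses import dataclass

@dataclass
class SensorFrame:
    camera_pixels: list   # raw image data (stand-in)
    lidar_points: list    # raw point cloud (stand-in)

@dataclass
class Perception:
    obstacles: list       # e.g. [("deer", 30.0)] as (label, distance in meters)
    lane_offset_m: float  # lateral offset from lane center

@dataclass
class VehicleState:
    speed_mps: float
    lane_offset_m: float

def perceive(frame: SensorFrame) -> Perception:
    # In a real system this would be a learned model over the sensor data;
    # here it just returns a fixed placeholder measurement.
    return Perception(obstacles=[], lane_offset_m=0.1)

def estimate(prev: VehicleState, percept: Perception) -> VehicleState:
    # Simple exponential smoothing standing in for a proper state estimator.
    alpha = 0.8
    offset = alpha * prev.lane_offset_m + (1 - alpha) * percept.lane_offset_m
    return VehicleState(speed_mps=prev.speed_mps, lane_offset_m=offset)

def drive_by_wire(state: VehicleState) -> dict:
    # Steer back toward the lane center; hold a modest throttle.
    return {"steering": -0.5 * state.lane_offset_m, "throttle": 0.2}

state = VehicleState(speed_mps=10.0, lane_offset_m=0.0)
state = estimate(state, perceive(SensorFrame(camera_pixels=[], lidar_points=[])))
print(drive_by_wire(state))
```

The point of the modular layout is that each stage can be improved or swapped independently, which is exactly where the fragility she mentions tends to show up: the hand-offs between stages.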
Machine learning, she adds, can learn from humans in new ways, rather than having to learn everything from scratch.
Here are some of her quotes on the subject:
“We have solutions to solve these problems. These solutions need data.”
“We have to do a lot more… We have to think about the physics, we have to think about the geometry – we have to think about the rules of the road – it all comes together in this project.”
Of systems representing “self-driving 1.0” (my term), she said: “They require us to specifically plan for each type of road situation we encounter. So you have to watch your parameters for night driving, daytime driving, driving in the rain, driving in the sun, driving with or without lane markers… that’s tough.”
“In simulation, we can turn that data into anything we want,” she said. “With this approach, we can train self-driving vehicles to handle challenging driving situations.”
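To give a flavor of what that could look like, here is a toy sketch of my own (not from the talk) that takes a single recorded scenario and spins out simulated variants covering rare conditions such as night driving, rain, or a deer near the road.

```python
import itertools
import random

# One recorded "base" scenario to vary.
base_scenario = {"road": "two_lane_rural", "speed_limit_mps": 25}

weather_options = ["clear", "rain", "fog", "snow"]
lighting_options = ["day", "dusk", "night"]
hazard_options = [None, "deer_crossing", "stalled_vehicle", "missing_lane_markers"]

def make_variant(weather, lighting, hazard):
    # A real simulator would render full sensor data for this combination;
    # here we just record the condition labels and jitter one physical parameter.
    variant = dict(base_scenario)
    variant.update({
        "weather": weather,
        "lighting": lighting,
        "hazard": hazard,
        "road_friction": round(random.uniform(0.4, 1.0), 2),
    })
    return variant

training_set = [make_variant(w, l, h)
                for w, l, h in itertools.product(weather_options,
                                                 lighting_options,
                                                 hazard_options)]
print(f"{len(training_set)} simulated variants; first: {training_set[0]}")
```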
An important part of improving self-driving, she explains, involves helping robotic systems respond to black-swan crisis conditions, and that can be expensive and complex.
What can we do?
Rus describes systems built on flexibility in data inputs and applied logic. Her approach addresses end-to-end learning and transferable reinforcement learning in the real world with a triangle that combines three elements: data, model, deploy (I’ve sketched that loop just after the quotes below). You can see much more in her slide deck or in the presentation itself; it’s a must-see! Here are a few more quotes:
“Can we make our robots understand their environment?” she asked, and then answered: “We can start to make progress in that direction.”
One goal, she noted, is to “connect knowledge of the world with visual learning systems” for spatial reasoning.
“The end result is that we end up with systems that allow us to reason about input streams using language,” she said, “generalizing to new situations.”
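As promised above, here is a loose sketch of that data-model-deploy loop. It is my own toy illustration, not anything from Rus’s talk: an end-to-end “model” with a single steering gain is fit to collected data, “deployed” to gather fresh observations, and then retrained on them.

```python
import random

def collect_data(n=50):
    # Stand-in for data gathered during deployment or in simulation:
    # each sample pairs an observed lane offset with expert-corrected steering.
    data = []
    for _ in range(n):
        offset = random.uniform(-1.0, 1.0)
        data.append((offset, -0.5 * offset))
    return data

def train(data, w=0.0, lr=0.1, epochs=20):
    # End-to-end "model" with a single parameter: steering = w * offset.
    for _ in range(epochs):
        for x, y in data:
            w -= lr * (w * x - y) * x   # gradient step on squared error
    return w

w = 0.0
for round_num in range(3):              # data -> model -> deploy, repeated
    data = collect_data()
    w = train(data, w)
    print(f"round {round_num}: learned steering gain {w:.3f}")
```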
In explaining how this language-based reasoning works, Rus mentioned a set of descriptors that you curate to drive the engine: in other words, you can move from a simple pixels-to-action model of perception to something where the vehicle starts to understand how you talk about objects like deer, the road, and the sky.
Think about it: instead of just “seeing” the road and responding in purely statistical ways, programs might be reading textual descriptions of how to handle situations, learning and responding accordingly. In this way, it is easy to imagine an AI becoming a much better driver than the average human!
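To make the idea a bit more tangible, here is a toy, purely hypothetical pairing of perception output with curated text descriptors: perception reports named objects, and a small vocabulary of descriptions captures how we talk about, and should handle, each one. The labels and rules below are invented for the example.

```python
# Hypothetical perception output: objects the vehicle "sees" right now.
detections = [
    {"label": "deer", "distance_m": 30},
    {"label": "lane_marker", "distance_m": 2},
]

# Curated descriptors: natural-language guidance keyed by object label.
descriptors = {
    "deer": "wild animal; may enter the road suddenly; slow down and prepare to stop",
    "lane_marker": "painted boundary; keep the vehicle centered between markers",
}

def advise(detections, descriptors):
    # Combine each detection with the way we "talk about" that object.
    advice = []
    for det in detections:
        text = descriptors.get(det["label"], "unknown object; proceed with caution")
        advice.append(f'{det["label"]} at {det["distance_m"]} m: {text}')
    return advice

for line in advise(detections, descriptors):
    print(line)
```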
“This really accelerates the way our autonomous systems can understand their world,” she said. “We’re getting them to a point where it’s much easier to interact with them … all of this will lead us to a world where we can start to think of our cars as our friends.”
I was going to say “that’s it”, but really, what an idea! Many of us had significant reservations about the limits of self-driving, and the news about Tesla’s Autopilot didn’t help. But this talk really helped me understand how some of these problems could be overcome in a fundamental way. When AI learns to reason from written information, it will become very capable, and we need to start thinking about the ways in which it will surpass human drivers. This presentation, for me, was one of the highlights of our conference.