Babies learn about their world by pushing and poking objects, putting them in their mouths, and throwing them. Carnegie Mellon University (CMU) scientists are taking a similar approach to teach robots how to recognize and grasp objects around them.

Manipulation remains difficult for robots and has become a bottleneck for many applications. But researchers at CMU’s Robotics Institute have shown that by allowing robots to spend hundreds of hours poking, grabbing, and otherwise physically interacting with a variety of objects, those robots can teach themselves how to pick up objects.

In findings presented at the European Conference on Computer Vision, they showed that robots gained a deeper visual understanding of objects when they were able to manipulate them. Researchers, led by Assistant Professor of robotics Abhinav Gupta, are scaling up this approach with help from a three-year, $1.5 million Focused Research Award from Google.

“We will use dozens of different robots, including one- and two-armed robots and even drones, to learn about the world,” Gupta says.

He adds that the shortcomings of previous robotic manipulation approaches were apparent during the Defense Advanced Research Projects Agency’s (DARPA) Robotics Challenge in 2015. Advanced robots, designed to respond to natural or manmade emergencies, had difficulty opening doors or unplugging electrical cables.

“Our robots still cannot understand what they see and their action and manipulation capabilities pale in comparison to those of a two-year-old,” Gupta says.

For decades, visual perception and robotic control have been studied separately. Visual perception developed with little consideration of physical interaction, and most manipulation and planning frameworks can’t cope with perception failures. Gupta predicts that allowing a robot to explore perception and action simultaneously, as a baby does, can help overcome these failures.

“Psychological studies have shown that if people can’t affect what they see, their visual understanding of that scene is limited,” says Lerrel Pinto, a Ph.D. student in robotics in Gupta’s research group. “Interaction with the real world exposes a lot of visual dynamics.”

Robots are slow learners, requiring hundreds of hours of interaction to learn how to pick up objects. Because robots have been expensive and often unreliable, researchers relying on data-driven approaches have suffered from too little information.

Pinto says much of the work by the CMU group has been done using a two-armed Rethink Robotics Baxter robot with a simple, two-fingered manipulator. Using multiple robots with more sophisticated hands will enrich manipulation databases.

The success of this research has inspired other research groups to adopt this approach and help expand databases.

“If you can get the data faster, you can try more things – different software frameworks, different algorithms,” Pinto says.

And once one robot learns something, that knowledge can be shared with all robots.
