Imagine a coffee cup sitting on a table. Now, imagine a book partially obscuring the cup. As humans, we still know what the coffee cup is even though we can’t see all of it. But a robot might be confused.
Robots in warehouses and even around our houses struggle to identify and pick up objects if they are too close together, or if a space is cluttered. This is because robots lack what psychologists call “object unity,” or our ability to identify things even when we can’t see all of them.
Researchers at the University of Washington have developed a way to teach robots this skill. The method, called THOR for short, allowed a low-cost robot to identify objects—including a mustard bottle, a Pringles can and a tennis ball—on a cluttered shelf. In a recent paper published in IEEE Transactions on Robotics, the team demonstrated that THOR outperformed current state-of-the-art models.
UW News reached out to senior author Ashis Banerjee, UW associate professor in both the industrial & systems engineering and mechanical engineering departments, for details about how robots identify objects and how THOR works.
How do robots sense their surroundings?
We sense the world around us using vision, sound, smell, taste and touch. Robots sense their surroundings using one or more types of sensors. Robots “see” things using either standard color cameras or more complex stereo or depth cameras. While standard cameras simply record colored and textured images of the surroundings, stereo and depth cameras also provide information on how far away the objects are, just like our eyes do.
On their own, however, the sensors cannot enable the robots to make “sense” of their surroundings. Robots need a visual perception system, similar to the visual cortex of the human brain, to process images and detect where all the objects are, estimate their orientations, identify what the objects might be and parse any text written on them.
Why is it hard for robots to identify objects in cluttered spaces?
There are two main challenges here. First, there are likely a large number of objects of varying shapes and sizes. This makes it difficult for the robot's perception system to distinguish between the different object types. Second, when several objects are located close to each other, they obstruct one another's views, and robots struggle to recognize an object when they cannot see it in full.
Are there any types of objects that are especially hard to identify in cluttered spaces?
A lot of that depends on what objects are present. For example, it is challenging to recognize smaller objects if there are a variety of sizes present. It is also more challenging to differentiate between objects with similar or identical shapes, such as different kinds of balls, or boxes. Additional challenges occur with soft or squishy objects that can change shape as the robot collects images from different vantage points in the room.
So how does THOR work and why is it better than previous attempts to solve this problem?
THOR is really the brainchild of lead author Ekta Samani, who completed this research as a UW doctoral student. The core of THOR is that it allows the robot to mimic how we as humans know that partially visible objects aren’t broken or entirely new objects.
THOR does this by using the shape of objects in a scene to create a 3D representation of each object. From there it uses topology, an area of mathematics that studies the connectivity between different parts of objects, to assign each object to a “most likely” object class. It does this by comparing its 3D representation to a library of stored representations.
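The matching step described above can be sketched in simplified form. This is a hypothetical illustration, not the actual THOR implementation: the short numeric lists below stand in for the topological (persistent-homology) descriptors the paper computes, and the object names, library, and `most_likely_class` function are all invented for this example. The key idea it shows is the nearest-neighbor comparison of an observed object's representation against a library of stored representations.

```python
import math

# Hypothetical stand-in for THOR's library of stored 3D/topological
# representations: each known object class maps to a fixed-length
# descriptor (plain numbers here, persistent-homology features in the paper).
LIBRARY = {
    "mustard_bottle": [0.9, 0.2, 0.4],
    "chips_can":      [0.1, 0.8, 0.3],
    "tennis_ball":    [0.5, 0.5, 0.9],
}

def euclidean(a, b):
    # Distance between two descriptors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def most_likely_class(descriptor):
    # Compare the observed descriptor to every stored representation
    # and assign the "most likely" object class (nearest neighbor).
    return min(LIBRARY, key=lambda name: euclidean(descriptor, LIBRARY[name]))

# A partially occluded view yields a slightly perturbed descriptor,
# but it is still closest to the stored mustard-bottle representation.
print(most_likely_class([0.85, 0.25, 0.35]))  # mustard_bottle
```

Because topological descriptors of this kind change only gradually as parts of an object are hidden, the observed descriptor stays close to the stored one, which is what lets the match survive occlusion.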
THOR does not rely on training machine learning models with images of cluttered rooms. It just needs images of each of the different objects by themselves. THOR does not require the robot to have specialized and expensive sensors or processors, and it also works well with commodity cameras.
This means that THOR is very easy to build, and is, more importantly, readily useful for completely new spaces with diverse backgrounds, lighting conditions, object arrangements and degree of clutter. It also works better than the existing 3D shape-based recognition methods because its 3D representation of the objects is more detailed, which helps identify the objects in real time.
Ekta U. Samani et al, Persistent Homology Meets Object Unity: Object Recognition in Clutter, IEEE Transactions on Robotics (2023). DOI: 10.1109/TRO.2023.3343994
University of Washington
Q&A: Researcher discusses how newly developed method can help robots identify objects in cluttered spaces (2024, February 7)