As society grows more dependent on artificial intelligence and robotics, engineers at the Massachusetts Institute of Technology (MIT) have developed a system that improves a robot's ability to carry out specific tasks. The system, called Clio, lets a robot map a scene quickly and identify the items it needs to complete the task at hand.
Clio allows robots to make task-appropriate decisions in much the same way humans do. For instance, when cleaning a cluttered workspace, a person focuses on the items related to the job and ignores the rest. Clio applies the same idea to robotics. The tool takes in a list of tasks described in natural language and, based on those tasks, determines the level of detail at which it needs to interpret its environment. It "remembers" only the task-relevant parts of the scene.
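The idea of keeping only task-relevant items can be illustrated with a minimal sketch. It assumes that each task description and each detected object has already been turned into an embedding vector by some vision-language model; the vectors, the names, and the 0.8 threshold below are all made up for illustration and are not taken from Clio itself.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def relevant_objects(task_vecs, object_vecs, threshold=0.8):
    """Keep each object whose best-matching task scores at least `threshold`.

    Returns a dict mapping object name -> name of its best-matching task.
    The threshold is a hypothetical tuning knob, not a value from the article.
    """
    kept = {}
    for name, vec in object_vecs.items():
        best_task, best_score = max(
            ((task, cosine(vec, tvec)) for task, tvec in task_vecs.items()),
            key=lambda pair: pair[1],
        )
        if best_score >= threshold:
            kept[name] = best_task
    return kept

# Toy, hand-written embeddings (assumptions for illustration only).
TASKS = {"move the books": [1.0, 0.0, 0.1]}
OBJECTS = {
    "book stack": [0.9, 0.1, 0.0],
    "coffee mug": [0.0, 1.0, 0.2],
    "stapler": [0.1, 0.2, 1.0],
}

KEPT = relevant_objects(TASKS, OBJECTS)
print(KEPT)
```

With these toy vectors only the book stack is similar enough to the task to be kept; the mug and the stapler are dropped from the robot's "memory" of the scene.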
In a series of real-world experiments, ranging from cluttered cubicles to a multi-story building, the team demonstrated Clio's capabilities. The algorithm was also run in real time on a quadruped robot. As the robot explored an office building, it identified and mapped only those elements of the scene that were pertinent to its tasks.
Clio is named after the Greek muse of history, a nod to its ability to identify and remember only the elements that matter for a given task. It looks promising for any situation in which a robot needs to survey its surroundings quickly and make sense of them in the context of a specific task. It is expected to be particularly useful for search and rescue, but it could also be applied to household chores or factory-floor operations.
Although major advances in computer vision and natural language processing have enabled robots to recognize objects in their surroundings, real-world application has remained a challenge. Traditionally, robots were limited to "closed-set" scenarios, in which they operated in carefully curated and controlled environments. Clio marks a departure from this closed-set setting toward a more "open" approach that allows robots to function in more realistic situations.
The method combines computer vision techniques, large language models, and a classic information-theory concept known as the "information bottleneck". Together, these allow a robot to build an understanding of its environment that adjusts automatically to the tasks it has been assigned. Whether the task is to move a pile of books or to fetch a specific book from the pile, the robot can identify, at the appropriate level of granularity, which objects matter for the job.
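The books example can be made concrete with a toy sketch of task-driven granularity. This is not Clio's implementation: it is a simplified greedy merge, loosely in the spirit of agglomerative information-bottleneck clustering, in which neighboring scene segments are merged while their task-relevance profiles stay similar (measured here by Jensen-Shannon divergence). The segment names, the profile numbers, and the `max_divergence` threshold are all hypothetical.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def merge_segments(segments, max_divergence=0.05):
    """Greedily merge adjacent segments with similar task-relevance profiles.

    segments: list of (name, profile) in spatial order, where profile is a
    probability distribution such as [P(relevant to task), P(irrelevant)].
    Returns a list of clusters, each a list of segment names.
    """
    clusters = [([name], list(profile)) for name, profile in segments]
    while len(clusters) > 1:
        # Cost of merging each pair of neighbors.
        costs = [
            js_divergence(clusters[i][1], clusters[i + 1][1])
            for i in range(len(clusters) - 1)
        ]
        i = min(range(len(costs)), key=costs.__getitem__)
        if costs[i] > max_divergence:
            break  # every remaining merge would blur task-relevant detail
        (names_a, prof_a), (names_b, prof_b) = clusters[i], clusters[i + 1]
        na, nb = len(names_a), len(names_b)
        # Size-weighted average of the two profiles.
        merged = [(na * a + nb * b) / (na + nb) for a, b in zip(prof_a, prof_b)]
        clusters[i:i + 2] = [(names_a + names_b, merged)]
    return [names for names, _ in clusters]

# Hypothetical relevance profiles for the task "move the pile of books".
MOVE_PILE = merge_segments([
    ("red book", [0.90, 0.10]),
    ("green book", [0.90, 0.10]),
    ("blue book", [0.88, 0.12]),
    ("coffee mug", [0.05, 0.95]),
])

# Hypothetical relevance profiles for the task "fetch the green book".
FETCH_GREEN = merge_segments([
    ("red book", [0.10, 0.90]),
    ("green book", [0.95, 0.05]),
    ("blue book", [0.10, 0.90]),
    ("coffee mug", [0.05, 0.95]),
])

print(MOVE_PILE)
print(FETCH_GREEN)
```

Under these made-up numbers, the first task collapses the three books into a single "pile" object, while the second keeps the green book as its own object and lets neighboring irrelevant clutter merge. The same scene is represented at a different granularity depending on the task.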
Clio could open the way to broader use of robots in diverse fields. The team is now working to enable Clio to handle higher-level tasks, with the aim of giving robots a more human-like understanding of how to accomplish complex tasks.
Disclaimer: The above article was written with the assistance of AI. The original sources can be found on ScienceDaily.