MIT engineers aim to enhance the decision-making of household robots by giving them a measure of 'common sense'. Their new training method, which helps robots self-correct after missteps, connects robot motion data with the knowledge embedded in large language models (LLMs).
To train home robots to perform increasingly complex household tasks, from cleaning up spills to serving food, engineers typically program the machines to mimic demonstrated human motions. But despite being excellent mimics, robots falter when faced with disruptions that the original programming did not account for, leaving them no choice but to restart the task from the beginning.
The MIT engineers hope to resolve this with an approach that enables a robot to logically parse a task into subtasks and make adjustments within a subtask, avoiding the need to start over.
Imitation learning is an effective way to train household robots, but as Yanwei Wang, a graduate student in MIT's Department of Electrical Engineering and Computer Science (EECS), points out, it is fragile: tiny mistakes can accumulate and derail the whole task. With the team's approach, a robot can correct execution errors as they happen, improving overall task success. The new method is described in a study to be presented in May at the International Conference on Learning Representations (ICLR).
The team illustrates the method with a simple task: scooping marbles from one bowl into another. Engineers would typically guide a robot through the complete motion trajectory for such a task, but the MIT team observed that the motion is really a sequence of subtasks: the robot reaches into a bowl, scoops up marbles, moves toward the empty bowl, and pours the marbles in. The 'common sense' comes in when the robot self-corrects in real time after faltering partway through a subtask.
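For concreteness, a minimal sketch of such a decomposition might look like the following, where the task is stored as an ordered list of subtasks, each tied to a segment of a demonstrated trajectory. The data structure, names, and indices are hypothetical illustrations, not the team's implementation.

```python
# A minimal sketch (not the authors' code): the scooping task as an ordered
# list of subtasks, each covering a segment of a demonstrated trajectory.
from dataclasses import dataclass

@dataclass
class Subtask:
    name: str          # subtask label, e.g. "reach"
    start_index: int   # first timestep of the demonstration segment
    end_index: int     # last timestep of the demonstration segment

# The full task is just the subtasks in order; if the robot is knocked off
# course, it only needs to redo the current segment, not the whole trajectory.
MARBLE_SCOOPING = [
    Subtask("reach",     start_index=0,   end_index=49),
    Subtask("scoop",     start_index=50,  end_index=119),
    Subtask("transport", start_index=120, end_index=199),
    Subtask("pour",      start_index=200, end_index=259),
]
```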
LLMs contribute significantly to the process. Trained on vast libraries of text, they learn statistical connections between words, sentences, and paragraphs, and use those connections to generate new text consistent with what they have learned. The researchers found that an LLM can logically list the subtasks involved in a single chore, such as reach, scoop, transport, and pour.
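As a rough illustration of this step, the sketch below prompts a chat model for an ordered list of subtask labels. The study does not say which model or interface the team used, so the OpenAI client and model name here are assumptions made purely for illustration.

```python
# A hedged sketch of asking an LLM to decompose a chore into subtask labels.
# The choice of the OpenAI Python SDK and model is an assumption, not the
# team's setup.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = (
    "List, in order, the short subtask labels a robot arm needs to "
    "scoop marbles from one bowl and pour them into another. "
    "Reply with one label per line."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)

subtask_labels = [
    line.strip()
    for line in response.choices[0].message.content.splitlines()
    if line.strip()
]
print(subtask_labels)  # e.g. ["reach", "scoop", "transport", "pour"]
```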
The researchers then connected the LLM's language-level plan to the robot's physical motion, so the robot always knows where it is within a task and can replan and recover on its own. They developed an algorithm that links an LLM-generated subtask label to the robot's physical state or camera image, a mapping known as 'grounding'. The grounding algorithm learns to identify which subtask the robot is currently in, given its physical coordinates or image view.
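A toy version of such a grounding classifier might look like the following, where a simple model maps a 3-D end-effector position to a subtask label. The use of scikit-learn, the synthetic workspace regions, and the coordinates are all illustrative assumptions rather than the team's actual model.

```python
# A toy sketch of "grounding": a classifier that maps the robot's physical
# state (here, a 3-D end-effector position) to the current subtask label.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
labels = ["reach", "scoop", "transport", "pour"]

# Pretend each subtask occupies a distinct region of the workspace and sample
# noisy end-effector positions from it (stand-ins for demonstration data).
centers = np.array([
    [0.2, 0.0, 0.3],   # reach: above the full bowl
    [0.2, 0.0, 0.1],   # scoop: inside the full bowl
    [0.5, 0.0, 0.3],   # transport: between the bowls
    [0.8, 0.0, 0.2],   # pour: above the empty bowl
])
X = np.vstack([c + 0.02 * rng.standard_normal((100, 3)) for c in centers])
y = np.repeat(labels, 100)

grounding = LogisticRegression(max_iter=1000).fit(X, y)

# At run time, the robot queries the classifier to learn which subtask it is in.
print(grounding.predict([[0.5, 0.01, 0.29]]))  # likely ["transport"]
```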
The team demonstrated the method in experiments with a robotic arm trained on the marble-scooping task. When the robot was disrupted partway through a subtask, it rerouted its actions on its own instead of starting the task from scratch. With this method, household robots could execute complex tasks more robustly, even in the face of unforeseen deviations.
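The recovery behavior can be pictured with the small simulation below, in which the controller queries a grounding classifier after each motion segment and resumes from whichever subtask the robot actually finds itself in. The helper functions and the simulated disturbance (marbles knocked loose during transport) are hypothetical stand-ins, not the team's code.

```python
# A hedged sketch of subtask-level recovery: after each segment, ask the
# grounding classifier which subtask the robot is really in and resume there
# instead of restarting the whole task.
SUBTASKS = ["reach", "scoop", "transport", "pour"]

# Simulated world: one disturbance knocks the marbles loose during "transport",
# sending the robot's state back to the "scoop" stage.
disturbance_pending = True

def execute_subtask(name):
    """Stub: replay the demonstration segment for this subtask."""
    print(f"executing: {name}")

def classify_current_subtask(attempted):
    """Stub grounding classifier: report which subtask the state corresponds to."""
    global disturbance_pending
    if attempted == "transport" and disturbance_pending:
        disturbance_pending = False
        return "scoop"   # marbles dropped; the state looks like the scoop stage
    return attempted     # segment completed as planned

def run_task():
    i = 0
    while i < len(SUBTASKS):
        execute_subtask(SUBTASKS[i])
        actual = classify_current_subtask(SUBTASKS[i])
        if actual == SUBTASKS[i]:
            i += 1                        # subtask finished; move on
        else:
            i = SUBTASKS.index(actual)    # recover: redo from where we really are
            print(f"disturbed; resuming from: {actual}")

run_task()
# expected output:
#   executing: reach
#   executing: scoop
#   executing: transport
#   disturbed; resuming from: scoop
#   executing: scoop
#   executing: transport
#   executing: pour
```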
Disclaimer: The above article was written with the assistance of AI. The original sources can be found on ScienceDaily.