Monday, December 23, 2024

Google’s Gemini-Powered Robot Wants To Memorize Your Favorite Soda

There have been a lot of beneficiaries of the AI boom — phones, cars, people who are too exhausted to read emails — but none as surprising as the world of robotics.

With the help of large language models like ChatGPT, companies like Figure are imbuing their bots with personality or supercharging their ability to understand multi-step commands. From Nvidia to Tesla’s Optimus, there’s clearly a robot renaissance happening, and chatbots are firmly at the center of it.

Google is hopping on the trend and putting its own superpowered AI, Gemini, into a bot that can reliably navigate environments, understand more complex tasks, and even draw on a sense of “memory.” The results aren’t exactly a ready-made butler, but you can see we’re not far at all from a future with genuinely helpful robots.

Hey Google, Show Me a Cool Robot

In a recent demonstration and paper from Google DeepMind, researchers showed how Gemini 1.5’s long context window can essentially extend a robot’s memory, allowing it to map important areas and navigate to them later.

To train the robot, DeepMind researchers take it on a tour, pointing out key areas and objects along the way. Afterward, the robot reviews the video frames recorded on those tours to cement its understanding of the environment. Thus, a kind of mental map is formed.

While the ability to remember environments is interesting, it’s what the robot can actually do with that memory that really jumps out. According to Google, after being trained on the tours, its Gemini-powered robot can remember things like where the temporary desks are or important stuff like where a power outlet is.

Google researchers say that, in the future, the Gemini robot could get even more specific. For example, someone in an office with lots of empty Coke cans on their desk could ask the robot if the kitchen has their favorite drink and — using the can context — understand what their drink of choice is and then go check. If that’s not a step toward robot butlers, I don’t know what is.

What Large Language Models Have To Do With Robotics

It’s still early days for this kind of confluence of AI and robotics, but the results are at least initially promising. Large language models like Gemini and ChatGPT are a long way off from being sentient, but their ability to understand natural language and process multi-step commands is above and beyond anything we’ve seen before.

If we are to live in a future where robot butlers are chaperoning us to our desks, doing our laundry, or helping us care for the elderly, a big first step is understanding environments and all of the context in them.

Sure, Google’s demo might not seem revolutionary on the surface, but it’s yet another example, experimental or not, of how AI and robots, humanoid or otherwise, are on a clear collision course. And given how rapidly AI seems to advance, I’m going to guess that DeepMind will be back in the not-so-distant future with even more sophisticated research.

And if all that means I don’t have to fetch my own beer from the fridge, then I’d say the future of AI robots looks pretty bright.
