
Google’s AI focus isn’t on making a phone assistant – it’s pursuing the real world


To date, the most widely used assistants have let you interact with phones via voice commands. The utility of being able to use your mobile device hands-free is obvious. 




For Google, the peak of its investment in this area was the new Google Assistant that debuted on the Pixel 4 in 2019. Google’s goal was to “let you instantly operate your phone with your voice, multitask across apps, and complete complex actions, all with nearly zero latency.”

With on-device voice processing driving this assistant, the company boldly proclaimed that “tapping to operate your phone would almost seem slow.”

Unfortunately, the experience, which still exists on Pixel phones today, required users to stick to particular phrasing rather than letting them speak naturally and having the assistant discern their intent automatically. Meanwhile, the possible actions were very limited and did not work with many apps.

Siri on iOS 18 with Apple Intelligence is taking another stab at this, with large language models (LLMs) possibly being the key to a voice assistant that can use any app on your phone. 

Google has been researching the same thing, and could very well still create one. However, after I/O 2024, I no longer think this is the company’s priority.

Rather, Google ultimately wants to build an AI assistant that helps you in the physical realm. The idea is that most of your questions and problems still happen in the real world, without a digital equivalent.

Core to this is being able to point your phone (or smart glasses in the future) at something and ask for more information or help.

This is what Google showcased with Project Astra, which will be coming to the interactive Gemini Live experience that lets you have a natural two-way conversation. Going Live with Gemini is expected to arrive this year, with the voice aspect launching first and then the camera capabilities.

Meanwhile, so much of your information is stored as pictures and videos. The Gemini-powered Ask Photos turns your library into a corpus of real-world knowledge that Google can use to help you.

Taking a picture of information in the real world and having Google organize it is genuinely a time saver and inherently assistive in nature. One of my favorite examples of this from I/O is something that’s not particularly flashy. Gemini Extensions for Google Calendar, Keep, and Tasks are coming:

…you’ll be able to do things like snap a picture of your child’s school syllabus and ask Gemini to create a calendar entry for each assignment, or take a photo of a new recipe and add it to your Keep as a shopping list.

Gemini Advanced is also getting an immersive travel planner. Meanwhile, one of the agentive experiences Google chose to show began with taking a picture of a shoe you bought in order to start the return process. The other was tasking Gemini with helping you move to a new city and making all the necessary changes.

Something that can help you navigate the world certainly seems flashier and more impressive than an assistant that can traverse your phone, though Google could very well be working on the latter too. Android users might prefer a revamped phone assistant, and it remains to be seen how useful something like Astra is, but you can’t accuse Google of not swinging for the fences.

The stated goal of Project Astra is to build a “universal AI agent that can be truly helpful in everyday life.” Camera input that provides a live view of the world genuinely addresses a problem that has existed since Google’s founding: some questions and their real-world context cannot easily be put into words for a text query.

