Google Assistant was supposed to be Google’s future. The voice assistant was an early attempt at ambient computing that gave customers a one-on-one relationship with all of the company’s products. Sure, Google was directly responding to Amazon’s announcement of Alexa and the Echo, but it was also deploying years of research it had already conducted on natural language processing and machine learning. In many ways, the Assistant was uniquely suited to supercharge a voice-driven Google Search.
And yet, eight years after Google Assistant and the Google Home smart speaker were first announced at I/O 2016, voice assistants were nowhere to be found at I/O 2024, and the category has few improvements to show for its near-decade of existence. Gemini, the family of multimodal AI models integrated across Google’s products, is now the star of the show, and generative artificial intelligence is the bridge Google has to sell, if only to keep up with the clear lead OpenAI has with ChatGPT.
For a brief time, Google was on top, successfully making the case that the Assistant could be the default way people interacted with its services, and maybe even the entire web. But whether because of the limits of voice assistant technology or the company’s own wavering interest, Google is starting over again, and it’s not clear large language models have any more of a future.
Conversational Assistant
Back in 2016, Google CEO Sundar Pichai positioned Google Assistant as a response, first and foremost, to an increasing number of voice searches that were happening over mobile. Searches are easy to initiate from the home screen of a majority of Android phones, and Google had offered voice search for years at that point, so it’s not particularly surprising that they could be popular. In the context of Amazon Alexa, though, it’s easy to see why Google felt a certain amount of urgency.
Google Assistant is designed to be conversational, providing an ongoing, two-way dialogue between users and Google. Pichai said during a keynote that the goal was “building each user their own personal Google” powered by “the world’s best natural language processing.” In practice, that meant combining the awareness of Google Now with real-time information and actions. Google Now, Google’s previous AI-powered proactive assistant feature, could pull in details from your calendar, email, and location to remind you, at the right time, of things you might forget. Google Assistant can answer questions by leveraging Google Search and control media playing on mobile or smart home devices, and over time it gained access to a majority of Google’s services, giving it the ability to create calendar events, send messages, and add reminders.
It’s nothing revolutionary, but when it works right, using Google Assistant is far easier than whipping out your phone or sitting down at your computer to do the same thing. And Google made ambitious additions to Google Assistant’s capabilities over the years, including Duplex, its method for enabling Google Assistant to set appointments and make restaurant reservations on your behalf. Essentially, Google combined the language processing of Assistant with what amounts to robocalling software to contact physical businesses and do the talking for you. Duplex can feel a bit impersonal and rude to employees, but it does work as described.
Google’s first vessel for Google Assistant was the Google Home smart speaker. It looked a bit like an air freshener, but with interchangeable bases that could be swapped to match the aesthetics of your room and a touch surface at the top that could light up when it was listening to you. Funny-looking product aside, the smart speaker’s Google Assistant was quick and could “understand” commands better than Alexa or Apple’s Siri, even if you were in another room. And its integration with the Cast protocol Google introduced with the Chromecast made it simple to start playing music or videos on your speakers or TV with just your voice.
Google Home is one of the few Google products that is straightforward and works well, and that’s remained true as Google has released new models (even some with screens) and revamped the device under the Nest brand.
Voice Assistant Problems
The problem is, other than Duplex, Google Assistant hasn’t really advanced in the eight years it’s been around, particularly in the voice-only interactions available on the Nest Audio, the current version of Google’s speaker, which was released in 2020, four years ago at the time of writing.
Many of the advantages Google Assistant had when it came to recognizing your voice or allowing for more natural conversations with follow-up questions have been reproduced by Amazon for Alexa. The frequently speedier response time for simple tasks has similarly been copied with on-device processing powered by new custom chips on current Echos. The main advantage Google Assistant has is access to a user’s personal information stored in Gmail and other Google services. That might not be a very big differentiator when most of the things people currently use voice assistants for are playing music and setting timers.
Google is a big company, and its ability to launch a new project and immediately lose interest in it is well documented. But it’s not fair to describe Google Assistant in those terms, because it was regularly a part of new versions of Android and has been featured in one way or another in nearly every I/O keynote since its introduction. No, the slow petering out of Google Assistant could be better attributed to the technological dead end that is the existing class of AI voice assistants.
The voice assistants currently available, like Google Assistant, Siri, and Alexa, are “command-and-control systems,” The New York Times writes. They “can understand a finite list of questions and requests” and anything that falls outside of that list won’t be understood. That more constrained call and response is why even conversations with multiple back-and-forths can feel wooden and restrictive. Ultimately, these voice assistants are looking for the right output for your input and nothing else, regardless of how many services are connected or how many “skills” (Alexa plugins) or “actions” (Google parlance for the same on Assistant) they gain.
Generative artificial intelligence, powered primarily by large language models, aims to be far more flexible. A generative AI like Google’s Gemini will respond to just about anything you ask it in something akin to human speech, and it can be interrupted and redirected with new questions and inputs without having to start from scratch. In other words, it’s contextual in the way that Assistant tried to be but never managed to become. Gemini, at first blush, seems to be a much better option as a default AI assistant or voice assistant when paired with a smart speaker, which is likely why Amazon is already planning on using generative AI in its Echo devices.
Swapping One Unreliable Assistant for Another
So Google Assistant might have stalled out because it could only go so far at being the helpful, personal version of Google that the company wanted it to be. Google Gemini is already available on Android devices (and via the Google app on iOS), and this year’s I/O keynote made it clear that Android 15 and the Pixel 9 phones coming out in the fall will leverage Gemini wherever possible for a more natural, “intelligent,” and personal experience.
The main problem is that no one is really sure if Gemini is any more reliable than Google Assistant ever was. Gemini certainly sounds more intelligent and seems more capable, but can you trust it? When Google Assistant refused to do something, you could at least be confident that something had gone wrong and the voice assistant simply wasn’t able to do the job. Gemini’s mistakes aren’t even always recognizable as mistakes. It’ll confidently lie if you let it, and catching its missteps requires double-checking every answer it gives. Is a limited assistant worse than one that can actively mislead you? I’d argue that, no, it’s not, but if there’s a theme for 2024, it’s that tech’s biggest companies feel differently.