As usual, Google I/O 2024 is an absolute whirlwind of news and announcements. Rather than focusing on hardware, Android, or Chrome, Google spent most of this year’s developer conference convincing us that its AI features are worth prioritizing. One of those projects is Project Astra, a multimodal AI assistant you can semi-converse with while it uses the camera to identify objects and people.
I say “semi” because it was evident after the demo that this part of Gemini is in its infancy. I spent a few brief minutes with Project Astra on the Pixel 8 Pro to see how it works in real time. I didn’t have enough time to test it to its full extent or try to trick it, but I got a sense of what the future might feel like as an Android user.
Ask it almost anything
The point of Project Astra is to be like an assistant who also guides you in the real world. It can answer questions about the environment around you by identifying objects, faces, moods, and textiles. It can even help you remember where you last placed something.
There were four different demonstrations to choose from for Project Astra. They included Storyteller mode, which asks Gemini to concoct a story based on various inputs, and Pictionary, essentially a game of guess-the-doodle with the computer. There was also an alliteration mode, where the AI showed off its prowess at finding words with the same starting letter, and Free-Form, which let you chat back and forth.
The demo I got was a version of Free-Form on the Pixel 8 Pro. Another journalist in my group had requested it outright, so most of our demonstration focused on using the device and this assistant-like mode together.
With the camera pointed at another journalist, the Pixel 8 Pro and Gemini could identify that the subject was a person (we explicitly told it that the person identified as a man). Then, it correctly identified that he was carrying his phone. In a follow-up question, our group asked about his clothes. It gave a generalized answer that “he seems to be wearing casual clothing.” Then, we asked what he was doing, to which Project Astra answered that it appeared he was putting on a pair of sunglasses (he was) and striking a casual pose.
I took hold of the Pixel 8 Pro for a quick minute. I got Gemini to correctly identify a pot of faux flowers. They were tulips. Gemini noticed they were also colorful. From there, I wasn’t sure what else to prompt it with, and then my time was up. I left with more questions than I had going in.
With Google’s AI, it seems like a leap of faith. I can see how identifying a person and their actions could be an accessibility tool to aid someone who is blind or has low vision as they navigate the world around them. But that’s not what this demonstration was about. It was meant to showcase the capabilities of Project Astra and how we’ll interact with it.
My biggest question is: Will something like Project Astra replace Google Assistant on Android devices? After all, this AI can remember where you put your stuff and pick up on nuance, or at least that’s what the demo conveyed. I couldn’t get an answer from the few Google folks I did ask. But I have a strong inkling that the future of Android will be less about tapping to interact with the phone and more about talking to it.