Saturday, November 23, 2024

What Gemini and Google AI features we’re waiting for

In the past year or so, Google has previewed a number of Gemini-branded and other AI features across its consumer-facing apps. Here’s everything that’s been announced and when each feature might be available.

Pixel

At the end of Made by Google 2023, a Zoom Enhance feature that “intelligently fills in the gaps between pixels and predicts fine details” was teased for the Pixel 8 Pro. Leveraging an on-device “custom generative AI image model,” Google pitched this as being useful when you forget to zoom.

It’s an incredible application of generative AI, opening up a range of possibilities for framing and editing your images. So the kind of zoom enhancement you used to see in science fiction— it’s right in the phone in your hand. 

In October, Google said this was “coming later.” After three Pixel Feature Drops, it has yet to arrive. It’s not clear if the model Google is referring to is Gemini Nano with multimodality. At this point, it might as well debut with the Pixel 9 Pro as that phone’s headlining photo feature. 

Google Home

In the Google Home app, generative AI will be used to summarize events into a “streamlined view of what happened recently.” This “quick and easy summary” will make use of bullet points, while you’ll also be able to conversationally “Ask about your home” to find video history clips and get automations. The “experimental features” are coming to Nest Aware subscribers in 2024.

Fitbit

Fitbit Labs will let Fitbit Premium users test and provide feedback on experimental AI capabilities. 

One such feature is a chatbot that lets you ask questions about your Fitbit data in a natural and conversational manner. This “personalized coaching” that takes into account fitness goals aims to generate “actionable messages and guidance,” with responses that can include custom charts. 

  • “For example, you could dig deeper into how many active zone minutes (AZMs) you get and the correlation with how restorative your sleep is.”
  • “…this model may be able to analyze variations in your sleep patterns and sleep quality, and then suggest recommendations on how you might change the intensity of your workout based on those insights.”

Behind the scenes, this is powered by a new Personal Health LLM from Fitbit and Google Research built on Gemini.

As of March, it’s coming “later this year” for a “limited number of Android users who are enrolled in the Fitbit Labs program in the Fitbit mobile app.” 

Google Photos

Ask Photos will let you ask questions about the images and videos in your library. Beyond finding pictures, it can draw out information and give you a text answer. Powered by Gemini, example queries include “Show me the best photo from each national park I’ve visited” and “What themes have we had for Lena’s birthday parties?” It can be used to “suggest top pictures” and create captions for them. Ask Photos is an “experimental feature that we’re starting to roll out soon,” with Google already teasing more capabilities in the future.

Gmail + Google Workspace

In Gmail for Android and iOS, you’ll find a Gemini button in the top-right corner that lets you bring up the mobile equivalent of a side panel to enter full prompts. Gmail is also getting Contextual Smart Replies that offer more customized, detailed, and nuanced suggestions. This will be rolling out to Workspace Labs in July.

At Cloud Next 2024 in April, Google also previewed a voice prompting capability for Help me write in mobile Gmail. Meanwhile, an “instant polish” feature will “convert rough notes to a complete email with one-click.”

On desktop web, the side panel is available in Gmail, Google Drive, and Docs/Sheets/Slides. Gemini is next coming to Google Chat to summarize conversations and answer questions.

Google Maps

Back in February, Google announced that Maps would be using LLMs to power an “Ask about” chatbot. You can use it to find places that match your prompt with support for follow-up questions. It’s powered by details about 250 million places and user-submitted photos, videos, and reviews. 

Chrome

Gemini Nano is coming to desktop Chrome to power browser features like “Help me write.” It should be available on most modern laptops and desktops.

Search

Besides launching AI Overviews, Google previewed a number of upcoming features that are first coming to Search Labs:

  • You will be able to take the Original AI Overview and make it “Simpler” (to just a few sentences) or “Break it down” (longer response).
  • Multi-step reasoning capabilities will let you ask a complex question in one go rather than breaking it up into multiple queries.
  • Meal and trip planning
  • AI-organized search results page
  • Video searches: Record a video and ask a question about it

Android 

Gemini Nano with Multimodality will launch on Pixel “later this year” and power features like on-device/offline TalkBack descriptions, and real-time scam alerts that listen to a call for telltale patterns. Google will share more details later this year. 

At I/O 2024, Google also previewed how Gemini on Android will soon be an overlay panel instead of opening a fullscreen UI to display results. Besides preserving context, this will let you drag-and-drop a generated image into a conversation. For Gemini Advanced subscribers, “Ask this video” and “Ask this PDF” buttons will see Gemini digest videos and documents, respectively. This is rolling out “over the next few months.” Furthermore, Dynamic Suggestions will use Gemini Nano with Multimodality to understand what’s on your screen:

For example, if you activate Gemini in a conversation talking about pickleball, suggestions might include “Find pickleball clubs near me” and “Pickleball rules for beginners.”

Another addition that will be particularly useful on mobile is the set of Gemini Extensions for Google Calendar, Tasks, and Keep. This will let you take a picture of a page with multiple upcoming dates that Gemini will be able to turn into Calendar events. In the coming months, a “Utilities” extension will let mobile Gemini access Android’s Clock app.

We’re also waiting for mobile Gemini to arrive on the Pixel Tablet this summer.

Gemini 

Gemini Live will let you have a two-way conversation with Gemini. To make the experience more natural, Gemini will return concise responses that you can interrupt to add new information or ask for clarification. You can choose from 10 different voices, with Google imagining Gemini Live as being helpful for interview prep or rehearsing a speech. It will be available in the “coming months” for Gemini Advanced members.

“Later this year,” Gemini Live will let you launch a live camera mode. Just point at something in the real world and ask a question about it. This is powered by Project Astra.

Gems are customized versions of Gemini that let you have a “gym buddy, sous chef, coding partner or creative writing guide.” Gemini Advanced users will be able to create custom ones, while all users will have access to pre-made Gems, like Learning Coach. 

Simply describe what you want your Gem to do and how you want it to respond — like “you’re my running coach, give me a daily running plan and be positive, upbeat and motivating.” Gemini will take those instructions and, with one click, enhance them to create a Gem that meets your specific needs.

Gemini Advanced users will also get an “immersive planner” that goes beyond just suggesting activities and actually factors in travel times and stops, as well as people’s interests, to create a detailed itinerary. Gemini will use flight/travel details in Gmail, Google Maps recommendations for food and museums near your hotel, and Search for other activities.
