
How Ask Photos in Google Photos will work


Gemini-powered Ask Photos is coming to Google Photos this summer, and the company this week shared a bit more about how it works. 

The Google Research team says Ask Photos is a “powerful example of how Gemini models can act as agents via function calling and memory capabilities.” Beyond the on-stage announcement, sample queries Google has provided include:

  • “Show me the best photo from each national park I’ve visited.” 
  • “What themes have we had for Lena’s birthday parties?”

Your conversational query is “passed to an agent model that uses Gemini to determine the best retrieval augmented generation (RAG) tool for the task.”
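To make that pattern concrete, here is a minimal Python sketch of agent-style tool selection via function calling. Everything in it (the Tool registry, choose_tool, the placeholder search functions) is a hypothetical illustration, not Google's published code; in the real system, Gemini itself would make the choice.

```python
# Hypothetical sketch of function-calling tool selection; none of these
# names come from Google, they only illustrate the pattern described above.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Tool:
    name: str
    description: str          # what the agent model sees when choosing
    run: Callable[[str], List[str]]

def metadata_search(query: str) -> List[str]:
    # Placeholder: filter on structured fields (dates, places, albums, people).
    return ["IMG_0042"]

def vector_search(query: str) -> List[str]:
    # Placeholder: nearest-neighbor lookup over image/text embeddings.
    return ["IMG_0042", "VID_0007"]

TOOLS = {
    "metadata_search": Tool("metadata_search", "structured filters", metadata_search),
    "vector_search": Tool("vector_search", "semantic embedding match", vector_search),
}

def choose_tool(query: str) -> Tool:
    # In Ask Photos, Gemini would pick from the declared tools via function
    # calling; this keyword heuristic merely stands in for that model call.
    structured = any(w in query.lower() for w in ("album", "folder", "taken on"))
    return TOOLS["metadata_search" if structured else "vector_search"]

print(choose_tool("Show me the best photo from each national park I've visited.").name)
# -> vector_search
```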

Typically, the agent model begins by working out the user’s intent, then formulates a search through their photos using an updated vector-based retrieval system, which extends the already powerful metadata search built into Photos.

That system is better at understanding natural language concepts, like “a person smiling while riding a bike,” than keyword search. 
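To see why an embedding-based system outperforms keywords on a phrase like that, here is a toy sketch: photos and the query are mapped into the same vector space, and retrieval becomes a nearest-neighbor search. The vectors and photo IDs below are invented for illustration; a production system would use a learned image/text encoder.

```python
# Toy vector retrieval: the query and each photo share an embedding space,
# so a photo never tagged "smiling" or "bike" can still match the phrase.
import math

def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Made-up 3-d embeddings; real encoders produce hundreds of dimensions.
photo_embeddings = {
    "IMG_0042": [0.9, 0.1, 0.3],   # cyclist grinning on a trail
    "IMG_0107": [0.1, 0.8, 0.2],   # birthday cake close-up
}
query_embedding = [0.85, 0.15, 0.25]  # "a person smiling while riding a bike"

best = max(photo_embeddings, key=lambda p: cosine(query_embedding, photo_embeddings[p]))
print(best)  # IMG_0042
```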

An answer model then looks at the photos and videos returned by search. “Gemini’s long context window and multimodal capabilities” are leveraged to find the most relevant information. Beyond the visual content and any text, dates, locations, and other metadata are used.

Finally, the answer model crafts a helpful response grounded in the photos and videos it has studied.
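As a rough sketch of that grounding step, under assumed names, the retrieved items and their metadata could be packed into a single long-context prompt so the model answers only from what search actually returned. MediaItem and build_grounded_prompt are illustrative, not Google's implementation.

```python
# Illustrative grounding: pack retrieved media plus metadata into one prompt.
from dataclasses import dataclass
from typing import List

@dataclass
class MediaItem:
    media_id: str
    caption: str      # description derived from the visual content
    date: str
    location: str

def build_grounded_prompt(question: str, items: List[MediaItem]) -> str:
    evidence = "\n".join(
        f"- {m.media_id}: {m.caption} ({m.date}, {m.location})" for m in items
    )
    return (
        "Answer using only the media listed below; cite items by ID.\n"
        f"Question: {question}\n"
        f"Retrieved media:\n{evidence}"
    )

items = [
    MediaItem("IMG_0042", "smiling cyclist at an overlook", "2023-06-14", "Zion NP"),
    MediaItem("IMG_0311", "sunrise over Half Dome", "2022-08-02", "Yosemite NP"),
]
print(build_grounded_prompt(
    "Show me the best photo from each national park I've visited.", items))
```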

What’s interesting is that you can correct Ask Photos, and the app will remember that information for future conversations. In this regard, it’s more than a search feature and could be used like an assistant. You will be able to “view and manage remembered details at any time.”
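How that memory is structured is anyone's guess, but a deliberately simple sketch of user-manageable remembered details could look like the following. The Memory class and its methods are assumptions, not Google's API.

```python
# Minimal sketch of "remembered details" a user can view and manage.
class Memory:
    def __init__(self):
        self.remembered_details = {}

    def remember(self, key: str, value: str) -> None:
        # e.g. a correction: "the dog in these photos is Milo, not Max"
        self.remembered_details[key] = value

    def view(self) -> dict:
        # Mirrors the promise that users can view remembered details anytime.
        return dict(self.remembered_details)

    def forget(self, key: str) -> None:
        self.remembered_details.pop(key, None)

memory = Memory()
memory.remember("dog_name", "Milo")
print(memory.view())   # {'dog_name': 'Milo'}
memory.forget("dog_name")
```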

This experimental feature, which could be related to the rumored Project Ellmann, is rolling out over the coming months, and more capabilities are already being teased.
