Monday, November 4, 2024

How Pixel Recorder is using Gemini Nano with Multimodality, which is ~2x larger than 1.0

Like the Screenshots app, Recorder is using Gemini Nano with Multimodality on the Pixel 9 series. Google shared more details about the integration earlier this week.

Broadly, the Recorder team credits last year’s addition of Gemini Nano-powered summaries as contributing to a “significant increase in app engagement and user retention overall.” Specifically, “users have been using the new AI-powered summarization feature averaging 2 to 5 times daily, and the number of overall saved recordings increased by 24%.”

On the Pixel 9 series, Recorder is using Gemini Nano with Multimodality, which allows for image and audio input in addition to text. The model is “significantly larger than the previous one” — specifically, “nearly twice as large” — as well as “more capable, accurate, and scalable.”

What that means for developers is the quality out-of-the-box doesn’t necessarily require fine tuning, which means more ease of use and supporting more creative use cases…

Google has yet to really detail Gemini Nano with Multimodality, though there was a mention at the keynote of how it’s “three times more capable and sophisticated” than the original on the Pixel 8 Pro. Last year’s technical report on the Gemini 1.0 family described two Gemini 1.0 Nano versions: “1.8B (Nano-1) and 3.25B (Nano-2) parameters, targeting low and high memory devices respectively.” We don’t know whether the new Multimodal version is part of the Gemini 1.5 family, or if its development is part of a different branch.

In any case, the model’s expanded token support lets Recorder “summarize much longer transcripts than before.” Another thing made possible by multimodality is the “inclusion of grammar as a new metric for assessing inference quality.”

Meanwhile, the Recorder team was able to build upon existing work to adopt Gemini Nano with Multimodality:

Integrating Gemini Nano with multimodality required another round of fine-tuning. However, Recorder developers were able to use the original Gemini Nano model’s fine-tuning dataset as a foundation, streamlining the development process.

Besides a Recorder app on the Pixel Watch 3 that transfers the audio file to the phone for transcription, Google is already working on “at least two more GenAI features that help people get time back.” These features are already being demoed internally for early feedback.
