Yesterday, we shared with you a preview of what you can do with Google’s new Gemini-powered “Ask This Page” feature, which was announced at I/O 2024. Today we’re getting our hands on another upcoming “Ask This…” feature, the one that works on YouTube videos.
Just like yesterday, this is an early hands-on preview with Ask This Video. The feature is not live yet, but Android Authority managed to activate it in the Google app. So, while we tried to push it a bit and see what it can do and where it might fail, there could still be room for improvement before Google launches it to the public.
Gemini Ask This Video: What it is and how it works
Rita El Khoury / Android Authority
Ask This Video is an upcoming Gemini-powered generative AI feature that helps you ask questions about any YouTube video you’re currently watching. Instead of scrubbing and skipping through different parts of that video to find a specific bit of information, you’ll be able to query Gemini and it’ll try to find the answer in that video, without coloring outside the lines. In theory, this should be a big time-saver if you’re looking for a specific piece of information in a YouTube video and you don’t want to waste time trying to find it.
To activate Ask This Video, you just tap and hold the power button to pull up Gemini on your Android phone while watching a YouTube video. Gemini is context-aware now, so it’ll know you triggered it in YouTube and surface an “Ask this video” chip on top of the pop-up menu. See the image above for reference.
Tap that and you’ll notice that Gemini has now attached the video to the pop-up, so you can start typing questions in your natural language and Google’s AI will try to find answers. It takes about 6-8 seconds for Gemini to process the request and come back with an answer.
Ask This Video understands nuance sometimes
In the example above, you can see we asked Gemini about Android Authority’s “Pixel 8a is here, but why” video, where my colleague C. Scott Brown argued that the Pixel 8a is a good phone, but its value and competitiveness are diminished by the better and frequently discounted Pixel 8. But suppose you haven’t watched that video and you need to know what’s wrong with the phone in a few words to see if this is worth watching (spoiler: it is good content). You could do what we did and check with Gemini to see what’s wrong or bad about the Pixel 8a. And I think it pretty much nailed the nuance of C. Scott’s argument.
In the next example above, I asked it for the differences between the Nothing Ear and Ear (a) in my video. It didn’t list every single difference, but focused on the biggest ones and synthesized the most important bits. In the video, I mention these features and differentiating factors in several places, but not in succession, so once again, it understood that and didn’t make any mistakes in its summary. The answer is incomplete, though, in my opinion, as there are other factors to consider between the two earbud models. But for an early AI version, I’ll consider this a win. (Such is the state of AI summaries now that an accurate answer is counted as a win, even if it’s incomplete.)
Ask This Video can find an answer faster than you can say skip
I think the most impressive part of Ask This Video is how easily it can answer a pressing question, without you having to watch the whole video to unearth it. It’s not perfect yet, but in the case of my hands-on with Chipolo’s new Find My Device trackers, it correctly answered that you don’t need a separate app to use the trackers, and in Carlos Ribeiro’s fast-charging myths and truths video, it nailed his recommendation of sticking with 100W cables to keep your gear future-proof.
Ask This Video has the potential to become a genuinely useful feature when skimming videos and looking for answers. Speaking from personal experience, YouTube has become my go-to resource now for specific tutorials and how-tos (I find that the quality there is better than the hundreds of random SEO-targeted written articles), but it’s usually tough to find the exact piece of information I’m looking for in a lengthy video. I used to turn to YouTube’s video transcripts and search for specific keywords in them to quickly find my answer. Gemini should be much faster and more practical than that trick.
Google still has to fine-tune Ask This Video
As with everything AI, and specifically Google AI, things aren’t 100% perfect just yet. We didn’t try to “red team” Ask This Video; we just went for regular tech videos and questions. I’m sure when this feature goes live and people start pushing it to its limits, they could make it give bad, weird, and potentially unacceptable answers.
Going back to our tests, we ran across a couple of instances where Ask This Video wasn’t 100% spot on. In the first example above, we asked it whether the Pixel 8a was powerful and whether there was a better phone, based on my Pixel 8a tests video. The first time it answered, it only used the first half of the video where I compared the 8a against the Pixel 7a and 8, which resulted in a glowing answer in favor of the new phone.
None of that was technically wrong, but it wasn’t the full picture. Since we know that the second half of the video looks at the competition, we tried to rephrase the question to nudge it in the right direction, and that’s where it told us that the OnePlus 12R is a more powerful phone in the same price range.
The problem is that random viewers won’t have this kind of context, so they might take the first answer at face value and not realize that the video went into a different set of comparisons later and that there’s a more capable phone for the same price. This is the kind of context that I’m afraid AI summaries will miss again and again, until they get better at it. As someone who’s only recently become a YouTuber, I’ve seen so many depressing comments from people who didn’t watch my videos and jumped on a word in the title or the intro without seeing the nuance, and I fear these kinds of incomplete or wrong AI answers will create more situations like that where we’ll be blamed for the AI’s failure to summarize or synthesize something correctly.
The final example is the one where Gemini veered off-track. We asked it about the best analog options among my 10 favorite watch faces for the Pixel and Galaxy Watch and it returned three options. Only one — Nothing Fancy — is correct. Sport XR is a digital watch face and I even say that in the video when I introduce it. Material Stack is also a digital design, though I don’t mention it explicitly. Meanwhile, Gemini failed to find the option that is simply and obviously called “Analogue watch face.” It also missed “Typograph,” another watch face that I explicitly mention as having an analog design.
Let’s face it, though, this is not as dire as those terrible AI results in Google search, but if this kind of simple error can occur with watch faces, then who’s to say what can happen with more nuanced and complicated videos?
We kept our focus on tech in these early tests, but there’s a bit of everything on YouTube, from politics to social issues, cooking tutorials, sports highlights, and more. Even though Google has this ever-present “Gemini may display inaccurate info, including about people, so double-check its responses” notice at the bottom of the pop-up, we all know that most people will eventually just rely on the answer they’re getting. Errors in answers can be very detrimental, both to the viewer and the video creator, as more and more people start relying on Gemini and trusting it with their everyday queries.
Personally, I’m not a fan of this “move fast, break things, and ask for forgiveness later” approach with AI. I would have preferred if Google tested it more and waited for it to mature before throwing it out in the world. But investors and money speak, not users like you and me, so once again, this is another discussion for another day.