Monday, December 23, 2024

Everything Google Just Announced at I/O 2024 – Video

Speaker 1: Welcome to Google I/O. It’s great to have all of you with us. More than 1.5 million developers use Gemini models across our tools. You’re using it to debug code, get new insights, and build the next generation of AI applications. We’ve also been bringing Gemini’s breakthrough capabilities across our products in powerful ways. We’ll show examples today across Search, Photos, Workspace, Android, and more.

Speaker 2: Today we [00:00:30] have some exciting new progress to share about the future of AI assistants that we’re calling Project Astra. Building on our Gemini model, we’ve developed agents that can process information faster by continuously encoding video frames, combining the video and speech input into a timeline of events, and caching this for efficient recall.

Speaker 3: Tell me when you see something that makes sound.

Speaker 4: I see a speaker which makes sound.

Speaker 3: Do you remember where you saw my glasses?

Speaker 4: [00:01:00] Yes, I do. Your glasses were on the desk near a red apple.

Speaker 3: What can I add here to make this system faster?

Speaker 5: Adding a cache between the server and database could improve speed.

Speaker 3: What does this remind you of?

Speaker 5: Schrödinger’s cat.
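
Editor's note: the suggestion in the demo above, adding a cache between the server and the database, can be illustrated with a minimal Python sketch. Nothing here comes from the keynote itself. The class name `ReadThroughCache`, the `slow_database_lookup` function, and the time-to-live value are all hypothetical, and a production system would more likely use a dedicated cache service than an in-process dictionary.

```python
import time
from typing import Callable, Dict, Tuple


class ReadThroughCache:
    """In-memory cache that sits in front of a slower lookup function.

    Hypothetical illustration only: entries expire after `ttl_seconds`.
    """

    def __init__(
        self,
        load: Callable[[str], str],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load          # the slow path, e.g. a database query
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[float, str]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            # Fresh entry: answer from memory and skip the database.
            self.hits += 1
            return entry[1]
        # Missing or expired: fall through to the database, then remember it.
        self.misses += 1
        value = self._load(key)
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Drop a key, for example after the underlying row is updated."""
        self._entries.pop(key, None)


def slow_database_lookup(key: str) -> str:
    """Hypothetical stand-in for a real database query."""
    time.sleep(0.01)  # simulate network and query latency
    return f"row-for-{key}"


if __name__ == "__main__":
    cache = ReadThroughCache(slow_database_lookup, ttl_seconds=30.0)
    cache.get("user:42")  # first read goes to the "database"
    cache.get("user:42")  # second read is served from memory
    print(f"hits={cache.hits} misses={cache.misses}")
```

The speed-up comes from repeated reads of the same key skipping the slow lookup. The trade-off is staleness, which is why the sketch includes both an expiry time and an explicit `invalidate` method.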

Speaker 2: Today I’m excited to announce our newest, most [00:01:30] capable generative video model, called Veo. Veo creates high-quality 1080p videos from text, image, and video prompts. It can capture the details of your instructions in different visual and cinematic styles. You can prompt for things like aerial shots of a landscape or a time-lapse, and further edit your videos using additional prompts. You can use Veo in our new experimental tool called VideoFX. [00:02:00] We’re exploring features like storyboarding and generating longer scenes. Veo gives you unprecedented creative control.

Speaker 6: Core technology is Google DeepMind’s generative video model that has been trained to convert input text into output video.

Speaker 7: It looks good.

Speaker 6: We are able to bring ideas to life that were otherwise not possible. We can visualize things on a timescale that’s 10 or 100 times faster than before.

Speaker 1: Today [00:02:30] we are excited to announce the sixth generation of TPUs called Trillium.

Speaker 1: Trillium delivers a 4.7x improvement in compute performance per chip over the previous generation. It’s our most efficient and performant TPU to date. We’ll make Trillium available to our cloud customers in late 2024. Alongside our TPUs, we are proud to offer CPUs and [00:03:00] GPUs to support any workload. That includes the new Axion processors we announced last month, our first custom Arm-based CPU with industry-leading performance and energy efficiency. We are also proud to be one of the first cloud providers to offer Nvidia’s cutting-edge Blackwell GPUs, available in early 2025. One of the most exciting transformations with Gemini has been in Google Search. In the past year, we’ve answered billions of queries as part [00:03:30] of our Search Generative Experience. People are using it to search in entirely new ways and asking new types of questions, longer and more complex queries, even searching with photos, and getting back the best the web has to offer. We have been testing this experience outside of Labs, and we are encouraged to see not only an increase in search usage, but also an increase in user satisfaction. I’m excited to announce that we will [00:04:00] begin launching this fully revamped experience, AI Overviews, to everyone in the US this week, and we’ll bring it to more countries soon.

Speaker 8: Say you’re heading to Dallas to celebrate your anniversary and you’re looking for the perfect restaurant. What you get here breaks AI out of the box and brings it to the whole page. Our Gemini model uncovers the most interesting angles for you to explore and organizes these results into these helpful clusters. Like, you might [00:04:30] never have considered restaurants with live music or ones with historic charm. Our model even uses contextual factors like the time of the year. So since it’s warm in Dallas, you can get rooftop patios as an idea, and it pulls everything together into a dynamic whole-page experience. You’ll start to see this new AI-organized search results page when you look for inspiration, starting with dining and recipes and coming to movies, music, [00:05:00] books, hotels, shopping, and more.

Speaker 9: I’m going to take a video and ask Google, “Why will this not stay in place?” And in a near instant, Google gives me an AI Overview. I get some reasons this might be happening and steps I can take to troubleshoot. So it looks like, first, this is called a tonearm. Very helpful. And it looks like it may be unbalanced, and there are some really helpful steps here. And I [00:05:30] love that because I’m new to all this. I can check out this helpful link from Audio-Technica to learn even more.

Speaker 10: And this summer you can have an in-depth conversation with Gemini using your voice. We’re calling this new experience Live. Using Google’s latest speech models, Gemini can better understand you and answer naturally. You can even interrupt while Gemini is responding, and it will adapt to your speech patterns. And this is just the beginning. [00:06:00] We’re excited to bring the speed gains and video understanding capabilities from Project Astra to the Gemini app. When you go Live, you’ll be able to open your camera so Gemini can see what you see and respond to your surroundings in real time. Now the way I use Gemini isn’t the way you use Gemini, so we’re rolling out a new feature that lets you customize it for your own needs and create personal experts on any topic [00:06:30] you want. We’re calling these Gems. They’re really simple to set up. Just tap to create a Gem, write your instructions once, and come back whenever you need it.

Speaker 11: We’ve embarked on a multi-year journey to reimagine Android with AI at the core, and it starts with three breakthroughs you’ll see this year. First, we’re putting AI-powered search right at your fingertips, creating entirely new ways to get the answers you need. [00:07:00] Second, Gemini is becoming your new AI assistant on Android, there to help you anytime. And third, we’re harnessing on-device AI to unlock new experiences that work as fast as you do while keeping your sensitive data private. One thing we’ve heard from students is that they’re doing more of their schoolwork directly on their phones and tablets. So we thought, could Circle to Search be your perfect [00:07:30] study buddy? Let’s say my son needs help with a tricky physics word problem like this one. My first thought is, oh boy, it’s been a while since I’ve thought about kinematics. If he’s stumped on this question, instead of putting me on the spot, he can circle the exact part he’s stuck on and get step-by-step instructions right where he is already doing the work.

Speaker 12: Now we’re making Gemini context-aware so it can anticipate what you’re trying to [00:08:00] do and provide more helpful suggestions in the moment. In other words, to be a more helpful assistant. So let me show you how this works, and I have my shiny new Pixel 8a here to help me.

Speaker 12: So my friend Pete is asking if I want to play pickleball this weekend, and I know how to play tennis (I have to say that for the demo), but I’m new to this pickleball thing. So I’m going to reply and try to be funny, and I’ll say, “Is that [00:08:30] like tennis, but with pickles?” This would actually be a lot funnier with a meme, so let me bring up Gemini to help with that, and I’ll say, “Create image of tennis with pickles.” Now one thing you’ll notice is that the Gemini window now hovers in place above the app so that I stay in the flow. Okay, so that generated some pretty good images. What’s nice is I can then drag and drop any of these directly into the Messages app below. [00:09:00] And now I can ask specific questions about the video. So for example, “What is,” can’t type, “the two-bounce rule?”, because that’s something that I’ve heard about but don’t quite understand in the game.

Speaker 12: By the way, this uses signals like YouTube’s captions, which means you can use it on billions of videos. So give it a moment, and there, I get a nice succinct answer: the ball must bounce once on each side of the court after a serve. So instead of trawling [00:09:30] through this entire document, I can pull up Gemini to help. And again, Gemini anticipates what I need and offers me an “Ask this PDF” option. So if I tap on that, Gemini now ingests all of the rules to become a pickleball expert. And that means I can ask very esoteric questions like, for example, “Are spin serves allowed?” And there you have it. It turns out, nope, spin serves are not allowed. [00:10:00] So Gemini not only gives me a clear answer to my question, it also shows me exactly where on the PDF to learn more. Building Google AI directly into the OS elevates the entire smartphone experience, and Android is the first mobile operating system to include a built-in, on-device foundation model. This lets us bring Gemini goodness from the data center right into your pocket, so the experience is faster while also protecting your privacy. Starting with Pixel [00:10:30] later this year, we’ll be expanding what’s possible with our latest model, Gemini Nano with Multimodality. This means your phone can understand the world the way you understand it, so not just through text input, but also through sights, sounds, and spoken language.

Speaker 1: Before we wrap, I have a feeling that someone out there might be counting how many times we have mentioned AI today, and since [00:11:00] the big theme today has been letting Google do the work for you, we went ahead and counted it so that you don’t have to. That might be a record in how many times someone has said AI.
