Wednesday, December 18, 2024

Google Cloud launches Veo AI video generator model on Vertex

As Amazon takes a major step into the AI space with its new Nova family of foundation models, Google is doubling down on its own multimodal AI capabilities. The tech giant’s cloud division has announced that its latest video and image-generation models, Veo and Imagen 3, are now available on Vertex AI.

This move empowers teams to integrate cutting-edge video and image-generation capabilities into their AI workflows, unlocking diverse use cases—especially in marketing and advertising. It also makes Google Cloud the first hyperscaler to offer a video model to its customers. 

While the Veo model is currently in private preview, Imagen 3 will be generally available to all Vertex AI users starting next week. Notably, Imagen 3 also includes editing features, enabling users to refine generated images to meet specific creative needs.

What do Veo and Imagen 3 offer?

First unveiled at Google’s I/O developer conference, Veo is Google DeepMind’s response to competitors like Runway’s Gen-3 and OpenAI’s Sora, delivering a sophisticated video-generation experience. The model transforms text or image prompts into cinematic, high-definition videos in various visual styles, generating clips over 60 seconds long. What sets it apart is frame-level consistency, ensuring subjects move seamlessly within shots.

Imagen 3, also from DeepMind, takes on the task of text-to-image generation, producing photorealistic visuals in a variety of styles. Google claims it surpasses its predecessors in detail, lighting accuracy and artifact reduction.

Beyond generation, users on Google’s allowlist can also access advanced customization options with Imagen 3. These include image upscaling, inpainting, outpainting and background replacement—all guided by text prompts. Additionally, users can provide reference images, enabling Imagen 3 to create content aligned with specific brand aesthetics, logos or product features.

Broader implications for industry

Vertex AI has long been Google Cloud’s flagship platform for streamlining AI application development and deployment. By integrating Veo and Imagen 3, the platform offers organizations an even more comprehensive suite of tools to innovate in marketing, sales and beyond.

Imagen 3, for instance, simplifies the creation of high-quality assets such as product images and social media content, while Veo extends this capability by offering teams an option to convert these visuals into polished videos. This speeds up production, cuts costs, and accelerates prototyping, allowing teams to iterate rapidly on their creative strategies.

“Customers like Agoda are using the power of AI models like Veo, Gemini, and Imagen to streamline their video ad production, achieving a significant reduction in production time,” said Warren Barkley, senior director of product management at Google, in a blog post. He also highlighted that both models include safety features like digital watermarking and content moderation guardrails to mitigate risks associated with generative AI.

Other early adopters include Mondelez International—owner of brands such as Oreo, Cadbury, and Milka—and global marketing and communications service WPP. As Google’s foundation models expand their reach, businesses across industries have a powerful opportunity to reimagine how they create and deliver visual content. 

Competition continues to heat up

While all major cloud providers, including Google Cloud, Amazon Web Services and Microsoft Azure, have been providing image generation models on their respective AI orchestration platforms, video generation has been quite a rarity thus far. Google’s move to launch Veo in private preview today changes that. 

Interestingly, soon after the Veo announcement, AWS made a splash at re:Invent by unveiling Nova Reel, a foundation model that generates six-second, studio-quality videos from text and image prompts.

This model, along with others in the Nova family, is set to become available via Amazon Bedrock, the company’s fully managed service designed to simplify the creation and deployment of generative AI applications. 

Microsoft, for its part, appears to be lagging in this category at this stage. Its AI Foundry does not include models for video generation. However, we expect that to change as soon as OpenAI’s Sora hits the market.
