Google LLC’s cloud division today announced that Veo, Google’s artificial intelligence model that can generate lifelike video from text or images, will be available in private preview for customers via Vertex AI.
The company also said that Imagen 3, Google’s highest-quality text-to-image AI model, will be available to all Google Cloud customers globally on Vertex.
Vertex AI is a managed platform that allows users to train and deploy machine learning models and customize generative LLMs for use in AI-powered applications. It enables users to build, customize, test and experiment with AI models with several tools that cater to both highly technical and less savvy users. It includes access to many foundation models from Google and other sources.
Google unveiled Veo at its Google I/O conference this year. Developed by DeepMind, Google’s AI research division, it is the company’s most capable video generation model. It can create high-quality 1080p video at 24 or 30 frames per second in numerous cinematic and visual styles, and its clips can run beyond a minute in length.
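To put those figures in perspective, the short Python sketch below works out how many frames a one-minute clip contains at each frame rate. It is only back-of-the-envelope arithmetic, not anything taken from Google’s tooling: the function name is hypothetical, and the 1920x1080 frame size is an assumption about what “1080p” means here, since the announcement gives only the label.

```python
# Assumption: "1080p" is taken to mean a 1920x1080 frame.
# The announcement quotes only the "1080p" label, not exact dimensions.
WIDTH, HEIGHT = 1920, 1080


def frame_count(fps: int, seconds: int) -> int:
    """Return the number of frames in a clip at a constant frame rate."""
    return fps * seconds


if __name__ == "__main__":
    clip_seconds = 60  # a one-minute clip
    for fps in (24, 30):
        frames = frame_count(fps, clip_seconds)
        pixels = frames * WIDTH * HEIGHT
        print(f"{fps} fps: {frames} frames, {pixels:,} pixels generated")
```

Under these assumptions, a one-minute clip is 1,440 frames at 24 frames per second and 1,800 frames at 30, each of which has to stay visually consistent with the frames around it.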
“The model provides an unprecedented level of creative control and understands cinematic terms like ‘timelapse’ or ‘aerial shots of a landscape,’” Google wrote about the model in its initial announcement. “Veo creates footage that’s consistent and coherent, so people, animals and objects move realistically throughout shots.”
Veo can use its deep understanding of natural language to produce video from text alone, but users can also provide images to condition its outputs. When a user supplies a reference image along with a text prompt, Veo will generate a video based on the image and use the written text to animate it appropriately.
For example, a user could give the model an image of a dog looking at the camera and the text: “The dog gets up, wags his tail, looking at the camera, happy and full of life.” The model will produce a video with the dog hopping up, tail wagging joyfully, ears flapping, looking side-to-side as if expecting a treat.
In September, YouTube announced that users would be able to generate six-second Shorts clips and backgrounds for their videos with the Veo model using simple text prompts. The company said the animated video backgrounds would launch this year, while six-second video clips would arrive in 2025.
Imagen 3 publicly available on Vertex AI
Now publicly available to all customers, Imagen 3 is Google’s latest and best image generation foundation model that produces lifelike rendering from natural language prompts with numerous improvements over previous models.
Imagen 3 produces high-quality images with good lighting and composition based on conversational prompts. It has been upgraded to accurately render fine details such as wrinkles on a person’s hand or complex textures such as small pieces of grass or the cracks in a brick wall. It also has improved control over the placement of text in produced images. Text production by text-to-image models can be problematic as these models can sometimes generate gibberish or misunderstand prompts.
Additionally, Google said, new features will be coming to Imagen 3 for customers on the allowlist that will help companies edit and customize images. These features include user-friendly, prompt-based ways to change parts of images: inpainting, where a section of the image can be selected and modified, and outpainting, where the image can be expanded beyond its original borders. In each case, the user provides a description of what should appear in the area and the model will generate the appropriate change.
Imagen 3 also has new customization controls that allow users to guide the model with brand, style, logo, subject and product features when generating new images. For example, the user can provide the model with an image of a product, such as a dress, shoes or other fashion items, and ask it to generate people wearing them. The model will then create lifelike images of individuals with those products in the appropriate detail. This will make certain that brand, style and other marketable elements are respected, the company said.
“Our collaboration with Google has been instrumental in harnessing the power of generative AI, notably through Imagen 3, to revolutionize content production,” said Jon Halvorson, senior vice president of consumer experience and digital commerce of Mondelez International Inc., the snack food business formerly called Kraft Foods Inc. “This technology has enabled us to produce hundreds of thousands of customized assets, enhancing creative quality while significantly reducing both time to market and costs.”
Video frames and images produced by Veo and Imagen 3 are invisibly watermarked for safety by DeepMind’s SynthID, which helps decrease misinformation and misattribution. SynthID digital watermarks in AI-generated content will allow users to quickly identify if that content was produced by an AI model. This is becoming increasingly important as AI-produced images and video become more lifelike, customizable and mass-produced.
Images: SiliconANGLE/Google Imagen 3, Google