Tuesday, November 5, 2024

Meta’s Movie Gen Makes Convincing AI Video Clips

Meta just announced its own media-focused AI model, called Movie Gen, that can be used to generate realistic video and audio clips.

The company shared multiple 10-second clips generated with Movie Gen, including a Moo Deng-esque baby hippo swimming around, to demonstrate the model’s capabilities. The tool is not yet available for use. The announcement comes shortly after the company’s Meta Connect event, which showcased new and refreshed hardware and the latest version of its large language model, Llama 3.2.

Going beyond the generation of straightforward text-to-video clips, the Movie Gen model can make targeted edits to an existing clip, like adding an object into someone’s hands or changing the appearance of a surface. In one of the example videos from Meta, a woman wearing a VR headset was transformed to look like she was wearing steampunk binoculars.

An AI-generated video made from the prompt “make me a painter.”

Courtesy of Meta

An AI-generated video made from the prompt “a woman DJ spins records. She is wearing a pink jacket and giant headphones. There is a cheetah next to the woman.”

Courtesy of Meta

Audio bites can be generated alongside the videos with Movie Gen. In the sample clips, an AI man stands near a waterfall with audible splashes and the hopeful sounds of a symphony; the engine of a sports car purrs and tires screech as it zips around the track; and a snake slides along the jungle floor, accompanied by suspenseful horns.

Meta shared some further details about Movie Gen in a research paper released Friday. Movie Gen Video consists of 30 billion parameters, while Movie Gen Audio consists of 13 billion parameters. (A model’s parameter count roughly corresponds to how capable it is; for comparison, the largest variant of Llama 3.1 has 405 billion parameters.) Movie Gen can produce high-definition videos up to 16 seconds long, and Meta claims that it outperforms competing models in overall video quality.

Earlier this year, CEO Mark Zuckerberg demonstrated Meta AI’s Imagine Me feature, where users can upload a photo of themselves and role-play their face into multiple scenarios, by posting an AI image of himself drowning in gold chains on Threads. A video version of a similar feature is possible with the Movie Gen model—think of it as a kind of ElfYourself on steroids.

What information has Movie Gen been trained on? The specifics aren’t clear in Meta’s announcement post: “We’ve trained these models on a combination of licensed and publicly available data sets.” The sources of training data and what’s fair to scrape from the web remain a contentious issue for generative AI tools, and it’s rarely public knowledge what text, video, or audio clips were used to create any of the major models.

It will be interesting to see how long it takes Meta to make Movie Gen broadly available. The announcement blog vaguely gestures at a “potential future release.” For comparison, OpenAI announced its AI video model, called Sora, earlier this year and has not yet made it available to the public or shared any upcoming release date (though WIRED did receive a few exclusive Sora clips from the company for an investigation into bias).

Considering Meta’s legacy as a social media company, it’s possible that tools powered by Movie Gen will eventually start popping up inside Facebook, Instagram, and WhatsApp. In September, competitor Google shared plans to make aspects of its Veo video model available to creators inside YouTube Shorts sometime next year.

While larger tech companies are still holding off on fully releasing video models to the public, you can experiment with AI video tools right now from smaller, up-and-coming startups, like Runway and Pika. Give Pikaffects a whirl if you’ve ever been curious what it would be like to see yourself cartoonishly crushed with a hydraulic press or suddenly melt into a puddle.