Meta just announced a proprietary media-focused AI model called Movie Gen that can generate realistic video and audio clips.
The company shared a slew of 10-second clips generated with Movie Gen, including a Moo Deng-esque baby hippo swimming around, to demonstrate its capabilities. Although the tool is not yet available for use, the Movie Gen announcement comes shortly after Meta's Connect event, which showcased new and updated hardware and the latest version of its large language model, Llama 3.2.
Going beyond generating simple text-to-video clips, the Movie Gen model can make targeted edits to an existing clip, such as adding an object into someone’s hands or changing the appearance of a surface. In one of the example videos from Meta, a woman wearing a VR headset was transformed to look like she was wearing steampunk binoculars.
Movie Gen can also generate audio clips to accompany the videos. In the sample clips, an AI man stands near a waterfall with audible splashes and the hopeful sounds of a symphony; a sports car’s engine purrs and its tires screech as it moves around a track; and a snake slithers across the jungle floor, accompanied by blaring horns.
Meta shared additional details about Movie Gen in a research paper published on Friday. Movie Gen Video consists of 30 billion parameters, while Movie Gen Audio consists of 13 billion parameters. (A model’s parameter count roughly corresponds to how capable it is; for comparison, the largest Llama 3.1 variant has 405 billion parameters.) Movie Gen can produce high-definition videos up to 16 seconds long, and Meta claims it outperforms competing models in overall video quality.
Earlier this year, CEO Mark Zuckerberg demonstrated Meta AI’s Imagine Me feature, where users can upload a photo of themselves and role-play their face into multiple scenarios, by posting an AI image of himself drowning in gold chains on Threads. A video version of a similar feature is possible with the Movie Gen model – think of it as a kind of ElfYourself on steroids.
What information is Movie Gen trained on? The specifics are unclear in Meta’s announcement: “We trained these models on a combination of licensed and publicly available datasets.” The sources of training data and what is fair to scrape from the web remain a contentious issue for generative AI tools, and it is rarely publicly known what text, video, or audio clips were used to create any of the major models.