Google has unveiled a large-scale foundation world model called Genie 2 that it says is capable of “generating an endless variety of action-controllable, playable 3D environments.”
Genie 2 is the brainchild of the company’s AI research division, DeepMind, and can purportedly create playable 3D worlds “based on a single prompt image.”
It’s billed as a seismic leap forward when compared to Genie 1, which introduced an approach for generating 2D worlds, but what are we actually looking at here?
After a fleeting glance, it’s tempting to proclaim Google has successfully created a model capable of turning rudimentary prompts into expansive virtual realms. The company itself is eager to suggest as much, writing that Genie 2 can “generate a vast diversity of rich 3D worlds” with emergent capabilities such as “object interactions, complex character animation, physics, and the ability to model and thus predict the behavior of other agents.”
“Below (as pictured) are example videos of people interacting with Genie 2. For every example, the model is prompted with a single image generated by Imagen 3, GDM’s state-of-the-art text-to-image model,” says the company.
“This means anyone can describe a world they want in text, select their favorite rendering of that idea, and then step into and interact with that newly created world (or have an AI agent be trained or evaluated in it). At each step, a person or agent provides a keyboard and mouse action, and Genie 2 simulates the next observation. Genie 2 can generate consistent worlds for up to a minute, with the majority of examples shown lasting 10-20s.”
Image via Google
The examples selected by Google to showcase its new model are uniform in their sterility. Character models wander vapid interpretations of forests and deserts, scarcely populated with structures and foliage looped on repeat. Pyramids that bend and warp as the camera swings overhead. A forest of trees echoing ad infinitum.
Emergent interactions take the form of basic movements and character animations that flatter to deceive. The short clips shared by Google (which presumably elected to showcase only the finest Genie 2 outputs) feature bland facsimiles of vaguely familiar video game scenarios that quickly fragment and decay when placed under any sort of scrutiny.
“Genie 2 generates new plausible content on the fly and maintains a consistent world for up to a minute,” states Google. This sentence is not the ringing endorsement the company thinks it is.
Google and other AI companies fail to understand the act of creation
Game developers (and creatives cut from any cloth) do not aspire to create “plausible content.” People (and I’m talking about the humans working on your favorite games, not their corporate employers) create to delight, inspire, entertain.
We create to connect with each other and unpack our shared human condition. To create anything is to send fragments of yourself into the eternal aether, hoping that someone, somewhere, will find meaning in the deliberate, unpredictable, collaborative mess that art so often becomes.
Google evidently doesn’t understand how developers make games, either. There’s intentionality behind every single decision, whether it’s designing an open world in service of pacing or mechanics, or baking your narrative into the very fabric of the environment. These are not elements you can simply conjure into being with a string of text or an AI-generated image.
Genie 2 is a blunt instrument lacking precision or purpose. Google has spent time and resources creating a tool that can scarcely reproduce the most cliched examples of video game guff. A blurry neon city. Fantasy man on horse. Now, it must convince us the crumbling husks churned out by Genie 2 are worthy of our time and attention. Let’s be clear: they are not.
Right now, they’re barely even viable. Google itself has admitted the model can only maintain consistent worlds for up to a minute. It’s like watching the memory of a memory swirling the cosmic drain. There is only degradation and collapse, presumably at great real-world cost (via The Guardian).
Who is all this for? Based on what Google has showcased here, players dreaming of eventually blinking their own games into reality with the help of AI tools will quickly realize their ‘creations’ are devoid of both style and substance. Two aspects of video games that are unmistakably human.
Developers, meanwhile, have already shown how existing technologies such as procedural generation can be used to make entire galaxies corporeal. Robust asset stores exist for those in need of pre-made building blocks to assist with rapid prototyping or full-blown production. Engines like Unity, Unreal, Godot, and GameMaker continue to cater to the needs of developers at all levels, while some might choose to build their own from the ground up.
If AI tools like Genie 2 are the solution, what is the problem?
In an era of layoffs, studio closures, financial upheaval, and mass conglomeration, the biggest challenge facing developers is finding the resources needed to unlock their creativity in a way that feels sustainable. In an industry that’s becoming increasingly risk-averse and insular at the top level, developers need those with the deepest wallets to start taking chances on new ideas instead of trying (and so often failing) to replicate the successes of their competitors.
Forget about churning out plausible content. Give developers what they need to pull their most abstract, ambitious ideas into reality. If you really want innovation, invest in people.