Can it hallucinate DOOM? Google Research and Tel Aviv University have simulated DOOM inside a neural model named GameNGen.
It’s been a big year for the “Can It Run DOOM” scene. We got DOOM running on fecal germs, and a mad scientist taught a lab-grown rat brain to play DOOM. But Google Research and Tel Aviv University have flipped the script with their GameNGen project: these researchers aren’t just running DOOM in an AI environment, they’re simulating DOOM without any traditional code, visual assets, or game engine. Metaphorically speaking, we now have a neural model that can “think” DOOM into existence.
The simulated DOOM is immediately recognizable. It runs in full color at 20 FPS and is powered by a single TPU (tensor processing unit). While this is not the first AI simulation of DOOM (and it isn’t currently playable by humans), it is by far the most impressive and accurate, and it does not require state-of-the-art hardware.
GameNGen was trained in a two-phase process. First, a reinforcement learning model (a reward-seeking AI, kind of like a lab rat) was taught to play DOOM. Its gaming sessions were recorded and fed to a diffusion model (an image-generating AI that, a bit like the predictive text on your smartphone keyboard, learns to predict what comes next; here, the next frame, given the previous frames and the player’s actions). The models were never exposed to DOOM’s source code or visual asset library.
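To make that two-phase flow concrete, here is a minimal sketch in Python (PyTorch). To be clear, this is not GameNGen’s code: the random stand-in policy, the toy convolutional predictor, and the 64x64 frame size are all assumptions chosen for illustration. The real system records the RL agent’s sessions and trains an augmented Stable Diffusion v1.4 to predict the next frame from the preceding frames and actions; the sketch only mirrors that data flow.

```python
# A minimal sketch of GameNGen's two-phase pipeline in PyTorch.
# Nothing here is the paper's actual code: the random policy, the toy
# convolutional predictor, and the 64x64 frames are stand-ins chosen
# purely for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

# Phase 1: an agent plays the game and its sessions are recorded.
# A random policy stands in for the trained reward-seeking RL agent,
# and random tensors stand in for frames rendered by the real game.
def collect_trajectory(num_steps=100, frame_shape=(3, 64, 64), num_actions=8):
    frames, actions = [], []
    for _ in range(num_steps):
        frames.append(torch.rand(frame_shape))                   # "rendered" frame
        actions.append(torch.randint(num_actions, (1,)).item())  # agent's move
    return torch.stack(frames), torch.tensor(actions)

# Phase 2: a generative model learns "past frames + action in, next frame out".
# GameNGen conditions an augmented Stable Diffusion v1.4 this way; the tiny
# conv net below only illustrates the interface, not the diffusion machinery.
class NextFramePredictor(nn.Module):
    def __init__(self, context=4, num_actions=8):
        super().__init__()
        self.action_emb = nn.Embedding(num_actions, 16)
        self.net = nn.Sequential(
            nn.Conv2d(3 * context + 16, 32, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, past_frames, action):
        b, _, _, h, w = past_frames.shape                 # (B, context, 3, H, W)
        x = past_frames.reshape(b, -1, h, w)              # stack context on channels
        a = self.action_emb(action).reshape(b, -1, 1, 1).expand(-1, -1, h, w)
        return self.net(torch.cat([x, a], dim=1))

frames, actions = collect_trajectory()
model = NextFramePredictor()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
context = 4

for t in range(context, len(frames) - 1):
    past = frames[t - context:t].unsqueeze(0)             # (1, 4, 3, 64, 64)
    action = actions[t].unsqueeze(0)                      # (1,)
    target = frames[t + 1].unsqueeze(0)                   # (1, 3, 64, 64)
    # Plain MSE stands in for the diffusion (denoising) objective.
    loss = F.mse_loss(model(past, action), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

One real-world wrinkle the sketch glosses over: at inference time, each generated frame becomes conditioning for the next one, so errors compound. The researchers counter this by adding noise to the context frames during training, teaching the model to correct its own drift.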
As the researchers put it: “A complex video game, the iconic game DOOM, can be run on a neural network (an augmented version of the open Stable Diffusion v1.4), in real-time, while achieving a visual quality comparable to that of the original game. While not an exact simulation, the neural model is able to perform complex game state updates, such as tallying health and ammo, attacking enemies, damaging objects, opening doors, and persist the game state over long trajectories.”
While the AI DOOM simulation is obviously very impressive, it’s not perfect. Several of those “complex game state updates,” like the health meter and enemy movement, are marred by the tell-tale visual artifacts we’ve come to associate with generative video. Objects turn into blurry blobs before snapping back into shape, sudden movement is often accompanied by smudgy weirdness, and the health meter restlessly flips between numbers as if it’s not sure whether the player has been attacked.
Still, GameNGen runs DOOM at a better quality and frame rate than most PCs did in the mid-90s. And this is without the elegant DOOM Engine (or any conventional game engine, for that matter). Google Research also found that, when viewing short clips of 1.6 and 3.2 seconds, human raters had a lot of trouble differentiating the fake DOOM from the real DOOM: they picked correctly only 58% to 60% of the time, barely better than a coin flip. The simulated image is often perfect; it just fails to be consistently perfect.
As for how this research will be used in the future—it’s anyone’s guess. Google Research and Tel Aviv University have proven that an interactive game can run within the paradigm of a neural model. But they did not create a game from scratch. The arduous process of simulating a game within a neural model has no practical or economic benefit as of 2024, so GameNGen is little more than a proof of concept. It certainly isn’t a product.
However, this research may lead to the development of a neural model that can generate unique games. If generative game development can be achieved at a lower cost than traditional game development (while also providing a fun experience for gamers), something like GameNGen could become a viable product. But training may prove to be the biggest hurdle here, as the AI would need a decent understanding of how games work (GameNGen appears to lean heavily on visual observations), and, importantly, it would need a massive dataset containing a diverse array of existing, copyrighted games.
While I’ve tried my best to explain this research, I suggest reading the “Diffusion Models Are Real-Time Game Engines” paper and visiting the GameNGen GitHub page.
Source: GameNGen