Monday, December 23, 2024

Doom Running on a Neural Network Is a Surreal Dreamscape

We’ve already seen the iconic 1993 video game Doom being played on devices ranging from a candy bar to a John Deere tractor to a Lego brick to E. coli cells.

Now, researchers at Google and Tel Aviv University have taken the viral trend even further, by using a generative AI model to run the game instead of a conventional video game engine.

The results are about as trippy as one would expect, as seen in a video shared by the researchers, with bad guys morphing in and out of existence and walls shifting unnervingly.

Visual weirdness aside, it’s still an impressively faithful rendition of the 1993 video game and a striking demonstration of the power of the tech.

“Can a neural model running in real-time simulate a complex game at high quality?” the researchers wrote in their yet-to-be-peer-reviewed paper. “In this work, we demonstrate that the answer is yes.”

“Specifically, we show that a complex video game, the iconic game Doom, can be run on a neural network,” they added.

Conventionally, video game engines react to user inputs and visually render the scene according to a manually programmed set of rules.

But by harnessing the power of diffusion models, the technique behind mainstream AI image generators like Stable Diffusion and DALL-E, the researchers found they could ditch that hand-coded approach and let an AI generate each frame instead.

Their new diffusion model, dubbed GameNGen, is based on Stable Diffusion’s open-source version 1.4 and was trained on 900 million frames taken from existing Doom gaming footage.

GameNGen produces the next frame depending on the user’s input, effectively acting as an illusory game engine.
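To make the idea of "producing the next frame depending on the user's input" concrete, here is a minimal sketch of that kind of loop. It is not the researchers' code: `rollout`, `toy_predict_next`, and the one-number "frame" are hypothetical stand-ins, with a trivial function playing the role of the diffusion model.

```python
from typing import Callable, List, Sequence, Tuple

Frame = Tuple[int, ...]  # toy stand-in for an image (here: one "pixel" position)
Action = str

def rollout(
    predict_next: Callable[[Sequence[Frame], Sequence[Action]], Frame],
    first_frame: Frame,
    actions: Sequence[Action],
) -> List[Frame]:
    """Generate frames one at a time, feeding each output back in as input."""
    frames: List[Frame] = [first_frame]
    taken: List[Action] = []
    for action in actions:
        taken.append(action)
        # The "engine" is just a prediction from past frames and past inputs.
        frames.append(predict_next(frames, taken))
    return frames

def toy_predict_next(frames: Sequence[Frame], actions: Sequence[Action]) -> Frame:
    """Hypothetical model: moves the pixel left or right based on the last input."""
    (x,) = frames[-1]
    step = {"left": -1, "right": 1}.get(actions[-1], 0)
    return (x + step,)

if __name__ == "__main__":
    print(rollout(toy_predict_next, (0,), ["right", "right", "left", "wait"]))
```

In this sketch there are no hand-written game rules in the loop itself. Everything the "game" does comes from whatever `predict_next` has learned to output.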

“While not an exact simulation, the neural model is able to perform complex game state updates, such as tallying health and ammo, attacking enemies, damaging objects, opening doors, and persist the game state over long trajectories,” the researchers wrote in their paper.

The researchers, however, admitted there were some clear limitations to their approach.

“The model only has access to a little over 3 seconds of history,” they wrote in their paper. As a result, objects like barrels and bad guys disappear and appear out of nowhere.
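A rough way to picture that limit is a fixed-size buffer of recent frames, where anything older simply falls out of view. The sketch below is an illustration only: the frame rate and buffer size are assumed values chosen to give a little over three seconds, not figures from the article, and `visible_history` is a hypothetical helper.

```python
from collections import deque
from typing import List

FPS = 20             # assumed frame rate, for illustration only
CONTEXT_FRAMES = 64  # assumed number of past frames the model can see

def seconds_of_history(context_frames: int = CONTEXT_FRAMES, fps: int = FPS) -> float:
    """How much play time fits in the model's view."""
    return context_frames / fps

def visible_history(num_frames_played: int, context_frames: int = CONTEXT_FRAMES) -> List[int]:
    """Return the indices of the frames still in view after playing for a while."""
    history: deque = deque(maxlen=context_frames)
    for frame_index in range(num_frames_played):
        history.append(frame_index)  # the oldest frame is dropped once the buffer is full
    return list(history)

if __name__ == "__main__":
    print(seconds_of_history())
    print(visible_history(100)[0])  # index of the oldest frame still visible
```

Under these assumptions, a barrel last seen on frame 10 is no longer anywhere in the model's input by frame 100, which is one way to understand why objects can vanish or reappear.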

Nonetheless, they found that the “game logic is persisted for drastically longer time horizons.”

“While some of the game state is persisted through screen pixels (e.g. ammo and health tallies, available weapons, etc.), the model likely learns strong heuristics that allow meaningful generalizations,” the paper reads.

The tech could open plenty of doors in the world of video game development, potentially lowering costs and making the development process more accessible. Games could even be written and edited as text, or created by feeding the AI sample images.

“For example, we might be able to convert a set of frames into a new playable level or create a new character just based on example images, without having to author code,” the team wrote.

“Today, video games are programmed by humans,” the researchers concluded. “GameNGen is a proof-of-concept for one part of a new paradigm where games are weights of a neural model, not lines of code.”

More on generative AI: Google’s AI Now Lets You Wildly Alter Photos Right in Your Camera App
