Google DeepMind announced Monday (Jan. 6) that it is creating a new team to work on "massive" generative models that would "simulate the world." These models represent the next stage of advancement for artificial intelligence (AI) in decision-making, planning and creativity.
World models are computational frameworks that help AI systems understand and simulate the real or virtual world. They are key to teaching AI systems to navigate an environment and have widespread applications in robotics, gaming and autonomous systems.
For example, autonomous vehicles use world models to simulate traffic and road conditions. World models can also be used to train generalist AI robots in different environments. A common problem in training so-called embodied AI is the lack of rich, diverse and safe training environments, a gap that world models can help fill.
DeepMind’s job posting on Monday said that scaling AI models is also important to the tech’s evolution.
“We believe scaling pretraining on video and multimodal data is on the critical path to artificial general intelligence. World models will power numerous domains, such as visual reasoning and simulation, planning for embodied agents, and real-time interactive entertainment,” the job posting said. PYMNTS reached out to Google but has yet to receive a reply.
Tim Brooks, who left OpenAI in October to join Google DeepMind, will lead the team. At OpenAI, Brooks co-led the development of Sora, the company's video generation model, which went viral when it was unveiled because of its sophistication.
According to job listings for the team, the new hires will "collaborate with and build on" work from the teams behind Gemini, Google's flagship large multimodal model; Veo, its video generation model; and Genie, its world model.
Google DeepMind's focus on world models comes as AI startup World Labs, which came out of stealth last September, is also developing large world models. Led by Stanford AI pioneer Fei-Fei Li, the startup is funded by AI pioneer and Nobel laureate Geoffrey Hinton, Salesforce CEO Marc Benioff, LinkedIn co-founder Reid Hoffman, former Google Chairman Eric Schmidt, as well as Andreessen Horowitz, NEA, NVentures and others.
Google DeepMind has already developed several world models, including Genie and Genie 2. Genie 2 can turn text and image prompts into 3D worlds that react to a user's actions within that environment. (The original Genie created only 2D worlds.)
Genie 2 learns from a large video dataset. An autoencoder first compresses video frames into simpler, compact representations. A transformer model then uses these compressed frames to predict how the video should progress, one step at a time, in a way similar to how text-generating models like ChatGPT predict the next word.
As a result, Genie 2 can display object interactions, complex character animation, physics (such as gravity and splashing water) and the behavior of other agents. The worlds it creates can last up to a minute, though most last 10 to 20 seconds.
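The two-stage pipeline described above (compress frames with an autoencoder, then predict the next compressed frame step by step) can be illustrated with a toy Python sketch. This is not Genie 2's implementation: the "autoencoder" here is a hypothetical fixed average-pooling step, and a simple linear extrapolation stands in for the learned transformer. Only the autoregressive structure, in which each predicted representation is fed back in to predict the next one, mirrors the description.

```python
from typing import List

Frame = List[float]   # a flattened toy "video frame" of pixel intensities
Latent = List[float]  # its compressed representation

POOL = 4  # hypothetical compression factor: 4 pixels -> 1 latent value


def encode(frame: Frame) -> Latent:
    """Toy stand-in for the autoencoder's encoder: average-pool groups of POOL pixels."""
    return [sum(frame[i:i + POOL]) / POOL for i in range(0, len(frame), POOL)]


def decode(latent: Latent) -> Frame:
    """Toy stand-in for the decoder: repeat each latent value POOL times."""
    return [v for v in latent for _ in range(POOL)]


def predict_next(history: List[Latent]) -> Latent:
    """Stand-in for the transformer that predicts the next compressed frame.

    It only looks at past latents and linearly extrapolates from the last two;
    a real model would learn this mapping from video data.
    """
    if len(history) < 2:
        return list(history[-1])
    prev, last = history[-2], history[-1]
    return [2 * b - a for a, b in zip(prev, last)]


def rollout(context_frames: List[Frame], steps: int) -> List[Frame]:
    """Encode the context frames, then generate `steps` new frames one at a time."""
    latents = [encode(f) for f in context_frames]
    generated = []
    for _ in range(steps):
        nxt = predict_next(latents)
        latents.append(nxt)  # feed the prediction back in (autoregression)
        generated.append(decode(nxt))
    return generated


if __name__ == "__main__":
    # Two 8-pixel context frames in which brightness rises by 1.0 per frame.
    context = [[0.0] * 8, [1.0] * 8]
    for frame in rollout(context, steps=3):
        print(frame)
```

Running it prints three frames whose pixels are 2.0, 3.0 and 4.0, continuing the trend in the context frames. Predicting in the compressed space is the point of the design: here the predictor handles 2 values per frame instead of 8.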
Google DeepMind’s expanded focus on world models will further sharpen its AI systems’ capabilities as it competes with OpenAI, Meta, Microsoft and Amazon in serving enterprises.
The new team adds to Google DeepMind's long list of innovations, one of which, AlphaFold2, recently earned CEO Demis Hassabis and John M. Jumper a share of the 2024 Nobel Prize in Chemistry. AlphaFold2 is an AI model that predicted the structures of nearly all known proteins, solving a 50-year-old biochemistry challenge.
In a paper published in October, Google DeepMind researchers said they trained a large language model called the Habermas Machine to serve as an AI mediator that helped small U.K. groups find common ground on controversial issues such as Brexit or immigration. It did so by writing a “group statement” that captured their shared viewpoints.