The race towards AGI is intensifying. After recently making Veo and Imagen 3 available on Google Cloud, Google DeepMind has introduced Genie 2, a large-scale foundation world model capable of generating a wide variety of playable 3D environments.
Genie 2 facilitates the development of embodied AI agents by transforming a single image into an interactive virtual world that humans or AI agents can explore using standard keyboard and mouse controls.
“Genie 2 could enable future agents to be trained and evaluated in a limitless curriculum of novel worlds. This research also paves the way for new, creative workflows for prototyping interactive experiences,” Google DeepMind said in its blog post.
“The world model is taking shape,” said Google DeepMind chief Demis Hassabis.
Building on its predecessor, Genie 1, which was limited to 2D environments, Genie 2 expands these capabilities into 3D. Its features include simulating physical interactions, modelling complex animations, and creating environments with realistic physics, lighting, and object interactions. Trained on a large-scale video dataset, Genie 2 employs an autoregressive latent diffusion model to generate frames sequentially in response to user actions.
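To make the frame-by-frame idea concrete, here is a minimal, heavily simplified sketch of an autoregressive latent-diffusion rollout: each new frame latent is produced by iteratively denoising random noise, conditioned on recent frames and the current user action. Genie 2's actual architecture has not been published in code form, so every name, dimension, and the toy "denoiser" below are illustrative assumptions, not DeepMind's implementation.

```python
# Conceptual sketch only (not Genie 2's code): autoregressive rollout where
# each frame latent is diffused from noise, conditioned on past frames and
# the current user action. All values are toy placeholders.
import numpy as np

LATENT_DIM = 16      # size of a frame's latent vector (toy value)
DIFFUSION_STEPS = 8  # denoising iterations per frame (toy value)

def toy_denoiser(noisy_latent, context, action, step):
    """Stand-in for a learned denoising network: nudges the noisy latent
    toward a blend of the recent-frame context and the action embedding."""
    target = 0.8 * context.mean(axis=0) + 0.2 * action
    alpha = (step + 1) / DIFFUSION_STEPS
    return (1 - alpha) * noisy_latent + alpha * target

def generate_next_frame(past_latents, action, rng):
    """One autoregressive step: denoise pure noise into the next frame latent."""
    latent = rng.normal(size=LATENT_DIM)        # start from random noise
    context = np.stack(past_latents[-4:])       # condition on recent frames
    for step in range(DIFFUSION_STEPS):
        latent = toy_denoiser(latent, context, action, step)
    return latent

rng = np.random.default_rng(0)
frames = [rng.normal(size=LATENT_DIM)]          # latent of the prompt image
for keypress in ["W", "W", "A", "SPACE"]:       # one user action per frame
    action_vec = rng.normal(size=LATENT_DIM)    # a real system would embed `keypress` here
    frames.append(generate_next_frame(frames, action_vec, rng))

print(f"Generated {len(frames) - 1} frame latents autoregressively.")
```

In the real model, the denoiser is a large learned network and the latents decode back into image frames; the loop above only illustrates the conditioning structure, namely past frames plus the current keyboard or mouse action.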
Google DeepMind said that games have been central to its AI research, from early experiments with Atari games to breakthroughs like AlphaGo and AlphaStar. However, training more general AI agents has been hindered by a lack of sufficiently rich and diverse training environments. Genie 2 addresses this gap, offering “long-horizon consistency” to simulate evolving scenarios and enabling agents to explore environments dynamically.
Google DeepMind showcased an AI agent named SIMA, which performed tasks in Genie 2-generated worlds by following natural-language instructions. The company said SIMA’s performance highlights the model’s ability to create unique testing environments for agents, demonstrating generalisation to unseen tasks.
Another key feature of Genie 2 is its capacity to generate new content on the fly, maintaining consistent worlds for up to a minute. The model can also generate diverse perspectives, such as first-person and isometric views, and simulate real-world environments using photos.
Google DeepMind has also launched GenCast, a new AI model that delivers faster and more accurate weather forecasts up to 15 days ahead while better quantifying weather uncertainties and risks.
Google recently launched its experimental AI model, Gemini-Exp-1121, which rivals OpenAI’s GPT-4o. The company is also preparing to launch Google Gemini 2, which is expected to compete with OpenAI’s upcoming model, o1.
In an exclusive interview with AIM, AI sceptic Gary Marcus said that DeepMind is likely on a better path towards AGI compared to its competitors. He, however, indicated that no company has yet found the definitive route to AGI. “Of the major companies working on this, DeepMind is most likely to be on the correct path,” he said.
“DeepMind has done the most interesting work in the field,” he said, adding that AlphaFold, although not as general-purpose as GPT-4o, demonstrates the successful application of neuro-symbolic AI in the domain of protein folding.
Christmas has come early for OpenAI
After celebrating ChatGPT’s second birthday, OpenAI has announced plans to release new models and features over the next twelve days. “12 days. 12 livestreams. A bunch of new things, big and small. 12 Days of OpenAI starts tomorrow,” the company shared on its official X account.
“🎄🧑‍🎄 Starting tomorrow at 10 am Pacific, we are doing 12 Days of OpenAI. Each weekday, we will have a livestream with a launch or demo, some big ones and some stocking stuffers. We’ve got some great stuff to share, hope you enjoy! Merry Christmas,” posted OpenAI chief Sam Altman on X.
ChatGPT has reached over 300 million weekly active users, according to Altman. The company plans to grow this figure to 1 billion users over the next year. Moreover, 1 billion user messages are sent on ChatGPT every day, and 1.3 million developers in the US have built on OpenAI’s platform, according to the company.
According to sources, OpenAI is likely to release its much-awaited video generation model, Sora, and the full version of its o1 reasoning model. The company could also finally launch its voice engine and demo its web browser. Recently, OpenAI hired Darin Fisher, one of the original creators of Google Chrome and a former VP of engineering at Google.
There are also reports that OpenAI is planning an agent called ‘Operator’, expected to be released in January 2025. Speculation suggests it would allow ChatGPT to view and control a user’s desktop as an agent.
This year, OpenAI launched several new features, including ChatGPT Search for real-time answers, a canvas interface for writing and coding, Advanced Voice with multiple accents, the cost-effective GPT-4o mini model, and the o1-preview and o1-mini reasoning models.
OpenAI co-founder and president Greg Brockman recently returned from a sabbatical. In a recent post on X, he revealed that during that time he was training DNA foundation models with the Arc Institute.
“I’ve long believed that deep learning should unlock unprecedented improvements in medicine and healthcare — not just for humans but for animals too. I think this work moves in that direction, as one fundamental step toward modeling virtual cells,” he posted.
In other news, Xiaohua Zhai, a long-time Google DeepMind researcher, revealed that he is joining OpenAI. “After an amazing decade at Google/DeepMind, I’m thrilled to announce that I’ll be joining OpenAI in a few weeks! I’m excited for the opportunity to co-build the OpenAI Zürich office,” he wrote on X. Zhai will join fellow Google DeepMind researchers Lucas Beyer and Alexander Kolesnikov, who are also moving to OpenAI, to establish and expand the Zürich office.
OpenAI recently hired Kate Rouch as its first chief marketing officer. “Her experience in both consumer and enterprise marketing will shape how we connect people to the transformative power of AI,” the company said.
Interestingly, in the last 18 months, OpenAI has poached 85 Google employees, half of whom are engineers. This reflects Altman’s preference for hiring veteran talent over a bunch of 24-year-old programmers.