Google DeepMind has unveiled Genie 3, a groundbreaking “world model” that can generate 3D environments in real time from a single image or text prompt. The company claims the new model not only expands possibilities for educational, creative, and gaming experiences but also represents a significant step toward Artificial General Intelligence (AGI), meaning AI that matches or surpasses human cognitive abilities.
Google’s Genie 3: A New World Model Paves the Way for Real-Time 3D Environments and General AI
According to The Black Box Lab, a business development agency, world models are algorithms that build an internal representation of an environment. This lets an AI system simulate events and predict future outcomes from that internal model. The goal is to replicate human reasoning processes and give machines a deeper understanding of their physical context.
Traditional generative models might learn from years of video footage that a basketball bounces without ever capturing why it does. A world model, by contrast, has a “basic understanding” of what causes the rebound, which lets it represent the phenomenon and anticipate the ball’s future movement more accurately.
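To make the distinction concrete, the following Python sketch hand-codes a tiny “world model” for the basketball example: an explicit internal state (height and velocity) plus a rule for advancing it, which is enough to roll the ball forward and predict where it will be. It only illustrates the concept and is not how Genie 3 works; large world models learn their dynamics from data rather than from written-out equations. The gravity value, restitution factor, and time step are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import List

GRAVITY = -9.81    # m/s^2, acting downward
RESTITUTION = 0.8  # assumed fraction of speed the ball keeps after a bounce


@dataclass
class BallState:
    height: float    # metres above the floor
    velocity: float  # metres per second, positive means moving up


def step(state: BallState, dt: float) -> BallState:
    """Advance the internal state by one time step (semi-implicit Euler)."""
    velocity = state.velocity + GRAVITY * dt
    height = state.height + velocity * dt
    if height < 0.0:
        # The ball reached the floor: clamp it there and reverse its
        # velocity, losing some energy in the bounce.
        height = 0.0
        velocity = -velocity * RESTITUTION
    return BallState(height, velocity)


def rollout(state: BallState, steps: int, dt: float = 0.01) -> List[BallState]:
    """Predict future states by repeatedly applying the transition rule."""
    trajectory = [state]
    for _ in range(steps):
        state = step(state, dt)
        trajectory.append(state)
    return trajectory


# Usage: drop the ball from 1 m and predict 2 seconds (200 steps) ahead.
future = rollout(BallState(height=1.0, velocity=0.0), steps=200)
```

Because the sketch keeps a state and a cause (gravity plus the bounce rule), it can predict bounces it has never “seen”; a model that had only memorized what bouncing looks like would have no such mechanism.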
Google’s AI Ambitions Accelerate with Genie 3
Google has been investing heavily in this area. Late last year, the company introduced Genie 2, a model capable of creating interactive worlds from images. In January, Google formed a dedicated team for world model development, led by Tim Brooks, formerly co-director of OpenAI’s Sora project.
Genie 3 is Google’s most significant advance in this field so far. It is the company’s first world model to support real-time interaction, and it offers marked improvements in consistency and realism over its predecessor.

The new algorithm automatically generates virtual 3D environments that users or AI agents can explore for “several minutes.”
Enhanced Realism and Interactive Capabilities
Simulations are rendered at 720p resolution and 24 frames per second. Genie 3 also supports “prompted world events”, meaning users can modify an environment through commands that, for example, change the weather or introduce new characters into a scene.
Google’s team highlights that one of Genie 3’s most significant upgrades is its ability to keep the physical details of a space consistent for roughly one minute. If a user leaves a scene and returns within that window, elements such as a parked car, a hanging picture, or writing on a whiteboard will still be there.
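The two figures quoted above, 24 frames per second and roughly one minute of memory, put a rough scale on the problem. The short Python calculation below shows the time available to produce each frame and how many frames fall inside a one-minute window. It treats the history as a plain sequence of frames, which is an assumption for illustration; how Genie 3 actually stores or compresses that history is not described here.

```python
FPS = 24             # frame rate quoted for Genie 3
MEMORY_SECONDS = 60  # approximate consistency window quoted for Genie 3


def frame_budget_ms(fps: int) -> float:
    """Wall-clock time available to produce one frame, in milliseconds."""
    return 1000.0 / fps


def frames_in_memory(fps: int, seconds: int) -> int:
    """Number of past frames that fall inside the memory window."""
    return fps * seconds


print(f"{frame_budget_ms(FPS):.1f} ms per frame")                    # 41.7 ms per frame
print(f"{frames_in_memory(FPS, MEMORY_SECONDS)} frames of history")  # 1440 frames of history
```

In other words, each new frame has to be produced in about 42 ms while staying consistent with up to roughly 1,440 frames’ worth of earlier output.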
Developers noted that achieving this level of real-time control and interactivity required significant technical advances. “During the autoregressive generation of each frame, the model must consider the prior trajectory, which accumulates over time,” they explained.
“For example, if a user returns to a location after a minute, the model must retrieve the corresponding information from a minute ago. To maintain real-time interactivity, this computation needs to be performed several times per second in response to new inputs.”
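The memory problem the developers describe can be illustrated with a toy autoregressive loop in Python. This is a hypothetical sketch, not Genie 3’s design: each new frame is produced from a history that grows by one entry per step, and revisiting a location means looking back through a bounded window of that history. `ToyWorldModel`, `next_frame`, the grid `Location`, the `edit` hook, and the strings standing in for pixels are all invented for illustration; a real model would condition on learned representations rather than perform an exact lookup.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Location = Tuple[int, int]  # hypothetical discrete position of the viewer


@dataclass
class Frame:
    index: int          # position in the generated sequence
    location: Location
    content: str        # stand-in for the pixels a real model would generate


@dataclass
class ToyWorldModel:
    # Assumed memory window: one minute of frames at 24 fps.
    memory_frames: int = 24 * 60
    history: List[Frame] = field(default_factory=list)

    def _recall(self, location: Location) -> Optional[Frame]:
        """Return the most recent remembered frame at `location`, if any."""
        start = max(0, len(self.history) - self.memory_frames)
        for frame in reversed(self.history[start:]):
            if frame.location == location:
                return frame
        return None

    def next_frame(self, location: Location, edit: Optional[str] = None) -> Frame:
        """Produce one frame conditioned on the accumulated history.

        `edit` is a hypothetical hook for a user change, such as writing
        on a whiteboard.
        """
        if edit is not None:
            content = edit
        else:
            remembered = self._recall(location)
            # Reuse what was seen here before; otherwise "generate" something new.
            if remembered is not None:
                content = remembered.content
            else:
                content = f"new scene at {location}"
        frame = Frame(len(self.history), location, content)
        self.history.append(frame)  # the trajectory grows by one frame per step
        return frame


# Usage: write on the whiteboard, wander off for ~30 seconds, then come back.
model = ToyWorldModel()
model.next_frame((0, 0), edit="whiteboard reads 'hello'")
for _ in range(30 * 24):
    model.next_frame((5, 5))
print(model.next_frame((0, 0)).content)  # whiteboard reads 'hello'
```

Shrinking `memory_frames` to a handful of frames makes the toy model “forget” the whiteboard. The sketch also shows why the problem is demanding: the history to search grows with every frame, while the time available per frame stays fixed.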
This interaction capability underscores Google’s push to move AI beyond simple content generation toward truly immersive and intelligent virtual experiences.