Will Google Genie AI Transform the Gaming Industry?

Soon after OpenAI previewed Sora on 15 February 2024, Google announced its own AI model, “Genie AI”, on 23 February 2024. Both are generative AI models that produce video from prompts. The difference is that Google’s Genie AI is an action-controllable, playable world model.

In this update, we look at what Genie AI is, what it will be used for, and a few basic differences between Google Genie AI and OpenAI Sora.

What is Google Genie AI?

Google Genie AI is an AI model recently announced by the Google DeepMind team. It is a tool for generating video games and interactive, playable worlds from text prompts or image prompts.

It will be used to build interactive, playable 2D video games, 2D platformers and virtual spaces. Genie can create a game from any one of the following:

  • A single image;
  • A text description or text prompt;
  • A simple hand-drawn sketch.

Google Genie AI is built on what is termed a “foundation world model”. Genie was trained on a large dataset of 200,000 hours of publicly available game videos from the internet. The training was unsupervised and drew mainly on 2D platformer games.

Google Genie AI generating games
Image Source: Google DeepMind

Note: To see examples of what Google Genie generates, visit the Google Genie AI website.

Through this training, Genie developed an understanding of how different game environments work and how players interact with them. It also learned other important features of a game, such as player actions, powers, movements, and when a level should end.

For text prompts, a text-to-image model, “Imagen 2”, is used to generate images. Genie then sets these images in motion to build playable video games.

Genie overcomes the limitations of traditional 2D game creation by learning environments, patterns and the player interactions within them. This helps Genie AI create immersive, interactive game worlds without much user input.

More Benefits of Genie AI

Besides creating a game or a gaming environment, Genie AI also provides further benefits, such as:

  • It can create a game character, which can perform actions and movements;
  • It can create and interact with other game elements or objects, such as paths, enemies, things or obstacles;
  • It can build actions or events in a video game, such as accidents, collisions, penalties, credits or rewards;
  • It would be able to generate game tasks, game levels and difficulties for a game.

You don’t need to provide any gameplay instructions or manuals to create your own controllable, playable virtual worlds. Genie analyzes videos and learns from them the rules of video game environments and what players have to interact with.

This allows you to build an entire 2D platformer game from only a text prompt or an image prompt. Google Genie AI can become your personal assistant for developing video games or interactive, playable environments.

Components of Genie AI

Genie AI relies on five basic components and techniques. These are:

1. Spatiotemporal Transformer (ST)

The core of Google Genie relies on a special kind of transformer known as a Spatiotemporal (ST) Transformer. Unlike the usual transformers designed for text, ST transformers are built to process and learn from videos.

They focus on the details within each frame (spatial attention) and also track changes across multiple frames over time (temporal attention). This specialization enables them to effectively process the intricate patterns and details present in moving images.
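To make the two kinds of attention concrete, here is a minimal sketch in plain Python. It is not Genie's actual implementation: it assumes a single attention head, uses the token vectors directly as queries, keys and values (no learned weights), and leaves out the feed-forward layers a real transformer block would have. The function names are made up for illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    # Scaled dot-product attention for one query over a list of keys/values.
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

def spatial_attention(video):
    # Each token looks only at the tokens of its own frame.
    return [[attend(tok, frame, frame) for tok in frame] for frame in video]

def temporal_attention(video):
    # Each token looks at the same spatial position in the current and
    # earlier frames only (a causal mask, assumed here).
    out = []
    for t in range(len(video)):
        row = []
        for s in range(len(video[0])):
            history = [video[u][s] for u in range(t + 1)]
            row.append(attend(video[t][s], history, history))
        out.append(row)
    return out

def st_block(video):
    # Spatial attention first, then temporal attention.
    return temporal_attention(spatial_attention(video))

# Toy video: 3 frames, 2 tokens per frame, 2 numbers per token.
video = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [1.0, 1.0]],
    [[2.0, 0.0], [0.0, 2.0]],
]
out = st_block(video)
```

Because the temporal step only looks backwards, changing a later frame leaves the output for earlier frames untouched, which is what allows a model of this kind to generate a video one frame at a time.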

2. Video Tokenizer

This component, the “Video Tokenizer”, divides the extensive video data into smaller parts known as “tokens”. These tokens serve as the essential pieces for Google Genie AI to grasp and learn the visual world. Imagine turning an entire movie into a sequence of important symbols.

These symbols stand for the basic elements in a video game, like the background, items, characters, enemies and visual effects. This simplification streamlines and speeds up the entire video generation process.
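The idea of turning frames into symbols can be shown with a toy example. The sketch below cuts each frame into small square patches and gives every distinct patch an integer id. This exact-match dictionary is only an assumption made to keep the example short; a real tokenizer learns its set of symbols from data rather than matching patches exactly. The frames are tiny grids of characters, where "@" stands for the player and "#" for the ground.

```python
def patchify(frame, size):
    # Split a grid (list of equal-length strings) into size x size patches,
    # reading left to right, top to bottom.
    patches = []
    for top in range(0, len(frame), size):
        for left in range(0, len(frame[0]), size):
            patch = tuple(row[left:left + size] for row in frame[top:top + size])
            patches.append(patch)
    return patches

def tokenize_video(frames, size):
    # Map every patch of every frame to an integer id.
    # Identical patches share the same id.
    vocab = {}
    token_frames = []
    for frame in frames:
        ids = []
        for patch in patchify(frame, size):
            if patch not in vocab:
                vocab[patch] = len(vocab)
            ids.append(vocab[patch])
        token_frames.append(ids)
    return token_frames, vocab

frame_a = ["..@.",
           "..@.",
           "####",
           "####"]
frame_b = ["@...",
           "@...",
           "####",
           "####"]

tokens, vocab = tokenize_video([frame_a, frame_b], size=2)
print(tokens)      # [[0, 1, 2, 2], [1, 0, 2, 2]]
print(len(vocab))  # 3
```

Two 16-character frames become two lists of four numbers, and the player moving to the left shows up as the ids 0 and 1 swapping places.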

3. Latent Action Model (LAM)

The LAM component examines the changes between one frame and the next in videos. The LAM acts like a spy within Google Genie, observing videos to understand the unspoken actions occurring between frames.

It learns a small set of eight actions of the kind needed for playing 2D platformer games, such as jumping, running, shooting, or interacting. These actions act like the “spices” that bring flavor and variety to video games. Because internet videos don’t have action labels, the LAM has to learn to recognize these actions on its own.
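Learning actions without labels can be illustrated with a much simpler stand-in. The sketch below assumes we already know the player's position in every frame (Genie works from raw pixels and does not have this), and it uses plain k-means clustering instead of a neural network. It groups the frame-to-frame movements into a few discrete "latent actions" without ever being told what the actions are. All names and numbers are hypothetical.

```python
def frame_deltas(positions):
    # Movement of the player between each pair of consecutive frames.
    return [(x2 - x1, y2 - y1)
            for (x1, y1), (x2, y2) in zip(positions, positions[1:])]

def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def nearest(point, centers):
    # Index of the closest center.
    return min(range(len(centers)), key=lambda i: sq_dist(point, centers[i]))

def learn_latent_actions(deltas, k, steps=10):
    # Deterministic start: the first k distinct movements are the centers.
    centers = []
    for d in deltas:
        if d not in centers:
            centers.append(d)
        if len(centers) == k:
            break
    # Standard k-means: assign each movement to a center, then re-average.
    for _ in range(steps):
        labels = [nearest(d, centers) for d in deltas]
        for i in range(len(centers)):
            members = [d for d, lab in zip(deltas, labels) if lab == i]
            if members:
                centers[i] = tuple(sum(c) / len(members) for c in zip(*members))
    labels = [nearest(d, centers) for d in deltas]
    return centers, labels

# Player positions over 7 frames: walk, walk, jump, walk, jump, stand still.
positions = [(0, 0), (1, 0), (2, 0), (2, 2), (3, 2), (3, 4), (3, 4)]
deltas = frame_deltas(positions)
centers, labels = learn_latent_actions(deltas, k=3)
print(labels)  # [0, 0, 1, 0, 1, 2]
```

The labels 0, 1 and 2 carry no names. It is up to a human to notice afterwards that they line up with "walk right", "jump" and "stand still", which mirrors how the meaning of Genie's learned actions has to be discovered by trying them out.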

4. Dynamics Model

This part predicts the next frame in the video sequence, using the actions identified by the LAM. It forecasts how the upcoming frame should appear and repeats this process frame by frame to produce the video. This component acts as the “soul” that harmonizes everything and forms the foundation for creating video games.
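The predict-and-repeat loop looks roughly like the sketch below. In Genie the prediction step is a trained neural network. Here a hand-written rule stands in for it, and the frame is just a single row of tokens with one player token, so every name and number is an assumption made for illustration.

```python
EMPTY, PLAYER = 0, 1

# Hypothetical latent actions: 0 = stay, 1 = move right, 2 = move left.
ACTION_SHIFT = {0: 0, 1: 1, 2: -1}

def predict_next_frame(tokens, action):
    # Stand-in for the learned dynamics model: move the player token
    # according to the action, keeping it inside the frame.
    pos = tokens.index(PLAYER)
    new_pos = min(max(pos + ACTION_SHIFT[action], 0), len(tokens) - 1)
    nxt = [EMPTY] * len(tokens)
    nxt[new_pos] = PLAYER
    return nxt

def rollout(first_frame, actions):
    # Feed each predicted frame back in as the input for the next step.
    frames = [first_frame]
    for action in actions:
        frames.append(predict_next_frame(frames[-1], action))
    return frames

frames = rollout([PLAYER, EMPTY, EMPTY, EMPTY], actions=[1, 1, 0, 2])
for frame in frames:
    print(frame)
```

Each new frame depends only on the previous frame and the chosen action, so a player can steer the generated world one step at a time.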

5. VQ-VAE Technique

VQ-VAE stands for “Vector Quantized Variational Autoencoder”, a technique that compresses data such as images or video frames into a small set of discrete codes. This technique assists Google Genie in arranging and organizing information.

It is similar to giving both the video tokenizer and the LAM a codebook for converting things into smaller, more manageable parts. This makes learning and representing intricate video patterns more efficient.
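The codebook step at the heart of this technique is easy to sketch. The example below shows only the "vector quantized" part: each input vector is replaced by the index of its closest codebook entry. The codebook values are invented for illustration, and the neural encoder and decoder, as well as the way the codebook is learned, are left out.

```python
def quantize(vector, codebook):
    # Return (index, code) of the codebook entry closest to the vector,
    # measured by squared Euclidean distance.
    def sq_dist(code):
        return sum((v - c) ** 2 for v, c in zip(vector, code))
    index = min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))
    return index, codebook[index]

def encode(vectors, codebook):
    # Many numbers in, one small integer per vector out.
    return [quantize(v, codebook)[0] for v in vectors]

def decode(indices, codebook):
    # Look the codes back up to get an approximation of the input.
    return [codebook[i] for i in indices]

# A made-up codebook with three entries.
codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
vectors = [(0.1, -0.2), (0.9, 1.2), (0.2, 0.8)]

indices = encode(vectors, codebook)
print(indices)                    # [0, 1, 2]
print(decode(indices, codebook))  # [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
```

The decoded vectors are close to the originals but not identical. That small loss is the price paid for describing each input with a single code.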

How do I use Google Genie AI?

Google Genie AI, which generates 2D platformer games from images, is not currently available to the public. It is still a research project under development within Google DeepMind.

It is expected to launch sometime in 2024, but this is not confirmed as of now. We will share updates as they become available.

Google Genie vs OpenAI Sora

What is OpenAI Sora? It is a text-to-video generative AI model for creating realistic and imaginative videos from text prompts and images. It is also currently under development and is not available for general public use.

Not much is known yet about how these two AI models compare in terms of working, performance, efficiency and quality. Still, we tried to identify a few key differences between them. These are:

Features | Google Genie AI | OpenAI Sora
Focus | Generating interactive 2D platformer video games | Generating high-fidelity video content
User Interaction | Users can manipulate the generated world frame by frame | Passive viewing experience, no direct user interaction
Output Format | Functional, playable 2D game environments | Realistic video sequences with varying lengths and resolutions
Training Data | Trained on 200,000 hours of 2D platformer gameplay videos | Trained on a diverse dataset of images and videos
Current Status | Research project, not publicly available | Under development, access limited to selected testers

Table: Genie vs Sora


Google Genie is an amazing generative AI model that creates video games and playable, interactive virtual environments. Even though it is still new, Genie AI shows how creative AI can be. It closes the gap between what we imagine and what we play, suggesting a future where sharing your game is as easy as sharing a photo.

The special thing about Genie is that it can learn and mimic the actions and controls of game characters and objects just by watching internet videos.

But there are important challenges to tackle. Right now, Genie is good at simple 2D games and platformers, but making it work well with more complicated 3D games is tricky. Also, the games it makes have pretty basic controls.

In the future, researchers may have to focus on more precise controls, finer details and more complex game functionality.

The rise of such no-code platforms also raises questions about the future of jobs for video game developers and concerns about game marketplaces, which already have a lot of low-quality games and reused assets.
