Google I/O 2026: Gemini Omni brings advanced multimodal AI video creation

New Delhi: Google has launched Gemini Omni, a new family of generative AI models designed to “create anything from any input”. Unveiled at the company’s I/O 2026 developer event, the model combines text, image, video and audio inputs to create realistic AI videos with a stronger sense of real-world behaviour, motion and physics.

The launch is Google’s next move towards its own “world model”, an AI system that can simulate and understand reality instead of just predicting text. Gemini Omni builds on earlier initiatives such as Veo, Genie and Nano Banana and integrates them into a more cohesive multimodal system. Google says the first model in the family, Gemini Omni Flash, is rolling out today to Google AI Plus, Pro and Ultra subscribers, with YouTube Shorts integration to follow later this week.

Gemini Omni focuses on realistic AI video generation

Google says Gemini Omni is focused on video creation for now. Users can combine text prompts with images, voice samples, audio clips and existing video to generate an entirely new video. Unlike traditional AI tools, Omni reasons over all the media inputs at once, which Google says gives more consistent output.

During the keynote, Google presented a number of demonstrations, including a video of a marble rolling with believable physics interactions and realistic sound effects. Another demo was a claymation-style explainer on protein folding, with AI narration.

In a media briefing, Sundar Pichai stated that AI is progressing “from predicting text to simulating reality”. Google DeepMind’s leadership team also described Omni as the next step in fusing the intelligence of the Gemini models with the rendering capabilities of media generation models.

Google says Gemini Omni understands science, culture and motion

Google says Omni can produce videos that embody “scientific concepts, historical context, cultural references, and realistic object behaviour”. One of the prompts shown to reporters asked for a “claymation explainer of protein folding”, which resulted in an educational stop-motion animation with voice-over narration.

The longer-term objective is to extend generation across more modalities, the company says. Future versions might be able to generate images from sound, audio from video, or fully interactive generated experiences.

Nicole Brichtova, Director of Product Management at Google DeepMind, stated that Omni is “more than a Veo update” as it is a “model that uses the multimodal reasoning of Gemini and cutting-edge media rendering capabilities”.

Gemini Omni Flash rolls out today

Gemini Omni Flash is rolling out starting today.

Here’s where you can find it:

🔹 Today: Google AI Plus, Pro and Ultra subscribers globally in the @GeminiApp and @FlowbyGoogle.

🔹 Rolling out starting this week, for no cost: @YouTube Shorts and the YouTube Create app.…

— Google (@Google) May 19, 2026

Google announced that the first live model will be Gemini Omni Flash. The model can currently create videos of up to 10 seconds in length. Google says the short duration is a product decision rather than a technical limitation, because it believes most users will start by making short clips for social media and entertainment.

Gemini Omni Flash is being launched across the Gemini app, YouTube Shorts and Google’s Flow AI creative studio. Google also plans to offer API access to developers and enterprises in the coming weeks.

A more advanced version, Omni Pro, has also been teased. Google has said the higher-end model will be released once it offers a “step change” in performance over Flash.

AI avatars and deepfake safeguards included

Google is also making AI-powered personal avatars available via Gemini Omni. After completing a verification process that involves recording facial expressions and saying random numbers aloud, users will be able to produce digital copies of themselves for videos.

The company states that these protections are intended to help counter misuse and impersonation via deepfakes. Moreover, each video produced with Gemini Omni will carry Google’s SynthID watermarking technology, helping users identify AI-generated videos.

Google confirmed that speech editing inside existing videos will remain limited until the company believes the feature can be released responsibly.

Google targets creators, advertisers and filmmakers

Google is betting that Omni Flash will open up significant opportunities for creators and businesses, even though the platform is being marketed as a consumer-oriented way to make memes, short videos and clips for social media.

Executives also highlighted the model’s text-rendering abilities, which could help advertisers create precise branding elements, slogans and product visuals directly within AI-generated videos.

The upcoming Omni Pro is expected to be more useful for filmmakers, marketers and creative professionals because of its improved performance in video generation and editing, Google said.