New Delhi: Microsoft has unveiled its first fully in-house image generation model called MAI-Image-1, marking its formal entry into the visual AI race dominated by Google’s Gemini-2.5 Flash “Nano Banana” and OpenAI’s GPT-Image-1. The company said the model, now available for public testing on LMArena, will soon roll out to Copilot and Bing Image Creator.
The new model secured the #9 spot on the global text-to-image leaderboard at LMArena, a platform where users compare anonymous AI models and vote for the best outputs. This debut positions Microsoft among the top 10 image generation systems globally, a notable entry for a model built entirely in-house.
Microsoft’s first homegrown image model
MAI-Image-1 was trained by Microsoft’s internal AI team with a focus on creative precision and visual diversity, rather than flashy or repetitive styles. “We prioritised rigorous data selection and nuanced evaluation focused on tasks that closely mirror real-world creative use cases,” Microsoft said in its announcement.
The company worked closely with professionals in creative industries while training and testing the model. The goal, according to the team, was to ensure the tool felt useful to designers, photographers, and digital artists who rely on AI tools for visual ideation.
Microsoft said the model “excels at generating photorealistic imagery,” especially in areas like lighting, reflections, and landscapes. The company also claims it produces high-quality results faster than many larger, slower models.
Competing with Google and OpenAI
Microsoft’s new model enters an increasingly competitive field. On the LMArena leaderboard, Google’s Gemini-2.5 Flash (Nano Banana) currently holds the #2 spot with 1154 points, while OpenAI’s GPT-Image-1 sits at #7 with 1123 points. Microsoft’s MAI-Image-1 scored 1096 points, placing it at #9, 27 points behind GPT-Image-1 and 58 behind Nano Banana. The top rank is held by Hunyuan-image-3.0, a model developed by Chinese tech firm Tencent.
The competition isn’t just about rankings. Each of these models represents a slightly different philosophy of AI creativity. OpenAI’s model went viral for recreating Studio Ghibli’s dreamy art style, while Google’s “Nano Banana” made headlines for its powerful AI image editing features that blend natural elements seamlessly into digital artwork.
Microsoft’s approach, by contrast, is grounded in speed, realism, and creator flexibility. The company said its model allows users to get ideas on-screen faster and iterate through versions quickly before refining them in other tools.
Microsoft’s growing AI ecosystem
MAI-Image-1 is part of a growing line of homegrown AI systems at Microsoft, which also includes MAI-Voice-1, a speech generation model, and the Phi series of small language models known for efficient reasoning. These models operate alongside Microsoft’s continued collaboration with OpenAI, which relies heavily on Microsoft’s Azure infrastructure.
Microsoft said the model will soon be integrated into Copilot and Bing Image Creator, two of its most widely used AI tools, to reach millions of creators globally. The company added that it is testing MAI-Image-1 in LMArena “to gather insights and feedback” from users before full-scale deployment.
Nano Banana vs MAI-Image-1
We gave the same prompt to Gemini’s Nano Banana and Microsoft’s new MAI-Image-1: “Generate an image of a ginger persian male cat sitting on the hood of a red Suzuki Jimny.” Google’s Nano Banana finished much faster. Its image shows the older Jimny, while Microsoft’s model rendered the newer version.

Google’s Nano Banana vs Microsoft’s MAI-Image-1 | Source: News9live
The debut of MAI-Image-1 signals that Microsoft is no longer just funding OpenAI but is now actively developing its own creative AI models to compete directly with the likes of Google, OpenAI, and Tencent’s Hunyuan in the rapidly evolving image generation market.