Microsoft MAI-DxO AI beats 21 doctors in medical diagnosis test, scores 85.5% accuracy

New Delhi: In a recent benchmark test, Microsoft’s new AI-powered medical assistant outperformed experienced doctors. The system, called the Microsoft AI Diagnostic Orchestrator, or MAI-DxO, solved complex medical cases at a rate more than four times that of human physicians.

The AI tool was pitted against 21 experienced doctors from the US and UK. Both sides were given 304 patient case records published by the New England Journal of Medicine, which are known to be among the most difficult and puzzling cases in clinical medicine. The doctors got about 20 percent of the diagnoses right, while MAI-DxO reached 85.5 percent accuracy.

A panel of doctors, powered by GPT and more

What’s interesting is how the system works. MAI-DxO is an orchestrator: it has a language model act as a virtual panel of doctors that propose diagnoses, choose tests, and challenge one another’s reasoning. Microsoft tested it with multiple foundation models, including OpenAI’s GPT, Google’s Gemini, Meta’s Llama, Anthropic’s Claude, xAI’s Grok, and DeepSeek. The best results came when MAI-DxO was paired with OpenAI’s o3 model.

Instead of answering everything in one go like a quiz, the AI performs a “sequential diagnosis.” It reads the symptoms, asks follow-up questions, orders tests, and then narrows in on the right illness, much as a doctor would in a clinic. Working step by step discourages premature conclusions and lets the system revise its reasoning as new results come in.

Microsoft AI VP of Health, Bay Gross, called it “a proof-of-concept showing that large language model systems can master medicine’s most intricate diagnostic challenges by following the same step-by-step reasoning and debate process that expert physicians use every day.”

AI is cheaper and, in this test, more accurate

The Microsoft team also built cost controls into MAI-DxO to stop it from ordering excessive medical tests, which cut down on unnecessary diagnostics. As a result, the system proved more cost-effective than both the human doctors and the individual AI models used on their own.

According to the company’s blog, this was meant to reflect how real-life hospitals operate under tight budgets and time limits. “Without such constraints, an AI system might otherwise default to ordering every possible test,” the post said.

Mustafa Suleyman, CEO of Microsoft AI, summed it up on LinkedIn: “Now MAI-DxO can solve some of the world’s toughest open-ended cases with higher accuracy and lower costs.”

Not ready for hospitals yet

The tool is not yet available for public or clinical use. Microsoft says this is just the first step. More trials are needed, especially for everyday conditions that aren’t as complex as the ones in the NEJM case series.

The doctors in the comparison worked alone, without help from colleagues, textbooks, or online tools. Microsoft said this was done to compare raw performance fairly, though it also means the test did not reflect how physicians usually work in practice.

The research has not yet been published in a peer-reviewed journal, but the team has shared a preprint and a video demonstration for those curious.