Maginative: Microsoft’s MAI-DxO Crushes Doctors at Medical Diagnosis while Cutting Costs

Maginative reports on Microsoft’s new AI Diagnostic Orchestrator (MAI-DxO) and how it outperformed doctors in a recent study. (As an aside, I’m always wary of reports that use words like “crush” in the title. Beware of hyperbole!)

The report’s abstract highlights some exciting results:

When paired with OpenAI’s o3 model, MAI-DxO achieves 80% diagnostic accuracy—four times higher than the 20% average of generalist physicians. MAI-DxO also reduces diagnostic costs by 20% compared to physicians, and 70% compared to off-the-shelf o3. 

A 4x improvement in diagnostic accuracy. This is transformative stuff.

But consider the experimental setup:

Physicians were explicitly instructed not to use external resources, including search engines (e.g., Google, Bing), language models (e.g., ChatGPT, Gemini, Copilot, etc), or other online sources of medical information.

Now the results don’t seem quite so impressive. In fact, I have a hard time seeing much utility in this report, given restrictions so extreme that they don’t reflect real-world practice.