AI beats doctors in major medical tests, but there is a catch

Two advanced AI medical systems have outperformed doctors in key diagnosis and treatment-planning tests, according to new research. While the results are promising, researchers say the tools are still not ready for use with real patients.

Ankita Garg

New Delhi, UPDATED: Jun 18, 2026 06:46 IST

AI has cleared another important hurdle in healthcare. Two newly developed medical AI systems have demonstrated the ability to match or even outperform doctors in several diagnostic and treatment-related tasks, according to separate studies published in Nature. While the results point to the growing potential of AI in medicine, researchers say the technology still has a long way to go before it can be trusted with real patients.

One of the AI systems, called Mira, was created by a team of researchers in Germany. The other, Amie, was developed by Google and is powered by its Gemini AI model. Both tools were tested against medical professionals using simulated patient scenarios, and in many cases delivered results that were equal to or better than those produced by doctors, according to a report from the Financial Times.

The findings add to a growing body of evidence suggesting that healthcare-focused AI models may offer more reliable medical support than general-purpose consumer chatbots. However, scientists involved in the work stressed that success in controlled experiments does not automatically translate into success inside busy hospitals and clinics.

“We are getting a preview of how AI could transform medicine,” said Jakob Kather, a researcher from TUD Dresden University of Technology and Heidelberg University who helped develop Mira.

Kather compared AI assistants to the autopilot systems used in aircraft, explaining that such tools could help reduce the workload of healthcare professionals while leaving final decisions in human hands.

AI scores higher in diagnosis tests

Mira was designed to work with electronic health records and can perform a wide range of actions, from recommending tests and medicines to arranging procedures. According to the researchers, the system has access to more than 85,000 possible clinical actions.

To assess its performance, the team used information from over 500 emergency department cases. Rather than interacting with real patients, Mira received details through conversations with AI agents that simulated patient behaviour.

Across eight medical conditions, including pancreatic cancer, pneumonia, appendicitis and pulmonary embolism, Mira achieved a diagnostic accuracy rate of 87.1 percent. A group of six physicians from different medical specialties recorded 78.1 percent under the same testing conditions.

Google's Amie performs strongly in treatment planning

Google's Amie was evaluated in a different way. Researchers created 100 patient scenarios based on UK healthcare guidelines and used actors to role-play patients during text-based consultations. The AI system was then compared with 21 primary care physicians.

The study found that Amie performed on par with doctors when reasoning through patient management decisions. In several cases, it produced treatment and investigation plans that were more closely aligned with clinical guidelines. Researchers also found that the AI handled medication-related decisions particularly well in complex cases.

Despite the encouraging results, scientists behind both projects acknowledged several limitations.

The team behind Mira said the system occasionally recommended care that did not fully align with accepted medical practice. Researchers also noted that the information supplied by simulated patients was likely more organised and complete than what doctors encounter in real emergency settings.

Google's researchers expressed similar concerns. They described the study as an important step forward but said the testing environment did not capture the unpredictability and complexity of real-world healthcare. According to the team, Amie still requires additional work to reduce reasoning mistakes and improve consistency before it can be considered for practical deployment.

Independent experts welcomed the studies but agreed that caution is needed when interpreting the results.

Catherine Pope, Professor of Medical Sociology at the University of Oxford, said real healthcare environments are far messier than carefully designed simulations. Patients often provide incomplete information, change their descriptions of symptoms or present multiple health issues at the same time.

Julie Jacko, Professor of Health Informatics and Data Science at the University of Edinburgh, said the studies showed that AI could create detailed and comprehensive care plans. However, she noted that this did not necessarily mean the systems possessed better clinical judgement than experienced doctors.

Researchers also pointed out that some of Amie's strong performance may be linked to rapid improvements in modern AI models generally, rather than innovations unique to the healthcare-focused system itself.

Published On:

Jun 18, 2026 06:38 IST