Breakthrough | High confidence

Harvard/Science Study: OpenAI o1 Outperforms Physicians in Emergency Room Diagnosis, 67% vs 56% Accuracy

April 30, 2026 | AI for Good

A peer-reviewed study published in Science by researchers at Harvard Medical School and Beth Israel Deaconess Medical Center, with Stanford collaborators, found that OpenAI's o1 reasoning model identified an exact or very close diagnosis in 67% of real emergency room triage cases, compared with 56% for two internal medicine attending physicians given identical text-based patient data. In one clinical reasoning task, o1 achieved a perfect score on 98% of cases versus 35% for physicians. The study used 76 real ER patients and is among the most rigorous head-to-head comparisons of LLM diagnostic reasoning and physician performance to date.

The authors explicitly called for urgent prospective randomized trials before clinical deployment, noting that the study's controlled, text-only format does not replicate the full complexity of clinical encounters, including physical examination, emotional context, and physician-patient communication. Publication in Science, one of the world's most-cited journals, and the study's Harvard provenance lend the findings considerable credibility and are expected to accelerate regulatory debate about AI clinical deployment standards.
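To make the headline figures concrete, the arithmetic behind "67% vs 56%" can be sketched as below. This is an illustrative calculation only, not part of the study. It assumes both percentages are taken over the same 76 cases, which the summary does not state, so the implied case counts are rough approximations. The function names are hypothetical.

```python
# Illustrative arithmetic on the reported figures; not taken from the study itself.
# ASSUMPTION: both accuracy figures use the same 76-case denominator.

def percentage_point_gap(model_pct: float, physician_pct: float) -> float:
    """Absolute difference between two accuracies, in percentage points."""
    return model_pct - physician_pct


def relative_gain(model_pct: float, physician_pct: float) -> float:
    """Difference expressed as a fraction of the physician baseline."""
    return (model_pct - physician_pct) / physician_pct


def implied_cases(pct: float, n_cases: int) -> int:
    """Approximate number of cases corresponding to a reported percentage."""
    return round(pct / 100 * n_cases)


O1_PCT = 67.0         # reported o1 diagnostic accuracy
PHYSICIAN_PCT = 56.0  # reported physician accuracy
N_CASES = 76          # reported number of ER patients

gap = percentage_point_gap(O1_PCT, PHYSICIAN_PCT)
rel = relative_gain(O1_PCT, PHYSICIAN_PCT)
o1_cases = implied_cases(O1_PCT, N_CASES)
physician_cases = implied_cases(PHYSICIAN_PCT, N_CASES)

print(f"Gap: {gap:.0f} percentage points ({rel:.1%} relative to physicians)")
print(f"Implied correct cases out of {N_CASES}: o1 ~{o1_cases}, physicians ~{physician_cases}")
```

Under that assumption, the 11-point gap corresponds to a difference of roughly eight cases, which helps explain why the authors ask for larger prospective trials.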

Media

Science peer-reviewed study: OpenAI o1 outperforms ER physicians in clinical diagnosis — Harvard/Beth Israel — Science / AAAS

Sources

T1 | Science / AAAS | Official | Western
T2 | STAT News | Major | Western
T2 | Science News | Major | Western
