跳到正文

在哈佛大学的研究中,人工智能提供的诊断比急诊室医生更准确| TechCrunch

In Harvard study, AI offered more accurate emergency room diagnoses than two human doctors

TechCrunch AI 国际资讯 快讯 3 分钟阅读 470 字 归档:2026年5月4日 02:06 查看原文 →
模型
In Harvard study, AI offered more accurate diagnoses than emergency room doctors | TechCrunch

导读

一项新的研究考察了大型语言模型在各种医疗环境中的表现,包括真实的急诊室病例—其中至少有一个模型似乎比人类医生更准确。

A new study examines how large language models perform in a variety of medical contexts, including real emergency room cases — where at least one model seemed to be more accurate than human doctors.

原文快照

站内保留一份可阅读的正文副本;如抓取失败,则保留摘要和原文链接。

A new study examines how large language models perform in a variety of medical contexts, including real emergency room cases — where at least one model seemed to be more accurate than human doctors.

The study was published this week in Science and comes from a research team led by physicians and computer scientists at Harvard Medical School and Beth Israel Deaconess Medical Center. The researchers said they conducted a variety of experiments to measure how OpenAI’s models compared to human physicians.

In one experiment, researchers focused on 76 patients who came into the Beth Israel emergency room, comparing the diagnoses offered by two attending physicians to those generated by OpenAI’s o1 and 4o models. These diagnoses were assessed by two other attending physicians, who did not know which ones came from humans and which came from AI.

“At each diagnostic touchpoint, o1 either performed nominally better than or on par with the two attending physicians and 4o,” the study said, adding that the differences “were especially pronounced at the first diagnostic touchpoint (initial ER triage), where there is the least information available about the patient and the most urgency to make the correct decision.”

In Harvard Medical School’s press release about the study, the researchers emphasized that they did not “pre-process the data at all” — the AI models were presented with the same information that was available in the electronic medical records at the time of each diagnosis.

With that information, the o1 model managed to offer “the exact or very close diagnosis” in 67% of triage cases, compared to one physician who had the exact or close diagnosis 55% of the time, and to the other who hit the mark 50% of the time.

“We tested the AI model against virtually every benchmark, and it eclipsed both prior models and our physician baselines,” said Arjun Manrai, who heads an AI lab at Harvard Medical School and is one of the study’s lead authors, in the press release.

Meet your next investor or portfolio startup at Disrupt

To be clear, the study didn’t claim that AI is ready to make real life-or-death decisions in the emergency room. Instead, it said the findings show an “urgent need for prospective trials to evaluate these technologies in real-world patient care settings.”

The researchers also noted that they only studied how models performed when provided with text-based information, and that “existing studies suggest that current foundation models are more limited in reasoning over nontext inputs.”

Adam Rodman, a Beth Israel doctor who’s also one of the study’s lead authors, told the Guardian that there’s “no formal framework right now for accountability” around AI diagnoses, and that patients still “want humans to guide them through life or death decisions [and] to guide them through challenging treatment decisions”.