If A.I. Can Diagnose Patients, What Are Doctors For?

Large language models are transforming medicine—but the technology comes with side effects.

“Surveys have suggested that many people are more confident in A.I. diagnoses than in those rendered by professionals. Meanwhile, in the United States alone, misdiagnosis disables hundreds of thousands of people each year; autopsy studies suggest that it contributes to perhaps one in every ten deaths. [..]

One recent study found that OpenAI’s GPT-4 answered open-ended medical questions incorrectly about two-thirds of the time. In another, GPT-3.5 misdiagnosed more than eighty per cent of complex pediatric cases. Meanwhile, leading large language models have become much less likely to include disclaimers in their responses. One analysis found that, in 2022, more than a quarter of responses to health-related queries included something like “I am not qualified to give medical advice.” This year, only one per cent did. In a new survey, about a fifth of Americans said that they’ve taken medical advice from A.I. that later proved to be incorrect. Earlier this year, a poison-control center in Arizona reported a drop in total call volume but a rise in severely poisoned patients. The center’s director suggested that A.I. tools might have steered people away from medical attention. Chatbots also create serious privacy concerns: once your medical information enters the chat, it no longer belongs to you. Last year, Elon Musk encouraged users of X to upload their medical images to Grok, the platform’s A.I., for “analysis.” The company was later found to have made hundreds of thousands of chat transcripts accessible to search engines, often without permission. [..]

Last year, he [Adam Rodman, a Beth Israel Deaconess Medical Center physician who leads Harvard’s efforts to integrate generative A.I. into its medical-school curriculum and a collaborator on CaBot, an advanced “reasoning model” from OpenAI tailored to making difficult diagnoses] co-authored a study in which some doctors solved cases with help from ChatGPT. They performed no better than doctors who didn’t use the chatbot. The chatbot alone, however, solved the cases more accurately than the humans. In a follow-up study, Rodman’s team suggested specific ways of using A.I.: they asked some doctors to read the A.I.’s opinion before they analyzed cases, and told others to give the A.I. their working diagnosis and ask for a second opinion. This time, both groups diagnosed patients more accurately than humans alone did. The first group proved faster and more effective at proposing next steps. When the chatbot went second, however, it frequently “disobeyed” an instruction to ignore what the doctors had concluded. It seemed to cheat, by anchoring its analysis to the doctor’s existing diagnosis. [..]

“I went to medical school to become a real, capital-‘D’ doctor,” he [Benjamin Popokh, a University of Texas Southwestern medical student] told me. “If all you do is plug symptoms into an A.I., are you still a doctor, or are you just slightly better at prompting A.I. than your patients?” [..]

A.I. models can sound like Ph.D.s, even while making grade-school errors in judgment. Chatbots can’t examine patients, and they’re known to struggle with open-ended queries. Their output gets better when you emphasize what’s most important, but most people aren’t trained to sort symptoms in that way. A person with chest pain might be experiencing acid reflux, inflammation, or a heart attack; a doctor would ask whether the pain happens when they eat, when they walk, or when they’re lying in bed. If the person leans forward, does the pain worsen or lessen? Sometimes we listen for phrases that dramatically increase the odds of a particular condition. “Worst headache of my life” may mean brain hemorrhage; “curtain over my eye” suggests a retinal-artery blockage. The difference between A.I. and earlier diagnostic technologies is like the difference between a power saw and a hacksaw. But a user who’s not careful could cut off a finger.

Attend enough clinicopathological conferences, or watch enough episodes of “House,” and every medical case starts to sound like a mystery to be solved. Lisa Sanders, the doctor at the center of the Times Magazine column and Netflix series “Diagnosis,” has compared her work to that of Sherlock Holmes. But the daily practice of medicine is often far more routine and repetitive. On a rotation at a V.A. hospital during my training, for example, I felt less like Sherlock than like Sisyphus. Virtually every patient, it seemed, presented with some combination of emphysema, heart failure, diabetes, chronic kidney disease, and high blood pressure. I became acquainted with a new phrase—“likely multifactorial,” which meant that there were several explanations for what the patient was experiencing—and I looked for ways to address one condition without exacerbating another. [..]

Tasking an A.I. with solving a medical case makes the mistake of “starting with the end,” according to Gurpreet Dhaliwal, a physician at the University of California, San Francisco, whom the Times once described as “one of the most skillful clinical diagnosticians in practice.” In Dhaliwal’s view, doctors are better off asking A.I. for help with “wayfinding”: instead of asking what sickened a patient, a doctor could ask a model to identify trends in the patient’s trajectory, along with important details that the doctor might have missed. The model would not give the doctor orders to follow; instead, it might alert her to a recent study, propose a helpful blood test, or unearth a lab result in a decades-old medical record. Dhaliwal’s vision for medical A.I. recognizes the difference between diagnosing people and competently caring for them. “Just because you have a Japanese-English dictionary in your desk doesn’t mean you’re fluent in Japanese,” he told me.

[..] If A.I. tools continue to misdiagnose and hallucinate, we might not want them to diagnose us at all. Yet we could ask them to rate the urgency of our symptoms, and to list the range of conditions that could explain them, with some sense of which ones are most likely. A patient could inquire about “red-flag symptoms”—warning signs that would indicate a more serious condition—and about which trusted sources the A.I. is drawing on. A chatbot that gets details wrong could still help you consider what to ask at your next appointment. And it could aid you in decoding your doctor’s advice.

[..] Patients and doctors alike could think of A.I. not as a way to solve mysteries but as a way to gather clues. An A.I. could argue for and against the elective surgery that you’re considering; it could explain why your physical therapist and your orthopedic surgeon tell different stories about your back pain, and how you might weigh their divergent recommendations. In this role, chatbots would become a means of exploration: a place to start, not a place to end. At their best, they would steer you through—not away from—the medical system.”

Full article: D. Khullar, The New Yorker, September 22, 2025