Generative AI’s three possibly insurmountable challenges for health care

“While I’m often a patient who hates the experience of talking to my doctor through a laptop, I’m also a researcher who has devoted a large part of my career to better understanding what happens when new technologies are added to clinical spaces. From this perspective, I see three major challenges that have to be overcome before large language models can really serve as clinical scribes.

The truth challenge. [..] There are two major chokepoints where hands-free AI could lead to inaccurate medical records. The first is in the speech-to-text technology. When the university where I teach pivoted online for Covid-19, I suddenly found myself pre-recording lectures. The AI transcription systems we used were always making errors, swapping out one word for another. [..]

But whatever is most common may well not be true for any particular patient. This is likely to be an even greater issue for unique cases or rare diseases, where there may not be enough relevant information in the data the AI was trained on. In such cases, the language model is likely to make up information that looks true, but isn’t. Ensuring that all notes are true is a huge challenge that today’s limited speech-to-text systems and hallucination-prone large language models have yet to overcome.

The time challenge. AI and large language model enthusiasts routinely celebrate the potential of these new technologies to liberate doctors from the drudgeries of modern medicine. They hope that this newfound freedom will result in more time with patients. To be blunt, these folks need to have a serious conversation with the people who own and run most hospitals, because those people seem to have a very different idea about how this will play out.

Many new AI systems are tested and sold on economic and efficiency outcomes; that is, they are marketed based on the extent to which they make care faster or cheaper, not more pleasant for provider and patient. In the current economic context of clinical care, it’s impossible to imagine that LLM [large language model] adoption will lead hospital administrators to support providers spending more time with patients. You can already see this dynamic playing out with old-school human scribes. One 2018 study found that human scribes allowed clinics to squeeze in 8.8% more patients every hour. On the whole, scribe studies focus on economic and consumer-satisfaction outcomes rather than health benefits. I can’t see why we should expect the research, marketing, and procurement of AI scribes to be any different.

The thought challenge. Research on EHR [electronic health record] use and clinical documentation shows that doctors make better decisions when they read and consult their clinical notes: direct access to EHR data supports better clinical decision-making. Looking at the screen, notes, and displays gives providers a chance to think about and synthesize relevant patient information. When doctors no longer engage directly with clinical notes, that active thinking process is replaced with passively waiting for alerts.

But as reliance on alerts increases, alert fatigue sets in and doctors stop paying attention to them. This has already been identified as a serious challenge for expert systems, and one that may become more problematic as LLM-enhanced EHRs roll out. Just as important, some data show that the act of writing notes, however annoying, can also improve clinical thought. Taking the time to write a note forces a doctor to make choices about how to capture the clinical presentation. These choices are key elements of diagnostic decision-making, elements that ought not to be so casually discarded.”

Full editorial: S.S. Graham, STAT News, April 20, 2023.