Artificial intelligence is ready to collaborate. Why fixate on automation?
“We should insist on AI that can collaborate with, say, doctors—as well as teachers, lawyers, building contractors, and many others—instead of AI that aims to automate them out of a job.
Radiology provides an illustrative example of automation overreach. In a widely discussed study published in April 2024, researchers at MIT found that when radiologists used an AI diagnostic tool called CheXpert, the accuracy of their diagnoses declined. “Even though the AI tool in our experiment performs better than two-thirds of radiologists,” the researchers wrote, “we find that giving radiologists access to AI predictions does not, on average, lead to higher performance.” Why did this good tool produce bad results?
A proximate answer is that doctors didn’t know when to defer to the AI’s judgment and when to rely on their own expertise. When AI offered confident predictions, doctors frequently overrode those predictions with their own. When AI offered uncertain predictions, doctors frequently overrode their own better predictions with those supplied by the machine. Because the tool offered little transparency, radiologists had no way to discern when they should trust it.
A deeper problem is that this tool was designed to automate the task of diagnostic radiology: to read scans like a radiologist. But automating a radiologist’s entire diagnostic job was infeasible because CheXpert was not equipped to process the ancillary medical histories, conversations, and diagnostic data that radiologists rely on for interpreting scans. Given the differing capabilities of doctors and CheXpert, there was potential for virtuous collaboration. But CheXpert wasn’t designed for this kind of collaboration.
When experts collaborate, they communicate. If two clinicians disagree on a diagnosis, they might isolate the root of the disagreement through discussion (e.g., “You’re overlooking this.”). Or they might arrive at a third diagnosis that neither had been considering. That’s the power of collaboration, but it cannot happen with systems that aren’t built to listen. Where CheXpert’s and the radiologist’s assessments differed, the doctor was left with a binary choice: go with the software’s statistical best guess or go with her own expert judgment. [..]
Tools can generally be divided into two buckets: In one bucket, you’ll find automation tools that function as closed systems that do their work without oversight—ATMs, dishwashers, electronic toll takers, and automatic transmissions all fall into this category. These tools replace human expertise in their designated functions, often performing those functions better, cheaper, and faster than humans can. Your car, if you have one, probably shifts gears automatically. Most new drivers today will never have to master a stick shift and clutch.
In the second bucket you’ll find collaboration tools, such as chain saws, word processors, and stethoscopes. Unlike automation tools, collaboration tools require human engagement. They are force multipliers for human capabilities, but only if the user supplies the relevant expertise. A stethoscope is unhelpful to a layperson. A chain saw is invaluable to some, dangerous to many.
[..] bad automation tools—machines that attempt but fail to fully automate a task—also make bad collaboration tools. They don’t merely fall short of their promise to replace human expertise at higher performance or lower cost; they interfere with human expertise and sometimes undermine it.
The promise of automation is that the relevant expertise is no longer required from the human operator because the capability is now built-in. (And to be clear, automation does not always imply superior performance—consider self-checkout lines and computerized airline phone agents.) But if the human operator’s expertise must serve as a fail-safe to prevent catastrophe—guarding against edge cases or grabbing the controls if something breaks—then automation is failing to deliver on its promise. The need for a fail-safe can stem from the AI’s own limitations or from an external failure; either way, the consequences of that failure can be grave. [..]
Collaboration is not intrinsically better than automation. It would be ridiculous to collaborate with your car’s transmission or to pilot your office elevator from floor to floor. But in domains, occupations, and tasks where full automation is not currently achievable, and where human expertise remains indispensable or serves as a necessary fail-safe, tools should be designed to collaborate—to amplify human expertise, not to keep it on ice until the last possible moment.
One thing that our tools have not historically done for us is make expert decisions. Expert decisions are high-stakes, one-off choices where the single right answer is not clear—often not knowable—but the quality of the decision matters. There is no single best way, for example, to care for a cancer patient, write a legal brief, remodel a kitchen, or develop a lesson plan. But the skill, judgment, and ingenuity of human decision making determine outcomes in many of these tasks, sometimes dramatically so. Making the right call means exercising expert judgment, which means more than just following the rules. Expert judgment is needed precisely where the rules are not enough, where creativity, ingenuity, and educated guesses are essential.
But we should not be too impressed by expertise: Even the best experts are fallible, inconsistent, and expensive. Patients receiving surgery on Fridays fare worse than those treated on other days of the week, and standardized test takers are more likely to flub equally easy questions when those questions appear later in the test. Of course, most experts are far from the best in their fields. And experts of all skill levels may be unevenly distributed or simply unavailable—a shortage that is more acute in less affluent communities and lower-income countries. [..]
The inescapable fact that human expertise is scarce, imperfect, and perishable makes the advent of ubiquitous AI an unprecedented opportunity. AI is the first machine humanity has devised that can make high-stakes, one-off expert decisions at scale—in diagnosing patients, developing lesson plans, redesigning kitchens. AI’s capabilities in this regard, while not perfect, have improved steadily year after year. [..]
The question is not whether AI can do things that experts cannot do on their own—it can. Yet expert humans often bring something that today’s AI models cannot: situational context, tacit knowledge, ethical intuition, emotional intelligence, and the ability to weigh consequences that fall outside the data. Putting the two together typically amplifies human expertise: An oncologist can ask a model to flag every recorded case of a rare mutation and then apply clinical judgment to design a bespoke treatment; a software architect can have the model retrieve dozens of edge-case vulnerabilities and then decide which security patch best fits the company’s needs. The value is not in substituting one expert for another, or in outsourcing fully to the machine, or indeed in presuming that human expertise will always be superior, but in leveraging human and rapidly evolving machine capabilities to achieve the best results.
[..] while experts use chatbots as collaboration tools—riffing on ideas, clarifying intuitions—novices often mistakenly treat them as automation tools, oracles that speak from a bottomless well of knowledge. That becomes a problem when an AI chatbot confidently provides information that is misleading, speculative, or simply false. Because current AIs don’t understand what they don’t understand, those lacking the expertise to identify flawed reasoning and outright errors may be led astray.
The seduction of cognitive automation helps explain a worrying pattern: AI tools can boost the productivity of experts but may also actively mislead novices in expertise-heavy fields such as legal services. Novices struggle to spot inaccuracies and lack efficient methods for validating AI outputs. And methodically fact-checking every AI suggestion can negate any time savings.
Beyond the risk of errors, there is some early evidence that overreliance on AI can impede the development of critical thinking or inhibit learning. Studies suggest a negative correlation between frequent AI use and critical-thinking skills, likely due to increased “cognitive offloading”—letting the AI do the thinking. In high-stakes environments, this tendency toward overreliance is particularly dangerous: Users may accept incorrect AI suggestions, especially if delivered with apparent confidence.
[..] In a PNAS study published earlier this year and covering 2,133 “mystery” medical cases, researchers ran three head-to-head trials: doctors diagnosing on their own, five leading AI models diagnosing on their own, and then doctors reviewing the AI suggestions before giving a final answer. That human-plus-AI pairing proved most accurate, correct on roughly 85 percent more cases than physicians working solo and 15 to 20 percent more than an AI alone. The gain came from complementary strengths: When the model missed a clue, the clinician usually spotted it, and when the clinician slipped, the model filled the gap. The researchers engineered human-AI complementarity into the design of the trials, and it showed in the results. As these tools evolve, we believe they will take on autonomous diagnostic tasks, such as triaging patients and ordering further testing—and may indeed do better over time on their own, as some early studies suggest.
Or, consider an example with which one of us is closely familiar: Google’s Articulate Medical Intelligence Explorer (AMIE) is an AI system built to assist physicians. AMIE conducts multi-turn chats that mirror a real primary-care visit: It asks follow-up questions when it is unsure, explains its reasoning, and adjusts its line of inquiry as new information emerges. In a blinded study recently published in Nature, specialist physicians compared the performance of a primary-care doctor working alone with that of a doctor who collaborated with AMIE. The doctor who used AMIE ranked higher on 30 of 32 clinical-communication and diagnostic axes, including empathy and clarity of explanations.
By exposing its reasoning, highlighting uncertainty, and grounding advice in trusted sources, AMIE pulls the user into an active problem-solving loop instead of handing down answers from on high. Doctors can potentially interrogate and correct it in real time, reinforcing (rather than eroding) their own diagnostic skills. These results are preliminary: AMIE is still a research prototype and not a drop-in replacement. But its design principles suggest a path toward meaningful human collaboration with AI.
[..] Should we go all in on automation? Should we build collaborative AI that learns from our choices, informs our decisions, and partners with us to drive better results? The correct answer, of course, is both. Getting this balance right across capabilities is a formidable and ever-evolving challenge. Fortunately, the principles and techniques for using AI collaboratively are now emerging.”
Full article: D. Autor and J. Manyika, The Atlantic, August 24, 2025.