Reassessing Quality Assessment — The Flawed System for Fixing a Flawed System

“Berwick recognized that blaming workers for factors beyond their control quashes goodwill and encourages cheating. This insight accords with a foundational principle of the QI [quality-improvement] movement: most quality lapses reflect a faulty system rather than faulty people. To improve quality, we must fix the system.

Some 30 years later, however, the fix is itself a massive system. As reimbursement models shift toward value-based payment, QI is no longer just about being better, but about documenting improvement to maximize payment. An entire industry has arisen to support the optimization and demonstration of performance. Though I could find no authoritative estimate of U.S. investments in QI infrastructure, the Centers for Medicare and Medicaid Services (CMS) spent about $1.3 billion on measure development and maintenance between 2008 and 2018. Hospitals’ QI investments vary with their size, but data from the National Academy of Medicine suggest that health systems each employ 50 to 100 people for $3.5 million to $12 million per year to support measurement efforts. Small practices bear the greatest relative costs. One 2016 study found that practices spend about $40,000 per physician per year to meet quality-documentation requirements, for an estimated total of $15.4 billion per year.

Financial costs aside, if good care is the goal, the greatest cost of all this activity may be wasted time. The 2016 study found that the average physician spent 2.6 hours per week on QI documentation; another recent study that examined CMS’s Merit-Based Incentive Payment System (MIPS) for ambulatory care settings found that clinicians and administrators invested about 200 hours per year to meet each physician’s MIPS requirements. That these hours could be spent in countless other ways — especially caring for patients — raises an obvious question: Is the system we created to fix the system even working?

Is Quality Improving?

It’s hard to know. Some early efforts — such as those focused on reducing nosocomial infections, improving surgical outcomes, and improving processes of care for patients with pneumonia, heart failure, or myocardial infarction — succeeded. But recently, there has been growing recognition of the QI movement’s shortcomings. One study, for instance, showed that only 37% of MIPS measures for ambulatory internal medicine were valid, and even CMS and the Government Accountability Office have acknowledged the need to improve the quality of measuring quality.

One unforeseen challenge is that a measure, no matter how medically sound or well intentioned, is never just a measure. For instance, few physicians would object to the need to check glycated hemoglobin levels in patients with diabetes. But once a measure is implemented and tied to a financial incentive, an entire industry arises to boost organizations’ scores on that measure. Consultants get hired. Electronic health records (EHRs) are changed. And the measures become a source of intense organizational focus. Not only does it become difficult to modify measures that aren’t clearly working, but a tremendous amount of resources are directed toward the appearance of quality rather than its substance.

That QI has become a costly distraction was, ironically, best crystallized by CMS when, early in the Covid-19 pandemic, it announced it was suspending or delaying quality-reporting requirements so that, as CMS Administrator Seema Verma said, “the healthcare delivery system can direct its time and resources toward caring for patients.” Many physicians’ response was essentially “Seriously? Why isn’t the essence of quality devoting our time and resources to caring for patients all the time?” [..]

The methodologic challenge applies to determining how best to improve quality and how best to evaluate our success. “If you asked me whether quality has improved in the last 50 years relative to what it ought to be, we don’t know the answer,” says Robert Brook, a quality-measurement expert at RAND and the University of California, Los Angeles. Noting that we don’t really know how many preventable deaths are attributable to poor medical care, nor whether the use of unnecessary medical services is decreasing, Brook stressed that he and others simply assume QI efforts are inherently good. “I believe all this activity does good,” he told me, “but it’s a belief, like whether I believe in God.” Yet the challenge of QI is an empirical one. “Nobody’s really asking these questions to put it all together,” Brook said. “We never really developed an epidemiology of quality.” [..]

Unfortunately, the science of quality measurement has become untethered from these philosophical underpinnings. Despite innumerable metrics and vast research assessing their worth, it’s still not clear that we’re measuring what matters nor whether we have the methods to figure it out. Compounding this complexity, the incentives now tied to QI may variably affect outcomes, thus warranting their own examination but also further distancing us from epistemological questions about quality’s meaning. Any consideration of whether the movement’s costs are justified by its benefits, then, must escape the tautological trap that assumes quality is improving if we’re scoring better on what we measure.

Is Paying for Performance Bad for Quality?

[..] three key points. The first may be obvious but is easily lost in the grumbling: doctors want the best care for their patients. Critics of QI initiatives aren’t arguing that managing hypertension isn’t important; they object to the way these goals are operationalized, particularly as they are tied to financial incentives and therefore receive disproportionate focus. Second, as quality is increasingly linked to reimbursement, the documentation burden imposed by billing requirements will become inextricable from the demands of demonstrating quality. Finally, using internal performance standards to motivate better care — which many physicians embrace — differs starkly from using external financial incentives to improve quality. A pressing question, then, is whether value-based payment designs improve quality or reduce costs.

My overwhelming sense is that, on balance, they don’t. It’s difficult to know for sure because payment models’ incentive structures vary, as do practice settings and the outcomes observed. But growing evidence provides a rough sketch. One analysis assessing CMS’s Hospital Value-Based Purchasing initiative found only one quality benefit: reduced pneumonia-specific mortality. A study examining CMS’s Hospital Readmissions Reduction Program (HRRP) actually found an increase in 30-day mortality among patients hospitalized for heart failure or pneumonia, driven by patients who were not readmitted (which raises concerns that sicker patients were being turned away). Although there’s debate about whether the HRRP causes harm, there’s little evidence to suggest consistent benefit. [..]

Are We Even Measuring Quality?

When it comes to preference-sensitive decisions like taking statins for primary prevention or undergoing mammography, [Mark] Friedberg [a primary care physician who directs performance improvement for Blue Cross Blue Shield of Massachusetts] suggests, rate-based metrics distract from what really matters: the quality of decision making and whether the patient was engaged and informed.

Friedberg recalled a meeting about a decade ago in which a physician shared an anecdote about strong-arming a deeply reluctant woman into getting a mammogram, allowing him to get a perfect quality score in his practice. As the audience applauded the excellent performance, Friedberg was horrified that even though the patient’s agency had been completely disregarded, most people in the room seemed to have blindly accepted that the metrics represented good care. Though Friedberg believes in using measurement to improve quality, it drives him crazy when the saying “Don’t let perfect be the enemy of good” is invoked to justify QI measures that omit consideration of ethics. “Have we even decided this measure is good?” he frequently wonders. [..]

Inequity among Patients, Demoralization among Physicians

Some early QI leaders recognized the risk of demoralizing the workforce. Berwick recalls an early project that involved conducting patient-satisfaction surveys for a medical practice. When he distributed physicians’ individual reports during a department meeting, one excellent internist crushed hers into a ball and threw it in his face. “Oh my goodness,” he thought, “we are measuring and measuring, using words like ‘accountability,’ ‘incentives,’ ‘rewards and punishments,’ but this has little to do with how things get better.” With this approach now entrenched in P4P initiatives, Berwick observed, “there is magical thinking, that if we just measure enough, and attach those measures to incentives, a miracle happens.”

Though Berwick has been sounding this alarm for decades, as we shift toward value-based payment, invest heavily in an improvement infrastructure that isn’t clearly working, and navigate an epidemic of burnout and workforce demoralization, we seem to inch farther away from the ideals he’s long encouraged. Why has it proven so difficult to heed his advice?”

Full article, L Rosenbaum, New England Journal of Medicine 2022.4.13 [This is the first article in the journal’s series: Medicine and Society – The Quality Movement. Rosenbaum’s next article is summarized here.]