Placing the CMS Stars Rating System for Medicare Advantage Plans into a Consumer-Friendly Context

Regardless of how you feel about the value of Medicare Advantage (MA) compared with traditional Medicare plus a Medigap supplement, MA enrollment continues to rise. The Centers for Medicare & Medicaid Services (CMS) developed the star rating system to measure plans across five dimensions (staying healthy, chronic condition management, member experience, member complaints, and customer service) using four data sources (health and drug plans, enrollee surveys, data collected by CMS contractors, and CMS administrative data). For MA plans that include drug coverage, the star rating system uses 45 unique measures to track those dimensions, as well as to monitor drug safety and drug pricing accuracy. Given CMS's announcement late last week of Medicare Advantage plan performance, and with open enrollment starting October 15, it seemed like a good time to rate the rating system.

A health plan's overall rating is a weighted average of the plan's performance across all eligible individual measures, with some measures weighted more heavily than others. Two metrics (health plan quality improvement and drug plan quality improvement) are weighted by a factor of 5, and the data used to calculate them are not publicly available. Seven metrics are weighted by a factor of 3: improving or maintaining physical health, improving or maintaining mental health, diabetes care (blood sugar control), plan all-cause readmissions, medication adherence for diabetes medications, medication adherence for hypertension (RAS agents), and medication adherence for cholesterol (statins). Nineteen measures across multiple domains are weighted by a factor of 1.5.
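As a rough illustration of how the weighting plays out, the sketch below computes an overall rating as a simple weighted mean of measure-level star ratings. The measure names, the star values, and the default weight of 1 for measures not listed are assumptions for illustration; the sketch also leaves out any further adjustments or rounding CMS may apply.

```python
# Minimal sketch: overall rating as a weighted mean of measure-level stars.
# Measure names and star values below are hypothetical; measures not listed
# in WEIGHTS are assumed to carry a weight of 1.

WEIGHTS = {
    "health_plan_quality_improvement": 5.0,  # 5X improvement metric
    "medication_adherence_diabetes": 3.0,    # 3X metric
    "plan_all_cause_readmissions": 3.0,      # 3X metric
    # 1.5X measures would be listed here with a weight of 1.5
}

def overall_rating(stars: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of star ratings, defaulting unlisted measures to weight 1."""
    total_weight = sum(weights.get(m, 1.0) for m in stars)
    weighted_sum = sum(s * weights.get(m, 1.0) for m, s in stars.items())
    return weighted_sum / total_weight

plan = {
    "health_plan_quality_improvement": 4,
    "medication_adherence_diabetes": 5,
    "plan_all_cause_readmissions": 3,
    "breast_cancer_screening": 2,
}
print(round(overall_rating(plan, WEIGHTS), 2))  # 3.83
```

An unweighted mean of the same four ratings would be 3.5; the heavily weighted improvement and adherence measures pull this hypothetical plan's overall rating up to about 3.83.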

For the two 5X improvement metrics, CMS uses a clustering approach so that MA plans with similar scores receive the same rating and plans with different scores receive different ratings. This approach involves building a distance matrix, grouping scores into clusters, and selecting the final set of clusters. Plans with one- or two-star performance on an improvement metric will not have that metric included in their overall star rating. In addition, plans performing at the four-star level or higher will not have the improvement metric(s) included if doing so would lower their overall star rating.
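The two steps described above can be sketched roughly as follows. The clustering part is a generic agglomerative (hierarchical) clustering of one-dimensional scores with average linkage over a distance matrix; it illustrates the idea of "distance matrix, then clusters, then a final cluster set" and is not claimed to be CMS's exact algorithm. The inclusion rule follows the prose above, with one labeled assumption: "four-star level" is read as the rating computed without the improvement measures. All function names are hypothetical.

```python
from itertools import combinations

def cluster_scores(scores: list[float], n_clusters: int = 5) -> list[list[float]]:
    """Generic agglomerative clustering (average linkage); not CMS's exact method."""
    # Distance matrix: absolute difference between every pair of plan scores.
    dist = [[abs(a - b) for b in scores] for a in scores]

    def linkage(c1: list[int], c2: list[int]) -> float:
        # Average linkage: mean pairwise distance between members of two clusters.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    clusters = [[i] for i in range(len(scores))]  # start with one cluster per plan
    while len(clusters) > n_clusters:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    # Order clusters by mean score so the lowest cluster maps to one star.
    clusters.sort(key=lambda c: sum(scores[i] for i in c) / len(c))
    return [sorted(scores[i] for i in c) for c in clusters]

def weighted_mean(stars: dict[str, float], weights: dict[str, float]) -> float:
    total = sum(weights.get(m, 1.0) for m in stars)
    return sum(s * weights.get(m, 1.0) for m, s in stars.items()) / total

def overall_with_improvement_rule(base: dict[str, float],
                                  improvement: dict[str, float],
                                  weights: dict[str, float]) -> float:
    # Improvement measures rated one or two stars are dropped outright.
    kept = {m: s for m, s in improvement.items() if s >= 3}
    without_improvement = weighted_mean(base, weights)
    with_improvement = weighted_mean({**base, **kept}, weights)
    # Assumption: a plan at four stars or higher without the improvement
    # measures is not pulled down by including them.
    if without_improvement >= 4 and with_improvement < without_improvement:
        return without_improvement
    return with_improvement

print(cluster_scores([60, 62, 70, 71, 78, 79, 84, 85, 91, 93]))
```

With these hypothetical scores the clustering yields five pairs, `[[60, 62], [70, 71], [78, 79], [84, 85], [91, 93]]`, which would map to one through five stars.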

The performance ranges within each metric and the clustering approach may be hard for those of us outside the process to interpret. For example, here are the star-rating cutoffs for medication adherence for diabetes medications, one of the 3X metrics:

  • One star – <72%
  • Two stars – 72-<78%
  • Three stars – 78-<81%
  • Four stars – 81-<85%
  • Five stars – 85% or higher
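To make the cutoffs concrete, the sketch below maps an adherence rate to a star rating using the ranges listed above, treating each range as half-open (for example, 72% up to but not including 78%). The function and constant names are hypothetical.

```python
from bisect import bisect_right

# Lower bounds (percent) for two, three, four and five stars, taken from the
# diabetes medication adherence cutoffs listed above.
DIABETES_ADHERENCE_CUTOFFS = [72, 78, 81, 85]

def adherence_stars(rate_pct: float) -> int:
    """Map an adherence rate to 1-5 stars using half-open ranges such as 72-<78%."""
    # bisect_right counts how many lower bounds the rate meets or exceeds.
    return 1 + bisect_right(DIABETES_ADHERENCE_CUTOFFS, rate_pct)

for rate in (71.9, 75.0, 78.0, 80.9, 84.9, 85.0):
    print(f"{rate:.1f}% -> {adherence_stars(rate)} star(s)")
```

Running it shows the cliff edges: 84.9% earns four stars while 85.0% earns five, yet 78.0% and 80.9% share the same three-star rating.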

Although this approach may help CMS defend against charges of measurement bias, the resulting ratings may not correlate with outcomes that matter to patients. Suppose a consumer is comparing health plan A, a five-star plan on this measure with 85% medication adherence for diabetes medications, against health plan B, a two-star plan with 75% adherence. Can the consumer have any confidence that joining plan A will be associated with a higher quality of life or fewer complications than joining plan B?

The rating approach raises several questions. Do all four data sources (and the two improvement metrics) have the accuracy and reliability needed to combine them into a single rating for consumers? At least one study suggests that comprehensive medication review, one of the individual metrics, is poorly associated with the 17 medication use and medication management measures also included in the rating system. Is one year of data enough to conclude that one plan is superior to another? What about an individual's preferences for specific measures? The rating weights breast cancer screening the same as medication review for older adults. If I strongly value medication adherence and heavily discount the value of cancer screening, the overall star rating could actually steer me away from the plan most closely aligned with my preferences. Finally, do individuals in higher-rated plans have better outcomes than those in lower-rated plans?
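The preference problem can be shown with a small worked example. Two hypothetical plans are scored on three measures, first with illustrative published-style weights (adherence weighted 3X, screening and medication review weighted equally) and then with a consumer's personal weights that prize adherence and ignore screening. All plan names, star values, and weights are assumptions for illustration.

```python
def weighted_mean(stars: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of star ratings; a weight of 0 ignores a measure."""
    total = sum(weights[m] for m in stars)
    return sum(stars[m] * weights[m] for m in stars) / total

# Hypothetical star ratings for two plans on three measures.
PLANS = {
    "Plan X": {"medication_adherence": 3, "breast_cancer_screening": 5, "medication_review": 5},
    "Plan Y": {"medication_adherence": 4, "breast_cancer_screening": 1, "medication_review": 3},
}

# Illustrative published-style weights: adherence 3X, screening and review equal.
PUBLISHED_WEIGHTS = {"medication_adherence": 3, "breast_cancer_screening": 1, "medication_review": 1}
# A consumer who prizes adherence and discounts screening entirely.
PERSONAL_WEIGHTS = {"medication_adherence": 5, "breast_cancer_screening": 0, "medication_review": 1}

def best_plan(plans: dict[str, dict[str, float]], weights: dict[str, float]) -> str:
    return max(plans, key=lambda name: weighted_mean(plans[name], weights))

print(best_plan(PLANS, PUBLISHED_WEIGHTS))  # Plan X
print(best_plan(PLANS, PERSONAL_WEIGHTS))   # Plan Y
```

Under the published-style weights Plan X scores 3.8 against Plan Y's 3.2; under the personal weights the order flips (about 3.33 versus 3.83), even though neither plan's underlying performance changed.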

So what might an individual consider when selecting a plan, given the current method for rating plans and the way that rating information is shared with consumers? Medicare's Plan Compare avoids discussing star ratings altogether. Instead, the website directs the reader to compare the out-of-pocket costs one might incur by enrolling in a specific plan. Given my own skepticism that any health plan can deliver a quality difference in healthcare large enough to justify switching plans, I can see the appeal of this approach. There are multiple challenges to developing and deploying a more informative rating system: 1) our current process of developing healthcare quality measures through consensus meetings that include provider organizations is unlikely to produce metrics that reliably identify high- and low-performing providers; 2) even if a consensus body did develop such a measure, payers would lobby CMS to focus on metrics that a health plan can more easily influence; and 3) even if CMS displayed individual metric performance on its website using visually appealing interfaces that borrowed design approaches from the social media juggernauts, the work is wasted if no one uses the metrics to make healthcare decisions (i.e., switching plans).

Because all health plans work within a fixed set of licensed healthcare providers, hospitals, health systems, and available medications, it would be difficult for any one health plan to consistently distinguish its performance from its peers' year after year. Consumers may be more interested in learning how effective a health plan is at helping them select a specialist consistent with their values, facilitating access (e.g., individuals in rural areas may have to drive hundreds of miles to see a provider, while individuals in urban areas may have to wait months to be seen), and helping them achieve their health goals (e.g., losing weight, quitting smoking, improving their quality of life). These are metrics that are either measured poorly today or not measured at all. Two current examples:

  • Metric C22: Getting Needed Care – two survey questions (“In the last 6 months, how often did you get an appointment to see a specialist as soon as you needed?” and “In the last 6 months, how often was it easy to get the care, tests or treatment you needed?”); one-star performance <80%, five-star performance >85%
  • Metric C23: Getting Appointments and Care Quickly – three survey questions (“In the last 6 months, when you needed care right away, how often did you get care as soon as you needed?”, “In the last 6 months, how often did you get an appointment for a check-up or routine care as soon as you needed?”, and “In the last 6 months, how often did you see the person you came to see within 15 minutes of your appointment time?”); one-star performance <75%, five-star performance >81%