HSS Study Highlights Flaws in CMS Star Ratings
A leading quality rating for the nation’s hospitals appears not to adequately account for the risks of undergoing certain procedures at certain hospitals, particularly joint replacement surgery, according to a new study by researchers at Hospital for Special Surgery (HSS) in New York City.
The study, reported in JBJS Open Access, found that the Overall Hospital Quality Star Ratings program from the Centers for Medicare & Medicaid Services (CMS) is unreliable in several respects. In particular, the star system significantly understates the risk of complications among patients who undergo total joint arthroplasty (TJA) at hospitals that perform relatively few of these surgeries. The researchers also showed that the star algorithm fails to fully capture the commonly observed association between higher surgical volume and better outcomes.
"The apples to oranges comparisons across hospitals in the Hospital Stars program produce ratings that are of uncertain usefulness," said Catherine H. MacLean, MD, PhD, a rheumatologist and chief value medical officer at HSS, a co-author of the new study. "The Hospital Stars Program misses the mark in providing actionable information to consumers, who generally are looking to understand the quality for specific procedures."
The study is particularly timely, as CMS announced recently that it was considering an overhaul of the way it calculates the controversial star rating system. Any changes would not take effect until 2020, according to the agency, which will be accepting public comments on the proposed revision.
CMS launched the star ratings in July 2016 as part of a broader effort to promote value-based care, meaning higher-quality outcomes at the lowest possible cost. The system, which is due for an update this month, currently includes 57 performance measures covering seven categories. Together they capture mortality, patient safety, readmission to the hospital, effectiveness and timeliness of care, and other relevant factors in hospitalization. Hospitals can receive an overall rating of one to five stars.
"However, in creating the system, CMS did not fully account for the impact of the volume of procedures in its algorithm," says Mark Alan Fontana, PhD, a data scientist and health economist at HSS and lead author of the new study. Hospitals that perform many of a particular surgery or intervention typically have better outcomes for those procedures than facilities that see fewer such patients. But the star system does not include measures for hospitals that perform fewer than 25 (but more than zero) procedures over a three-year period in some cases. As a result, he and his colleagues speculated that the ratings would change, perhaps significantly, if those data were incorporated into the model.
"We hypothesized that if some quality measures for certain hospitals are excluded from CMS’s calculations because of low volume, and if low volume is associated with worse outcomes, then including those omitted quality measures would negatively impact associated hospitals’ ratings," they wrote. Moreover, because the star system links all hospitals through relative ratings, the changes could affect other facilities in the database, too.
The HSS researchers assessed four measures for which high-volume hospitals tend to perform better than low-volume hospitals: two for TJA (complications and readmissions) and two for cardiac surgery (mortality and readmissions). Working from the public CMS database, they used three methods to estimate the values missing for low-volume facilities.
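To make the idea of filling in suppressed low-volume measures concrete, here is a minimal sketch in Python. It is not one of the three methods used in the study, which are not described here. The hospital names, volumes, rates, and the nearest-volume averaging rule are all made up for illustration. The sketch also shows why adding estimated values can shift the relative standing of hospitals that already had a reported value.

```python
from statistics import mean, pstdev

# Made-up data: (hospital, three-year TJA volume, complication rate in %).
# A rate of None marks a measure suppressed for low volume (fewer than 25 cases).
hospitals = [
    ("A", 900, 2.0),
    ("B", 400, 2.5),
    ("C", 120, 3.0),
    ("D", 40, 3.5),
    ("E", 12, None),
    ("F", 6, None),
]

def impute_low_volume(rows, k=2):
    """Hypothetical rule: fill each missing rate with the mean rate of the
    k lowest-volume hospitals that do report a value."""
    reporting = sorted((r for r in rows if r[2] is not None), key=lambda r: r[1])
    fill = mean([r[2] for r in reporting[:k]])
    return [(name, vol, rate if rate is not None else fill)
            for name, vol, rate in rows]

def z_scores(rows):
    """Standardize rates across hospitals that have a value (lower is better)."""
    rated = [r for r in rows if r[2] is not None]
    mu = mean([r[2] for r in rated])
    sigma = pstdev([r[2] for r in rated])
    return {name: (rate - mu) / sigma for name, _, rate in rated}

before = z_scores(hospitals)                     # four hospitals are scored
after = z_scores(impute_low_volume(hospitals))   # all six are scored

for name in before:
    print(name, round(before[name], 2), "->", round(after[name], 2))
```

In this toy example, every hospital that already reported a rate gets a different standardized score once the estimated values enter the pool, even though its own data did not change.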
For three of the four measures, including the estimates had no effect on the overall ratings—suggesting that the star ratings do not reflect the volume-outcome relationship for these measures.
For the fourth, complications after TJA, nearly 40% of hospitals saw their score change once the estimates of the low-volume data were added to the model. Of those, roughly a third gained a star or more while the remaining two-thirds lost a star or more. Although the exact percentages differed depending on the method the researchers used to estimate the missing values, the overall trend was the same for each of the three approaches, they said. "Overall, although low-volume hospitals were more often hurt than helped after imputation [of the missing, low-volume outcome measures] … higher-volume hospitals were also more often hurt than helped," they report.
The researchers also showed that the underlying model for the safety domain is unstable because it weights one quality measure heavily. Slight changes to the underlying data, such as those for complications after TJA, can force the model to "flip" and weight a different quality measure heavily instead, which can dramatically change a hospital’s star rating.
The findings underscore the importance of making quality metrics as accurate as possible. "As health-care incentives move toward value-based programs in which higher quality is rewarded financially, defining quality in relative terms may very well prevent some providers with high absolute quality scores, but lower relative quality scores, from achieving those rewards," the HSS researchers write. "Relative ratings will tend to overemphasize the differences between hospitals, even if true differences are minimal. This overemphasizing of differences, in turn, could engender counterproductive competition by discouraging hospitals from sharing best practices and collaborating."
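The point about relative ratings can be illustrated with a small Python sketch. It assigns stars by rank quintile, a deliberately simplified and hypothetical rule that is not the CMS algorithm, and the scores are invented. The aim is only to show how a purely relative scheme spreads hospitals across the full one-to-five range even when their absolute scores are nearly identical.

```python
def relative_stars(scores):
    """Assign 1-5 stars by rank quintile (hypothetical rule, not the CMS method)."""
    n = len(scores)
    # Indices of hospitals ordered from lowest to highest score.
    order = sorted(range(n), key=lambda i: scores[i])
    stars = [0] * n
    for rank, idx in enumerate(order):
        stars[idx] = 1 + (rank * 5) // n
    return stars

# Made-up complication-free rates (percent) for ten hypothetical hospitals.
scores = [97.0, 97.1, 97.2, 97.3, 97.4, 97.5, 97.6, 97.7, 97.8, 97.9]

stars = relative_stars(scores)
spread = max(scores) - min(scores)

print(stars)
print(f"absolute spread: {spread:.1f} percentage points")
```

Here the best and worst hospitals differ by less than one percentage point, yet the ranking rule still hands out every rating from one star to five.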