IN THIS ISSUE of the Journal, Fuller and colleagues summarize and compare "the properties of regression and categorical risk adjustment models" (Fuller et al., 2016). Their analysis is informed by decades of experience building and refining tools for case-mix adjustment, including Medicare Severity Diagnosis Related Groups (MS-DRGs) and All Patient Refined Diagnosis Related Groups (APR-DRGs). However, I will argue that they have created a false dichotomy, using what they describe as "regression-based models" as a "straw man" for various critiques that are really linked to specific applications of regression methods rather than to predictive modeling as a general approach.
Regression models are simply an empirical tool to identify and estimate relationships between an outcome (dependent) variable and a set of predictor (independent) variables. To be specific, a regression model describes the expected value of the outcome variable conditional on the observed values of the predictor variables (given certain assumptions about the functional form of the relationships among variables). The developer of the model must specify the predictor variables to be considered, the shape of the relationship between each predictor and the outcome, and the hypothesized interactions among predictor variables. Often, the developer must choose whether to use finer or coarser categories and whether to allow fewer or more interactions; empirical testing within the regression framework can then show clinicians which of these choices actually matter.
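The conditional-expectation idea can be made concrete with a toy calculation (the data below are invented for illustration): with a single binary predictor, the least-squares fit simply reproduces the 2 conditional means of the outcome.

```python
# Minimal sketch (invented data): a regression estimates E[Y | X].
# With one binary predictor, the fitted line at x=0 and x=1 equals
# the mean outcome within each group.
x = [0, 0, 0, 1, 1, 1]              # hypothetical binary risk factor
y = [1.0, 2.0, 3.0, 4.0, 6.0, 8.0]  # hypothetical outcome values

x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar

mean_x0 = sum(yi for xi, yi in zip(x, y) if xi == 0) / x.count(0)
mean_x1 = sum(yi for xi, yi in zip(x, y) if xi == 1) / x.count(1)

# Fitted value at x=0 equals the conditional mean among x=0; likewise at x=1.
print(intercept, mean_x0)          # 2.0 2.0
print(intercept + slope, mean_x1)  # 6.0 6.0
```

The point of the sketch is that "regression" is not an exotic black box: with categorical predictors, its predictions are exactly the conditional means the developer chose to distinguish.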
In other words, both "clinical categorical models" and "regression-based models," as described by Fuller and colleagues, involve 2 fundamental steps: (1) specifying a set of predictor variables and a set of hypothesized relationships or interactions among those variables; and (2) applying this conceptual model to actual data to empirically estimate the strength of these relationships and thereby to estimate expected values of the outcome variable. Models such as MS-DRGs and APR-DRGs focus on step 1; the developers iteratively refine their construction of risk profiles by integrating clinical logic and empirical testing. Step 2 is deferred to other analysts, often using other data sets, in a process that Fuller and colleagues describe as calculating "weights or rates." Models such as Hierarchical Condition Categories simply integrate steps 1 and 2; the developers specify a conceptual model in much the same way but then apply it to actual data to estimate specific parameters ("weights"), which can then be used to calculate expected values.
Many of Fuller and colleagues' criticisms of "regression-based models" are not about regression modeling per se but about how developers of regression models approach the aforementioned step 1. For example, developers of regression models often make simplifying assumptions that are clinically illogical, such as assuming that risk factors do not interact with each other. Their diagram 1 does not illustrate any fundamental difference between categorical models and regression models. As the authors describe, the categorical model could be fixed simply by adding a third category, whereas the regression model could be fixed using the same 3 categories as covariates. These 2 solutions are, in fact, identical. Both solutions are driven by the same clinical theory (ie, factors A and B have synergistic effects on cost) and confirmed by the same empirical observation.
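The claim that the 2 solutions are identical can be verified directly. The sketch below uses invented cell means for 2 binary factors A and B with a synergistic effect on cost: a saturated regression with an A-by-B interaction term reproduces the categorical lookup table exactly.

```python
# Hypothetical cell means for two binary risk factors A and B with a
# synergistic effect on cost (20 > 10 + 2 + 3); values are invented.
cell_mean = {(0, 0): 10.0, (1, 0): 12.0, (0, 1): 13.0, (1, 1): 20.0}

# "Categorical" solution: one category per cell, used as a lookup table.
def predict_categorical(a, b):
    return cell_mean[(a, b)]

# "Regression" solution: main effects plus an A-by-B interaction.
# For a saturated model, the least-squares coefficients are exactly:
b0 = cell_mean[(0, 0)]
bA = cell_mean[(1, 0)] - cell_mean[(0, 0)]
bB = cell_mean[(0, 1)] - cell_mean[(0, 0)]
bAB = (cell_mean[(1, 1)] - cell_mean[(1, 0)]
       - cell_mean[(0, 1)] + cell_mean[(0, 0)])

def predict_regression(a, b):
    return b0 + bA * a + bB * b + bAB * a * b

# The two "solutions" give identical predictions in every cell.
for a in (0, 1):
    for b in (0, 1):
        assert predict_categorical(a, b) == predict_regression(a, b)
```

The interaction coefficient (here 5.0) is simply the synergy the clinical theory predicted; the regression merely re-expresses the categorical cells in a different notation.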
In fact, Fuller and colleagues describe implementation problems, not fundamental differences between "categorical models" and "regression models." In practice, the developers of categorical models tend to be so focused on parsimony in constructing categories that they aggregate dissimilar cases (eg, assigning all patients with cesarean delivery to just 2 risk levels). But this is not an inherent problem with categorical models, as "the Diagnosis Treatment Combinations (DTCs) used for payment in Holland have 29 000 distinct categories" (more categories than most regression models include). Conversely, the developers of regression models tend to be so focused on simplicity and generalizability that they ignore interactions. But this is not an inherent problem with regression models; Classification and Regression Trees (CART) and Least Absolute Shrinkage and Selection Operator (LASSO) are just 2 of several methods to enhance prediction accuracy and interpretability of models that include complex interactions. Case-mix regression models often include interactions that are viewed as particularly salient; for example, most of the Agency for Healthcare Research and Quality's Quality Indicator models include interactions between sex and age categories to account for the fact that sex differences in clinical outcomes tend to increase with age.
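As an illustration of the CART idea mentioned above (the data, variable names, and cell means below are invented), a greedy tree that splits on whichever binary factor most reduces squared error can recover an age-by-sex interaction without anyone specifying it in advance:

```python
# Minimal CART-style sketch (invented data): greedy binary splitting
# on the feature that most reduces squared error.
def sse(ys):
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def best_split(rows, features):
    # rows: list of (feature_dict, outcome); returns the feature whose
    # 0/1 split yields the smallest total within-group squared error.
    best = None
    for f in features:
        left = [y for x, y in rows if x[f] == 0]
        right = [y for x, y in rows if x[f] == 1]
        if not left or not right:
            continue
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (f, total)
    return best

def grow(rows, features, depth):
    ys = [y for _, y in rows]
    if depth == 0 or sse(ys) == 0:
        return sum(ys) / len(ys)  # leaf: mean outcome
    f, _ = best_split(rows, features)
    left = [r for r in rows if r[0][f] == 0]
    right = [r for r in rows if r[0][f] == 1]
    return (f, grow(left, features, depth - 1), grow(right, features, depth - 1))

def predict(tree, x):
    while isinstance(tree, tuple):
        f, left, right = tree
        tree = right if x[f] == 1 else left
    return tree

# Hypothetical data: cost is elevated only for older women (an interaction).
rows = [({"age65": a, "female": s}, y)
        for (a, s, y) in [(0, 0, 1.0), (0, 1, 1.0), (1, 0, 2.0), (1, 1, 6.0)]
        for _ in range(3)]
tree = grow(rows, ["age65", "female"], depth=2)
print(predict(tree, {"age65": 1, "female": 1}))  # 6.0
print(predict(tree, {"age65": 1, "female": 0}))  # 2.0
```

The tree first splits on age, then on sex within the older group, so the age-by-sex interaction emerges from the data rather than from a pre-specified term.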
Once one recognizes that Fuller and colleagues' dichotomy between "categorical models" and "regression models" is essentially false, several of their criticisms of the latter become red herrings:
1. "Clinical categorical models" only create "a language understood by physicians" if the categories are described clearly using standard clinical terminologies. Do the 29 000 DTC categories in Holland really meet this standard? I would argue that even the 749 MS-DRG categories in the United States do not meet this standard, as I have experienced great difficulty explaining to other clinicians why comorbidities and complications (CCs) are lumped, how CCs differ from major CCs, and why CCs are sometimes lumped with major CCs and sometimes with no CCs. This language is simply not understood by most physicians.
2. Regression-based models only have "minimal communication value" if the sponsors choose not to communicate with stakeholders. In fact, clinicians are generally accustomed to interpreting hazard ratios, rate ratios, and odds ratios, which are readily derived from survival models, Poisson models, and logistic models, respectively. To cite one example, members of the University HealthSystem Consortium (now Vizient) strongly encouraged it to move away from using an APR-DRG-based approach to case-mix adjustment toward logistic/linear regression models that physicians find more understandable and actionable.
3. The vaunted "stability" of "clinical categorical models" in response to changing practice patterns or technology is illusory. In fact, the Centers for Medicare & Medicaid Services (CMS) modifies its MS-DRGs incrementally every year, as new procedures are introduced, as codes for newly recognized conditions are implemented, and as changing clinical and coding practices weaken previous relationships (eg, "upcoding" sepsis and chronic kidney disease). Contrary to Fuller and colleagues' assertion that "different payers (eg, Medicare and Medicaid) can use different prices in payment applications of DRGs while calculating payment using the same DRGs," the CMS has urged states not to use MS-DRGs in Medicaid programs because they were designed and validated on elderly Medicare beneficiaries receiving fee-for-service care and their generalizability to the Medicaid population is suspect (Centers for Medicare & Medicaid Services, 2004). To the extent that "clinical categorical models" may be more stable than "regression-based models," it is only because the former include fewer variables that have a stronger conceptual rationale and are more carefully selected. The annual reestimation of payment weights for MS-DRGs is exactly parallel to the annual reestimation of regression weights for the Agency for Healthcare Research and Quality's Quality Indicators and similar measures; these procedures represent 2 sides of the same coin.
4. The cited peculiarities of CMS' Hospital Readmission Reduction Program models can be interpreted in 2 ways. To the extent that different effects of the same risk factor in different patient cohorts are robust over time, they probably represent "unsuspected truths." For example, a history of coronary artery bypass graft surgery may indicate worse coronary artery disease but better outpatient access to cardiac specialty care, leading to increased readmission risk after acute myocardial infarction but reduced readmission risk after discharge for heart failure. To the extent that different effects of the same risk factor in different patient cohorts are not robust over time, the models are simply overfitted. Overfitting is a well-recognized problem that can be identified and avoided.
5. Finally, in 2016, regression-based analyses using previously validated models are no longer "difficult to perform independent of developers." In fact, most American physicians now carry mobile devices that can readily implement such models. Seeing patients in primary care on a daily basis, I estimate multivariable Framingham-based risk scores (http://tools.acc.org/ASCVD-Risk-Estimator/) to make informed decisions about lipid-lowering therapy (Stone et al., 2014) and CHA(2)DS(2)-VASc risk scores to assess the benefits of anticoagulation among patients with atrial fibrillation (Lip et al., 2010). These risk scores are based on regression models; the weights of all component variables are transparent, and researchers can reproduce them and assess their robustness in the current environment.
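The overfitting described in point 4 can be demonstrated with simulated data (this is a generic sketch, not the CMS readmission models): when a model estimates a separate mean for each of many small cells of pure noise, it outperforms a pooled mean on the training data but underperforms it on held-out data.

```python
# Sketch of overfitting (simulated data): the outcome is pure noise,
# so the true expected value is 0 in every cell. A "fine" model with
# one parameter per cell fits training noise; a pooled mean does not.
import random
random.seed(0)

n_cells, per_cell = 20, 3
train = [(c, random.gauss(0, 1)) for c in range(n_cells) for _ in range(per_cell)]
test = [(random.randrange(n_cells), random.gauss(0, 1)) for _ in range(2000)]

pooled = sum(y for _, y in train) / len(train)
cell_mean = {c: sum(y for cc, y in train if cc == c) / per_cell
             for c in range(n_cells)}

def mse(data, predict):
    return sum((y - predict(c)) ** 2 for c, y in data) / len(data)

mse_train_cells = mse(train, lambda c: cell_mean[c])
mse_train_pooled = mse(train, lambda c: pooled)
mse_test_cells = mse(test, lambda c: cell_mean[c])
mse_test_pooled = mse(test, lambda c: pooled)

# Training: cell means always look at least as good (they minimize
# within-cell error). Held-out data: the fine model is worse, because
# the "signal" it learned was noise.
print(mse_train_cells < mse_train_pooled)  # True
print(mse_test_cells, mse_test_pooled)
```

Holding out data in this way (or cross-validating) is exactly how a non-robust cohort-specific coefficient can be identified and avoided.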
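The transparency described in point 5 is easy to demonstrate: because the CHA(2)DS(2)-VASc weights are published integers (Lip et al., 2010), the entire model fits in a few lines of code that anyone can audit. The function and argument names below are my own.

```python
# The CHA2DS2-VASc stroke-risk score: every component weight is a
# transparent integer, so the model can be reimplemented and audited.
def cha2ds2_vasc(age, female, chf=False, hypertension=False, diabetes=False,
                 stroke_or_tia=False, vascular_disease=False):
    score = 0
    score += 1 if chf else 0               # C: congestive heart failure
    score += 1 if hypertension else 0      # H: hypertension
    score += 2 if age >= 75 else (1 if age >= 65 else 0)  # A2/A: age
    score += 1 if diabetes else 0          # D: diabetes mellitus
    score += 2 if stroke_or_tia else 0     # S2: prior stroke/TIA
    score += 1 if vascular_disease else 0  # V: vascular disease
    score += 1 if female else 0            # Sc: sex category (female)
    return score

# A 72-year-old woman with hypertension: 1 (age 65-74) + 1 (HTN) + 1 (sex).
print(cha2ds2_vasc(age=72, female=True, hypertension=True))  # 3
```

Nothing about the regression origin of such a score prevents independent verification; the barrier, when one exists, is the sponsor's choice not to publish the weights.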
In summary, Fuller and colleagues offer a false dichotomy between "clinical categorical models" and "regression-based models" that cannot move the field forward. However, they do identify key limitations related to how both approaches have been implemented over the past 30 years:
1. "Regression-based models" are often not as robust as they should be because developers tend to "throw in" all of the available variables, even when those variables are not recognized as risk factors by clinicians and clinical researchers.
2. "Regression-based models" often fail to account for clinically important interactions among risk factors.
3. "Clinical categorical models" often fail to account for important risk factors (eg, "present on admission" status).
4. "Clinical categorical models" often lump cases that should be split; in other words, they emphasize parsimony over bias reduction. In fact, risk (health outcomes) and severity (resource use) are continuously distributed, albeit with certain constraints (ie, the risk of death never exceeds 1, resource use is never less than zero), so forcing everyone into a few buckets discards valuable information.
Over the next 30 years, I hope to see a convergence of approaches that will address all of these limitations. Just as CMS adjusts hospital payments based on a continuous measure of regional labor input costs, and premiums for Medicare Advantage plans based on a continuous measure of predicted utilization, so too should case-mix severity be adjusted using many more categories than either MS-DRGs or APR-DRGs can possibly support. Case mix is properly viewed as a modifier of risk or severity within clinical "product lines," each of which can be defined using a "clinical categorical model" or grouper. In other words, the skeletal structure of MS-DRGs and APR-DRGs is helpful to administrators and payers who need to manage their work, but 2-level or 3-level stratification (eg, CC or MCC) has outlived its usefulness in the modern era. Integrating a careful, clinically driven process for selecting variables, based on a clear conceptual framework, with regression-based analytic methods will allow us to improve predictions, reduce bias, and improve understandability and actionability at the same time.
REFERENCES