Abstract
Hospital readmission due to heart failure is a concern for patients and hospitals alike: heart failure is both the most frequent and the most expensive diagnosis for hospitalization. Accurate prediction of readmission risk while patients are still in the hospital can therefore help guide appropriate postdischarge interventions. As our understanding of the disease and the volume of electronic health record data both increase, the number of candidate predictors and the time needed to build risk models grow rapidly. This creates a need for methods that reduce the number of predictors without sacrificing predictive performance. We explored three such methods and demonstrated their use by applying them to a real-world dataset of 57 variables drawn from the health data of 1210 patients in one hospital system. We compared every model produced by the predictor reduction methods against the full 57-predictor model for predicting the risk of 30-day readmission in patients with heart failure. Predictive performance, measured by the C-statistic, ranged from 0.630 to 0.840, and model-building time ranged from 10 minutes to 10 hours. Our final model achieved a C-statistic (0.832) comparable to that of the full model (0.840) in the validation cohort while using only 16 predictors and providing a 66-fold improvement in model-building time.
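For readers unfamiliar with the C-statistic, the following is a minimal sketch of how it can be computed for a binary outcome such as 30-day readmission. It is not the study's code: the function name `c_statistic` and the toy outcome and risk values are hypothetical and chosen only for illustration. The C-statistic is the proportion of (readmitted, not readmitted) patient pairs in which the readmitted patient received the higher predicted risk, with ties counted as one half.

```python
from itertools import product


def c_statistic(y_true, y_score):
    """Concordance (C-statistic) for a binary outcome.

    y_true  : iterable of 0/1 outcomes (1 = readmitted within 30 days)
    y_score : iterable of predicted risks, same length as y_true
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative outcome")

    concordant = 0.0
    # Compare every readmitted patient with every non-readmitted patient.
    for p, n in product(pos, neg):
        if p > n:
            concordant += 1.0   # correctly ranked pair
        elif p == n:
            concordant += 0.5   # tied pair counts as half
    return concordant / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study cohort.
readmitted = [1, 0, 1, 0, 0, 1]
predicted_risk = [0.80, 0.30, 0.55, 0.60, 0.10, 0.70]
c = c_statistic(readmitted, predicted_risk)
print(f"C-statistic: {c:.3f}")
```

A value of 0.5 corresponds to ranking no better than chance and 1.0 to perfect ranking. The pairwise loop is quadratic in the number of patients, which is acceptable for a cohort of this size; for larger data a rank-based computation would be the usual choice.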