Keywords

home healthcare, natural language processing, nursing informatics, risk prediction, text mining

Authors

  1. Topaz, Maxim
  2. Woo, Kyungmi
  3. Ryvicker, Miriam
  4. Zolnoori, Maryam
  5. Cato, Kenrick

Abstract

Background: About 30% of home healthcare (HHC) patients are hospitalized or visit an emergency department (ED) during an HHC episode. Novel data science methods are increasingly used to improve identification of patients at risk for negative outcomes.

Objectives: The aim of the study was to identify patients at heightened risk for hospitalization or ED visits using HHC narrative data (clinical notes).

Methods: This study used a large database of HHC visit notes (n = 727,676) documented for 112,237 HHC episodes (89,459 unique patients) by clinicians of the largest nonprofit HHC agency in the United States. Text mining and machine learning algorithms (Naive Bayes, decision tree, random forest) were implemented to predict patient hospitalization or ED visits using the content of clinical notes. Risk factors associated with hospitalization or ED visits were identified using a feature selection technique (gain ratio attribute evaluation).
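
As an illustration of the feature selection step named above, gain ratio scores a feature by dividing its information gain about the outcome by the entropy of the feature itself (its split information). The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the binary "term present in note" encoding, and the toy data are all invented for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(feature, outcome):
    """Gain ratio = information gain / split information."""
    total = len(outcome)
    # Entropy of the outcome that remains after splitting on the feature.
    conditional = 0.0
    for value in set(feature):
        subset = [o for f, o in zip(feature, outcome) if f == value]
        conditional += (len(subset) / total) * entropy(subset)
    info_gain = entropy(outcome) - conditional
    split_info = entropy(feature)
    # A constant feature carries no split information; score it as 0.
    return info_gain / split_info if split_info > 0 else 0.0


# Invented toy data (assumption): one entry per episode.
# term_present: 1 if a hypothetical term appears in the episode's notes.
# hospitalized_or_ed: 1 if the episode ended in hospitalization or an ED visit.
term_present = [1, 1, 1, 0, 0, 0, 0, 1]
hospitalized_or_ed = [1, 1, 1, 0, 0, 0, 1, 0]

score = gain_ratio(term_present, hospitalized_or_ed)
print(f"gain ratio: {score:.4f}")
```

Ranking every candidate term by this score and keeping the highest-scoring ones is one way such a technique can surface risk factors from note text; the ranking and cutoff used in the study are not specified here.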

Results: The best-performing text mining method (random forest) achieved good predictive performance. Seven risk factor categories were identified, with clinical factors, coordination/communication, and service use being the most frequent categories.

Discussion: This study was the first to explore the potential contribution of HHC clinical notes to identifying patients at risk for hospitalization or an ED visit. Our results suggest that HHC visit notes are highly informative and can contribute significantly to identification of patients at risk. Further studies are needed to explore ways to improve risk prediction by adding more data elements from additional data sources.