Keywords

Breast cancer, Health screening, Logistic regression, Naive Bayes, Prediction

Authors

  1. Lee, Sun-Mi PhD, RN, MPH
  2. Park, Jin-Hee MSN, RN
  3. Park, Han-Jong MSN, RN

Abstract

Accurate predictive models are essential for promoting early breast cancer screening in primary care or home care settings. This study examined how the selection of relevant variables influences the predictive performance of models that identify individuals at high risk for breast cancer. To increase predictive performance, a systematic review of previously published articles was conducted to select important risk factors for breast cancer. Through the systematic literature review and the application of variable selection methods, a final set of 13 risk factors was identified. Logistic regression and naive Bayes were then used to build predictive models. Both models showed higher predictive performance than previously developed models. The systematic literature review is believed to have contributed to the identification of relevant variables and thereby to the improved predictive performance. The findings also suggest that naive Bayes performed equivalently to logistic regression and could be preferred over it.
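
As an illustration only, the kind of comparison described in the abstract can be sketched on simulated data. The following is a minimal pure-Python sketch and not the authors' method: the three binary risk factors, their effect sizes, the sample sizes, and the use of the area under the ROC curve (AUC) as the performance measure are all assumptions made for the example, and no real screening data are involved.

```python
import math
import random


def make_data(n, seed=0):
    """Simulate n records of hypothetical binary risk factors (NOT real patient data)."""
    rng = random.Random(seed)
    effects = [1.5, 1.0, -1.0]  # assumed log-odds effects, chosen for illustration
    rows = []
    for _ in range(n):
        x = [rng.randint(0, 1) for _ in effects]
        logit = -0.5 + sum(w * v for w, v in zip(effects, x))
        p = 1.0 / (1.0 + math.exp(-logit))
        rows.append((x, 1 if rng.random() < p else 0))
    return rows


def fit_naive_bayes(rows):
    """Bernoulli naive Bayes with Laplace (add-one) smoothing."""
    k = len(rows[0][0])
    ones = {0: [0] * k, 1: [0] * k}   # per-class count of feature == 1
    totals = {0: 0, 1: 0}             # per-class record count
    for x, y in rows:
        totals[y] += 1
        for j, v in enumerate(x):
            ones[y][j] += v
    prior = {c: (totals[c] + 1) / (len(rows) + 2) for c in (0, 1)}
    cond = {c: [(ones[c][j] + 1) / (totals[c] + 2) for j in range(k)] for c in (0, 1)}

    def predict(x):
        log_post = {}
        for c in (0, 1):
            s = math.log(prior[c])
            for j, v in enumerate(x):
                s += math.log(cond[c][j] if v else 1.0 - cond[c][j])
            log_post[c] = s
        m = max(log_post.values())  # subtract the max for numerical stability
        e0, e1 = math.exp(log_post[0] - m), math.exp(log_post[1] - m)
        return e1 / (e0 + e1)

    return predict


def fit_logistic(rows, lr=0.5, epochs=300):
    """Logistic regression fitted by batch gradient descent on the log-loss."""
    k, n = len(rows[0][0]), len(rows)
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for x, y in rows:
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            err = 1.0 / (1.0 + math.exp(-z)) - y
            grad_b += err
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
        b -= lr * grad_b / n
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]

    def predict(x):
        z = b + sum(wi * xi for wi, xi in zip(w, x))
        return 1.0 / (1.0 + math.exp(-z))

    return predict


def auc(predict, rows):
    """AUC as the share of (case, non-case) pairs ranked correctly; ties count 0.5."""
    pos = [predict(x) for x, y in rows if y == 1]
    neg = [predict(x) for x, y in rows if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


data = make_data(1200)
train, test = data[:800], data[800:]  # simple hold-out split
nb_auc = auc(fit_naive_bayes(train), test)
lr_auc = auc(fit_logistic(train), test)
print(f"naive Bayes AUC: {nb_auc:.3f}  logistic regression AUC: {lr_auc:.3f}")
```

Both models are fitted on the same variables and scored on the same held-out records, so any difference in AUC reflects the modeling technique rather than the choice of risk factors. On simulated data like this the two scores are typically close, which says nothing about the results reported in the study itself.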