Keywords

health surveillance, household income, missing data, multiple imputation

 

Authors

  1. Zeng, Zhiwei MD, MPH

Abstract

Background: Although advanced multiple imputation (MI) methodology has been widely introduced and is increasingly used, few applications have been reported in health surveillance, where missing income data is a common and serious problem. This study examined the application of MI to incomplete income data in population-based health surveillance.

 

Methods: In the 2002-2003 Los Angeles County Health Survey (N = 8 167), self-reported household income, converted into Federal Poverty Levels (FPLs), was imputed using MI for the 1 381 (16.9%) cases with missing values. Validity was assessed using the 6 786 complete cases, in which 1 381 FPLs were randomly masked and then imputed with MI. Consistency was examined with Z tests comparing imputed and original FPL statistics. MI statistical inference was examined by masking and imputing 5 percent, 10 percent, 15 percent, and 20 percent of FPL values, using different sets of covariates, and comparing the 95 percent confidence intervals of the resulting Pearson correlation coefficients with the original correlation coefficients.
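
The mask-and-impute validation described above can be illustrated with a minimal sketch. It is not the study's implementation: the data are simulated, the names `impute_once` and `pooled_correlation_ci` are hypothetical, the imputation model is a simplified stochastic regression on a single assumed covariate (it does not redraw the regression parameters, as a fully proper MI procedure would), and the pooled interval uses Fisher's z transformation with Rubin's rules and a normal critical value instead of the t reference distribution.

```python
import numpy as np


def impute_once(x, covariate, missing, rng):
    """Fill masked values of x by regression on one covariate plus random noise."""
    obs = ~missing
    slope, intercept = np.polyfit(covariate[obs], x[obs], 1)
    resid = x[obs] - (intercept + slope * covariate[obs])
    sigma = resid.std(ddof=2)  # residual SD; two regression parameters estimated
    filled = x.copy()
    noise = rng.normal(0.0, sigma, int(missing.sum()))
    filled[missing] = intercept + slope * covariate[missing] + noise
    return filled


def pooled_correlation_ci(x_list, y, z_crit=1.96):
    """Pool Pearson r over imputed datasets on the Fisher z scale (Rubin's rules)."""
    n, m = len(y), len(x_list)
    z = np.array([np.arctanh(np.corrcoef(x, y)[0, 1]) for x in x_list])
    z_bar = z.mean()
    within = 1.0 / (n - 3)        # sampling variance of Fisher z
    between = z.var(ddof=1)       # between-imputation variance
    total = within + (1.0 + 1.0 / m) * between
    half = z_crit * np.sqrt(total)
    return np.tanh(z_bar), np.tanh(z_bar - half), np.tanh(z_bar + half)


# Simulated stand-ins for FPL, a covariate, and a health outcome (assumptions).
rng = np.random.default_rng(0)
n = 2000
covariate = rng.normal(size=n)
fpl = 200.0 + 50.0 * covariate + rng.normal(0.0, 30.0, n)
outcome = 0.01 * fpl + rng.normal(0.0, 0.5, n)
r_original = np.corrcoef(fpl, outcome)[0, 1]

# Randomly mask 10 percent of the complete FPL values, then impute m = 5 times.
missing = rng.random(n) < 0.10
fpl_masked = np.where(missing, np.nan, fpl)
imputed = [impute_once(fpl_masked, covariate, missing, rng) for _ in range(5)]

r_pooled, lo, hi = pooled_correlation_ci(imputed, outcome)
print(f"original r = {r_original:.3f}")
print(f"pooled r = {r_pooled:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

Repeating the masking step at 5, 10, 15, and 20 percent and checking whether the original coefficient falls inside the pooled interval mirrors the comparison described in the Methods. Because this sketch leaves the outcome out of the imputation model, the pooled correlation tends to be slightly attenuated as the masked proportion grows.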

 

Results: Among 188 major surveillance statistics, Z tests showed that imputed and original FPL were consistent for 96.3 percent of the statistics treated as demographics but for only 19.4 percent of those treated as outcome variables. With well-established covariates, MI statistical inference was powerful when the missing proportion was within 15 percent, but it began to weaken as the missing proportion rose to 15 percent and beyond.

 

Conclusions: Multiple imputation is a feasible approach for incomplete income data in population-based health surveillance, but its results vary. It performs better for demographic variables than for outcome variables and is more powerful at lower missing proportions than at higher ones. With well-established covariates, MI statistical inference could be reliable for missing proportions up to 15 percent.