Abstract
Monitoring of population obesity risk depends primarily on self-reported anthropometric data, which are prone to recall error and bias. This study developed machine learning (ML) models to correct self-reported height and weight and to estimate obesity prevalence in US adults. Individual-level data from 50 274 adults were retrieved from the 1999-2020 waves of the National Health and Nutrition Examination Survey (NHANES). Differences between self-reported and objectively measured anthropometric data were large and statistically significant. We applied 9 ML models to predict objectively measured height, weight, and body mass index from their self-reported counterparts, and assessed model performance using root-mean-square error. Adopting the best-performing models reduced the discrepancy between self-reported and objectively measured sample averages by 22.08% for height, 2.02% for weight, and 11.14% for body mass index, and reduced the discrepancy in obesity prevalence by 99.52%. The difference between predicted (36.05%) and objectively measured (36.03%) obesity prevalence was not statistically significant. The models may be used to reliably estimate obesity prevalence in US adults using data from population health surveys.
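To make the evaluation quantities concrete, the following is a minimal sketch of how root-mean-square error, obesity prevalence, and a percentage reduction in discrepancy could be computed. It rests on assumptions: the abstract does not define the discrepancy reduction, so the sketch takes it to be the relative shrinkage of the absolute gap between a sample-level estimate and its measured counterpart; obesity is taken as body mass index of at least 30 kg/m², the conventional adult cutoff; and the function names and the four-person toy sample are hypothetical, not NHANES data or the study's code.

```python
import math


def rmse(measured, predicted):
    """Root-mean-square error between paired measured and predicted values."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def obesity_prevalence(bmi_values, threshold=30.0):
    """Percentage of the sample with BMI at or above the threshold (kg/m^2)."""
    return 100.0 * sum(b >= threshold for b in bmi_values) / len(bmi_values)


def discrepancy_reduction(measured_stat, self_reported_stat, predicted_stat):
    """Percent reduction in the absolute gap to the measured statistic.

    Assumed definition (not stated in the abstract):
    100 * (|self_reported - measured| - |predicted - measured|)
        / |self_reported - measured|
    """
    gap_before = abs(self_reported_stat - measured_stat)
    gap_after = abs(predicted_stat - measured_stat)
    if gap_before == 0:
        raise ValueError("self-reported statistic already matches the measured one")
    return 100.0 * (gap_before - gap_after) / gap_before


# Hypothetical toy sample of four adults (illustrative values only).
measured_bmi = [24.0, 31.0, 28.0, 35.0]
self_reported_bmi = [23.0, 29.5, 27.0, 33.0]
predicted_bmi = [23.8, 30.6, 28.1, 34.9]

# Individual-level accuracy before and after correction.
rmse_self_reported = rmse(measured_bmi, self_reported_bmi)
rmse_predicted = rmse(measured_bmi, predicted_bmi)

# Sample-level discrepancy reduction for mean BMI and for obesity prevalence.
bmi_reduction = discrepancy_reduction(
    mean(measured_bmi), mean(self_reported_bmi), mean(predicted_bmi)
)
prevalence_reduction = discrepancy_reduction(
    obesity_prevalence(measured_bmi),
    obesity_prevalence(self_reported_bmi),
    obesity_prevalence(predicted_bmi),
)
```

Under this assumed definition, a reduction near 100% means the corrected estimate almost coincides with the measured value, which is consistent with the small gap reported between predicted and measured obesity prevalence.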