Cenophobia, the fear of new things and ideas, is common among clinicians and practitioners of cardiopulmonary rehabilitation, particularly when it involves statistical analyses (aka sadistics!). For many clinicians, statistics are at best mysterious and confusing, but to some they are downright frightening! However, with a little time and effort, almost anyone can become reasonably savvy about statistics and their implications and limitations. There are many good books (eg, Statistics for Dummies1) and Web sites2,3 (http://www.statsoft.com/textbook/stathome.html and http://www.sportsci.org/resource/stats/index.html) that can assist practitioners in their quest to better understand this important aspect of clinical research. Overcoming your "statistical cenophobia" will likely make you a better clinician, as you will be better able to grasp the significance of recent research findings and thus better able to translate those findings to your patients and program.
Hopefully you have read the preceding manuscript by Mandic and colleagues4 at Stanford University/VA Palo Alto Health Care System. My sense is that some readers of JCRP will have read the title and quickly turned to the next article. Unfortunately, if this was your reaction, you missed some important findings from a research group, led by Drs Jon Myers and Victor Froelicher, that has contributed more research in the area of cardiopulmonary exercise testing than any other group that comes to mind.
First, I would like to congratulate the authors for conducting and publishing yet another interesting and important study, and second, I commend the editors of JCRP for accepting a manuscript that is likely a statistical "stretch" for a portion of the readership.
With this said, what is important and relevant about the aforementioned manuscript? We recognize that the sensitivity and specificity of any diagnostic test, in this case the exercise electrocardiogram (ECG), depend not only on the quality of the result (normal/negative vs abnormal/positive) but also on the definition or cutpoint used to make that determination. In ECG stress testing, we usually choose a cutpoint for significant ST segment depression, for example, 1 mm or more or 2 mm or more, to define an abnormal test result. The position of the cutpoint determines the number of true positives, true negatives, false positives, and false negatives. Furthermore, we may wish to use different cutpoints in different clinical situations to minimize erroneous findings. In the case of ECG stress testing, decreasing the threshold/cutpoint of ST segment depression from 2 mm or more to 1 mm or more will increase the sensitivity (fewer false negatives) but decrease the specificity (more false positives). In contrast, increasing the cutpoint of ST segment depression to 3 mm or more will decrease the sensitivity but improve the specificity (more false negatives but fewer false positives). Commonly, the predictive accuracy (PA) and positive predictive value (PPV) are used to evaluate the discriminatory ability of a test. As presented in Figure 1 of the preceding manuscript, calculations of both PA and PPV depend on the number of true-positive and true-negative responses as well as the number of false-positive responses. The study by Mandic and colleagues clearly demonstrates that the PPV associated with ECG stress testing is unstable and will increase in proportion to the prevalence of coronary artery disease (CAD) in the population being tested.
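The prevalence dependence of PPV is easy to verify from Bayes' rule. The short sketch below uses illustrative sensitivity and specificity values (not figures from Mandic et al) to show how PPV swings widely as the prevalence of disease in the tested population changes, while overall accuracy moves only modestly:

```python
# Sketch: how PPV shifts with disease prevalence for a fixed test.
# The 68% sensitivity / 77% specificity values are illustrative
# assumptions, not results from the Mandic et al study.

def predictive_values(sensitivity, specificity, prevalence):
    """Return (positive predictive value, predictive accuracy)."""
    tp = sensitivity * prevalence              # true positives per unit population
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    tn = specificity * (1 - prevalence)        # true negatives
    ppv = tp / (tp + fp)                       # proportion of positives that are true
    accuracy = tp + tn                         # proportion of all calls that are correct
    return ppv, accuracy

# Same test, three populations with different CAD prevalence:
for prev in (0.05, 0.25, 0.50):
    ppv, acc = predictive_values(0.68, 0.77, prev)
    print(f"prevalence {prev:.0%}: PPV = {ppv:.2f}, accuracy = {acc:.2f}")
```

With these assumed values, PPV climbs from roughly 0.13 at 5% prevalence to about 0.75 at 50% prevalence, even though the test itself has not changed at all.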
In contrast, this study also determined that the PA as well as the receiver operating characteristic (ROC) curve, specifically the area under the curve (AUC), remained stable, and was therefore superior, despite the manipulation of CAD prevalence. An ROC curve is a graphical representation of the trade-off between the false-negative and false-positive rates for every possible cutoff. In the case of ECG stress testing as described above, as the ST segment depression cutpoint for an abnormal (positive) versus normal (negative) result is changed, the false-negative and false-positive rates change with it. Equivalently, the ROC curve represents the trade-off between sensitivity and specificity.
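The idea of sweeping every possible cutpoint can be sketched with a toy example. The patient values below are invented for illustration only; each cutpoint produces one (false-positive rate, true-positive rate) point, and together those points trace the ROC curve:

```python
# Toy sketch (invented values, not study data): each ST-depression cutpoint
# yields one point on the ROC curve. Patients are (st_depression_mm, has_cad).

patients = [
    (0.5, False), (0.8, False), (1.0, False), (1.2, True),
    (1.5, False), (1.8, True), (2.0, True), (2.5, True),
    (0.6, False), (3.0, True),
]

def roc_point(cutpoint):
    """Call the test positive when ST depression >= cutpoint; return (FPR, TPR)."""
    tp = sum(1 for st, cad in patients if st >= cutpoint and cad)
    fn = sum(1 for st, cad in patients if st < cutpoint and cad)
    fp = sum(1 for st, cad in patients if st >= cutpoint and not cad)
    tn = sum(1 for st, cad in patients if st < cutpoint and not cad)
    return fp / (fp + tn), tp / (tp + fn)

for cut in (1.0, 2.0, 3.0):
    fpr, tpr = roc_point(cut)
    print(f"cutpoint >= {cut} mm: sensitivity = {tpr:.2f}, 1 - specificity = {fpr:.2f}")
```

Even in this tiny sample, the trade-off described above appears: the lenient 1-mm cutpoint catches every case of CAD but flags some disease-free patients, while the strict 3-mm cutpoint eliminates false positives at the cost of missed disease.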
Is there a "good" ROC curve or a "bad" ROC curve? All ROC curves are good; it is the diagnostic test that can be good or bad. A good diagnostic test is one that has both few false-positive and few false-negative responses across a reasonable range of cutpoint values. A bad diagnostic test is one in which the available range of cutpoints generates a large number of false-positive and false-negative results. Generally, an ROC curve that climbs rapidly toward the upper left-hand corner of the graph (see Figure 2) indicates a "good" diagnostic test. A curve with this shape means that the true-positive rate is high and the false-positive rate is low. Furthermore, how quickly the ROC curve rises to the upper left-hand corner (see Figure 2) can be quantified by measuring the AUC. The larger the AUC, the better the diagnostic test. If the area is 1.0, the test is "ideal" because it achieves 100% sensitivity and 100% specificity. If the AUC is 0.5, the test effectively has 50% sensitivity and 50% specificity, which is no better than flipping a coin. In practice, a diagnostic test will have an AUC somewhere between these 2 extremes. A general ROC AUC classification of diagnostic tests is 0.50 to 0.75 (fair), 0.75 to 0.92 (good), 0.92 to 0.97 (very good), and 0.97 to 1.00 (excellent).
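The AUC itself is just the area under those (false-positive rate, true-positive rate) points, which can be approximated with the trapezoidal rule. The curves below are illustrative, not taken from the manuscript: one hugs the upper left-hand corner, the other sits on the 45-degree coin-flip diagonal:

```python
# Sketch of the AUC idea: integrate an ROC curve with the trapezoidal rule.
# Both point sets are illustrative, not data from Mandic et al.

def auc(roc_points):
    """Area under an ROC curve given (fpr, tpr) points; (0,0) and (1,1) are added."""
    pts = sorted(set(roc_points) | {(0.0, 0.0), (1.0, 1.0)})
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

good_test = [(0.0, 0.6), (0.1, 0.85), (0.3, 0.95)]    # hugs the upper left corner
coin_flip = [(0.25, 0.25), (0.5, 0.5), (0.75, 0.75)]  # the 45-degree diagonal

print(f"good test AUC: {auc(good_test):.2f}")  # well above 0.5
print(f"coin flip AUC: {auc(coin_flip):.2f}")  # exactly 0.50
```

The "good" curve lands in the very good band of the classification above, while the diagonal yields exactly 0.5, the coin-flip floor.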
Hopefully, this short commentary on the predictive modeling and ROC approaches used by Mandic et al will enhance your understanding of the statistical methods evaluated as well as the relevance of this manuscript. Despite the authors' own conclusion that "clinicians will continue to be more comfortable with predictive modeling while biostatisticians and health policy planners will be the sole applicants of ROC analysis," my hope is that the readers of JCRP will overcome their "statistical cenophobia" and prove them wrong! If you have not done so already, turn back and give the preceding manuscript a closer look to gain a better understanding of a statistical technique that is being used with increasing frequency in biomedical research.
References