Authors

  1. McGraw, Mark

Article Content

A decision-referral approach that draws on the strengths of both radiologists and artificial intelligence (AI) could improve the diagnostic accuracy of breast cancer screening, according to a new study (Lancet Digit Health 2022; https://doi.org/10.1016/S2589-7500(22)00070-X). A team of researchers proposed this type of approach for integrating AI into the breast cancer screening pathway, in which the algorithm makes predictions based on its own quantified uncertainty. Algorithmic assessments made with high certainty "are done automatically," the authors wrote, "whereas assessments with lower certainty are referred to the radiologist."

  
Figure: AI Assistance

This two-part AI system can triage normal mammography exams and provide post-hoc cancer detection while maintaining a high degree of sensitivity, according to the authors. Their study aimed to evaluate the sensitivity and specificity of this AI system when used either as a standalone reader or within a decision-referral approach, compared with the original radiologist decision.
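The triage logic described above amounts to a confidence-threshold rule. The sketch below is illustrative only; the function name and thresholds are assumptions for the sake of example, not the study's actual implementation:

```python
# Illustrative sketch of confidence-based decision referral. The thresholds
# (low=0.1, high=0.9) are hypothetical, not taken from the study.

def decision_referral(cancer_score, low=0.1, high=0.9):
    """Route one mammogram based on the AI model's cancer score.

    Scores far from the decision boundary are treated as confident and
    handled automatically; uncertain scores go to the radiologist.
    """
    if cancer_score <= low:
        return "normal (AI triage)"
    if cancer_score >= high:
        return "suspicious (AI post-hoc detection)"
    return "refer to radiologist"

print(decision_referral(0.02))   # confidently normal -> handled automatically
print(decision_referral(0.55))   # uncertain -> referred to radiologist
print(decision_referral(0.97))   # confidently suspicious -> flagged automatically
```

In the study's retrospective simulation, only the referred middle band of cases would receive a radiologist's reading; the confident assessments at either extreme are what enable the workload reduction discussed below.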

 

Study Details

A team of researchers used a retrospective dataset of 1,193,197 full-field, digital mammography studies carried out between January 1, 2007, and December 31, 2020, from eight screening sites participating in the German national breast cancer screening program. The authors derived an internal test dataset from six screening sites (1,670 screen-detected cancers and 19,997 normal mammography exams), and an external test dataset of breast cancer screening exams (2,793 screen-detected cancers and 80,058 normal exams) from two additional screening sites.

 

With these data, the researchers evaluated the sensitivity and specificity of an AI algorithm used either as a standalone system or within a decision-referral approach, compared with the original individual radiologist's decision at the point of screen reading, ahead of the consensus conference. The investigators evaluated different configurations of the AI algorithm, applying weights that reflect the actual distribution of study types in the screening program to account for the enrichment of the datasets caused by oversampling cancer cases. Triaging performance was measured as the rate of exams correctly identified as normal. Sensitivity across clinically relevant subgroups, screening sites, and device manufacturers was compared between standalone AI, the radiologist, and decision referral.
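The re-weighting step can be illustrated with a short calculation: if the screening program's true cancer prevalence is known, the oversampled cancer cases can be down-weighted so that population-level summaries such as the triaging rate reflect the real screening mix. All numbers below, including the prevalence value, are hypothetical and not taken from the study:

```python
# Hypothetical sketch of prevalence re-weighting on an enriched test set.
# The prevalence and the triage counts are illustrative assumptions.

def weighted_triage_rate(n_normal, n_cancer, auto_normal, auto_cancer,
                         prevalence=0.006):
    """Fraction of exams handled automatically, after down-weighting the
    oversampled cancer cases to a realistic screening prevalence."""
    # Choose w so that weighted cancers / (weighted cancers + normals) = prevalence.
    w_cancer = prevalence * n_normal / ((1 - prevalence) * n_cancer)
    automated = auto_normal + w_cancer * auto_cancer
    total = n_normal + w_cancer * n_cancer
    return automated / total

# Made-up triage counts on an enriched dataset:
rate = weighted_triage_rate(n_normal=80_058, n_cancer=2_793,
                            auto_normal=50_000, auto_cancer=100)
print(f"{rate:.1%}")  # prints "62.1%"
```

Without this correction, the enriched dataset's unrealistically high cancer fraction would bias any metric that depends on the class mix.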

 

The authors presented receiver operating characteristic (ROC) curves and the area under the ROC curve (AUC) to evaluate the AI system's performance over its entire operating range. Comparisons with radiologists and subgroup analyses were based on sensitivity and specificity at clinically relevant configurations. Overall, the researchers found that the exemplary configuration of the AI system in standalone mode achieved a sensitivity of 84.2 percent and a specificity of 89.5 percent on internal test data, and a sensitivity of 84.6 percent and a specificity of 91.3 percent on external test data, but was less accurate than the average unaided radiologist.
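As a reminder of what the AUC summarizes, it equals the probability that a randomly chosen cancer case receives a higher score than a randomly chosen normal case. A minimal rank-based computation, shown purely for illustration (a real evaluation would use a library such as scikit-learn), might look like this:

```python
# Minimal rank-based AUC (illustrative; not the study's evaluation code).

def roc_auc(scores, labels):
    """AUC = probability that a random positive outranks a random negative,
    counting score ties as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # 1.0: perfect separation
print(roc_auc([0.4, 0.6, 0.3, 0.7], [1, 1, 0, 0]))  # 0.5: chance-level ranking
```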

 

By contrast, the simulated decision-referral approach "significantly improved upon radiologist sensitivity," the authors wrote, noting a 2.6 percent increase in sensitivity and a 1 percent increase in specificity, corresponding to a triaging performance of 63 percent on the external dataset. The decision-referral approach also yielded significant increases in sensitivity for a number of clinically relevant subgroups, including small lesion sizes and invasive carcinomas, the authors wrote, noting that its sensitivity was consistent across the eight included screening sites and three device manufacturers.

 

The decision-referral approach "leverages the strengths of both the radiologist and AI," the researchers wrote, demonstrating improvements in sensitivity and specificity that surpass both the individual radiologist and the standalone AI system. They added that the approach has the potential to improve radiologists' screening accuracy, is adaptive to the requirements of screening, and could reduce workload ahead of the consensus conference without discarding the generalized knowledge of radiologists.

 

The investigators hypothesized that the collaboration of radiologists and AI "is better in terms of screening accuracy than either party alone, if combined in the right way," noted Christian Leibig, PhD, the study's lead author. "We propose such a way, coined 'decision referral,' for which confident algorithmic assessments are performed automatically, at least in the retrospective simulation, and cases about which the algorithm is not confident enough are referred to radiologists for their assessment."

 

The decision-referral approach may be able to improve the accuracy of breast cancer screening "because each mammogram is assessed by either AI or radiologists, depending on which party is expected to perform better at the given case at hand," Leibig added. "AI accomplishes this goal by first assessing all mammograms in the background and assigning each case to either a radiologist or the AI itself, based on the model's confidence."

 

Ultimately, the study's findings indicate that there is potential to maintain or improve both sensitivity and specificity while allowing for a "substantial workload reduction," Leibig noted. "Our retrospective study assumed that all algorithmic assessments were not reviewed by radiologists, i.e., they were modeled as fully automated. If it were implemented like that, this would mean that radiologists have more time to focus on the cases [at which] they are better."

 

Mark McGraw is a contributing writer.