A team of investigators has developed and validated a deep learning algorithm they say can identify and segment a non-small cell lung cancer (NSCLC) tumor on a CT scan within seconds, which could ultimately help streamline radiotherapy treatment for patients. Recently published in The Lancet Digital Health, the researchers' study noted the "great potential" that artificial intelligence (AI) and deep learning have shown in their ability to streamline clinical tasks (2022; https://doi.org/10.1016/S2589-7500(22)00129-7).
The authors pointed out, however, that "most studies remain confined to in silico validation in small internal cohorts, without external validation or data on real-world clinical utility." The team, led by researchers from Brigham and Women's Hospital and Mass General Brigham, developed a strategy for the clinical validation of deep learning models for segmenting primary NSCLC tumors and involved lymph nodes in CT images, "which is a time-intensive step in radiation treatment planning, with large variability among experts," the authors wrote.
Study Details
In their observational study, the investigators collected CT images and segmentations from eight internal and external sources from the U.S., the Netherlands, Canada, and China, using patients from the Maastro and Harvard-RT1 datasets for model discovery. Validation consisted of interobserver and intraobserver benchmarking, primary validation, functional validation, and end-user testing across a number of datasets.
Primary validation entailed stepwise testing on increasingly external datasets using measures of overlap, including volumetric Dice and surface Dice. Functional validation explored dosimetric effect, model failure modes, test-retest stability, and accuracy, the authors noted, while end-user testing with eight experts assessed automated segmentations in a simulated clinical setting.
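For readers unfamiliar with these overlap measures, the minimal sketch below (not the authors' code) illustrates how a volumetric Dice score between two binary 3D segmentation masks might be computed with NumPy; the array names, shapes, and random masks are purely illustrative placeholders for real expert and AI contours.

```python
import numpy as np

def volumetric_dice(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Volumetric Dice coefficient between two binary 3D segmentation masks.

    Dice = 2 * |A intersect B| / (|A| + |B|); ranges from 0 (no overlap)
    to 1 (identical contours).
    """
    a = mask_a.astype(bool)
    b = mask_b.astype(bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / total

# Illustrative comparison of a hypothetical expert contour and an
# AI-generated contour (random masks stand in for real CT segmentations).
rng = np.random.default_rng(0)
expert_mask = rng.random((64, 64, 32)) > 0.5
model_mask = rng.random((64, 64, 32)) > 0.5
print(f"Volumetric Dice: {volumetric_dice(expert_mask, model_mask):.3f}")
```

Surface Dice, by contrast, measures agreement only along the contour boundaries within a stated tolerance, which makes it more sensitive to edge placement than to bulk volume overlap.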
The study included more than 2,200 patients imaged between 2001 and 2015, with 787 patients ultimately used for model discovery and 1,421 used for model validation, including 28 patients for end-user testing. The authors demonstrated that the AI algorithm's performance in lung tumor targeting, when compared against human experts, fell within the range of variation observed between trained clinicians. Overall, the models yielded target volumes with radiation dose coverage equivalent to that of experts, and the investigators also found "non-significant differences" between de novo expert and AI-assisted segmentations. AI assistance led to a 65 percent reduction in segmentation time and a 32 percent reduction in interobserver variability.
"We know that radiation therapy planning is a highly manual, time-consuming, and resource-intensive process that requires highly trained physicians to segment (target) the cancerous tumors in the lungs and adjacent lymph nodes on three-dimensional images such as CT scans," stated study co-author Raymond Mak, MD, Director of Patient Safety and Quality and Director of Clinical Innovation at Brigham and Women's Hospital. "Prior studies have shown substantial inter-clinician variation in these radiotherapy targeting tasks, and there is a projected global shortage of skilled medical staff to perform these critical tasks as cancer rates increase."
Creating Algorithms
In an effort to address this "critical gap," Mak and his colleagues hypothesized that they could train and develop AI algorithms that would automatically target lung cancer in the lungs and adjacent lymph nodes from CT scans used for radiation therapy planning, "which can be deployed in seconds," noted Mak, who is also Associate Professor of Radiation Oncology at Harvard Medical School.
The team also hypothesized that it could produce a high-performing algorithm by using high-quality training datasets from an expert clinician and incorporating early expert clinician input into the training process in order to "identify gaps in AI performance for remediation," he said.
One of the biggest translation gaps in AI applications in oncology is in studying how to use AI to improve human clinicians and vice versa, Mak noted. "In this study, we took advantage of clinician expertise in the development and training of the AI algorithm, and then we showed that deploying the AI to support clinicians in a human-AI partnership can also improve clinician performance with reduced task time and less variation."
Looking ahead, Mak and his co-authors believe "there will be a direct benefit to cancer patients through thoughtful testing and implementation of human-AI collaboration in radiation therapy planning by providing patients with higher-quality tumor segmentation and accelerating times to treatment."
In addition, surveys the researchers conducted of clinicians who partnered with the AI showed that they also experienced substantial benefits, including reduced task time, high satisfaction, and a reduced perception of task difficulty, "which is an interesting additional benefit that we had not thought about initially," Mak said. "Wouldn't it be interesting if this trend bears out in wider studies, with AI leading to reductions in clinician cognitive load and stress, and possibly help with physician burnout?"
For clinicians evaluating new AI technologies for clinical implementation, Mak and his colleagues hope their study provides a framework for thoughtful AI development that incorporates clinician input and a rigorous testing and validation process: benchmarking performance, identifying key failure modes, and determining whether an algorithm performs as intended in clinicians' hands before it is introduced in the clinic.
"We believe that an evaluation strategy for AI models that emphasizes the importance of human-AI collaboration is especially necessary because in silico (computer-modeled) validation can give different results than clinical evaluations," Mak noted. "As an extension of this work, we are designing and conducting prospective, randomized trials of similar AI auto-segmentation algorithms in the clinic to provide the highest level of evidence."
Mark McGraw is a contributing writer.