Findings from the ImmunogenomiC prOfiling of Non-small cell lung cancer (NSCLC) Project (ICON), an effort to comprehensively characterize immunogenomic diversity in NSCLC across diverse platforms, were recently presented at the AACR-NCI-EORTC Virtual International Conference on Molecular Targets and Cancer Therapeutics (Abstract P009).
Using a framework they developed, the researchers were able to identify "modules in the ICON data network significantly associated with important patient characteristics like recurrence and oncogenotype," according to study author Stephanie T. Schmidt, PhD, and her colleagues from The University of Texas MD Anderson Cancer Center, Houston.
"ICON seeks to comprehensively characterize immunogenomic diversity in non-small cell lung cancer," noted Schmidt. "Its depth and breadth presented a unique opportunity to develop a specialized method for multi-platform data integration and exploration, which can broadly be applied to large-scale patient profiling studies.
"ICON leverages a broad array of immune and genomic platforms to profile tumor and adjacent uninvolved lung tissue samples," she continued. "We sought to explore connections between measurements across the different modalities and, to do so, developed an approach that used correlations between measurements from different modalities to integrate the data and build the ICON data network."
Study Details
The ICON dataset is derived from tumor and normal lung tissue samples collected from 150 patients at the time of resection, as well as blood samples taken at resection and at intervals throughout the following year, according to the study authors.
"Tissue samples underwent RNA sequencing (RNA-seq), whole exome sequencing, T-cell receptor sequencing, multiplex immunofluorescence for immune cells, and reverse phase protein array (RPPA) profiling," they outlined. "Flow cytometry for immune cells was performed on tissue and blood samples."
To integrate the multi-platform ICON data and build the ICON data network, they drew upon the shared nearest neighbors (SNN) algorithm to link genes on the basis of their shared top correlates in orthogonal datasets, Schmidt explained. "For each gene, measurements from a given orthogonal modality were ranked by correlation and, for each pair of genes, the number of shared top correlates was used to calculate the overlap score of that gene pair.
"This process was performed across all gene pairs to compile a shared nearest neighbors score matrix for a given orthogonal modality. And from these matrices, gene pairs whose overlap scores met a desired threshold were collected and used to build a network in which each gene is a node and each pair meeting the threshold forms an edge," she elaborated. "Thus, our shared nearest neighbors-based approach enables the integration of orthogonal modalities, and we implemented it to bring together data from [flow cytometry], RPPA, and RNA-seq.
"Through benchmarking with validated gene pairs, we demonstrate that the ICON data network outperforms a comparable network based on RNA-RNA correlations only, highlighting the strength of the approach we developed," Schmidt noted during her presentation. "We found that the incorporation of gene pairs based on RNA-RNA correlations into our SNN network further improved performance."
The researchers used this blended approach, combining edges from SNN and from RNA-RNA correlations, to create the finalized ICON data network. Currently, the network includes more than 20,000 genes linked by over 500,000 connections derived from correlations between RNA-seq and orthogonal platforms. According to Schmidt, this offers "an unprecedented holistic view into ICON."
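To make the shared nearest neighbors step more concrete, here is a minimal Python sketch of the idea as Schmidt described it. It is not the team's code. It assumes that top correlates are ranked by absolute Pearson correlation and that the overlap score is simply the count of shared top correlates; the presentation did not specify either detail. The function names, the neighborhood size k, the threshold, and the toy values are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    # A constant vector has no defined correlation; treat it as 0.
    return cov / (norm_x * norm_y) if norm_x and norm_y else 0.0


def top_correlates(gene_expr, features, k):
    """Map each gene to the set of its k most strongly correlated features."""
    tops = {}
    for gene, expr in gene_expr.items():
        ranked = sorted(
            features,
            key=lambda name: abs(pearson(expr, features[name])),  # assumption: absolute value
            reverse=True,
        )
        tops[gene] = set(ranked[:k])
    return tops


def snn_edges(gene_expr, features, k, threshold):
    """Return {(gene_a, gene_b): overlap} for pairs sharing >= threshold top correlates."""
    tops = top_correlates(gene_expr, features, k)
    edges = {}
    for gene_a, gene_b in combinations(sorted(tops), 2):
        overlap = len(tops[gene_a] & tops[gene_b])  # assumed overlap score: shared count
        if overlap >= threshold:
            edges[(gene_a, gene_b)] = overlap
    return edges


# Toy data (hypothetical): RNA-seq values for three genes and four measurements
# from one orthogonal modality, each observed across the same five samples.
toy_genes = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 4, 6, 8, 11],
    "C": [5, 1, 4, 2, 3],
}
toy_features = {
    "P1": [1, 2, 3, 4, 6],
    "P2": [5, 4, 3, 2, 1],
    "P3": [5, 1, 4, 2, 3],
    "P4": [3, 5, 1, 4, 2],
}

edges = snn_edges(toy_genes, toy_features, k=2, threshold=2)
print(edges)  # {('A', 'B'): 2}
```

In this toy example, only genes A and B share both of their top two correlates, so they form the single edge in the network. Each gene is a node, and each pair meeting the threshold is an edge.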
The researchers observed that nodes are fairly balanced across source modalities, while edges tend to come more from the SNN-based approach, which, Schmidt noted, highlights the richness of the information captured by the approach her team developed.
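A tally like the one behind this observation could be sketched as follows. The snippet is illustrative only: the gene labels, the two edge lists, and the blend_edges helper are hypothetical and do not come from the ICON data.

```python
from collections import Counter


def blend_edges(snn_edges, rna_edges):
    """Union two edge lists, recording which method(s) proposed each edge."""
    sources = {}
    for label, edge_list in (("snn", snn_edges), ("rna_rna", rna_edges)):
        for gene_a, gene_b in edge_list:
            key = tuple(sorted((gene_a, gene_b)))  # undirected: (A, B) equals (B, A)
            sources.setdefault(key, set()).add(label)
    return sources


# Hypothetical edges from each method.
snn = [("G1", "G2"), ("G3", "G4"), ("G3", "G5")]
rna = [("G4", "G3"), ("G6", "G7")]

network = blend_edges(snn, rna)
counts = Counter(label for labels in network.values() for label in labels)
print(len(network), dict(counts))  # 4 {'snn': 3, 'rna_rna': 2}
```

An edge proposed by both methods, such as G3-G4 here, is stored once and credited to both sources.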
Schmidt and her colleagues "captured established associations between cancer-related genes and examined these along with new ones in the network." To accomplish this, they used the InfoMap algorithm to extract more interpretable sub-networks, termed modules, from the ICON data network.
"To enable the exploration of the ICON data network in tandem with additional orthogonal modalities available from ICON and to enable testing against patient characteristics of interest, single sample gene set enrichment scores for each module were tabulated for use in multivariate analysis," Schmidt said.
The researchers tested module signatures within neoadjuvant-free patients, controlling for stage and histology. This led to the identification of several modules linked to disease recurrence, one of which Schmidt discussed in detail.
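The per-sample module signature scores that feed such tests can be pictured with the short Python sketch below. It is a simplified stand-in and not single sample gene set enrichment analysis itself. For each sample it ranks genes by expression and averages the rank percentile of each module's genes. The function name, sample labels, gene labels, and values are hypothetical.

```python
def module_scores(expression, modules):
    """Score each module in each sample as the mean rank percentile of its genes.

    expression: {sample: {gene: value}}
    modules:    {module_name: [genes]}
    Returns {sample: {module_name: score}}, where 0 means the module's genes are
    the lowest expressed in that sample and 1 means they are the highest.
    """
    scores = {}
    for sample, profile in expression.items():
        ordered = sorted(profile, key=profile.get)  # lowest to highest expression
        denom = max(len(ordered) - 1, 1)
        percentile = {gene: i / denom for i, gene in enumerate(ordered)}
        scores[sample] = {}
        for name, genes in modules.items():
            present = [percentile[g] for g in genes if g in percentile]
            scores[sample][name] = sum(present) / len(present) if present else 0.0
    return scores


# Hypothetical expression values for two samples and two modules.
toy_expression = {
    "S1": {"G1": 5, "G2": 1, "G3": 3, "G4": 9},
    "S2": {"G1": 1, "G2": 2, "G3": 8, "G4": 4},
}
toy_modules = {"M1": ["G1", "G4"], "M2": ["G2", "G3"]}

scores = module_scores(toy_expression, toy_modules)
for sample, per_module in scores.items():
    print(sample, {name: round(value, 3) for name, value in per_module.items()})
```

A table of such scores, one row per patient and one column per module, is the kind of input a multivariate model could then test against recurrence while adjusting for covariates such as stage and histology.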
"Its signature score tends to be lower in patients who ultimately relapse, suggesting its potential impact as a prognostic biomarker," she explained. "Pathway annotation strongly associates this module with several interesting phenomena, providing mechanistic insights and highlighting areas for future exploration.
"Taken together, the SNN-based network approach we developed is powerful in that it enables integration of multi-platform datasets," Schmidt summarized. "Its application to ICON allowed us to create a rich and deep integrated network that we are mining around important clinical characteristics."
This research has identified modules associated with recurrence, which Schmidt and her team are exploring as prognostic biomarkers. "Ultimately, our novel network building approach enables the holistic exploration of patient data from diverse platforms, providing a more complete view of disease and unlocking insights for therapeutic targets, biomarkers, and treatment plans," she concluded.
Catlin Nalley is a contributing writer.