Finding robust biomarkers for more accurate and reproducible disease diagnosis/prognosis

Identifying robust diagnostic/prognostic biomarkers from gene expression data that can lead to accurate and reproducible predictions is a challenging problem.

In our recent paper “Deep graph representations embed network information for robust disease marker identification”, we show that deep graph representations learned with graph convolutional networks (GCNs) can identify effective markers that significantly improve both predictive performance and reproducibility across different datasets and platforms.

Omar Maddouri, Xiaoning Qian, Byung-Jun Yoon, “Deep graph representations embed network information for robust disease marker identification,” Bioinformatics, btab772, https://doi.org/10.1093/bioinformatics/btab772.

In this paper, we propose GCNCC (Graph Convolutional Network-based approach for Clustering and Classification), a method for identifying highly effective and robust network-based disease markers. Built on a geometric deep learning framework, GCNCC learns deep network representations by integrating gene expression data with protein interaction data. The resulting markers are highly reproducible and give consistently accurate predictions across independent datasets, including datasets from different platforms. GCNCC identifies these markers in two steps: it first clusters the nodes of the protein interaction network based on latent similarity measures learned by the deep architecture of a graph convolutional network, and then applies a supervised feature selection procedure to extract the clusters that are highly predictive of the disease state.
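To make the cluster-then-select idea above concrete, here is a minimal toy sketch in plain Python. It is not the GCNCC implementation: untrained neighborhood smoothing with the normalized adjacency matrix stands in for a trained graph convolutional network, a greedy correlation-based grouping stands in for the clustering step, and an absolute Welch t-statistic stands in for the supervised feature selection. All function names, the similarity threshold, and the six-gene dataset are hypothetical and chosen only for illustration.

```python
import math
from statistics import mean, variance

def normalized_adjacency(edges, n):
    """Dense D^-1/2 (A + I) D^-1/2, the propagation matrix commonly used in GCN layers."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def propagate(a_hat, features, layers=2):
    """Smooth node features over the graph (assumption: no learned weights or nonlinearity)."""
    h = features
    for _ in range(layers):
        h = [[sum(w * row[s] for w, row in zip(a_row, h)) for s in range(len(h[0]))]
             for a_row in a_hat]
    return h

def correlation(u, v):
    """Cosine similarity of mean-centered vectors, i.e. Pearson correlation."""
    mu, mv = mean(u), mean(v)
    cu, cv = [x - mu for x in u], [x - mv for x in v]
    norm = math.sqrt(sum(x * x for x in cu) * sum(x * x for x in cv))
    return sum(x * y for x, y in zip(cu, cv)) / norm if norm else 0.0

def cluster_nodes(embeddings, threshold=0.8):
    """Greedy grouping: join the first cluster whose seed node is similar enough, else start one."""
    clusters = []
    for node, emb in enumerate(embeddings):
        for members in clusters:
            if correlation(emb, embeddings[members[0]]) >= threshold:
                members.append(node)
                break
        else:
            clusters.append([node])
    return clusters

def cluster_score(members, expression, labels):
    """Absolute Welch t-statistic of the cluster's mean expression between the two classes."""
    activity = [mean(expression[g][s] for g in members) for s in range(len(labels))]
    x0 = [a for a, y in zip(activity, labels) if y == 0]
    x1 = [a for a, y in zip(activity, labels) if y == 1]
    se = math.sqrt(variance(x0) / len(x0) + variance(x1) / len(x1))
    return abs(mean(x1) - mean(x0)) / se if se else 0.0

def select_markers(expression, edges, labels, top_k=1):
    """Embed genes on the network, cluster them, and keep the top_k most discriminative clusters."""
    a_hat = normalized_adjacency(edges, len(expression))
    clusters = cluster_nodes(propagate(a_hat, expression))
    ranked = sorted(clusters, key=lambda c: cluster_score(c, expression, labels), reverse=True)
    return clusters, ranked[:top_k]

# Toy data (invented): rows are genes, columns are samples.
# Genes 0-2 are elevated in the disease samples; genes 3-5 vary independently of the label.
expression = [
    [1.0, 1.2, 0.9, 1.1, 2.0, 2.2, 1.9, 2.1],
    [1.1, 0.9, 1.0, 1.2, 2.1, 1.9, 2.0, 2.2],
    [0.9, 1.0, 1.1, 1.0, 1.9, 2.0, 2.1, 2.0],
    [1.0, 2.0, 1.1, 2.1, 0.9, 1.9, 1.0, 2.0],
    [1.1, 2.1, 0.9, 1.9, 1.0, 2.0, 1.1, 2.1],
    [0.9, 1.9, 1.0, 2.0, 1.1, 2.1, 0.9, 1.9],
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = control, 1 = disease
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]  # toy protein interaction network

clusters, markers = select_markers(expression, edges, labels)
print(clusters)  # [[0, 1, 2], [3, 4, 5]]
print(markers)   # [[0, 1, 2]]
```

On this toy input, the network-smoothed profiles separate the two interaction modules, and only the module whose average expression differs between control and disease samples is kept as a marker. A real pipeline would replace the smoothing step with a trained graph convolutional encoder and validate the selected clusters on independent datasets.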

By benchmarking GCNCC on independent datasets from different diseases (psychiatric disorder and cancer) and different platforms (microarray and RNA-seq), we show that GCNCC outperforms other state-of-the-art methods in terms of accuracy and reproducibility.

Further details can be found at [Bioinformatics].