Finding robust biomarkers for more accurate and reproducible disease diagnosis/prognosis

Identifying robust diagnostic/prognostic biomarkers from gene expression data that can lead to accurate and reproducible predictions is a challenging problem.

In our recent paper “Deep graph representations embed network information for robust disease marker identification”, we show that deep graph representations learned by graph convolutional networks (GCNs) can identify effective markers that significantly improve both predictive performance and reproducibility across different datasets and platforms.

Omar Maddouri, Xiaoning Qian, Byung-Jun Yoon, “Deep graph representations embed network information for robust disease marker identification,” Bioinformatics, btab772.

In this paper, we propose GCNCC (Graph Convolutional Network-based approach for Clustering and Classification), which identifies highly effective and robust network-based disease markers. Built on a geometric deep learning framework, GCNCC learns deep network representations by integrating gene expression data with protein interaction data, yielding highly reproducible markers with consistently accurate prediction performance across independent datasets, possibly from different platforms. GCNCC identifies these markers in two steps. First, it clusters the nodes of the protein interaction network based on latent similarity measures learned by the deep architecture of a graph convolutional network. Second, it applies a supervised feature selection procedure that extracts the clusters that are highly predictive of the disease state.
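The cluster-then-select idea can be illustrated with a small sketch. This is not the GCNCC implementation: it assumes a single untrained graph-convolution step (symmetric normalization with self-loops, no learned weights), a hypothetical cosine-similarity threshold for merging neighboring genes, and a simple difference of class means as a stand-in for the supervised feature selection step. All function names and the toy data are made up for illustration.

```python
import math

def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a symmetric 0/1 adjacency matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def propagate(adj, features):
    """One untrained graph-convolution step: ReLU(A_norm @ X)."""
    a_norm = normalize_adjacency(adj)
    n, d = len(features), len(features[0])
    return [[max(0.0, sum(a_norm[i][k] * features[k][c] for k in range(n)))
             for c in range(d)] for i in range(n)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def cluster_nodes(adj, embeddings, threshold):
    """Merge network neighbors whose embeddings have cosine similarity >= threshold."""
    n = len(adj)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if adj[i][j] and cosine(embeddings[i], embeddings[j]) >= threshold:
                parent[find(i)] = find(j)
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

def cluster_score(cluster, expression, labels):
    """Absolute difference between class means of the cluster's average expression."""
    activity = [sum(expression[g][s] for g in cluster) / len(cluster)
                for s in range(len(labels))]
    pos = [a for a, y in zip(activity, labels) if y == 1]
    neg = [a for a, y in zip(activity, labels) if y == 0]
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))

# Toy network (hypothetical): two triangles of genes joined by the edge 2-3.
adj = [[0, 1, 1, 0, 0, 0],
       [1, 0, 1, 0, 0, 0],
       [1, 1, 0, 1, 0, 0],
       [0, 0, 1, 0, 1, 1],
       [0, 0, 0, 1, 0, 1],
       [0, 0, 0, 1, 1, 0]]
# Rows are genes, columns are samples; genes 0-2 track the label, genes 3-5 do not.
expression = [[0, 0, 3, 3]] * 3 + [[2, 0, 2, 0]] * 3
labels = [0, 0, 1, 1]

embeddings = propagate(adj, expression)
clusters = cluster_nodes(adj, embeddings, threshold=0.9)
best = max(clusters, key=lambda c: cluster_score(c, expression, labels))
```

In this toy example the smoothed embeddings keep each triangle together while the single bridging edge falls below the threshold, and the cluster whose average expression differs between the two classes is the one selected as a marker.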

By benchmarking GCNCC on independent datasets from different diseases (psychiatric disorder and cancer) and different platforms (microarray and RNA-seq), we show that GCNCC outperforms other state-of-the-art methods in terms of accuracy and reproducibility.

Further details can be found at [Bioinformatics].

Our NeurIPS 2021 paper on efficient active learning for Gaussian process classification is now online

We are happy to announce that our NeurIPS 2021 paper entitled “Efficient Active Learning for Gaussian Process Classification by Error Reduction” is now available online.

Guang Zhao, Edward Dougherty, Byung-Jun Yoon, Francis Alexander, Xiaoning Qian, “Efficient Active Learning for Gaussian Process Classification by Error Reduction,” Thirty-Fifth Conference on Neural Information Processing Systems (NeurIPS 2021), Dec. 6 – 14, 2021.

In this paper, we study active learning for Gaussian Process Classification (GPC) and develop computationally efficient algorithms for expected error reduction (EER)-based active learning with GPC. In particular, we formulate EER as the reduction of the Mean Objective Cost of Uncertainty (MOCU), where the learning objective of GPC is to reduce the classification error.
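To make the EER selection rule concrete, here is a minimal sketch under strong simplifying assumptions. It does not use a Gaussian process and is not the algorithm from the paper: the model is a hypothetical finite set of 1-D threshold classifiers with noiseless labels, so the posterior can be updated by simple elimination. For each candidate query, the sketch averages the post-query expected classification error over the possible labels and picks the candidate with the largest expected reduction.

```python
def predictive(posterior, thresholds, x):
    """P(y = 1 | x): total posterior mass of thresholds at or below x."""
    return sum(w for w, t in zip(posterior, thresholds) if x >= t)

def expected_error(posterior, thresholds, pool):
    """Average error of the posterior-predictive classifier over the pool."""
    errors = [min(p, 1.0 - p)
              for p in (predictive(posterior, thresholds, x) for x in pool)]
    return sum(errors) / len(errors)

def update(posterior, thresholds, x, y):
    """Posterior after observing a noiseless label y at x."""
    kept = [w if int(x >= t) == y else 0.0 for w, t in zip(posterior, thresholds)]
    total = sum(kept)
    return [w / total for w in kept]

def select_query(posterior, thresholds, pool):
    """Return the pool point with the largest expected error reduction."""
    current = expected_error(posterior, thresholds, pool)
    best_x, best_gain = None, float("-inf")
    for x in pool:
        p1 = predictive(posterior, thresholds, x)
        future = 0.0
        for y, p_y in ((1, p1), (0, 1.0 - p1)):
            if p_y > 0.0:  # skip labels that cannot occur under the posterior
                new_posterior = update(posterior, thresholds, x, y)
                future += p_y * expected_error(new_posterior, thresholds, pool)
        gain = current - future
        if gain > best_gain:
            best_x, best_gain = x, gain
    return best_x, best_gain

# Toy setup (hypothetical): four candidate thresholds, uniform prior.
thresholds = [0.5, 1.5, 2.5, 3.5]
posterior = [0.25] * 4
pool = [0, 1, 2, 3, 4]
query, gain = select_query(posterior, thresholds, pool)
```

The nested loop over candidates, labels, and pool points is what makes naive EER expensive, which is the cost the efficient algorithms in the paper are designed to reduce.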

Our experiments demonstrate the computational efficiency of the proposed approach, and performance evaluations on both synthetic and real-world datasets show that our algorithms significantly outperform existing state-of-the-art algorithms in terms of sampling efficiency.

Post-doc position available in Scientific Machine Learning (SciML)

We are happy to announce the availability of a new post-doc position in areas relevant to Scientific Machine Learning (SciML).

The position resides in the Applied Mathematics Group of the Computational Science Initiative (CSI) at Brookhaven National Laboratory (BNL), and the post-doctoral researcher will work with Dr. Byung-Jun Yoon and Dr. Nathan Urban on a project focused on scientific data reduction.

We invite outstanding candidates to apply for a post-doctoral research associate position in applied mathematics, machine learning, and scientific computing. This position offers a unique opportunity to work on emerging interdisciplinary problems at the intersection of applied mathematics, machine learning, and high-performance computing (HPC), with applications in diverse scientific domains of interest to BNL and the Department of Energy (DOE).

Topics of specific interest include:

  • optimal decision-guided data reduction
  • feature extraction/engineering in high-dimensional compositional workflows that involve machine learning (ML) models
  • Bayesian inference and uncertainty quantification in scientific ML models
  • learning/optimization of low-dimensional latent feature spaces for ML surrogates.

The position includes access to world-class HPC resources, such as the BNL Institutional Cluster and DOE leadership computing facilities. Access to these platforms will allow computing at scale and will ensure that the successful candidate has the resources needed to solve challenging problems of interest to DOE.

This program provides full support for a period of two years at CSI with possible extension. Candidates must have received a doctorate (Ph.D.) in applied mathematics, statistics, computer science, or a related field (e.g., mathematics, engineering, operations research, physics) within the past five years. This post-doc position presents a unique chance to conduct interdisciplinary collaborative research in BNL programs with a highly competitive salary.

For further information, please visit: [post-doc position announcement]

For inquiries, contact Dr. Byung-Jun Yoon or Dr. Nathan Urban.

KBTX news on scientific data reduction: is having more data always better for achieving scientific goals?

KBTX recently featured Dr. Yoon and his team’s research project on scientific data reduction in a news story.

“The team says people tend to think the more data someone has, the better it is for achieving their goal, which is not always the case. That’s why they’re working on a mechanism that can get rid of the unnecessary data without compromising what’s needed.”

Andy Krauss, Reporter – KBTX news

Further details can be found at the link below:

SLATE Future Tense features Dr. Yoon’s research on scientific data reduction

Future Tense is a partnership of Slate, New America, and Arizona State University that examines emerging technologies, public policy, and society, exploring how emerging technologies will change the way we live.

Future Tense has recently posted an article entitled “Why the Department of Energy Is Spending Millions to Get Rid of Scientific Data”, in which they featured Dr. Yoon and his team’s research project on objective-driven reduction of scientific data, recently funded by the U.S. Department of Energy (DOE). The news article can be accessed at:

What is Future Tense?

“A partnership of Slate, New America, and Arizona State University, Future Tense explores how emerging technologies will change the way we live. The latest consumer gadgets are intriguing, but we focus on the longer-term transformative power of robotics, information and communication technologies, synthetic biology, augmented reality, space exploration, and other technologies. Future Tense seeks to understand the latest technological and scientific breakthroughs, and what they mean for our environment, how we relate to one another, and what it means to be human. Future Tense also examines whether technology and its development can be governed democratically and ethically. Future Tense asks these questions in daily commentary published on Slate and through public events featuring conversations with leading scientists, technologists, policymakers, and journalists.”

Quoted from Future Tense website. For further info, visit:

TRIMER – a modeling and simulation framework for integrated analysis of transcription regulation and metabolic regulation

Our recent collaborative work on integrated modeling of transcription regulation and metabolic regulation has been published in iScience, an open access interdisciplinary journal from Cell Press.

Puhua Niu, Maria J. Soto, Byung-Jun Yoon, Edward R. Dougherty, Francis J. Alexander, Ian Blaby, Xiaoning Qian, “TRIMER: Transcription Regulation Integrated with MEtabolic Regulation,” iScience, doi:10.1016/j.isci.2021.103218.

In this work, we have developed TRIMER, a modeling and simulation pipeline for the analysis of Transcription Regulation Integrated with MEtabolic Regulation. TRIMER uses a Bayesian network (BN) inferred from transcriptome data to model the transcription factor regulatory network (TRN). TRIMER then infers the probabilities of the gene states relevant to the metabolism of interest and predicts the metabolic fluxes and their changes that result from the deletion of one or more transcription factors (TFs). In this study, we demonstrate that TRIMER accurately predicts how the knockout of one or more TFs affects the metabolic outcomes, outperforming existing state-of-the-art approaches on both simulated data and real experimental data from gene knockout experiments.
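The first half of this pipeline, inferring gene-state probabilities under a TF knockout, can be sketched with a toy Bayesian network evaluated by brute-force enumeration. This is an illustration only, not TRIMER code: the two-TF network, all probabilities, and the function names are hypothetical, and scaling a flux upper bound by the gene-on probability is an assumed simplification of how such probabilities might constrain a metabolic model.

```python
from itertools import product

# Hypothetical network: TF1 and TF2 both regulate one metabolic gene.
TF_NAMES = ("TF1", "TF2")
p_tf_on = {"TF1": 0.7, "TF2": 0.5}
# P(gene on | TF1 state, TF2 state); 1 = on, 0 = off.
p_gene_on = {(1, 1): 0.9, (1, 0): 0.6, (0, 1): 0.4, (0, 0): 0.1}

def gene_on_probability(knocked_out=()):
    """P(gene on) by enumeration, with knocked-out TFs clamped to the off state."""
    total = 0.0
    for state in product((0, 1), repeat=len(TF_NAMES)):
        weight = 1.0
        for name, value in zip(TF_NAMES, state):
            if name in knocked_out:
                weight *= 1.0 if value == 0 else 0.0
            else:
                weight *= p_tf_on[name] if value == 1 else 1.0 - p_tf_on[name]
        total += weight * p_gene_on[state]
    return total

def flux_upper_bound(v_max, knocked_out=()):
    """Assumed simplification: scale the reaction's maximum flux by P(gene on)."""
    return v_max * gene_on_probability(knocked_out)

wild_type = gene_on_probability()
tf1_knockout = gene_on_probability(knocked_out=("TF1",))
```

In a full pipeline, bounds of this kind would be passed to a constraint-based flux model; that step needs a linear programming solver and is omitted here.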

Overview of the TRIMER modeling and simulation pipeline.

iScience is a new open-access journal from Cell Press that provides a platform for original research in the life, physical, and earth sciences. The primary criterion for publication in iScience is a significant contribution to a relevant field combined with robust results and underlying methodology. The advances appearing in iScience include both fundamental and applied investigations across this interdisciplinary range of topic areas.

Dr. Yoon’s DOE-funded project on scientific data reduction featured on TAMU ECE website

Dr. Yoon’s project on “Objective-Driven Data Reduction for Scientific Workflows”, recently funded by the U.S. Department of Energy (DOE) Office of Advanced Scientific Computing Research (ASCR), has been featured in an article posted on the Texas A&M Department of Electrical and Computer Engineering website.

The full article can be accessed at the link below:

DOE-funded research project to efficiently reduce massive scientific data

DOE funds research on “Objective-Driven Data Reduction for Scientific Workflows” to tame massive data sets to advance scientific discovery

The following news release was issued by the U.S. Department of Energy. It announces funding for nine projects that will address management and processing of massive data sets produced by scientific observatories, experimental facilities, and supercomputers that span the DOE national laboratory complex.

As part of this program, Brookhaven Lab was awarded $2.4 million in funding over three years. Dr. Byung-Jun Yoon of Brookhaven National Lab’s Computational Science Initiative will lead the project with Texas A&M University and University of Illinois Urbana-Champaign as partners. Dr. Yoon and his team’s work will involve using novel theoretical strategies to develop practical algorithms anchored in scientific objectives, i.e., focused specifically on the goals of interest. These algorithms would bypass scientifically irrelevant information, which could considerably reduce the overall data generated by complex experimental or computational systems and streamline any required processing.

Further information and the full announcement can be found at the link below:

New York Scientific Data Summit (NYSDS 2021) will be held virtually on Oct 26-29, 2021

The seventh annual New York Scientific Data Summit, NYSDS 2021, is a multi-day interactive virtual event that will focus on Applied Mathematics, HPC, and AI challenges in diverse areas, such as climate science, medical research, critical infrastructure and other cross-cutting research topics.

Register online now. There is no registration fee, but registration is required for attendance. The registration deadline is Thursday, October 21, 2021. The virtual event will be held from Tuesday, October 26 through Friday, October 29, 2021.

NYSDS will provide a great opportunity to connect with researchers, developers, and end-users to inform efforts that will drive data research forward with new ideas and partnerships.

Dr. Yoon to co-organize COVID-19 themed Data Hackathon event at the 2021 IEEE Healthcare Summit (IHS)

Dr. Yoon is co-organizing a Data Hackathon event at the 2021 IEEE Healthcare Summit (IHS). The Data Hackathon focuses on COVID-19 datasets with the goal of demonstrating the potential of Biomedical Health Informatics (BHI) and Artificial Intelligence (AI) to combat pandemics.

The Data Hackathon features a wide range of datasets and challenges, which include:

Individuals or teams interested in participating in the Hackathon should register by Sep. 15, 2021 and submit their report and code by Sep. 26, 2021. Awards sponsored by IEEE and other sponsoring institutions will be given to the top-performing participants.

For more information about the Data Hackathon, please visit:

To learn more about the IEEE Healthcare Summit, please visit: