Optimizing high-throughput virtual screening campaigns

The need for efficient computational screening of molecular candidates that possess desired properties frequently arises in various scientific and engineering problems, including drug discovery and materials design. However, the enormous search space containing the candidates and the substantial computational cost of high-fidelity property prediction models make screening practically challenging.

In our recent work published in Patterns, a premium open-access journal from Cell Press, we propose a general framework for constructing and optimizing a high-throughput virtual screening (HTVS) pipeline that consists of multi-fidelity models. The central idea is to optimally allocate the computational resources to models with varying costs and accuracy to optimize the return on computational investment. Based on both simulated and real-world data, we demonstrate that the proposed optimal HTVS framework can significantly accelerate virtual screening without any degradation in terms of accuracy. Furthermore, it enables an adaptive operational strategy for HTVS, where one can trade accuracy for efficiency.
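The idea of routing candidates through models of increasing cost can be illustrated with a toy two-stage cascade. This is only a minimal sketch under stated assumptions, not the framework from the paper: the candidate pool, the noisy surrogate, the per-candidate costs, and the hand-picked thresholds are all hypothetical, whereas the paper optimizes the screening policy rather than fixing it by hand.

```python
import random

def run_pipeline(candidate_ids, stages):
    """Pass candidates through screening stages, cheapest first.

    Each stage is (score_fn, cost_per_candidate, threshold); a candidate
    survives a stage when its score meets the threshold.
    """
    survivors = list(candidate_ids)
    total_cost = 0.0
    for score_fn, cost, threshold in stages:
        total_cost += cost * len(survivors)  # every survivor is evaluated once
        survivors = [c for c in survivors if score_fn(c) >= threshold]
    return survivors, total_cost

# Hypothetical data: a hidden "true" property and a fixed surrogate error.
rng = random.Random(0)
n = 10_000
true_value = [rng.random() for _ in range(n)]
noise = [rng.gauss(0.0, 0.1) for _ in range(n)]

def high_fidelity(c):
    return true_value[c]             # assumed exact but expensive

def surrogate(c):
    return true_value[c] + noise[c]  # assumed cheap but noisy

candidates = range(n)

# Baseline: run the expensive model on every candidate.
baseline_hits, baseline_cost = run_pipeline(
    candidates, [(high_fidelity, 100.0, 0.9)]
)

# Cascade: a lenient surrogate filter first, then the expensive model.
hits, cost = run_pipeline(
    candidates, [(surrogate, 1.0, 0.7), (high_fidelity, 100.0, 0.9)]
)

recall = len(hits) / len(baseline_hits)
print(f"cost ratio: {cost / baseline_cost:.2f}, recall: {recall:.3f}")
```

Loosening or tightening the surrogate threshold in this toy setup shifts the balance between total cost and the fraction of true hits recovered, which is the kind of accuracy-for-efficiency trade-off described above.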

Optimal decision-making in high-throughput virtual screening pipelines

For further details, please refer to the paper at the link below:
https://www.cell.com/patterns/fulltext/S2666-3899(23)00267-2

Uncertainty-aware framework for quantifying the reproducibility of scientific workflows that involve AI/ML models

The capability to replicate the predictions by machine learning (ML) or artificial intelligence (AI) models and the results in scientific workflows that incorporate such ML/AI predictions is driven by a variety of factors. An uncertainty-aware metric that can quantitatively assess the reproducibility of the quantities of interest (QoI) would contribute to the trustworthiness of the results obtained from scientific workflows involving ML/AI models.

In a recent perspective article published in Digital Discovery – from the Royal Society of Chemistry (RSC) – we discuss how uncertainty quantification (UQ) in a Bayesian paradigm can provide a general and rigorous framework for quantifying the reproducibility of complex scientific workflows. Such frameworks have the potential to fill a critical gap that currently exists in ML/AI for scientific workflows, as they will enable researchers to determine the impact of ML/AI model prediction variability on the predictive outcomes of ML/AI-powered workflows. We expect that the envisioned framework will contribute to the design of more reproducible and trustworthy workflows for diverse scientific applications, and ultimately, accelerate scientific discoveries in the future.

For more information, access the full article at the link below:

https://pubs.rsc.org/en/content/articlelanding/2023/DD/D3DD00094J

Epidemiological Modeling in the Exascale Era and Decision-Making Under Uncertainty

Dr. Yoon will be serving as a PI of a new DOE-funded $12M project entitled “EMERGE: ExaEpi for Elucidating Multiscale Ecosystem Complexities for Robust, Generalized Epidemiology” (Lead-PI: Dr. Peter Nugent, Lawrence Berkeley National Laboratory). This project is part of DOE’s Biopreparedness Research Virtual Environment (BRaVE) initiative, which supports national biopreparedness and response capabilities that can be advanced with DOE’s distinctive capabilities.

Epidemiological models are indispensable tools for predicting, understanding, and mitigating the impact of infectious diseases. In the early days of the COVID-19 pandemic, Berkeley Lab researchers led a multi-institutional effort to develop an agent-based model that could effectively harness the power of cutting-edge exascale supercomputers to speed predictions of disease spread for the Centers for Disease Control and Prevention and other public health agencies.

The EMERGE (ExaEpi for Elucidating Multiscale Ecosystem Complexities for Robust, Generalized Epidemiology) team will build on their successes and expand the capabilities of ExaEpi, an exascale-ready epidemiological agent-based model, to target six diseases: COVID-19, influenza, cholera, Zika, Nipah virus, and Burkholderia pseudomallei. Ultimately, the goal is to make ExaEpi a generalized tool for epidemiology and ensure that it will be flexible enough to rapidly incorporate new diseases, including those that impact plants and other animals.

The project will be led by Dr. Peter Nugent, a senior scientist in Berkeley Lab’s Applied Mathematics and Computational Research (AMCR) division, serving as the lead-PI of EMERGE. Working with Dr. Nugent, Dr. Yoon will serve as one of the PIs of this multi-institutional collaborative project, mainly focusing on aspects that involve decision-making based on epidemiological models under uncertainty.

For further details, please see the announcement at: https://newscenter.lbl.gov/2023/09/13/berkeley-lab-national-biopreparedness-and-response-efforts/

Dr. Yoon’s research on continuous super-resolution AI model for climate data featured on SIAM NEWS

Dr. Xihaier Luo (Assistant Computational Scientist, Brookhaven National Laboratory) and Dr. Yoon’s joint work on deep network models for continuous super-resolution for climate data was featured on the front page of SIAM NEWS on July 17, 2023.

This research proposes an innovative coordinate-based deep learning model to address the challenge of continuous super-resolution for climate data. Specifically, the team has focused on developing an implicit neural network model that is designed to learn continuous representations of climate data.
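To make the notion of a coordinate-based, continuous representation concrete, here is a minimal sketch. It is not the team's model: a Fourier-feature encoding with a linear readout stands in for the deep implicit network, and the 1D "field", the sampling grid, and all names are hypothetical. The point is only that a model mapping a coordinate to a value can be fitted on coarse samples and then queried at any resolution.

```python
import math

N_FREQ = 3  # number of Fourier frequencies (an assumption for this toy)

def features(x):
    """Encode a coordinate x in [0, 1) as [1, sin, cos, ...] features."""
    feats = [1.0]
    for k in range(1, N_FREQ + 1):
        feats.append(math.sin(2 * math.pi * k * x))
        feats.append(math.cos(2 * math.pi * k * x))
    return feats

def predict(weights, x):
    """Evaluate the continuous representation at any coordinate x."""
    return sum(w * f for w, f in zip(weights, features(x)))

def signal(x):
    """Hypothetical smooth field standing in for real climate data."""
    return math.sin(2 * math.pi * x) + 0.5 * math.cos(4 * math.pi * x)

# Coarse observations: 8 evenly spaced samples.
coarse_x = [i / 8 for i in range(8)]
coarse_y = [signal(x) for x in coarse_x]

def mse(weights):
    errs = [predict(weights, x) - y for x, y in zip(coarse_x, coarse_y)]
    return sum(e * e for e in errs) / len(errs)

# Fit the readout weights by plain gradient descent on the coarse samples.
weights = [0.0] * (2 * N_FREQ + 1)
initial_loss = mse(weights)
learning_rate = 0.1
for _ in range(500):
    grad = [0.0] * len(weights)
    for x, y in zip(coarse_x, coarse_y):
        err = predict(weights, x) - y
        for j, f in enumerate(features(x)):
            grad[j] += 2 * err * f / len(coarse_x)
    weights = [w - learning_rate * g for w, g in zip(weights, grad)]
final_loss = mse(weights)

# Query at a 4x finer grid that was never seen during fitting.
fine_x = [i / 32 for i in range(32)]
fine_y = [predict(weights, x) for x in fine_x]
max_fine_error = max(abs(p - signal(x)) for p, x in zip(fine_y, fine_x))
```

The toy signal was chosen to lie within the span of the features, so the fine-grid queries can match it closely; real climate fields would not be this well behaved.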

Further details can be found at the following link:

https://sinews.siam.org/Details-Page/advancing-wind-energy-forecasts-with-continuous-generative-representations

Scientific Reports Collection on Mathematical Oncology is now open for submissions

We are excited to announce that the Collection on Mathematical Oncology in Scientific Reports is now accepting submissions (submission deadline: December 2, 2023).

Mathematical oncology aims to contribute to a better understanding of cancer initiation, progression, and treatment, through the integration and application of mathematical models and simulations. Modelling cancer has become more feasible in recent decades, due to a combination of theoretical and technical advances, ever-expanding computational power, and a growing wealth of both experimental and clinical data. Mathematical models can be used to simulate tumour growth both forwards and backwards in time, and in response to treatments, improve our knowledge of individual cell behaviour, identify key features from complex data sets and clinical imaging, and ultimately develop patient-specific therapies and personalized medicine.

This Collection offers a platform for original contributions on mathematical oncology, from methodological advancements to applied clinical research, including integrative and multidisciplinary studies.

Guest Editors of this Collection will be:

Marek Kimmel: Professor of Statistics at Rice University
Francisco Rodrigues Pinto: Assistant Professor of Biochemistry, University of Lisbon
Byung-Jun Yoon: Associate Professor of Electrical & Computer Engineering at Texas A&M University

For further information about the Collection, please visit: https://www.nature.com/collections/eefhgjfege/

[Call for Papers] Special Collection in “Accelerating scientific discoveries through data-driven innovations” (Patterns – a Cell Press journal)

We are happy to announce a new Special Collection in “Accelerating scientific discoveries through data-driven innovations” in Patterns.

Patterns is a premium open-access journal from Cell Press, publishing ground-breaking original research across the full breadth of data science. The journal aims to share data science solutions to problems that cross domain boundaries.

Developing artificial intelligence (AI) and machine learning (ML) methods and models that can accelerate scientific discoveries and advance science has become one of the important research directions for the AI/ML research community. It has been gaining increasing attention from researchers in diverse scientific areas, including biomedical science, materials science, climate science, physics, chemistry, and many others. Data-driven AI/ML innovations to enable reliable predictions and optimal decision-making for scientific discoveries face several critical challenges, among which are high system complexity, large search space, incomplete knowledge, and small data, all of which demand novel strategies to effectively address them. Meeting these challenges, and thereby accelerating scientific discoveries and industrial innovations, calls for research that takes full advantage of the latest advances in AI/ML to integrate data-driven techniques with scientific knowledge and principles and that can execute them at scale in a modern HPC environment.

We invite researchers working at the forefront of accelerating scientific discoveries through innovations in machine learning (ML), AI, and data-driven modeling to submit their latest research findings to the special collection to inspire the next wave of data-driven innovations in various scientific domains.

Specific topics of interest include, but are not limited to:

Cross-cutting research at the intersections of mathematics, ML/AI, and computing
Scientific ML/AI integrating data-driven techniques with scientific knowledge/principles
Uncertainty-aware techniques of learning, inference, and optimization
Computational challenges for high-performance computing and data at extreme scales
Data-driven discoveries/innovations in complex scientific and industrial problems

We solicit papers that can showcase the state of the art in scientific ML/AI, applied mathematics, and high-performance computing that can meet the challenges that stem from large-scale data, extreme-scale computing requirements, and the need for decision-making pertinent to complex systems in the presence of significant uncertainties, which altogether can accelerate scientific discoveries through data-driven approaches. While we look for submissions of original research articles that report the latest research findings and breakthroughs in the field, we welcome various article types considered by Patterns, which include review and perspective articles.

Manuscripts should be prepared according to the guide for authors and should be submitted online, mentioning in the cover letter that you are submitting for the “Accelerating scientific discoveries through data-driven innovations” special collection.

The article processing charge will be waived for the first five manuscripts to be accepted for publication.

For further information, visit the following link on the Patterns website:

https://www.cell.com/patterns/special-issues/call-for-papers/accelerating-scientific-discoveries-through-data-driven-innovations

Accelerating optimal experimental design (OED) via deep learning

Various real-world scientific applications involve the mathematical modeling of complex uncertain systems with numerous unknown parameters. Accurate parameter estimation is often practically infeasible in such systems, as the available training data may be insufficient and the cost of acquiring additional data may be very high. In such cases, it may be desirable to represent the uncertainty present in the model in a Bayesian paradigm, based on which one may design robust operators that retain good overall performance across all possible models. Furthermore, one may design optimal experiments that can effectively reduce the uncertainty so as to significantly enhance the performance of the robust operators.
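As a rough illustration of what a robust operator means here, consider a finite set of candidate models with a prior over them and a finite set of operators. The sketch below is a toy under stated assumptions, not the method of any cited paper: the prior, the cost table, and all names are made up. It picks the operator with the lowest expected cost across the uncertain models and compares it with the operator that would be optimal for each individual model.

```python
# Hypothetical toy problem: 3 candidate models (rows), 4 operators (columns).
prior = [0.5, 0.3, 0.2]          # assumed belief over the candidate models
cost = [
    [1.0, 2.0, 4.0, 3.0],        # cost of each operator if model 0 is true
    [3.0, 2.0, 1.0, 3.5],        # ... if model 1 is true
    [5.0, 2.5, 2.0, 1.0],        # ... if model 2 is true
]
n_ops = len(cost[0])

def expected_cost(op):
    """Average cost of operator `op` over the uncertain models."""
    return sum(p * row[op] for p, row in zip(prior, cost))

# Robust operator: best on average across all candidate models.
robust_op = min(range(n_ops), key=expected_cost)

# Model-specific optima: best operator if the true model were known.
model_optimal_ops = [min(range(n_ops), key=lambda op: row[op]) for row in cost]

print("robust operator:", robust_op)
print("model-specific optimal operators:", model_optimal_ops)
```

In this toy table the robust operator is not optimal for any single model, yet it avoids the large costs that each model-specific choice would incur if a different model turned out to be true.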

While objective-based uncertainty quantification (objective-UQ) based on MOCU (mean objective cost of uncertainty) provides an effective means for quantifying uncertainty in complex systems, the high computational cost of estimating MOCU has been a practical challenge in applying it to real-world scientific/engineering problems.
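For readers unfamiliar with MOCU, its usual definition can be written as follows. The notation is an assumption introduced here for illustration and does not appear in the post: $\theta \in \Theta$ is the uncertain parameter with distribution $\pi(\theta)$, $\psi \in \Psi$ is an operator, and $C_\theta(\psi)$ is the cost of applying $\psi$ when the true parameter is $\theta$.

```latex
\begin{align}
  \psi_\theta &= \arg\min_{\psi \in \Psi} C_\theta(\psi)
    && \text{(optimal operator if $\theta$ were known)} \\
  \psi_{\mathrm{R}} &= \arg\min_{\psi \in \Psi}
    \mathbb{E}_{\theta \sim \pi}\!\left[ C_\theta(\psi) \right]
    && \text{(robust operator)} \\
  M_\Psi(\Theta) &= \mathbb{E}_{\theta \sim \pi}\!\left[
    C_\theta(\psi_{\mathrm{R}}) - C_\theta(\psi_\theta) \right]
    && \text{(MOCU)} \\
  \xi^{*} &= \arg\min_{\xi}\;
    \mathbb{E}_{x \mid \xi}\!\left[ M_\Psi(\Theta \mid x) \right]
    && \text{(experiment with the smallest expected remaining MOCU)}
\end{align}
```

Each evaluation of $M_\Psi$ involves an expectation over $\theta$ together with optimizations over $\Psi$, and the experimental design step repeats this for every candidate experiment $\xi$ and outcome $x$, which suggests why the estimation cost grows quickly.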

In our recent work, we proposed a novel deep learning (DL) scheme to reduce the computational cost for objective-UQ via MOCU that can significantly accelerate MOCU estimation and optimal experimental design (OED).

Qihua Chen, Xuejin Chen, Hyun-Myung Woo, Byung-Jun Yoon, “Neural Message Passing for Objective-Based Uncertainty Quantification and Optimal Experimental Design,” Engineering Applications of Artificial Intelligence, Volume 123, Part A, 106171, 2023, https://doi.org/10.1016/j.engappai.2023.106171

In the above study, we trained a message-passing neural network (MPNN) as a surrogate MOCU estimator, incorporating a novel axiomatic constraint loss that improves the estimation performance, and ultimately, the OED outcomes. Our results show that the proposed scheme can accelerate MOCU-based OED by four to five orders of magnitude, without any visible performance loss.

For further details, please refer to the paper via the DOI link above.

The 2023 ACM-BCB conference will be held in Houston, TX, September 3-6, 2023.

This year, the 14th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (ACM-BCB) – the flagship conference of ACM SIGBio – will be held in Houston, TX, September 3-6, 2023. ACM-BCB is being organized for the fourteenth year, building upon the success of the first thirteen meetings in various locations, including Boston, Niagara Falls, Chicago, Orlando, Washington DC, Newport Beach, Atlanta, and Seattle.

How to design a high-throughput virtual screening (HTVS) pipeline for efficient selection of redox-active organic materials

As global interest in renewable energy continues to increase, there has been a pressing need for developing novel energy storage devices based on organic electrode materials that can overcome the shortcomings of the current lithium-ion batteries. One critical challenge for this quest is to find materials whose redox potential (RP) meets specific design targets.

In our recent study below, we proposed a computational framework for addressing this challenge:

Hyun-Myung Woo, Omar Allam, Junhe Chen, Seung Soon Jang, Byung-Jun Yoon, “Optimal high-throughput virtual screening pipeline for efficient selection of redox-active organic materials,” iScience (2023), doi: https://doi.org/10.1016/j.isci.2022.105735

Given a high-fidelity model for estimating the RP of a given material, we showed how a set of surrogate models with different accuracy and complexity may be designed to construct a highly accurate and efficient HTVS pipeline. The performance of the screening campaigns based on the constructed HTVS pipeline can be optimized by designing the optimal screening policy, which enables rapid screening of organic materials that satisfy the desired criteria. We demonstrated that the proposed HTVS pipeline construction and operation strategies can substantially enhance the overall screening throughput.

Further details of this study can be found in iScience.

*A graphical abstract illustrating the optimal design and operation of a high-throughput virtual screening (HTVS) pipeline for efficient selection of redox-active organic materials.*

Dr. Yoon’s collaborative work with COVID ACT NOW and Mass General Hospital is featured on the cover of Patterns from Cell Press

The paper “Looking back on forward-looking COVID models”, of which Dr. Byung-Jun Yoon is a co-author, has been featured on the cover of the July 2022 issue of Patterns, a premium open-access journal from Cell Press. This paper is an outcome of Dr. Yoon’s ongoing collaboration with COVID ACT NOW (CAN) – the COVID-focused U.S. initiative of Act Now Coalition – and the Massachusetts General Hospital (MGH).

*The cover art of Patterns July 2022 issue features Dr. Yoon’s collaborative project with Covid Act Now and Mass General Hospital.*

In this work, we presented the epidemiological model by Covid Act Now (CAN) and evaluated its performance by back-testing against historical data. For comparison, similar analyses were performed for several other COVID models and the obtained results were compared. It was found that all models generally captured the potential magnitude and directionality of the pandemic in the short term. While there are limitations to epidemiological models, understanding these limitations enables these models to be utilized as tools for “data-driven decision-making” in viral outbreaks.

**“Face the Music”** – the cover image on the July 2022 issue of Patterns – was created by Anna Maybach, resident artist at Act Now Coalition.

The cover image is entitled “Face the Music”. The American idiom “face the music” means to accept consequences. It is thought to originate from an exhortation to face one’s stage fright. The sound waves in this image were created by superimposing Covid Act Now trend graphs representing cases, hospitalizations, ICU hospitalizations, and deaths in the United States since March 2020. This visualization represents the path of accepting the consequences of our actions and facing our fears in order to navigate the COVID pandemic: to “face the music.”

The paper can be accessed at the link below:

Paul Chong, Byung-Jun Yoon, Debbie Lai, Michael Carlson, Jarone Lee, Shuhan He, “Looking Back on Forward-Looking COVID Models,” Patterns, volume 3, issue 7, 100492, July 08, 2022, https://doi.org/10.1016/j.patter.2022.100492.

BioMLSP Lab

Machine Learning for Computational Network Biology @ Texas A&M University