Enhancing generative models for multi-objective molecular design

Molecular design based on generative models, such as variational autoencoders (VAEs), has become increasingly popular in recent years due to its efficiency in exploring high-dimensional molecular space to identify molecules with desired properties. While the efficacy of the initial model strongly depends on the training data, the model's sampling efficiency in suggesting novel molecules with improved properties can be further enhanced via latent space optimization (LSO).

In our recent paper, entitled “Multi-objective latent space optimization of generative molecular design models”, we propose a multi-objective LSO method that can significantly enhance the performance of generative molecular design (GMD). The proposed method adopts an iterative weighted retraining approach, where the respective weights of the molecules in the training data are determined by their Pareto efficiency. We demonstrate that our multi-objective GMD LSO method can significantly improve the performance of GMD for jointly optimizing multiple molecular properties.
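As a rough illustration of how Pareto efficiency can be turned into retraining weights, the sketch below ranks molecules by non-dominated sorting and then assigns larger weights to better ranks. It is a minimal sketch, not the paper's implementation: the function names, the assumption that every property is maximized, and the rank-based weight formula with hyperparameter `k` are all illustrative assumptions.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere.

    Assumes every objective is to be maximized.
    """
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_ranks(points: List[Sequence[float]]) -> List[int]:
    """Non-dominated sorting: rank 0 is the Pareto front, rank 1 is the front
    of what remains after removing rank 0, and so on."""
    remaining = set(range(len(points)))
    ranks = [0] * len(points)
    rank = 0
    while remaining:
        front = {
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        }
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


def rank_weights(ranks: List[int], k: float = 0.1) -> List[float]:
    """Hypothetical weighting: weight proportional to 1 / (k * N + rank),
    normalized to sum to 1. Smaller k concentrates weight on the best ranks."""
    n = len(ranks)
    raw = [1.0 / (k * n + r) for r in ranks]
    total = sum(raw)
    return [w / total for w in raw]


# Toy data: each tuple holds two property values for one molecule.
properties = [(0.9, 0.2), (0.5, 0.5), (0.2, 0.9), (0.4, 0.4), (0.1, 0.1)]
ranks = pareto_ranks(properties)   # [0, 0, 0, 1, 2]
weights = rank_weights(ranks)
```

In an iterative scheme, weights like these would be recomputed after each round of sampling and used to reweight the training set before the generative model is retrained.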

The paper has been published in Patterns, a premium open-access journal from Cell Press that publishes ground-breaking original research across the full breadth of data science. The paper can be accessed at the following URL:

https://www.cell.com/patterns/fulltext/S2666-3899(24)00184-3

SIAM MDS24 Minisymposium on “Machine Learning Advances in Scientific Computing and Applications”

We are happy to announce the mini-symposium entitled “Machine Learning Advances in Scientific Computing and Applications”, which will be held at the upcoming SIAM Conference on Mathematics of Data Science (MDS24), October 21-25, 2024, in Atlanta, GA.

The mini-symposium will highlight important recent advances in ML that have been accelerating scientific computing and its applications across various scientific domains. The mini-symposium is co-organized by Dr. Youngjoon Hong (KAIST) and Dr. Byung-Jun Yoon (Texas A&M and Brookhaven National Lab) and it will feature the following presentations:

  • Neural-network Based Collision Operators for the Boltzmann Equation
    Speaker: Eric C. Cyr (Sandia National Laboratories)
  • Generative Downscaling of PDE Solvers with Physics-guided Diffusion Models
    Speaker: Wuzhe Xu (University of Massachusetts)
  • Signal Processing with Implicit Neural Fields: Editing, Manipulation, and Understanding
    Speaker: Peihao Wang (University of Texas at Austin)
  • Autoequivariant Network Search via Group Decomposition
    Speaker: Lav Varshney (University of Illinois Urbana-Champaign)

For further information about the mini-symposium, please visit the URL below:
https://meetings.siam.org/sess/dsp_programsess.cfm?SESSIONCODE=80426

Nafiz Abeer wins third place award in TAMIDS Sea Level Rise Data Science Challenge

Nafiz Abeer, a PhD student in the BioMLSP Lab, won the third-place award in the TAMIDS Sea Level Rise Data Science Challenge. Nafiz teamed up with two other graduate students – Sharmin Majumder (Electrical Engineering) and Musfira Rahman (Civil Engineering) – to analyze and predict US coastal sea-level rise and its socio-economic impacts.

“The growing impacts of climate change have underscored the need to understand Mean Sea Level (MSL) rise, particularly along the US East and Gulf coasts. The sea level rise dynamics behave differently for the southeast and northeast parts of the US coast. This report presents an analysis of historical tide gauge records from 1900 to 2021 to understand the spatiotemporal dynamics of MSL rise along the northeast and southeast coasts and Gulf coasts and its socio-economic implications on coastal communities.”

For more information about the project and the TAMIDS challenge, please visit:

https://tamids.tamu.edu/2024/04/22/tamids-sea-level-rise-data-science-challenge-awards/

ACM-BCB 2024 will take place in Shenzhen, China, Nov. 22-25, 2024

The 15th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (ACM-BCB 2024) is the flagship conference of ACM SIGBio. After past successes at many locations in the USA, it will be held outside the USA for the first time, in Shenzhen, China, on Nov. 22-25, 2024. ACM-BCB 2024 aims to promote AI for Bio-medicine (AI4Bio), including cutting-edge AI advances in computational biology, bioinformatics, and health informatics, at the intersection of computer science, artificial intelligence, statistics, biology, and medicine.

For further information, please visit: https://hpcc.siat.ac.cn/acm-bcb2024

[Download ACM-BCB 2024 call for paper]

[Postdoc opening] Apply for a postdoc position in Scientific Computing, Machine Learning, Applied Mathematics

We are happy to announce a new postdoc position in the area of Scientific Computing, AI/ML, and applied mathematics.

The Applied Mathematics Group of the Computational Science Initiative (CSI) at Brookhaven National Laboratory (BNL) invites exceptional candidates to apply for a postdoctoral research associate position in applied mathematics, scientific computing, and machine learning. This position offers a unique opportunity to conduct research on emerging interdisciplinary problems at the intersection of applied mathematics, machine learning, and high-performance computing (HPC) with applications in diverse scientific domains of interest to BNL and the Department of Energy (DOE). Topics of specific interest include: (i) decision making under uncertainty / optimal design and control of experiments and physical or biological systems; (ii) uncertainty quantification for physical or biological systems or machine learning; (iii) optimization; (iv) model reduction / digital twins / surrogate modeling or emulation; (v) modeling & simulation; (vi) scientific machine learning; (vii) computational and experimental workflows and pipelines for science and engineering problems. The position includes access to world-class HPC resources, such as the BNL Institutional Cluster and DOE leadership computing facilities. Access to these platforms will allow computing at scale and give the successful candidate the resources needed to solve challenging problems of interest to DOE.

This program provides full support for two years at CSI with a possible extension. Candidates must have received a doctorate (Ph.D.) in applied mathematics, statistics, computer science, or a related field (e.g., mathematics, engineering, operations research, physics) within five years of the negotiated start date of the position. This postdoc position presents a unique chance to conduct interdisciplinary collaborative research in BNL programs with a highly competitive salary.

For further details, please visit the URL below:

https://bnl.wd1.myworkdayjobs.com/Externa/job/Upton-NY/Postdoc—Applied-Mathematics—Scientific-Computing_JR101026

The postdoc will be primarily mentored by and work with Dr. Byung-Jun Yoon and will also actively collaborate with other members of the Applied Mathematics Group at BNL.

[Open Position] BNL’s Amalie Emmy Noether Postdoctoral Fellow in Applied Mathematics and Scientific Computing now accepting applications

The Applied Mathematics Group of the Computational Science Initiative (CSI) at Brookhaven National Laboratory (BNL) invites exceptional candidates to apply for the Amalie Emmy Noether Postdoctoral Fellowship in applied mathematics and scientific computing.

This fellowship offers a unique opportunity to conduct research in a broad set of fields, including decision-making under uncertainty, scientific machine learning, reduced order or surrogate modeling, uncertainty quantification and scalable computational statistics, model-form errors, optimization, optimal experimental design, high-dimensional inverse problems, data science for streaming parallel/distributed analytics for high-performance computing (HPC), integrated or compositional computational modeling frameworks, numerical methods, and multiscale modeling and simulation.

For further details, please visit: [Apply for Noether Postdoc Fellow Position]

Advancing Materials Science through uncertainty-aware, physics-informed, and knowledge-driven AI/ML models

Significant acceleration of the future discovery of novel functional materials requires a fundamental shift from the current materials discovery practice, which is heavily dependent on trial-and-error campaigns and high-throughput screening, to one that builds on knowledge-driven advanced informatics techniques enabled by the latest advances in signal processing and machine learning.

In a recent review article published in Patterns, an open-access journal from Cell Press focused on data science, we discuss the major research issues that need to be addressed to expedite this transformation along with the salient challenges involved. We especially focus on Bayesian signal processing and machine learning schemes that are uncertainty-aware and physics-informed for knowledge-driven learning, robust optimization, and efficient objective-driven experimental design.

Illustration of the optimal experimental design (OED) cycle that is enabled by knowledge-driven and objective-based uncertainty quantification via MOCU (mean objective cost of uncertainty)
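To make the MOCU quantity named above concrete, here is a minimal sketch for a finite uncertainty class. It assumes the standard definition, in which MOCU is the expected gap between the cost of the robust action (the one minimizing expected cost under the current belief) and the cost of the action that would be optimal if the true model were known. The cost table, prior, and function name are hypothetical toy values, not taken from the paper.

```python
from typing import List, Tuple


def mocu(prior: List[float], cost: List[List[float]]) -> Tuple[float, int]:
    """Mean objective cost of uncertainty for a finite uncertainty class.

    prior[i]   : probability of candidate model i
    cost[i][a] : cost of taking action a when model i is the true one
    Returns (MOCU value, index of the robust action).
    """
    n_actions = len(cost[0])
    # Expected cost of each action under the current belief.
    expected = [
        sum(p * row[a] for p, row in zip(prior, cost)) for a in range(n_actions)
    ]
    # Robust action: best on average over the uncertainty class.
    robust = min(range(n_actions), key=expected.__getitem__)
    # Expected excess cost of the robust action over each model's own optimum.
    value = sum(p * (row[robust] - min(row)) for p, row in zip(prior, cost))
    return value, robust


# Toy example: two candidate models, three actions.
toy_cost = [
    [0.0, 1.0, 0.4],  # costs if model 0 is true
    [1.0, 0.0, 0.4],  # costs if model 1 is true
]
uncertain_value, uncertain_action = mocu([0.5, 0.5], toy_cost)
certain_value, certain_action = mocu([1.0, 0.0], toy_cost)
```

In an OED loop, one would typically pick the experiment whose outcome is expected to reduce this quantity the most, then update the prior and repeat.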

For further details, please refer to the paper below:

Xiaoning Qian, Byung-Jun Yoon, Raymundo Arróyave, Xiaofeng Qian, Edward R. Dougherty, “Knowledge-Driven Learning, Optimization, and Experimental Design under Uncertainty for Materials Discovery”, Patterns, volume 4, issue 11, 100863, Nov. 10, 2023. https://doi.org/10.1016/j.patter.2023.100863

Accelerating scientific advances and discoveries through artificial intelligence (AI) and machine learning (ML)

A recent editorial by Dr. Byung-Jun Yoon and co-authors Francis J. Alexander, Meifeng Lin, and Xiaoning Qian published in Patterns from Cell Press discusses how AI/ML, data-driven modeling, and scientific computing can significantly accelerate scientific advances and novel discoveries.

Francis J. Alexander, Meifeng Lin, Xiaoning Qian, and Byung-Jun Yoon, “Accelerating scientific discoveries through data-driven innovations,” Patterns, volume 4, issue 11, 100876, Nov. 10, 2023. https://doi.org/10.1016/j.patter.2023.100876

Developing artificial intelligence (AI) and machine learning (ML) methods that can accelerate scientific discoveries and advance science has become one of the important research directions for the AI/ML research community. It has been gaining increasing attention from researchers in diverse scientific areas, including biomedical science, materials science, climate science, physics, chemistry, and many others. Data-driven AI/ML innovations to enable reliable predictions and optimal decision-making for scientific discoveries face several critical challenges, among which are high system complexity, large search space, incomplete knowledge, and small data, all of which demand novel strategies. Meeting these challenges, and thereby accelerating scientific discoveries and industrial innovations, calls for research that takes full advantage of the latest advances in AI/ML to integrate data-driven techniques with scientific knowledge and that can execute them at scale in modern high-performance computing (HPC) environments.

Those who are interested in learning more about the recent advances in relevant fields are referred to the Patterns Special Collection “Accelerating scientific discoveries through data-driven innovations”, which features articles that showcase the promising roles of AI/ML and data-driven modeling in accelerating scientific discoveries and may inspire the next wave of data-driven innovations in various scientific domains.

Optimizing high-throughput virtual screening campaigns

The need for efficient computational screening of molecular candidates that possess desired properties frequently arises in various scientific and engineering problems, including drug discovery and materials design. However, the enormous search space containing the candidates and the substantial computational cost of high-fidelity property prediction models make screening practically challenging.

In our recent work published in Patterns, a premium open-access journal from Cell Press, we propose a general framework for constructing and optimizing a high-throughput virtual screening (HTVS) pipeline that consists of multi-fidelity models. The central idea is to optimally allocate the computational resources to models with varying costs and accuracy to optimize the return on computational investment. Based on both simulated and real-world data, we demonstrate that the proposed optimal HTVS framework can significantly accelerate virtual screening without any degradation in terms of accuracy. Furthermore, it enables an adaptive operational strategy for HTVS, where one can trade accuracy for efficiency.
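The idea of spending cheap computation first and expensive computation only on promising candidates can be sketched as a simple screening cascade. This is only an illustration under stated assumptions: the `Stage` class, the scoring functions, the costs, and the fixed thresholds are all hypothetical, whereas the framework described above chooses the operating points optimally rather than by hand.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    name: str
    cost: float                      # cost per evaluated candidate (arbitrary units)
    score: Callable[[float], float]  # predicted property value for a candidate
    threshold: float                 # candidates scoring below this are discarded


def run_pipeline(candidates: List[float], stages: List[Stage]) -> Tuple[List[float], float]:
    """Pass candidates through the stages in order and accumulate the total cost."""
    survivors = list(candidates)
    total_cost = 0.0
    for stage in stages:
        # Every candidate that reaches a stage is evaluated and paid for.
        total_cost += stage.cost * len(survivors)
        survivors = [c for c in survivors if stage.score(c) >= stage.threshold]
    return survivors, total_cost


# Toy setup: each candidate is represented by its (normally unknown) true property.
candidates = [0.1, 0.4, 0.6, 0.8, 0.95]
cheap = Stage("low-fidelity", cost=1.0, score=lambda x: 0.8 * x + 0.1, threshold=0.5)
exact = Stage("high-fidelity", cost=100.0, score=lambda x: x, threshold=0.75)

hits, cascade_cost = run_pipeline(candidates, [cheap, exact])
_, baseline_cost = run_pipeline(candidates, [exact])  # high-fidelity model only
```

In this toy case the cascade finds the same hits as the high-fidelity model alone while evaluating it on three candidates instead of five. Raising or lowering the first-stage threshold is one way to picture the trade between accuracy and efficiency.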

Optimal decision-making in high-throughput virtual screening pipelines

For further details, please refer to the paper at the link below:
https://www.cell.com/patterns/fulltext/S2666-3899(23)00267-2

Uncertainty-aware framework for quantifying the reproducibility of scientific workflows that involve AI/ML models

The capability to replicate the predictions by machine learning (ML) or artificial intelligence (AI) models and the results in scientific workflows that incorporate such ML/AI predictions is driven by a variety of factors. An uncertainty-aware metric that can quantitatively assess the reproducibility of the quantities of interest (QoI) would contribute to the trustworthiness of the results obtained from scientific workflows involving ML/AI models.

In a recent perspective article published in Digital Discovery – from the Royal Society of Chemistry (RSC) – we discuss how uncertainty quantification (UQ) in a Bayesian paradigm can provide a general and rigorous framework for quantifying the reproducibility of complex scientific workflows. Such frameworks have the potential to fill a critical gap that currently exists in ML/AI for scientific workflows, as they will enable researchers to determine the impact of ML/AI model prediction variability on the predictive outcomes of ML/AI-powered workflows. We expect that the envisioned framework will contribute to the design of more reproducible and trustworthy workflows for diverse scientific applications, and ultimately, accelerate scientific discoveries in the future.

For more information, access the full article at the link below:

https://pubs.rsc.org/en/content/articlelanding/2023/DD/D3DD00094J