
How to explore model uncertainty to enhance generative molecular design

In recent years, artificial intelligence (AI) has made big strides in helping scientists design new molecules, whether for life-saving drugs or advanced materials. In particular, generative AI models have attracted significant interest from the research community as tools for designing new molecules, opening exciting possibilities in science and medicine.

However, one big challenge remains: these models are usually trained once and then used “as-is.” Adapting them to target-specific molecular characteristics – for example, to improve a molecule’s activity against specific targets or to make the designed molecule easily synthesizable – is challenging.

In a recent publication entitled “Enhancing generative molecular design via uncertainty-guided fine-tuning of variational autoencoders,” published in RSC Molecular Systems Design & Engineering, we proposed a smarter way to fine-tune generative AI models by quantifying and leveraging model uncertainty.

A N M Nafiz Abeer, Sanket Jantre, Nathan M Urban, and Byung-Jun Yoon, “Enhancing Generative Molecular Design via Uncertainty-guided Fine-tuning of Variational Autoencoders,” Molecular Systems Design & Engineering, 2025, https://doi.org/10.1039/D5ME00081E.

In this paper, we proposed an uncertainty-guided fine-tuning strategy that can effectively enhance a pre-trained variational autoencoder (VAE) for generative molecular design (GMD) through performance feedback in an active learning setting. The strategy begins by quantifying the model uncertainty of the generative model using an efficient active subspace-based UQ (uncertainty quantification) scheme. Next, the decoder diversity within the characterized model uncertainty class is explored to expand the viable space of molecular generation. Empirical results across six target molecular properties demonstrate that the uncertainty-guided fine-tuning strategy consistently leads to improved models that outperform the original pre-trained generative models.
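As a rough illustration of the active subspace idea mentioned above, the sketch below finds the few parameter directions along which a loss changes most (the leading eigenvectors of the gradient covariance) and perturbs a parameter vector only along those directions. It is a minimal sketch under stated assumptions, not the method from the paper: the gradients are synthetic, the function names `active_subspace` and `sample_parameters` are hypothetical, and the fixed Gaussian over subspace coordinates is a placeholder for whatever inference the paper performs within the subspace.

```python
import numpy as np


def active_subspace(grads, k):
    """Return the top-k eigenvectors (d x k) and all eigenvalues (descending)
    of the uncentered gradient covariance C = G^T G / n."""
    grads = np.asarray(grads, dtype=float)
    cov = grads.T @ grads / grads.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order[:k]], eigvals[order]


def sample_parameters(theta0, basis, n_samples, scale, rng):
    """Perturb theta0 only along the active directions: theta = theta0 + W z.
    The isotropic Gaussian on z is an illustrative assumption."""
    z = rng.normal(0.0, scale, size=(n_samples, basis.shape[1]))
    return theta0 + z @ basis.T


rng = np.random.default_rng(0)
d = 6  # toy parameter dimension (a real decoder has far more parameters)

# Synthetic gradients that vary along only two hidden directions.
hidden_dirs = np.linalg.qr(rng.normal(size=(d, 2)))[0]
grads = rng.normal(size=(200, 2)) @ hidden_dirs.T

basis, eigvals = active_subspace(grads, k=2)
theta_samples = sample_parameters(np.zeros(d), basis, n_samples=5, scale=0.1, rng=rng)

print("eigenvalues:", np.round(eigvals, 4))
print("sampled parameter vectors:", theta_samples.shape)
```

In this toy setting only the first two eigenvalues are non-negligible, so a two-dimensional subspace captures essentially all of the gradient variation. Each sampled parameter vector would correspond to one candidate decoder in an ensemble.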

For further details, please read the full paper at the following link: https://doi.org/10.1039/D5ME00081E

Nafiz Abeer wins Best Poster Award at the 16th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (ACM-BCB 2025)

Nafiz Abeer, a senior Ph.D. student in the BioMLSP Lab, recently presented his work on TCR-pMHC binding prediction at the 16th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (ACM-BCB 2025), the flagship conference of ACM SIGBio. His poster presentation was selected by the ACM-BCB 2025 Poster Chairs for the Best Poster Award.

Understanding the interaction between the T-cell receptor (TCR) and the peptide-major histocompatibility complex (pMHC) is a crucial step in T-cell activation and the initiation of the T-cell mediated adaptive immune response. Despite efforts in developing various computational approaches for TCR-pMHC binding prediction, identifying TCR recognition for new peptides forming pMHCs on the surface of antigen-presenting cells (APCs) remains a critical challenge. In his work, Nafiz proposed a novel dual encoder strategy to effectively leverage predicted TCR-pMHC complex structures, thereby achieving better prediction performance for peptides that were not observed during training. His findings demonstrate that the dual-encoder-based approach often improves prediction performance (especially for EGNN) over existing single-encoder-based schemes.

ARPA-H APECx program site visit to Texas A&M University

On June 4th, 2025, Dr. Andy Kilianski, Acting Deputy Director of Health Science Futures at the Advanced Research Projects Agency for Health (ARPA-H), visited Texas A&M University to meet with the team behind the SPHERICAL project.

SPHERICAL (Scientific Platform for High Efficacy Antigen Design via Robust Integration of Computational Experiments, AI, and Protein Modeling) is supported by ARPA-H’s APECx program. Over the past nine months, the entire SPHERICAL team and I have actively reimagined how AI-driven, uncertainty-quantified, and competence-aware modeling can accelerate the design of broadly effective antigens.

During the site visit, we had the pleasure of presenting our progress to Dr. Andy Kilianski, discussing next steps, and sharing our vision for advancing the APECx mission. We look forward to continuing this collaboration and driving the project toward its milestones!

Multi-objective prompt optimization

Natural language prompt optimization, or prompt engineering, has emerged as a powerful technique to unlock the potential of Large Language Models (LLMs) for various tasks. While existing methods primarily focus on maximizing a single task-specific performance metric for LLM outputs, real-world applications often require considering trade-offs between multiple objectives.

In a recent paper entitled “Pareto Prompt Optimization,” we proposed an effective technique for multi-objective prompt optimization for LLMs to address this limitation. Specifically, the proposed method, called ParetoPrompt, takes a reinforcement learning (RL) approach that leverages dominance relationships between prompts to derive an effective policy model for prompt optimization using preference-based loss functions. By leveraging multi-objective dominance relationships, ParetoPrompt enables efficient exploration of the entire Pareto front without the need for a predefined (and typically heuristic) scalarization of multiple objectives. Experimental results show that ParetoPrompt consistently outperforms existing prompt optimization techniques on various benchmarks. Furthermore, ParetoPrompt demonstrates robust performance when the objective metrics differ between training and testing.
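To make the notion of a dominance relationship between prompts concrete, the sketch below turns multi-objective scores into (preferred, rejected) pairs without any scalarization. It is a minimal illustration, not the ParetoPrompt implementation: the scores are made up, higher is assumed to be better on every objective, and the helper names `dominates` and `preference_pairs` are hypothetical. How such pairs would feed a preference-based loss is not specified here.

```python
from itertools import permutations


def dominates(a, b):
    """True if score vector a Pareto-dominates b: a is at least as good on
    every objective and strictly better on at least one (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def preference_pairs(scores):
    """Return (winner, loser) index pairs in which the winner dominates the loser.
    Prompts that merely trade off objectives produce no pair."""
    return [
        (i, j)
        for i, j in permutations(range(len(scores)), 2)
        if dominates(scores[i], scores[j])
    ]


# Hypothetical scores of four candidate prompts on two objectives.
scores = [(0.9, 0.2), (0.5, 0.5), (0.4, 0.4), (0.9, 0.1)]
print(preference_pairs(scores))  # [(0, 3), (1, 2)]
```

Prompts 0 and 1 do not dominate each other, so neither is preferred; both would sit on the Pareto front of this small set.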

Details of the work can be found at the following link:

Guang Zhao, Byung-Jun Yoon, Gilchan Park, Shantenu Jha, Shinjae Yoo, Xiaoning Qian, “Pareto Prompt Optimization,” 13th International Conference on Learning Representations (ICLR), Singapore, Apr 24-28, 2025.

Transforming antigen design for future vaccines with broad viral efficacy

In a recent article, “The Next Frontier for Vaccines: Using AI to Predict Antigens”, Texas A&M turned a spotlight on Dr. Byung-Jun Yoon’s new research project, sponsored by ARPA-H (Advanced Research Projects Agency for Health) through its ambitious APECx (Antigens Predicted for Broad Viral Efficacy through Computational Experimentation) program.

The five-year project, funded at up to $11 million and led by Dr. Yoon, is entitled “SPHERICAL: Scientific Platform for High Efficacy Antigen Design via Robust Integration of Computational Experiments, Artificial Intelligence (AI), and Protein Modeling.” It focuses on computational prediction of antigens that could be used to create future vaccines that are broadly effective against multiple strains of viruses in the same family.

In this project, Dr. Yoon will work with Dr. Xiaoning Qian (TAMU ECE) and Dr. Shuiwang Ji (TAMU CSE) as well as a multi-institutional team of experts at Princeton; University of Texas Medical Branch (UTMB); the University of Missouri; Stony Brook University; University of Tennessee, Knoxville; and Argonne National Laboratory.

“What if we can actually generate vaccines and predict antigens that can be used for developing such vaccines that have broad efficacy for multiple strains in the same viral family? Even though it may not be optimal for a specific strain, it’ll retain its robustness and efficacy for broad viral strains in a given family, keeping the vaccine more effective for a longer period for a larger population.” – Dr. Byung-Jun Yoon

To read the full article, please visit: tx.ag/TheNextFrontier

Texas A&M team led by Dr. Yoon receives an ARPA-H award to spearhead the development of a computational platform to design future vaccines

On September 25, 2024, the Advanced Research Projects Agency for Health (ARPA-H) announced the teams that will receive awards from its Antigens Predicted for Broad Viral Efficacy through Computational Experimentation (APECx) program.

The team led by the Texas A&M Engineering Experiment Station (Lead-PI: Byung-Jun Yoon) is one of the awardees and will focus on data and tool integration across the APECx program. The goal of Dr. Yoon and his team is to create a protein design pipeline that will be optimized for immunogen development for all purposes, including vaccines and therapeutics against bacterial pathogens, cancer, and other health threats.

For ARPA-H’s full announcement, please refer to the article at the following link:

https://arpa-h.gov/news-and-events/arpa-h-announces-awards-develop-computational-platform-multi-virus-vaccine-design

New postdoc positions available in AI/ML and Scientific Computing for vaccine design

We are excited to announce multiple postdoc positions in the broad areas of AI/ML, scientific computing, and bioinformatics/computational biology!

The Yoon & Qian Labs in the Department of Electrical and Computer Engineering at Texas A&M University (College Station, TX) invite outstanding candidates to apply for postdoctoral researcher positions in AI/ML and Scientific Computing for the development of Future Vaccines. Essential duties and responsibilities will involve the development of AI/ML-enabled scientific workflows for effective and computationally efficient antigen prediction/design and the construction of an AI/ML-ready scientific data repository.

Topics of interest include: (i) generative AI, (ii) protein language models, (iii) protein structure prediction, (iv) active learning & optimal experimental design (OED), (v) multi-modal AI/ML models. The postdocs will be mentored by both Drs. Yoon and Qian and closely collaborate with other researchers at Texas A&M and other partnering institutions.

Multiple postdoc positions are currently available. Well-qualified individuals interested in a postdoc position should apply by email as follows:

  • Submit the following documents: (i) cover letter, (ii) CV, (iii) 2-3 major publications.
  • Send email to: Dr. Yoon bjyoon@tamu.edu and Dr. Qian xqian@ece.tamu.edu
  • With subject: [Postdoc application] First-name Last-name, current affiliation

For further information, please see the [postdoc opening] flyer below:

Enhancing generative models for multi-objective molecular design

Molecular design based on generative models, such as variational autoencoders (VAEs), has become increasingly popular in recent years due to its efficiency in exploring high-dimensional molecular space to identify molecules with desired properties. While the efficacy of the initial model strongly depends on the training data, the model's sampling efficiency in suggesting novel molecules with improved properties can be further increased via latent space optimization (LSO).

In our recent paper, entitled “Multi-objective latent space optimization of generative molecular design models”, we propose a multi-objective LSO method that can significantly enhance the performance of generative molecular design (GMD). The proposed method adopts an iterative weighted retraining approach, where the respective weights of the molecules in the training data are determined by their Pareto efficiency. We demonstrate that our multi-objective GMD LSO method can significantly improve the performance of GMD for jointly optimizing multiple molecular properties.
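The idea of weighting training molecules by their Pareto efficiency can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation from the paper: the property values are made up, higher is assumed to be better for every property, Pareto efficiency is measured here by non-dominated sorting rank, and the rank-to-weight mapping `1 / (k * N + rank)` with hyperparameter `k` is chosen for illustration only. The function names are hypothetical.

```python
import numpy as np


def pareto_ranks(props):
    """Non-dominated sorting. Rank 0 is the Pareto front; rank r is the front
    that remains after removing all points of rank < r (higher is better)."""
    props = np.asarray(props, dtype=float)
    ranks = np.full(len(props), -1)
    remaining = np.arange(len(props))
    rank = 0
    while remaining.size:
        pts = props[remaining]
        # dom[a, b] is True when point a dominates point b.
        at_least = (pts[:, None, :] >= pts[None, :, :]).all(axis=2)
        strictly = (pts[:, None, :] > pts[None, :, :]).any(axis=2)
        dominated = (at_least & strictly).any(axis=0)
        ranks[remaining[~dominated]] = rank
        remaining = remaining[dominated]
        rank += 1
    return ranks


def retraining_weights(ranks, k=1.0):
    """Map ranks to normalized sampling weights; smaller k puts more weight
    on the best-ranked molecules (illustrative choice of weighting)."""
    ranks = np.asarray(ranks, dtype=float)
    weights = 1.0 / (k * len(ranks) + ranks)
    return weights / weights.sum()


# Hypothetical values of two properties for five molecules.
props = [[1, 1], [2, 2], [3, 1], [1, 3], [0, 0]]
ranks = pareto_ranks(props)
print("ranks:", ranks)
print("weights:", np.round(retraining_weights(ranks, k=0.1), 3))
```

In an iterative weighted retraining loop, these weights would be recomputed after each round of newly generated molecules is added to the training set, so that the model is repeatedly pulled toward the current Pareto front.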

The paper has been published in Patterns, a premium open access journal from Cell Press that publishes ground-breaking original research across the full breadth of data science. The paper can be accessed at the following URL:

https://www.cell.com/patterns/fulltext/S2666-3899(24)00184-3

SIAM MDS24 Minisymposium on “Machine Learning Advances in Scientific Computing and Applications”

We are happy to announce the mini-symposium entitled Machine Learning Advances in Scientific Computing and Applications, which will be held at the upcoming SIAM Conference on Mathematics of Data Science (MDS24), October 21 – 25, 2024, Atlanta, GA.

The mini-symposium will highlight important recent advances in ML that have been accelerating scientific computing and its applications across various scientific domains. The mini-symposium is co-organized by Dr. Youngjoon Hong (KAIST) and Dr. Byung-Jun Yoon (Texas A&M and Brookhaven National Lab) and it will feature the following presentations:

  • Neural-network Based Collision Operators for the Boltzmann Equation
    Speaker: Eric C. Cyr (Sandia National Laboratories)
  • Generative Downscaling of PDE Solvers with Physics-guided Diffusion Models
    Speaker: Wuzhe Xu (University of Massachusetts)
  • Signal Processing with Implicit Neural Fields: Editing, Manipulation, and Understanding
    Speaker: Peihao Wang (University of Texas at Austin)
  • Autoequivariant Network Search via Group Decomposition
    Speaker: Lav Varshney (University of Illinois Urbana-Champaign)

For further information about the mini-symposium, please visit the URL below:
https://meetings.siam.org/sess/dsp_programsess.cfm?SESSIONCODE=80426

Nafiz Abeer wins third place award in TAMIDS Sea Level Rise Data Science Challenge

Nafiz Abeer, a Ph.D. student in the BioMLSP Lab, won the third-place award in the TAMIDS Sea Level Rise Data Science Challenge. Nafiz teamed up with two other graduate students – Sharmin Majumder (Electrical Engineering) and Musfira Rahman (Civil Engineering) – to analyze and predict US coastal sea-level rise and its socio-economic impacts.

“The growing impacts of climate change have underscored the need to understand Mean Sea Level (MSL) rise, particularly along the US East and Gulf coasts. The sea level rise dynamics behave differently for the southeast and northeast parts of the US coast. This report presents an analysis of historical tide gauge records from 1900 to 2021 to understand the spatiotemporal dynamics of MSL rise along the northeast and southeast coasts and Gulf coasts and its socio-economic implications on coastal communities.”

For more information about the project and the TAMIDS challenge, please visit:

https://tamids.tamu.edu/2024/04/22/tamids-sea-level-rise-data-science-challenge-awards/