Author Archives: Byung-Jun Yoon

About Byung-Jun Yoon

I am a Professor at Texas A&M University, Department of Electrical & Computer Engineering, and a Scientist at Brookhaven National Laboratory, Computational Science Initiative. My research interests include objective-based uncertainty quantification, optimal experimental design (OED), machine learning, and signal processing. Application areas of interest include bioinformatics, computational network biology, and AI-driven drug/materials discovery.

Turning Uncertainty into a Design Tool for AI-engineered Molecules

Our recent work on harnessing model uncertainty to improve generative molecular design has been featured on Brookhaven National Laboratory’s website. The full article can be accessed at: https://www.bnl.gov/newsroom/news.php?a=222882

This was a collaborative effort between Texas A&M and BNL, involving Nafiz Abeer (TAMU), Sanket Jantre (BNL), and Nathan Urban (BNL).

H/T to Charity Plata (BNL CDS) for the wonderful article!

Accelerating AI-driven antigen design

Last week, researchers at Texas A&M and UTMB, who are part of the ARPA-H APECx “SPHERICAL” project team, convened for an intensive (but fun!) 1.5-day workshop focused on accelerating progress in AI-driven antigen design.

At the heart of the discussions was the AI-based antigen design pipeline developed by the Texas A&M team, which has been applied to optimize the design of candidate antigens for Mayaro virus (MAYV) and Rift Valley fever virus (RVFV). This represents an important step toward rapid, data-driven countermeasure development that effectively leverages the latest advances in AI, protein language modeling, and protein structure prediction.

A major milestone highlighted during the workshop was the availability of the first batch of experimental results. UTMB successfully synthesized the antigens designed by the AI pipeline and evaluated them using neutralization assays, providing critical validation data for the computational predictions.

Our team engaged in in-depth discussions analyzing the correlation between the predicted antigen properties and the observed neutralization performance. We expect that these insights will be instrumental in refining the current antigen design pipeline, improving the predictive accuracy of the AI models utilized therein, and strengthening the overall design framework in a closed-loop fashion.

The workshop concluded with forward-looking conversations between team members at the two institutions on enhancing the pipeline, including strategies to improve robustness, incorporate uncertainty quantification, and extend its capabilities toward predicting broadly effective antigens against diverse viral variants.

This collaborative effort underscores the power of integrating AI, experimental validation, and interdisciplinary expertise to accelerate the development of next-generation antigen design strategies.

For more information about the ARPA-H APECx program that is sponsoring the project, please visit: https://arpa-h.gov/explore-funding/programs/apecx

Uncertainty-aware optimization of generative molecular design models featured on the cover of RSC Journal MSDE’s Feb. 2026 issue

How can we explore model uncertainty to improve generative molecular design?

In our recent paper entitled “Enhancing generative molecular design via uncertainty-guided fine-tuning of variational autoencoders,” which was just featured on the cover of the Feb. 2026 issue of RSC Molecular Systems Design & Engineering (MSDE), we propose an uncertainty-aware model optimization strategy.

Specifically, our strategy takes the latent points found by any latent space optimization approach and explores the uncertainty classes of generative molecular design models (in this case, variational autoencoders) through their low-dimensional active subspace to find the models that improve the properties of the molecules corresponding to those latent points.

Further details of our work can be found at: https://pubs.rsc.org/en/content/articlelanding/2025/me/d5me00081e

ACM-BCB 2026 will be held in Calabria, Italy (June 30-July 3, 2026)

The 17th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (ACM-BCB 2026) is the flagship conference of ACM SIGBio. For the first time, it will be held in Europe: the conference will take place in Calabria, Italy, from June 30 to July 3, 2026, hosted at the University of Calabria Congress Center. The conference aims to promote big data, AI, and algorithms for health and biomedicine, including cutting-edge advances in computational biology, bioinformatics, and health informatics, at the intersection of computer science, biology, health, and medicine.

This year, for the first time, papers submitted to ACM-BCB will go through a two-phase review process, with the aim of publishing accepted papers in a new ACM journal format. Authors can refine their work based on constructive critiques from the reviewers in the first phase, and the updated version can then be considered in a second-phase review. Accepted papers will be included in a new journal published by ACM.

For further information about ACM-BCB 2026 and important dates, please visit the conference website: https://acm-bcb.org

Variable rate neural compression of massive particle collider data from next-generation particle physics experiments

Next-generation particle physics experiments, including those conducted in heavy-ion and electron-ion colliders, aim to recreate the extreme conditions of the early universe and to probe how matter behaves at the smallest scales. Their detectors, such as large time projection chambers (TPCs), capture three-dimensional particle tracks from every collision, producing enormous data volumes at high speed. These rich datasets contain the detailed structures scientists need to study nuclear matter, map the internal structure of protons and nuclei, and search for physics beyond the standard model. Yet, the sheer scale of the data presents a growing challenge: storing all events in full is increasingly impractical, while conventional data-reduction strategies—such as discarding events based on predefined triggers or applying generic compression—may remove rare or unexpected signals that are central to discovery. Complicating matters further, TPC data are highly sparse: only a tiny portion of the detector records activity, but the specific pattern of those signals is essential for scientific interpretation.

In our recent paper entitled “Variable rate neural compression for sparse detector data”, published in Patterns – an open access journal by Cell Press in the broad area of Data Science – we explore a data-driven approach to compression that learns directly from the structure of the data themselves.

Yi Huang, Yeonju Go, Jin Huang, Shuhang Li, Xihaier Luo, Thomas Marshall, Joseph Osborn, Christopher Pinkenburg, Yihui Ren, Evgeny Shulga, Shinjae Yoo, Byung-Jun Yoon, “Variable Rate Neural Compression for Sparse Detector Data,” Patterns, 2026, 101452, https://doi.org/10.1016/j.patter.2025.101452.

Instead of relying on physics-specific rules, we propose a method that identifies and prioritizes the most informative signals within each event, allowing storage to scale with the true complexity of the measurement. Such adaptive, learning-based strategies could help future experiments preserve far more information without exceeding storage or computing limits, and they may offer a general framework for handling large, sparse datasets across a wide range of scientific domains.

For further details, please visit the Patterns website to read the whole paper: https://www.cell.com/patterns/fulltext/S2666-3899(25)00300-9

How to explore model uncertainty to enhance generative molecular design

In recent years, artificial intelligence (AI) has made big strides in helping scientists design new molecules, whether for life-saving drugs or advanced materials. In particular, generative AI models have attracted significant interest from the research community, opening exciting possibilities in science and medicine.

However, one big challenge remains: these models are usually trained once and then used “as-is.” Adapting them to target-specific molecular characteristics – for example, to improve a molecule’s activity against specific targets or to make the designed molecule easily synthesizable – is challenging.

In a recent publication entitled “Enhancing generative molecular design via uncertainty-guided fine-tuning of variational autoencoders,” published in RSC Molecular Systems Design & Engineering, we have proposed a smarter way to fine-tune generative AI models by quantifying and leveraging model uncertainty.

A N M Nafiz Abeer, Sanket Jantre, Nathan M Urban, and Byung-Jun Yoon, “Enhancing Generative Molecular Design via Uncertainty-guided Fine-tuning of Variational Autoencoders,” Molecular Systems Design & Engineering, 2025, https://doi.org/10.1039/D5ME00081E.

In this paper, we proposed an uncertainty-guided fine-tuning strategy that can effectively enhance a pre-trained variational autoencoder (VAE) for generative molecular design (GMD) through performance feedback in an active learning setting. The strategy begins by quantifying the model uncertainty of the generative model using an efficient active subspace-based UQ (uncertainty quantification) scheme. Next, the decoder diversity within the characterized model uncertainty class is explored to expand the viable space of molecular generation. Empirical results across six target molecular properties demonstrate that the uncertainty-guided fine-tuning strategy consistently leads to improved models that outperform the original pre-trained generative models.
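To give a feel for what an "active subspace" is, here is a minimal sketch of the generic construction: estimate the covariance of the gradients of a scalar function and keep the directions with the largest eigenvalues. This is not the implementation from the paper. The toy loss, its gradient, the vector A, and all function names are hypothetical and chosen only for illustration, and power iteration is used here to recover just the single dominant direction, whereas a practical scheme would typically retain the top few eigenvectors.

```python
import math
import random
from typing import Callable, List

Vector = List[float]

# Hypothetical toy setting: loss(theta) = (A . theta)^2 varies only along A,
# so its active subspace is the one-dimensional span of A.
A: Vector = [3.0, 4.0, 0.0]


def toy_loss_gradient(theta: Vector) -> Vector:
    """Gradient of the toy loss (A . theta)^2 with respect to theta."""
    scale = 2.0 * sum(a * t for a, t in zip(A, theta))
    return [scale * a for a in A]


def gradient_covariance(grad: Callable[[Vector], Vector], samples: List[Vector]) -> List[Vector]:
    """Monte Carlo estimate of C = E[grad(theta) grad(theta)^T]."""
    dim = len(samples[0])
    cov = [[0.0] * dim for _ in range(dim)]
    for theta in samples:
        g = grad(theta)
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += g[i] * g[j] / len(samples)
    return cov


def dominant_direction(matrix: List[Vector], iterations: int = 200) -> Vector:
    """Power iteration: unit eigenvector for the largest eigenvalue of a symmetric PSD matrix."""
    dim = len(matrix)
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v


rng = random.Random(0)
samples = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(500)]
direction = dominant_direction(gradient_covariance(toy_loss_gradient, samples))
print(direction)  # approximately A / |A|
```

In this toy case the recovered direction is the normalized A, so perturbing the parameters along that single direction captures all of the variation in the loss. Restricting uncertainty quantification to such a low-dimensional subspace is what makes this kind of scheme efficient.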

For further details, please read the full paper at the following link: https://doi.org/10.1039/D5ME00081E

Nafiz Abeer wins Best Poster Award at the 16th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (ACM-BCB 2025)

Nafiz Abeer, a senior Ph.D. student in the BioMLSP lab, recently presented his work on TCR-pMHC binding prediction at the 16th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (ACM-BCB 2025), the flagship conference of ACM SIGBio. His poster presentation was selected by the ACM-BCB 2025 Poster Chairs for the Best Poster Award.

The interaction between the T-cell receptor (TCR) and the peptide-major histocompatibility complex (pMHC) is a crucial step in T-cell activation and the initiation of the T-cell mediated adaptive immune response, so understanding it is of great importance. Despite efforts to develop various computational approaches for TCR-pMHC binding prediction, identifying TCR recognition for new peptides forming pMHCs on the surface of antigen-presenting cells (APCs) remains a critical challenge. In his work, Nafiz proposed a novel dual-encoder strategy to effectively leverage predicted TCR-pMHC complex structures, thereby achieving better prediction performance for peptides that were not observed during training. His findings demonstrate that the dual-encoder-based approach often improves prediction performance (especially for EGNN, an equivariant graph neural network) over existing single-encoder-based schemes.

ARPA-H APECx program site visit to Texas A&M University

On June 4th, 2025, Dr. Andy Kilianski, Acting Deputy Director of Health Science Futures at the Advanced Research Projects Agency for Health (ARPA-H), visited Texas A&M University to meet with the team behind the SPHERICAL project.

SPHERICAL (Scientific Platform for High Efficacy Antigen Design via Robust Integration of Computational Experiments, AI, and Protein Modeling) is supported by ARPA-H’s APECx program. Over the past nine months, the entire SPHERICAL team and I have actively reimagined how AI-driven, uncertainty-quantified, and competence-aware modeling can accelerate the design of broadly effective antigens.

During the site visit, we had the pleasure of presenting our progress to Dr. Andy Kilianski, discussing next steps, and sharing our vision for advancing the APECx mission. We look forward to continuing this collaboration and driving the project toward its milestones!

Multi-objective prompt optimization

Natural language prompt optimization, or prompt engineering, has emerged as a powerful technique to unlock the potential of Large Language Models (LLMs) for various tasks. While existing methods primarily focus on maximizing a single task-specific performance metric for LLM outputs, real-world applications often require considering trade-offs between multiple objectives.

In a recent paper entitled “Pareto Prompt Optimization”, we proposed an effective technique for multi-objective prompt optimization for LLMs to address this limitation. Specifically, the proposed method, called ParetoPrompt, takes a reinforcement learning (RL) approach that leverages dominance relationships between prompts to derive an effective policy model for prompt optimization using preference-based loss functions. By leveraging multi-objective dominance relationships, ParetoPrompt enables efficient exploration of the entire Pareto front without the need for a predefined (and typically heuristic) scalarization of multiple objectives. Experimental results show that ParetoPrompt consistently outperforms existing prompt optimization techniques on various benchmarks. Furthermore, ParetoPrompt demonstrates robust performance when the objective metrics differ between training and testing.
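For readers unfamiliar with dominance relationships, the following minimal sketch shows the standard Pareto dominance test and how it identifies the non-dominated set without collapsing the objectives into a single score. It only illustrates the dominance concept, not the ParetoPrompt training procedure. The prompt names and scores are made up for illustration, and both objectives are assumed to be "higher is better".

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if score vector `a` Pareto-dominates `b` (higher is better)."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(scores: Dict[str, Sequence[float]]) -> List[str]:
    """Return the names of prompts that no other prompt dominates, in input order."""
    return [
        name
        for name, score in scores.items()
        if not any(
            dominates(other, score)
            for other_name, other in scores.items()
            if other_name != name
        )
    ]


# Hypothetical (made-up) scores: (task accuracy, fluency), both to be maximized.
candidate_scores = {
    "prompt_a": (0.90, 0.40),
    "prompt_b": (0.70, 0.80),
    "prompt_c": (0.60, 0.70),  # worse than prompt_b on both objectives
    "prompt_d": (0.50, 0.95),
}

front = pareto_front(candidate_scores)
print(front)
```

Here prompt_c is dominated by prompt_b, while the other three represent different trade-offs and none of them dominates another. A pairwise relation of this kind can serve as a preference signal between two prompts, which is the role dominance plays in a preference-based loss.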

Details of the work can be found in the following paper:

Guang Zhao, Byung-Jun Yoon, Gilchan Park, Shantenu Jha, Shinjae Yoo, Xiaoning Qian, “Pareto Prompt Optimization,” 13th International Conference on Learning Representations (ICLR), Singapore, Apr 24-28, 2025.

Transforming antigen design for future vaccines with broad viral efficacy

In a recent article, “The Next Frontier for Vaccines: Using AI to Predict Antigens”, Texas A&M has turned a spotlight on Dr. Byung-Jun Yoon’s new research project sponsored by ARPA-H (Advanced Research Projects Agency for Health) through its ambitious APECx (Antigens Predicted for Broad Viral Efficacy through Computational Experimentation) program.

The five-year project, funded at up to $11 million, is entitled “SPHERICAL: Scientific Platform for High Efficacy Antigen Design via Robust Integration of Computational Experiments, Artificial Intelligence (AI), and Protein Modeling” and is led by Dr. Yoon. It focuses on the computational prediction of antigens that could be used to create future vaccines that are broadly effective against multiple strains of viruses in the same family.

In this project, Dr. Yoon will work with Dr. Xiaoning Qian (TAMU ECE) and Dr. Shuiwang Ji (TAMU CSE) as well as a multi-institutional team of experts at Princeton; University of Texas Medical Branch (UTMB); the University of Missouri; Stony Brook University; University of Tennessee, Knoxville; and Argonne National Laboratory.

“What if we can actually generate vaccines and predict antigens that can be used for developing such vaccines that have broad efficacy for multiple strains in the same viral family? Even though it may not be optimal for a specific strain, it’ll retain its robustness and efficacy for broad viral strains in a given family, keeping the vaccine more effective for a longer period for a larger population.” – Dr. Byung-Jun Yoon

To read the full article, please visit: tx.ag/TheNextFrontier

BioMLSP Lab

Machine Learning for Computational Network Biology @ Texas A&M University