Next-generation particle physics experiments, including those conducted at heavy-ion and electron-ion colliders, aim to recreate the extreme conditions of the early universe and to probe how matter behaves at the smallest scales. Their detectors, such as large time projection chambers (TPCs), capture three-dimensional particle tracks from every collision, producing enormous data volumes at high speed. These rich datasets contain the detailed structures scientists need to study nuclear matter, map the internal structure of protons and nuclei, and search for physics beyond the Standard Model. Yet the sheer scale of the data presents a growing challenge: storing every event in full is increasingly impractical, while conventional data-reduction strategies, such as discarding events based on predefined triggers or applying generic compression, may remove rare or unexpected signals that are central to discovery. Complicating matters further, TPC data are highly sparse: only a tiny fraction of the detector records activity in any given event, but the specific pattern of those signals is essential for scientific interpretation.
In our recent paper, “Variable rate neural compression for sparse detector data,” published in Patterns, an open-access data science journal from Cell Press, we explore a data-driven approach to compression that learns directly from the structure of the data themselves.
Yi Huang, Yeonju Go, Jin Huang, Shuhang Li, Xihaier Luo, Thomas Marshall, Joseph Osborn, Christopher Pinkenburg, Yihui Ren, Evgeny Shulga, Shinjae Yoo, Byung-Jun Yoon, “Variable Rate Neural Compression for Sparse Detector Data,” Patterns, 2026, 101452, https://doi.org/10.1016/j.patter.2025.101452.
Instead of relying on physics-specific rules, we propose a method that identifies and prioritizes the most informative signals within each event, allowing storage to scale with the true complexity of the measurement. Such adaptive, learning-based strategies could help future experiments preserve far more information without exceeding storage or computing limits, and they may offer a general framework for handling large, sparse datasets across a wide range of scientific domains.
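To make the idea concrete, below is a minimal, hypothetical sketch in PyTorch of a variable-rate compressor for sparse detector hits. It is not the architecture from the paper; the class name, layer sizes, and the fixed keep_fraction parameter are illustrative assumptions. The sketch only shows the core intuition described above: score each active voxel for importance, keep a number of latent codes that grows with how many voxels actually fired, and reconstruct from those codes, so that quiet events cost little storage while busy events retain more detail.

```python
# Hypothetical sketch only -- not the authors' model. Illustrates variable-rate
# compression of sparse TPC-style data: per-voxel importance scoring, a stored
# code count that scales with event activity, and a simple reconstruction loss.
import torch
import torch.nn as nn

class VariableRateSparseCompressor(nn.Module):
    def __init__(self, in_dim=4, latent_dim=8, keep_fraction=0.25):
        super().__init__()
        self.keep_fraction = keep_fraction  # assumed fixed fraction of active voxels
        self.encoder = nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU(),
                                     nn.Linear(32, latent_dim))
        self.scorer = nn.Sequential(nn.Linear(in_dim, 16), nn.ReLU(),
                                    nn.Linear(16, 1))  # per-voxel importance score
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(),
                                     nn.Linear(32, in_dim))

    def forward(self, voxels):
        # voxels: (n_active, in_dim), e.g. (x, y, z, adc) for one event's active voxels
        n_active = voxels.shape[0]
        k = max(1, int(self.keep_fraction * n_active))  # rate grows with activity
        scores = self.scorer(voxels).squeeze(-1)
        keep = torch.topk(scores, k).indices            # prioritize informative voxels
        codes = self.encoder(voxels[keep])
        recon = self.decoder(codes)
        return recon, voxels[keep], codes

# Toy usage: two events with very different sparsity store different numbers of codes.
model = VariableRateSparseCompressor()
small_event = torch.randn(120, 4)    # quiet event: few active voxels
big_event = torch.randn(5000, 4)     # busy event: many active voxels
for ev in (small_event, big_event):
    recon, kept, codes = model(ev)
    loss = nn.functional.mse_loss(recon, kept)  # example reconstruction objective
    print(f"{ev.shape[0]} active voxels -> {codes.shape[0]} stored codes, "
          f"recon MSE {loss.item():.3f}")
```

The point of the sketch is the scaling behavior: the stored representation is a function of how much of the detector actually fired, rather than a fixed-size record per event.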
For further details, please visit the Patterns website to read the full paper: https://www.cell.com/patterns/fulltext/S2666-3899(25)00300-9