Mathematical Data Science

http://www-math.umd.edu/research/seminars.html

Randomization methods for big-data
When: Mon, 08 Sep 2025 14:30:00 EDT
Where: Zoom (Meeting ID: 927 8056 1489, Password: 0900)
Link: https://go.umd.edu/MTHDataScience
Speaker: Stephen Becker (University of Colorado Boulder) - https://stephenbeckr.github.io/
Abstract: In this era of big data, we must adapt our algorithms to handle large datasets. One obvious issue is that the number of floating-point operations (flops) increases as the input size increases, but there are many less obvious issues as well, such as the increased communication cost of moving data between different levels of computer memory. Randomization is increasingly being used to alleviate some of these issues, as those familiar with random mini-batch sampling in machine learning are well aware. This talk goes into some specific examples of using randomization to improve algorithms. We focus on special classes of structured random dimensionality reduction, including the count sketch, TensorSketch, the Kronecker fast Johnson-Lindenstrauss sketch, and pre-conditioned sampling. These randomized techniques can then be applied, for example, to speed up the classical Lloyd's algorithm for K-means and to compute tensor decompositions. If time permits, we will also show extensions to optimization, including a gradient-free method that uses random finite differences and a method for solving semi-definite programs in an optimal low-memory fashion.
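
For readers unfamiliar with the count sketch named in the abstract, the following is a minimal NumPy sketch of the standard construction, not code from the talk. Each input row is assigned to one random output row and multiplied by a random sign. The function name `count_sketch` and the sizes `n`, `d`, `m` are illustrative assumptions.

```python
import numpy as np

def count_sketch(A, m, rng):
    """Compress the n rows of A (n x d) into m rows with a count sketch."""
    n = A.shape[0]
    buckets = rng.integers(0, m, size=n)       # output row assigned to each input row
    signs = rng.choice([-1.0, 1.0], size=n)    # independent random sign per input row
    SA = np.zeros((m, A.shape[1]))
    np.add.at(SA, buckets, signs[:, None] * A) # accumulate signed rows into their buckets
    return SA, buckets, signs

# Illustrative sizes (assumptions, not values from the talk).
n, d, m = 2000, 5, 200
rng = np.random.default_rng(0)
A = rng.standard_normal((n, d))
SA, buckets, signs = count_sketch(A, m, rng)
```

The sketching matrix is never formed explicitly, so applying it costs time proportional to the number of nonzeros in `A`. This is one reason structured sketches of this kind are attractive for large inputs.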

Towards Real-Time Probabilistic SciML algorithms for Digital Twins
When: Mon, 22 Sep 2025 14:30:00 EDT
Where: Zoom (Meeting ID: 927 8056 1489, Password: 0900)
Link: https://go.umd.edu/MTHDataScience
Speaker: Tan Bui-Thanh (The University of Texas at Austin) - https://users.oden.utexas.edu/~tanbui/
Abstract: Digital twins/models (DTs) are designed to be replicas of systems and processes. At the core of a digital twin (DT) are physical/mathematical models that capture the behavior of the real system across temporal and spatial scales. One of the key roles of DTs is enabling “what if” scenario testing of hypothetical simulations to understand the implications at any point throughout the life cycle of the process, to monitor the process, to calibrate parameters to match the actual process and to quantify the uncertainties. In this talk, we will present various real-time Scientific Deep Learning (SciDL) approaches for forward, inverse/calibration, and UQ problems. Both theoretical and numerical results for various problems including transport, heat, Burgers, (transonic and hypersonic) Euler, and Navier-Stokes equations will be presented.

How much can we learn from quantum random circuit sampling?
When: Mon, 06 Oct 2025 14:30:00 EDT
Where: Zoom (Meeting ID: 927 8056 1489, Password: 0900)
Link: https://go.umd.edu/MTHDataScience
Speaker: Tudor Manole (MIT) - https://tmanole.github.io/
Abstract: The last two decades have witnessed quantum computing technologies increasingly move from theoretical proposals to functioning experimental platforms. A key obstacle to building reliable quantum devices is the presence of hardware errors, which must be identified and quantified before they can be corrected. In this work, we study the problem of error estimation through a statistical lens. Building upon a technique introduced by Google Quantum AI known as random circuit sampling (RCS), we develop high-dimensional statistical methodologies for estimating the location and nature of errors in randomized quantum circuits. We also study the information-theoretic limits of this problem, showing that our methods are minimax optimal, and establishing phase diagrams which sharply characterize the statistical trade-off between system size, sample size, and number of errors. Statistically, the model governing this problem is a high-dimensional hierarchical model, in which randomness arises both from circuit randomization and from sampling, and we discuss its connections with other latent variable models such as mixture models and topic models.

This talk is based on joint work with Daniel Mark, Wenjie Gong, Bingtian Ye, Yury Polyanskiy, and Soonwon Choi.

Learning and Inference in Mean-Field Games
When: Mon, 20 Oct 2025 14:30:00 EDT
Where: Zoom (Meeting ID: 927 8056 1489, Password: 0900)
Link: https://go.umd.edu/MTHDataScience
Speaker: Jiajia Yu (Duke University) - https://jiajia-yu.github.io/
Abstract: Mean-Field Games (MFGs) study the Nash equilibrium of non-cooperative games involving a continuum of players. They have broad applications and deep connections to areas such as sampling, optimal transport, and economics. In this talk, I will present our recent work on both forward and inverse problems in MFGs, with insights gained through numerical analysis and computational methods.
I will begin by presenting a convergence analysis of a learning algorithm for MFGs. Our results highlight the central role of the best response in understanding both the game dynamics and the algorithm's behavior. Then, I will present a simulation-free method that generalizes the algorithm to high dimensions, with application to generative models. Finally, I will introduce a simple and efficient iterative strategy for solving a class of inverse MFG problems. This approach shows that measurements of the Nash equilibrium state can be remarkably effective in inferring unknown ambient potentials, such as obstacles.

The Hidden Width of Deep ResNets
When: Mon, 27 Oct 2025 14:30:00 EDT
Where: Zoom (Meeting ID: 927 8056 1489, Password: 0900)
Link: https://go.umd.edu/MTHDataScience
Speaker: Lénaïc Chizat (EPFL) - https://lchizat.github.io/
Abstract: We present a mathematical framework to analyze the training dynamics of deep ResNets that rigorously captures practical architectures (including Transformers) trained from standard random initializations. Our approach combines stochastic approximation of ODEs with propagation-of-chaos arguments. It yields three main insights:
- Depth begets width: infinite-depth ResNets of any hidden width behave throughout training as if they were infinitely wide;
- Unified phase diagram: the phase diagram of Transformers mirrors that of two-layer perceptrons, once the appropriate substitutions are made;
- Optimal shape scaling: for a given parameter budget P, a Transformer with optimal shape converges to its limiting dynamics at rate P^{-1/6}.
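
As a back-of-the-envelope reading of the P^{-1/6} rate stated in the abstract, suppose the distance to the limiting dynamics scales exactly as C P^{-1/6}. The constant C and the multiplier k are hypothetical and are not given in the abstract. Scaling the budget from P to kP then changes that distance by the factor below:

```latex
\[
\frac{C\,(kP)^{-1/6}}{C\,P^{-1/6}} = k^{-1/6},
\qquad
k^{-1/6} = \tfrac{1}{2} \iff k = 2^{6} = 64 .
\]
```

Under this assumption, halving the distance to the limiting dynamics takes about 64 times as many parameters.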

Estimation of 1d structures from empirical data
When: Mon, 03 Nov 2025 14:30:00 EST
Where: Zoom (Meeting ID: 927 8056 1489, Password: 0900)
Link: https://go.umd.edu/MTHDataScience
Speaker: Andrew Warren (University of British Columbia) - andrew-warren.github.io
Abstract: Given a data distribution which is concentrated around a one-dimensional structure, can we infer that structure? We consider versions of this problem where the distribution resides in a metric space and the 1d structure is assumed to be either the range of an absolutely continuous curve or a connected set of finite 1d Hausdorff measure. In each of these cases, we relate the inference task to solving a variational problem where there is a tradeoff between data fidelity and simplicity of the inferred structure; the variational problems we consider are closely related to the so-called "principal curve" problem of Hastie and Stuetzle as well as the "average-distance problem" of Buttazzo, Oudet, and Stepanov. For each of the variational problems under consideration, we establish: existence of minimizers, stability with respect to the data distribution, and consistency of a discretization scheme which is amenable to Lloyd-type numerical methods. Lastly, we consider applications to estimation of stochastic processes from partial observation, as well as the lineage tracing problem from mathematical biology.

This talk includes joint work with Anton Afanassiev, Young-Heon Kim, Forest Kobayashi, and Geoff Schiebinger.

Generative Artificial Intelligence for Uncertainty Quantification
When: Mon, 10 Nov 2025 14:30:00 EST
Where: Zoom (Meeting ID: 927 8056 1489, Password: 0900)
Link: https://go.umd.edu/MTHDataScience
Speaker: Guannan Zhang (Oak Ridge National Lab) - https://sites.google.com/view/guannan-zhang/home
Abstract: Generative artificial intelligence models—including variational autoencoders, normalizing flows, generative adversarial networks, and diffusion models—have dramatically advanced the realism and quality of generated images, text, and audio. Beyond these tasks, generative models hold great promise as powerful tools for probability density estimation and high-dimensional sampling, which are central to uncertainty quantification (UQ) tasks such as amortized Bayesian inference and data assimilation. However, while research on image synthesis emphasizes producing high-quality individual samples, UQ applications require accurate approximation of statistical quantities of interest rather than visually realistic samples. As a result, direct application of existing generative models to UQ problems can lead to biased approximations or unstable training. In this talk, we will introduce several new generative approaches tailored to UQ. These include training-free diffusion models for density estimation, a score-based nonlinear filter for data assimilation, and training-free conditional diffusion models for amortized Bayesian inference. We will demonstrate their effectiveness across a range of tasks, including density estimation for unimodal and multimodal distributions, learning stochastic dynamical systems, parameter estimation via amortized inference, and scalable data assimilation for atmospheric models.

Bio: Dr. Guannan Zhang is a Distinguished Staff Scientist in the Computer Science and Mathematics Division at Oak Ridge National Laboratory (ORNL). He earned his Ph.D. in applied mathematics at Florida State University in 2012. He joined ORNL in 2012 as the Householder fellow. He received the DOE Early Career Award in 2022. He has held a joint faculty appointment with the Department of Mathematics and Statistics at Auburn University since 2014, and a joint faculty appointment with the Department of Mathematics at the University of Tennessee since 2022. Guannan's research interests include high-dimensional approximation, uncertainty quantification, machine learning and artificial intelligence, and stochastic methods for scientific inverse problems.

Scaling Scientific Machine Learning at Both Training and Inference
When: Mon, 17 Nov 2025 14:30:00 EST
Where: Zoom (Meeting ID: 927 8056 1489, Password: 0900)
Link: https://go.umd.edu/MTHDataScience
Speaker: Yiping Lu (Northwestern University) - https://2prime.github.io/
Abstract: Can we find a predictable, polynomial improvement in a scientific machine learning system's performance, measured by its test error, as model (surrogate) size, dataset (collocation) size, and computational resources increase? In this talk, I'll discuss why we would expect an optimal scaling law and how to achieve a numerical scaling law. We first present several information-theoretic optimality results for scaling scientific machine learning and demonstrate the hardness of scaling that hides in optimization. This inspires us to design an optimizer whose hardness is independent of the neural network. We also show another scaling law, obtained by allocating computation at inference time: we build a two-scale Monte Carlo algorithm that dynamically refines and debiases the SciML predictions during inference by enforcing the physical laws. Finally, we show how this methodology can be used for inference-time scaling of diffusion models and language models.

Optimal PhiBE: A Model-Free PDE-Based Framework for Continuous-Time Reinforcement Learning
When: Mon, 01 Dec 2025 14:30:00 EST
Where: Zoom (Meeting ID: 927 8056 1489, Password: 0900)
Link: https://go.umd.edu/MTHDataScience
Speaker: Yuhua Zhu (UCLA) - https://www.yuhuazhu.org/
Abstract: This talk addresses continuous-time reinforcement learning (RL) in settings where the system dynamics are governed by a stochastic differential equation but remain unknown, with only discrete-time observations available. Existing approaches face fundamental limitations: model-based PDE methods suffer from non-identifiability, while model-free methods based on the classical reinforcement learning framework suffer from large discretization errors because they ignore the continuous-time structure.
We introduce Optimal-PhiBE, an equation that integrates discrete-time information into a continuous-time PDE, combining the strengths of both RL and PDE formulations. In linear-quadratic control, Optimal-PhiBE can even achieve an accurate continuous-time optimal policy with only discrete-time information. We further develop model-free algorithms for solving Optimal-PhiBE, requiring only minimal modifications to standard RL methods, and we establish convergence guarantees under model misspecification. Unlike classical RL analyses, whose errors typically blow up as the sampling interval shrinks, the approximation error of PhiBE remains stable and independent of discretization by exploiting the smoothness intrinsic to continuous-time dynamics.

AI-powered denoising for scientific discovery
When: Mon, 09 Feb 2026 14:30:00 EST
Where: Zoom (Meeting ID: 927 8056 1489, Password: 0900)
Link: https://go.umd.edu/MTHDataScience
Speaker: Carlos Fernandez-Granda (New York University) - https://math.nyu.edu/~cfgranda/
Abstract: Deep neural networks are the state of the art for many signal-processing tasks, including image denoising. However, applying these models to real-world scientific data presents substantial challenges. In this talk, we discuss these challenges and introduce a series of simulation-based, unsupervised, and semi-supervised strategies designed to overcome them. We demonstrate that these approaches perform effectively on real electron microscopy data, revealing previously unobserved atomic-level dynamics in catalytic nanoparticles.

Low-Rank Methods for Interacting Particle Systems and Quantum Superoperator Learning
When: Mon, 23 Feb 2026 14:30:00 EST
Where: Zoom (Meeting ID: 927 8056 1489, Password: 0900)
Link: https://go.umd.edu/MTHDataScience
Speaker: Quanjun Lang (Duke University) - https://sites.math.duke.edu/~ql157/
Abstract: We introduce a multi-type interacting particle system on graphs to model heterogeneous agent-based dynamics. Within this framework, we develop algorithms that jointly learn the interaction kernels, the latent type assignments, and the underlying graph structure. The approach has three stages: (i) a low-rank matrix sensing step that recovers a shared interaction embedding, (ii) a clustering step that identifies the discrete types, and (iii) a post-processing step to factorize the graph and kernels. Under the assumption of the restricted isometry property (RIP), we obtain theoretical guarantees on sample complexity and convergence for a wide range of model parameters. Building on the same low-rank matrix sensing framework, I will then discuss quantum superoperator learning, encompassing both quantum channels and Lindbladian generators. We propose an efficient randomized measurement design and use accelerated alternating least squares to estimate the low-rank superoperator. The resulting performance guarantees follow from RIP conditions, which are known to hold for Pauli measurement ensembles.

Extending Measure Dynamics Beyond Generative Modeling
When: Mon, 02 Mar 2026 14:30:00 EST
Where: Zoom (Meeting ID: 927 8056 1489, Password: 0900)
Link: https://go.umd.edu/MTHDataScience
Speaker: Jiequn Han (Flatiron Institute) - https://users.flatironinstitute.org/~jhan/
Abstract: Transport-based models, such as score-based diffusion and flow-matching models, have become a leading paradigm for generative modeling: from a dataset of samples, one learns dynamics that generate new samples from the same distribution. In many scientific settings, however, the objective is not simply to reproduce a training distribution, but to adapt or infer distributions under new constraints or incomplete observations. This motivates extending the transport viewpoint beyond the standard training-and-sampling setting.

I will describe two such directions. First, inference-time adaptation: modifying the inference dynamics induced by a pre-trained model to sample from new target distributions without retraining, while preserving stability and efficiency. Second, distributional inference from limited or noisy data: constructing measure dynamics that recover target distributions when the observation process is available only through a black-box simulator. Together, these examples illustrate how transport-based methods enable flexible control and inversion at the level of probability measures, substantially broadening their role in scientific and engineering applications.

Flow Maps: Generative models with lightning-fast inference
When: Mon, 23 Mar 2026 14:30:00 EDT
Where: Zoom (Meeting ID: 927 8056 1489, Password: 0900)
Link: https://go.umd.edu/MTHDataScience
Speaker: Nick Boffi (Carnegie Mellon University) - https://nmboffi.github.io/
Abstract: Flow-based models have spurred a revolution in generative modeling, driving astounding advancements across diverse domains including high-resolution text-to-image synthesis and de novo drug design. Yet despite their remarkable performance, inference in these models requires the solution of a differential equation, which is extremely costly for the large-scale neural network-based models used in practice. In this talk, we introduce a mathematical theory of flow maps, a new class of generative models that directly learn the solution operator for a flow-based model. By learning this operator, flow maps can generate data in 1-4 network evaluations, leading to orders of magnitude faster inference compared to standard flow-based models. We discuss several algorithms for efficiently learning flow maps in practice that emerge from our theory, and we show how many popular recent methods for accelerated inference -- including consistency models, shortcut models, align your flow, and mean flow -- can be viewed as particular cases of our formalism. We demonstrate the practical effectiveness of flow maps across several tasks including image synthesis, geometric data generation, and inference-time guidance of pre-trained text-to-image models.

Data-Driven Methods for Kinetic Systems: Multiscale Modeling and Feedback Control
When: Mon, 30 Mar 2026 14:30:00 EDT
Where: Zoom (Meeting ID: 927 8056 1489, Password: 0900)
Link: https://go.umd.edu/MTHDataScience
Speaker: Jincheng Lu (University of Minnesota) - https://lujingcheng666.wixsite.com/mysite
Abstract: In this talk, I will discuss data-driven methods for kinetic systems, with an emphasis on multiscale modeling and instability control in plasma dynamics. In the first part of the talk, I will introduce our continuous data assimilation algorithm for hydrodynamic moment recovery. Using a relaxation-based nudging system, the method simultaneously reconstructs the solution state and the unknown force terms, which encode higher-order moment effects, from sparsely observed data.
In the second part, I will discuss dynamical feedback control for the Vlasov–Poisson system. We develop a data-driven control framework using low-rank neural operators, trained via PDE-constrained optimization. The resulting control laws maintain stable performance over a long time horizon. In addition, I will present a cancellation-based control strategy that admits provable infinite-time stabilization.

A mean-field games laboratory for generative artificial intelligence: from foundations to applications in scientific computing
When: Mon, 06 Apr 2026 14:30:00 EDT
Where: Zoom (Meeting ID: 927 8056 1489, Password: 0900)
Link: https://go.umd.edu/MTHDataScience
Speaker: Benjamin Zhang (University of North Carolina at Chapel Hill) - https://benjzhang.com/
Abstract: We demonstrate the versatility of mean-field games (MFGs) as a mathematical framework for explaining, enhancing, and designing generative models. We establish connections between MFGs and major classes of flow- and diffusion-based generative models by deriving continuous-time normalizing flows and score-based models through different choices of particle dynamics and cost functions. We study the mathematical structure and properties of each generative model by examining their associated MFG optimality conditions, which consist of coupled forward-backward nonlinear partial differential equations (PDEs). We present this framework as an MFG laboratory, a platform for experimentation, invention, and analysis of generative models. Through this laboratory, we show how MFG structure informs new normalizing flows that robustly learn data distributions supported on low-dimensional manifolds. In particular, we show that Wasserstein proximal regularizations inform the well-posedness and robustness of generative flows for singular measures, enabling stable training with less data and without specialized architectures. We then briefly discuss applications of these principled generative models to scientific computing, including simulation-based inference and operator learning.

A Generative AI Approach for Uncertainty Reduction in High-dimensional Complex Systems
When: Mon, 20 Apr 2026 14:30:00 EDT
Where: Zoom (Meeting ID: 927 8056 1489, Password: 0900)
Link: https://go.umd.edu/MTHDataScience
Speaker: Feng Bao (Florida State University) - https://www.math.fsu.edu/~bao/
Abstract: In this talk, we present the diffusion-model-based Ensemble Score Filter (EnSF) for accurate and efficient high-dimensional nonlinear filtering that reduces predictive uncertainty in high-dimensional dynamical systems.
Nonlinear filtering, also known as data assimilation, is the process of estimating the evolving state of a dynamical system by optimally combining noisy observations with predictions from a numerical model. Conventional particle filters and ensemble Kalman filters lose accuracy in highly nonlinear, large-scale settings. EnSF overcomes this by representing the filtering density via a score-based diffusion model in a pseudo-temporal domain, storing information in the score function rather than in finite Monte Carlo samples. A training-free, mini-batch Monte Carlo estimator directly approximates the score function at any pseudo-spatial–temporal location, avoiding costly neural network training while retaining high accuracy. Numerical results on Lorenz-96 systems with up to one million dimensions show EnSF’s substantial gains over state-of-the-art Kalman-type filters. We further demonstrate the method for data assimilation in calibrating benchmark SPDE solutions and atmosphere–ocean simulation models.

Learning network dynamics from data through compressive sensing techniques
When: Mon, 27 Apr 2026 14:30:00 EDT
Where: Zoom (Meeting ID: 927 8056 1489, Password: 0900)
Link: https://go.umd.edu/MTHDataScience
Speaker: Edmilson Roque dos Santos (MPI-Dresden) - https://edmilson-roque-santos.github.io/
Abstract: Networks of coupled dynamical systems are fundamental models across the sciences, from physics to neuroscience. Despite their success, the governing equations of such systems are often unknown, limiting our ability to predict and control their dynamics. In many applications, only time series data from the network is accessible, and learning the governing equations from data becomes an inverse problem. In this talk, inspired by compressive sensing techniques, I will show how learning network dynamics from data can be formulated as a convex optimization problem. By exploiting structural information encoded in the network dynamics, such as sparsity, statistical properties, and symmetries, we characterize the minimum amount of data required for learning the network dynamics exactly (and robustly). We illustrate these ideas using networks of coupled chaotic maps and oscillators.
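
To illustrate the general idea of recovering sparse structure by convex optimization, here is a generic compressive-sensing sketch. It is not the speaker's method, and it does not model any network dynamics. It recovers a sparse coefficient vector from fewer measurements than unknowns by l1-regularized least squares, solved with proximal gradient descent (ISTA). The random matrix `A`, the sizes, the chosen support, and the regularization weight `lam` are all illustrative assumptions.

```python
import numpy as np

def ista(A, b, lam, n_iter=5000):
    """Minimize 0.5*||A x - b||^2 + lam*||x||_1 by proximal gradient descent (ISTA)."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2    # 1 / Lipschitz constant of the smooth gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - step * (A.T @ (A @ x - b))    # gradient step on the least-squares term
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-thresholding
    return x

# Hypothetical setup: 80 measurements, 200 candidate terms, 3 active terms.
rng = np.random.default_rng(0)
n_samples, n_terms = 80, 200
A = rng.standard_normal((n_samples, n_terms)) / np.sqrt(n_samples)
x_true = np.zeros(n_terms)
support = [3, 57, 120]
x_true[support] = [3.0, -3.0, 3.0]
b = A @ x_true                                # noiseless measurements

x_hat = ista(A, b, lam=0.01)
recovered = sorted(np.argsort(-np.abs(x_hat))[:3].tolist())  # indices of the 3 largest entries
```

In this toy setting the three largest entries of `x_hat` sit on the true support, even though the linear system is underdetermined.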