<?xml version="1.0" encoding="UTF-8" ?>
	<rss version="2.0">
		<channel><title>Mathematical Data Science</title><link>http://www-math.umd.edu/research/seminars.html</link><description></description><item>
	<title>Towards Information Geometric Mechanics</title>
	<link>http://www-math.umd.edu/research/seminars.html</link>
	<pubDate>Mon, 16 Sep 2024 14:30:00 EDT</pubDate>
	<description><![CDATA[When: Mon, September 16, 2024 - 2:30pm<br />Where: Zoom Meeting ID: 927 8056 1489 Password: 0900 Link: https://go.umd.edu/MTHDataScience<br />Speaker: Florian Schaefer (Georgia Tech) - https://f-t-s.github.io/<br />
Abstract: The work presented in this talk uses ideas from semidefinite programming and information geometry to efficiently simulate gas dynamics in the presence of shock waves. The latter cause severe numerical challenges for classical and learning-based solvers. The talk begins by observing that shock formation arises from the deformation map reaching the boundary of the manifold of diffeomorphisms. This motivates using the log-determinant barrier function of semidefinite programming to modify the geometry of the manifold such that the deformation map approaches but never reaches its boundary. This information geometric regularization (IGR) preserves the original long-time behavior without forming singular shocks, greatly simplifying numerical simulation. The modified geometry on the diffeomorphism manifold is also the information geometry of the mass density. I will show how this observation motivates information geometric mechanics, which views the solutions of continuum mechanical equations as parameters of probability distributions to be evolved on a suitable information geometry, promising far-reaching extensions of IGR. <br />]]></description>
</item>

<item>
	<title>Continuum Attention for Neural Operators</title>
	<link>http://www-math.umd.edu/research/seminars.html</link>
	<pubDate>Mon, 23 Sep 2024 14:30:00 EDT</pubDate>
	<description><![CDATA[When: Mon, September 23, 2024 - 2:30pm<br />Where: Zoom Meeting ID: 927 8056 1489 Password: 0900 Link: https://go.umd.edu/MTHDataScience<br />Speaker: Matthew Levine (Broad Institute of MIT/Harvard) - https://mattlevine.netlify.app/<br />
Abstract: Transformers, and the attention mechanism in particular, have become ubiquitous in machine learning. Their success in modeling nonlocal, long-range correlations has led to their widespread adoption in natural language processing, computer vision, and time-series problems. Neural operators, which map spaces of functions into spaces of functions, are necessarily both nonlinear and nonlocal if they are universal; it is thus natural to ask whether the attention mechanism can be used in the design of neural operators. Motivated by this, we study transformers in the function space setting. We formulate attention as a map between infinite dimensional function spaces and prove that the attention mechanism as implemented in practice is a Monte Carlo or finite difference approximation of this operator. The function space formulation allows for the design of transformer neural operators, a class of architectures designed to learn mappings between function spaces, for which we prove a universal approximation result. The prohibitive cost of applying the attention operator to functions defined on multi-dimensional domains leads to the need for more efficient attention-based architectures. For this reason we also introduce a function space generalization of the patching strategy from computer vision, and introduce a class of associated neural operators. Numerical results, on an array of operator learning problems, demonstrate the promise of our approaches to function space formulations of attention and their use in neural operators.<br />]]></description>
</item>

<item>
	<title>Graph Expansions of Deep Neural Networks and their Universal Scaling Limits</title>
	<link>http://www-math.umd.edu/research/seminars.html</link>
	<pubDate>Mon, 30 Sep 2024 14:30:00 EDT</pubDate>
	<description><![CDATA[When: Mon, September 30, 2024 - 2:30pm<br />Where: Zoom Meeting ID: 927 8056 1489 Password: 0900 Link: https://go.umd.edu/MTHDataScience<br />Speaker: Cristopher Salvi (Imperial College London) - https://profiles.imperial.ac.uk/c.salvi<br />
Abstract: I will present a novel approach to obtain scaling limits of neural networks using the genus expansion technique from random matrix theory. This approach begins with an expansion of neural networks reminiscent of Butcher series for ODEs, where the role of monomials is played by random multilinear maps indexed by directed graphs whose edges correspond to random matrices. This expansion linearises the effect of the activation functions, allowing for the direct application of Wick&#039;s principle to compute the expectation of each of its terms. The leading contribution to each term can be determined by embedding the corresponding graphs onto surfaces, and computing their Euler characteristic. By developing a calculus bridging analytic and graphical operations, I will explain how to obtain similar graph expansions for the neural tangent kernel as well as the input-output Jacobian of the original neural network, and derive their infinite-width limits with relative ease. <br />
<br />]]></description>
</item>

<item>
	<title>Advancing Differential Equation Solvers Through Deep Learning</title>
	<link>http://www-math.umd.edu/research/seminars.html</link>
	<pubDate>Mon, 14 Oct 2024 14:30:00 EDT</pubDate>
	<description><![CDATA[When: Mon, October 14, 2024 - 2:30pm<br />Where: Zoom Meeting ID: 927 8056 1489 Password: 0900 Link: https://go.umd.edu/MTHDataScience<br />Speaker: Senwei Liang (Lawrence Berkeley National Laboratory) - https://leungsamwai.github.io/<br />
Abstract: In recent years, deep learning algorithms have revolutionized fields such as time-series prediction and materials discovery. These advancements are now inspiring innovative methods to address computational bottlenecks in traditional differential equation (DE) solvers for complex, high-dimensional problems. In this talk, I will present my research at the intersection of deep learning and differential equation solvers aimed at enhancing computational efficiency.<br />
<br />
I will begin by showing how recurrent neural networks, which are powerful tools for modeling dynamical systems, can effectively learn missing dynamics. While neural networks can approximate solutions for many tasks, their optimization often faces implicit biases that make achieving high accuracy challenging. Next, I will introduce the finite expression method, a symbolic approach that leverages reinforcement learning to find accurate and interpretable mathematical solutions to DEs. In addition, data efficiency is critical for optimizing neural networks, particularly when studying rare events where data near the transition state is typically scarce. Finally, I will explore how reinforcement learning can identify regions of interest and characterize them through data. This data can then be utilized in neural network-based PDE solvers to approximate the committor function, a crucial element in the study of rare events.<br />]]></description>
</item>

<item>
	<title>Data-sparse and unsupervised linear and nonlinear model order reduction via GPT-PINN and its derivatives</title>
	<link>http://www-math.umd.edu/research/seminars.html</link>
	<pubDate>Mon, 21 Oct 2024 14:30:00 EDT</pubDate>
	<description><![CDATA[When: Mon, October 21, 2024 - 2:30pm<br />Where: Zoom Meeting ID: 927 8056 1489 Password: 0900 Link: https://go.umd.edu/MTHDataScience<br />Speaker: Yanlai Chen (University of Massachusetts, Dartmouth) - https://yanlaichen.reawritingmath.com/<br />
Abstract: The Physics-Informed Neural Network (PINN) has emerged as a powerful tool for obtaining numerical solutions to nonlinear partial differential equations (PDEs). However, it often suffers from excessive parameterization and can be time-consuming to train, particularly in multi-query and real-time simulation scenarios. In this talk, we introduce the Generative Pre-Trained PINN (GPT-PINN) and its variants, designed to address these challenges in a data-sparse manner. GPT-PINN and its derivatives represent a novel unsupervised meta-learning paradigm for parametric systems. As a network of networks, the outer or meta-network is hyper-reduced, featuring only one hidden layer with a significantly reduced number of neurons. Each hidden neuron&#039;s activation function is a fully pre-trained PINN based on a carefully selected system configuration. This meta-network adaptively learns the parametric dependencies of the system, incrementally expanding its hidden layer by adding one neuron at a time. We will highlight the capabilities of one variant, TGPT-PINN, in efficiently capturing moving shocks.<br />]]></description>
</item>

<item>
	<title>Towards Effective and Robust Scientific Machine Learning through Mathematical Approaches</title>
	<link>http://www-math.umd.edu/research/seminars.html</link>
	<pubDate>Mon, 28 Oct 2024 14:30:00 EDT</pubDate>
	<description><![CDATA[When: Mon, October 28, 2024 - 2:30pm<br />Where: Zoom Meeting ID: 927 8056 1489 Password: 0900 Link: https://go.umd.edu/MTHDataScience<br />Speaker: Yeonjong Shin (North Carolina State University) - https://sites.google.com/site/shinmathematics/<br />
Abstract: Machine learning (ML) has achieved unprecedented empirical success in diverse applications. It has now been applied to scientific and engineering problems, giving rise to an emerging field: Scientific Machine Learning (SciML). Many ML techniques, however, are complex and sophisticated, commonly requiring extensive trial and error and ad hoc tricks. This results in a lack of robustness and interpretability, which are critical factors for scientific applications. This talk centers around mathematical approaches for SciML that promote trustworthiness. The first part will present recent efforts advancing the predictive power of physics-informed machine learning through robust training methods. These include an effective training method for multivariate neural networks, namely Active Neuron Least Squares (ANLS), and a two-step training method for deep operator networks. The second part is about how to embed the first principles of physics into neural networks. I will present a general framework for designing NNs that obey the first and second laws of thermodynamics. The framework not only provides flexible ways of leveraging available physics information but also results in expressive NN architectures. I will also present an intriguing phenomenon of this framework when it is applied to latent space dynamics identification, where a correlation is observed between an entropy-related quantity in the latent space and the behavior of the full-state solution.<br />]]></description>
</item>

<item>
	<title>TBA</title>
	<link>http://www-math.umd.edu/research/seminars.html</link>
	<pubDate>Mon, 04 Nov 2024 14:30:00 EST</pubDate>
	<description><![CDATA[When: Mon, November 4, 2024 - 2:30pm<br />Where: Zoom Meeting ID: 927 8056 1489 Password: 0900 Link: https://go.umd.edu/MTHDataScience<br />Speaker: Sinho Chewi (Yale University) - https://chewisinho.github.io/<br />
Abstract: TBA<br />]]></description>
</item>

<item>
	<title>Wasserstein Proximal Operators Describe Score-Based Generative Models and Resolve Memorization</title>
	<link>http://www-math.umd.edu/research/seminars.html</link>
	<pubDate>Mon, 11 Nov 2024 14:30:00 EST</pubDate>
	<description><![CDATA[When: Mon, November 11, 2024 - 2:30pm<br />Where: Zoom Meeting ID: 927 8056 1489 Password: 0900 Link: https://go.umd.edu/MTHDataScience<br />Speaker: Siting Liu (University of California, Riverside) - https://sites.google.com/view/sitingl<br />
Abstract: We focus on the fundamental mathematical structure of score-based generative models (SGMs). We formulate SGMs in terms of the Wasserstein proximal operator (WPO) and demonstrate that, via mean-field games (MFGs), the WPO formulation reveals mathematical structure that describes the inductive bias of diffusion and score-based models. In particular, MFGs yield optimality conditions in the form of a pair of coupled PDEs: a forward-controlled Fokker-Planck (FP) equation, and a backward Hamilton-Jacobi-Bellman (HJB) equation. Via a Cole-Hopf transformation and taking advantage of the fact that the cross-entropy can be related to a linear functional of the density, we show that the HJB equation is an uncontrolled FP equation. Next, with the mathematical structure at hand, we present an interpretable kernel-based model for the score function which dramatically improves the performance of SGMs in terms of training samples and training time. The WPO-informed kernel model is explicitly constructed to avoid the recently studied memorization effects of score-based generative models. The mathematical form of the new kernel-based models in combination with the use of the terminal condition of the MFG reveals new explanations for the manifold learning and generalization properties of SGMs, and provides a resolution to their memorization effects. Our mathematically informed kernel-based model suggests new scalable bespoke neural network architectures for high-dimensional applications. This is a joint work with Benjamin J. Zhang, Markos A. Katsoulakis, Wuchen Li and Stanley J. Osher.<br />]]></description>
</item>

<item>
	<title>Continuous-Time Deep Learning: Algorithms and Applications</title>
	<link>http://www-math.umd.edu/research/seminars.html</link>
	<pubDate>Mon, 18 Nov 2024 14:30:00 EST</pubDate>
	<description><![CDATA[When: Mon, November 18, 2024 - 2:30pm<br />Where: Zoom Meeting ID: 927 8056 1489 Password: 0900 Link: https://go.umd.edu/MTHDataScience<br />Speaker: Lars Ruthotto (Emory University) - https://www.math.emory.edu/~lruthot/<br />
Abstract: In this talk, we survey recent continuous-time deep learning approaches, present applications in high-dimensional mean field games and optimal control, and discuss efficient training and quantization techniques. We demonstrate how continuous-time deep learning models can address high-dimensional mean field games and optimal control problems, extending their application to state spaces with dimensions reaching the hundreds. For these applications, we leverage neural ordinary differential equations (neural ODEs) in combination with scalable Lagrangian PDE solvers to mitigate the curse of dimensionality, optimizing a neural network representation of the value function with penalty terms that enforce Hamilton-Jacobi-Bellman (HJB) equations—eliminating the need for pre-computed training data. We conclude with ongoing research on improving deep neural network training efficiency through mixed-precision computation and modified Gauss-Newton algorithms. We reduce model size and computational cost by dynamically adjusting the floating point precision and leveraging advanced automatic differentiation.<br />]]></description>
</item>

<item>
	<title>TBA</title>
	<link>http://www-math.umd.edu/research/seminars.html</link>
	<pubDate>Mon, 02 Dec 2024 14:30:00 EST</pubDate>
	<description><![CDATA[When: Mon, December 2, 2024 - 2:30pm<br />Where: Zoom Meeting ID: 927 8056 1489 Password: 0900 Link: https://go.umd.edu/MTHDataScience<br />Speaker: Jonathan Niles-Weed (New York University) - https://www.jonathannilesweed.com/<br />
Abstract: TBA<br />]]></description>
</item>

<item>
	<title>Canceled</title>
	<link>http://www-math.umd.edu/research/seminars.html</link>
	<pubDate>Mon, 09 Dec 2024 14:30:00 EST</pubDate>
	<description><![CDATA[When: Mon, December 9, 2024 - 2:30pm<br />Where: Zoom Meeting ID: 927 8056 1489 Password: 0900 Link: https://go.umd.edu/MTHDataScience<br />Speaker: Canceled<br />]]></description>
</item>

<item>
	<title>Statistics and Riemannian Structure of Gromov-Wasserstein Distance</title>
	<link>http://www-math.umd.edu/research/seminars.html</link>
	<pubDate>Mon, 03 Feb 2025 14:30:00 EST</pubDate>
	<description><![CDATA[When: Mon, February 3, 2025 - 2:30pm<br />Where: Zoom Meeting ID: 927 8056 1489 Password: 0900 Link: https://go.umd.edu/MTHDataScience<br />Speaker: Zhengxin Zhang (Cornell University) - https://zhengxinzh.github.io/<br />
Abstract: The Gromov-Wasserstein (GW) distance, rooted in optimal transport (OT) theory, quantifies dissimilarity between metric measure spaces and provides a natural framework for aligning them. As such, the GW distance enables applications including object matching, single-cell genomics, and matching language models. While computational aspects of the GW distance have been studied heuristically, much of the mathematical theory pertaining to GW duality, Brenier maps, geometry, etc., has remained elusive, despite the rapid progress these aspects have seen under the classical OT paradigm in recent decades. This talk will cover recent progress on closing these gaps for the GW distance. We present (i) sharp statistical estimation rates through duality; (ii) a thorough investigation of the Jordan-Kinderlehrer-Otto (JKO) scheme for the gradient flow of the inner product GW (IGW) distance; and (iii) a dynamical formulation of IGW, which generalizes the Benamou-Brenier formula for the Wasserstein distance. Central to (ii) and (iii) is a Riemannian structure on the space of probability distributions, based on which we also propose novel numerical schemes for measure evolution and deformation.<br />
<br />]]></description>
</item>

<item>
	<title>Kernel methods for equation and operator learning with scarce data</title>
	<link>http://www-math.umd.edu/research/seminars.html</link>
	<pubDate>Mon, 10 Feb 2025 14:30:00 EST</pubDate>
	<description><![CDATA[When: Mon, February 10, 2025 - 2:30pm<br />Where: Zoom Meeting ID: 927 8056 1489 Password: 0900 Link: https://go.umd.edu/MTHDataScience<br />Speaker: Bamdad Hosseini (University of Washington) - https://amath.washington.edu/people/bamdad-hosseini<br />
Abstract: Operator learning is the task of approximating mappings between Banach spaces towards emulation of expensive computer models or simulation of complex physical systems. In this talk I will discuss a general mathematical framework for this task based on simple kernel regression techniques and show numerical benchmarks that highlight its effectiveness and accuracy in comparison to neural network techniques. Afterwards, I will discuss a different approach to operator learning via equation discovery that enables the simulation of physical processes in very scarce data regimes.<br />]]></description>
</item>

<item>
	<title>Low-dimensional adaptation of diffusion models</title>
	<link>http://www-math.umd.edu/research/seminars.html</link>
	<pubDate>Mon, 17 Feb 2025 14:30:00 EST</pubDate>
	<description><![CDATA[When: Mon, February 17, 2025 - 2:30pm<br />Where: Zoom Meeting ID: 927 8056 1489 Password: 0900 Link: https://go.umd.edu/MTHDataScience<br />Speaker: Yuxin Chen (University of Pennsylvania) - https://yuxinchen2020.github.io/<br />
Abstract: This talk is concerned with how diffusion generative models leverage (unknown) low-dimensional structure to accelerate sampling. Focusing on two mainstream samplers --- the denoising diffusion implicit model (DDIM) and the denoising diffusion probabilistic model (DDPM) --- and assuming accurate score estimates, we prove that their iteration complexities scale linearly in some intrinsic dimension of the target distribution. Our results apply to a broad family of target distributions without requiring smoothness or log-concavity assumptions. Our findings provide the first rigorous evidence for the low-dimensional adaptation ability of DDIM-type samplers and significantly improve over state-of-the-art DDPM theory regarding total variation convergence.<br />
<br />]]></description>
</item>

<item>
	<title>Function-Space Models for Deep Learning</title>
	<link>http://www-math.umd.edu/research/seminars.html</link>
	<pubDate>Mon, 24 Feb 2025 14:30:00 EST</pubDate>
	<description><![CDATA[When: Mon, February 24, 2025 - 2:30pm<br />Where: Zoom Meeting ID: 927 8056 1489 Password: 0900 Link: https://go.umd.edu/MTHDataScience<br />Speaker: Rahul Parhi (University of California, San Diego) - https://sparsity.ucsd.edu/rahul/<br />
Abstract: Deep learning has been wildly successful in practice, and most state-of-the-art artificial intelligence systems are based on neural networks. Lacking, however, is a rigorous mathematical theory that adequately explains the amazing performance of deep neural networks. In this talk, I present a new mathematical framework that provides the beginning of a deeper understanding of deep learning. This framework precisely characterizes the functional properties of trained neural networks. The key mathematical tools that support this framework include transform-domain sparse regularization, the Radon transform of computed tomography, and approximation theory. This framework explains the effect of weight decay regularization in neural network training, the importance of skip connections and low-rank weight matrices in network architectures, the role of sparsity in neural networks, and why neural networks can perform well in high-dimensional problems. At the end of the talk we shall conclude with a number of open problems and interesting research directions.<br />]]></description>
</item>

<item>
	<title>Zero-shot forecasting without a model: When can LLMs do better than a stochastic parrot?</title>
	<link>http://www-math.umd.edu/research/seminars.html</link>
	<pubDate>Mon, 03 Mar 2025 14:30:00 EST</pubDate>
	<description><![CDATA[When: Mon, March 3, 2025 - 2:30pm<br />Where: Zoom Meeting ID: 927 8056 1489 Password: 0900 Link: https://go.umd.edu/MTHDataScience<br />Speaker: Yuanzhao Zhang (Santa Fe Institute) - https://y-zhang.com/<br />
Abstract: How much can we learn from a time series when models are unavailable and data is limited? Surprisingly, quite a lot. In this talk, I will explore the use of large language models (LLMs) for forecasting chaotic systems. Unlike the traditional approach of training a model specifically on the system to be predicted, LLMs demonstrate the remarkable ability to generate zero-shot forecasts for entirely new systems without requiring retraining or fine-tuning. I will discuss the mechanisms utilized by LLMs for such tasks and compare them with classical strategies from nonlinear dynamics.<br />]]></description>
</item>

<item>
	<title>Operator Learning: Neural Scaling and Distributed Applications</title>
	<link>http://www-math.umd.edu/research/seminars.html</link>
	<pubDate>Mon, 24 Mar 2025 14:30:00 EDT</pubDate>
	<description><![CDATA[When: Mon, March 24, 2025 - 2:30pm<br />Where: Zoom Meeting ID: 927 8056 1489 Password: 0900 Link: https://go.umd.edu/MTHDataScience<br />Speaker: Zecheng Zhang (Florida State University) - https://www.math.fsu.edu/~zhang/<br />
Abstract: In this talk, we will focus on operator learning — a framework for approximating mappings between function spaces that has broad applications in PDE-related problems. We will begin by discussing the mathematical foundations of operator approximation, which inform the design of neural network architectures and provide a basis for analyzing the performance of trained models on test samples. Specifically, we will introduce the neural scaling law, which characterizes error convergence in relation to network size and generalization error in relation to training dataset size. Building on these theoretical insights, we will present a distributed learning algorithm based on the theoretical architectures to address two key computational challenges: (1) efficiently handling heterogeneous problems where input functions exhibit vastly different properties, and (2) multi-operator learning, where a single network model approximates multiple operators simultaneously so that the model can extrapolate and rapidly adapt to new problems.<br />]]></description>
</item>

<item>
	<title>Two-Stage Stochastic Programs with Polynomial Loss Function</title>
	<link>http://www-math.umd.edu/research/seminars.html</link>
	<pubDate>Mon, 31 Mar 2025 14:30:00 EDT</pubDate>
	<description><![CDATA[When: Mon, March 31, 2025 - 2:30pm<br />Where: Zoom Meeting ID: 927 8056 1489 Password: 0900 Link: https://go.umd.edu/MTHDataScience<br />Speaker: Suhan Zhong (Texas A&amp;M University) - https://people.tamu.edu/~suzhong/<br />
Abstract: Two-stage stochastic programs (SPs) with polynomial loss functions serve as a powerful framework for modeling decision-making problems under uncertainty. In this talk, we introduce a two-phase approach to find global optimal solutions for two-stage SPs with continuous decision variables and nonconvex recourse functions. Our method not only generates global lower bounds for the nonconvex stochastic program but also yields an explicit polynomial approximation for the recourse function. It is particularly suitable for the case where the random vector follows a continuous distribution or when dealing with a large number of scenarios.<br />]]></description>
</item>

<item>
	<title>High dimensional analysis and design of sampling with applications in scientific computing</title>
	<link>http://www-math.umd.edu/research/seminars.html</link>
	<pubDate>Mon, 07 Apr 2025 14:30:00 EDT</pubDate>
	<description><![CDATA[When: Mon, April 7, 2025 - 2:30pm<br />Where: Zoom Meeting ID: 927 8056 1489 Password: 0900 Link: https://go.umd.edu/MTHDataScience<br />Speaker: Yifan Chen (Courant Institute, New York University) - https://yifanc96.github.io/<br />
Abstract: Sampling from probability distributions is a fundamental challenge in physics and data science. State-of-the-art methods often rely on building efficient dynamics for probability measures. This talk examines the analysis and design of such dynamics, drawing insights from and targeting applications in high-dimensional scientific computing. In the first part, we uncover and analyze a novel &#039;delocalization of bias&#039; phenomenon in MCMC with Langevin dynamics. While sampling bias increases with dimensionality in full coordinates, individual coordinates can exhibit nearly dimension-independent behavior. This finding suggests that the curse of dimensionality in sampling may be mitigated at the level of low-dimensional marginals. In the second part, we propose a generative diffusion dynamics design for probabilistic forecasting, focusing on benchmark applications in stochastically forced Navier-Stokes equations. We prove that a specific design of diffusion coefficients minimizes novel statistical errors at the level of path measures and yields Föllmer processes, which also offer a Bayesian interpretation of the optimal design. We conclude with a real-data scientific application in black hole imaging, where we combine generative diffusion dynamics with MCMC for rigorous posterior sampling. Overall, the talk demonstrates how mathematical understanding and methodological design of high-dimensional sampling dynamics can synergize with insights and applications in scientific computing.<br />]]></description>
</item>

<item>
	<title>Scientific Machine Learning: Applications to PDEs and Convergence Analysis</title>
	<link>http://www-math.umd.edu/research/seminars.html</link>
	<pubDate>Mon, 14 Apr 2025 14:30:00 EDT</pubDate>
	<description><![CDATA[When: Mon, April 14, 2025 - 2:30pm<br />Where: Zoom Meeting ID: 927 8056 1489 Password: 0900 Link: https://go.umd.edu/MTHDataScience<br />Speaker: Justin Sirignano (University of Oxford) - https://www.maths.ox.ac.uk/people/justin.sirignano<br />
Abstract: Applications of deep learning to partial differential equations (PDEs) will be presented along with recent mathematical results. Physics-informed neural networks (PINNs) and Deep Galerkin Methods (DGM) directly solve PDEs with neural networks. For linear elliptic PDEs, we prove that DGM/PINNs -- despite the non-convexity of neural networks -- trained with gradient descent globally converge to the PDE solution. In the second half of the presentation, we discuss using deep learning to model unknown terms within a PDE. The neural network terms in the PDE are optimized using adjoint PDEs. Numerical examples from fluid dynamics and convergence results will be discussed.<br />]]></description>
</item>

<item>
	<title>Large Deviation Theory-Informed Importance Sampling for Rare Event Estimation and Control</title>
	<link>http://www-math.umd.edu/research/seminars.html</link>
	<pubDate>Mon, 21 Apr 2025 14:30:00 EDT</pubDate>
	<description><![CDATA[When: Mon, April 21, 2025 - 2:30pm<br />Where: Zoom Meeting ID: 927 8056 1489 Password: 0900 Link: https://go.umd.edu/MTHDataScience<br />Speaker: Shanyin Tong (Columbia University) - https://www.columbia.edu/~st3503/<br />
Abstract: Rare and extreme events like hurricanes, energy grid blackouts, dam breaks, earthquakes, and pandemics are infrequent but have severe consequences. Because estimating the probability of such events can inform strategies that mitigate their effects, scientists must develop methods to study the distribution tail of these occurrences. However, calculating small probabilities is hard, particularly when they involve complex dynamics and high-dimensional random variables. In this talk, I will discuss our proposed method for the accurate estimation of rare event or failure probabilities for expensive-to-evaluate numerical models in high dimensions, and its application to rare event control. The proposed approach combines ideas from large deviation theory and adaptive importance sampling. The importance sampler uses a cross-entropy method to find an optimal Gaussian biasing distribution, and reuses all samples generated throughout the process both for estimating the target probability and for updating the biasing distributions. Large deviation theory is used to find a good initial biasing distribution through the solution of an optimization problem. Additionally, it is used to identify a low-dimensional subspace that is most informative of the rare event probability. We compare the method with a state-of-the-art cross-entropy-based importance sampling scheme using examples including a tsunami problem.<br />]]></description>
</item>

<item>
	<title>Ensemble Kalman Methods and Structured Operator Estimation</title>
	<link>http://www-math.umd.edu/research/seminars.html</link>
	<pubDate>Mon, 28 Apr 2025 14:30:00 EDT</pubDate>
	<description><![CDATA[When: Mon, April 28, 2025 - 2:30pm<br />Where: Zoom Meeting ID: 927 8056 1489 Password: 0900 Link: https://go.umd.edu/MTHDataScience<br />Speaker: Daniel Sanz-Alonso (University of Chicago) - https://sites.google.com/a/uchicago.edu/sanz-alonso/<br />
Abstract: Data assimilation is concerned with estimating the state of a dynamical system from partial observations. In applications such as numerical weather prediction where the state is high dimensional and the dynamics are expensive to simulate, ensemble Kalman filters are often the method of choice. In this talk, I will present new results on structured covariance operator estimation that help explain why these algorithms can be effective even when deployed with a small ensemble size. Our theory also explains the importance of using covariance localization in ensemble Kalman methods for global data assimilation.<br />]]></description>
</item>


	</channel>
</rss>