Abstract: Conformal prediction is a distribution-free method for uncertainty quantification that provides finite-sample coverage guarantees. However, its validity relies on the assumption of data exchangeability. In this talk, I will introduce several conformal prediction approaches tailored to non-exchangeable data settings, including clustered data with missing responses, nonignorable missing data, and label shift. To provide an asymptotic conditional coverage guarantee for a given subject, we propose constructing prediction regions from the highest posterior density region of the target. This method is more accurate under complex error distributions, such as asymmetric and multimodal distributions, making it beneficial for personalized and heterogeneous scenarios. I will present numerical results to illustrate the effectiveness of these approaches. This is joint work with Menghan Yi, Yingying Zhang and Yanlin Tang.
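For orientation, here is the standard split-conformal construction that the exchangeability assumption underwrites (a generic sketch, not the talk's specific procedure): with calibration pairs $(X_i, Y_i)_{i=1}^n$ and a conformity score $s(x, y)$, e.g. $|y - \hat\mu(x)|$ for a fitted regressor $\hat\mu$, the prediction set is
$$\hat C(x) = \{y : s(x, y) \le \hat q\}, \qquad \hat q = \text{the } \lceil (1-\alpha)(n+1)\rceil\text{-th smallest of } s(X_1, Y_1), \dots, s(X_n, Y_n),$$
and it satisfies $\Pr\{Y_{n+1} \in \hat C(X_{n+1})\} \ge 1-\alpha$ whenever the $n+1$ pairs are exchangeable. Replacing such score-based sets with highest-density regions is what allows asymmetric and multimodal error distributions to be captured.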
Abstract: Approximate message passing (AMP) has emerged as an effective iterative algorithm for solving high-dimensional statistical problems. However, prior AMP theory, which focused mostly on high-dimensional asymptotics, fell short of predicting the AMP dynamics once the number of iterations exceeds o(log n / log log n) (with n the problem dimension). To address this inadequacy, this talk introduces a non-asymptotic framework for understanding AMP. Built upon a new decomposition of the AMP updates with well-controlled residual terms, we lay out an analysis recipe that characterizes the finite-sample convergence of AMP up to O(n / polylog(n)) iterations. We will discuss concrete consequences of the proposed recipe for the Z2 synchronization problem; more specifically, we predict the behavior of randomly initialized AMP for up to O(n / polylog(n)) iterations, showing that the algorithm succeeds without the need for a careful spectral initialization or a subsequent refinement stage (as conjectured recently by Celentano et al.).
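For reference, the generic form of the iteration being analyzed (standard AMP notation, not specific to the talk): with a symmetric data matrix $A$ and denoising functions $\eta_t$,
$$x^{t+1} = A\,\eta_t(x^t) - b_t\,\eta_{t-1}(x^{t-1}), \qquad b_t = \frac{1}{n}\sum_{i=1}^n \eta_t'(x_i^t),$$
where the second ("Onsager") term corrects for the reuse of $A$ across iterations and is what makes the iterates approximately Gaussian around the signal.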
Abstract: Coordinate-wise MCMC schemes (e.g., Gibbs and Metropolis-within-Gibbs) are popular algorithms for sampling from posterior distributions arising from Bayesian hierarchical models. We introduce a novel technique to analyse the asymptotic behaviour of their mixing times, based on tools from Bayesian asymptotics. We apply our methodology to high-dimensional hierarchical models, obtaining dimension-free convergence results for Gibbs samplers under random data-generating assumptions, for a broad class of two-level models with generic likelihood function. Specific examples with Gaussian, binomial and categorical likelihoods are discussed. We then extend the results to Metropolis-within-Gibbs schemes, combining the Bayesian asymptotics approach with a novel notion of conditional conductance. This is based on joint works with Gareth Roberts (University of Warwick) and Giacomo Zanella (Bocconi University).
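As a concrete instance of the schemes being analyzed, here is a minimal Gibbs sweep for a two-level Gaussian hierarchy (a hypothetical toy model with fixed variances, chosen only to illustrate the coordinate-wise structure; the talk's results cover generic likelihoods):

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy two-level hierarchy: y[i, j] ~ N(theta[i], 1), theta[i] ~ N(mu, tau2), mu ~ N(0, s2).
    # tau2 and s2 are fixed here purely for illustration.
    I, J = 50, 20
    tau2, s2 = 1.0, 10.0
    y = rng.normal(0.0, 1.0, size=(I, J))   # placeholder data
    ybar = y.mean(axis=1)

    mu = 0.0
    for sweep in range(1000):
        # Coordinate block 1: theta_i | mu, y is Gaussian (conjugate update).
        v_theta = 1.0 / (J + 1.0 / tau2)
        m_theta = v_theta * (J * ybar + mu / tau2)
        theta = rng.normal(m_theta, np.sqrt(v_theta))
        # Coordinate block 2: mu | theta is Gaussian.
        v_mu = 1.0 / (I / tau2 + 1.0 / s2)
        m_mu = v_mu * theta.sum() / tau2
        mu = rng.normal(m_mu, np.sqrt(v_mu))

The mixing-time question is how many such sweeps are needed as the number of groups I and the per-group sample size J grow; the dimension-free results say the answer does not blow up with I.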
Abstract: Time series data arising in many applications nowadays are high-dimensional. A large number of parameters describe features of these time series. Sensible inferences on these parameters with limited data are possible if some underlying lower-dimensional structure is present. We propose a novel approach to modeling a high-dimensional time series through several independent univariate time series, which are then orthogonally rotated and sparsely linearly transformed. With this approach, any specified intrinsic relations among component time series given by a graphical structure can be maintained at all time snapshots. We call the resulting process an Orthogonally-rotated Univariate Time series (OUT). Key structural properties of time series such as stationarity and causality can be easily accommodated in the OUT model. For Bayesian inference, we put suitable prior distributions on the spectral densities of the independent latent time series, the orthogonal rotation matrix, and the common precision matrix of the component time series at every time point. A likelihood is constructed using the Whittle approximation for the univariate latent time series. An efficient Markov chain Monte Carlo (MCMC) algorithm is developed for posterior computation. We study the convergence of the pseudo-posterior distribution based on the Whittle likelihood for the model's parameters, upon developing a new general theorem on the contraction rate of pseudo-posterior distributions that is potentially applicable in other situations. We find that the posterior contraction rate for independent observations essentially prevails in the OUT model under very mild conditions on the temporal dependence, described in terms of the smoothness of the corresponding spectral densities. Through a simulation study, we compare the accuracy of estimating the parameters and identifying the graphical structure with other approaches. We apply the proposed methodology to analyze a dataset on different industrial components of the US gross domestic product between 2010 and 2019 and predict future observations.
Based on a collaboration with Arkaprava Roy (University of Florida) and Anindya Roy (University of Maryland, Baltimore County).
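For reference, the Whittle approximation used for each latent series: if $Z_1, \dots, Z_n$ is a univariate stationary series with spectral density $f$ and periodogram $I_n$, the Whittle log-likelihood at the Fourier frequencies $\omega_k = 2\pi k/n$ is (up to constants)
$$\ell_W(f) = -\sum_{k} \Big\{ \log f(\omega_k) + \frac{I_n(\omega_k)}{f(\omega_k)} \Big\},$$
which approximates the exact Gaussian log-likelihood and makes the latent series' contributions additive across frequencies.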
Abstract: In many applications the population means of geographically adjacent small areas exhibit spatial variation. If the auxiliary variables employed in a small area model do not adequately account for the spatial pattern of the means, the residual variation will be absorbed into the random-effects component of the linking part of the model. As a result, the independent and identically distributed assumption on the random effects of the traditional Fay-Herriot model will fail. Additionally, limited resources often prevent many subpopulations from being sampled, resulting in non-sampled small areas. In fact, sometimes by the design of a survey, aggregated statistics for an outcome of interest are available only at a higher level, with no data to construct direct estimates of characteristics for the lower-level subpopulations. For example, a domain in a survey may be a group of counties, and the survey collects data from that domain to provide a direct estimate for the group. Often useful covariates for individual counties are available from administrative records or other surveys. Such covariate data can be integrated with aggregated statistics for the primary outcome through innovative small area estimation methodology, leveraging the aggregated outcome data to produce better estimates, and associated measures of uncertainty, for the disaggregated subpopulation means. To produce small area estimates for non-sampled domains or for lower levels of geography, we generalize the celebrated Fay-Herriot model, which has been used extensively for several decades by many national statistical offices around the world to produce reliable small area statistics. To address our challenge we employ a Bayesian approach based on a number of popular spatial random-effects models. The effectiveness of our Bayesian spatial solution is assessed on simulated and real data. Specifically, we examine predictions of statewide four-person family median incomes based on the 1990 Current Population Survey and the 1980 Census for the United States of America. Under mild conditions, we establish the propriety of the posterior distributions for the various spatial models for a useful class of improper prior densities on the model parameters.
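For reference, the area-level model being generalized: with direct estimates $\hat\theta_i$ for areas $i = 1, \dots, m$, the Fay-Herriot model is
$$\hat\theta_i = \theta_i + e_i, \qquad \theta_i = x_i^\top \beta + u_i,$$
with sampling errors $e_i \sim N(0, D_i)$ ($D_i$ known) and iid random effects $u_i \sim N(0, A)$. The spatial generalizations replace the iid assumption on $(u_1, \dots, u_m)$ with a spatially structured prior, such as a conditional autoregression (a representative choice here, not a claim about the specific models in the talk).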
Abstract: Variational inference (VI) has emerged as a widely used alternative to Markov chain Monte Carlo (MCMC) for approximating complex probability densities arising in Bayesian models. While MCMC enjoys well-established theoretical properties, the theoretical foundations of VI remain less explored. In this talk, we present recent advances in the theoretical understanding of VI. Specifically, we show that point estimators derived from VI can achieve consistency and optimal rates of convergence under a broad set of conditions, applicable to a wide range of problems. Moreover, we prove a Bernstein-von Mises theorem for the mean-field variational approximation, showing that the resulting point estimator not only attains the root-n rate of convergence but also achieves the same optimal estimation efficiency as maximum likelihood estimation. Further, we examine the role of the evidence lower bound (ELBO) in model selection: for regular parametric models, the ELBO asymptotically agrees with the widely used Bayesian information criterion (BIC), while for finite mixture models, a representative class of singular models, the ELBO correctly identifies the number of components where BIC fails. These findings demonstrate that variational inference provides a computationally efficient alternative to conventional approaches for both optimal parameter estimation and consistent model selection.
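For reference, the two quantities being connected: for data $X$ and a variational family $\mathcal Q$, the evidence lower bound is
$$\mathrm{ELBO}(q) = \mathbb E_q[\log p(X, \theta)] - \mathbb E_q[\log q(\theta)] = \log p(X) - \mathrm{KL}\big(q \,\|\, p(\cdot \mid X)\big) \le \log p(X),$$
while BIC approximates the log evidence $\log p(X)$ by $\ell_n(\hat\theta) - \tfrac{d}{2}\log n$ for a $d$-dimensional regular model; the asymptotic agreement of the two in the regular case is the result described above.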
Abstract: In network settings, interference between units makes causal inference more challenging as outcomes may depend on the treatments received by others in the network. Typical estimands in network settings focus on treatment effects aggregated across individuals in the population. We propose a framework for estimating node-wise counterfactual means, allowing for more granular insights into the impact of network structure on treatment effect heterogeneity. We develop a doubly robust and non-parametric estimation procedure, KECENI (Kernel Estimation of Causal Effect under Network Interference), which offers consistency and asymptotic normality under network dependence. The utility of this method is demonstrated through an application to microfinance data, revealing the impact of network characteristics on treatment effects.
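As an illustration of the estimand (the notation here is illustrative, not the talk's exact definition): for a node $i$ with neighborhood $N_i$, a node-wise counterfactual mean takes the form
$$\mu_i(t, t_{N_i}) = \mathbb E\big[Y_i(t, t_{N_i})\big],$$
the expected outcome of node $i$ when it receives treatment $t$ and its neighbors receive $t_{N_i}$, so that heterogeneity across $i$ can be linked to local network structure rather than being averaged away.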
Abstract: From clinical trials to corporate strategy, randomized experiments are a reliable methodological tool for estimating causal effects. In recent years, there has been growing interest in causal inference under interference, where the treatment given to one unit can affect the outcomes of other units. While the literature on interference has focused primarily on unbiased and consistent estimation, designing randomized network experiments to ensure tight rates of convergence remains relatively under-explored in many settings.
In this talk, we study the problem of direct effect estimation under interference. Here, the interference between experimental subjects is captured by a network, and the experimenter seeks to estimate the direct effect: the difference between the outcomes when (i) a unit is treated and its neighbors receive control and (ii) the unit and its neighbors all receive control. We present a new experimental design under which the normalized variance of a Horvitz-Thompson style estimator (its variance scaled by $n$) admits a tight bound.
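For concreteness, one standard form of such an estimator (a sketch of the general construction, not necessarily the exact estimator of the talk): writing $E_i^1$ for the event that unit $i$ is treated while all of its neighbors receive control, and $E_i^0$ for the event that $i$ and all of its neighbors receive control, a Horvitz-Thompson style estimator of the direct effect is
$$\hat\tau = \frac{1}{n}\sum_{i=1}^n Y_i\left(\frac{\mathbf 1\{E_i^1\}}{\Pr(E_i^1)} - \frac{\mathbf 1\{E_i^0\}}{\Pr(E_i^0)}\right),$$
which is unbiased whenever both exposure probabilities are positive under the design; the design problem is to keep the inverse-probability weights from inflating the variance.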
Abstract: Recently, there has been a surge of interest in hypothesis testing methods for combining dependent studies without explicitly assessing their dependence. Among these, the Cauchy combination test (CCT) stands out for its approximate validity and power, leveraging a heavy-tail approximation that is insensitive to dependence. However, the CCT is highly sensitive to large p-values, and inverting it to construct confidence regions can produce regions lacking compactness, convexity, or connectivity. In this talk, we will propose a "heavily right" strategy that excludes the left half of the Cauchy distribution in the combination rule, retaining the CCT's resilience to dependence while resolving its sensitivity to large p-values.
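For reference, the standard CCT combines p-values $p_1, \dots, p_m$ with nonnegative weights summing to one as
$$T = \sum_{i=1}^m w_i \tan\{\pi(1/2 - p_i)\},$$
rejecting when $T$ exceeds an upper quantile of the standard Cauchy distribution. The tangent transform sends $p_i$ near $0$ to large positive values but $p_i$ near $1$ to large negative values, so a single p-value close to $1$ can cancel strong signals; this is exactly the sensitivity to large p-values that the proposed "heavily right" rule removes.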
Abstract: Networks arise naturally in many scientific fields as a representation of pairwise connections. Statistical network analysis has most often considered a single large network, but it is common in a number of applications, for example neuroimaging, to observe multiple networks on a shared node set. When these networks are grouped by case-control status or another categorical covariate, the classical statistical question of two-sample comparison arises. In this work, we address the problem of testing for statistically significant differences in a given arbitrary subset of connections. This general framework allows an analyst to focus on a single node, a specific region of interest, or compare whole networks. Our ability to conduct mesoscale testing on a meaningful group of edges is particularly relevant for applications such as neuroimaging and distinguishes our approach from prior work, which tends to focus either on a single node or the whole network. Our approach can leverage all available network information and learn informative projections that improve testing power when low-dimensional latent network structure is present.
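As a schematic example of the setup (an illustration, not the talk's exact statistic): given adjacency matrices $A^{(1)}, \dots, A^{(m)}$ and $B^{(1)}, \dots, B^{(l)}$ from the two groups and a subset of edges $S$, a natural mesoscale statistic compares group means over $S$,
$$T_S = \sum_{(i,j)\in S} \frac{\big(\bar A_{ij} - \bar B_{ij}\big)^2}{\widehat{\mathrm{Var}}\big(\bar A_{ij} - \bar B_{ij}\big)},$$
and the projection-based approach can be viewed as replacing the raw edge differences with their coordinates in a learned low-dimensional latent subspace before aggregating.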
Abstract: Data-driven personalized decision-making has become increasingly important in many scientific fields. Most existing methods rely on the assumption of no unmeasured confounding to establish causal inferences before proceeding with decision-making for identifying the optimal individualized treatment rule (ITR). However, this assumption is often violated in practice, especially in observational studies. While techniques like instrumental variables or proxy variables can help address unmeasured confounding, such additional data sources are not always available. Moreover, robustly learning the optimal ITR from observational data is challenging when data are unbalanced, where certain combinations of treatments and patient characteristics are underrepresented. In this paper, we develop a novel Bayesian approach to robustly learn the optimal ITR for continuous treatments under unmeasured confounding. For causal identification, we propose a Bayesian causal model that achieves unique identification under certain mild distributional assumptions, without requiring additional data sources. For policy optimization, we develop a practical algorithm that robustly learns the optimal ITR by identifying a conservative policy. Through simulations and an application to a large-scale kidney transplantation dataset, we demonstrate the proposed method's identifiability, utility, and robustness, highlighting its value in advancing precision medicine.
Abstract: To understand how the interconnected and interdependent world of the twenty-first century operates and make model-based predictions, joint probability models for networks and interdependent outcomes are needed. We propose a comprehensive regression framework for networks and interdependent outcomes with multiple advantages, including interpretability, scalability, and provable theoretical guarantees. The regression framework can be used for studying relationships among attributes of connected units and captures complex dependencies among connections and attributes, while retaining the virtues of linear regression, logistic regression, and other regression models by being interpretable and widely applicable. On the computational side, we show that the regression framework is amenable to scalable statistical computing based on convex optimization of pseudo-likelihoods using minorization-maximization methods. On the theoretical side, we establish convergence rates for pseudo-likelihood estimators based on a single observation of dependent connections and attributes. We demonstrate the regression framework using simulations and an application to hate speech on the social media platform X in the six months preceding the insurrection at the U.S. Capitol on January 6, 2021. If time permits, I will discuss causal learning in an interconnected and interdependent world.
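As a reminder of the general device (stated generically, with notation not tied to the talk): given a joint model $p_\theta(y)$ for dependent connections and attributes $y = (y_1, \dots, y_m)$, the pseudo-likelihood replaces the intractable likelihood by a sum of full-conditional log-densities,
$$\ell_{PL}(\theta) = \sum_{k=1}^m \log p_\theta(y_k \mid y_{-k}),$$
and a minorization-maximization scheme maximizes it by repeatedly maximizing a surrogate $Q(\theta \mid \theta^{(t)})$ satisfying $Q(\theta \mid \theta^{(t)}) \le \ell_{PL}(\theta)$ with equality at $\theta = \theta^{(t)}$, which guarantees monotone ascent at each step.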
Abstract: Graphical models have become an important tool for summarizing conditional relations in a multivariate time series. Typically, the partial covariance is used as a measure of conditional dependence and forms the basis for construction of the interaction graph. However, for many real time series, the outcomes may not be Gaussian and/or could be a mixture of different outcomes. For such time series, using the partial covariance as a measure of conditional dependence may lead to misleading results. The aim of this talk is to develop graphical models for non-Gaussian time series. We propose a broad class of time series models which are specifically designed to succinctly encode the graphical model in their coefficients. For each univariate component of the time series, we model its conditional distribution with a distribution from the exponential family. We derive conditions under which the conditional specification leads to a well-defined strictly stationary time series. Further, we show that the time series is geometrically mixing and obtain an approximate Gibbs sampler to simulate sample paths.
(Based on joint work with Dr. Suhasini Subba Rao, Texas A&M University)
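In schematic form (illustrative notation, not the paper's exact parametrization): each component $X_{t,j}$ is specified conditionally on the other contemporaneous components and the past through an exponential-family density
$$p\big(x_{t,j} \mid x_{t,-j},\, x_{t-1}\big) = \exp\big\{\eta_{t,j}\, x_{t,j} - b(\eta_{t,j}) + c(x_{t,j})\big\},$$
with the natural parameter $\eta_{t,j}$ linear in the neighboring components, so that a zero coefficient corresponds exactly to a missing edge in the interaction graph.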
Abstract: Suppose that particles are randomly distributed in $R^d$ and are subject to identical stochastic motion, independently of each other. The Smoluchowski process describes fluctuations of the number of particles in an observation region over time. The goal is to estimate characteristics of the particle displacement process from the count data. Such estimation problems arise in various application areas, e.g., in biology (studies of particle/cell motility), transportation science (traffic estimation), etc.
We discuss probabilistic properties of the Smoluchowski processes and consider related statistical problems for two different models of the particle displacement process: the undeviated uniform motion (when a particle moves with random constant velocity along a straight line) and the Brownian motion displacement. In these settings we develop estimators with provable accuracy guarantees.
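In schematic form (illustrative notation): if particles start at the points $\{x_j\}$ of a point process on $R^d$ and move along independent trajectories $\{W_j(t)\}$, the Smoluchowski process records the counts
$$N(t) = \sum_j \mathbf 1\{x_j + W_j(t) \in A\},$$
the number of particles in the observation region $A$ at time $t$. The two displacement models correspond to $W_j(t) = V_j\, t$ with a random constant velocity $V_j$ (undeviated uniform motion) and to $W_j$ a Brownian motion.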
Abstract: Stochastic approximation (SA) is a powerful and scalable computational method for iteratively estimating the solution of optimization problems in the presence of randomness, particularly well-suited for large-scale and streaming data settings. In this talk, we propose a theoretical framework for SA applied to non-parametric least squares in reproducing kernel Hilbert spaces (RKHS), enabling online statistical inference in non-parametric regression models. Our approach combines an online multiplier bootstrap with functional stochastic gradient descent (SGD) in RKHS to achieve two key inferential advances: (1) scalable online confidence intervals/bands, constructing asymptotically valid pointwise confidence intervals for local inference and simultaneous confidence bands for global inference of nonlinear regression functions; and (2) minimax online hypothesis testing, building optimal Wald-type test statistics for nonparametric regression models. The main theoretical contributions consist of a unified framework for characterizing the non-asymptotic behavior of the functional SGD estimator and a proof of the consistency of the multiplier bootstrap method. The theory further establishes the interplay between the online learning rate and the minimax estimation/power performance in uncertainty quantification.
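As a minimal sketch of the two coupled recursions (hypothetical kernel, bandwidth, and step-size schedule; not the paper's implementation), functional SGD stores the iterate as a kernel expansion, and the multiplier bootstrap runs a randomly reweighted copy of the same recursion:

    import numpy as np

    def K(u, v, h=0.5):
        # Gaussian (RBF) kernel; the bandwidth h = 0.5 is a hypothetical choice.
        return np.exp(-(u - v) ** 2 / (2.0 * h ** 2))

    rng = np.random.default_rng(1)
    n = 500
    X = rng.uniform(0.0, 1.0, n)
    Y = np.sin(2.0 * np.pi * X) + 0.3 * rng.standard_normal(n)

    centers, a, b = [], [], []   # kernel centers; SGD coefficients; bootstrap coefficients
    for t, (x, y) in enumerate(zip(X, Y), start=1):
        gamma = 0.5 * t ** (-0.5)                             # decaying step size (illustrative)
        f_x = sum(c * K(cx, x) for c, cx in zip(a, centers))  # current SGD iterate at x
        g_x = sum(c * K(cx, x) for c, cx in zip(b, centers))  # current bootstrap iterate at x
        w = 1.0 + rng.standard_normal()                       # multiplier weight, mean 1, variance 1
        centers.append(x)
        a.append(-gamma * (f_x - y))        # f <- f - gamma * (f(x) - y) * K(x, .)
        b.append(-gamma * w * (g_x - y))    # perturbed recursion for the bootstrap path

Running many independent bootstrap paths and reading off their spread at a query point gives the online pointwise confidence interval; the theory's task is to show that this spread consistently mimics the sampling fluctuation of the SGD iterate itself.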
Abstract: The Wasserstein barycenter plays a fundamental role in averaging measure-valued data under the framework of optimal transport (OT). However, there are tremendous challenges in computing and estimating the Wasserstein barycenter for high-dimensional distributions. In this talk, we will discuss some recent progress in advancing the statistical and computational frontiers of optimal transport barycenters. We first introduce a multimarginal Schrödinger barycenter (MSB) based on the entropy regularized multimarginal optimal transport problem that admits general-purpose fast algorithms for computation. By recognizing a proper dual geometry, we derive sharp non-asymptotic rates of convergence for estimating several key MSB quantities (cost functional, Schrödinger coupling and barycenter) from point clouds randomly sampled from the input marginal distributions. We will also consider the computation of the exact (i.e., unregularized) Wasserstein barycenter, which can be recast as a nonconvex-concave minimax optimization problem. By alternating between the primal Wasserstein and dual potential Sobolev optimization geometries, we introduce a linear-time and linear-space Wasserstein-Descent H-Ascent (WDHA) algorithm and prove its convergence to a stationary point.
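For reference, the central object: given input marginals $\mu_1, \dots, \mu_K$ and weights $\lambda_k \ge 0$ summing to one, the exact Wasserstein barycenter solves
$$\bar\nu \in \arg\min_{\nu} \sum_{k=1}^K \lambda_k\, W_2^2(\mu_k, \nu),$$
while the multimarginal Schrödinger barycenter adds an entropy penalty to the multimarginal OT cost, which smooths the problem and enables fast Sinkhorn-type computation at the price of a regularization bias.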
Abstract: In this talk, we shall consider a variety of multiplex network models in which the layers of the network share the same collection of nodes. Specifically, we shall be interested in the case where the layers can be partitioned into groups with similar organization. The goals include clustering of the layers and subsequent inference within each group of layers. We discuss existing results and open questions.
Abstract: State-of-the-art machine learning algorithms implicitly learn useful structure from raw, unstructured data. How this is possible is somewhat mysterious, given that most unstructured problems are ill-posed and lack the typical identifying restrictions necessary to consistently recover low-dimensional structure. Moreover, when recovery is possible, standard nonparametric theory demands astronomical sample sizes due to the curse of dimensionality. So how is it that these algorithms are capable of learning at all?
To shed light on this intriguing question, we will discuss our recent progress towards understanding structure learning in nonparametric models. We will show how classical algorithms can provably recover latent graphical structure in general families under smoothness (i.e., Hölder) conditions, and then extend this to even broader classes of models. Along the way, we will also discuss how it is possible to circumvent the curse of dimensionality in structured models, even if this structure may not be known in advance. As a special case, our results provide a complete resolution to the problem of nonparametric estimation of high-dimensional graphical models.
Abstract: Generalized linear mixed models (GLMMs) with crossed random effects are well known not only for the computational challenges involved in numerically evaluating the maximum likelihood estimator (MLE) but also for the theoretical challenges in studying the asymptotic behavior of the MLE under these models. In fact, not until 2012 was consistency of the MLE established for GLMMs with crossed random effects (Jiang 2013). Now, another part of the asymptotic picture, namely asymptotic normality of the MLE for GLMMs with crossed random effects, has also been established (Jiang 2025). This talk provides an overview of this "amazing journey", focusing on the methodological developments for overcoming the theoretical challenges.
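For concreteness, the canonical two-factor crossed design (a standard formulation, with notation not taken from the cited papers): conditional on independent random effects $u_i \sim N(0, \sigma_u^2)$ and $v_j \sim N(0, \sigma_v^2)$, the responses $y_{ij}$ are independent with
$$g\big(\mathbb E[y_{ij} \mid u_i, v_j]\big) = x_{ij}^\top \beta + u_i + v_j,$$
so every row effect $u_i$ is paired with every column effect $v_j$; this crossing breaks the independence-across-clusters structure that standard asymptotic arguments for clustered data rely on.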
References:
Jiang, J. (2013). The subset argument and consistency of MLE in GLMM: Answer to an open problem and beyond. Ann. Statist. 41, 177-195.
Jiang, J. (2025). Asymptotic distribution of maximum likelihood estimator in generalized linear mixed models with crossed random effects. Ann. Statist., in press.