Abstract: Conformal prediction is a distribution-free method for uncertainty quantification that ensures finite-sample coverage guarantees. However, its validity relies on the assumption of data exchangeability. In this talk, I will introduce several conformal prediction approaches tailored to non-exchangeable data settings, including clustered data with missing responses, nonignorable missing data, and data under label shift. To provide an asymptotic conditional coverage guarantee for a given subject, we propose constructing prediction regions from the highest posterior density region of the target. This method is more accurate under complex error distributions, such as asymmetric and multimodal ones, making it beneficial for personalized and heterogeneous scenarios. I will present numerical results to illustrate the effectiveness of these approaches. This is joint work with Menghan Yi, Yingying Zhang and Yanlin Tang.
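The methods above extend the standard exchangeable construction. As background, here is a minimal sketch of split conformal prediction in that baseline setting (the function names and the toy predictor are illustrative, not from the talk):

```python
import numpy as np

def split_conformal(X, y, fit, alpha=0.1):
    """Split conformal prediction under exchangeability (the baseline
    setting that the talk's methods relax).  `fit` trains any point
    predictor; coverage >= 1 - alpha holds in finite samples."""
    n = len(y)
    idx = np.random.permutation(n)
    tr, cal = idx[: n // 2], idx[n // 2 :]
    model = fit(X[tr], y[tr])                  # any point predictor
    scores = np.abs(y[cal] - model(X[cal]))    # absolute-residual scores
    # (1 - alpha)-quantile with the finite-sample correction
    k = int(np.ceil((len(cal) + 1) * (1 - alpha)))
    q = np.sort(scores)[min(k, len(cal)) - 1]
    # prediction interval for new x: model(x) +/- q
    return lambda x: (model(x) - q, model(x) + q)
```

Note the symmetric interval around the point prediction: replacing it with a highest-density region, as the talk proposes, is what adapts the method to asymmetric or multimodal errors.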
Abstract: Approximate message passing (AMP) has emerged as an effective iterative algorithm for solving high-dimensional statistical problems. However, prior AMP theory, which focused mostly on high-dimensional asymptotics, fell short of predicting the AMP dynamics once the number of iterations surpasses o(log n / log log n) (with n the problem dimension). To address this inadequacy, this talk introduces a non-asymptotic framework for understanding AMP. Built upon a new decomposition of the AMP updates in conjunction with well-controlled residual terms, we lay out an analysis recipe to characterize the finite-sample convergence of AMP up to O(n / polylog(n)) iterations. We will discuss concrete consequences of the proposed analysis recipe for the Z2 synchronization problem; more specifically, we predict the behavior of randomly initialized AMP for up to O(n / polylog(n)) iterations, showing that the algorithm succeeds without the need for a careful spectral initialization or a subsequent refinement stage (as conjectured recently by Celentano et al.).
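For concreteness, the generic symmetric AMP iteration with an Onsager correction term can be sketched as follows (a toy illustration with a tanh denoiser and random initialization; the denoiser choice and step details here are assumptions for the sketch, not the talk's exact algorithm):

```python
import numpy as np

def amp_z2(A, iters=50, rng=None):
    """Symmetric AMP iteration x_{t+1} = A eta(x_t) - b_t eta(x_{t-1}),
    with b_t the average derivative of the denoiser (Onsager term),
    applied to a Z2-synchronization-type matrix A.  Illustrative sketch."""
    rng = np.random.default_rng(rng)
    n = A.shape[0]
    x = rng.standard_normal(n) / np.sqrt(n)   # random initialization
    eta_prev = np.zeros(n)
    for _ in range(iters):
        eta = np.tanh(x)                      # elementwise denoiser
        b = np.mean(1.0 - eta**2)             # Onsager correction <eta'(x)>
        x = A @ eta - b * eta_prev
        eta_prev = eta
    return np.sign(x)                         # estimated signs
```

On a simulated instance A = (lambda/n) z z^T + W with z in {-1,+1}^n and Gaussian noise W, this randomly initialized iteration typically recovers most signs of z, in line with the behavior the talk's theory predicts.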
Abstract: Coordinate-wise MCMC schemes (e.g. Gibbs and Metropolis-within-Gibbs) are popular algorithms to sample from posterior distributions arising from Bayesian hierarchical models. We introduce a novel technique to analyse the asymptotic behaviour of their mixing times, based on tools from Bayesian asymptotics. We apply our methodology to high-dimensional hierarchical models, obtaining dimension-free convergence results for Gibbs under random data-generating assumptions, for a broad class of two-level models with generic likelihood function. Specific examples with Gaussian, binomial and categorical likelihoods are discussed. We then extend the results to Metropolis-within-Gibbs schemes, combining the Bayesian asymptotics approach with a novel notion of conditional conductance. This is based on joint work with Gareth Roberts (University of Warwick) and Giacomo Zanella (Bocconi University).
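A textbook instance of the two-level models analysed here is the Gaussian hierarchy, whose Gibbs sampler alternates two conjugate updates. A minimal sketch (the specific model and hyperparameters are illustrative assumptions):

```python
import numpy as np

def gibbs_two_level(y, sigma2=1.0, tau2=1.0, iters=1000, rng=None):
    """Gibbs sampler for y[i, j] ~ N(theta[i], sigma2),
    theta[i] ~ N(mu, tau2), flat prior on mu -- a simple example of the
    coordinate-wise schemes whose mixing the talk analyses."""
    rng = np.random.default_rng(rng)
    n, m = y.shape
    ybar = y.mean(axis=1)
    mu = 0.0
    mus = []
    for _ in range(iters):
        # theta | mu, y : conjugate Gaussian update, one per coordinate
        prec = m / sigma2 + 1.0 / tau2
        mean = (m * ybar / sigma2 + mu / tau2) / prec
        theta = mean + rng.standard_normal(n) / np.sqrt(prec)
        # mu | theta : Gaussian centred at the mean of the theta's
        mu = theta.mean() + rng.standard_normal() * np.sqrt(tau2 / n)
        mus.append(mu)
    return np.array(mus)
```

Dimension-free mixing, in this vocabulary, means the number of such sweeps needed does not grow with the number of groups n.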
Abstract: Time series data arising in many applications nowadays are high-dimensional. A large number of parameters describe features of these time series. Sensible inferences on these parameters with limited data are possible if some underlying lower-dimensional structure is present. We propose a novel approach to modeling a high-dimensional time series through several independent univariate time series, which are then orthogonally rotated and sparsely linearly transformed. With this approach, any specified intrinsic relations among component time series given by a graphical structure can be maintained at all time snapshots. We call the resulting process an Orthogonally-rotated Univariate Time series (OUT). Key structural properties of time series such as stationarity and causality can be easily accommodated in the OUT model. For Bayesian inference, we put suitable prior distributions on the spectral densities of the independent latent time series, the orthogonal rotation matrix, and the common precision matrix of the component time series at every time point. A likelihood is constructed using the Whittle approximation for the univariate latent time series. An efficient Markov chain Monte Carlo (MCMC) algorithm is developed for posterior computation. We study the convergence of the pseudo-posterior distribution based on the Whittle likelihood for the model's parameters. We find that the posterior contraction rate for independent observations essentially prevails in the OUT model under very mild conditions on the temporal dependence, described in terms of the smoothness of the corresponding spectral densities. In the course of establishing this result, we develop a new general theorem on the contraction rate of a pseudo-posterior distribution that is potentially applicable in other situations.
Through a simulation study, we compare the accuracy of estimating the parameters and identifying the graphical structure with other approaches. We apply the proposed methodology to analyze a dataset on different industrial components of the US gross domestic product between 2010 and 2019 and predict future observations.
Based on a collaboration with Arkaprava Roy, University of Florida, and Anindya Roy, University of Maryland-Baltimore County.
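The Whittle approximation referenced above replaces the exact Gaussian time-series likelihood with a sum over Fourier frequencies involving the periodogram. A minimal sketch for a single univariate series (the function name and interface are illustrative):

```python
import numpy as np

def whittle_loglik(y, spec_dens):
    """Whittle approximation to the Gaussian log-likelihood of a
    univariate stationary series:  -sum_j [log f(w_j) + I(w_j) / f(w_j)]
    over Fourier frequencies w_j = 2*pi*j/n, where I is the periodogram
    and f = spec_dens is the candidate spectral density."""
    n = len(y)
    # periodogram at the positive Fourier frequencies
    freqs = 2 * np.pi * np.arange(1, n // 2) / n
    dft = np.fft.fft(y)[1 : n // 2]
    I = np.abs(dft) ** 2 / (2 * np.pi * n)
    f = spec_dens(freqs)
    return -np.sum(np.log(f) + I / f)
```

Because it depends on the data only through the periodogram, this pseudo-likelihood is cheap to evaluate for each latent series, which is what makes the MCMC scheme above tractable.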
Abstract: In many applications the population means of geographically adjacent small areas exhibit spatial variation. If the auxiliary variables employed in a small area model do not adequately account for the spatial pattern of the means, the residual variation will be absorbed into the random-effects component of the linking part of the model. As a result, the independent and identically distributed assumption on the random effects of the traditional Fay-Herriot model will fail. Additionally, limited resources often prevent many subpopulations from being sampled, resulting in non-sampled small areas. In fact, sometimes, by design, a survey provides aggregated statistics for an outcome of interest only at a higher level, with no data to construct direct estimates of characteristics for the lower-level subpopulations. For example, a domain in a survey may be a group of counties, and the survey collects data from that domain to provide a direct estimate for the group. Often useful covariates for individual counties are available from administrative records or other surveys. Such covariate data can be integrated with aggregated statistics for the primary outcome through innovative small area estimation methodology, leveraging the aggregated outcome data to produce better estimates, and associated measures of uncertainty, for the disaggregated subpopulation means. To produce small area estimates for non-sampled domains or for lower levels of geography, we generalize the celebrated Fay-Herriot model, which has been used extensively for several decades by many national statistical offices around the world to produce reliable small area statistics. To address our challenge, we employ a Bayesian approach based on a number of popular spatial random-effects models. The effectiveness of our Bayesian spatial solution is assessed on simulated and real data.
Specifically, we examine predictions of statewide four-person family median incomes based on the 1990 Current Population Survey and the 1980 Census for the United States of America. Under mild conditions, we establish the propriety of the posterior distributions for various spatial models for a useful class of improper prior densities on model parameters.
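For reference, the baseline Fay-Herriot predictor that the spatial models above generalize shrinks each direct estimate toward a regression-synthetic estimate. A minimal empirical-Bayes sketch (the model variance A is taken as known here for simplicity; in practice it is estimated):

```python
import numpy as np

def fay_herriot_eb(y, X, D, A):
    """Empirical-Bayes predictions under the basic Fay-Herriot model
    y_i = x_i' beta + v_i + e_i, with iid v_i ~ N(0, A) and sampling
    errors e_i ~ N(0, D_i).  The iid assumption on v_i is exactly what
    the talk's spatial random-effect models relax."""
    w = 1.0 / (A + D)                          # precision weights
    # weighted least squares estimate of beta
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    synthetic = X @ beta                       # regression-synthetic part
    gamma = A / (A + D)                        # shrinkage weight per area
    return gamma * y + (1 - gamma) * synthetic
```

Areas with large sampling variance D_i are pulled strongly toward the synthetic estimate; non-sampled areas (no y_i) receive the purely synthetic value, which is where borrowing spatial strength pays off.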
Abstract: Variational inference (VI) has emerged as a widely used alternative to Markov chain Monte Carlo (MCMC) for approximating complex probability densities arising in Bayesian models. While MCMC enjoys well-established theoretical properties, the theoretical foundations of VI remain less explored. In this talk, we present recent advances in the theoretical understanding of VI. Specifically, we demonstrate that point estimators derived from VI can achieve consistency and optimal rates of convergence under a broad set of conditions, applicable to a wide range of problems. Moreover, we prove a Bernstein-von Mises theorem for the mean-field variational approximation, showing that the resulting point estimator not only attains the root-n rate of convergence but also achieves the same optimal estimation efficiency as maximum likelihood estimation. Further, we examine the role of the evidence lower bound (ELBO) in model selection: for regular parametric models, the ELBO asymptotically agrees with the widely used Bayesian Information Criterion (BIC), while for finite mixture models, a representative class of singular models, the ELBO correctly identifies the number of components where BIC fails. These findings demonstrate that variational inference provides a computationally efficient alternative to conventional approaches for both optimal parameter estimation and consistent model selection.
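To make the ELBO concrete, consider the simplest conjugate example, y_i ~ N(mu, 1) with prior mu ~ N(0, 1) and a Gaussian variational family. The ELBO is available in closed form, and at the optimal q it equals the log evidence exactly because the family contains the true posterior (a sketch; the function names are illustrative):

```python
import numpy as np

def elbo_gaussian_mean(y, m, s2):
    """ELBO for y_i ~ N(mu, 1), mu ~ N(0, 1), with q(mu) = N(m, s2):
    E_q[log p(y | mu)] + E_q[log p(mu)] + entropy(q), all in closed form."""
    n = len(y)
    e_loglik = (-0.5 * n * np.log(2 * np.pi)
                - 0.5 * np.sum((y - m) ** 2) - 0.5 * n * s2)
    e_logprior = -0.5 * np.log(2 * np.pi) - 0.5 * (m**2 + s2)
    entropy = 0.5 * np.log(2 * np.pi * np.e * s2)
    return e_loglik + e_logprior + entropy

def cavi_opt(y):
    """Optimal Gaussian q: here it matches the exact posterior
    N(sum(y) / (n + 1), 1 / (n + 1)), so the ELBO gap (the KL) is zero."""
    n = len(y)
    return y.sum() / (n + 1), 1.0 / (n + 1)
```

In non-conjugate or singular models the gap between the ELBO and the log evidence is nonzero, and the talk's results characterize when the ELBO nonetheless behaves like BIC, and when it does strictly better.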
Abstract: In network settings, interference between units makes causal inference more challenging as outcomes may depend on the treatments received by others in the network. Typical estimands in network settings focus on treatment effects aggregated across individuals in the population. We propose a framework for estimating node-wise counterfactual means, allowing for more granular insights into the impact of network structure on treatment effect heterogeneity. We develop a doubly robust and non-parametric estimation procedure, KECENI (Kernel Estimation of Causal Effect under Network Interference), which offers consistency and asymptotic normality under network dependence. The utility of this method is demonstrated through an application to microfinance data, revealing the impact of network characteristics on treatment effects.
Abstract: From clinical trials to corporate strategy, randomized experiments are a reliable methodological tool for estimating causal effects. In recent years, there has been a growing interest in causal inference under interference, where treatment given to one unit can affect outcomes of other units. While the literature on interference has focused primarily on unbiased and consistent estimation, designing randomized network experiments to ensure tight rates of convergence is relatively under-explored for many settings.
In this talk, we study the problem of direct effect estimation under interference. Here, the interference between experimental subjects is captured by a network and the experimenter seeks to estimate the direct effect, which is the difference between the outcomes when (i) a unit is treated and its neighbors receive control and (ii) the unit and its neighbors receive control. We present a new experimental design under which the normalized variance of a Horvitz-Thompson style estimator is bounded as $n * Var
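The Horvitz-Thompson form of such a direct-effect estimator weights each unit's outcome by the inverse probability of its exposure. A sketch under the simplifying assumption of independent Bernoulli(p) treatment (the talk's design differs; this only illustrates the generic estimator):

```python
import numpy as np

def ht_direct_effect(Y, z, A, p):
    """Horvitz-Thompson style direct-effect estimator under network
    interference, assuming iid Bernoulli(p) assignment.  Exposure (i):
    unit treated, all neighbors control.  Exposure (ii): unit and all
    neighbors control.  Y: outcomes, z: 0/1 treatments, A: adjacency."""
    deg = A.sum(axis=1)                          # neighbor counts
    nbr_control = (A @ z == 0)                   # all neighbors untreated
    e1 = (z == 1) & nbr_control                  # exposure (i)
    e0 = (z == 0) & nbr_control                  # exposure (ii)
    pi1 = p * (1 - p) ** deg                     # P(exposure i)
    pi0 = (1 - p) ** (deg + 1)                   # P(exposure ii)
    return np.mean(Y * (e1 / pi1 - e0 / pi0))
```

The exposure probabilities pi1, pi0 shrink geometrically in the degree, which is why the estimator's variance, and hence the choice of design, hinges on the network structure.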
Abstract: Recently, there has been a surge of interest in hypothesis testing methods for combining dependent studies without explicitly assessing their dependence. Among these, the Cauchy combination test (CCT) stands out for its approximate validity and power, leveraging a heavy-tail approximation insensitive to dependence. However, CCT is highly sensitive to large p-values and inverting it to construct confidence regions can result in regions lacking compactness, convexity, or connectivity. In this talk, we will propose a "heavily right" strategy by excluding the left half of the Cauchy distribution in the combination rule, retaining CCT's resilience to dependence while resolving its sensitivity to large p-values.
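The classical CCT that this work modifies is a one-liner: map each p-value to a Cauchy variate, average, and convert back. A sketch of the standard version (the talk's "heavily right" rule alters the transform and is not reproduced here):

```python
import numpy as np

def cauchy_combination(pvals):
    """Standard Cauchy combination test: T = mean(tan((1/2 - p_i) * pi))
    is approximately standard Cauchy under the global null even for
    dependent p-values, giving the combined p-value 1/2 - arctan(T)/pi."""
    p = np.asarray(pvals, dtype=float)
    t = np.mean(np.tan((0.5 - p) * np.pi))   # Cauchy-scale statistic
    return 0.5 - np.arctan(t) / np.pi        # combined p-value
```

The sensitivity noted above is visible directly: a single p-value near 1 contributes tan((1/2 - p) * pi) near negative infinity, which can swamp the average; truncating the left Cauchy tail is the proposed remedy.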
Abstract: Networks arise naturally in many scientific fields as a representation of pairwise connections. Statistical network analysis has most often considered a single large network, but it is common in a number of applications, for example, neuroimaging, to observe multiple networks on a shared node set. When these networks are grouped by case-control status or another categorical covariate, the classical statistical question of two-sample comparison arises. In this work, we address the problem of testing for statistically significant differences in a given arbitrary subset of connections. This general framework allows an analyst to focus on a single node, a specific region of interest, or compare whole networks. Our ability to conduct mesoscale testing on a meaningful group of edges is particularly relevant for applications such as neuroimaging and distinguishes our approach from prior work, which tends to focus either on a single node or the whole network. Our approach can leverage all available network information, and learn informative projections which improve testing power when low-dimensional latent network structure is present.
Abstract: Data-driven personalized decision-making has become increasingly important in many scientific fields. Most existing methods rely on the assumption of no unmeasured confounding to establish causal inferences before proceeding with decision-making for identifying the optimal individualized treatment rule (ITR). However, this assumption is often violated in practice, especially in observational studies. While techniques like instrumental variables or proxy variables can help address unmeasured confounding, such additional data sources are not always available. Moreover, robustly learning the optimal ITR from observational data is challenging when data are unbalanced, where certain combinations of treatments and patient characteristics are underrepresented. In this paper, we develop a novel Bayesian approach to robustly learn the optimal ITR for continuous treatments under unmeasured confounding. For causal identification, we propose a Bayesian causal model that achieves unique identification under certain mild distributional assumptions, without requiring additional data sources. For policy optimization, we develop a practical algorithm that robustly learns the optimal ITR by identifying a conservative policy. Through simulations and an application to a large-scale kidney transplantation dataset, we demonstrate the proposed method’s identifiability, utility, and robustness, highlighting its value in advancing precision medicine.
Abstract: To understand how the interconnected and interdependent world of the twenty-first century operates and make model-based predictions, joint probability models for networks and interdependent outcomes are needed. We propose a comprehensive regression framework for networks and interdependent outcomes with multiple advantages, including interpretability, scalability, and provable theoretical guarantees. The regression framework can be used for studying relationships among attributes of connected units and captures complex dependencies among connections and attributes, while retaining the virtues of linear regression, logistic regression, and other regression models by being interpretable and widely applicable. On the computational side, we show that the regression framework is amenable to scalable statistical computing based on convex optimization of pseudo-likelihoods using minorization-maximization methods. On the theoretical side, we establish convergence rates for pseudo-likelihood estimators based on a single observation of dependent connections and attributes. We demonstrate the regression framework using simulations and an application to hate speech on the social media platform X in the six months preceding the insurrection at the U.S. Capitol on January 6, 2021. If time permits, I will discuss causal learning in an interconnected and interdependent world.
Abstract: Generalized linear mixed models (GLMM) with crossed random effects are well known not only for the computational challenges involved in numerically evaluating the maximum likelihood estimator (MLE) but also for the theoretical challenges in studying the asymptotic behavior of the MLE under these models. In fact, not until 2013 was consistency of the MLE established for GLMM with crossed random effects (Jiang 2013). Now, another part of the asymptotic behavior, namely asymptotic normality of the MLE for GLMM with crossed random effects, has also been established (Jiang 2025). This talk provides an overview of this “amazing journey”, focusing on the methodological developments for overcoming the theoretical challenges.
References: Jiang, J. (2013), The subset argument and consistency of MLE in GLMM: Answer to an open problem and beyond, Ann. Statist. 41, 177-195. Jiang, J. (2025), Asymptotic distribution of maximum likelihood estimator in generalized linear mixed models with crossed random effects, Ann. Statist., in press.
4176 Campus Drive - William E. Kirwan Hall
College Park, MD 20742-4015
P: 301.405.5047 | F: 301.314.0827