Abstract: Conformal prediction is a distribution-free method for uncertainty quantification that provides finite-sample coverage guarantees. However, its validity relies on the assumption of data exchangeability. In this talk, I will introduce several conformal prediction approaches tailored to non-exchangeable data settings, including clustered data with missing responses, nonignorable missing data, and data under label shift. To provide an asymptotic conditional coverage guarantee for a given subject, we propose constructing prediction regions from the highest posterior density region of the target. This method is more accurate under complex error distributions, such as asymmetric and multimodal distributions, making it well suited to personalized and heterogeneous scenarios. I will present numerical results illustrating the effectiveness of these approaches. This is joint work with Menghan Yi, Yingying Zhang, and Yanlin Tang.
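To make the density-region idea concrete, here is a minimal Python sketch of split conformal prediction with a density-based conformity score, so that the resulting region is a highest-density level set rather than a symmetric interval. It assumes exchangeable data and a simple linear model, ignores finite-sample quantile corrections, and does not reproduce the clustered, missing-data, or label-shift adjustments from the talk; all data below are simulated for illustration.

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.neighbors import KernelDensity

    # Simulated data with heavy-tailed (non-Gaussian) errors.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.standard_t(df=3, size=500)

    # Split: fit the model on one half, calibrate on the other.
    X_fit, y_fit = X[:250], y[:250]
    X_cal, y_cal = X[250:], y[250:]
    model = LinearRegression().fit(X_fit, y_fit)

    # Density-based conformity score: higher residual density = more typical.
    res_cal = y_cal - model.predict(X_cal)
    kde = KernelDensity(bandwidth=0.5).fit(res_cal.reshape(-1, 1))
    scores = kde.score_samples(res_cal.reshape(-1, 1))  # log-density of each residual

    # Level set: keep residual values whose density exceeds the alpha-quantile
    # of calibration scores, giving a (possibly disconnected) highest-density region.
    alpha = 0.1
    cutoff = np.quantile(scores, alpha)
    grid = np.linspace(res_cal.min() - 1.0, res_cal.max() + 1.0, 2000)
    in_region = kde.score_samples(grid.reshape(-1, 1)) >= cutoff

    x_new = rng.normal(size=(1, 3))
    region = model.predict(x_new)[0] + grid[in_region]  # prediction region for y at x_new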
Abstract: Approximate message passing (AMP) has emerged as an effective iterative algorithm for solving high-dimensional statistical problems. However, prior AMP theory, which focused mostly on high-dimensional asymptotics, fell short of predicting the AMP dynamics when the number of iterations surpasses o(log n / log log n) (with n the problem dimension). To address this inadequacy, this talk introduces a non-asymptotic framework for understanding AMP. Built upon a new decomposition of AMP updates in conjunction with well-controlled residual terms, we lay out an analysis recipe to characterize the finite-sample convergence of AMP up to O(n / polylog(n)) iterations. We will discuss concrete consequences of the proposed analysis recipe in the Z2 synchronization problem; more specifically, we predict the behavior of randomly initialized AMP for up to O(n / polylog(n)) iterations, showing that the algorithm succeeds without the need for a careful spectral initialization or a subsequent refinement stage (as conjectured recently by Celentano et al.).
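As a rough illustration of the iteration being analyzed, the sketch below runs AMP on a simulated Z2 synchronization instance (rank-one signal plus GOE noise) from a random initialization, using a tanh denoiser and the standard Onsager correction. The parameter values are illustrative, and the sketch makes no attempt to reproduce the talk's non-asymptotic analysis.

    import numpy as np

    # Z2 synchronization: Y = (lam/n) * x x^T + W, with W a symmetric noise matrix.
    rng = np.random.default_rng(1)
    n, lam, n_iter = 2000, 1.5, 50

    x_star = rng.choice([-1.0, 1.0], size=n)            # hidden +/-1 signs
    W = rng.normal(size=(n, n)) / np.sqrt(n)
    W = (W + W.T) / np.sqrt(2)                          # GOE-type scaling
    Y = (lam / n) * np.outer(x_star, x_star) + W

    x = rng.normal(size=n)                              # random initialization, as in the talk
    f_prev = np.zeros(n)
    for t in range(n_iter):
        f = np.tanh(lam * x)                            # denoiser for a +/-1 signal
        # Onsager term: mean derivative of the denoiser times the previous iterate.
        x = Y @ f - lam * np.mean(1.0 - f**2) * f_prev
        f_prev = f

    overlap = abs(np.tanh(lam * x) @ x_star) / n        # alignment with the truth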
Abstract: Coordinate-wise MCMC schemes (e.g., Gibbs and Metropolis-within-Gibbs) are popular algorithms for sampling from posterior distributions arising in Bayesian hierarchical models. We introduce a novel technique to analyse the asymptotic behaviour of their mixing times, based on tools from Bayesian asymptotics. We apply our methodology to high-dimensional hierarchical models, obtaining dimension-free convergence results for Gibbs samplers under random data-generating assumptions, for a broad class of two-level models with generic likelihood function. Specific examples with Gaussian, binomial, and categorical likelihoods are discussed. We then extend the results to Metropolis-within-Gibbs schemes by combining the Bayesian asymptotics approach with a novel notion of conditional conductance. This is based on joint works with Gareth Roberts (University of Warwick) and Giacomo Zanella (Bocconi University).
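For orientation, the sketch below implements a plain Gibbs sampler for a two-level Gaussian hierarchical model, the simplest instance of the class discussed; the variance components are held fixed for brevity, the data are simulated, and nothing here reproduces the mixing-time analysis itself.

    import numpy as np

    # Two-level model: y_ij ~ N(theta_i, s2), theta_i ~ N(mu, t2), flat prior on mu.
    rng = np.random.default_rng(2)
    J, n_j, s2, t2 = 50, 20, 1.0, 1.0
    y = rng.normal(loc=rng.normal(size=(J, 1)), scale=np.sqrt(s2), size=(J, n_j))

    mu = 0.0
    theta = np.zeros(J)
    samples = []
    for it in range(2000):
        # theta_i | mu, y : conjugate Gaussian update (likelihood precision + prior precision)
        prec = n_j / s2 + 1.0 / t2
        mean = (y.sum(axis=1) / s2 + mu / t2) / prec
        theta = mean + rng.normal(size=J) / np.sqrt(prec)
        # mu | theta : Gaussian centered at the mean of theta under the flat prior
        mu = theta.mean() + rng.normal() / np.sqrt(J / t2)
        samples.append(mu)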
Abstract: Time series data arising in many applications nowadays are high-dimensional. A large number of parameters describe features of these time series. Sensible inferences on these parameters with limited data are possible if some underlying lower-dimensional structure is present. We propose a novel approach to modeling a high-dimensional time series through several independent univariate time series, which are then orthogonally rotated and sparsely linearly transformed. With this approach, any specified intrinsic relations among the component time series given by a graphical structure can be maintained at all time snapshots. We call the resulting process an Orthogonally-rotated Univariate Time series (OUT). Key structural properties of time series, such as stationarity and causality, can be easily accommodated in the OUT model. For Bayesian inference, we put suitable prior distributions on the spectral densities of the independent latent time series, the orthogonal rotation matrix, and the common precision matrix of the component time series at every time point. A likelihood is constructed using the Whittle approximation for the univariate latent time series. An efficient Markov chain Monte Carlo (MCMC) algorithm is developed for posterior computation. We study the convergence of the pseudo-posterior distribution based on the Whittle likelihood for the model's parameters, developing along the way a new general theorem on the contraction rate of pseudo-posterior distributions that is potentially applicable in other situations. We find that the posterior contraction rate for independent observations essentially prevails in the OUT model under very mild conditions on the temporal dependence, described in terms of the smoothness of the corresponding spectral densities. Through a simulation study, we compare the accuracy of estimating the parameters and identifying the graphical structure with other approaches. We apply the proposed methodology to analyze a dataset on different industrial components of the US gross domestic product between 2010 and 2019 and predict future observations.
Based on a collaboration with Arkaprava Roy, University of Florida, and Anindya Roy, University of Maryland-Baltimore County.
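For readers unfamiliar with the Whittle approximation used above, here is a minimal Python sketch for a single univariate latent series with an AR(1) spectral density; the rotation, sparse transform, and graphical structure of the full OUT model are not reproduced, and the series is simulated.

    import numpy as np

    def whittle_loglik(x, spec_density):
        """Whittle approximation: -sum over Fourier frequencies of
        log f(w_k) + I(w_k) / f(w_k), with I the periodogram."""
        n = len(x)
        freqs = 2 * np.pi * np.arange(1, n // 2) / n                     # omit 0 and pi
        pgram = np.abs(np.fft.fft(x)[1:n // 2]) ** 2 / (2 * np.pi * n)   # periodogram ordinates
        f = spec_density(freqs)
        return -np.sum(np.log(f) + pgram / f)

    def ar1_spectrum(phi, sigma2):
        # Spectral density of AR(1): sigma2 / (2 pi |1 - phi e^{-iw}|^2)
        return lambda w: sigma2 / (2 * np.pi * (1 - 2 * phi * np.cos(w) + phi**2))

    rng = np.random.default_rng(3)
    x = np.empty(1024)
    x[0] = 0.0
    for t in range(1, 1024):                                             # simulate an AR(1) path
        x[t] = 0.6 * x[t - 1] + rng.normal()
    print(whittle_loglik(x, ar1_spectrum(0.6, 1.0)))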
Abstract: In many applications, the population means of geographically adjacent small areas exhibit spatial variation. If the auxiliary variables employed in a small area model do not adequately account for the spatial pattern of the means, the residual variation is absorbed into the random effects component of the linking part of the model. As a result, the independent and identically distributed assumption on the random effects of the traditional Fay-Herriot model fails. Additionally, limited resources often prevent many subpopulations from being sampled, resulting in non-sampled small areas. In fact, sometimes by design a survey provides aggregated statistics for an outcome of interest only at a higher level, with no data to construct direct estimates of characteristics for the lower-level subpopulations. For example, a domain in a survey may be a group of counties, and the survey collects data from that domain to provide a direct estimate only for the group, while useful covariates for individual counties are often available from administrative records or other surveys. Such covariate data can be integrated with aggregated statistics for the primary outcome through innovative small area estimation methodology, leveraging the aggregated outcome data to produce better estimates, and associated measures of uncertainty, for the disaggregated subpopulation means. To produce small area estimates for non-sampled domains or for lower levels of geography, we generalize the celebrated Fay-Herriot model, which has been used extensively for several decades by many national statistical offices around the world to produce reliable small area statistics. To address our challenge, we employ a Bayesian approach based on a number of popular spatial random-effect models. The effectiveness of our Bayesian spatial solution is assessed on simulated and real data. Specifically, we examine predictions of statewide four-person family median incomes based on the 1990 Current Population Survey and the 1980 Census for the United States of America. Under mild conditions, we establish the propriety of the posterior distributions for various spatial models under a useful class of improper prior densities on the model parameters.
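As background, the sketch below fits the classical (non-spatial) Fay-Herriot model by maximizing the profile likelihood of the variance component over a grid and forms the usual empirical-Bayes shrinkage estimates; the spatial random-effect extensions from the talk are not included, and the data are simulated.

    import numpy as np

    # Fay-Herriot: y_i = x_i' beta + u_i + e_i, u_i ~ N(0, A), e_i ~ N(0, D_i), D_i known.
    rng = np.random.default_rng(4)
    m = 100
    X = np.column_stack([np.ones(m), rng.normal(size=m)])      # illustrative covariates
    D = rng.uniform(0.5, 2.0, size=m)                          # known sampling variances
    beta_true, A_true = np.array([1.0, 2.0]), 1.0
    theta = X @ beta_true + rng.normal(scale=np.sqrt(A_true), size=m)
    y = theta + rng.normal(scale=np.sqrt(D), size=m)

    def fit_beta(A):
        # Weighted least squares given the variance component A.
        w = 1.0 / (A + D)
        XtW = X.T * w
        return np.linalg.solve(XtW @ X, XtW @ y)

    def neg_loglik(A):
        beta = fit_beta(A)
        r = y - X @ beta
        return 0.5 * np.sum(np.log(A + D) + r**2 / (A + D))

    grid = np.linspace(0.01, 5.0, 500)                         # crude grid-search ML for A
    A_hat = grid[np.argmin([neg_loglik(a) for a in grid])]
    beta_hat = fit_beta(A_hat)

    gamma = A_hat / (A_hat + D)                                # shrinkage weights
    theta_eblup = gamma * y + (1 - gamma) * (X @ beta_hat)     # small-area estimates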
Abstract: Variational inference (VI) has emerged as a widely used alternative to Markov chain Monte Carlo (MCMC) for approximating complex probability densities arising in Bayesian models. While MCMC enjoys well-established theoretical properties, the theoretical foundations of VI remain less explored. In this talk, we present recent advances in the theoretical understanding of VI. Specifically, we show that point estimators derived from VI can achieve consistency and optimal rates of convergence under a broad set of conditions applicable to a wide range of problems. Moreover, we prove a Bernstein-von Mises theorem for the mean-field variational approximation, showing that the resulting point estimator not only attains the root-n rate of convergence but also achieves the same optimal estimation efficiency as maximum likelihood estimation. Further, we examine the role of the evidence lower bound (ELBO) in model selection: for regular parametric models, the ELBO asymptotically agrees with the widely used Bayesian information criterion (BIC), while for finite mixture models, a representative class of singular models, the ELBO correctly identifies the number of components where BIC fails. These findings demonstrate that variational inference provides a computationally efficient alternative to conventional approaches for both optimal parameter estimation and consistent model selection.
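The ELBO-BIC agreement for regular models can be checked numerically in a toy conjugate setting where the mean-field family contains the exact posterior, so the optimal ELBO equals the log evidence; the sketch below (with simulated Gaussian data) shows the gap between the ELBO and the BIC-style quantity staying bounded as n grows.

    import numpy as np

    # Model: y_i ~ N(mu, 1), prior mu ~ N(0, 1). The exact log evidence is
    # -(n/2) log(2 pi) - (1/2) sum y_i^2 + S^2 / (2(n+1)) - (1/2) log(n+1), S = sum y_i.
    rng = np.random.default_rng(5)
    for n in [100, 1000, 10000]:
        y = rng.normal(loc=0.7, size=n)
        S = y.sum()
        elbo = (-0.5 * n * np.log(2 * np.pi) - 0.5 * (y**2).sum()
                + S**2 / (2 * (n + 1)) - 0.5 * np.log(n + 1))    # optimal ELBO = log evidence
        bic = (-0.5 * n * np.log(2 * np.pi) - 0.5 * ((y - y.mean())**2).sum()
               - 0.5 * np.log(n))                                # loglik at MLE - (k/2) log n
        print(n, elbo - bic)                                     # difference stays O(1)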
Abstract: In network settings, interference between units makes causal inference more challenging as outcomes may depend on the treatments received by others in the network. Typical estimands in network settings focus on treatment effects aggregated across individuals in the population. We propose a framework for estimating node-wise counterfactual means, allowing for more granular insights into the impact of network structure on treatment effect heterogeneity. We develop a doubly robust and non-parametric estimation procedure, KECENI (Kernel Estimation of Causal Effect under Network Interference), which offers consistency and asymptotic normality under network dependence. The utility of this method is demonstrated through an application to microfinance data, revealing the impact of network characteristics on treatment effects.
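The following is a schematic kernel-smoothing sketch of a node-wise counterfactual mean, not the authors' KECENI implementation: nodes in the same treatment arm are weighted by the similarity of their covariate and neighborhood-exposure summaries to the target configuration. The graph, outcome model, and summaries are all illustrative, and the doubly robust construction from the talk is not reproduced.

    import numpy as np

    rng = np.random.default_rng(6)
    n = 500
    A = (rng.random((n, n)) < 0.02).astype(float)
    A = np.triu(A, 1)
    A = A + A.T                                           # symmetric random graph
    T = rng.binomial(1, 0.5, size=n)                      # binary treatments
    deg = np.maximum(A.sum(axis=1), 1.0)
    frac_treated = (A @ T) / deg                          # neighborhood-exposure summary
    X = rng.normal(size=n)                                # node covariates
    Y = 1.0 + 2.0 * T + 1.5 * frac_treated + X + rng.normal(size=n)

    def nodewise_mean(i, t, g, h=0.2):
        """Kernel estimate of node i's counterfactual mean under own treatment t
        and neighborhood exposure g (Gaussian product kernel, bandwidth h)."""
        w = (T == t) * np.exp(-0.5 * (((frac_treated - g) / h) ** 2
                                      + ((X - X[i]) / h) ** 2))
        return (w @ Y) / w.sum()

    # Contrast for node 0: treated with untreated neighbors vs fully untreated.
    effect = nodewise_mean(0, 1, 0.0) - nodewise_mean(0, 0, 0.0)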
Abstract: From clinical trials to corporate strategy, randomized experiments are a reliable methodological tool for estimating causal effects. In recent years, there has been growing interest in causal inference under interference, where treatment given to one unit can affect outcomes of other units. While the literature on interference has focused primarily on unbiased and consistent estimation, designing randomized network experiments to ensure tight rates of convergence remains relatively under-explored in many settings.
In this talk, we study the problem of direct effect estimation under interference. Here, the interference between experimental subjects is captured by a network, and the experimenter seeks to estimate the direct effect, which is the difference between the outcomes when (i) a unit is treated and its neighbors receive control and (ii) the unit and its neighbors all receive control. We present a new experimental design under which the normalized variance of a Horvitz-Thompson style estimator is bounded as $n * Var
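For context, under simple Bernoulli randomization the two exposure probabilities entering a Horvitz-Thompson estimator of the direct effect are available in closed form; the sketch below computes the estimator on a simulated network. The variance-optimal design from the talk is not reproduced, and all quantities are illustrative.

    import numpy as np

    rng = np.random.default_rng(7)
    n, p = 1000, 0.3
    A = (rng.random((n, n)) < 0.01).astype(float)
    A = np.triu(A, 1)
    A = A + A.T                                             # symmetric network
    deg = A.sum(axis=1)

    T = rng.binomial(1, p, size=n)                          # Bernoulli(p) design
    nbr_all_control = (A @ T == 0)                          # no treated neighbors
    Y = 1.0 + 2.0 * T - 0.5 * (A @ T) / np.maximum(deg, 1) + rng.normal(size=n)

    expo1 = (T == 1) & nbr_all_control                      # exposure (i): treated, neighbors control
    expo0 = (T == 0) & nbr_all_control                      # exposure (ii): all control
    p1 = p * (1 - p) ** deg                                 # P(exposure (i))
    p0 = (1 - p) ** (deg + 1)                               # P(exposure (ii))
    direct_effect_ht = np.mean(Y * expo1 / p1 - Y * expo0 / p0)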