Abstract: Unlike standard tasks, survival analysis requires modeling incomplete data, such as right-censored data, which must be treated with care. While deep neural networks excel in traditional supervised learning, it remains unclear how best to utilize these models in survival analysis. A key question is which data-generating assumptions of traditional survival models should be retained and which should be made more flexible via the function-approximating capabilities of neural networks. In addition, most of these methods are difficult to interpret, and mathematical understanding of them is lacking. In this talk, we explore these issues from two directions. First, we study the partially linear Cox model, where the nonlinear component of the model is implemented using a deep neural network. The proposed approach is flexible and able to circumvent the curse of dimensionality, yet it facilitates interpretability of the effects of treatment covariates on survival. Next, we introduce a Deep Extended Hazard (DeepEH) model to provide a flexible and general framework for deep survival analysis. The extended hazard model includes the conventional Cox proportional hazards and accelerated failure time models as special cases, so DeepEH subsumes the popular Deep Cox proportional hazards (DeepSurv) and Deep Accelerated Failure Time (DeepAFT) models. We provide theoretical support for the proposed models, which underscores the attractive feature that deep learning is able to detect low-dimensional structure of data in high-dimensional space. Numerical experiments further provide evidence that the proposed methods outperform existing statistical and deep learning approaches to survival analysis.
*Based on joint work with Qixian Zhong (Xiamen University) and Jonas Mueller (Cleanlab)
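For concreteness, the extended hazard model referenced above can be written in the following form (a sketch in notation of my own choosing, not quoted from the talk):

\[ \lambda(t \mid x) \;=\; \lambda_0\!\left( t\, e^{g_1(x)} \right) e^{g_2(x)}, \]

where \(\lambda_0\) is a baseline hazard and \(g_1, g_2\) are covariate effects (implemented as deep networks in DeepEH). Taking \(g_1 \equiv 0\) recovers the Cox proportional hazards form \(\lambda_0(t)\, e^{g_2(x)}\), while taking \(g_1 = g_2\) recovers the accelerated failure time form, which is why DeepSurv and DeepAFT arise as special cases.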
Abstract: Recent advances in data collection technology have multiplied the availability of network data, leading not only to larger and more complex networks but also to instances where independent network samples are collected. In such data sets, the network serves as the basic data object; these data are increasingly common in neuroscience and genetics. When analyzing these data, a fundamental scientific question is how subject-level network connectivity changes as a function of clinical characteristics. In this talk, we propose a new network response model framework, in which the networks are treated as responses and the network-level covariates as predictors. Under the proposed framework, we discuss model identifiability, estimation, and theoretical properties. Finally, we present our findings from analyses of three resting-state and task-related neuroimaging studies.
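To make the network response setup concrete, here is one hedged sketch of a linear specification (the parameterization below is an illustration of my own, not necessarily the model proposed in the talk): for subject \(i\) with \(p \times p\) connectivity matrix \(A^{(i)}\) and covariate vector \(x_i \in \mathbb{R}^d\),

\[ \mathbb{E}\left[ A^{(i)}_{uv} \mid x_i \right] \;=\; B_{0,uv} + \sum_{k=1}^{d} x_{ik}\, B_{k,uv}, \qquad 1 \le u < v \le p, \]

where each symmetric coefficient matrix \(B_k\) describes how covariate \(k\) shifts edge-level connectivity; identifiability then amounts to conditions under which the \(B_k\) are uniquely recoverable.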
Abstract: Representation learning constructs low-dimensional representations to summarize essential features of high-dimensional data like images and texts. Ideally, such a representation should efficiently capture non-spurious features of the data. It should also be disentangled, so that we can interpret which feature each of its dimensions captures. However, these desiderata are often only intuitively defined and challenging to quantify or enforce.
In this talk, we take a causal perspective on representation learning. We show how the desiderata of representation learning can be formalized using counterfactual notions, enabling metrics and algorithms that target efficient, non-spurious, and disentangled representations of data. We discuss the theoretical underpinnings of these algorithms and illustrate their empirical performance in both supervised and unsupervised representation learning.
This is joint work with Kartik Ahuja, Yoshua Bengio, Michael Jordan, Divyat Mahajan, and Amin Mansouri: https://arxiv.org/abs/2109.03795, https://arxiv.org/abs/2209.11924, https://arxiv.org/abs/2310.02854
Abstract: Network autoregressive models seek to model peer effects such as contagion and interference, in which node-level responses or behaviors may influence one another. These models are frequently deployed by practitioners in sociology and econometrics, typically in the form of linear-in-means models, in which node-level covariates and local averages of responses are used as predictors. In highly structured networks, previous work has shown that peer effects in linear-in-means models are collinear with other regression terms, and thus cannot be estimated, but this collinearity is widely believed to be ignorable, as peer effects are typically identified in empirical networks. In this work, we show a concerning negative result: under linear-in-means models, when node-level covariates are independent of network structure, peer effects become increasingly collinear with other regression terms as the network size (i.e., number of nodes) grows, and are inestimable asymptotically. We also show a narrow positive result: under certain latent space network models, some peer effects remain identified as the network size grows, albeit under rather stringent conditions. Our results suggest that linear models for peer effects are appropriate in far fewer settings than was previously believed.
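For reference, the linear-in-means model takes the standard form (notation mine):

\[ y_i \;=\; \alpha + x_i^{\top}\beta + \rho\, \frac{1}{d_i} \sum_{j \in \mathcal{N}_i} y_j + \frac{1}{d_i} \sum_{j \in \mathcal{N}_i} x_j^{\top}\gamma + \varepsilon_i, \]

where \(\mathcal{N}_i\) is the neighborhood of node \(i\) and \(d_i = |\mathcal{N}_i|\). Intuitively, when the \(x_j\) are independent of the network, neighborhood averages concentrate around global means as the network grows, so the peer-effect regressors become nearly collinear with the intercept and the node's own covariates.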
Abstract: Predictive inference under a general regression setting is gaining more interest in the big-data era. In terms of going beyond point prediction to develop prediction intervals, two main threads of development are Conformal Prediction and Model-free Prediction. Recently, Chernozhukov et al. [2021] proposed a new conformal prediction approach exploiting the same uniformization procedure as in the Model-free Bootstrap of Politis [2015]. Hence, it is of interest to compare and further investigate the performance of the two methods. In the paper at hand, we contrast the two approaches via theoretical analysis and numerical experiments, with a focus on conditional coverage of prediction intervals. We discuss suitable scenarios for applying each algorithm, underscore the importance of conditional vs. unconditional coverage, and show that, under mild conditions, the Model-free Bootstrap yields prediction intervals with guaranteed better conditional coverage compared to quantile estimation. We also extend the concept of ‘pertinence’ of prediction intervals in Politis [2015] to the nonparametric regression setting, and give concrete examples where its importance emerges in finite-sample scenarios. Finally, we define the new notion of ‘conjecture testing’, the analog of hypothesis testing for the prediction problem.
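As background, here is a minimal sketch of a generic split conformal prediction interval with absolute-residual scores. This is illustrative only: it is neither the distributional conformal method of Chernozhukov et al. [2021] nor the Model-free Bootstrap discussed above, and the function names and data below are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def split_conformal_interval(X_train, y_train, X_cal, y_cal, X_new, alpha=0.1):
    """Split conformal interval using |residual| conformity scores."""
    model = LinearRegression().fit(X_train, y_train)
    scores = np.abs(y_cal - model.predict(X_cal))  # calibration scores
    n = len(scores)
    # Finite-sample-corrected empirical quantile of the scores.
    level = min(np.ceil((1 - alpha) * (n + 1)) / n, 1.0)
    q = np.quantile(scores, level, method="higher")
    pred = model.predict(X_new)
    return pred - q, pred + q  # interval with marginal 1 - alpha coverage

# Toy usage on synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=300)
lo, hi = split_conformal_interval(X[:150], y[:150], X[150:250], y[150:250], X[250:])
```

Intervals of this generic form guarantee marginal coverage only; the talk's focus is precisely on when conditional coverage can also be achieved, which this recipe does not ensure.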
Abstract: The Inverse-Wishart (IW) distribution is a standard and popular choice of prior for covariance matrices and has attractive properties such as conditional conjugacy. However, the IW family of priors has crucial drawbacks, including the lack of effective choices for non-informative priors. Several classes of priors for covariance matrices that alleviate these drawbacks, while preserving computational tractability, have been proposed in the literature. These priors can be obtained through appropriate scale mixtures of IW priors. However, the high-dimensional posterior consistency of models which incorporate such priors has not been investigated. We address this issue for the multi-response regression setting (q responses, n samples) under a wide variety of IW scale mixture priors for the error covariance matrix. Posterior consistency and contraction rates for both the regression coefficient matrix and the error covariance matrix are established in the "large q, large n" setting under mild assumptions on the true data-generating covariance matrix and relevant hyperparameters. In particular, the number of responses q = q_n is allowed to grow with n, but with q_n = o(n). We also provide some results related to inconsistency of the posterior.
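For concreteness, the setting can be sketched as follows (notation mine, not quoted from the abstract): the multi-response regression model is

\[ Y = X B + E, \qquad e_i \overset{iid}{\sim} N_q(0, \Sigma), \quad i = 1, \dots, n, \]

where the \(e_i^{\top}\) are the rows of \(E\), and a scale mixture of IW priors takes

\[ \Sigma \mid \theta \sim \mathrm{IW}\bigl(\nu, \Psi(\theta)\bigr), \qquad \theta \sim \pi, \]

so the marginal prior \(p(\Sigma) = \int \mathrm{IW}(\Sigma; \nu, \Psi(\theta))\, \pi(d\theta)\) can behave quite differently from a single IW (e.g., heavier tails or near-noninformativeness) while \(\Sigma \mid \theta\) remains conditionally conjugate.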