Abstract: Expected shortfall, measuring the average outcome (e.g., portfolio loss) above a given quantile of its probability distribution, is a common financial risk measure. The same measure can be used to characterize treatment effects in the tail of an outcome distribution, with applications ranging from policy evaluation in economics and public health to biomedical investigations. Expected shortfall regression is a natural approach to modeling covariate-adjusted expected shortfalls. Because the expected shortfall cannot be written as the minimizer of an expected loss function at the population level, expected shortfall regression poses computational as well as statistical challenges, which have stimulated recent research. We discuss some recent developments in this area, with a focus on a new optimization-based semiparametric approach to estimating the conditional expected shortfall that adapts well to data heterogeneity under minimal model assumptions. The talk is based on joint work with Yuanzhi Li and Shushu Zhang.
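As a minimal illustration of the quantity described in this abstract, the sketch below computes an unconditional, empirical expected shortfall: the average of the observations at or above the sample quantile at level tau. It is not the regression estimator discussed in the talk; the function name `expected_shortfall`, the toy losses, and the level 0.8 are assumptions chosen for illustration.

```python
import numpy as np

def expected_shortfall(losses, tau):
    """Return (q, es): the empirical tau-quantile of the losses and the
    average of the losses at or above that quantile (hypothetical helper)."""
    losses = np.asarray(losses, dtype=float)
    q = np.quantile(losses, tau)      # sample quantile (linear interpolation)
    tail = losses[losses >= q]        # observations in the upper tail
    return q, tail.mean()

# Toy data (made up for illustration): ten losses, tail level 0.8.
losses = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0])
q, es = expected_shortfall(losses, 0.8)
print(q, es)
```

The calculation takes two steps, first the quantile and then the tail average. This mirrors the point made in the abstract that expected shortfall is not obtained by minimizing a single expected loss.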
Abstract: Given a dataset comprising a single network, we consider inference on a parameter selected from the data. We focus on the setting where the selected parameter is a linear combination of the mean connectivities within and between estimated communities. Inference in this setting poses a challenge, since the communities are themselves estimated from the data. Furthermore, since only a single realization of the network is available, sample splitting is not possible. We show that it is possible to split a single realization of a network with $n$ nodes into two (or more) networks involving the same $n$ nodes; the first network can be used to select a data-driven parameter, and the second to conduct inference on that parameter. In the case of weighted networks with Poisson or Gaussian edges, we obtain two independent realizations of the network; by contrast, in the case of Bernoulli edges, the two realizations are dependent, and so extra care is required. We establish the theoretical properties of our estimators, in the sense of confidence intervals that attain the nominal (selective) coverage, and demonstrate their utility in numerical simulations and in application to a dataset representing the relationships among dolphins in Doubtful Sound, New Zealand. This is joint work with Ethan Ancell and Daniela Witten of the University of Washington.
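For Poisson edge weights, one standard way to obtain two independent networks on the same nodes is binomial thinning: if an edge count is Poisson with mean lambda and a Binomial(count, eps) share is assigned to the first network, the two parts are independent Poisson variables with means eps*lambda and (1 - eps)*lambda. The sketch below applies this to a symmetric count adjacency matrix without self-loops. It illustrates that general fact and is not necessarily the exact construction used in the talk; the function name `thin_poisson_network`, the value of `eps`, and the simulated network are assumptions.

```python
import numpy as np

def thin_poisson_network(A, eps, rng):
    """Split a symmetric count adjacency matrix A (zero diagonal assumed)
    into A1 and A2 with A1 + A2 == A, using binomial thinning of each edge."""
    A = np.asarray(A)
    upper = np.triu(A, k=1)              # thin each undirected edge once
    upper1 = rng.binomial(upper, eps)    # share of each count given to A1
    A1 = upper1 + upper1.T               # mirror to keep A1 symmetric
    A2 = A - A1                          # the remainder forms A2
    return A1, A2

# Simulated example (assumed values): 6 nodes, every edge has Poisson mean 4.
rng = np.random.default_rng(0)
n = 6
upper = np.triu(rng.poisson(np.full((n, n), 4.0)), k=1)
A = upper + upper.T
A1, A2 = thin_poisson_network(A, 0.5, rng)
```

In the workflow described above, `A1` would be used to estimate the communities and `A2` to conduct inference on the selected parameter. Per the abstract, this independence does not hold for Bernoulli edges, which need separate treatment.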
Abstract: Evaluating and validating the performance of prediction models is a crucial task in statistics, machine learning, and their diverse applications, including precision medicine. However, developing robust prediction performance measures, particularly for time-to-event data, poses unique challenges. In this talk, I will highlight how conventional performance metrics for time-to-event data, such as the C Index, Brier Score, and time-dependent AUC, may yield unexpected results when comparing prediction models/algorithms. I will then introduce a novel time-dependent pseudo R-squared measure and demonstrate its utility as a prediction performance measure for both uncensored and right-censored time-to-event data. Additionally, I will discuss its extension to time-dependent prediction performance measures and to competing risks scenarios. Its effectiveness will be showcased through simulations and real-world examples.
Abstract: We present a scalable manifold learning framework motivated by the challenge of estimating activation manifolds from functional magnetic resonance imaging data in the Human Connectome Project. Our key contribution is an efficient estimation strategy for heat kernel Gaussian processes within exponential family models. The method is designed to handle very large sample sizes while preserving the intrinsic geometry of the data. By introducing a reduced-rank approximation of the graph Laplacian transition matrix and employing a truncated singular value decomposition for eigenpair computation, we reduce computational complexity from O(n^3) to nearly O(n). Numerical experiments demonstrate that the proposed approach achieves both scalability and improved accuracy, making it well-suited for large-scale manifold learning tasks in complex biomedical and other data domains. This is joint work with Junhui He, Guoxuan Ma and Ying Yang.
Abstract: Unsupervised node clustering (or community detection) is a classical graph learning task. In this work, we study algorithms that exploit the local geometry of the graph to identify densely connected substructures, which form clusters or communities. Our method implements discrete Ricci curvatures and their associated geometric flows, under which the edge weights of the graph evolve to reveal its community structure. We consider several discrete curvature notions and analyze the utility of the resulting algorithms. In contrast to prior literature, we study not only single-membership community detection, where each node belongs to exactly one community, but also mixed-membership community detection, where communities may overlap. For the latter, we argue that it is beneficial to perform community detection on the line graph. We provide both theoretical and empirical evidence for the utility of our curvature-based clustering algorithms. In addition, we give several results on the relationship between the curvature of a graph and its line graph, which enable the efficient implementation of our proposed mixed-membership community detection approach and which may be of independent interest for curvature-based network analysis.