Abstract: We consider the problem of graph matchability in non-identically distributed networks. In a general class of edge-independent networks, we demonstrate that graph matchability can be lost with high probability when matching the networks directly. We further demonstrate that under mild model assumptions, matchability is almost perfectly recovered by centering the networks using Universal Singular Value Thresholding before matching. These theoretical results are then demonstrated in both real and synthetic simulation settings. We also recover analogous core-matchability results in a very general core-junk network model, wherein some vertices do not correspond between the graph pair.
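As a rough illustration of the centering step mentioned above, the following sketch applies a Universal Singular Value Thresholding style estimate to a simulated two-block network and subtracts it from the adjacency matrix. The function names, the two-block probabilities, and the threshold `2.01 * sqrt(n)` are assumptions made for this example, not details taken from the talk; the matching step itself is omitted.

```python
import numpy as np

def usvt_estimate(adj, tau):
    """Estimate edge probabilities by keeping only singular values above tau."""
    u, s, vt = np.linalg.svd(adj, full_matrices=False)
    keep = s > tau
    p_hat = (u[:, keep] * s[keep]) @ vt[keep, :]
    # Edge probabilities must lie in [0, 1].
    return np.clip(p_hat, 0.0, 1.0)

def center(adj, tau):
    """Subtract the estimated edge-probability matrix from the adjacency matrix."""
    return adj - usvt_estimate(adj, tau)

# Simulated edge-independent network: a two-block model (illustrative values).
rng = np.random.default_rng(0)
n = 200
labels = np.repeat([0, 1], n // 2)
block = np.array([[0.6, 0.2], [0.2, 0.5]])
p = block[labels][:, labels]            # n x n matrix of edge probabilities

upper = np.triu(rng.random((n, n)) < p, k=1)
adj = (upper + upper.T).astype(float)   # symmetric, hollow adjacency matrix

tau = 2.01 * np.sqrt(n)                 # assumed threshold for this sketch
p_hat = usvt_estimate(adj, tau)
centered = center(adj, tau)
```

In this sketch the centered matrices, rather than the raw adjacency matrices, would be passed to whatever graph matching routine is in use.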
Abstract: Data analysis should be adaptive, and researchers should be able to modify their analyses based on data exploration and previous analysis. Holdout methods allow for this; however, repeated reuse of the holdout set can lead to incorrect conclusions. Researchers have previously shown that holdout sets can be reused for adaptive analysis using differential privacy techniques. In this talk, I present an extension of that research from a binomial response variable to a continuous response, with potential applications in my research at the Institute for Defense Analyses (IDA).
IDA is a not-for-profit company that runs three Federally Funded Research and Development Centers (FFRDCs). FFRDCs are centers that are sponsored by, and conduct research for, various government agencies. Graduate students in STEM fields have likely heard of some of the more well-known FFRDCs without ever learning the term "FFRDC". For example, the "National Labs", such as Los Alamos National Laboratory and Oak Ridge National Laboratory, are FFRDCs sponsored by the Department of Energy. The public-private partnerships offered by FFRDCs provide unique opportunities to meet the research needs of government organizations in challenging, cooperative environments.
Abstract: An interplay between coherence and logistic regression is discussed. Interaction terms expressed as products of covariates may prove useful in logistic regression for binary time series, even when their factors are not significant. To identify potentially useful interaction terms, a graphical spectral tool, a function of lag or delay referred to as residual coherence, is introduced. Potentially useful interaction terms are identified by the size or prominence of their residual coherence. Instead of direct significance testing in terms of the residual coherence, the identified covariates are tested for their significance within logistic regression.
Abstract: Graph embeddings, a class of dimensionality reduction techniques designed for relational data, have proven useful in exploring and modeling network structure. Most dimensionality reduction methods allow out-of-sample extensions, by which an embedding can be applied to observations not present in the training set. Applied to graphs, the out-of-sample extension problem concerns how to compute the embedding of a vertex that is added to the graph after an embedding has already been computed. In this talk, we will consider the out-of-sample extension problem for two graph embedding procedures: the adjacency spectral embedding and the Laplacian spectral embedding. In both cases, we prove that when the underlying graph is generated according to a latent space model called the random dot product graph, which includes the popular stochastic block model as a special case, an out-of-sample extension based on a least-squares objective obeys a central limit theorem. Our results also yield a convenient framework in which to analyze trade-offs between estimation accuracy and computational expenses, which we will explore briefly.
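The least-squares out-of-sample extension described above can be sketched as follows for the adjacency spectral embedding. This is a minimal illustration, not the speaker's code: the function names and the two-cluster latent positions are assumptions for the example. Given an in-sample embedding `x_hat` and the new vertex's edge vector, the new vertex is placed at the point `w` minimizing `||new_edges - x_hat @ w||`.

```python
import numpy as np

def adjacency_spectral_embedding(adj, d):
    """Embed vertices using the d largest-magnitude eigenpairs of adj."""
    vals, vecs = np.linalg.eigh(adj)
    top = np.argsort(np.abs(vals))[::-1][:d]
    return vecs[:, top] * np.sqrt(np.abs(vals[top]))

def out_of_sample_embed(x_hat, new_edges):
    """Least-squares embedding of a new vertex from its edges to in-sample vertices."""
    w, *_ = np.linalg.lstsq(x_hat, new_edges, rcond=None)
    return w

# Random dot product graph with two groups of latent positions (illustrative).
rng = np.random.default_rng(1)
n, d = 300, 2
latent = np.vstack([np.tile([0.8, 0.1], (n // 2, 1)),
                    np.tile([0.1, 0.8], (n // 2, 1))])
p = latent @ latent.T                   # edge probabilities are inner products
x_new = np.array([0.8, 0.1])            # latent position of the new vertex
expected_edges = latent @ x_new

# Noiseless check: embedding the probability matrix itself.
x_hat = adjacency_spectral_embedding(p, d)
w_exact = out_of_sample_embed(x_hat, expected_edges)

# Noisy case: sample the graph and the new vertex's edges.
upper = np.triu(rng.random((n, n)) < p, k=1)
adj = (upper + upper.T).astype(float)
new_edges = (rng.random(n) < expected_edges).astype(float)
x_hat_noisy = adjacency_spectral_embedding(adj, d)
w_noisy = out_of_sample_embed(x_hat_noisy, new_edges)
```

Because spectral embeddings are identified only up to an orthogonal transformation, `w_noisy` should be compared with `x_new` through rotation-invariant quantities such as norms or inner products with in-sample points, not coordinate by coordinate.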
Abstract: The development of models for multiple heterogeneous network data is of critical importance both in statistical network theory and across multiple application domains. Although single-graph inference is well-studied, multiple graph inference is largely unexplored, in part because of the challenges inherent in appropriately modeling graph differences while retaining sufficient model simplicity to render estimation feasible. The common subspace independent-edge (COSIE) multiple random graph model addresses this gap by describing a heterogeneous collection of networks with a shared latent structure on the vertices but potentially different connectivity patterns for each graph. The COSIE model is both flexible enough to account for important graph differences and tractable enough to allow for accurate spectral inference. In both simulated and real data, the model can be deployed for a number of subsequent network inference tasks, including dimensionality reduction, classification, hypothesis testing, and community detection.
Abstract: Determining how certain properties are related to other properties is fundamental to scientific discovery; further investigations into the geometry of the relationship and future predictions are warranted only if two properties are significantly related. To better discover any type of relationship underlying paired sample data, we introduce the multiscale graph correlation (MGC), which combines distance correlation, the locality principle, and smoothed maximum to yield a new and powerful dependency measure. We prove that MGC is consistent for testing independence, enjoys a number of desirable theoretical properties, exhibits empirical power advantages against a wide range of nonlinear and high-dimensional dependencies, and can be efficiently implemented and utilized for real data exploration.
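For readers unfamiliar with distance correlation, one of the ingredients MGC combines, here is a minimal sketch of the (biased) sample distance correlation. It is not an implementation of MGC: the locality principle and the smoothed maximum are omitted, and the function names are hypothetical.

```python
import numpy as np

def _centered_distances(x):
    """Pairwise Euclidean distance matrix, double-centered by row and column means."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1))
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def distance_correlation(x, y):
    """Biased sample distance correlation between paired samples x and y."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = (a * b).mean()       # squared sample distance covariance
    dvar2_x = (a * a).mean()     # squared sample distance variance of x
    dvar2_y = (b * b).mean()     # squared sample distance variance of y
    denom = np.sqrt(dvar2_x * dvar2_y)
    if denom == 0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))

x = np.linspace(-1.0, 1.0, 101)
linear = distance_correlation(x, 2.0 * x + 1.0)
quadratic = distance_correlation(x, x ** 2)
pearson_quadratic = float(np.corrcoef(x, x ** 2)[0, 1])
```

On this symmetric grid the Pearson correlation between `x` and `x ** 2` is essentially zero, while the distance correlation is clearly positive, which is the kind of nonlinear dependence such measures are designed to detect.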
Abstract: We consider sparse Bayesian estimation in the classical multivariate linear regression model with p regressors and q response variables. In univariate Bayesian linear regression with a single response y, shrinkage priors that can be expressed as scale mixtures of normal densities are a popular approach for obtaining sparse estimates of the coefficients. In this paper, we extend the use of these priors to the multivariate case to estimate a p × q coefficient matrix B. Our method can be used for any sample size n and any dimension p, and moreover, we show that the posterior distribution can consistently estimate B even when p grows at a nearly exponential rate with the sample size. Our method's finite sample performance is demonstrated through simulations and data analysis.
Abstract: When searching for gene pathways leading to specific disease outcomes, additional information on gene characteristics is often available that may help differentiate genes related to the disease from irrelevant background genes when connections involving both types of genes are observed and their relationships to the disease are unknown. We propose a method to single out irrelevant background genes with the help of auxiliary information through a logistic regression, and to cluster relevant genes into cohesive groups using the adjacency matrix. The expectation-maximization algorithm is modified to maximize a joint pseudo-likelihood assuming latent indicators for relevance to the disease and latent group memberships, as well as Poisson or multinomial distributed link numbers within and between groups. A robust version allowing arbitrary linkage patterns within the background is further derived. Asymptotic consistency of label assignments under the stochastic blockmodel is proven. Superior performance and robustness in finite samples are observed in simulation studies. The proposed robust method identifies previously missed gene sets underlying autism-related neurological diseases using diverse data sources including de novo mutations, gene expressions, and protein-protein interactions. We further propose an integrative network analysis framework that combines the likelihood or pseudo-likelihood of heterogeneous network data. For example, in studying gene expression and protein-protein interaction data, when the cluster structure is reflected in the mean values of gene expression, an empirical Bayesian hierarchical model is combined with a stochastic block model to identify functional groups. In analyzing protein-protein interaction and gene ontology data, a correlation coefficient matrix with block structure is combined with a stochastic block model to identify protein complexes. Asymptotic consistency of the group membership estimates is proven. Superior performance of the integrative methods compared to methods using a single data source is observed in simulation studies, and empirical guidelines for choosing between integrative analysis and separate analysis are provided.
Abstract: Wind tunnel tests are crucial to the design of tall structures. Scale models
are outfitted with pressure taps at many locations of interest, such as the
center of the roof. Each tap measures pressure at one location for the
duration of the test. Since the tap measurement is recorded at regular time
intervals, the data produced form a regular time series. Wind engineers are
typically concerned with very high and very low (suction) pressures.
Peaks-over-threshold (POT) extreme value models are one of two main approaches
used by wind engineers for these data. However, POT models require the choice
of a threshold, which can influence the final results, sometimes substantially,
because the threshold choice controls the data that enter into the analysis.
In this talk a method for combining results from multiple thresholds is
considered, thereby eliminating the need to choose only one. The focus is on
estimating the distribution of the maximum or minimum value in wind tunnel
tests. The new method is compared to several techniques for choosing a single
threshold using a large collection of pressure series from wind tunnel tests.
The comparison shows that choosing a single threshold underestimates the
uncertainty associated with predicting a future peak value.
Abstract: In this talk, I present computational methodologies for extracting dynamic neural functional networks that underlie behavior. These methods aim at capturing the sparsity, dynamicity and stochasticity of these networks, by integrating techniques from high-dimensional statistics, point processes, state-space modeling, and adaptive filtering. I demonstrate their utility using several case studies involving auditory processing, including 1) functional auditory-prefrontal interactions during attentive behavior in the ferret brain, 2) network-level signatures of decision-making in the mouse primary auditory cortex, and 3) cortical dynamics of speech processing in the human brain.
4176 Campus Drive - William E. Kirwan Hall
College Park, MD 20742-4015
P: 301.405.5047 | F: 301.314.0827