Where: Kirwan Hall 1313

Speaker: Abram Kagan (Dept. of Math. (Statistics program)) - http://math.umd.edu/~amk

Where: Kirwan Hall 1313

Speaker: Dr. Jing Qin (National Institutes of Health) -

Where: Kirwan Hall 1313

Speaker: Yaakov Malinovsky (Dept. of Math. and Stat., UMBC) -

Where: Kirwan Hall 1313

Speaker: Prof. Ciprian Crainiceanu (Dept. of Biostatistics, Johns Hopkins University) -

Abstract: Wearable and Implantable Technology (WIT) is rapidly changing the Biostatistics data analytic landscape due to its reduced bias and measurement error as well as to the sheer size and complexity of the signals. In this talk I will review some of the most used and useful sensors in Health Sciences and the ever-expanding WIT analytic environment. I will describe the use of WIT sensors including accelerometers, heart monitors, glucose monitors and their combination with ecological momentary assessment (EMA). This rapidly expanding data eco-system is characterized by multivariate densely sampled time series with complex and highly non-stationary structures. I will introduce an array of scientific problems that can be answered using WIT and I will describe methods designed to analyze the WIT data from the micro- (sub-second-level) to the macro-scale (minute-, hour- or day-level) data.

Where: Kirwan Hall 1313

Speaker: Dr. Ruth Pfeiffer (National Cancer Institute, NIH) -

Abstract: Much research seeks biomarkers for diagnosing disease and understanding disease etiology.

As high-throughput technologies allow measuring multiple markers simultaneously, strategies for combining markers are needed, particularly if no single marker is highly discriminating. Statistical procedures to combine information from multiple markers need to account for correlations and for left and/or right censoring of the markers due to lower or upper limits of detection of the laboratory assays. We thus extend dimension reduction approaches, specifically likelihood-based sufficient dimension reduction, to regression or classification with censored predictors. Using an expectation maximization (EM) algorithm, we find linear combinations that contain all or most of the information contained in correlated markers for modeling and prediction of an outcome variable, while accounting for left and right censoring due to detection limits. We also allow for selection of important variables through penalization. We assess the performance of our methods extensively in simulations and apply them to data from a study conducted to assess associations of 47 inflammatory markers with lung cancer risk and to build prediction models.

This is joint work with Diego Tomassi, Liliana Forzani and Efstathia Bura.
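To make the censoring mechanics concrete, here is a minimal EM sketch for a single left-censored normal marker (my toy illustration only; the talk's method applies EM to likelihood-based sufficient dimension reduction with many correlated, possibly doubly censored markers):

```python
import random
from statistics import NormalDist, mean, stdev

def em_censored_normal(x, censored, lod, n_iter=100):
    """Toy EM for the mean and SD of a normal biomarker left-censored
    at a lower limit of detection `lod`. E-step: replace each censored
    value by E[X | X < lod] under the current fit; M-step: refit.
    (Crude sketch: this M-step ignores the conditional variance of the
    censored values, so the SD is somewhat understated.)"""
    nd = NormalDist()
    mu, sigma = mean(x), stdev(x)
    for _ in range(n_iter):
        a = (lod - mu) / sigma
        # mean of the current normal truncated above at lod
        cond = mu - sigma * nd.pdf(a) / nd.cdf(a)
        filled = [cond if c else v for v, c in zip(x, censored)]
        mu, sigma = mean(filled), max(stdev(filled), 1e-6)
    return mu, sigma

# Simulated marker: true N(0, 1); values below lod are reported as lod.
random.seed(0)
raw = [random.gauss(0, 1) for _ in range(5000)]
lod = -1.0
cens = [v < lod for v in raw]
obs = [lod if c else v for v, c in zip(raw, cens)]
mu_hat, sigma_hat = em_censored_normal(obs, cens, lod)
```

Naively averaging `obs` would overstate the mean; the E-step pushes the censored mass back below the detection limit.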

Where: Kirwan Hall 1313 (notice change of time)

Speaker: Jason Klusowski (Dept. of Statistics, Rutgers University) -

Abstract: It has been experimentally observed in recent years that multi-layer artificial neural networks have a surprising ability to generalize, even when trained with far more parameters than observations. Is there a theoretical basis for this? The best available bounds on their metric entropy and associated complexity measures are essentially linear in the number of parameters, which is inadequate to explain this phenomenon. Here we examine the statistical risk (mean squared predictive error) of multi-layer networks with $\ell^1$-type controls on their parameters and with ramp activation functions (also called lower-rectified linear units). In this setting, the risk is shown to be upper bounded by $[(L^3 \log d)/n]^{1/2}$, where $d$ is the input dimension to each layer, $L$ is the number of layers, and $n$ is the sample size. In this way, the input dimension can be much larger than the sample size and the estimator can still be accurate, provided the target function has such $\ell^1$ controls and that the sample size is at least moderately large compared to $L^3\log d$. The heart of the analysis is the development of a sampling strategy that demonstrates the accuracy of a sparse covering of deep ramp networks. Lower bounds show that the identified risk is close to being optimal. This is joint work with Andrew R. Barron.
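The scaling in the stated bound can be made concrete with a few lines of arithmetic (constants are omitted, so this sketch shows orders of magnitude only, not the theorem's exact constants):

```python
import math

def risk_bound(L, d, n):
    """Order of the risk upper bound [(L^3 log d) / n]^(1/2) from the
    abstract: L layers, input dimension d per layer, n samples.
    Constants omitted; this conveys scaling only."""
    return math.sqrt(L ** 3 * math.log(d) / n)

# The bound is nearly dimension-insensitive: multiplying d by 1000
# only adds log(1000) ~ 6.9 inside the square root, so the input
# dimension d can far exceed the sample size n.
```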

Where: Kirwan Hall 1313

Speaker: Dr. Gregory Hader (National Cancer Institute, NIH) -

Abstract: While the use of group testing as a tool for estimation has been on the rise in recent decades, classical problems such as the large bias of the maximum likelihood estimator continue to hinder the implementation of such methods. This has led to the development of many estimators minimizing bias and, most recently, an unbiased estimator based on sequential binomial sampling. Previous research, however, has focused heavily on the simple case where no misclassification is assumed and only one trait is to be tested. In this talk, we consider the problem of unbiased estimation in these broader areas, giving constructions of such estimators for several cases. We show that, outside of the standard case addressed previously in the literature, it is impossible to find any proper unbiased estimator, that is, an estimator giving only values in the parameter space. This is shown to hold generally under any binomial or multinomial sampling plans.
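The bias problem motivating this line of work is easy to see by simulation. Below is a small sketch (my illustration, not from the talk) of the standard setting: n pools of size k, a pool tests positive when it contains at least one positive individual, and the MLE of prevalence p inverts the pool-level positivity rate. Because that inversion is convex, Jensen's inequality makes the estimator biased upward in small samples:

```python
import random

def group_testing_mle_mean(n_pools=10, pool_size=5, p=0.05,
                           reps=100_000, seed=1):
    """Monte Carlo mean of the group-testing MLE of prevalence p with
    no misclassification: T positive pools out of n_pools, and
    p_hat = 1 - (1 - T/n)^(1/k). The mean exceeds p because the map
    T -> p_hat is convex (Jensen's inequality)."""
    rng = random.Random(seed)
    pi = 1 - (1 - p) ** pool_size        # P(a pool tests positive)
    total = 0.0
    for _ in range(reps):
        t = sum(rng.random() < pi for _ in range(n_pools))
        total += 1 - (1 - t / n_pools) ** (1 / pool_size)
    return total / reps
```

With the defaults (10 pools of 5, p = 0.05) the mean of the MLE comes out noticeably above 0.05, which is exactly the small-sample bias the talk's unbiased constructions are designed to remove.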

Where: Kirwan Hall 1313

Speaker: Eric Slud (Dept. of Mathematics (Statistics Program)) -

Abstract: Bioequivalence studies are an essential part of the evaluation of generic drugs. The most common in-vivo bioequivalence (BE) study design is the two-period two-treatment open label crossover design, with a metric of bioavailability such as the log of an approximate integral of the measured concentration of the drug in the blood (log AUC). The observation of interest for each subject is the difference between the measurement in the first and second period of the crossover. When this quantity is assumed approximately normally distributed, the sample size for BE studies using the "Two One-sided Tests" approach is a function of the assumed mean difference, the assumed variance, equivalence margins, type I error rate, and desired power. Since BE studies are often rather small, there is a serious possibility that they are under-powered when the assumed variance turns out to be too small, and it would be preferable to have a blinded study design based on re-estimating the sample-size using only a preliminary estimate of variance calculated without unmasking the treatment labels. However, up to this time there has not been such a two-stage study design guaranteed to maintain experimentwise type I error rate in small samples, apart from inefficient procedures related to Stein's 1945 two-stage procedure.

In the research described in this talk, expanding on a portion of Meiyu Shen's 2015 UMD thesis, a two-stage sample-size re-estimation design will be presented. The idea, for second-stage sample size expressed as a function of first-stage estimated sample variance, is to calculate the second-stage rejection threshold in such a way that the experimentwise type I error probability maximized over the (unknown) true variance is equal to the prescribed alpha (usually 0.05). This idea is shown to be computationally and practically feasible in the setting of BE studies.

This work is joint with Meiyu Shen and Estelle Russek-Cohen of FDA.
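The fixed-sample-size calculation that the re-estimation design builds on can be sketched as follows. This is a normal-approximation sketch under assumed inputs (the conventional log-1.25 BE margins and a difference-based crossover analysis), not the exact t-based computation used in practice or the two-stage procedure of the talk:

```python
import math
from statistics import NormalDist

def tost_power(n, delta, sigma_d, margin=math.log(1.25), alpha=0.05):
    """Normal-approximation power of the Two One-sided Tests (TOST)
    procedure for a crossover BE study: n subjects, true log-scale
    mean difference delta, SD sigma_d of the within-subject period
    differences, and equivalence margins (-margin, margin)."""
    nd = NormalDist()
    se = sigma_d / math.sqrt(n)
    z = nd.inv_cdf(1 - alpha)
    p1 = nd.cdf((margin - delta) / se - z)
    p2 = nd.cdf((delta + margin) / se - z)
    return max(0.0, p1 + p2 - 1)

def tost_sample_size(delta, sigma_d, margin=math.log(1.25),
                     alpha=0.05, power=0.8):
    """Smallest n achieving at least the target power; the power, and
    hence n, is quite sensitive to the assumed sigma_d, which is what
    motivates blinded variance re-estimation."""
    n = 2
    while tost_power(n, delta, sigma_d, margin, alpha) < power:
        n += 1
    return n
```

Doubling the assumed `sigma_d` roughly quadruples the required n, so an optimistic variance guess leaves the study badly under-powered.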

Where: Kirwan Hall 1313

Speaker: Abram Kagan (UMCP) -

Abstract: Properties of a measure of dependence will be presented that, in my opinion, should be satisfied by any natural measure of dependence.

The main goal is the construction of a calibrated scale of dependence between random elements X and Y, based on the dimension of the range of the projector of the subspace L^{2}(X) of L^{2}(X, Y) into L^{2}(Y).

For independent X, Y the range is one-dimensional and this property is characteristic of independence.
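For discrete X and Y, the dimension in question reduces to a matrix rank; the sketch below (my illustration, not from the abstract) uses the fact that the operator g(X) -> E[g(X) | Y] differs from the transposed joint pmf matrix only by an invertible diagonal rescaling by the Y-marginals, so the two ranks agree, and rank 1 characterizes independence:

```python
import numpy as np

def dependence_dimension(joint):
    """Dimension of the range of the projection of L^2(X) into L^2(Y)
    for a discrete pair, computed as the rank of the joint pmf matrix
    J[i, j] = P(X = x_i, Y = y_j). Rank 1 holds iff X and Y are
    independent; larger rank indicates 'more' dependence on this
    calibrated scale."""
    return int(np.linalg.matrix_rank(np.asarray(joint, dtype=float)))

# Independent pair: joint pmf is an outer product of marginals.
indep = np.outer([0.3, 0.7], [0.5, 0.5])
# Dependent pair: mass concentrated on the diagonal.
dep = np.array([[0.4, 0.1], [0.1, 0.4]])
```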

Where: Kirwan Hall 1313

Speaker: Prof. Ron Kenett ( KPA Ltd and the Samuel Neaman Institute, Technion, Israel) -

Abstract: The question of reproducibility of research outcomes is now discussed in the open press, with a potential negative impact on science as a whole. In dealing with this question from a statistical viewpoint, several methodological advances have been proposed (like FDR) and several clarification attempts have been published (like the ASA statement on the p-value). These attempts seem to only partially address the rising concerns of the public and research funding agencies.

Kenett and Shmueli, in Clarifying the terminology that describes scientific reproducibility, Nature Methods, 12(8), p. 699, 2015, review the terminology used in this debate and refer to generalizability as a dimension that can clarify which research claims should be scrutinized as reproducible. Generalizability is one of the eight dimensions of the information quality (InfoQ) framework presented in Kenett and Shmueli, On information quality: The Potential of Data and Analytics to Generate Knowledge, John Wiley and Sons, 2016.

In this talk, we expand on the idea of generalizability of research findings by referring to Type S errors proposed in Gelman and Carlin (2014) [Beyond power calculations: Assessing Type S (sign) and Type M (magnitude) errors, Perspectives on Psychological Science, Vol. 9(6), pp. 641-651]. The talk will first discuss methods for setting up a boundary of meaning used in generalizing research findings. It will then show how Type S errors and directional FDR methods fit with this generalizability approach. An example from research in localized colon cancer diagnostics will be used to demonstrate the approach.

Where: Kirwan Hall 1313

Speaker: Wojtek Czaja (Dept. of Mathematics, UMCP) -

Abstract: In recent years machine learning, with its focus on the predictive and generative abilities of learning algorithms, has become a focus of attention of researchers across many fields, including mathematics. In this talk we will present some aspects of mathematical contributions to machine learning, devoting our attention to approximation theory, optimization, and convolutional networks.

Where: Kirwan Hall 1313

Speaker: Abram Kagan (Dept. of Mathematics (Statistics program)) - http://math.umd.edu/~amk

Abstract: A parametric family of distributions of a pair (X, Y) of random elements is called marginal-ancillary if the marginal distributions of X and Y are parameter-free. Thus all the information on the parameter is contained in the dependence between X and Y. A lower bound for the Fisher information on the parameter is obtained in the case when the parameter is the correlation coefficient.

Where: Kirwan Hall 1313

Speaker: Dr. Tinghui Yu (MedImmune) -

Abstract: The quality of an assay/survey with categorical output is usually characterized by its accuracy (bias) and precision (variation). To assess these parameters, one needs to perform a study testing a set of properly selected samples repeatedly under different conditions. A generalized linear mixed model (GLMM) can be fitted to the test results, providing control over the correlation structure within and between each design factor of concern. However, interpretation of the resulting GLMM, especially for the random effects, is not straightforward because the random effects are usually defined through a non-linear transformation (i.e., a link function). We introduce a new statistic to measure the variation in categorical data generated with multiple levels of control factors. The new method is based on the average agreement between the observed outcomes and hence offers intuitive probabilistic interpretations. It can be shown that this new statistic is closely related to the GLMM. We will also demonstrate the new method through simulations and examples with applications to clinical diagnostics.
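As a rough illustration of the average-agreement idea (my sketch; the talk's statistic and its connection to the GLMM are more general), one can score each tested sample by the fraction of replicate pairs that return the same category and then average over samples:

```python
from itertools import combinations
from statistics import mean

def mean_pairwise_agreement(replicates):
    """Average pairwise agreement for a categorical assay.
    `replicates` holds, for each tested sample, the list of category
    calls from repeated runs. Each sample is scored by the fraction
    of replicate pairs that agree; the result is the mean score,
    which reads directly as a probability: the chance that two
    randomly chosen replicates of a random sample agree."""
    scores = []
    for calls in replicates:
        pairs = list(combinations(calls, 2))
        scores.append(mean(a == b for a, b in pairs))
    return mean(scores)
```

Unlike a variance component on the link scale, this number needs no back-transformation to be interpretable.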

Where: Kirwan Hall 1313

Speaker: Prof. Soheil Feizi (Dept. of Computer Sci., Univ. of Maryland) -

Abstract: Generative Adversarial Networks (GANs) have become a popular method to learn a probability model from data. In this talk, I will provide an understanding of some of the basic issues surrounding GANs, including their formulation, generalization and stability, on a simple benchmark where the data has a high-dimensional Gaussian distribution. Even in this simple benchmark, the GAN problem has not been well understood, as we observe that existing state-of-the-art GAN architectures may fail to learn a proper generative distribution owing to (1) stability issues (i.e., convergence to bad local solutions or not converging at all), (2) approximation issues (i.e., having improper global GAN optimizers caused by inappropriate GAN loss functions), and (3) generalizability issues (i.e., requiring a large number of samples for training). In this setup, we propose a GAN architecture which recovers the maximum-likelihood solution and demonstrates fast generalization. Moreover, we analyze global stability of different computational approaches for the proposed GAN and highlight their pros and cons. Finally, we outline an extension of our model-based approach to design GANs in more complex setups than the considered Gaussian benchmark.

Where: Kirwan Hall 1313

Speaker: Dr. Yong Chen (Dept. of Biostatistics, Epidemiology and Informatics, Univ. of Pennsylvania) -

Abstract: Pleiotropic and polygenic effects, where the former means that a genetic locus affects multiple phenotypes and the latter refers to many loci affecting one trait, offer significant insights into the complex genotype-phenotype relationship. The increasing availability of medical and genomic data provides the opportunity to uncover such relationships by jointly modeling multiple phenotypes and genetic variants. In this talk, I will share a few recently developed statistical models for detecting pleiotropic and polygenic effects. I will discuss some key techniques and considerations in modeling large-scale genetic information. I will also share our analyses of a large-scale biobank-linked electronic health record (EHR) dataset, the Penn Medicine Biobank (PMBB), for studying complex genetic architectures and their impacts on multiple phenotypes.

Where: Kirwan Hall 1313

Speaker: Prof. Eric Slud (Dept. of Mathematics (Statistics Program)) -

Abstract: Using data from the Current Population Survey, we consider model-based estimates of population subgroups in different employment categories in two successive months (June and July 2017), cross-classified by education, age, and state. These cross-classified population counts are often rather small, too small to be well estimated by design-based survey methods, but seem amenable to 'small area estimation' models in which state- and other subgroup-effects are viewed as random. The random effects would be viewed differently in a Bayesian analysis and a frequentist one, although each of these different data analysis approaches provides useful information to the other. The talk will discuss computation, display and interpretation of model results, with particular reference to packages and computational tools in R. The theme of the data analysis is the contrast (by likelihood and prediction metrics) between fixed and random-effect models for area-level intercept effects.

Where: Kirwan Hall 1313

Speaker: Paul J. Smith (STAT Program) -

Where: Kirwan Hall 1313

Speaker: Prof. Debasis Sengupta (Indian Statistical Institute) -

Abstract: The close connection between global temperature variation and atmospheric carbon dioxide concentration has been central to the issue of climate change. The lag/lead between sets of longitudinal data on the two variables has implications for the causality of that connection. We consider this problem as one of curve registration. Most of the available solutions to this problem have been designed for the growth-data application, where the number of observations is small and the number of replicates is large. We argue that a different emphasis is needed for the paleoclimatic application. We provide a new method, which is able to pool local information without smoothing and to match sharp landmarks without manual identification. We prove the consistency of the proposed method under fairly general conditions. Simulation results show that the proposed method outperforms two existing methods. Applying the proposed method to Antarctic ice core data leads to some interesting conclusions.

Where: Kirwan Hall 1313

Speaker: Prof. Prakash Narayan (Dept. of Electrical and Computer Engineering, Univ. of Maryland) -

Abstract: This talk is based on joint work with Ph.D. student Ajaykrishnan Nageswaran.

A user's data is represented by a finite-valued random variable. Given a function of the data, a querier is required to recover, with at least a prescribed probability, the value of the function based on a query response provided by the user. The user devises the query response, subject to the recoverability requirement, so as to maximize privacy of the data from the querier. Privacy is measured by the probability of error incurred by the querier in estimating the data from the query response. We analyze single and multiple independent query responses, with each response satisfying the recoverability requirement, that provide maximum privacy to the user. Achievability schemes with explicit randomization mechanisms for query responses are given and their privacy compared with converse upper bounds. More stringent forms of privacy, viz. predicate privacy and list privacy, will also be mentioned.
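For the special case of certain recovery (recoverability probability 1), a natural baseline response is to reveal exactly f(X); the querier then guesses the most likely data value within each preimage of f. The sketch below (my illustration of that baseline, not the talk's optimal randomized mechanisms) computes the resulting probability of error, i.e., the privacy attained:

```python
from collections import defaultdict

def privacy_reveal_f(pmf, f):
    """Querier's error probability when the user's query response is
    exactly f(X). For each value v of f, the querier guesses
    argmax over {x : f(x) = v} of P(x); privacy is one minus the
    total probability mass captured by those best guesses."""
    best = defaultdict(float)
    for x, p in pmf.items():
        best[f(x)] = max(best[f(x)], p)
    return 1.0 - sum(best.values())

# Uniform data on {0, 1, 2, 3}; revealing the parity (two preimages
# of two equally likely values each) leaves the querier guessing.
uniform = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}
```

A constant (uninformative) function permits even more privacy, illustrating the trade-off between what must be recovered and what can be hidden.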


Where: Kirwan Hall 1313

Speaker: Dr. Amita Pal (Indian Statistical Institute) -

Abstract: Statistical Machine Learning involves an algorithmic approach, derived from statistical models and implemented through computers, for solving certain problems that arise in the domain of Artificial Intelligence. Machine learning algorithms build a mathematical model of sample data, known as "training data", in order to make predictions or decisions. Depending on whether the training data are labeled or unlabeled, a variety of supervised or unsupervised Statistical Machine Learning methods are available. An overview of the most widely used ones will be provided in this talk, and applications to the problems of automatic speaker recognition (ASR) and content-based image retrieval (CBIR) will be briefly described.

Where: Kirwan Hall 1313

Speaker: Dr. Song Yang (National Heart, Lung, and Blood Institute, NIH) -

Abstract: In clinical trials the primary outcome is often a composite one, defined as time to the first of two or more types of clinical events, such as cardiovascular death, a terminal event, and heart failure hospitalization, a non-terminal event. Thus if a patient experiences both types of events, then a terminal event occurring after a non-terminal event does not contribute to the primary outcome, even though the terminal event is more important than the non-terminal event. If a substantial number of patients experience multiple events, the power of the test for treatment effect may be reduced due to the omission of some of the available data. In the win ratio approach, priorities are given to the clinically more important events, and potentially all available data are used. However, the win ratio approach may have low power in detecting a treatment effect if the effect is predominantly on the non-terminal events. We propose event-specific win ratios obtained separately on the terminal and non-terminal events. These ratios can then be used to form global tests such as a linear combination test, a maximum test, or a chi-square test. In simulations these tests often improve on the power of the original win ratio test. Furthermore, when the terminal and non-terminal events experience differential treatment effects, the new tests often improve on the power of the log-rank test for the composite outcome. Thus, whether the treatment effect is primarily on the terminal events or the non-terminal events, the new tests based on the event-specific win ratios can be useful alternatives for testing treatment effect in clinical trials with time-to-event outcomes when different types of events are present.

We illustrate the new tests with the primary outcome in the trial Aldosterone Antagonist Therapy for Adults With Heart Failure and Preserved Systolic Function (TOPCAT), where the new tests all reject the null hypothesis of no treatment effect while the composite outcome approach used in TOPCAT does not.
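The building block of such tests can be sketched as follows (my illustration; this handles one event type with right censoring, and the event-specific proposal computes it separately for terminal and non-terminal events before combining the two statistics):

```python
def win_ratio(treatment, control):
    """Unadjusted win ratio for a single event type.
    Each subject is a (time, status) pair: status 1 means the event
    was observed at `time`, status 0 means the subject was censored
    then. In each treatment-control pair, treatment 'wins' when the
    control subject is known to fail first, 'loses' in the reverse
    case, and pairs whose ordering is undetermined by censoring are
    ties."""
    wins = losses = 0
    for t_time, t_status in treatment:
        for c_time, c_status in control:
            if c_status == 1 and c_time < t_time:
                wins += 1
            elif t_status == 1 and t_time < c_time:
                losses += 1
    return wins / losses if losses else float("inf")

# Event-specific use: call win_ratio once on terminal-event data and
# once on non-terminal-event data, then combine the two ratios (e.g.,
# a linear combination, a maximum, or a chi-square statistic).
```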