Statistics Archives for Academic Year 2018


Inference on weak signals in presence of an additive noise

When: Thu, September 6, 2018 - 3:30pm
Where: Kirwan Hall 1313
Speaker: Abram Kagan (Dept. of Math. (Statistics program)) - http://math.umd.edu/~amk


Semiparametric Transformation Probit Models with Current-Status Data

When: Thu, September 13, 2018 - 3:30pm
Where: Kirwan Hall 1313
Speaker: Dr. Jing Qin (National Institutes of Health) -


Generalized Group Testing: Some Results and Open Problems

When: Thu, September 20, 2018 - 3:30pm
Where: Kirwan Hall 1313
Speaker: Yaakov Malinovsky (Dept. of Math. and Stat., UMBC) -


Biostatistical Methods for Wearable and Implantable Technology (WIT)

When: Thu, September 27, 2018 - 3:30pm
Where: Kirwan Hall 1313
Speaker: Prof. Ciprian Crainiceanu (Dept. of Biostatistics, Johns Hopkins University) -
Abstract: Wearable and Implantable Technology (WIT) is rapidly changing the Biostatistics data analytic landscape due to its reduced bias and measurement error as well as the sheer size and complexity of the signals it produces. In this talk I will review some of the most used and useful sensors in the Health Sciences and the ever-expanding WIT analytic environment. I will describe the use of WIT sensors including accelerometers, heart monitors, and glucose monitors, and their combination with ecological momentary assessment (EMA). This rapidly expanding data eco-system is characterized by multivariate densely sampled time series with complex and highly non-stationary structures. I will introduce an array of scientific problems that can be answered using WIT and I will describe methods designed to analyze WIT data from the micro-scale (sub-second level) to the macro-scale (minute, hour, or day level).

An efficient procedure to combine biomarkers with limits of detection for risk prediction

When: Thu, October 11, 2018 - 3:30pm
Where: Kirwan Hall 1313
Speaker: Dr. Ruth Pfeiffer (National Cancer Institute, NIH) -
Abstract: Much research seeks biomarkers for diagnosing disease and understanding disease etiology.
As high-throughput technologies allow measuring multiple markers simultaneously, strategies for combining markers are needed, particularly if no single marker is highly discriminating. Statistical procedures to combine information from multiple markers need to account for correlations and for left and/or right censoring of the markers due to lower or upper limits of detection of the laboratory assays. We thus extend dimension reduction approaches, specifically likelihood-based sufficient dimension reduction, to regression or classification with censored predictors. Using an expectation maximization (EM) algorithm, we find linear combinations that contain all or most of the information contained in correlated markers for modeling and prediction of an outcome variable, while accounting for left and right censoring due to detection limits. We also allow for selection of important variables through penalization. We assess the performance of our methods extensively in simulations and apply them to data from a study conducted to assess associations of 47 inflammatory markers and lung cancer risk and build prediction models.

This is joint work with Diego Tomassi, Liliana Forzani and Efstathia Bura.


Accuracy of High-Dimensional Deep Learning Networks

When: Tue, October 16, 2018 - 3:30pm
Where: Kirwan Hall 1313 (note the change of day)
Speaker: Jason Klusowski (Dept. of Statistics, Rutgers University) -
Abstract: It has been experimentally observed in recent years that
multi-layer artificial neural networks have a surprising ability to
generalize, even when trained with far more parameters than
observations. Is there a theoretical basis for this? The best available
bounds on their metric entropy and associated complexity measures are
essentially linear in the number of parameters, which is inadequate to
explain this phenomenon. Here we examine the statistical risk (mean
squared predictive error) of multi-layer networks with $\ell^1$-type
controls on their parameters and with ramp activation functions (also
called lower-rectified linear units). In this setting, the risk is shown
to be upper bounded by $[(L^3 \log d)/n]^{1/2}$, where $d$ is the input
dimension to each layer, $L$ is the number of layers, and $n$ is the
sample size. In this way, the input dimension can be much larger than
the sample size and the estimator can still be accurate, provided the
target function has such $\ell^1$ controls and that the sample size is
at least moderately large compared to $L^3\log d$. The heart of the
analysis is the development of a sampling strategy that demonstrates the
accuracy of a sparse covering of deep ramp networks. Lower bounds show
that the identified risk is close to being optimal. This is joint work
with Andrew R. Barron.
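As a rough illustration of the scaling in the stated bound, a short calculation (with hypothetical values of L, d, and n, not taken from the talk) shows how the bound can remain small even when the input dimension greatly exceeds the sample size:

```python
import math

def risk_bound(L, d, n):
    """Upper bound [(L^3 log d) / n]^(1/2) on the statistical risk of an
    L-layer ramp network with l1-type controls on its parameters."""
    return math.sqrt((L ** 3) * math.log(d) / n)

# Input dimension d = 10^6 is ten times the sample size n = 10^5,
# yet the bound is still modest because d enters only logarithmically.
print(risk_bound(L=3, d=10**6, n=10**5))
```

The key design point of the bound is that d appears inside a logarithm while n appears polynomially, so accuracy is governed by n versus L^3 log d rather than n versus d.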

On the construction of unbiased estimators for the group testing problem

When: Thu, October 25, 2018 - 3:30pm
Where: Kirwan Hall 1313
Speaker: Dr. Gregory Hader (National Cancer Institute, NIH) -
Abstract: While the use of group testing as a tool for estimation has been on the rise in recent decades, classical problems such as the large bias of the maximum likelihood estimator continue to hinder the implementation of such methods. This has led to the development of many estimators minimizing bias and, most recently, an unbiased estimator based on sequential binomial sampling. Previous research, however, has focused heavily on the simple case where no misclassification is assumed and only one trait is to be tested. In this talk, we consider the problem of unbiased estimation in these broader areas, giving constructions of such estimators for several cases. We show that, outside of the standard case addressed previously in the literature, it is impossible to find any proper unbiased estimator, that is, an estimator giving only values in the parameter space. This is shown to hold generally under any binomial or multinomial sampling plans.
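The bias issue the talk addresses can be seen in a small simulation of the standard MLE of prevalence under binomial group testing (the parameter values below are illustrative, not from the talk):

```python
import random

def gt_mle(positive_pools, n_pools, pool_size):
    """MLE of prevalence p from binomial group testing:
    p_hat = 1 - (1 - T/n)^(1/k), where T of n pools of size k test positive."""
    return 1.0 - (1.0 - positive_pools / n_pools) ** (1.0 / pool_size)

def mean_mle(p, pool_size, n_pools, reps, rng):
    """Average the MLE over many replicated experiments."""
    estimates = []
    for _ in range(reps):
        # A pool tests positive iff at least one of its k members is positive
        # (no misclassification, matching the 'standard case' in the talk).
        t = sum(1 for _ in range(n_pools)
                if any(rng.random() < p for _ in range(pool_size)))
        estimates.append(gt_mle(t, n_pools, pool_size))
    return sum(estimates) / reps

rng = random.Random(0)
mean_est = mean_mle(p=0.05, pool_size=10, n_pools=20, reps=20000, rng=rng)
print(mean_est)  # averages noticeably above the true p = 0.05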


Sample-Size Re-estimation in Two-Stage Bioequivalence Trials

When: Thu, November 1, 2018 - 3:30pm
Where: Kirwan Hall 1313
Speaker: Eric Slud (Dept. of Mathematics (Statistics Program)) -
Abstract: Bioequivalence studies are an essential part of the evaluation of generic drugs. The most common in-vivo bioequivalence (BE) study design is the two-period two-treatment open label crossover design, with a metric of bioavailability such as the log of an approximate integral of the measured concentration of the drug in the blood (log AUC). The observation of interest for each subject is the difference between the measurements in the first and second periods of the crossover. When this quantity is assumed approximately normally distributed, the sample size for BE studies using the "Two One-sided Tests" approach is a function of the assumed mean difference, the assumed variance, equivalence margins, type I error rate, and desired power. Since BE studies are often rather small, there is a serious possibility that they are under-powered when the assumed variance turns out to be too small, and it would be preferable to have a blinded study design based on re-estimating the sample-size using only a preliminary estimate of variance calculated without unmasking the treatment labels. However, up to this time there has not been such a two-stage study design guaranteed to maintain experimentwise type I error rate in small samples, apart from inefficient procedures related to Stein's 1945 two-stage procedure.
In the research described in this talk, expanding on a portion of Meiyu Shen's 2015 UMD thesis, a two-stage sample-size re-estimation design will be presented. The idea, for second-stage sample size expressed as a function of first-stage estimated sample variance, is to calculate the second-stage rejection threshold in such a way that the experimentwise type I error probability maximized over the (unknown) true variance is equal to the prescribed alpha (usually 0.05). This idea is shown to be computationally and practically feasible in the setting of BE studies.
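The "Two One-sided Tests" decision rule mentioned in the abstract can be sketched as follows. This is a minimal version using a normal approximation in place of the t distribution, with hypothetical numeric inputs; actual BE analyses use t quantiles and the conventional log-scale margins of +/- log 1.25:

```python
import math
from statistics import NormalDist

def tost_bioequivalent(mean_diff, sd_diff, n, alpha=0.05,
                       margin=math.log(1.25)):
    """Two One-sided Tests (TOST) for bioequivalence on the log-AUC scale.
    mean_diff, sd_diff: mean and SD of within-subject period differences;
    n: number of subjects. BE is declared iff both one-sided tests reject
    at level alpha, equivalently iff the (1 - 2*alpha) confidence interval
    lies entirely within (-margin, margin)."""
    se = sd_diff / math.sqrt(n)
    z = NormalDist().inv_cdf(1 - alpha)          # normal approximation
    lower, upper = mean_diff - z * se, mean_diff + z * se
    return -margin < lower and upper < margin

# A small observed difference with moderate variability passes TOST ...
print(tost_bioequivalent(mean_diff=0.02, sd_diff=0.25, n=40))
# ... but the same difference with a much larger variance does not,
# which is exactly the under-powering risk motivating re-estimation.
print(tost_bioequivalent(mean_diff=0.02, sd_diff=1.0, n=40))
```

The second call illustrates why an under-assumed variance is dangerous: the confidence interval widens past the equivalence margins even though the point estimate is unchanged.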

This work is joint with Meiyu Shen and Estelle Russek-Cohen of FDA.


Calibrating Dependence between Random Elements

When: Thu, November 8, 2018 - 3:30pm
Where: Kirwan Hall 1313
Speaker: Abram Kagan (UMCP) -
Abstract: Properties that, in my opinion, should be satisfied by any natural measure of dependence will be presented.

The main goal is the construction of a calibrated scale of dependence between random elements X and Y, based on the dimension of the range of the projector from the subspace L^{2}(X) of L^{2}(X, Y) into L^{2}(Y).

For independent X, Y the range is one-dimensional and this property is characteristic of independence.

On Reproducibility of Research Findings, Boundary of Meaning and Type S Errors

When: Thu, November 29, 2018 - 3:30pm
Where: Kirwan Hall 1313
Speaker: Prof. Ron Kenett ( KPA Ltd and the Samuel Neaman Institute, Technion, Israel) -
Abstract: The question of reproducibility of research outcomes is now discussed in the open press, with a potential
negative impact on science as a whole. In dealing with this question from a statistical viewpoint, several
methodological advances have been proposed (such as FDR) and several clarification attempts have been published
(such as the ASA statement on the p-value). These attempts seem to only partially address the rising concerns of the
public and research funding agencies.
Kenett and Shmueli, in Clarifying the terminology that describes scientific reproducibility, Nature Methods, 12(8),
p. 699, 2015, review the terminology used in this debate and point to generalizability as a dimension that can clarify
which research claims should be scrutinized as reproducible. Generalizability is one of the eight dimensions of the
information quality (InfoQ) framework presented in Kenett and Shmueli, On information quality: The Potential of
Data and Analytics to Generate Knowledge, John Wiley and Sons, 2016.
In this talk, we expand on the idea of generalizability of research findings by referring to Type S errors proposed in
Gelman and Carlin (2014) [Beyond power calculations: Assessing Type S (sign) and Type M (magnitude) errors,
Perspectives on Psychological Science, Vol. 9(6), pp. 641–651]. The talk will first discuss methods for setting up a
boundary of meaning used in generalizing research findings. It will then show how Type S errors and directional
FDR methods fit with this generalizability approach. An example from research in localized colon cancer
diagnostics will be used to demonstrate the approach.

Mathematical Aspects of Machine Learning

When: Thu, December 6, 2018 - 3:30pm
Where: Kirwan Hall 1313
Speaker: Wojtek Czaja (Dept. of Mathematics, UMCP) -
Abstract: In recent years machine learning, with its emphasis on the predictive and
generative abilities of learning algorithms, has become a focus of attention of
researchers across many fields, including mathematics. In this talk we will
present some of the aspects of mathematical contributions to machine
learning, devoting our attention to approximation theory, optimization,
and convolutional networks.

Marginal-ancillary parametric family of distributions

When: Thu, February 7, 2019 - 3:30pm
Where: Kirwan Hall 1313
Speaker: Abram Kagan (Dept. of Mathematics (Statistics program)) - http://math.umd.edu/~amk
Abstract: A parametric family of distributions of a pair (X, Y) of random elements is called marginal-ancillary if the marginal distributions of X and Y are parameter free. Thus all the information on the parameter is contained in the dependence between X and Y. A lower bound for the Fisher information on the parameter is obtained in the case when the parameter is the correlation coefficient.

A new method for the analysis of categorical data with repeated measurements

When: Thu, February 14, 2019 - 3:30pm
Where: Kirwan Hall 1313
Speaker: Dr. Tinghui Yu (MedImmune) -
Abstract: The quality of an assay/survey with categorical output is usually characterized by its accuracy (bias) and precision (variation). To assess these parameters, one needs to perform a study testing a set of properly selected samples repeatedly under different conditions. A generalized linear mixed model (GLMM) can be fitted to the test results, providing control over the correlation structure within and between each design factor of concern. However, interpretation of the resulting GLMM, especially for the random effects, is not straightforward because the random effects are usually defined through a non-linear transformation (i.e., a link function). We introduce a new statistic to measure the variation in categorical data generated with multiple levels of control factors. The new method is based on the average agreement between the observed outcomes and hence offers intuitive probabilistic interpretations. It can be shown that this new statistic is closely related to the GLMM. We will also demonstrate the new method through simulations and examples with applications to clinical diagnostics.


Understanding Generative Adversarial Networks (GANs) in the Gaussian Setting

When: Thu, February 21, 2019 - 3:45pm
Where: Kirwan Hall 1313
Speaker: Prof. Soheil Feizi (Dept. of Computer Sci., Univ. of Maryland) -
Abstract: Generative Adversarial Networks (GANs) have become a popular method to learn a probability model from data. In this talk, I will provide an understanding of some of the basic issues surrounding GANs including their formulation, generalization and stability on a simple benchmark where the data has a high-dimensional Gaussian distribution. Even in this simple benchmark, the GAN problem has not been well-understood as we observe that existing state-of-the-art GAN architectures may fail to learn a proper generative distribution owing to (1) stability issues (i.e., convergence to bad local solutions or not converging at all), (2) approximation issues (i.e., having improper global GAN optimizers caused by inappropriate GAN loss functions), and (3) generalizability issues (i.e., requiring a large number of samples for training). In this setup, we propose a GAN architecture which recovers the maximum-likelihood solution and demonstrates fast generalization. Moreover, we analyze global stability of different computational approaches for the proposed GAN and highlight their pros and cons. Finally, we outline an extension of our model-based approach to design GANs in more complex setups than the considered Gaussian benchmark.
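In the Gaussian benchmark, the maximum-likelihood target that a well-behaved generative model should recover is simply the empirical mean and covariance of the data. A minimal sketch of that baseline (illustrative dimensions, parameters, and seed, not from the talk):

```python
import random

def mle_gaussian(samples):
    """ML estimates (mean vector, covariance matrix) for i.i.d.
    multivariate Gaussian data -- the reference solution a GAN
    should recover in the Gaussian benchmark."""
    n, d = len(samples), len(samples[0])
    mean = [sum(x[j] for x in samples) / n for j in range(d)]
    cov = [[sum((x[j] - mean[j]) * (x[k] - mean[k]) for x in samples) / n
            for k in range(d)] for j in range(d)]
    return mean, cov

rng = random.Random(1)
# Draw from an independent 2-D Gaussian with mean (1, -2), unit variances.
data = [[1 + rng.gauss(0, 1), -2 + rng.gauss(0, 1)] for _ in range(20000)]
mean, cov = mle_gaussian(data)
print(mean)  # close to [1, -2]
print(cov)   # close to the identity matrix
```

The point of the benchmark is that this closed-form answer exists, so any GAN architecture's stability and generalization can be judged against it exactly.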


Uncovering the genotype-phenotype relationship through multiple-outcome multivariate regression

When: Thu, February 28, 2019 - 3:30pm
Where: Kirwan Hall 1313
Speaker: Dr. Yong Chen (Dept. of Biostatistics, Epidemiology and Informatics, Univ. of Pennsylvania) -
Abstract: Pleiotropic and polygenic effects, where the former means that one genetic locus affects multiple
phenotypes and the latter refers to many loci affecting one trait, offer significant insights into
understanding the complex genotype-phenotype relationship. The increasing availability of
medical and genomic data provides the opportunity to uncover such relationships by jointly
modeling multiple phenotypes and genetic variants simultaneously. In this talk, I will share a
few recently developed statistical models for detecting pleiotropic and polygenic effects. I will
discuss some key techniques and considerations on modeling large-scale genetic information. I
will also share our analyses on a large-scale biobank linked electronic health record (EHR) data,
the Penn Medicine Biobank (PMBB), for studying complex genetic architectures and their
impacts on multiple phenotypes.

A Case Study in Comparing Bayes Estimated Fixed Effects vs Frequentist Random Effects

When: Thu, March 7, 2019 - 3:30pm
Where: Kirwan Hall 1313
Speaker: Prof. Eric Slud (Dept. of Mathematics (Statistics Program)) -
Abstract: Using data from the Current Population Survey, we consider model-based estimates of population subgroups in different employment categories in two successive months (June and July 2017), cross-classified by education, age, and state. These cross-classified population counts are often rather small, too small to be well estimated by design-based survey methods, but seem amenable to `small area estimation' models in which state- and other subgroup-effects are viewed as random. The random effects would be viewed differently in a Bayesian analysis and a frequentist one, although each of these different data analysis approaches provides useful information to the other. The talk will discuss computation, display and interpretation of model results, with particular reference to packages and computational tools in R. The theme of the data analysis is the contrast (by likelihood and prediction metrics) between fixed and random-effect models for area-level intercept effects.


Fisher Information, Mean Functions and Matrix Inequalities

When: Thu, March 14, 2019 - 3:30pm
Where: Kirwan Hall 1313
Speaker: Paul J. Smith (STAT Program) -


The lag-lead debate on global temperature and carbon dioxide: a statistical look through curve registration

When: Thu, March 28, 2019 - 3:30pm
Where: Kirwan Hall 1313
Speaker: Prof. Debasis Sengupta (Indian Statistical Institute) -
Abstract: The close connection between global temperature variation and atmospheric carbon dioxide concentration has been central to the issue of climate change. The lag/lead between sets of longitudinal data on the two variables has implications for the causality of that connection. We consider this problem as one of curve registration. Most of the available solutions for this problem have been designed for the growth data application, where the number of observations is small and the number of replicates is large. We argue that a different emphasis is needed for the paleoclimatic application. We provide a new method, which is able to pool local information without smoothing and to match sharp landmarks without manual identification. We prove the consistency of the proposed method under fairly general conditions. Simulation results show that the proposed method outperforms two existing methods. Application of the proposed method to Antarctic ice core data leads to some interesting conclusions.

Data Privacy for a $\rho$-Recoverable Function

When: Thu, April 18, 2019 - 3:30pm
Where: Kirwan Hall 1313
Speaker: Prof. Prakash Narayan (Dept. of Electrical and Computer Engineering, Univ. of Maryland) -
Abstract: This talk is based on joint work with Ph.D. student Ajaykrishnan Nageswaran.
A user's data is represented by a finite-valued random variable.
Given a function of the data, a querier is required to recover,
with at least a prescribed probability, the value of the function
based on a query response provided by the user. The user devises
the query response, subject to the recoverability requirement,
so as to maximize privacy of the data from the querier.
Privacy is measured by the probability of error incurred
by the querier in estimating the data from the query response.
We analyze single and multiple independent query responses,
with each response satisfying the recoverability requirement,
that provide maximum privacy to the user. Achievability schemes
with explicit randomization mechanisms for query responses are given
and their privacy compared with converse upper bounds.
More stringent forms of privacy, viz. predicate privacy and
list privacy will also be mentioned.


An Overview of Statistical Machine Learning Techniques with Applications

When: Tue, April 30, 2019 - 3:30pm
Where: Kirwan Hall 1313
Speaker: Dr. Amita Pal (Indian Statistical Institute) -
Abstract: Statistical Machine Learning involves an algorithmic approach, derived from statistical models, for solving certain problems that arise in the domain of Artificial Intelligence and can be implemented on computers. Machine learning algorithms build a mathematical model of sample data, known as "training data", in order to make predictions or decisions. Depending on whether the training data are labeled or unlabeled, a variety of supervised or unsupervised Statistical Machine Learning methods are available. An overview of the most widely used ones will be provided in this talk, and applications to the problems of automatic speaker recognition (ASR) and content-based image retrieval (CBIR) will be briefly described.

Event-Specific Win Ratios and Testing with Terminal and Non-Terminal Events

When: Thu, May 2, 2019 - 3:30pm
Where: Kirwan Hall 1313
Speaker: Dr. Song Yang (National Heart, Lung, and Blood Institute, NIH) -
Abstract: In clinical trials the primary outcome is often a composite one, defined as time to the first of two or more types of clinical events,
such as cardiovascular death, a terminal event, and heart failure hospitalization, a non-terminal event. Thus if a patient experiences both types of events,
then the terminal event after a non-terminal event does not contribute to the primary outcome, even though the terminal event is more important than the
non-terminal event. If there are a substantial number of patients who experience multiple events, the power of the test for treatment effect may be reduced due
to omission of some of the available data. In the win ratio approach, priorities are given to the clinically more important events, and potentially all available data are used. However, the win ratio approach may have low power in detecting a treatment effect if the effect is predominantly on the non-terminal events. We propose
event-specific win ratios obtained separately on the terminal and non-terminal events. These ratios can then be used to form global tests such as a linear combination
test, the maximum test, or a Chi-square test. In simulations these tests often improve the power of the original win ratio test. Furthermore, when the
terminal and non-terminal events experience differential treatment effects, the new tests often improve the power of the log-rank test for the
composite outcome. Thus whether the treatment effect is primarily on the terminal events or the non-terminal events, the new tests based on the event-specific win ratios can be useful alternatives for testing treatment effect in clinical trials with time-to-event outcomes when different types of events are present.
We illustrate the new tests with the primary outcome in the trial Aldosterone Antagonist Therapy for Adults With
Heart Failure and Preserved Systolic Function (TOPCAT), where the new tests all reject the null hypothesis of no treatment effect while the composite outcome approach used in TOPCAT did not.
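The event-specific construction builds on the same pairwise comparisons as the classical win ratio. A toy computation with fully observed event times and no censoring (a deliberate simplification; the actual methodology handles censored, comparable pairs) illustrates the prioritized comparison rule:

```python
def win_ratio(treatment, control):
    """Classical win ratio from all treatment-control pairs. Each subject
    is (terminal_time, nonterminal_time), both fully observed here.
    A pair is decided on the clinically more important terminal event
    first; only ties fall through to the non-terminal event."""
    wins = losses = 0
    for t_term, t_non in treatment:
        for c_term, c_non in control:
            if t_term != c_term:          # priority 1: terminal event
                wins += t_term > c_term
                losses += t_term < c_term
            elif t_non != c_non:          # priority 2: non-terminal event
                wins += t_non > c_non
                losses += t_non < c_non
    return wins / losses

treatment = [(10, 5), (8, 8)]
control = [(6, 3), (9, 2)]
print(win_ratio(treatment, control))  # 3 wins vs 1 loss -> 3.0
```

The event-specific proposal in the talk computes such ratios separately for the terminal and non-terminal events and then combines them into a global test, rather than letting the terminal event dominate every decided pair as above.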