Statistics Archives for Academic Year 2018

A General Form of the Stam Classical Inequality for the Fisher Information

When: Thu, October 19, 2017 - 3:30pm
Where: Kirwan Hall 1313
Speaker: Abram Kagan (STAT Program)

Scalable Sparse Cox's Regression for Large-Scale Survival Data via Broken Adaptive Ridge

When: Thu, October 26, 2017 - 3:30pm
Where: Kirwan Hall 1313
Speaker: Gang Li (Biostatistics and Biomathematics, UCLA)
Abstract: Advances in medical informatics tools and high-throughput biological experimentation are making large-scale data routinely accessible to researchers, administrators, and policy-makers. This "data deluge" poses new challenges and critical barriers for quantitative researchers, as existing statistical methods and software grind to a halt when analyzing these large-scale datasets, and it calls for methods that can readily fit large-scale data. In this talk I will present a new sparse Cox regression method for high-dimensional massive sample size survival data. Our method is an L0-based iteratively reweighted L2-penalized Cox regression, which inherits some appealing properties of both L0 and L2 penalized Cox regression while overcoming their limitations. We establish that it has an oracle property for selection and estimation and a grouping property for highly correlated covariates. We develop an efficient implementation for high-dimensional massive sample size survival data, which exhibits up to a 20-fold speedup over a competing method in our numerical studies. We also adapt our method to high-dimensional small sample size data. The performance of our method is illustrated using simulations and real data examples.
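
Illustrative note (not from the speaker): the idea of an iteratively reweighted L2 penalty can be sketched in a much simpler setting than Cox regression. The sketch below assumes least squares with an orthonormal design, so each coefficient updates independently, and it assumes the reweighting takes the form lam * b_j^2 / (previous b_j)^2, which is one common reading of "L0-based iteratively reweighted L2". The function name bar_orthonormal and the parameters lam and xi are hypothetical.

```python
import math


def bar_orthonormal(ols, lam, xi=1.0, n_iter=100):
    """Toy iteratively reweighted ridge for an orthonormal design.

    ols    : least-squares estimates z_j (one per covariate)
    lam    : penalty weight for the reweighted steps (must be > 0)
    xi     : ridge penalty used only for the starting value
    n_iter : number of reweighting steps

    Each step minimizes (b - z)^2 + lam * b^2 / b_prev^2 per coordinate,
    whose closed-form solution is b = z * b_prev^2 / (b_prev^2 + lam).
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    # Start from an ordinary ridge estimate (assumed initial value).
    beta = [z / (1.0 + xi) for z in ols]
    for _ in range(n_iter):
        beta = [z * b * b / (b * b + lam) for z, b in zip(ols, beta)]
    return beta


# Hypothetical least-squares estimates: two strong signals, two weak ones.
estimates = [3.0, -2.5, 1.0, 0.2]
print(bar_orthonormal(estimates, lam=1.0))
```

In this toy setting the weak coefficients are driven to zero while the strong ones settle at a nonzero fixed point, which mimics the selection behavior of an L0 penalty using only ridge-type updates. The Cox partial likelihood, the oracle and grouping properties, and the scalable implementation described in the abstract are outside the scope of this sketch.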

Autocorrelated kernel density estimation

When: Thu, November 9, 2017 - 3:30pm
Where: Kirwan Hall 1313
Speaker: Chris Fleming (Biology, UMCP)
Abstract: Non-parametric probability density function estimation is an important statistical problem in movement ecology, where researchers are interested in quantifying animal "space use" and delineating "home range" areas. The most common statistical approach to this problem is kernel density estimation, which traditionally assumes independently sampled data. Unfortunately, animal tracking data are invariably correlated in time (non-independent), as the continuity of movement dictates that each animal location is in close proximity to the next. Moreover, as GPS and battery technology improve, researchers are increasing their sampling rates commensurately, which increases the autocorrelation between sequential locations and further violates the assumption of independence. Here I describe the recent development of kernel density methods derived to accommodate autocorrelated data, which are currently being applied in the field of movement ecology.
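
Illustrative note (not from the speaker): the sketch below shows a standard one-dimensional Gaussian kernel density estimate with a normal-reference rule-of-thumb bandwidth, and then a crude heuristic that replaces the sample size with an AR(1)-style effective sample size when the data are autocorrelated. The heuristic is only a stand-in to show why correlated data call for a wider bandwidth; it is not the autocorrelated kernel density method described in the abstract. All function names and the simulated track are hypothetical.

```python
import math
import random
import statistics


def gaussian_kde(data, bandwidth):
    """Return a function x -> Gaussian kernel density estimate at x."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data
        )

    return density


def rule_of_thumb_bandwidth(data, n_eff=None):
    """Normal-reference bandwidth 1.06 * sd * n^(-1/5).

    If n_eff is given, it replaces the nominal sample size (heuristic).
    """
    n = len(data) if n_eff is None else n_eff
    return 1.06 * statistics.stdev(data) * n ** (-0.2)


def lag1_autocorrelation(data):
    """Sample lag-1 autocorrelation of a sequence."""
    mean = statistics.mean(data)
    num = sum((a - mean) * (b - mean) for a, b in zip(data, data[1:]))
    den = sum((a - mean) ** 2 for a in data)
    return num / den


def ar1_effective_sample_size(data):
    """Heuristic effective sample size n * (1 - rho) / (1 + rho).

    Assumes AR(1)-like dependence; negative rho is clamped to 0 and the
    result is kept at 1 or above.
    """
    rho = max(lag1_autocorrelation(data), 0.0)
    return max(1.0, len(data) * (1.0 - rho) / (1.0 + rho))


# Hypothetical 1-D "track": an AR(1) series standing in for sequential locations.
rng = random.Random(0)
track = [0.0]
for _ in range(199):
    track.append(0.9 * track[-1] + rng.gauss(0.0, 1.0))

h_iid = rule_of_thumb_bandwidth(track)
h_auto = rule_of_thumb_bandwidth(track, ar1_effective_sample_size(track))
print("bandwidth assuming independence:", h_iid)
print("bandwidth with effective sample size:", h_auto)
```

With strongly autocorrelated data the effective sample size is much smaller than the number of recorded locations, so the adjusted bandwidth is wider and the estimated density is less concentrated around the sampled path.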

Efficiency Requires Innovation

When: Thu, November 30, 2017 - 3:30pm
Where: Kirwan Hall 1313
Speaker: Abram Kagan (UMCP)

No Calculation When Observation Can Be Made

When: Thu, February 8, 2018 - 3:30pm
Where: Kirwan Hall 1313
Speaker: Dr. Tommy Wright (U.S. Census Bureau)
Abstract: For use in connection with the general and complete observations that would be known from a full census, Kiaer (1895, 1897) presents a purposive “Representative Method” for sampling from a finite population to provide “…more penetrating, more detailed, and more specialized surveys…” Many credit this method with laying seeds for current sampling methods used in producing official social and economic statistics. At a time when just about all official statistics were produced by censuses, Kiaer had much opposition, especially from statistician von Mayr, who said (a translation), “…no calculations when observations can be made.”

Neyman (1934) brought probability to this Representative Method using stratified random sampling. Probability makes it possible to express uncertainty about the results from the Representative Method and to say how good the results are. Neyman presents details for the well-known and widely used optimal allocation of the fixed sample size among the various strata to minimize sampling error. When sample sizes are rounded to integers from Neyman’s allocation, minimum sampling error is not guaranteed. Wright (2012) improves Neyman’s result with a simple derivation obtaining exact results that always yield integer sample size allocations while minimizing sampling error. Wright (2014, 2016, 2017) obtains exact integer optimal allocation results when there are mixed constraints on sample sizes for each stratum or when there are desired precision constraints. With exact optimal allocation, we demonstrate a decrease in needed sample size for the same precision using 2007 Economic Census data in the sample design for part of the subsequent Service Annual Survey.

We conclude by calling on the phrase “…no calculation when observation can be made” to muse about current world-wide considerations to make greater use of data from additional sources (e.g., administrative records, commercial data, big data…) to produce official statistics.
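
Illustrative note (not from the speaker): the allocation problem in the abstract can be sketched as follows. For stratified simple random sampling, the variance of the estimated population total is the sum over strata of N_h^2 * S_h^2 * (1/n_h - 1/N_h), and the classical Neyman allocation sets n_h proportional to N_h * S_h, which is generally not an integer. The sketch below computes an integer allocation greedily: every stratum starts with one unit, and each remaining unit goes to the stratum with the largest priority value N_h * S_h / sqrt(n_h * (n_h + 1)), which is the stratum where one more unit reduces the variance most. This is offered in the spirit of the exact allocation mentioned in the abstract and is not claimed to reproduce the speaker's algorithm; the function names and the example strata are hypothetical.

```python
import heapq
import math


def neyman_allocation(sizes, sds, n):
    """Classical (real-valued) Neyman allocation: n_h proportional to N_h * S_h."""
    weights = [N * S for N, S in zip(sizes, sds)]
    total = sum(weights)
    return [n * w / total for w in weights]


def stratified_variance(sizes, sds, alloc):
    """Variance of the estimated total under stratified SRS without replacement."""
    return sum(
        N * N * S * S * (1.0 / m - 1.0 / N)
        for N, S, m in zip(sizes, sds, alloc)
    )


def exact_integer_allocation(sizes, sds, n):
    """Greedy integer allocation that minimizes the variance above.

    Starts with one unit per stratum, then assigns each remaining unit to the
    stratum with the largest priority N_h * S_h / sqrt(n_h * (n_h + 1)).
    A stratum is never given more units than it contains.
    """
    H = len(sizes)
    if n < H or n > sum(sizes):
        raise ValueError("need len(sizes) <= n <= sum(sizes)")
    alloc = [1] * H
    # heapq is a min-heap, so priorities are stored negated.
    heap = [
        (-N * S / math.sqrt(2.0), h)
        for h, (N, S) in enumerate(zip(sizes, sds))
        if N > 1
    ]
    heapq.heapify(heap)
    for _ in range(n - H):
        _, h = heapq.heappop(heap)
        alloc[h] += 1
        m = alloc[h]
        if m < sizes[h]:
            heapq.heappush(heap, (-sizes[h] * sds[h] / math.sqrt(m * (m + 1)), h))
    return alloc


# Hypothetical strata: population sizes N_h and standard deviations S_h.
sizes = [50, 30, 20]
sds = [2.0, 5.0, 1.0]
print("Neyman (real-valued):", neyman_allocation(sizes, sds, 10))
print("Greedy integer allocation:", exact_integer_allocation(sizes, sds, 10))
```

The greedy step is valid because the variance reduction from adding one more unit to a stratum, N_h^2 * S_h^2 / (n_h * (n_h + 1)), shrinks as n_h grows, so taking the largest available reduction at each step yields a minimum-variance integer allocation for the fixed total sample size.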