Schedule

  • 9:00 AM - Information Bias and its Remedies - Jae-Kwang Kim (Iowa State University)
  • 10:00 AM - Selection Bias and its Remedies - Jae-Kwang Kim (Iowa State University)
  • 11:00 AM - Two Key Ideas in Survey Nonresponse: Response Propensity and Missing at Random - Roderick Little (University of Michigan)
  • 12 Noon - Lunch
  • 1:00 PM - Applying Non-Ignorable Missing Data Methods - Rebecca Andridge (Ohio State University)
  • 2:00 PM - Model-Based Imputation Methods for Small Area Estimation - Partha Lahiri (University of Maryland)
  • 3:00 PM - Information Projection Approach to Propensity Estimation for Handling Missing Data - Jae-Kwang Kim (Iowa State University)

Speakers

Jae-Kwang Kim, Iowa State University, Ames
Roderick Little, University of Michigan, Ann Arbor
Rebecca Andridge, Ohio State University, Columbus
Partha Lahiri, University of Maryland, College Park

Missing Data: Titles and Abstracts

Overview Lecture for Missing Data

Jae-Kwang Kim
LAS Dean's Professor (2020-2022)
Iowa State University, Ames, IA

Part 1. Information bias and its remedies
Part 2. Selection bias and its remedies
Information bias and selection bias are the main features of incomplete data. In the overview lecture, we cover these problems in a statistical framework and introduce statistical tools for handling incomplete data. Topics include measurement error models, observed likelihood, denoising, the mean score theorem, the EM algorithm, and sample likelihood. The overview lecture does not assume a strong background in statistics. A basic knowledge of undergraduate mathematical statistics should be enough to understand the main ideas.

Two Key Ideas in Survey Nonresponse: Response Propensity and Missing at Random

Roderick Little, University of Michigan, Ann Arbor
Richard D. Remington Distinguished University Professor of Biostatistics

I present recent work concerning two key ideas in survey nonresponse, namely response propensity and missing at random. I propose a specific definition of the response propensity that clarifies the conditioning, and weakened sufficient conditions for missing at random for asymptotic frequentist maximum likelihood inference. Finally, I show how an explicit modeling approach allows certain missing not at random mechanisms to be identified when there is post-stratification information.

Applying Non-Ignorable Missing Data Methods to U.S. Election Polling Data

Rebecca Andridge
Associate Professor, Biostatistics
The Ohio State University

Among the numerous explanations that have been offered for recent polling errors in U.S. pre-election surveys, selection bias due to non-ignorable partisan nonresponse, where the probability of responding to a poll is a function of the candidate preference that the poll is attempting to measure (even after conditioning on other relevant covariates used for weighting adjustments), has received relatively little attention in the academic literature. Under this type of selection mechanism, estimates of candidate preferences based on individual or aggregated polls may be subject to significant bias, even after standard weighting adjustments. Until recently, methods for measuring and adjusting for this type of non-ignorable nonresponse or selection bias have been unavailable. This talk describes a simple model-based index of the potential bias in estimates of population proportions (e.g., candidate preference) due to non-ignorable nonresponse/selection mechanisms. The index depends on an inestimable parameter that captures the amount of deviation from missingness at random; this parameter ranges from 0 to 1 and naturally lends itself to a sensitivity analysis. We analyze publicly available data from seven different pre-election polls conducted in seven different "swing" states by ABC and the Washington Post in 2020, and evaluate the ability of these new measures to detect bias in estimates of the proportion of likely voters in each state that will vote for President Trump. Using official election outcomes in each state as benchmarks and alternative data sources for estimating key characteristics of the likely voter populations in each state, we evaluate the ability of the new measure to 1) detect potential selection bias in these estimates, and 2) adjust for that bias when official pre-election polling estimates are produced.

Model-Based Imputation Methods for Small Area Estimation

Partha Lahiri
Director, Joint Program in Survey Methodology (JPSM),
Professor, JPSM and Department of Mathematics,
University of Maryland, College Park

There is a growing demand to produce reliable estimates of different characteristics of interest for small geographical areas (e.g., states) or domains obtained by a cross-classification of different demographic factors such as age, sex, and race/ethnicity. The information on the outcome variable(s) of interest often comes from a sample survey that targets reliable estimation for large areas (e.g., national level). In this talk, I will discuss how model-based imputation methods can be used to improve inferences about different small area or domain parameters. The proposed method essentially uses suitable statistical models to extract information from multiple data sources. We illustrate the proposed methodology in the context of election projection for small areas. The talk is based on collaborative research with UMD students Aditi Sen and Zhenyu Yue.

Information Projection Approach to Propensity Estimation for Handling Missing Data

Jae-Kwang Kim
LAS Dean's Professor (2020-2022)
Iowa State University, Ames, IA

Missing data is frequently encountered in practice. Propensity score estimation is a popular tool for handling such missingness. The propensity score is often developed using a model for the response probability, which can be subject to model misspecification. In this talk, we consider an alternative approach of estimating the inverse of the propensity scores using the density ratio function. By partitioning the sample into two groups based on the response status of the elements, we can apply the density ratio function estimation method and obtain the inverse propensity scores for nonresponse adjustment. Density ratio estimation can be carried out by applying the so-called maximum entropy method, which uses the Kullback-Leibler divergence measure under calibration constraints. By including only the covariates for the outcome regression models in the density ratio model, we can achieve efficient propensity score estimation. We further extend the proposed approach to the multivariate missing case. Some limited simulation studies are presented to compare with the existing methods.
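
For readers new to the idea, the following is a minimal Python sketch of density-ratio-based inverse propensity weighting. It is an illustration under stated assumptions, not the speaker's implementation. It assumes a log-linear density ratio model in the covariates themselves, fits it by exponential tilting (the maximum entropy solution under a mean calibration constraint) with a plain Newton iteration and no safeguards, and uses the identity 1/pi(x) = 1 + (n0/n1) r(x), where r is the ratio of the nonrespondent to the respondent covariate density. The function name density_ratio_ipw and the simulated data are hypothetical.

```python
import numpy as np


def density_ratio_ipw(x_resp, x_nonresp, n_iter=50, tol=1e-10):
    """Inverse propensity weights for respondents via exponential tilting.

    Hypothetical helper: assumes log r(x) is linear in x, where r is the
    density ratio of nonrespondents to respondents.
    """
    x_resp = np.asarray(x_resp, dtype=float)
    x_nonresp = np.asarray(x_nonresp, dtype=float)
    n0 = x_nonresp.shape[0]
    # Center respondent covariates at the nonrespondent mean (calibration target).
    z = x_resp - x_nonresp.mean(axis=0)
    lam = np.zeros(z.shape[1])

    def tilt(lam):
        # Normalized exponential-tilting weights on the respondents.
        eta = z @ lam
        w = np.exp(eta - eta.max())
        return w / w.sum()

    # Newton iteration on the convex dual log(sum_i exp(z_i' lam)).
    for _ in range(n_iter):
        w = tilt(lam)
        grad = w @ z                      # calibration residual
        if np.linalg.norm(grad) < tol:
            break
        zc = z - grad
        hess = (zc * w[:, None]).T @ zc   # weighted covariance of z
        lam = lam - np.linalg.solve(hess, grad)

    w = tilt(lam)
    # r(x_i) = n1 * w_i, so 1/pi_i = 1 + (n0/n1) * r(x_i) = 1 + n0 * w_i.
    return 1.0 + n0 * w


# Simulated illustration (all values below are made up for the sketch).
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=(n, 2))
y = 1.0 + x[:, 0] + x[:, 1] + rng.normal(size=n)
prob = 1.0 / (1.0 + np.exp(-(0.5 + 0.5 * x[:, 0])))
resp = rng.random(n) < prob               # True where y is observed

ipw = density_ratio_ipw(x[resp], x[~resp])
mean_ipw = (ipw @ y[resp]) / n            # uses y from respondents only
mean_cc = y[resp].mean()                  # unadjusted complete-case mean
print(f"complete-case mean: {mean_cc:.3f}, weighted mean: {mean_ipw:.3f}")
```

By construction, the weights sum to the full sample size and reproduce the full-sample covariate totals, which is the calibration property the abstract refers to. Restricting the calibration variables to the covariates of the outcome regression model, as the abstract suggests, would amount to passing only those columns to the function.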

About the Workshop

Missing data is an important area of statistics. The workshop will feature well-known experts who will give several lectures, including an introductory overview and presentations of recent research.