# Missing Data: Titles and Abstracts

### Overview Lecture for Missing Data

Jae-Kwang Kim

LAS Dean's professor (2020-2022)

Iowa State University, Ames, IA

Part 1. Information bias and its remedies

Part 2. Selection bias and its remedies

Information bias and selection bias are the main features of incomplete data. In the overview lecture, we cover these problems in the statistical framework and introduce statistical tools for handling incomplete data. Topics include measurement error models, observed likelihood, denoising, the mean score theorem, the EM algorithm, and sample likelihood. The overview lecture does not assume a strong background in statistics. A basic knowledge of undergraduate mathematical statistics should be enough to understand the main ideas in this overview lecture.

### Two key ideas in survey nonresponse: response propensity and missing at random

Roderick Little, University of Michigan, Ann Arbor

Richard D. Remington Distinguished University Professor of Biostatistics

I present recent work concerning two key ideas in survey nonresponse, namely response propensity and missing at random. I propose a specific definition of the response propensity that clarifies the conditioning, and weakened sufficient conditions for missing at random for asymptotic frequentist maximum likelihood inference. Finally, I show how an explicit modeling approach allows certain missing not at random mechanisms to be identified when there is post-stratification information.

### Applying Non-Ignorable Missing Data to U.S. Election Polling Data

Rebecca Andridge

Associate Professor, Biostatistics

The Ohio State University

Among the numerous explanations that have been offered for recent polling errors in U.S. pre-election surveys, selection bias due to non-ignorable partisan nonresponse, where the probability of responding to a poll is a function of the candidate preference that the poll is attempting to measure (even after conditioning on other relevant covariates used for weighting adjustments), has received relatively little focus in the academic literature. Under this type of selection mechanism, estimates of candidate preferences based on individual or aggregated polls may be subject to significant bias, even after standard weighting adjustments. Until recently, methods for measuring and adjusting for this type of non-ignorable nonresponse or selection bias have been unavailable. This talk describes a simple model-based index of the potential bias in estimates of population proportions (e.g., candidate preference) due to non-ignorable nonresponse/selection mechanisms. The index depends on an inestimable parameter that captures the amount of deviation from missingness at random; this parameter ranges from 0 to 1 and naturally lends itself to a sensitivity analysis. We analyze publicly available data from seven different pre-election polls conducted in seven different "swing" states by ABC and the Washington Post in 2020, and evaluate the ability of these new measures to detect bias in estimates of the proportion of likely voters in each state that will vote for President Trump. Using official election outcomes in each state as benchmarks and alternative data sources for estimating key characteristics of the likely voter populations in each state, we evaluate the ability of the new measure to 1) detect potential selection bias in these estimates, and 2) adjust for that bias when official pre-election polling estimates are produced.

### Model-based Imputation Methods For Small Area Estimation

Partha Lahiri

Director, Joint Program in Survey Methodology (JPSM),

Professor, JPSM and Department of Mathematics,

University of Maryland, College Park

There is a growing demand to produce reliable estimates of different characteristics of interest for small geographical areas (e.g., states) or domains obtained by a cross-classification of different demographic factors such as age, sex, and race/ethnicity. The information on the outcome variable(s) of interest often comes from a sample survey that targets reliable estimation for large areas (e.g., national level). In this talk, I will discuss how model-based imputation methods can be used to improve inferences about different small area or domain parameters. The proposed method uses suitable statistical models to extract information from multiple data sources. We illustrate the proposed methodology in the context of election projection for small areas. The talk is based on collaborative research with UMD students Aditi Sen and Zhenyu Yue.

### Information Projection Approach to Propensity Estimation for Handling Missing Data

Jae-Kwang Kim

LAS Dean's professor (2020-2022)

Iowa State University, Ames, IA

Missing data is frequently encountered in practice. Propensity score estimation is a popular tool for handling such missingness. The propensity score is often developed using a model for the response probability, which can be subject to model misspecification. In this talk, we consider an alternative approach of estimating the inverse of the propensity scores using the density ratio function. By partitioning the sample into two groups based on the response status of the elements, we can apply the density ratio function estimation method and obtain the inverse propensity scores for nonresponse adjustment. Density ratio estimation can be carried out by applying the so-called maximum entropy method, which uses the Kullback-Leibler divergence measure under calibration constraints. By including only the covariates for the outcome regression models in the density ratio model, we can achieve efficient propensity score estimation. We further extend the proposed approach to the multivariate missing case. Some limited simulation studies are presented to compare with the existing methods.
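As a rough illustration of the maximum entropy idea under calibration constraints, the Python sketch below computes exponential-tilting weights for respondents so that their weighted covariate means match the full-sample means. It is a simplified stand-in, not the estimator presented in the talk: the function name `entropy_calibration_weights`, the simulated data, the logistic response model, and the Newton solver are all assumptions made for this example.

```python
import numpy as np


def entropy_calibration_weights(x_resp, target_mean, max_iter=100, tol=1e-8):
    """Weights w_i proportional to exp(lam' x_i) on respondents such that the
    weighted mean of x_resp equals target_mean (hypothetical helper)."""
    z = x_resp - target_mean  # center covariates at the calibration target
    lam = np.zeros(z.shape[1])

    def objective(l):
        # Convex dual of the entropy problem: log-sum-exp of z @ l.
        s = z @ l
        m = s.max()
        return m + np.log(np.exp(s - m).sum())

    def weights(l):
        s = z @ l
        w = np.exp(s - s.max())  # subtract the max for numerical stability
        return w / w.sum()

    for _ in range(max_iter):
        w = weights(lam)
        grad = w @ z  # weighted mean of centered covariates
        if np.linalg.norm(grad) < tol:
            break
        zc = z - grad
        hess = (zc * w[:, None]).T @ zc  # weighted covariance matrix
        step = np.linalg.solve(hess, grad)
        # Backtracking keeps the Newton step from increasing the objective.
        t, f0 = 1.0, objective(lam)
        while objective(lam - t * step) > f0 + 1e-12 and t > 1e-8:
            t *= 0.5
        lam = lam - t * step
    return weights(lam)


# Simulated example (all values below are illustrative assumptions).
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=(n, 2))
y = 1.0 + 2.0 * x[:, 0] + x[:, 1] + rng.normal(size=n)
p = 1.0 / (1.0 + np.exp(-(0.5 + x[:, 0])))  # response depends on x only
responded = rng.uniform(size=n) < p
x_resp, y_resp = x[responded], y[responded]

w = entropy_calibration_weights(x_resp, x.mean(axis=0))

print("full-sample mean of y (unavailable in practice):", y.mean())
print("unweighted respondent mean:", y_resp.mean())
print("calibration-weighted respondent mean:", w @ y_resp)
```

In this sketch the weights play the role of normalized inverse propensity scores. Because the simulated outcome is linear in the calibrated covariates, the weighted respondent mean should land much closer to the full-sample mean than the unweighted one.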