Jae-Kwang Kim, Iowa State University, Ames
Roderick Little, University of Michigan, Ann Arbor
Rebecca Andridge, Ohio State University, Columbus
Partha Lahiri, University of Maryland, College Park
Part 1. Information bias and its remedies
Part 2. Selection bias and its remedies
Information bias and selection bias are the two main features of incomplete data. In the overview lecture, we place these problems in a statistical framework and introduce statistical tools for handling incomplete data. Topics include measurement error models, observed likelihood, denoising, the mean score theorem, the EM algorithm, and sample likelihood. The overview lecture does not assume a strong background in statistics; a minimal knowledge of undergraduate mathematical statistics should be enough to understand the basic ideas.
I present recent work concerning two key ideas in survey nonresponse, namely response propensity and missing at random. I propose a specific definition of the response propensity that clarifies the conditioning, and weakened sufficient conditions for missing at random under which asymptotic frequentist maximum likelihood inference remains valid. Finally, I show how an explicit modeling approach allows certain missing-not-at-random mechanisms to be identified when post-stratification information is available.
Among the numerous explanations that have been offered for recent polling errors in U.S. pre-election surveys, selection bias due to non-ignorable partisan nonresponse, where the probability of responding to a poll is a function of the candidate preference that the poll is attempting to measure (even after conditioning on other relevant covariates used for weighting adjustments), has received relatively little attention in the academic literature. Under this type of selection mechanism, estimates of candidate preferences based on individual or aggregated polls may be subject to significant bias, even after standard weighting adjustments. Until recently, methods for measuring and adjusting for this type of non-ignorable nonresponse or selection bias have been unavailable. This talk describes a simple model-based index of the potential bias in estimates of population proportions (e.g., candidate preference) due to non-ignorable nonresponse/selection mechanisms. The index depends on an inestimable parameter that captures the amount of deviation from missingness at random; this parameter ranges from 0 to 1 and naturally lends itself to a sensitivity analysis. We analyze publicly available data from seven different pre-election polls conducted in seven different "swing" states by ABC and the Washington Post in 2020, and evaluate the ability of these new measures to detect bias in estimates of the proportion of likely voters in each state that will vote for President Trump. Using official election outcomes in each state as benchmarks and alternative data sources for estimating key characteristics of the likely voter populations in each state, we evaluate the ability of the new measure to 1) detect potential selection bias in these estimates, and 2) adjust for that bias when official pre-election polling estimates are produced.
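The sensitivity-analysis idea behind such an index can be sketched in a simplified pattern-mixture form. The actual index in the talk additionally conditions on auxiliary proxy variables; here the function name, arguments, and the interpolation toward a single assumed extreme are hypothetical simplifications.

```python
def sensitivity_range(p_resp, response_rate, p_extreme=1.0, n_grid=11):
    """Pattern-mixture sensitivity analysis for a proportion (illustrative).

    phi = 0 corresponds to missing at random (nonrespondents look like
    respondents); phi = 1 lets the nonrespondent proportion deviate fully
    toward an assumed extreme value p_extreme. Returns the overall
    estimate at each grid value of the inestimable parameter phi.
    """
    estimates = []
    for i in range(n_grid):
        phi = i / (n_grid - 1)
        # nonrespondent proportion interpolates between MAR and the extreme
        p_nonresp = (1 - phi) * p_resp + phi * p_extreme
        est = response_rate * p_resp + (1 - response_rate) * p_nonresp
        estimates.append((phi, est))
    return estimates
```

With a low response rate the estimate moves substantially as phi varies, which is exactly why a parameter of this kind lends itself to a sensitivity analysis rather than point estimation.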
There is a growing demand to produce reliable estimates of different characteristics of interest for small geographical areas (e.g., states) or for domains obtained by cross-classifying demographic factors such as age, sex, and race/ethnicity. The information on the outcome variable(s) of interest often comes from a sample survey that targets reliable estimation for large areas (e.g., the national level). In this talk, I will discuss how model-based imputation methods can be used to improve inferences about small area or domain parameters. The proposed method uses suitable statistical models to extract information from multiple data sources. We illustrate the proposed methodology in the context of election projection for small areas. The talk is based on collaborative research with UMD students Aditi Sen and Zhenyu Yue.
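The core mechanism by which a model lets a small area borrow strength can be sketched as a composite estimator that shrinks an unreliable direct survey estimate toward a model-based (synthetic) prediction, the basic idea behind classical small-area models such as Fay-Herriot. This is a generic sketch with hypothetical inputs, not the specific methodology of the talk.

```python
def composite_estimate(direct, direct_var, synthetic, model_var):
    """Illustrative small-area composite (shrinkage) estimator.

    direct      -- direct survey estimate for the area
    direct_var  -- its sampling variance (large for small areas)
    synthetic   -- model-based prediction from auxiliary data sources
    model_var   -- between-area model error variance

    The weight on the direct estimate grows as its sampling variance
    shrinks, so large areas rely on the survey and small areas on the model.
    """
    gamma = model_var / (model_var + direct_var)  # shrinkage weight
    return gamma * direct + (1 - gamma) * synthetic
```

For example, an area with sampling variance four times the model variance keeps only one fifth of the weight on its direct estimate.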
Missing data is frequently encountered in practice. Propensity score estimation is a popular tool for handling such missingness. The propensity score is often developed using a model for the response probability, which can be subject to model misspecification. In this talk, we consider an alternative approach that estimates the inverse of the propensity score using the density ratio function. By partitioning the sample into two groups based on the response status of the elements, we can apply density ratio function estimation and obtain the inverse propensity scores for nonresponse adjustment. The density ratio estimate can be obtained by applying the so-called maximum entropy method, which uses the Kullback-Leibler divergence measure under calibration constraints. By including only the covariates of the outcome regression model in the density ratio model, we can achieve efficient propensity score estimation. We further extend the proposed approach to the multivariate missing-data case. Limited simulation studies are presented to compare the proposed approach with existing methods.
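The maximum-entropy idea under a calibration constraint amounts to exponential tilting: respondent weights proportional to exp(lambda * x) are chosen so that the weighted respondent covariate mean matches the full-sample mean. The sketch below solves for lambda with a simple gradient iteration for a single scalar covariate; it illustrates the calibration mechanics only, not the paper's estimator, and all names are illustrative.

```python
import math

def entropy_calibration_weights(x_resp, x_all_mean, n_all, n_iter=500, lr=0.5):
    """Illustrative maximum-entropy (exponential tilting) calibration.

    Finds lambda so that weights w_i proportional to exp(lambda * x_i)
    on the respondents reproduce the full-sample covariate mean -- the
    calibration constraint in a KL-divergence-based density ratio
    formulation. Returns weights scaled to sum to the full sample size.
    """
    lam = 0.0
    for _ in range(n_iter):
        w = [math.exp(lam * xi) for xi in x_resp]
        tot = sum(w)
        mean_w = sum(wi * xi for wi, xi in zip(w, x_resp)) / tot
        lam -= lr * (mean_w - x_all_mean)  # step toward the constraint
    w = [math.exp(lam * xi) for xi in x_resp]
    scale = n_all / sum(w)
    return [scale * wi for wi in w]  # weights sum to n_all
```

The resulting weights act as estimated inverse propensity scores: respondents with covariate values underrepresented relative to the full sample receive larger weights.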
Missing data is an important area of statistics. The workshop will feature well-known experts giving several lectures, including an introductory overview and presentations of current research.