Multiple imputation for nonresponse when estimating HIV prevalence using survey data, by Amos Chinomona and Henry Mwambi. The importance of modeling the sampling design in multiple. Multiple imputation to correct for nonresponse bias: application in a non-communicable disease risk factors survey. Missing data are often a problem in large-scale surveys, arising when a sampled unit does not respond to the entire survey (unit nonresponse) or to a particular.
Brownstone and Valletta (1996) show how multiple imputation techniques can be used to combine information from the validation and main surveys to estimate econometric models. A strong similarity between the MI and the follow-up estimates was found. Missing rates and multiple imputation (Cross Validated). Multiple imputation in the survey. Multiple imputation: Rubin's (1987a) multiple imputation methodology first requires a. The imputation of missing data is often a crucial step in the analysis of survey data. To provide the same complete data to all the analysts, you can impute the missing values by replacing them with reasonable nonmissing values.
Multiple Imputation and Its Application, by James R. Based on this model, one or multiple values are imputed for every missing value on every variable for every case. In the case of unit nonresponse, we often have limited data on nonrespondents. Issues of nonresponse and imputation in the Survey of Income and Program Participation, by Graham Kalton (University of Michigan), Daniel Kasprzyk (Department of Health and Human Services) and Robert Santos (University of Michigan): this paper describes the extent and nature of the household, person and item-level nonresponse that the U. We develop a method for constructing a monotone missing pattern that allows for imputation of. Multiple imputation was applied to both the patient and physician surveys, following similar schemes presented in detail in later sections. Multiple imputation in the Survey of Consumer Finances, Arthur B. This means that the imputation model can be optimized in such a way that it strongly predicts both the dependent variable to be imputed and the missingness process. A note on Bayesian inference after multiple imputation. Keywords: multiple imputation, unit nonresponse, missing data, complex surveys.
The results of a national fear-of-crime survey are compared with results following the use of different nonresponse correction procedures. A two-stage imputation procedure for item nonresponse in. The validity of results from multiple imputation depends on such modelling being done carefully and appropriately.
About this book: demonstrates how nonresponse in sample surveys and censuses can be handled by replacing each missing value with two or more multiple imputations. Multiple Imputation for Nonresponse in Surveys, by Donald B. MI is a statistical method for analyzing incomplete data. Determining sufficient number of imputations using variance of imputation variances. It has been used in, for example, the Fatality Analysis. Multiple imputation (MI) has become an extremely popular approach to handling missing data. How to obtain valid inference under unit nonresponse.
Using administrative records to impute for nonresponse e. Data from the 2012 NAMCS Physician Workflow mail survey. Kennickell, Senior Economist and Project Director, SCF, Mail Stop 153, Board of Governors of the Federal Reserve System, Washington, DC 20551. Simpler imputation methods, as well as more advanced methods such as fractional and multiple imputation, are considered. It should be routinely considered for imputing missing data. We compared naive estimates, weighted estimates, estimates after a thorough nonresponse follow-up and estimates after multiple imputation. A popular approach for implementing multiple imputation is sequential regression modeling, also called multiple imputation by chained equations (MICE).
Essentially, all surveys are likely to have some degree of nonresponse bias, but in many cases it occurs at a very small and thus a negligible i. Reference category for birth region is other country. Journal of the American Statistical Association 894. The CRAN Task View Multivariate has a section on missing data (not quite comprehensive; annotated by MM). mitools provides tools for multiple imputation, by Thomas Lumley (R Core), also author of survey. mice provides multivariate imputation by chained equations.
Some units do not respond at all, and others respond only to certain items. Multiple imputation for missing data in epidemiological. VIM provides methods for the visualisation as well as imputation of. Instead of filling in a single value for each missing value, Rubin's (1987) multiple imputation procedure replaces each missing value with a set of plausible values that represent the uncertainty about the right value to impute. However, the usual advice for multiple imputation for modest fractions of. Accuracy of five multiple imputation methods in estimating. However, the criteria for using a method often depend on the scale of the data, which in official statistics are typically a mixture of continuous, semicontinuous, binary, categorical and count variables.
After the imputation process, imputed values are often treated like originally observed values, leading to an underestimation of the variance in the data and hence to p-values that are too small. This paper focuses on imputation in the patient surveys. Summary of multiple imputation. It retains the advantages of single imputation: consistent analyses, use of the data collector's knowledge, rectangular data sets. It corrects the disadvantages of single imputation: it reflects uncertainty in imputed values and corrects inefficiency from imputing draws. Estimates have high efficiency for modest m, e. This issue is similar to bias that can result from unit nonresponse to surveys. The goal was to facilitate valid inferences when the data producer and the ultimately many end users of the data were distinct entities. Adjusting for nonresponse in the analysis stage might lead different analysts to use different, and inconsistent, adjustment methods. A distinction between iterative model-based methods, k-nearest neighbor methods and miscellaneous methods is made. Multiple imputation for missing data via sequential. Leone, University of Texas at Austin: regardless of the overall response rate, surveys that involve individual self.
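The variance underestimation described above is what the combining rules usually attributed to Rubin (1987) are meant to correct: each of the m completed data sets is analysed separately, and the between-imputation spread is added to the average within-imputation variance. The sketch below is a minimal illustration, not code from any of the works mentioned here. The function names pool_rubin and relative_efficiency and the toy numbers are assumptions made for this example.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m completed-data estimates and their variances.

    Returns (pooled estimate, within variance, between variance, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = mean(estimates)      # pooled point estimate
    u_bar = mean(variances)      # average within-imputation variance
    b = variance(estimates)      # between-imputation variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b  # total variance
    return q_bar, u_bar, b, t


def relative_efficiency(gamma, m):
    """Efficiency of m imputations relative to infinitely many,
    for a fraction of missing information gamma (variance scale)."""
    return 1.0 / (1.0 + gamma / m)


# Toy inputs (hypothetical): three completed-data estimates, each with variance 1.
q_bar, u_bar, b, t = pool_rubin([10.0, 12.0, 11.0], [1.0, 1.0, 1.0])
print(q_bar, u_bar, b, t)           # total variance exceeds the within variance
print(relative_efficiency(0.3, 5))  # about 0.94
```

Treating imputed values as observed corresponds to reporting only the within-imputation variance. The relative_efficiency function shows why a modest m can already be highly efficient when the fraction of missing information is moderate.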
Owing to the perceived sensitivity of this topic to some people, unit and item nonresponse rates in the SCF are substantial. By Stef van Buuren; it is also the basis of his book. Multiple imputation of family income and personal earnings in the National Health Interview Survey.
The Survey of Consumer Finances (SCF) focuses intensely on the details of households' finances. Commonly used multiple imputation methods work well for up to 30 to 40 variables from sample surveys and other data with similar rectangular, nonhierarchical properties, such as from surveys in. Fractional imputation (FI) is a relatively new method of imputation for handling item nonresponse in survey sampling.
The crucial difference is that weighting uses the same variables for correcting the entire dataset, whereas imputation models differ for every variable that is. Multiple imputation for interval estimation from simple random samples with ignorable nonresponse. High nonresponse rates are of theoretical and practical importance, because of the need to justify the high survey costs of random samples compared with convenience.
In this article, we examine the approximation of Gelman et al. Multiple imputation background: most large-scale surveys are subject to some nonresponse. The use of sample weights in hot deck imputation (NCBI). Survey of Income and Program Participation (SIPP) is likely to. Inferences for two-stage multiple imputation for nonresponse. Multiple imputation in a large-scale complex survey. Multiple imputation provides a useful strategy for dealing with data sets with missing values. Stata Bookstore: Multiple-Imputation Reference Manual. We selected a complete subsample of the STEPS survey data set and devised.
Multiple imputation was suggested by Rubin (1978) to overcome these problems. Andy Peytchev is a survey methodologist at RTI International, Research Triangle Park, NC, USA, and an instructor at the Odum Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA. Rubin, D. B. (1987). Multiple Imputation for Nonresponse in Surveys. New York: Wiley. ISBN 9780471655749. Imputation methods for handling item nonresponse in the. The authors illustrate the application of this technique using data from the HomeNet project. The flexibility of the MI procedure has prompted its use in a wide variety of applications. The size of the bias is a function of (a) the magnitude of the difference between respondents and nonrespondents and (b) the proportion of all sampled elements that are. Multiple imputation methodology for missing data, non. Multiple Imputation for Nonresponse in Surveys can serve as the basis for a course on survey methodology at the graduate level in a department of statistics, as I have done with earlier drafts at the University of Chicago and Harvard University. Also presents the background for Bayesian and frequentist theory.
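The dependence of nonresponse bias on the respondent/nonrespondent difference and on a sample proportion, mentioned above, can be made concrete with a standard deterministic identity for a sample mean. The symbols are introduced here for illustration and do not come from the quoted sources: n sampled units split into n_r respondents and n_m nonrespondents, with respondent mean \bar{y}_r, nonrespondent mean \bar{y}_m and full-sample mean \bar{y}_n.

```latex
n = n_r + n_m, \qquad
\bar{y}_n = \frac{n_r}{n}\,\bar{y}_r + \frac{n_m}{n}\,\bar{y}_m
\quad\Longrightarrow\quad
\bar{y}_r - \bar{y}_n = \frac{n_m}{n}\,\left(\bar{y}_r - \bar{y}_m\right)
```

Under this view, the error of the respondent mean vanishes either when nobody fails to respond or when respondents and nonrespondents have the same mean, which is why a low response rate alone does not imply a large bias.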
This study was carried out to use multiple imputation (MI) to correct for potential nonresponse bias in measurements of fasting blood glucose (FBS) in the non-communicable disease risk factors survey conducted in Iran in 2007. Multiple imputation for unit nonresponse versus weighting. Wiley Series in Probability and Mathematical Statistics. However, the multiple imputation procedure requires the user to model the distribution of each variable with missing values, in terms of the observed data. The paper introduces the reader new to the imputation literature to key ideas and methods. Multiple imputation has potential to improve the validity of medical research.
Missing data are a common feature in many areas of research, especially those involving survey data in biological, health and social sciences research. Some experts have advised that, unless the nonresponse rate is unusually high, more than five to ten imputations yield no additional gain in efficiency (Rubin, 1987). The basic idea is to impute missing values in y1 from a regression of the observed elements of y1 on y2, y3, etc. Two-tailed statistical significance tests were used. Considering a panel of BRDIS companies throughout the years 2008 to 20 linked to LBD data. Multiple imputation for nonresponse in surveys, by Rubin, 1987, 287 pages. Most of the analyses of the survey data are done taking a complete-case approach, that is, listwise deletion of all cases with missing values, assuming that missing values are missing completely at random (MCAR). Typically in large surveys, less than 100% of the sampled units respond fully to the survey. However, there are a large number of issues and choices to be considered when applying it. These surveys obtain information from participants regarding their cancer diagnosis and treatment, quality of life, experiences of care, care.
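The regression idea mentioned above (impute missing values of y1 from a regression of its observed elements on other variables) can be sketched for the simplest case of a single predictor y2. This is a simplified illustration under stated assumptions, not the procedure of any cited paper or package. The function names are hypothetical, missing values are marked with None, the data are invented, and the regression parameters are held fixed at their estimates rather than drawn from a posterior, so the result is not a "proper" multiple imputation in Rubin's sense.

```python
import random
from statistics import mean


def fit_simple_ols(x, y):
    """Least-squares fit of y on x; returns (intercept, slope, residual sd)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sigma = (rss / (len(x) - 2)) ** 0.5
    return intercept, slope, sigma


def impute_once(y1, y2, rng):
    """Fill each missing y1 with a regression prediction plus random noise."""
    observed = [(b, a) for a, b in zip(y1, y2) if a is not None]
    xs = [pair[0] for pair in observed]
    ys = [pair[1] for pair in observed]
    intercept, slope, sigma = fit_simple_ols(xs, ys)
    return [
        a if a is not None else intercept + slope * b + rng.gauss(0.0, sigma)
        for a, b in zip(y1, y2)
    ]


def multiply_impute(y1, y2, m=5, seed=0):
    """Return m completed copies of y1, each with independent noise draws."""
    rng = random.Random(seed)
    return [impute_once(y1, y2, rng) for _ in range(m)]


# Invented data: y2 fully observed, y1 missing for two units.
y2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y1 = [2.1, 3.9, None, 8.2, None, 12.1]
completed = multiply_impute(y1, y2, m=5, seed=0)
for row in completed:
    print([round(v, 2) for v in row])
```

Adding noise to each prediction is what makes the m completed data sets differ from one another. Filling in the fitted value alone would give m identical copies and understate the uncertainty about the missing values.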