Multiple imputation: how many data sets?
Early advice held that about 5 imputed data sets are enough. Over time, however, more and more examples emerged where that proved problematic, so more than 5 is now often advised. I have sometimes heard the number 20, but the real answer is that it depends on the exact problem: how many observations you have, how many missing values you want to impute, what the pattern of missingness is, how complicated the imputation model is, how complicated the substantive model is, how complicated the coefficients you want to interpret from that model are, and so on.
The more complicated the problem, the more imputations you need.
How many imputations do you need?
October 30, by Paul von Hippel

When using multiple imputation, you may wonder how many imputations you need. You often need more imputations to get replicable standard error (SE) estimates. But how many more? Read on.

A New Formula
I recently published a new formula (von Hippel) that estimates how many imputations M you need for replicable SE estimates. The formula depends on the FMI, the fraction of missing information.
The FMI is not the fraction of values that are missing; it is the fraction by which the squared SE would shrink if the data were complete. Standard MI software gives you an estimate of the FMI, but only after the data have been imputed and analyzed. For that reason, I recommend a two-step recipe (von Hippel): First, carry out a pilot analysis.
Impute the data using a convenient number of imputations. Estimate the FMI by analyzing the imputed data. Then use the estimated FMI to work out how many imputations you need. If you need more imputations than you had in the pilot, add those imputations and analyze the data again.

Software
The two-step recipe has been implemented in three popular data analysis packages. When there are multiple parameters, the software uses the highest FMI.

An earlier, shorter version of this post appeared on missingdata.
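The two-step recipe can be sketched in a few lines of Python. This is a minimal illustration, not the published software. It assumes the quadratic rule M = 1 + 0.5 * (FMI / CV)^2, which is how I understand the cited formula; the formula itself does not appear in the text above. Here CV is the target coefficient of variation of the SE estimate. The function names, the default CV of 0.05, and the choice to round up are all assumptions made for this example.

```python
import math


def imputations_needed(fmi, cv_se=0.05):
    """Estimate the number of imputations M from the FMI.

    Assumes the quadratic rule M = 1 + 0.5 * (fmi / cv_se) ** 2.
    fmi   : estimated fraction of missing information, in [0, 1)
    cv_se : target coefficient of variation of the SE estimate
            (0.05 is an illustrative default, not a recommendation)
    """
    if not 0.0 <= fmi < 1.0:
        raise ValueError("fmi must be in [0, 1)")
    if cv_se <= 0:
        raise ValueError("cv_se must be positive")
    # Round up so the result is a whole number of imputations.
    return math.ceil(1 + 0.5 * (fmi / cv_se) ** 2)


def two_step(pilot_m, pilot_fmis, cv_se=0.05):
    """Second step of the recipe, given results from a pilot analysis.

    pilot_m    : number of imputations used in the pilot
    pilot_fmis : FMI estimates from the pilot, one per parameter
    Returns (total imputations needed, extra imputations to add).
    """
    # With several parameters, plan for the highest FMI.
    worst_fmi = max(pilot_fmis)
    needed = imputations_needed(worst_fmi, cv_se)
    extra = max(0, needed - pilot_m)
    return needed, extra


# Hypothetical pilot: 20 imputations, three parameters.
needed, extra = two_step(pilot_m=20, pilot_fmis=[0.10, 0.35, 0.20])
print(f"need {needed} imputations in total, add {extra} more")
```

This sketch plugs in the pilot's point estimate of the FMI. Because a pilot FMI is itself noisy, a more careful version would plan for a somewhat larger FMI than the one observed.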
The rule of thumb that the number of imputations should be similar to the percentage of incomplete cases has now become the de facto standard, especially in medical applications.
One potential difficulty might be that the percentage of complete cases is sensitive to the number of variables in the data. If we extend the active dataset by adding more variables, then the percentage of complete cases can only drop. An alternative would be to use the average missing data rate as a less conservative estimate.
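The difference between the two candidate measures can be made concrete with a small example. The sketch below uses hypothetical toy data, with None marking a missing value, and a helper name invented for illustration. It computes the percentage of incomplete cases and the average missing data rate, then shows how adding one more variable changes each.

```python
def missingness_summary(rows):
    """Return (percent incomplete cases, average percent missing).

    rows: list of equal-length lists; None marks a missing value.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    n_cells = sum(len(r) for r in rows)
    # A case is incomplete if any of its values is missing.
    incomplete = sum(1 for r in rows if any(v is None for v in r))
    missing_cells = sum(1 for r in rows for v in r if v is None)
    pct_incomplete = 100.0 * incomplete / len(rows)
    avg_missing = 100.0 * missing_cells / n_cells
    return pct_incomplete, avg_missing


# Toy data: 4 cases, 3 variables, 2 missing values.
base = [
    [1, 2, None],
    [4, None, 6],
    [7, 8, 9],
    [10, 11, 12],
]

# Same cases with a fourth variable that is missing for the third case.
extra_column = [0, 0, None, 0]
wider = [row + [x] for row, x in zip(base, extra_column)]

for label, data in (("3 variables", base), ("4 variables", wider)):
    pct_incomplete, avg_missing = missingness_summary(data)
    print(f"{label}: {pct_incomplete:.1f}% incomplete cases, "
          f"{avg_missing:.1f}% values missing on average")
```

In this toy example the added variable moves the share of incomplete cases from half to three quarters of the cases, while the average missing data rate changes only slightly, which is why it gives a less conservative number.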
Imputing a dataset in practice often involves trial and error to adapt and refine the imputation model.