C.2 Multiple imputation

Package mice implements a method called Multivariate Imputation by Chained Equations (MICE). The main function, mice(), tries to select an appropriate imputation method for each variable based on its type (numeric, binary, unordered or ordered factor, etc.). In general, the advantage of using mice() with discrete data is that several of its methods actually return discrete values rather than continuous estimates.

Check the manual page for mice (type ?mice in the console) to see the 25 methods that are available. That page also links to a number of vignettes that explain in detail all the functions the package has to offer.
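Each method name accepted by mice() corresponds to a function named mice.impute.<method> in the package (for example mice.impute.pmm for method "pmm"). As a small sketch, assuming mice is installed, the available methods can also be listed from the console; the exact list and its length depend on the installed version of mice.

```r
library(mice)

# Imputation methods are exported as functions named mice.impute.<method>
imp_functions <- ls("package:mice", pattern = "^mice\\.impute\\.")

# Strip the prefix to get the names accepted by the method argument of mice()
method_names <- sub("^mice\\.impute\\.", "", imp_functions)
print(method_names)
```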

In this vignette, we will focus on a simple demonstration of just a few of the methods in mice().

Auto-select method

We can simply pass our data set to mice(), which will analyse the variables and select an appropriate imputation method for each of them.

# auto choice by mice algorithm (imp.mice is needed by complete() below)
imp.mice <- mice::mice(df_vars, printFlag = FALSE)

For df_vars the algorithm chooses methods pmm, polyreg and polr; the choice per variable is stored in imp.mice$method:
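To see which method mice() picks for each type of variable without running the full imputation, a dry run with maxit = 0 is enough. The sketch below uses a small made-up data frame (toy, with hypothetical columns x, colour and grade) instead of df_vars, which is not shown here.

```r
library(mice)

set.seed(42)
n <- 100
# Hypothetical toy data: one numeric, one unordered and one ordered factor
toy <- data.frame(
  x      = rnorm(n),
  colour = factor(sample(c("red", "green", "blue"), n, replace = TRUE)),
  grade  = factor(sample(c("low", "mid", "high"), n, replace = TRUE),
                  levels = c("low", "mid", "high"), ordered = TRUE)
)
# Knock out 10 values in every column
for (v in names(toy)) toy[sample(n, 10), v] <- NA

# maxit = 0: set everything up without iterating, enough to inspect the methods
ini <- mice(toy, maxit = 0, printFlag = FALSE)
print(ini$method)
```

The mapping from variable type to method comes from the defaultMethod argument of mice(), whose default c("pmm", "logreg", "polyreg", "polr") covers numeric data, binary factors, unordered factors with more than two levels and ordered factors with more than two levels, respectively.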


By default mice() creates m = 5 imputed data sets, each obtained after maxit = 5 iterations of the chained equations. If you inspect the imp.mice object you can see it is a list with several fields; the field imp is another list with fields named after the columns in our data set. Each of these fields holds the imputed values for that variable, with one row per missing value and one column per imputed data set.
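A quick way to see the difference between m (number of imputed data sets) and maxit (number of iterations) is to inspect a fitted object. The sketch below uses the small nhanes data set that ships with mice instead of df_vars, and simply spells out the default values of m and maxit.

```r
library(mice)

# nhanes: small example data shipped with mice, with missing values
imp <- mice(nhanes, m = 5, maxit = 5, printFlag = FALSE, seed = 123)

imp$m          # number of imputed data sets
imp$iteration  # number of iterations performed
names(imp$imp) # one entry per column of nhanes

# One row per missing bmi value, one column per imputed data set
dim(imp$imp$bmi)
```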


To obtain a completed data set, in which the missing values are replaced by imputed values, we call the function complete(). By default it uses the first of the 5 imputed data sets.

out.auto <- mice::complete(imp.mice)

Check the complete() manual entry (?mice::complete) for other options, such as the action argument.
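The sketch below, again on the bundled nhanes data rather than df_vars, shows a few values of the action argument: the default (first) imputed data set, a specific data set selected by number, and all data sets stacked in long format, which adds the columns .imp and .id.

```r
library(mice)

imp <- mice(nhanes, printFlag = FALSE, seed = 123)

# Default: the first imputed data set, with no missing values left
first <- complete(imp)
# Any other imputed data set can be selected by its number
third <- complete(imp, action = 3)
# "long" stacks all m data sets and adds the columns .imp and .id
stacked <- complete(imp, action = "long")

anyNA(first)        # FALSE: every missing value has been filled in
table(stacked$.imp) # nrow(nhanes) rows for each of the 5 imputations
```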

Classification & regression trees

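We can also set a single imputation method for all variables, here one based on classification and regression trees (cart). Like random forest imputation (rf), it imputes values that occur in the observed data, so discrete variables receive discrete values; the two methods will not, in general, produce identical imputations.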

library(mice)

# CART and RF both impute values taken from the observed data,
# so discrete variables keep discrete values
imp.cart <- mice(df_vars, method = "cart", printFlag = FALSE)
out.cart <- complete(imp.cart)

# imp.rf  <- mice(df_vars, method = "rf", printFlag = FALSE)
# out.rf  <- complete(imp.rf)
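Both cart and rf impute by drawing from observed cases that fall in the same tree leaf, which is why they return values that already occur in the data. A minimal sketch checking this for cart on the nhanes example data shipped with mice (rf is left out here because it relies on an additional random forest package being installed):

```r
library(mice)

# CART imputation on the nhanes example data shipped with mice
imp_cart <- mice(nhanes, method = "cart", printFlag = FALSE, seed = 1)

# For every incomplete variable, check whether all imputed values
# already occur among the observed values of that variable
vars <- c("bmi", "hyp", "chl")
ok <- vapply(vars, function(v) {
  imputed  <- unlist(imp_cart$imp[[v]])
  observed <- nhanes[[v]][!is.na(nhanes[[v]])]
  all(imputed %in% observed)
}, logical(1))
print(ok)
```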