Imputation and missing indicators for handling missing data in the development and deployment of clinical prediction models: A simulation study


Abstract

Background: In clinical prediction modelling, missing data can occur at any stage of the model pipeline: development, validation or deployment. Multiple imputation is often recommended yet is challenging to apply at deployment; for example, the outcome cannot be included in the imputation model, as is recommended under multiple imputation. Regression imputation uses a fitted model to impute the predicted value of missing predictors from observed data, and could offer a pragmatic alternative at deployment. Moreover, the use of missing indicators has been proposed to handle informative missingness, but it is currently unknown how well this method performs in the context of clinical prediction models.

Methods: We simulated data under various missing data mechanisms to compare the predictive performance of clinical prediction models developed using both imputation methods. We consider deployment scenarios where missing data are permitted or prohibited, imputation models that use or omit the outcome, and clinical prediction models that include or omit missing indicators. We assume that the missingness mechanism remains constant across the model pipeline. We also apply the proposed strategies to critical care data.

Results: With complete data available at deployment, our findings were in line with existing recommendations: the outcome should be used to impute development data when using multiple imputation and omitted under regression imputation. When missingness is allowed at deployment, omitting the outcome from the imputation model at development was preferred. Missing indicators improved model performance in many cases but can be harmful under outcome-dependent missingness.

Conclusion: We provide evidence that commonly taught principles of handling missing data via multiple imputation may not apply to clinical prediction models, particularly when data can be missing at deployment. We observed comparable predictive performance under multiple imputation and regression imputation. The performance of the missing data handling method must be evaluated on a study-by-study basis, and the most appropriate strategy for handling missing data at development should consider whether missing data are allowed at deployment. Some guidance is provided.
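To make the two techniques named in the abstract concrete, the following is a minimal sketch of regression imputation combined with a missing indicator, written with NumPy only. It is not the authors' simulation code: the function names (`fit_regression_imputer`, `apply_regression_imputer`), the simulated predictors `x1` and `x2`, and the 30% missing-completely-at-random rate are all hypothetical choices made for illustration. The imputation model deliberately omits the outcome, since the outcome is unavailable at deployment.

```python
import numpy as np


def fit_regression_imputer(X_obs, x_target):
    """Fit a linear imputation model for x_target from X_obs.

    Only rows where x_target is observed are used. The outcome is
    NOT part of X_obs (assumption: it is unavailable at deployment).
    """
    complete = ~np.isnan(x_target)
    design = np.column_stack([np.ones(complete.sum()), X_obs[complete]])
    coef, *_ = np.linalg.lstsq(design, x_target[complete], rcond=None)
    return coef  # [intercept, slope_1, ...]


def apply_regression_imputer(coef, X_obs, x_target):
    """Fill missing x_target with its predicted value.

    Returns the imputed predictor and a 0/1 missing indicator that
    can be added to the prediction model as an extra covariate.
    """
    missing = np.isnan(x_target)
    design = np.column_stack([np.ones(len(x_target)), X_obs])
    predicted = design @ coef
    imputed = np.where(missing, predicted, x_target)
    return imputed, missing.astype(int)


# Hypothetical development data: x2 depends on x1 and is partly missing.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.5, size=n)
x2_dev = x2.copy()
x2_dev[rng.random(n) < 0.3] = np.nan  # illustrative MCAR mechanism

coef = fit_regression_imputer(x1.reshape(-1, 1), x2_dev)

# Hypothetical deployment: the first new patient has x2 missing.
x1_new = np.array([[1.0], [-0.5]])
x2_new = np.array([np.nan, 0.2])
x2_imputed, x2_missing = apply_regression_imputer(coef, x1_new, x2_new)
```

Because the fitted coefficients are stored, the same imputation model can be applied to a single new patient at deployment without access to the development data. Whether the missing indicator should also enter the prediction model depends on the missingness mechanism; the abstract notes it can be harmful under outcome-dependent missingness.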


Citation (APA)

Sisk, R., Sperrin, M., Peek, N., van Smeden, M., & Martin, G. P. (2023). Imputation and missing indicators for handling missing data in the development and deployment of clinical prediction models: A simulation study. Statistical Methods in Medical Research, 32(8), 1461–1477. https://doi.org/10.1177/09622802231165001
