Missing data in clinical trials can occur when participants fail to attend site visits, drop out of the study, or when off-site data capture technologies fail. This creates problems for drug developers, as gaps appear in the trial analysis data needed to determine the efficacy and safety of the trial drug.
The approach to handling missing data in clinical trials has evolved over the past twenty years, particularly regarding methods for incorporating missing data to produce more comprehensive results. Issues surrounding missing data are of particular importance due to the risks of introducing bias, reducing statistical power, creating inefficiencies and inflating the false-positive (Type I error) rate. As part of any methodology for missing data in clinical trials, researchers must consider how to handle missing data effectively from the outset.
The International Conference on Harmonisation (ICH) E9 guideline (1998) addresses the complexities of missing data and acknowledges that there is no ‘gold standard’ for managing it, given that studies have unique designs and varying measurement characteristics. The guideline recommends that sensitivity analyses and the handling of missing data be predefined in the protocol. Additionally, the reasons for any study withdrawal should be recorded, to protect trial integrity when addressing missing data throughout the course of the clinical trial.
Due to the complexities of handling missing data, the ICH guidelines give additional support by defining the use of estimands and the handling of intercurrent events in the R1 Addendum to E9 (2017). The addendum highlights that missing data challenges and proposed methodologies must be understood and addressed according to the chosen estimand at the time of trial design.
Statisticians play a key role in trial design, conduct and overall analysis of studies, but also contribute to patient retention strategies to help prevent missing data in the trial. Additionally, using historical data through Bayesian study designs can help identify missing data patterns and suggest plausible approaches.
For repeated measures analyses, which are commonly used in clinical trials, many techniques can be defined at the time of writing the Statistical Analysis Plan (SAP) to outline how to handle missing data effectively. These methodologies vary based on study design and regulatory expectations, ensuring missing data does not compromise statistical validity.
Imputation techniques estimate missing values based on available data. Below are some widely used methods in clinical trials:
Complete Case Analysis (CCA) is the simplest approach, where only subjects with no missing data are included in the final analysis. Although this method is easy to implement, it can lead to a high rate of subjects and data being excluded, and fails to address the underlying reasons why a subject has missing data. For instance, in a quality of life (QoL) questionnaire where many subjects in one arm withdraw due to toxicity, excluding those subjects would bias the comparison towards patients who tolerated treatment well.
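As a minimal sketch of complete case analysis (using pandas, with hypothetical QoL data on the 0-6 scale), the method amounts to dropping any subject with a missing value:

```python
import pandas as pd
import numpy as np

# Hypothetical QoL scores (0-6 scale) across three visits; NaN marks missing data.
df = pd.DataFrame({
    "subject": [1, 2, 3, 4],
    "visit1": [5.0, 4.0, 3.0, 6.0],
    "visit2": [4.5, np.nan, 3.5, 5.5],
    "visit3": [4.0, np.nan, np.nan, 5.0],
})

# Complete Case Analysis: keep only subjects with no missing values anywhere.
complete = df.dropna()
print(complete["subject"].tolist())  # subjects 2 and 3 are excluded entirely
```

Note how half of the subjects are lost from the analysis even though subject 3 contributed two of three visits, illustrating the high exclusion rate described above.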
The Last Observation Carried Forward (LOCF) method is often used in longitudinal studies. It replaces missing values with each participant’s most recent observed measurement, assuming the response remains unchanged thereafter. While LOCF is frequently employed for its simplicity, it has been criticised for providing an incomplete picture of safety and efficacy profiles, potentially leading to ambiguous final results.
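A hypothetical sketch of LOCF on long-format data, assuming pandas:

```python
import pandas as pd
import numpy as np

# Hypothetical long-format QoL data; subject 1's visit-3 score is missing.
df = pd.DataFrame({
    "subject": [1, 1, 1, 2, 2, 2],
    "visit":   [1, 2, 3, 1, 2, 3],
    "score":   [5.0, 4.0, np.nan, 3.0, 3.5, 4.0],
})

# LOCF: within each subject, carry the last observed value forward in visit order.
df = df.sort_values(["subject", "visit"])
df["score_locf"] = df.groupby("subject")["score"].ffill()
```

Subject 1's missing visit-3 score is filled with the visit-2 value of 4.0, embodying the assumption that the response is unchanged after the last observation.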
Similar to LOCF, the Baseline Observation Carried Forward (BOCF) imputation method also applies to longitudinal data. Here, the baseline value is carried forward, assuming no change, positive or negative, after study entry. This conservative approach may underestimate true treatment effects and should be used with caution in efficacy analyses.
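A BOCF sketch under the same assumptions (hypothetical data; visit 0 is taken as baseline):

```python
import pandas as pd
import numpy as np

# Hypothetical data: visit 0 is baseline; the final score is missing.
df = pd.DataFrame({
    "subject": [1, 1, 1, 1],
    "visit":   [0, 1, 2, 3],
    "score":   [3.0, 4.0, 5.0, np.nan],
})

# BOCF: replace each subject's missing values with that subject's baseline score.
baseline = df[df["visit"] == 0].set_index("subject")["score"]
df["score_bocf"] = df["score"].fillna(df["subject"].map(baseline))
```

Although the subject improved from 3.0 to 5.0 while observed, the missing final visit is imputed back at the baseline value of 3.0, showing how BOCF can understate treatment effects.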
The Worst Observation Carried Forward (WOCF) imputation method again applies to longitudinal data. It is also conservative, particularly in safety analyses, as it replaces missing values with the participant’s worst recorded outcome, assuming any unobserved measurements reflect a ‘worst-case’ scenario. While it helps protect against overly optimistic results, WOCF can exaggerate negative outcomes and fail to reflect real patient experiences.
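A WOCF sketch (hypothetical data on the 0-6 scale, where a lower score is worse, so the 'worst' observation is each subject's minimum):

```python
import pandas as pd
import numpy as np

# Hypothetical QoL data; the final visit is missing.
df = pd.DataFrame({
    "subject": [1, 1, 1, 1],
    "visit":   [1, 2, 3, 4],
    "score":   [4.0, 2.5, 3.5, np.nan],
})

# WOCF: fill missing values with the subject's worst (minimum) observed score.
worst = df.groupby("subject")["score"].transform("min")
df["score_wocf"] = df["score"].fillna(worst)
```

The missing visit is imputed as 2.5 even though the subject had recovered to 3.5 by visit 3, illustrating how WOCF can exaggerate negative outcomes.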
Single Mean Imputation replaces missing data with the overall mean of the observed data. Whilst this method attempts to strike a balance between the three methods stated above (since the mean draws on the baseline, worst and last values), it also has flaws in that it does not consider when the observed values were measured.
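A sketch of single mean imputation for one subject's hypothetical series:

```python
import pandas as pd
import numpy as np

# Hypothetical scores for one subject; the final visit is missing.
scores = pd.Series([3.0, 3.5, 4.0, np.nan])

# Single mean imputation: replace missing values with the mean of the observed values.
imputed = scores.fillna(scores.mean())
```

The imputed value (3.5, the mean of 3.0, 3.5 and 4.0) ignores the fact that the observed scores were steadily rising, which is exactly the timing information the method discards.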
Multiple Imputation (MI) acknowledges the uncertainty of missing data by generating different possible values rather than relying on a single estimate. By incorporating variability across imputed datasets, it provides more robust statistical inferences and reduces the risk of bias, ultimately improving the reliability of study outcomes.
However, these methods (with the exception of MI), while useful in certain contexts, have limitations that can introduce bias into clinical trial analyses. In all three examples below, we consider scenarios where participants have answered a QoL questionnaire scored on a scale of 0-6 (where 6 is the highest score). Results were collected at every visit from baseline onwards except the final visit (visit 6).
Example 1
In example 1, the results decrease at each visit (indicating declining QoL). We see that BOCF carries forward the best recorded result, mean imputation generates a result better than any seen in the previous three visits, whilst WOCF and LOCF both carry through the same score of 3.5. In this example, most methods likely overestimate the response: we impute a result better than one we could realistically have expected to observe had it been collected, with BOCF carrying forward a result very unlikely to have been seen.
Example 2
In example 2, the reverse is observed – the participant improves at each visit. BOCF and WOCF both carry through the baseline result, mean imputation generates a result worse than any of the last three collected results, whilst LOCF carries forward the last recorded result. All of these methods likely underestimate the response: we impute a result worse than one we could realistically have expected to observe had it been collected, which, whilst preferable to overestimation, is still not ideal.
Example 3
In example 3, the participant has a range of results that both increase and decrease relative to baseline. This scenario shows how these methods can generate results that we would be unlikely to observe had the reading been collected, and how they can lead to a wide range of different imputed values.
As demonstrated in the above examples, simple imputation methods can introduce bias and fail to capture the inherent uncertainty in the missing data. These approaches often assume missing values behave similarly to observed data, which is frequently unrealistic. As such, more sophisticated methods, such as Multiple Imputation and Mixed Models for Repeated Measures (MMRM), offer greater accuracy while preserving variability in the dataset.
As mentioned, Multiple Imputation (MI), based on Rubin’s framework (1987), is a more complex yet robust statistical approach designed to reduce bias and account for variability in missing data. It creates multiple datasets with imputed values using predictive models, analyses each dataset separately using standard statistical methods, and then combines the results to produce a final estimate and confidence interval.
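The three steps can be sketched as follows (hypothetical data; a deliberately crude normal-draw imputation model stands in for the predictive models a real analysis would use, and the combining step applies Rubin's rules):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical outcome data with two missing values.
y = np.array([4.1, 3.8, np.nan, 4.5, np.nan, 3.9, 4.2])
observed = y[~np.isnan(y)]
m = 20  # number of imputed datasets

estimates, variances = [], []
for _ in range(m):
    # Step 1: impute missing values with random draws that reflect uncertainty
    # (crude normal draws here; real MI uses a predictive model).
    completed = y.copy()
    completed[np.isnan(y)] = rng.normal(
        observed.mean(), observed.std(ddof=1), np.isnan(y).sum()
    )
    # Step 2: analyse each completed dataset (here, estimate the mean and its variance).
    estimates.append(completed.mean())
    variances.append(completed.var(ddof=1) / len(completed))

# Step 3: combine with Rubin's rules.
q_bar = np.mean(estimates)       # pooled estimate
w = np.mean(variances)           # average within-imputation variance (W)
b = np.var(estimates, ddof=1)    # between-imputation variance (B)
total_var = w + (1 + 1 / m) * b  # Rubin's total variance
```

The between-imputation component B is what single-imputation methods ignore: it inflates the total variance to reflect genuine uncertainty about the missing values.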
Analyses based on multiple imputations should produce a result which reflects the true population estimate, while adjusting for the uncertainty of the missing data.
Rubin suggests the following method:
Standard Multiple Imputation performs the imputations such that the results for a patient with missing data trend towards the mean of their treatment group, due to the weakening of the within-subject correlation. This also results in an increase in variance as time progresses, as is expected in clinical trials.
The realisation that subjects who withdraw are no longer on randomised treatment has led to developments allowing imputation based on a clinically plausible post-withdrawal path. One of these is Multiple Imputation under the Copy Reference (CR) assumption, where post-withdrawal data are modelled as if the subject were a member of the reference group. Here, the outcome tends towards the mean of the reference group.
In a case study examined to investigate Multiple Imputation in clinical research, active and placebo treatments were compared (at Weeks 2, 4, 6 and 12 of the trial) in adolescents with acne. The primary endpoint was the number of lesions at Week 12. In this study, dropouts and withdrawals were common. Factors believed to affect the propensity to have missing data were identified as age, experiencing side effects, and experiencing a lack of efficacy. Through this information, it was understood that missing data patterns were likely to differ between the groups.
It is common for datasets of this type to be analysed using an Analysis of Covariance (ANCOVA) of the last observation carried forward data. MI methods can be programmed using PROC MI in SAS (Version 9.3), offering an alternative way to deal with missing data.
Below, we explore the Multiple Imputation process, compare results with LOCF ANCOVA and Mixed Models Repeated Measures (MMRM) methodologies and ask: is Multiple Imputation worth the effort?
A simulation of 1000 data sets was carried out by removing data randomly from a completer dataset (N=131) using propensity scores based on the pattern of missing data observed in the full dataset (N=153). Least Squares (LS) means and differences were estimated, along with the Standard Error (SE). Boxplots present the bias and relative SE from Multiple Imputation compared to LOCF ANCOVA and a MMRM approach without imputation of data; these are relative to the ANCOVA on the completer dataset.
The least biased of several methods of MI tested was Predictive Mean Matching (PMM), which imputes values by sampling from k observed data points closest to a regression predicted value where the regression parameters are sampled from a posterior distribution. The total variance of combined ANCOVA results (see Figure 1) is calculated from the average within-imputation (W) and between-imputation variance (B). [1], [2]
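A simplified PMM sketch (hypothetical lesion-count data; for brevity it uses point estimates of the regression parameters rather than the posterior draws that proper PMM, as described above, would use):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: baseline lesion counts x predict Week-12 counts y;
# the last two subjects have missing Week-12 values.
x = np.array([30, 42, 25, 38, 50, 33, 45, 28, 40, 36], dtype=float)
y = np.array([18, 30, 15, 26, 41, 22, 34, np.nan, np.nan, 25], dtype=float)
obs = ~np.isnan(y)

# Fit a simple regression of y on x using the observed cases
# (point estimates only; full PMM samples the parameters from a posterior).
beta1, beta0 = np.polyfit(x[obs], y[obs], 1)
pred = beta0 + beta1 * x  # predicted values for every subject

# PMM: for each missing case, find the k observed cases whose predictions are
# closest to that case's prediction, and sample an observed y from those donors.
k = 3
y_imp = y.copy()
for i in np.where(~obs)[0]:
    donors = np.argsort(np.abs(pred[obs] - pred[i]))[:k]
    y_imp[i] = rng.choice(y[obs][donors])
```

Because every imputed value is an actually observed lesion count, PMM never produces negative or fractional counts, which is the plausibility advantage noted later in this article.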
Figure 1: Flow chart of Multiple Imputation Process
MMRM was identified as the least biased and LOCF the most biased of the three methods (Figure 2). Relative SEs were demonstrated to be greatest for PMM (Figure 3).
Figure 2: Bias in LS Means of Estimate
Figure 3: Relative standard error of difference in treatment means
Both figures show the distribution from 1000 simulations (data were removed randomly based on propensity scores; the propensity model included age, side effect of pain after treatment and efficacy measured by lesion counts). Bias and relative standard errors are relative to the completer dataset.
The Food and Drug Administration (FDA) has been critical of the use of LOCF in Phase 3 clinical trials, as this method assumes no trend of response over time, resulting in bias and a distorted covariance structure. All methods in PROC MI, and MMRM, assume that data are Missing at Random (MAR). However, PROC MI has useful functionality for summarising missing data patterns.
When using Multiple Imputation, it can be complex to define a priori as there are many details to consider (see Figure 1) and additional data processing steps are necessary. The PMM method of imputation has the greatest advantage over alternative MI methods, in that no bounds, rounding or post-imputation manipulation is required to give plausible imputed lesion counts. Sensitivity analyses can investigate a range of delta (δ) values added to imputed values to explore the robustness of conclusions to imputation.
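A sketch of such a delta-adjustment sensitivity analysis (hypothetical lesion-count values; here δ is added only to the imputed values, so larger δ represents a progressively more pessimistic assumption about the missing outcomes):

```python
import numpy as np

# Hypothetical Week-12 lesion counts: observed values plus values that were imputed.
observed = np.array([18.0, 30.0, 15.0, 26.0, 41.0, 22.0])
imputed = np.array([20.0, 31.0, 24.0])

# Delta adjustment: shift only the imputed values by a range of deltas and
# re-examine the overall estimate at each shift to gauge robustness.
for delta in [0.0, 2.0, 4.0, 6.0]:
    combined = np.concatenate([observed, imputed + delta])
    print(f"delta={delta}: mean lesion count = {combined.mean():.2f}")
```

If the study's conclusion holds even at the largest clinically plausible δ, the result is considered robust to the imputation assumptions.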
Relative SEs were generally greater than 1 for all methods, which is to be expected given the loss of approximately 15% of data from the completer dataset by using the propensity scores in the simulation of missing values. The SEs from MI techniques incorporate an additional component (B) to account for the uncertainty in the imputation, whereas LOCF ignores this uncertainty. However, the resulting SE from MI is appreciably larger than that from MMRM, and thus this Multiple Imputation method has less statistical power.
Managing missing data effectively is essential to ensure clinical trial validity. While traditional imputation methods such as LOCF and CCA have been widely used, advanced approaches such as Multiple Imputation and MMRM provide greater accuracy by preserving the natural variability in datasets.
Regulatory agencies continue to refine expectations regarding methodologies for missing data in clinical trials, urging researchers to adopt best practices. By implementing comprehensive missing data strategies, researchers can ensure that trial outcomes remain reliable and representative of real-world treatment effects.
Quanticate's statistical consultants are among the leaders in their respective areas, enabling clients to choose expertise from a range of consultants to match their needs. If you need biostatistical consultancy, please submit an RFI to speak to a member of our team.
References
[1] SAS/STAT(R) 12.1 User's Guide, "The MIANALYZE Procedure: Combining Inferences from Imputed Data Sets". [Online]. Available: http://support.sas.com/documentation/cdl/en/statug/65328/HTML/default/viewer.htm#statug_mianalyze_details08.htm
[2] D. Rubin, Multiple Imputation for Nonresponse in Surveys. New York: John Wiley & Sons, 1987.