This blog explores the statistical methods used in Risk Based Monitoring (RBM) and how their results help improve data integrity across a clinical trial.
RBM and Centralised Monitoring (CM) are both on the rise in clinical trials, largely as a result of the ICH E6(R2) addendum to Good Clinical Practice (GCP). A risk-based approach is reducing trial time and cost and making onsite monitoring more efficient.
One of our previous blogs outlines the need for statisticians and programmers in RBM. Here we examine some of the proposed techniques for RBM and CM in more detail. We also show how statisticians and programmers can help increase trial efficiency and improve onsite monitoring.
One of the goals of RBM and CM is to analyse trial data centrally, on an ongoing basis, to review trends. Formal statistical tests can detect anomalies or unexpected similarities in the data. These findings suggest where and how onsite monitors should focus their time and energy, and help them decide whether a site visit is needed. Issues that could be identified include recording errors, staff requiring further training, potential fraud and risks to subject safety. These issues could arise at participant, investigator, site or geographic region level.
A clinical trial generates a great deal of data. Sources include case report form (CRF) completion, demographics, adverse events (AEs), laboratory datasets, ECGs and vital signs (VS), and each may require a different kind of analysis. In this blog we concentrate on laboratory data and the tests and analyses we might run to detect issues, using industry leading technology for statistical monitoring. We call this process of reviewing subject and site data Data Quality Oversight.
The large volume of lab data in a trial can be broken down further into categories such as haematology, biochemistry and urinalysis, and each category may require separate analyses. So how do we detect whether any of the data is, in fact, erroneous? There are several paths we could take.
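The online statistical monitoring platform looks for both univariate outliers (unusual values of a single variable) and multivariate outliers (unusual combinations of values across several variables).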
For non-normal data, the Inter Quartile Range (IQR) can be used to identify outlying individuals. A constant, k, is chosen, typically k = 2 or 3. Individuals whose values lie more than k × IQR below the lower quartile or above the upper quartile are flagged as outliers. The value of k may vary from trial to trial, or even as the trial progresses and we learn more. For normal or normalised (transformed) data, a similar approach can be used with the Standard Deviation (SD) in place of the IQR, flagging values more than k × SD from the mean. Alternatively, Grubbs' test for outliers can be used. The outliers can then be presented in tables and graphs for interpretation and to advise onsite monitors.
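The two univariate rules above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the platform's implementation. The function names and the haemoglobin values are hypothetical, and the IQR fences are assumed to be measured from the quartiles, in the style of Tukey's fences.

```python
import numpy as np


def iqr_outliers(values, k=2.0):
    """Flag values more than k*IQR below Q1 or above Q3 (suited to non-normal data)."""
    x = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)


def sd_outliers(values, k=3.0):
    """Flag values more than k sample SDs from the mean (normal or transformed data)."""
    x = np.asarray(values, dtype=float)
    return np.abs(x - x.mean()) > k * x.std(ddof=1)


# Hypothetical haemoglobin results (g/dL) for eight subjects; 41.0 looks like a keying error.
hb = [13.1, 14.2, 12.8, 13.9, 14.5, 13.3, 12.9, 41.0]
print(np.flatnonzero(iqr_outliers(hb, k=2)))  # [7]
print(np.flatnonzero(sd_outliers(hb, k=3)))   # []
```

On this toy data, the SD rule with k = 3 misses the 41.0 entry because that value inflates the very SD it is judged against. This masking effect is one reason a robust measure such as the IQR can be useful when extreme values are present.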
An individual's data may appear erroneous only when several variables are considered together. In these cases the online platform detects outliers using two methods.
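The Euclidean distance (ED), Dᵢ, from the mean for each individual, i, can be calculated using the mean, x̄ⱼ, and SD, Sⱼ, of each variable, j, by way of

$$D_i = \sqrt{\sum_{j=1}^{p} \left( \frac{x_{ij} - \bar{x}_j}{S_j} \right)^{2}}$$

where xᵢⱼ is the value of variable j for individual i and p is the number of variables.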
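A minimal NumPy sketch of this distance follows. It assumes the distance is standardised per variable: each variable is centred on its mean and divided by its sample SD, then the squares are summed across variables. The function name and the lab values are hypothetical.

```python
import numpy as np


def euclidean_distance(X):
    """Standardised Euclidean distance of each subject (row) from the variable means.

    X has shape (n_subjects, p_variables). Each variable j is centred on its
    mean and divided by its sample SD before the squares are summed.
    """
    X = np.asarray(X, dtype=float)
    z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # (x_ij - xbar_j) / S_j
    return np.sqrt((z ** 2).sum(axis=1))


# Hypothetical lab results: columns are ALT (U/L), creatinine (umol/L), glucose (mmol/L).
labs = np.array([
    [22, 80, 5.1],
    [25, 85, 5.4],
    [19, 78, 4.9],
    [30, 90, 5.8],
    [24, 150, 5.2],
])
print(np.round(euclidean_distance(labs), 2))
```

Because each variable is scaled by its own SD, a variable measured in large units (such as creatinine here) does not dominate the distance. Correlation between variables is ignored, however.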
Assuming a multivariate normal distribution across the variables, we can make use of the correlation structure of the data with the Mahalanobis distance (MD). In this case, Dᵢ is calculated by way of

$$D_i = \sqrt{(\mathbf{x}_i - \bar{\mathbf{x}})^{\top} S^{-1} (\mathbf{x}_i - \bar{\mathbf{x}})}$$

where x̄ is the vector of means for the variables, S is the covariance matrix, and xᵢ is the vector of observations for the ith person.
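The Mahalanobis distance can be sketched with NumPy as follows. This is an illustration under the assumption that S is the sample covariance matrix (as returned by `np.cov`). The function name and the simulated data are hypothetical. The sketch solves a linear system rather than inverting S explicitly, which is the usual numerically safer choice.

```python
import numpy as np


def mahalanobis_distance(X):
    """Mahalanobis distance of each subject (row) from the vector of variable means.

    Uses the sample covariance matrix S, so correlated variables are not
    double-counted the way they are in the Euclidean distance.
    """
    X = np.asarray(X, dtype=float)
    diff = X - X.mean(axis=0)                       # x_i - xbar
    S = np.atleast_2d(np.cov(X, rowvar=False))      # p x p sample covariance (ddof=1)
    # Solve S @ y = (x_i - xbar) for every subject instead of forming S^{-1}.
    solved = np.linalg.solve(S, diff.T).T           # rows are S^{-1}(x_i - xbar)
    return np.sqrt(np.sum(diff * solved, axis=1))   # sqrt((x_i - xbar)^T S^{-1} (x_i - xbar))


# Hypothetical, strongly correlated pair of variables (e.g. two related lab analytes).
rng = np.random.default_rng(0)
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.9], [0.9, 1.0]], size=100)
# Add a subject whose values are individually unremarkable but contradict the correlation.
X = np.vstack([X, [1.5, -1.5]])
D = mahalanobis_distance(X)
print(int(np.argmax(D)))
```

The added subject lies within about 1.5 SD of the mean on each variable. It nonetheless receives a large MD because it goes against the strong positive correlation, which is exactly the kind of multivariate outlier the ED can miss.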
For both methods, Dᵢ can be visualised with a scatterplot of Dᵢ against subject number, grouped or coloured by site. A cut-off line is drawn at k × SD, where k is again a chosen constant and SD is the standard deviation of the Dᵢ values. We can also produce boxplots of Dᵢ by site to see whether any site's distribution stands out. Together these plots show which individuals are outliers and whether the outliers cluster within particular sites or some other demographic grouping.
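The numbers behind these plots can be sketched without any plotting library. The helper below is hypothetical. It applies the cut-off literally as stated in the text (k times the SD of the Dᵢ values). For each site, it reports how many subjects exceed the cut-off and the median distance, which is the centre line of that site's boxplot. The distances and site labels are made up for illustration.

```python
import numpy as np


def site_summary(D, sites, k=2.0):
    """Apply a k*SD cut-off to the distances and summarise by site.

    Returns the cut-off, a boolean flag per subject, and for each site the
    number of flagged subjects and the median distance.
    """
    D = np.asarray(D, dtype=float)
    sites = np.asarray(sites)
    cutoff = k * D.std(ddof=1)   # k times the SD of the D values, as described in the text
    flagged = D > cutoff
    summary = {
        str(s): {"n_flagged": int(flagged[sites == s].sum()),
                 "median_D": float(np.median(D[sites == s]))}
        for s in np.unique(sites)
    }
    return cutoff, flagged, summary


# Hypothetical distances for six subjects at two sites.
D = [0.5, 0.6, 0.4, 3.0, 2.8, 0.5]
sites = ["A", "A", "A", "B", "B", "B"]
cutoff, flagged, summary = site_summary(D, sites, k=2)
print(round(cutoff, 2), summary)
```

Here both flagged subjects come from site B, and site B's median distance is far higher than site A's. This is the kind of clustering the scatterplot and boxplots are meant to reveal.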
The MD has an advantage over the ED: under multivariate normality, Dᵢ² approximately follows a chi-squared distribution with degrees of freedom equal to the number of variables. This allows further checks. For example, individuals whose Dᵢ² exceeds the critical value of that chi-squared distribution at a specified significance level can be flagged as outliers. We can also plot theoretical chi-squared quantiles against the observed Dᵢ² values (a chi-squared Q-Q plot) to spot unusual distributions.
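Both checks can be sketched with SciPy's `scipy.stats.chi2` distribution, assuming the Dᵢ values are Mahalanobis distances computed on p variables. The function names, the default significance level of 0.001 and the example distances are illustrative assumptions. The Q-Q helper uses the common plotting positions (i − 0.5)/n and returns the points rather than drawing them.

```python
import numpy as np
from scipy.stats import chi2


def chi2_outliers(D, p, alpha=0.001):
    """Flag subjects whose squared Mahalanobis distance exceeds the chi-squared
    critical value with p degrees of freedom at significance level alpha."""
    d2 = np.asarray(D, dtype=float) ** 2
    critical = chi2.ppf(1 - alpha, df=p)
    return d2 > critical, critical


def chi2_qq_points(D, p):
    """Theoretical chi-squared quantiles and ordered observed D^2, for a Q-Q plot."""
    d2 = np.sort(np.asarray(D, dtype=float) ** 2)
    n = d2.size
    probs = (np.arange(1, n + 1) - 0.5) / n   # plotting positions (i - 0.5) / n
    return chi2.ppf(probs, df=p), d2


# Hypothetical Mahalanobis distances for five subjects measured on p = 2 variables.
D = [0.8, 1.2, 1.9, 0.5, 3.1]
flags, critical = chi2_outliers(D, p=2, alpha=0.05)
print(round(critical, 3), flags)
theoretical, observed = chi2_qq_points(D, p=2)
```

Points on a Q-Q plot of `theoretical` against `observed` should lie near the identity line if the chi-squared approximation holds. Points well above the line at the upper end suggest outliers.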
We may also be interested in inliers: individuals whose results are too close to the mean. For these individuals Dᵢ is close to zero. A convenient way to spot them by eye is to take −log(Dᵢ), which grows as Dᵢ approaches zero, and look for the highest values. A cut-off can then be set at mean(−log(Dᵢ)) + k × SD(−log(Dᵢ)), again for a chosen value of k.
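The inlier rule translates directly into NumPy. This is a minimal sketch with a hypothetical function name and made-up distances. The clamp to the smallest positive float is an added assumption, so that a subject lying exactly on the mean does not produce an infinite score.

```python
import numpy as np


def inliers(D, k=2.0):
    """Flag subjects whose distance is suspiciously close to zero.

    Uses the score -log(D_i), which grows as D_i approaches zero, and the
    cut-off mean(score) + k * SD(score).
    """
    D = np.asarray(D, dtype=float)
    # Guard against log(0) for a subject exactly at the mean (an added assumption).
    D = np.maximum(D, np.finfo(float).tiny)
    score = -np.log(D)
    cutoff = score.mean() + k * score.std(ddof=1)
    return score > cutoff, cutoff


# Hypothetical distances; the fifth subject sits almost exactly on the mean.
D = [1.2, 0.9, 1.5, 1.1, 0.001, 1.3, 1.0, 0.8]
flags, cutoff = inliers(D, k=2)
print(np.flatnonzero(flags))  # [4]
```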
We have seen a snapshot of some of the statistical methods Quanticate are using to perform Data Quality Oversight for just one set of variables, focusing on outliers in a risk-based approach. Other methods exist, some of which are outlined in our blog RBM; the need for Programmers and Statisticians. The application of further methods, such as Chernoff faces, rounding checks and correlation checks, is still being explored. As the prevalence of RBM and CM increases, the methods used will be refined and amended. Issues can then be identified with the best possible analyses and tests and communicated to onsite monitors, with the ultimate aim of improving clinical trial efficiency.
Quanticate offers Data Quality Oversight (DQO) of site data, which uses statistical analytics to generate reports that improve data integrity by uncovering outliers and data anomalies. DQO can be applied to any clinical monitoring method. Submit an RFI if you would like to hear how we could improve data quality, and a member of our Business Development team will be in touch with you shortly.
© 2025 Quanticate