This video is presented by Kelci Miclaus from SAS JMP, who was a speaker at Clinical Data Live 2013. Her presentation was titled 'Efficient Data Reviews and Quality in Clinical Trials'.
Thank you, thanks for having me. And thank you, Eric, for laying a great foundation for the topic of my discussion, which is, as he mentioned, efficient data reviews, both from the medical monitor perspective as well as interrogating data quality and potential fraud detection in data management in clinical trials.
I guess an overarching theme of my presentation, similar to what Eric mentioned, is efficiently managing risk. Risk management in clinical trials is multifaceted, so there are different areas where we can actually improve the process and quality of our clinical trials to have more efficient reviews and more efficient trials carried out.
And part of that risk management, looking forward, is using the results and the analyses that we can now take advantage of through electronic data capture, and all of the new and updated transformations in how our data is collected, to improve our clinical trials and their efficiency in the future.
So my talk will focus on two main areas of how to improve efficiencies in clinical trial review. Part of that will be focused on the medical monitor safety analysis review: what kinds of tools, analyses, and efficiencies can we gain, again using this idea of centralized data monitoring to ensure patient safety and the conduct of the trial as it progresses? A key part of this is the idea of ongoing trials.
Trials continue to become more complex. We continue to get more and more data, and this is a continuous process. It's not like you review your data once; you have to review it multiple times, early and often in the trial, and you want to do that in the most efficient manner you can. So part of my talk will focus on the issues of efficient data review, specifically data collection, how time consuming that is, and what we can do to mitigate the risks of collecting the data while still having time to analyze it.
On the other side, as compared to monitoring from a medical standpoint, we really want to continuously ensure that the data quality is upholding the protocols of our trials, and that we have the integrity of the data that we would expect from something so important to human subject protection.
So some of the topics I will touch on for that section will be really focusing on taking advantage of and utilizing electronic data capture and onsite clinical data monitoring. Like Eric mentioned, one of the things right now that's so common, and has been kind of a standard, is to do 100% source data verification.
That is extremely time consuming, and more and more, as recommendations from federal agencies come out, it's unnecessary and in fact can be less efficient and less effective than taking a statistical, risk-based approach. As we all know, despite the issues that many of our clinical trials have, a randomized clinical trial is still the gold standard. But one of the problems is that it's a gold standard specifically for interrogating the efficacy of a treatment, and one of the complications we have, especially for the statisticians in the room, is how to evaluate safety in trials that have been designed to interrogate efficacy. So with safety analysis, which needs to be our first stop to ensure the safety of our patients before we can worry about whether the drug is working, there are several difficulties, namely that there is an enormous number of endpoints that you're looking at repeatedly across time.
Adding to that is the problem of rare events, such as detection of drug-induced liver injury. This is a very rare occurrence, on the order of one in ten thousand patients. How can we detect that with the limited population available for studying our safety measures during a medical monitor review? And I'll also talk a little bit about some of the other limitations in terms of our understanding of the mechanisms in biological pathways.
Our focus here is where we're at, but I want to continue to think of where we can go with it, and what we can do with these risk-based approaches to improve trials in the future. All right. So one of the things we can do, just immediately, to accelerate our reviews, instead of focusing on static listings and tables and an enormous amount of paperwork, is to drive towards the standard of using dynamic visualizations, either to start as a summary view and then generate our tables and reports from that, or to use as a tool throughout our medical monitor review. Another thing that will increase the efficiency of our review from a medical monitor standpoint is using the standards that have been put in place.
Namely, the CDISC data standards. By standardizing the quality of our tables and all of our standards for analysis, we can quickly surface tools with capabilities to generate reports and reviews that anybody can use, clinicians, data managers, data monitors, all from a standardized source.
And again, statistics cannot be ignored, even in the initial stages of clinical trials, where you do statistically driven analyses that provide that aggregate view but allow you to drill down to the patient level to understand all the data, at either the patient level or the trial level. So, just looking back a little bit.
It's been the standard, as I see from the ICH E3 guidelines that the FDA released, to present all of our data in standard static tables; that has been the main practice we've always gone from. But there are several problems with this, even though it gives all the information and may be the easiest way to disseminate that information.
It's really hard to actually read these tables, and it's really easy to miss signals because of that. So instead of looking at these static tables, we need to take advantage, again, of the electronic data capture environment that we live in, and start with views such as a dynamic visual summary view.
That allows you to quickly highlight potential issues, in this case adverse events that are occurring. So here, quickly, from a graph you can see that there's an enormous count for this adverse event. And then tools that allow you to drill down and filter let you get to the level of patient detail that you need, starting from a summary view.
Now, adding more statistical considerations to that: it isn't always just a count-and-percent game. Applying statistical analysis early and often in a clinical trial can really drive the analysis, both in terms of evaluating safety and, again, detecting trends or anomalies in your data.
So a space-constrained view that we have found to be a good recommendation for our customers, and that the FDA has begun looking at, is this idea of a summary statistical view that performs a statistical analysis comparing the incidence of certain adverse events against the treatment differences.
Our X-axis here is the clinical difference between treatments in terms of the counts of adverse events, which we compare using statistics against our Y-axis, which represents the p-value, the statistical significance of that event. Presenting this space-constrained view again lets you really hone in on the important things that are occurring during your safety analysis.
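As a rough sketch of how a view like this can be built: for each adverse event, compute the incidence difference between arms and a p-value, then plot one against the other. The counts, column names, and the choice of Fisher's exact test below are illustrative assumptions, not the exact analysis shown in the talk.

```python
# Sketch of a statistical screening view for adverse events:
# incidence difference between arms (x) vs -log10 p-value (y).
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import fisher_exact

# Hypothetical per-arm counts: subjects with/without each adverse event.
ae = pd.DataFrame({
    "event":       ["Nausea", "Headache", "Rash", "Dizziness"],
    "trt_with":    [40, 25, 18, 9],
    "trt_without": [160, 175, 182, 191],
    "pbo_with":    [22, 24, 5, 8],
    "pbo_without": [178, 176, 195, 192],
})

rows = []
for _, r in ae.iterrows():
    _, p = fisher_exact([[r.trt_with, r.trt_without],
                         [r.pbo_with, r.pbo_without]])
    diff = (r.trt_with / (r.trt_with + r.trt_without)
            - r.pbo_with / (r.pbo_with + r.pbo_without))
    rows.append({"event": r["event"], "diff": diff, "p": p})
res = pd.DataFrame(rows)

plt.scatter(res["diff"], -np.log10(res["p"]))
for _, r in res.iterrows():
    plt.annotate(r["event"], (r["diff"], -np.log10(r["p"])))
plt.xlabel("Incidence difference (treatment - placebo)")
plt.ylabel("-log10 p-value (Fisher's exact)")
plt.show()
```

Events that land in the upper corners, a large difference with a small p-value, are the ones the reviewer should drill into first.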
And from that view you can then drill down into standard relative risk tables. From our collaborations with the FDA we often hear, "We want a table of the top 30 adverse events and their relative risks and confidence intervals." Well, what is that top 30? Why is that the threshold?
They might look at that table and say, actually, I want the top 50, or I only need the top 10. So instead of providing that static view, if it's driven from a space-constrained statistical results view, we can get to the information they truly want for the reviews much faster. As the safety analysis continues, there are several more advanced statistical complexities that we need to address.
Again, one of the things is that there's a lot of data, and big data means big noise. So how do we deal with the noise in order to pick out the truly interesting signals in our data? Part of that could be applying multiple testing methods, such as the false discovery rate, or even some methods developed more recently that adhere to the known structure of adverse events, where you have body-system-level information, so you can apply a multi-level, double-FDR correction. Other customers have been requesting methods to do Bayesian analysis.
So take the prior information that we know and build that into our model to understand the adverse events that are affecting our trial. That's really key too, because every trial is different, every trial has a different expected safety profile, and that should be taken into consideration during your review.
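To make the multiple-testing point above concrete, here is a minimal sketch of a Benjamini-Hochberg false discovery rate correction applied to a vector of adverse event p-values. The p-values are invented for illustration; the multi-level double-FDR variant mentioned above would apply a correction like this at both the body-system and event-term levels.

```python
# Benjamini-Hochberg FDR correction on adverse event p-values.
import numpy as np
from statsmodels.stats.multitest import multipletests

# Illustrative raw p-values, e.g., one per adverse event term.
pvals = np.array([0.001, 0.008, 0.039, 0.041, 0.22, 0.49, 0.74])
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")

for p, q, sig in zip(pvals, p_adj, reject):
    print(f"raw p={p:.3f}  BH-adjusted={q:.3f}  flagged={sig}")
```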
Some of the other complexities you'll hit are recurrent events and repeated measurements across time, so the inclusion of time windows in our analysis can again help accelerate our review and our understanding of what's going on from period to period. Other more complex issues can involve crossovers.
So how do we visualize crossover? Simple views, with the inclusion of the treatment period to show what treatment the subject was actually on, can really add to the value of understanding the safety profile. And I will just give a plug here: where we are right now, we're already providing these tools.
Several different companies can provide these tools, and pharma is taking advantage of it. The FDA is taking advantage of it. But where are we going? A big part of this is that we need to learn from what we're currently starting to do, build from it, and use that to build predictive models to improve our efficiency and improve our trials in the future.
So part of that is actually taking this data not only to evaluate the current trial, but to build predictive models to understand how we can improve our trials in the future. As I've mentioned throughout, this is an ongoing safety review, so a key component of "ongoing" is understanding the changes across snapshots of your data.
Again, the time consumption of data collection alone requires that data review happen early and often, before all the data is collected. And we don't want redundant work effort, so it helps to have a tool that allows you to do snapshot comparison, meaning that as you get new data, you update your system and can quickly see what's been added, what's new, what's updated, which records are stable, etcetera.
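A minimal sketch of one way such a snapshot comparison could work, assuming hypothetical CDISC-style record keys (usubjid, visit, test) and a single value column; real snapshot data would have many more fields:

```python
# Classify each record in a new data snapshot as added, updated,
# or stable relative to the previous snapshot.
import pandas as pd

key = ["usubjid", "visit", "test"]

old = pd.DataFrame({"usubjid": ["001", "001", "002"],
                    "visit":   [1, 2, 1],
                    "test":    ["HR", "HR", "HR"],
                    "value":   [72, 75, 68]})
new = pd.DataFrame({"usubjid": ["001", "001", "002", "003"],
                    "visit":   [1, 2, 1, 1],
                    "test":    ["HR", "HR", "HR", "HR"],
                    "value":   [72, 80, 68, 90]})

merged = new.merge(old, on=key, how="left", suffixes=("", "_old"))
merged["status"] = "stable"
merged.loc[merged["value_old"].isna(), "status"] = "added"
merged.loc[merged["value_old"].notna()
           & (merged["value"] != merged["value_old"]), "status"] = "updated"
print(merged[key + ["value", "status"]])
```

The status column is exactly the kind of flag a review tool can use to color and filter records, as described next.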
Part of that is an infrastructure that allows for multiple views of the data. Data managers and data monitors might all be looking at the same data across time, so a notes infrastructure, where you can make notes and track not only what has been updated but also what's been reviewed and the comments you've made during that review, can help as well.
So understanding the review status of each clinical trial, and who's been looking at that data, leads again to efficiencies in understanding the safety signals. Continuing on with that: flagging, as we've talked about, flagging the data to quickly see what's important and what's been updated, along with coloring and annotation of your data, is critical for accelerated review as well.
And at any time, of course, it's important to track that updated data and have the option to go back to the summary level, or to the patient level with all the data for that patient. So again, our mantra is: you start at a summary level, you understand the trends, you look at the anomalies, you see updated data, and you want to see exactly how that affects the entire patient review.
So here we can have a patient profile that displays all of the safety metrics that have been collected, and then options and widgets to show only what's been updated since the last review you performed for this subject. And the filters and flags that you create from the snapshot management review should be used throughout. Again, that's the beauty of electronic data capture: you have all the data there, you know what's been updated, so let's go back to our safety review, where we can quickly see and annotate our view based on only the metrics that have been added since the last period of review.
And tools like this really allow you to start comparing the data, and the distributions of that data, ongoing throughout the review. So across different periods of snapshot collection we can see what's changed. And part of that, as a lead-in to the separate component of my talk, which is data quality, is understanding: okay, is that change unusual?
Are we starting to observe an unusual trend or anomalies in our data since the last period of collection? Leading into that are methods to query and ask: okay, we see some anomalies; is it just random error? Mistakes in the electronic data capture? Or is there potentially intentional fabrication of the data? This is rare, but that does not mean it should not be addressed. And going back to the current status we're in, we've evolved to a state where, because of centralized data monitoring, these kinds of methods to detect unusual data anomalies are much more readily at hand. The timing is very good: just in August the FDA officially released their guidance on centralized data monitoring and risk-based monitoring, and they include the fact that by using it, you can actually much more readily detect anomalies, including fraud or potential fabrication of data, with these techniques.
And I think this guidance, and those given by other regulatory agencies, come at really good timing. Despite the fact that everybody is talking about risk-based monitoring, fraud detection, and electronic data capture, it's not yet that well adopted, so we want to continue to drive those standards and those recommendations.
A recent survey has actually shown that only about 33% or less of pharma are using centralized data monitoring techniques. Several of them, I think up to 80%, still rely on 100% source data verification. This is in contrast to the more academic realm, which uses quite a bit more centralized medical monitoring and data monitoring, and I think one of the purposes of the FDA guidance is to drive us towards using those methods, when appropriate, to more efficiently interrogate the quality of our data as well as the integrity of our trials.
So, talking about fraud: there are some very straightforward tools you can apply to interrogate potential fraud. Mainly this is looking to see if there are non-random distributions in your data. Another thing is looking at whether a particular site shows reduced variability, which could indicate that they may be copying data, reproducing it, and changing it slightly.
So, understanding the distribution of data collected at one site versus how it was collected at other sites can really help us target potential sites that have issues. One of the views that helps with this is a way to compare the similarity between subjects at a given site in terms of Euclidean distance; here we're just using a pretty straightforward distance metric. And often investigators, if there is fraud, if they are intentionally fabricating data (and most are not), are smart enough that if they want to do it, they don't just copy the data.
They might tweak it a little bit, change it a little bit, and we want to be able to detect that too; we don't want that to slide through. That really means, again, looking at sites that have very little variability, or subjects that look almost identical based on their similarity measures.
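A minimal sketch of that subject-similarity check, assuming hypothetical findings columns and an illustrative similarity threshold; a production tool would use the study's actual measurements and a calibrated cutoff:

```python
# Within each site, compute pairwise Euclidean distances between
# subjects on standardized findings values and flag suspiciously
# close pairs (possible near-copied records).
import numpy as np
import pandas as pd
from itertools import combinations
from scipy.spatial.distance import pdist, squareform

measures = ["hr", "sysbp", "diabp", "temp"]
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(6, 4)), columns=measures)
df["site"] = ["A", "A", "A", "B", "B", "B"]
df.loc[1, measures] = df.loc[0, measures] + 0.01  # near-copied subject

z = (df[measures] - df[measures].mean()) / df[measures].std()  # standardize

for site, grp in z.groupby(df["site"]):
    d = squareform(pdist(grp))           # subject-by-subject distances
    ids = grp.index.to_list()
    for i, j in combinations(range(len(ids)), 2):
        if d[i, j] < 0.1:                # illustrative threshold
            print(f"Site {site}: subjects {ids[i]} and {ids[j]} "
                  f"look nearly identical (distance {d[i, j]:.3f})")
```

Because the values are standardized first, a subject whose record was copied and nudged by a tiny amount still shows up as an abnormally small distance.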
These are just some easy views that take advantage, again, of that central data. Another key component, and a straightforward analysis for detecting data quality issues in terms of fraud, is looking for duplicate or triplicate data across findings measurements. So one of the things you could do, for example, is look at the vital signs.
You can look for triplicates: subjects that have exactly the same heart rate, diastolic, and systolic blood pressure. It might not be unusual for one lab or findings measurement to match exactly, but when an entire set of tests is identical across patients at a site, then we worry that either there's been a data quality issue where the same line got reproduced, or someone has actually gone in and fabricated that data.
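A minimal sketch of that exact-duplicate check, with hypothetical column names standing in for the collected vitals:

```python
# Within each site, find distinct subjects whose entire set of
# vital signs matches exactly.
import pandas as pd

vitals = pd.DataFrame({
    "site":    ["A", "A", "A", "B"],
    "usubjid": ["001", "002", "003", "004"],
    "hr":      [72, 72, 80, 72],
    "sysbp":   [120, 120, 135, 120],
    "diabp":   [80, 80, 90, 78],
})

measures = ["hr", "sysbp", "diabp"]
dupes = vitals[vitals.duplicated(subset=["site"] + measures, keep=False)]
print(dupes)   # subjects 001 and 002 at site A share identical vitals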
So again, quick distributions and quick ways to summarize that and drill down into the quality of the data from that site are key. Another type of fraud we can look at, instead of assuming there is investigator fraud where they may be fabricating data, is on the patient side.
So, for a drug that perhaps has a recreational component to it, certain patients might go to several sites and try to enroll multiple times. We can try to detect that as well through some simple things like looking at birthdays and initials. So using some of the demographic information that's collected, we can try to find that and take measures against it early in the trial.
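A minimal sketch of that screen, matching on demographic fields such as birth date and initials; the field names below are illustrative (brthdtc follows the CDISC naming style, but the exact fields used would depend on the study):

```python
# Flag possible duplicate enrollment: the same initials and birth
# date appearing at more than one site.
import pandas as pd

dm = pd.DataFrame({
    "site":     ["A", "B", "C", "C"],
    "usubjid":  ["A-01", "B-07", "C-03", "C-04"],
    "initials": ["JMS", "JMS", "KLT", "RRB"],
    "brthdtc":  ["1975-03-02", "1975-03-02", "1981-11-15", "1990-06-30"],
})

for (inits, bday), grp in dm.groupby(["initials", "brthdtc"]):
    if grp["site"].nunique() > 1:
        print(f"Possible duplicate enrollment for {inits} ({bday}):")
        print(grp[["site", "usubjid"]])
```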
Alright, so leading on from that: fraud is really just one component, one specific case, of data quality in a clinical trial. There's a lot more to it, and again, fraud is rare, but data quality issues exist no matter what. No matter how honest sites are trying to be, it's an enormous amount of data, and we don't want to live in a world where we have to rely on onsite 100% source document verification.
Because it's simply not feasible with the size and complexity of our trials these days, and the cost component is outrageous: it can be up to 30% of the cost of a trial. So what we really want to push for is, yes, trials need to be monitored, but that doesn't mean 100% source verification. Instead you can apply statistical approaches to interrogate the risk at potential sites and target those that may need a further visit or further study.
Really pushing this can not only increase efficiency and save costs, but I think it can make monitoring more accurate and increase the quality of the data. Again, it's difficult to find issues, and some of those issues might not even be important to the trial. Like Eric mentioned, 100% source data verification gives equal importance to all the data.
So if someone mis-recorded the age of a patient, that's given equal weight during review, where they have to look at a piece of paper for it, as whether or not there is a signed informed consent, and it's much more important to the trial if they don't have a signed informed consent. So again, that's the real impetus of risk-based monitoring: it's not just the cost saving, but the increase in quality that we can get.
So again, the FDA guidance really relies on the evolution of trial complexity along with the evolution of software tools, electronic data capture systems, and the technology we all now live with. And it is now their recommendation that, when appropriate, risk-based monitoring can be applied successfully, and at times central monitoring alone can pick up over 90% of the findings.
You can mix that with targeted on-site monitoring to really efficiently understand the quality and ensure the integrity of the trial. So, as Eric mentioned, one of the big initiatives right now is the TransCelerate group, and they've released a set of algorithms, straightforward to implement, based on creating indicators of risk. You can incorporate these into your data quality and data management system in order to quickly review the data, based on certain indicators coming in from the sites, to know whether or not you need to target a site for further study and further visits.
Another group that has been doing a lot of research and providing recommendations on this is the Clinical Trials Transformation Initiative. And I really like the way they define quality: they say that quality is the absence of errors that matter. So it's not error free; we shouldn't aim to be error free, but we should aim to ensure quality in terms of the absence of errors that matter.
And that's really why we want to do targeted statistical analysis using indicators that are weighted by what's important to ensuring the quality and integrity of our trial. By providing some type of risk indicator, and using it in conjunction with our data management and data quality review, we can quickly flag sites and understand the data coming in from them, based on that mantra of ensuring quality in terms of reducing the errors that matter.
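As a rough illustration of that idea (not TransCelerate's actual algorithms), here is a minimal sketch of a weighted site risk score. The indicator names, weights, data, and flagging approach are all invented for the example; in practice the weights encode which errors matter most.

```python
# Weighted site risk score: scale each indicator across sites,
# weight it by how much its errors matter, and sum per site.
import pandas as pd

sites = pd.DataFrame({
    "site":            ["A", "B", "C"],
    "missing_consent": [0, 2, 0],        # errors that matter most
    "query_rate":      [0.05, 0.20, 0.08],
    "ae_rate_dev":     [0.1, 0.9, 0.2],  # deviation from study AE rate
})

weights = {"missing_consent": 10.0, "query_rate": 3.0, "ae_rate_dev": 2.0}

# Min-max scale each indicator to [0, 1] across sites, then weight.
score = sum(w * (sites[c] - sites[c].min())
                / (sites[c].max() - sites[c].min())
            for c, w in weights.items())
sites["risk_score"] = score
print(sites.sort_values("risk_score", ascending=False))
# Sites above a chosen threshold would be targeted for on-site visits.
```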
And these are just a couple of views of ways you can create these risk indicators, based on a risk matrix set forth by the TransCelerate consortium. You can have these risks, and depending on what you want to look at, perhaps you're more interested in enrollment metrics, or the occurrence or types of adverse events.
Perhaps the frequency of severe or serious adverse events that are occurring. You can rate each site and have this annotated, color-coded framework that allows you to quickly assimilate and understand the data and know which areas have problematic sites. Are there certain monitors where the data coming in from a site is an issue?
And one of the things, again to drive home the idea of electronic data capture, is that you can take this information, look at these sites, select the problematic ones, remove them, and then continue with your safety reviews. So this is a way you can really start tying the concurrent analysis of ensuring patient safety through medical monitor review together with the data quality review.
So as data comes in, we can ensure its quality. If there is data that needs follow-up, we can separate it out and flag it; but we can also select those sites that are high quality, continue our review, and drill back into the summary and aggregate views for those patients with high-quality data. Another example of a way to apply risk-based monitoring is through findings analysis.
So by looking at the site level, you can try to detect unusual trends in the findings measurements coming in from different sites, as well as interrogate the baseline values: given all the baseline values from certain sites, are there, from the get-go, before there is even treatment, sites that have issues with the measurements being taken and recorded? So again, all of this is just taking advantage of the centralized monitoring mantra, and of ways to improve the efficiency and quality of our data in an ongoing review. To conclude, I think this really is a combined effort: data managers, statisticians, and clinical monitors all have to work together on this ongoing review, and it's key to lead with statistically driven dynamic visualization.
I think this is critical for efficient review from both the medical standpoint and the data quality standpoint. One of the key things we have found with our customers, and in working with the FDA, is that relying on data standards like the CDISC model really opens up the tools that clinicians, biostatisticians, and managers can use in a standardized format.
And again, I think the take-home from both Eric's talk and mine is: use centralized data monitoring. It's there because we have that technology now, and the complexity of trials is creating so much data that it's necessary. One of the big things anybody in statistics is studying right now, a very big trend, is the idea of big data and big analysis.
And with big data comes a lot of noise. We have to provide tools that allow us to really pick out the signal from that noise, and I think that's where we are now. The next step forward is not only to pick out that signal, but to understand it and use it to perform prediction and improve our trials in the future.
And I'll conclude there. Thank you, and I'd be glad to take any questions. I'll let you drive the questions.
[Question is asked]
Just to repeat for those that couldn't hear, the first part of the question is, is there a need for further regulatory recommendations for data analysis in the review, is that correct, the first part?
So, I'll answer that part first, then I'll proceed to the second part. I strongly think there's a need. We rely heavily on the FDA guidances that are released, as well as those from other regulatory agencies. Speaking again to some of the things Eric showed, we rely heavily on the FDA guidance specific to interrogating drug-induced liver injury, which has several recommendations for standard analysis techniques that can be performed.
Likewise the current guidance on risk-based monitoring: it's not getting into the statistical qualities as much, but I think there's definitely a continued need for standardization, guidance, and recommendations for those analyses. Leading into the second part of the question, which was more about an audit trail and capability in terms of tracking an analysis versus just exploration of the data.
"That too I think is key and we have all the tools to do that. So some of the things we've been working on really heavily are creating standard reviews. So we kind of live in this mantra that perhaps there is a statistician doing all the data exploration which we completely track any analyses that are run, and we save logs and information like that."
I think that's critical. But at the end of the day, when they have done their full review, they can create a workflow or review package that can then be sent on. We envision, with the tools that people can use and create, a case where a statistician performs a review and hands it off to their medical monitor.
They can quickly go through the review, looking at each of the steps that were taken. And then that same review could be packaged and sent straight to the FDA, so that they are looking at the exact same data and the exact same review as it was performed.
I'll repeat your question, if I can hear it. [Question is asked] Yeah, and I think that is a really key point to make. What he's asking is whether all this interim analysis and exploration of the data can distract from the plan set forth prior to the beginning of the trial. I see that as actually one of the purposes of exploration: as you start your trial you should adhere to the protocol and to the planned study snapshots, and of course there is a lot of research into adaptive trials, but again, that is planned intervention.
In terms of the first part of my talk, the safety analysis has to be ongoing, and we have to be reactionary in part if something comes out, but we shouldn't live in a reactionary world; we really should try to become proactive. And I think part of this exploration can be used with these risk-based approaches to better design, up front, the expectations for our next trials.
So use what we find in exploration to predict outcomes and do simulations, using this risk-based analysis to better decide protocols, when to do snapshots, and when to take views of the data in our next trial. Does that answer your question? [Response to Question] Yeah, declaring; and I think the place we're at right now is part exploration, but also part using that exploration to better ensure we are declaring a priori instead of reacting to results. And I will mention, too, going back to the need for further guidance:
I think the FDA mentioned in their risk-based monitoring guidance the need and recommendation for quality by design, and that's something we're looking into as well. I think Eric wanted to add to that, since he missed out on the question part.
[Eric Speaks]
Something that I think we both picked up on. I remember when Steve Wilson from the FDA first saw these types of tools and the ability to [xx] on an ongoing basis; we had the same concern. [xx] suggested that, of course, along the way the blind would hide certain variables for you. Prevent, as much as we can, giving away the blind, because [xx] talk to the sites, and they have a sense of which... all right.
I think, if there's time, yes. Until they kick me off, I'm here. It's really that you have to take the same attitude as if you'd submitted those independent [xx] more concerned about the [xx]. Thank you.