Optimizing Data Quality in Pharmacoepidemiologic Studies
Authors: Jeffrey Brown, PhD, Chief Scientific Officer at TriNetX and Matvey Palchuk, MD, Chief Medical Informatics Officer at TriNetX
Pharmacoepidemiology, the study of the use and effects of drugs in large populations, has evolved significantly with the increasing availability of real-world data (RWD). Derived from sources outside traditional clinical trials, such as electronic health records (EHRs), patient registries, claims and billing data, pharmacy records, and wearables, RWD:
- Complements clinical trial data,
- Offers a comprehensive view of patient outcomes in routine clinical practice,
- Enables robust and generalizable findings into the effectiveness and safety of medical interventions in real-world settings, and
- Powers real-world evidence generation for gleaning actionable insights that may help support clinical decision making.
Along with the increasing use of RWD in pharmacoepidemiologic studies comes increasing scrutiny about the quality of the data, and with good reason. Healthcare generates around 30 percent of the world’s total data volume, and this growth is largely driven by advancements in RWD sources. This trend is expected to continue, with healthcare data projected to increase at a compound annual growth rate of 36 percent through 2025.
Given the sheer volume of healthcare RWD, the challenge for researchers lies in extracting data that’s fit for its intended purpose. Compounding the challenge are RWD’s complexity, variability, and oftentimes unstructured nature and lack of interoperability.
“An analyst trying to use RWD to answer research questions must be mindful that this data was collected for clinical, administrative, or operational purposes. Researchers and others working with the data need to be cognizant that they’re utilizing the data for a secondary purpose. Since it wasn’t collected with a specific research question in mind as in a controlled clinical trial, the big challenge is being thoughtful when applying a particular research question to the vast amount of RWD available to find the right, relevant data for the research at hand.” – Matvey Palchuk, MD
“Good” Versus “Bad” Data
There’s a common misconception that healthcare data is of either “good” or “bad” quality. The reality is that different types of RWD suit different research questions, and each type has its own strengths and limitations. These strengths and limitations should be considered when sourcing RWD for specific research purposes. To illustrate this point, consider two COVID-19-related research studies utilizing the same set of RWD.
In the early days of the COVID-19 pandemic, documentation of COVID-19 infections and, later, of COVID-19 vaccination status was not readily captured by most RWD sources such as EHR and claims data. The documentation, if acquired at all, was captured in public health databases. This left many RWD sources with a common issue of “missingness” that leads to misclassification of prior COVID-19 infection status and COVID-19 vaccination status.
Can those data, with the known misclassification, be used for research? It depends. Studies comparing vaccinated patients to unvaccinated patients would be plagued by exposure misclassification, because many people who were vaccinated (e.g., at public vaccination centers) would be observed as “unvaccinated” when their vaccine status was not documented in EHR or claims data. The apparent unvaccinated group would therefore include vaccinated people, which biases any comparison of vaccinated versus unvaccinated patients.
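However, a comparison of two COVID-19 vaccines (or therapies) could be conducted because the exposure information, albeit incomplete, is reliable and relevant. Every patient in either comparison group has a documented exposure, so missing records mainly shrink the sample rather than place patients in the wrong group. In this example, the same dataset could be fit for purpose for one COVID-19 vaccine study but not for a different, yet similar, study.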
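The contrast between the two study designs can be made concrete with a little arithmetic. The sketch below is illustrative only. The cohort sizes, outcome risks, and 50 percent capture rate are invented numbers, and the function name `expected_risk_ratios` is hypothetical. It also assumes that whether a vaccination gets recorded is unrelated to the vaccine product or to the outcome.

```python
def expected_risk_ratios(n_a, n_b, n_unvax, risk_a, risk_b, risk_unvax, capture):
    """Expected risk ratios when only a fraction `capture` of vaccinations
    is recorded in the data source (all inputs are hypothetical)."""
    # Vaccinated patients whose vaccination is visible in the data source.
    rec_a = n_a * capture
    rec_b = n_b * capture
    # Vaccinated patients with no record are observed as "unvaccinated".
    hidden_a = n_a - rec_a
    hidden_b = n_b - rec_b

    # Apparent unvaccinated group: truly unvaccinated plus hidden vaccinated.
    unvax_n = n_unvax + hidden_a + hidden_b
    unvax_events = n_unvax * risk_unvax + hidden_a * risk_a + hidden_b * risk_b

    # Expected outcome risk in each observed group.
    obs_risk_a = (rec_a * risk_a) / rec_a
    obs_risk_b = (rec_b * risk_b) / rec_b
    obs_risk_vax = (rec_a * risk_a + rec_b * risk_b) / (rec_a + rec_b)
    obs_risk_unvax = unvax_events / unvax_n

    return {
        "vaccinated_vs_unvaccinated": obs_risk_vax / obs_risk_unvax,
        "vaccine_a_vs_vaccine_b": obs_risk_a / obs_risk_b,
    }


# Hypothetical cohort: two vaccines (A and B) and an unvaccinated group.
cohort = dict(n_a=40_000, n_b=40_000, n_unvax=20_000,
              risk_a=0.02, risk_b=0.03, risk_unvax=0.10)

complete = expected_risk_ratios(**cohort, capture=1.0)
partial = expected_risk_ratios(**cohort, capture=0.5)

for label, result in (("complete capture", complete), ("50% capture", partial)):
    print(label, {name: round(value, 3) for name, value in result.items()})
```

Under these made-up inputs, halving the capture rate moves the vaccinated-versus-unvaccinated risk ratio from 0.25 to 0.5, while the vaccine A versus vaccine B ratio stays at about 0.667. If capture differed by product or by outcome, the head-to-head comparison could be biased as well, so that assumption would need to be checked against the actual data source.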
What these studies demonstrate is that the quality of the RWD hinges on its intended use (e.g., What’s the specific purpose or objective for which the data is being collected, analyzed, and applied?) and fitness for purpose (e.g., Is this data suitable for addressing the specific research question or objective at hand?).
As illustrated by the COVID-19 example, an RWD source could be perfectly sufficient for meeting the specific objectives of one study but not for a similar study on the same topic. By clearly defining the intended use of RWD in a study, researchers can align their data collection, processing, and analysis efforts with the specific goals of the research, ensuring that the data is fit for purpose and that the results are meaningful and actionable.
“The trick to effectively using RWD is to match the data to the research question, the method, and the intended use. You also need a team of folks who understand where the data came from and how to use it to answer the research question — data scientists, methodologists, informaticists, epidemiologists, and clinicians. That’s what TriNetX brings to the table. We have the methodologies, expertise, and insights to move the needle on RWD-driven research throughout the drug development lifecycle.” – Jeffrey Brown, PhD
Getting the Right Answer: The TriNetX Approach to Quality and Fit-for-Purpose Data
At TriNetX, our mission is to help healthcare organizations and life sciences companies around the world answer their complex research questions, ultimately leading to better healthcare solutions and improved patient outcomes.
We follow a four-step approach to making data fit for purpose at an unprecedented scale, improving its value and enabling accurate, efficient, and meaningful outcomes in healthcare research.
| Liberate all health data | Preserve the original & document provenance | Harmonize for interoperability | Actively monitor quality |
|---|---|---|---|
| Representative: Collect data as close as possible to the source where it was recorded. | Preservation: Preserve data as recorded in the source system. Refrain from altering data. | Syntactic interop: Harmonize to a Common Data Model for consistent analytics. | Data quality framework: Manage data quality through a uniform framework (conformance, completeness, plausibility). Develop new metrics for RWD quality management. |
| All-encompassing: Collect data across all patients from all systems and care settings. | Provenance: Provide transparency on origin (sources), any transformation (methods, mappings), curation or augmentation (derivation) of source information. | Semantic interop: Univocally encode all data using a well-managed terminology. Preserve meaning, document mapping. | Ingestion QC: Check conformance close to the source. |
| Keep data flowing: Collect historical data and update frequently (ideally real-time) from source systems. | Path to patient: Maintain the ability to go back to the original patient (in strict compliance with privacy regulations and applicable laws). | Terminology: Base terminology on existing terminologies when possible; refrain from altering them. | Monitor and act: Monitor DQ parameters, detect anomalies, and act by correcting ingestion or through feedback to the data source. |
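The three dimensions named in the data quality framework (conformance, completeness, plausibility) can be sketched as simple record-level checks. This is a minimal illustration under stated assumptions, not a description of TriNetX's implementation. The field names, the allowed value set for `sex`, and the 0 to 120 year age window are all hypothetical choices made for the example.

```python
from datetime import date

REQUIRED_FIELDS = ("patient_id", "birth_date", "sex")  # hypothetical schema
ALLOWED_SEX = {"F", "M", "U"}                          # hypothetical value set
MAX_AGE_YEARS = 120                                    # assumed plausibility bound


def check_record(record, today):
    """Return a list of (dimension, field) issues found in one record."""
    issues = []

    # Completeness: is every required field populated?
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            issues.append(("completeness", field))

    # Conformance: do populated values match the expected type or value set?
    sex = record.get("sex")
    if sex not in (None, "") and sex not in ALLOWED_SEX:
        issues.append(("conformance", "sex"))
    birth = record.get("birth_date")
    if birth not in (None, "") and not isinstance(birth, date):
        issues.append(("conformance", "birth_date"))

    # Plausibility: is a well-formed value believable?
    if isinstance(birth, date):
        age_years = (today - birth).days / 365.25
        if age_years < 0 or age_years > MAX_AGE_YEARS:
            issues.append(("plausibility", "birth_date"))

    return issues


def summarize(records, today):
    """Count issues per data quality dimension across a batch of records."""
    counts = {"completeness": 0, "conformance": 0, "plausibility": 0}
    for record in records:
        for dimension, _field in check_record(record, today):
            counts[dimension] += 1
    return counts


records = [
    {"patient_id": "p1", "birth_date": date(1980, 1, 1), "sex": "F"},
    {"patient_id": "p2", "birth_date": None, "sex": "X"},
    {"patient_id": "p3", "birth_date": date(1850, 1, 1), "sex": "M"},
]
summary = summarize(records, today=date(2024, 1, 1))
print(summary)
```

Counts like these could be tracked per data source over time, so that a sudden change in any dimension prompts a closer look at ingestion or feedback to the source, in the spirit of the "Monitor and act" cell above.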
A Look Ahead
The integration of RWD into pharmacoepidemiologic studies marks a substantial advancement in the field, offering a more comprehensive and nuanced understanding of drug safety, effectiveness, and utilization in everyday clinical settings. By harnessing the power of diverse data sources, researchers can uncover patterns and outcomes that traditional randomized controlled clinical trials might overlook.
This approach not only enhances the external validity of findings but also supports more personalized and responsive healthcare. As RWD continues to evolve, it will play an increasingly pivotal role in informing regulatory decisions, guiding clinical practice, and ultimately improving healthcare for all, globally.
For researchers, the key is aligning the data’s characteristics with the specific research question and ensuring that the RWD they use is of high quality and relevance and fit for the intended purpose. TriNetX will continue to lead the charge on making RWD research-ready, paving the way to more reliable and actionable insights in healthcare.
Learn more about TriNetX’s real-world data and real-world evidence generation solutions and uncover the fit-for-purpose data you need to answer your complex research questions.