Like many professional conversations today, this one began on LinkedIn. Tasfia Roshnee, a PhD candidate at the College of Engineering at Penn State, reached out to share how she was using TriNetX real-world data (RWD) to study integrated care pathways in chronic disease. It quickly became clear that her work went beyond analyzing large datasets; she was applying the kind of methodological rigor and design-first thinking that real-world evidence (RWE) requires to be credible and actionable.
What started as an online connection evolved into a thoughtful exchange on study design, analytical rigor, and transparency in observational research. This blog grew out of those conversations, spotlighting Tasfia’s work and distilling practical lessons for other researchers navigating the complexities of cohort-based RWE studies.
A Q&A on RWD, Study Design, and Research Rigor
I spoke with Tasfia about how she incorporated RWD into her doctoral research, the practical realities of working with electronic health record (EHR) data, and the design decisions that matter most when translating complex data into credible evidence.
How did you first decide to incorporate RWD into your PhD research, and how did TriNetX influence your early study design or hypotheses?
TriNetX was accessible through the College of Engineering at Penn State, and my cohort encouraged me to explore it. I had access to millions of EHR records with rich clinical detail (diagnoses, procedures, lab values, medications), all linked longitudinally. My PhD focuses on integrated care pathways in chronic disease, and the database enabled me to study real-world patterns at a scale impossible with traditional data sources.
The platform influenced my early hypotheses by making me think more carefully about operationalizing definitions. I had to decide up front what an exposure or outcome truly meant in EHR data. For example, what does “dental care utilization” look like in medical records? Which codes, which time windows, and what counts as meaningful exposure? Those early decisions became the foundation for everything that followed.
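To make that concrete, here is a minimal sketch of what one such operational definition might look like in code. The code list, window, and helper function are hypothetical illustrations, not the definitions from Tasfia's study:

```python
from datetime import date, timedelta

# Hypothetical code list for "dental care utilization"; in a real study
# the codes and windows would come from clinical review.
PROPHYLAXIS_CODES = {"D1110", "D1120"}  # illustrative CDT procedure codes

def had_exposure(events, index_date, lookback_days=365):
    """True if any qualifying code appears in the lookback window.

    events: list of (event_date, code) tuples from one patient's record.
    """
    window_start = index_date - timedelta(days=lookback_days)
    return any(
        code in PROPHYLAXIS_CODES and window_start <= event_date <= index_date
        for event_date, code in events
    )

# One qualifying cleaning about eight months before the index date -> exposed.
record = [(date(2023, 2, 10), "D1110"), (date(2023, 9, 1), "E11.9")]
print(had_exposure(record, index_date=date(2023, 10, 1)))  # True
```

The point is less the code than the discipline: every choice (which codes, which window, one event versus several) is written down before the analysis begins.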
Can you share a specific moment where access to TriNetX data changed the scope or confidence of your research?
I wouldn’t say TriNetX changed the direction of my research overnight, but it absolutely changed my confidence. A specific moment was when I started validating outcomes like HbA1c (a test measuring average blood sugar over 2-3 months) and realized how much measurement timing and visit intensity can vary across patients. Having rapid feasibility checks in TriNetX let me test different index-date and follow-up window definitions early, and sanity-check whether both groups had comparable opportunities for labs and follow-up. That pushed me to refine the windowing and eligibility rules and build sensitivity checks around alternate follow-up windows. It reduced rework later and made the final design much more defensible and interpretable.
What aspects of the TriNetX platform made it feasible to run rigorous RWE analyses alongside the demands of a PhD?
Three things made TriNetX feasible for PhD-level research:
First, rapid cohort iteration. I could define inclusion/exclusion criteria, index dates, and comparator cohorts efficiently, which made it realistic to iterate on definitions and run planned sensitivity analyses without a long engineering cycle.
Second, built-in analytics and outputs. The integrated tools supported common RWE workflows (e.g., time-to-event summaries, survival curves, post-index lab distributions), so I could focus more on study design, validity checks, and interpretation rather than rebuilding every analysis pipeline from scratch (see the sketch after this list).
Third, clinical depth and longitudinal structure. The availability of labs, medication history, and follow-up over time allowed me to define washout periods, cross-validate outcomes, and track patients longitudinally, which are key ingredients for defensible observational research.
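For readers curious what one of these workflows looks like outside a platform, here is a minimal sketch of a Kaplan-Meier time-to-event summary using toy data and the open-source lifelines library. It illustrates the general idea, not TriNetX's implementation:

```python
import numpy as np
from lifelines import KaplanMeierFitter

# Toy follow-up data: days observed and whether the outcome occurred
# (1 = event, 0 = censored at end of follow-up).
durations = np.array([120, 365, 90, 365, 200, 365, 45, 310])
events = np.array([1, 0, 1, 0, 1, 0, 1, 1])

kmf = KaplanMeierFitter()
kmf.fit(durations, event_observed=events, label="exposed cohort")
print(kmf.survival_function_)      # survival probability at each event time
print(kmf.median_survival_time_)   # median time to event, if reached
```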
What’s your advice for researchers who are unsure where to start with RWE?
Start with clarity on your research question, then work backward to design decisions. Here’s the framework I wish someone had given me:
- Be precise about the estimand: population, exposure, comparator, outcome, and time window.
- Choose an index date intentionally. It anchors temporality and eligibility.
- Pick a comparator you can defend clinically, ideally one that reduces confounding by indication.
- Plan for messiness upfront. Pre-specify sensitivity analyses, test alternate definitions, and check robustness to measurement differences.
- Collaborate with clinicians early to make sure definitions reflect clinical reality and plausible mechanisms.
The key insight: almost four years of working with RWE taught me that credible evidence comes from disciplined design that makes imperfect data interpretable. Acknowledge coding variability. Plan for selection bias. Test robustness. Report limitations transparently. These practices strengthen research by demonstrating methodological sophistication and intellectual honesty.
A Practical RWE Design Checklist
As our conversation wrapped up, Tasfia shared a practical checklist she developed through years of conducting cohort-based RWE studies. It isn’t a rigid formula but rather a set of design considerations that consistently determine whether an observational study is credible, interpretable, and reproducible.
Start with a precise research question (PECOT). Before building anything in the platform, define your research question using five components: Population (who is included?), Exposure (what intervention?), Comparator (what are you comparing against?), Outcome (what will you measure?), and Timeframe (over what follow-up window?). Write your question in one sentence that includes all five parts.
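As an illustration, a PECOT question can be pinned down as a small structured spec before any cohort is built. This sketch is loosely modeled on Tasfia's published study; the field values are paraphrases, and a real protocol would replace them with explicit code lists and day counts:

```python
from dataclasses import dataclass

@dataclass
class PECOT:
    population: str
    exposure: str
    comparator: str
    outcome: str
    timeframe: str

    def as_question(self) -> str:
        return (
            f"Among {self.population}, is {self.exposure}, compared with "
            f"{self.comparator}, associated with {self.outcome} over "
            f"{self.timeframe}?"
        )

# Hypothetical spec, loosely modeled on the study discussed below.
spec = PECOT(
    population="adults with diabetes and periodontal disease",
    exposure="dental prophylaxis",
    comparator="no dental prophylaxis",
    outcome="improved glycemic control and fewer diabetic complications",
    timeframe="a pre-specified follow-up window",
)
print(spec.as_question())
```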
Define the index date so time zero is unambiguous. Tie your index date to when exposure truly begins, require a reasonable baseline lookback period to measure covariates, and ensure each group has comparable “time zero” logic. Using different time-zero rules can give one group extra “immortal” time, creating bias before you even run your first analysis.
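Here is a minimal sketch of consistent time-zero logic in pandas, with hypothetical column names: the same index-date rule applies to everyone, and patients without a full baseline lookback are excluded:

```python
import pandas as pd

# Toy patient-level table; column names are hypothetical.
patients = pd.DataFrame({
    "patient_id": [1, 2, 3],
    "first_exposure": pd.to_datetime(["2022-03-01", "2022-06-15", "2021-12-01"]),
    "record_start": pd.to_datetime(["2021-01-01", "2022-04-01", "2020-06-01"]),
})

LOOKBACK_DAYS = 365  # baseline window for measuring covariates

# Same rule for everyone: index date = first exposure, and each patient
# must have a full lookback period of observable history before it.
patients["index_date"] = patients["first_exposure"]
has_baseline = (
    patients["index_date"] - patients["record_start"]
) >= pd.Timedelta(days=LOOKBACK_DAYS)

eligible = patients[has_baseline]
print(eligible[["patient_id", "index_date"]])
# Patient 2 is dropped: only ~2.5 months of history before time zero.
```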
Choose a comparator you can defend. In observational data, comparator choice often matters more than the statistical model. Select comparators that are clinically meaningful and plausibly similar at baseline, and think carefully about care-seeking intensity and access patterns, which can confound results.
Pre-specify follow-up windows and censoring rules. Follow-up design determines what your results mean. Define the start of follow-up, specify the duration, and state your censoring rules clearly (such as end of observation, death, or end of data availability).
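A minimal sketch of "censor at the earliest of event, death, or end of data," again with hypothetical column names and toy dates:

```python
import pandas as pd

df = pd.DataFrame({
    "index_date": pd.to_datetime(["2022-01-01", "2022-01-01", "2022-01-01"]),
    "event_date": pd.to_datetime(["2022-05-10", pd.NaT, pd.NaT]),
    "death_date": pd.to_datetime([pd.NaT, "2022-08-01", pd.NaT]),
    "data_end":   pd.to_datetime(["2023-01-01", "2023-01-01", "2023-01-01"]),
})

# Follow-up ends at the earliest of: outcome event, death, end of data.
end = df[["event_date", "death_date", "data_end"]].min(axis=1)
df["followup_days"] = (end - df["index_date"]).dt.days
df["event"] = df["event_date"].notna() & (df["event_date"] == end)

print(df[["followup_days", "event"]])
# Row 1: event at day 129; row 2: censored by death; row 3: administratively censored.
```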
Control confounding with a transparent plan. Use a defensible covariate set that includes demographics, clinical history, and relevant treatments. Apply a clear matching or weighting approach and report balance diagnostics such as standardized mean differences.
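As a sketch, the standardized mean difference can be computed directly; a common rule of thumb treats an absolute SMD above 0.1 as meaningful imbalance. The data below are simulated for illustration:

```python
import numpy as np

def smd(treated: np.ndarray, control: np.ndarray) -> float:
    """Standardized mean difference using the pooled standard deviation."""
    pooled_sd = np.sqrt((treated.var(ddof=1) + control.var(ddof=1)) / 2)
    return (treated.mean() - control.mean()) / pooled_sd

# Simulated baseline covariate (age) for two unbalanced groups.
rng = np.random.default_rng(0)
age_treated = rng.normal(62, 10, 500)
age_control = rng.normal(58, 10, 500)

d = smd(age_treated, age_control)
print(f"SMD for age: {d:.2f}")  # well above 0.1, so matching/weighting is needed
```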
Operationalize outcomes carefully. Define outcomes clearly using specific codes, time windows, and exclusion criteria. Consider whether outcomes may be detected more often in higher-utilization patients, creating detection bias.
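A sketch of this step, with illustrative (not study-specific) ICD-10 codes: the outcome requires a qualifying code inside the post-index window, and per-patient visit counts offer a crude first check on detection bias:

```python
import pandas as pd

# Hypothetical outcome code list and post-index window.
OUTCOME_CODES = {"E11.21", "E11.22"}  # illustrative complication codes
WINDOW_DAYS = 365

events = pd.DataFrame({
    "patient_id": [1, 1, 2],
    "code": ["E11.21", "Z00.00", "E11.22"],
    "days_from_index": [120, 30, 400],
})

# Outcome = any qualifying code within the post-index window.
hit = events["code"].isin(OUTCOME_CODES) & events["days_from_index"].between(1, WINDOW_DAYS)
outcomes = events.loc[hit, "patient_id"].unique()
print(outcomes)  # only patient 1; patient 2's code falls outside the window

# Crude detection-bias check: visits per patient; in a real analysis,
# compare these distributions between exposure groups.
print(events.groupby("patient_id").size())
```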
Stress-test your findings. Run sensitivity analyses that test your highest-risk assumptions: alternate follow-up windows, alternate exposure definitions, and alternate covariate sets or matching specifications.
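As a sketch of the alternate-follow-up-window idea, the snippet below re-estimates a simple risk ratio under three windows using simulated data; a stable estimate across windows is reassuring:

```python
import numpy as np

rng = np.random.default_rng(1)
# Simulated days to event for exposed vs. comparator patients.
exposed_days = rng.exponential(scale=500, size=1000)
comparator_days = rng.exponential(scale=400, size=1000)

# Re-estimate a simple risk ratio under alternate follow-up windows.
for window in (180, 365, 730):
    risk_exposed = (exposed_days <= window).mean()
    risk_comparator = (comparator_days <= window).mean()
    print(f"{window}d window: risk ratio = {risk_exposed / risk_comparator:.2f}")
# A stable ratio across windows suggests the finding is not an artifact
# of one particular follow-up choice.
```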
Report decisions so others can reproduce. A reader should be able to understand your design choices and rerun a similar study with the same logic. Transparency about limitations strengthens credibility.
Applying the Checklist in Practice
Tasfia applied this workflow in her published study, “Exploring the association between prophylaxis and diabetic complications among adults with diabetes and periodontal disease.” The study examined whether dental prophylaxis was associated with improved glycemic control and fewer diabetes-related complications. Following a systematic design checklist helped ensure the study was defensible, interpretable, and successful in peer review.
Closing Thought
My conversation with Tasfia reinforced a simple truth: RWD doesn’t simplify research questions, but with disciplined design and transparent methods, it can make complex clinical questions answerable. Her experience shows how thoughtful use of platforms like TriNetX, combined with methodological rigor, can support high-quality RWE even within the constraints of academic research.
Learn More
If you’re part of a healthcare organization looking to apply RWD with the same level of rigor and transparency Tasfia describes, check out our better data for better healthcare research solutions.
About Steve Kundrot
Steve is a technology and business leader with over 20 years of experience in clinical research, health analytics, consulting, and software development. As Chief Operating Officer, he oversees TriNetX’s core operational functions and leads the development of a unified product roadmap designed to revolutionize clinical research and accelerate drug development by optimizing clinical trial design, enhancing post-market safety, and delivering research-grade data and evidence that enable and expedite regulatory approvals.
