Why Two Clinical Teams Can Run the Same AI and Get Completely Different Results
Key takeaways:
- AI performance in clinical trials is determined by the quality of the data beneath the model, not the sophistication of the algorithm. Two organizations deploying the same technology on different data will get different results.
- Four characteristics separate data that can support trustworthy AI from data that can’t: comprehensiveness, quality, recency, and transparency. Gaps in any of the four show up in every AI output a team generates.
- Organizations that have closed those gaps are achieving results that are difficult to reach any other way, including a 189% expansion of the eligible patient pool in one COPD trial. The full framework, the case studies, and a diagnostic evaluation guide are in The Real-World Data Advantage: Why Clinical Operations Teams Are Rethinking AI Strategy.
The clinical trial industry has made a significant bet on artificial intelligence (AI). AI was supposed to fix the feasibility problem, the one where predictions about patient availability don’t survive contact with reality. It was supposed to make site selection sharper, protocol design smarter, and recruitment faster.
For many organizations, the investment has been substantial. The results have been harder to quantify.
Forecasts still miss. Site recommendations still misalign with actual patient availability. Protocol tools still fail to anticipate operational complexity. The gap between what AI promised and what it delivers sits quietly in the background of planning meetings, post-study reviews, and budget conversations.
The instinct when this happens is to look at the algorithm, to wonder whether a different model, a different vendor, or a different configuration would close the gap. That instinct is understandable. It’s also almost always wrong.
The problem isn’t the AI. It’s the data feeding it.
Same Model, Different Data, Different Results
An AI model is an inference engine. It takes in data, finds patterns, and produces outputs: predictions, recommendations, risk scores. The quality of those outputs is a direct function of the quality of the inputs.
Feed a model narrow data and it produces narrow insights. Feed it outdated data and it produces outdated predictions. Feed it inconsistently coded data from dozens of different systems and it produces outputs that reflect the inconsistency, not the underlying clinical reality.
This is why two organizations can deploy the same AI technology and get completely different results. The differentiator isn’t the algorithm. It’s the data beneath it.
Organizations that are seeing real gains from clinical AI (measurably shorter timelines, better feasibility predictions, higher patient identification rates) didn’t get there by finding a better model. They got there by strengthening the foundation the model sits on.
Four Characteristics That Determine Whether Data Can Power Trustworthy AI
Not all real-world data (RWD) are created equal. The gap between data that can support high-performing AI and data that can’t comes down to four characteristics.
Comprehensiveness. A model trained on data from a limited geography, a narrow set of health systems, or a single therapeutic area reflects those limitations in every output it generates. The breadth of the underlying data, across populations, geographies, care settings, and conditions, determines the breadth of what the model can reliably know.
Quality. Healthcare data are inherently messy. Coding systems vary. Documentation practices differ across institutions and clinicians. The same condition can be recorded in a dozen different ways depending on the system, the specialty, and the year. AI doesn’t smooth over these inconsistencies. It amplifies them. Data that haven’t been rigorously normalized, mapped, and clinically validated produce outputs that look precise but aren’t.
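To make the amplification point concrete, here is a minimal Python sketch. It assumes a toy set of records in which the same condition (COPD) arrives under three different coding systems. The records, the field names, and the `CODE_TO_CONCEPT` mapping are hypothetical and purely illustrative; they do not represent any vendor's normalization pipeline or a complete clinical terminology map.

```python
from collections import Counter

# Hypothetical, illustrative mapping from (coding system, code) to one concept.
# A real terminology map is far larger and needs clinical validation.
CODE_TO_CONCEPT = {
    ("ICD-10-CM", "J44.9"): "COPD",
    ("ICD-9-CM", "496"): "COPD",
    ("SNOMED-CT", "13645005"): "COPD",
}

# Toy records: four patients with the same condition, coded three ways.
records = [
    {"patient_id": "p1", "system": "ICD-10-CM", "code": "J44.9"},
    {"patient_id": "p2", "system": "ICD-9-CM", "code": "496"},
    {"patient_id": "p3", "system": "SNOMED-CT", "code": "13645005"},
    {"patient_id": "p4", "system": "ICD-10-CM", "code": "J44.9"},
]


def count_raw(rows):
    """Count patients per raw (system, code) pair, as an unnormalized feed would."""
    return Counter((row["system"], row["code"]) for row in rows)


def count_normalized(rows):
    """Count patients per mapped concept; codes with no mapping are flagged."""
    return Counter(
        CODE_TO_CONCEPT.get((row["system"], row["code"]), "UNMAPPED")
        for row in rows
    )


raw = count_raw(records)
normalized = count_normalized(records)

print("Largest raw group:", max(raw.values()))       # patients sharing one raw code
print("Patients with COPD:", normalized["COPD"])     # patients after normalization
```

Without the mapping step, a model sees three small, unrelated groups. With it, the same four patients form a single cohort.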
Recency. Static datasets degrade. Treatment patterns evolve. Patient populations shift. Site capabilities change as staff turn over and infrastructure is updated. A feasibility prediction built on data that are six months, 12 months, or two years old describes a healthcare system that has since changed.
Transparency. If you can’t trace an AI output back to its data source, if you can’t see what went in, how it was prepared, and where it came from, you can’t evaluate it, defend it, or trust it. For any organization building toward regulatory-grade real-world evidence (RWE), full provenance isn’t a nice-to-have. It’s a requirement.
What Happens When the Foundation Is Right: One Example
The cleanest way to see what high-quality data enable is to look at what happens when AI is applied to eligibility criteria that have historically relied on rigid coded fields.
In a UK–Germany COPD trial, traditional coded criteria identified 230,750 eligible patients. When TriNetX applied AI-optimized criteria to richer, better-structured RWD covering the same population, the eligible pool grew to 666,200 patients, an increase of 189%.
The science didn’t change. The inclusion criteria didn’t loosen. The data were comprehensive enough, and structured rigorously enough, for AI to find patients that coded criteria alone had missed. With a pool nearly three times the original size, the trial team had far more flexibility to hit enrollment targets without widening the net to include irrelevant candidates.
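For readers who want to check the arithmetic, the COPD figures above can be reproduced in a few lines of Python. This is only a worked calculation on the two patient counts quoted in this article; the function and variable names are illustrative.

```python
def percent_increase(before: int, after: int) -> float:
    """Return the percentage growth from `before` to `after`."""
    return (after - before) / before * 100


baseline_pool = 230_750   # eligible patients using traditional coded criteria
optimized_pool = 666_200  # eligible patients using AI-optimized criteria

increase = percent_increase(baseline_pool, optimized_pool)
ratio = optimized_pool / baseline_pool

print(f"Increase: {increase:.0f}%")           # Increase: 189%
print(f"Size vs. original: {ratio:.2f}x")     # Size vs. original: 2.89x
```

A 189% increase means the pool grew by 189% of its original size, which makes the new pool roughly 2.9 times the original, not 1.89 times.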
That kind of result is not an isolated case. Across TriNetX-powered programs, AI models have flagged lupus trial candidates before they met formal eligibility criteria, identified IBD patients at high risk of a disease flare with 85% accuracy (up from 33% using traditional clinical methods), and predicted pancreatic cancer risk up to 18 months before diagnosis using RWD from 35,000 patients and 1.5 million controls. Each is detailed, with the underlying data architecture that made it possible, in the full guide, The Real-World Data Advantage.
The pattern is consistent: the data foundation came first. The AI performance followed.
Where to Start
If your AI results have been inconsistent, the most productive question isn’t “which algorithm should we try next?” It’s “what is our data actually able to support?”
That question has a framework. The four characteristics (comprehensiveness, quality, recency, and transparency) give you a structured way to evaluate where your data foundation is strong and where gaps are most likely to be dragging down AI performance.
The full framework, six quantified clinical outcomes, and a diagnostic evaluation guide for assessing AI and RWD partners are in The Real-World Data Advantage: Why Clinical Operations Teams Are Rethinking AI Strategy.
About Steve Kundrot
Steve is a technology and business leader with over 20 years of experience in clinical research, health analytics, consulting, and software development. As Chief Operating Officer, he oversees TriNetX’s core operational functions and leads the development of a unified product roadmap designed to revolutionize clinical research and accelerate drug development by optimizing clinical trial design, enhancing post-market safety, and delivering research-grade data and evidence that enable and expedite regulatory approvals.





