Powered By TriNetX

  • A study from MIT and BIDMC, published in eBioMedicine by The Lancet, introduces PRISM, a model for early pancreatic cancer risk, using a database of over 156 million de-identified US patients from TriNetX, LLC.
  • PRISM forecasts pancreatic cancer risk up to 18 months before diagnosis in Americans over 40, with an AUC of 0.826.
  • PRISM could flag roughly 3.5 times as many future pancreatic cancer cases as current screening (35% versus 10%).

Pancreatic Ductal Adenocarcinoma (PDAC) is among the most challenging cancers to diagnose early, often leading to poor prognosis and limited treatment options. Recognizing the urgent need for advancements in early detection methods, researchers have focused on leveraging technology to bridge this gap.

Within this context, one project stands out for its potential to improve the early detection of pancreatic cancer. The development and validation of the PDAC Risk Model (PRISM), a risk prediction tool, marks a pivotal moment in applying Electronic Health Records (EHR) to predictive analytics.

Crafted by a BIDMC and MIT team led by Kai Jia, Limor Appelbaum, M.D., and Prof. Martin Rinard, in collaboration with TriNetX’s informatics specialists, PRISM shows what advanced data science and clinical expertise can achieve together in the fight against one of the most lethal cancers.


Introducing the PRISM AI Diagnostic Tool: Data Utilization and Model Development

PRISM uses artificial intelligence to analyze de-identified electronic health records and predict a patient’s risk of developing PDAC within the next six to 18 months. By examining data from 6 million de-identified patients, including 35,387 PDAC cases across 55 U.S. healthcare organizations, the researchers created a system that significantly outperforms existing screening methods. PRISM’s two AI models (one a complex neural network, the other a simpler algorithm) compute risk scores from factors such as age, medical history, and lab results. The study, published in eBioMedicine by The Lancet, found that PRISM could flag as high risk 35% of the patients who would go on to develop pancreatic cancer, six to 18 months before diagnosis, a marked improvement over the 10% detection rate of current screening.
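The 35% and 10% figures above are detection rates (sensitivity): the share of eventual cases that a method flags in advance. As a minimal sketch, assuming toy scores and labels that are not from the study, the following shows how such a rate can be computed at a chosen threshold and how the two quoted rates relate. The function names are illustrative only.

```python
def sensitivity(scores, labels, threshold):
    """Fraction of true cases (label 1) whose score is at or above the threshold."""
    case_scores = [s for s, y in zip(scores, labels) if y == 1]
    if not case_scores:
        raise ValueError("need at least one positive case")
    flagged = sum(1 for s in case_scores if s >= threshold)
    return flagged / len(case_scores)


def relative_improvement(new_rate, baseline_rate):
    """How many times larger the new detection rate is than the baseline."""
    return new_rate / baseline_rate


# Toy data (invented for illustration): three eventual cases, three non-cases.
toy_scores = [0.9, 0.8, 0.2, 0.1, 0.7, 0.3]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_rate = sensitivity(toy_scores, toy_labels, threshold=0.5)

# Rates quoted in the article: 35% for PRISM versus 10% for current screening.
ratio = relative_improvement(0.35, 0.10)
print(f"toy sensitivity: {toy_rate:.2f}, quoted improvement: {ratio:.1f}x")
```

The ratio of the two quoted rates is where the "3.5 times" figure in the summary comes from.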

PRISM’s development began with careful feature selection: the team chose 87 features from a pool of thousands. This methodical process was enabled by the vast and diverse de-identified data set provided by TriNetX’s privacy-preserving federated network, underscoring the critical role of high-quality, large-scale data in health research.

“The TriNetX database was crucial in our feature selection since the process was fully data-driven and the quality of results totally depended on the quality and the scale of the de-identified dataset,” says Kai Jia, a principal investigator on the study at the MIT Department of Electrical Engineering and Computer Science.


Surpassing Other AI Models in Unprecedented Accuracy and Generalizability

PRISM’s development utilized advanced neural networks and logistic regression, culminating in a model with remarkable accuracy (AUC of 0.826) and broad applicability across different populations. This achievement was not without its challenges, particularly in ensuring robustness across geographic locations and over time.

“Without the federated network of TriNetX, it would be impossible to reliably assess the model generalizability on so many dimensions,” shared Jia.

The diversity of the TriNetX data set, enriched with geographic markers, was instrumental in identifying and overcoming these obstacles.
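For readers unfamiliar with the AUC figure quoted above, it is the probability that a randomly chosen case receives a higher risk score than a randomly chosen non-case (0.5 is chance, 1.0 is perfect ranking). The sketch below computes it by pairwise comparison on invented toy data; it illustrates the metric only and is not the study's evaluation code.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (invented for illustration): three cases, three non-cases.
toy_auc = auc([0.9, 0.8, 0.2, 0.1, 0.7, 0.3], [1, 1, 1, 0, 0, 0])
print(f"toy AUC: {toy_auc:.3f}")
```

Checking generalizability, as described above, would amount to computing the same metric separately on subsets of the data, such as different regions or time periods, and comparing the results.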


Envisioning Clinical Deployment

Incorporating PRISM into clinical practice presents its own set of challenges, primarily concerning the seamless transition from research to real-world application. The path to integrating PRISM into clinical settings is envisioned in two strategic scenarios: (1) enhancing current screening criteria and (2) identifying high-risk individuals from the general population for further testing. TriNetX supports this integration by facilitating the deployment of PRISM on de-identified patient data, generating individual risk scores, and ensuring these can be acted upon within healthcare organizations.
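The second scenario, flagging high-risk individuals for further testing, can be pictured as scoring each patient and comparing the score with a threshold. The following is a minimal sketch assuming a logistic-regression-style score. The feature names, weights, intercept, and threshold are all hypothetical placeholders chosen for illustration; they are not PRISM's actual features or coefficients.

```python
import math

# Hypothetical weights and intercept, for illustration only (not PRISM's).
WEIGHTS = {"age_over_60": 0.8, "history_flag": 1.2, "abnormal_lab": 0.9}
INTERCEPT = -4.0


def risk_score(features):
    """Logistic score in (0, 1); features missing from the input count as 0."""
    z = INTERCEPT + sum(w * float(features.get(name, 0)) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def flag_high_risk(patients, threshold):
    """Return the IDs of patients whose score is at or above the threshold."""
    return [pid for pid, feats in patients.items() if risk_score(feats) >= threshold]


# Two invented, de-identified example records.
patients = {
    "p1": {"age_over_60": 0, "history_flag": 0, "abnormal_lab": 0},
    "p2": {"age_over_60": 1, "history_flag": 1, "abnormal_lab": 1},
}
flagged = flag_high_risk(patients, threshold=0.1)
print(flagged)
```

In practice the threshold would trade off how many future cases are caught against how many people are referred for follow-up testing.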

“TriNetX can enable widespread dissemination of PRISM, providing the opportunity for large-scale dissemination to interested HCOs within the network,” adds Jia.


Looking Ahead: The Impact of PRISM on Pancreatic Cancer Detection

The journey of PRISM from a research model to a clinical tool includes ongoing prospective validation studies, with TriNetX playing a pivotal role in assessing its real-world efficacy. These efforts are expected to contribute significantly to expanding current pancreatic cancer screening protocols, ultimately impacting public health strategies.

“TriNetX has a critical role in the broader adoption of such models,” Prof. Martin Rinard anticipates, pointing towards a future where data-driven risk prediction models become integral to healthcare.


Conclusion: A New Era in Public Health and Technology

The development of the PRISM model illustrates the transformative potential of leveraging large-scale, diverse data sets for health research. TriNetX’s federated network has not only enabled the creation of a model with the power to detect pancreatic cancer earlier across a broad population but also set the stage for its integration into clinical practice. As we move forward, the collaboration between data scientists, researchers, and healthcare providers promises to usher in a new era of early detection strategies, emphasizing the critical importance of data in advancing public health.

