TriNetX Data Sets

Explore real-world data from TriNetX in your own environment

About TriNetX Licensed Data Sets

To understand the patient journey, you may need to dive deeper than aggregate counts and means. Spanning domains from diagnoses to genomic variants, our data sets attribute every observation to a pseudonymized patient ID, encounter, and date, making it straightforward to build robust longitudinal pictures of today’s patients. We deliver the data in a universal, ready-to-use format (linked CSV tables), giving you the freedom to analyze it in any application you choose.
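As a rough illustration of what "linked CSV tables" make possible, the sketch below merges two small tables into one date-ordered timeline per patient. The table contents and the column names (patient_id, encounter_id, date, code) are assumptions made for this example, not the actual delivery schema.

```python
import csv
import io
from collections import defaultdict

# Hypothetical extracts: the table and column names below are assumptions
# for illustration, not the actual TriNetX schema.
DIAGNOSES_CSV = """patient_id,encounter_id,date,code
P001,E10,2021-03-02,E11.9
P002,E20,2020-07-15,I10
P001,E12,2022-01-20,N18.3
"""

MEDICATIONS_CSV = """patient_id,encounter_id,date,code
P001,E10,2021-03-02,metformin
P002,E21,2020-08-01,lisinopril
"""


def build_timelines(tables):
    """Merge rows from several linked tables into one date-ordered
    list of events per patient.

    `tables` maps a domain label (e.g. "diagnosis") to CSV text.
    """
    timelines = defaultdict(list)
    for domain, text in tables.items():
        for row in csv.DictReader(io.StringIO(text)):
            event = (row["date"], domain, row["code"], row["encounter_id"])
            timelines[row["patient_id"]].append(event)
    for events in timelines.values():
        events.sort()  # ISO 8601 dates sort chronologically as strings
    return dict(timelines)


timelines = build_timelines(
    {"diagnosis": DIAGNOSES_CSV, "medication": MEDICATIONS_CSV}
)
for date, domain, code, encounter in timelines["P001"]:
    print(date, domain, code, encounter)
```

Because every row carries the same patient key, adding another domain (labs, procedures) would only mean passing one more table to build_timelines.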

Linked Data Sets

Our linked data sets combine data sourced from EHR and insurance claims into a single, longitudinal record for each of the 7.4 million patients represented in both sources. Secure and rapid tokenization allows us to match EHR and claim records on a per-patient basis without ever accessing or exposing personally identifying information. The result is a robust record that follows a patient across time and between providers, bringing demographics, clinical observations, treatment details, and costs under one view. By further linking with federal death registries and private obituaries, we support analyses of long-term survival in addition to the full array of HEOR, efficacy, and safety analyses.
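To make the idea of token-based matching concrete, here is a minimal sketch of one generic approach: each source derives a keyed hash from normalized identifiers, and records are joined on that hash alone. This is not a description of TriNetX's tokenization; the key, the choice of identifier fields, the normalization, and the sample rows are all assumptions for illustration.

```python
import hashlib
import hmac

# Illustrative only: real tokenization services use their own vetted
# protocols. The key, field choice, and normalization here are assumptions.
SECRET_KEY = b"demo-key-not-for-production"


def make_token(first_name, last_name, birth_date):
    """Derive a deterministic token from normalized identifiers using
    HMAC-SHA256, so the same person yields the same token in each source."""
    normalized = "|".join(
        part.strip().lower() for part in (first_name, last_name, birth_date)
    )
    return hmac.new(SECRET_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


# Each data holder tokenizes locally and shares only token + clinical fields.
ehr_rows = [
    {"token": make_token("Ana", "Diaz", "1980-04-01"), "diagnosis": "E11.9"},
    {"token": make_token("Ben", "Ito", "1975-09-12"), "diagnosis": "I10"},
]
claims_rows = [
    {"token": make_token(" ana ", "DIAZ", "1980-04-01"), "paid_amount": 120.50},
    {"token": make_token("Cy", "Lee", "1990-01-30"), "paid_amount": 75.00},
]


def link_on_token(ehr, claims):
    """Return combined records for tokens present in both sources."""
    claims_by_token = {}
    for row in claims:
        claims_by_token.setdefault(row["token"], []).append(row)
    linked = []
    for row in ehr:
        for claim in claims_by_token.get(row["token"], []):
            linked.append({**row, **claim})
    return linked


linked = link_on_token(ehr_rows, claims_rows)
print(len(linked), "linked record(s)")
```

Only the patient present in both sources produces a linked record, and the joined rows contain no names or birth dates, only the token and the clinical and cost fields.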

Use Cases

  • Incidence and prevalence
  • Long-term safety and efficacy
  • Treatment patterns
  • Drug adherence and persistence
  • Burden of illness
  • Cost of care
  • Disease progression
  • Overall survival

Key Data Elements

  • Demographics
  • Diagnoses
  • Procedures
  • Medications
  • Labs
  • Encounters
  • Enrollment
  • Claim headers & lines
  • Costs
  • Rx fills

Curated Oncology Data Sets

TriNetX Curated Oncology Data Sets deliver rich, CTR-curated oncology data from 17 Centers of Excellence in the U.S. to power precision cancer research. We currently offer breast cancer and lung cancer data sets, drawn from healthcare organizations in our network that provide cancer registry, genomics, or oncology EMR data.

Use Cases

  • Feasibility
  • Site identification
  • Survival
  • Patient recruitment
  • Burden of illness
  • Disease progression
  • Incidence and prevalence
  • Long-term outcomes
  • Long-term effectiveness
  • Treatment pathways
  • Patient journeys
  • Real-world evidence generation from
    co-morbidities to treatment landscapes

Key Data Elements

  • Structured EHR data
  • Labs
  • Stage and Grade
  • Histology
  • Genomics
  • Biomarkers
  • Surgery
  • Chemotherapy
  • Radiation
  • Immunotherapy
  • Hormone Therapy
  • Clinical Notes via NLP
  • Death Registries

Dataworks Data Sets

Dataworks is our largest single resource for data sets built from EHR data enriched with labs and mortality data. Patients in Dataworks represent all 9 U.S. census divisions and show an average of 55 diagnosis codes, 37 procedure codes, 211 medication codes, and 124 lab results. More than 20 million patients have records that extend at least five years.

Use Cases

  • Conduct precise time-to-event analysis
  • Track in-clinic medications and procedures
  • Compare multiple cohorts at once, along any number of characteristics
  • Reconstruct individual patient histories
  • Follow changes in lab values
  • Train predictive models using thousands
    of well-represented patient covariates

Key Data Elements

  • Demographics
  • Diagnoses
  • Procedures
  • Medications
  • Labs
  • Encounters
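
As an example of the time-to-event use case listed above, the following sketch computes a Kaplan-Meier survival estimate from patient-level follow-up times. The five patients and their durations are made-up values, and the function is a plain textbook estimator rather than anything specific to Dataworks.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time per patient (e.g. days from index date)
    observed:  True if the event occurred, False if censored
    """
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Patients still under observation just before time t
        at_risk = sum(1 for d in durations if d >= t)
        # Patients who experienced the event exactly at time t
        events = sum(1 for d, e in zip(durations, observed) if e and d == t)
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve


# Made-up follow-up data for five patients (days, event flag).
durations = [30, 45, 45, 60, 90]
observed = [True, True, False, True, False]

curve = kaplan_meier(durations, observed)
for time, prob in curve:
    print(time, round(prob, 3))
```

Censored patients (observed = False) still count toward the at-risk group up to their last follow-up, which is why dated, patient-level records are needed rather than aggregate counts.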

Open Claims Data Sets

TriNetX Open Claims data sets are built to close data gaps and provide insights throughout the product lifecycle, including market analysis and forecasting. Claims are sourced from clearinghouses that have processed more than 4 billion medical and pharmacy claims for over 300 million patients since 2014; data is refreshed daily to enable an up-to-date understanding of market access and treatment trends. Payer ID and name let you analyze payer dynamics for any patient population and treatment.

Key Data Elements

  • Claims
  • Diagnosis
  • Physician Information
  • Patient Characteristics (SDOH)
  • Plan Information

Use Cases

  • Understand market opportunities and barriers to patient access
  • Inform study design and eligibility
  • Evaluate post-launch performance quickly and precisely

KOL Activity Reports

Our Key Opinion Leader (KOL) activity reports detail which physicians are treating which patients, by condition, drug, and provider specialty. Each physician in your customized report is identified by NPI, full name, mailing address, phone number, and, when available, email address. We generate our KOL activity reports from up-to-the-day open claims data representing 1.8 million providers at more than 300,000 health care facilities, from academic and acute care hospitals to outpatient clinics. As a group, these providers submit claims that cover 98% of U.S. payers.

Use Cases

  • Optimize site selection with diagnosis
    and treatment patterns by geography
  • Equip MSLs with provider profiles

Key Data Elements

  • NPI
  • Full Name
  • Mailing Address
  • Phone Number
  • Email address
    (when available)
  • Provider taxonomies

Bring Your Own Data

Bring Your Own Data covers a range of services that allow you to enrich and explore data you already own. With privacy-preserving tokens, TriNetX can add EHR, tumor registry, genomic, claims, and mortality data to the de-identified records you bring to us. We can deliver the integrated records as a set of .csv tables and/or host the data on our platform for easy querying and analysis.

Use Cases

  • Combine SDOH, wearables data, and patient-reported data with EHR and claims data
  • Add medical history data to trial participants you have tokenized
  • Uncover post-trial clinical events for trial participants