CPRD Aurum Sample Dataset October 2021

Release date

Citation: Clinical Practice Research Datalink. (2021). CPRD Aurum Sample Dataset October 2021 (Version 2021.10.001) [Data set]. Clinical Practice Research Datalink. https://doi.org/10.48329/HM7T-QS28


The CPRD Aurum sample dataset is a medium-fidelity synthetic dataset that resembles the real-world CPRD Aurum database with respect to data types, data values, data formats, data structure and table relationships. This synthetic dataset can be used for multiple purposes, including:

  1. as a sample dataset to understand the structure and utility of the anonymised CPRD Aurum database
  2. as a data management teaching/training resource
  3. to develop/validate/test analytics tools for use with CPRD Aurum data
  4. to improve bespoke CPRD Aurum application interfaces/algorithms, e.g. a bespoke cohort selection tool, or
  5. to develop machine learning workflows that can be applied to anonymised CPRD Aurum data.

Further information and access details are available at: www.cprd.com/content/synthetic-data

  • Total number of research acceptable patients: 39,388

  • Percentage UK population coverage (current patients only): 13,858 of 66,796,800 (0.02%)

  • Median (25th and 75th percentile) follow-up time in years for currently registered patients: 9.05 (3.38 – 18.44)

  • Total number of GP practices: 14

  • Percentage coverage of UK general practices (currently contributing practices only): 14 of 8,961 (0.16%) 
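The coverage percentages above are simple ratios of the stated counts. This minimal Python sketch reproduces them. The helper name `coverage_pct` is illustrative only and is not part of any CPRD tool:

```python
# Recompute the coverage percentages from the counts stated above.
def coverage_pct(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole` (illustrative helper)."""
    return 100 * part / whole

# Current patients out of the UK population figure quoted above.
patients = coverage_pct(13_858, 66_796_800)
# Currently contributing practices out of all UK general practices.
practices = coverage_pct(14, 8_961)

print(f"Patients:  {patients:.2f}%")   # Patients:  0.02%
print(f"Practices: {practices:.2f}%")  # Practices: 0.16%
```

Rounded to two decimal places, both results match the percentages given in the list.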

Please contact enquiries@cprd.com for further information or if you have any questions.