Secondary use of data refers to the utilisation of data already gathered for another purpose. These data can be further linked to prospectively collected medical and non-medical data. Electronic healthcare databases (e.g. claims databases, electronic medical records / electronic health records) and patient registries are examples of sources of data that can be leveraged as secondary data for pharmacoepidemiology studies.
The last decades have witnessed the development of key data resources, expertise and methodology that have allowed the use of such data for pharmacoepidemiology. The ENCePP Inventory of Data Sources contains information on existing European databases that may be used for pharmacoepidemiology research. However, this field is continuously evolving, and multi-centre studies presenting lists of databases are regularly published.
A description of the main features and applications of frequently used electronic healthcare databases for pharmacoepidemiology research in the United States and in Europe is presented in the book Pharmacoepidemiology (B. Strom, S.E. Kimmel, S. Hennessy. 6th Edition, Wiley, 2019, Chapters 11-14). In general, the limitations of using electronic healthcare databases should be acknowledged, as detailed in A review of uses of healthcare utilisation databases for epidemiologic research on therapeutics (J Clin Epidemiol. 2005; 58(4): 323-37).
The primary purpose of the ISPE-endorsed Guidelines for Good Database Selection and use in Pharmacoepidemiology Research (Pharmacoepidemiol Drug Saf. 2012;21(1):1-10) is to assist in the selection and use of data sources for pharmacoepidemiology research by highlighting potential limitations and recommending correct procedures for data analysis and interpretation. This guideline refers to the secondary use of databases containing routinely collected healthcare information such as electronic medical records (from either primary or secondary care) and claims databases, and does not include spontaneous adverse drug reaction reporting databases. A section of the guideline is dedicated to multi-database studies. The document also contains references to data quality and validation procedures, data processing/transformation, and data privacy and security (see also Chapter 11.2 Data quality frameworks).
The FDA’s Best Practices for Conducting and Reporting Pharmacoepidemiologic Safety Studies Using Electronic Health Care Data Sets (2013) provides criteria for best practice that apply to design, analysis, conduct and documentation. It emphasises that investigators should understand the potential limitations of electronic healthcare data systems, make provisions for their appropriate use and refer to validation studies of the outcomes of interest as captured in the database. Guidance for conducting studies within electronic healthcare databases can also be found in the International Society for Pharmacoepidemiology Guidelines for Good Pharmacoepidemiology Practices (ISPE GPP, 2015), in particular section IV-B (Study conduct, Data collection). This guidance emphasises the importance of patient data protection.
The concepts of “Real-world data” (RWD) and “Real-world evidence” (RWE) are increasingly used in the regulatory setting to denote the secondary use of observational data and pharmacoepidemiological methods for regulatory decision-making, although these terms can also apply to primary data collection. The article Real-World Data for Regulatory Decision Making: Challenges and Possible Solutions for Europe (Clin Pharmacol Ther. 2019;106(1):36-9) describes the operational, technical and methodological challenges for the acceptability of real-world data for regulatory purposes and presents possible solutions to address these challenges. The FDA’s Real-World Evidence website also provides definitions and links to a set of useful guidelines on the submission and use of real-world data, including electronic healthcare databases, to support decision-making. The Joint ISPE-ISPOR Special Task Force Report on Good Practices for Real‐World Data Studies of Treatment and/or Comparative Effectiveness (2017) recommends good research practices for designing and analysing studies of retrospective databases for comparative effectiveness research (CER) and reviews methodological issues and possible solutions for CER studies based on secondary data analysis (see also Chapter 14.1 on comparative effectiveness research). Many of the principles are applicable to studies with objectives other than CER, but some aspects of pharmacoepidemiological studies based on secondary use of data, such as data quality, ethical issues, data ownership and privacy, are not covered.
The majority of the examples and methods covered in Chapter 5 are based on studies and methodological developments in secondary use of healthcare databases, since this is one of the most frequent approaches used in pharmacoepidemiology. Several potential issues need to be considered in the use of electronic healthcare data for pharmacoepidemiological studies as they may affect the validity of the results. They include completeness of data capture; bias in the assessment of exposure, outcome and covariates; variability between data sources; and the impact of changes over time in the data (as has been noted in the pre- vs. post-COVID-19 period), in the access methodology and in the healthcare system of the country or region covered by the database.
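As an illustration of how changes over time in data capture might be screened for, the following minimal Python sketch counts records per calendar month and flags months whose count falls well below the typical level. It is only a sketch under stated assumptions: the function names, the example dates and the 0.5 threshold are hypothetical and are not taken from any of the guidelines cited above. A real assessment would also need to account for changes in the underlying population, coding practices and source-specific data lags.

```python
from collections import Counter
from datetime import date
from statistics import median


def monthly_counts(record_dates):
    """Count records per (year, month) from an iterable of dates."""
    return Counter((d.year, d.month) for d in record_dates)


def flag_low_capture_months(record_dates, threshold=0.5):
    """Return sorted months whose record count is below threshold * median.

    The 0.5 threshold is an arbitrary illustrative choice, not a recommended value.
    Months with no records at all are not detected by this simple version.
    """
    counts = monthly_counts(record_dates)
    if not counts:
        return []
    typical = median(counts.values())
    return sorted(m for m, n in counts.items() if n < threshold * typical)


# Hypothetical example: 10 records per month in early 2020, but only 3 in April.
example_dates = (
    [date(2020, m, day) for m in (1, 2, 3, 5, 6) for day in range(1, 11)]
    + [date(2020, 4, day) for day in range(1, 4)]
)
print(flag_low_capture_months(example_dates))  # [(2020, 4)]
```

A flagged month is only a prompt for further investigation: a drop in recorded events may reflect a true change in healthcare use, a change in data capture, or both.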