Secondary data collection refers to the use of data already gathered for another purpose (e.g. electronic and non-electronic healthcare data). These data can be further linked to non-medical data, such as socio-economic or lifestyle factors. The last decades have witnessed the development of key data resources, expertise and methodology that have allowed the use of such data for pharmacoepidemiology. The ENCePP Inventory of Data Sources contains information on existing European databases. However, this field is continuously evolving, and it is recommended to look for recently published reviews and lists of databases.
A comprehensive description of the main features and applications of frequently used electronic healthcare databases for pharmacoepidemiology research in the United States and in Europe appears in the book Pharmacoepidemiology (B. Strom, S.E. Kimmel, S. Hennessy. 5th Edition, Wiley, 2012, Chapters 11-18). The limitations of using electronic healthcare databases should be acknowledged, as detailed in A review of uses of healthcare utilisation databases for epidemiologic research on therapeutics (J Clin Epidemiol 2005;58:323-337).
The primary purpose of the ISPE-endorsed Guidelines for Good Database Selection and use in Pharmacoepidemiology Research (Pharmacoepidemiol Drug Saf 2012;21:1-10) is to assist in the selection and use of data resources in pharmacoepidemiology by highlighting potential limitations and recommending correct procedures. This text mainly refers to databases of routinely collected healthcare information, such as electronic medical records and claims databases, and does not include spontaneous reporting databases. It is a simple, well-structured guideline that helps investigators select databases and helps database custodians describe their databases in a useful manner. An entire section is dedicated to multi-database studies. The document also contains references to data quality and validation procedures, data processing/transformation, privacy and security.
The Working Group for the Survey and Utilisation of Secondary Data (AGENS), with representatives from the German Society for Social Medicine and Prevention (DGSPM) and the German Society for Epidemiology (DGEpi), developed the Good Practice in Secondary Data Analysis Version 2, which aims to establish a standard for planning, conducting and analysing studies on the basis of secondary data. The guidance is also intended to serve as the basis for contracts between data owners (so-called primary users) and secondary users. It is divided into 11 sections addressing, among other aspects, the study protocol, quality assurance and data protection.
The FDA’s Best Practices for Conducting and Reporting Pharmacoepidemiologic Safety Studies Using Electronic Health Care Data Sets provides criteria for best practice that apply to design, analysis, conduct and documentation. It emphasizes that investigators should understand the potential limitations of electronic healthcare data systems, make provisions for their appropriate use and refer to validation studies of the safety outcomes that are of interest in the proposed study and captured in the database.
Guidance for conducting studies within electronic healthcare databases can also be found in the ISPE GPP, in particular section IV-B (Study conduct, Data collection). This guidance emphasizes the importance of patient data protection.
The International Society for Pharmacoeconomics and Outcomes Research (ISPOR) established a task force to recommend good research practices for designing and analysing retrospective databases for comparative effectiveness research (CER). The Task Force has subsequently published three articles (Part I, Part II and Part III) that review methodological issues and possible solutions for CER studies based on secondary data analysis (see also Chapter 10.1 on comparative effectiveness research). Many of the principles are applicable to studies with objectives other than CER, but some aspects of pharmacoepidemiological studies based on secondary use of data, such as data quality, ethical issues, data ownership and privacy, are not covered.
Particular issues to be considered in the use of electronic healthcare data for pharmacoepidemiological research include completeness of data capture, bias in the assessment of exposure, outcome and covariates, variability between data sources and the impact of changes over time in data, access methodology and the healthcare system.
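As an illustration of how completeness of data capture, variability between data sources and changes over time might be screened before a study, the following is a minimal sketch in Python. It is not taken from any of the guidelines cited above: the record layout (`source`, `year`, `bmi`), the function names and the 0.8 threshold are all hypothetical choices made for this example only.

```python
from collections import defaultdict


def completeness_by_source_and_year(records, field):
    """Return {(source, year): fraction of records where `field` is recorded}.

    `records` is an iterable of dicts with hypothetical keys "source" and "year";
    a value of None or "" is treated as missing.
    """
    totals = defaultdict(int)
    present = defaultdict(int)
    for rec in records:
        key = (rec["source"], rec["year"])
        totals[key] += 1
        if rec.get(field) not in (None, ""):
            present[key] += 1
    return {key: present[key] / totals[key] for key in totals}


def flag_low_completeness(completeness, threshold=0.8):
    """List the (source, year) strata whose completeness is below `threshold`.

    The default threshold is an arbitrary illustration, not a recommended value.
    """
    return sorted(key for key, frac in completeness.items() if frac < threshold)


# Toy records from two hypothetical data sources, "A" and "B".
records = [
    {"source": "A", "year": 2010, "bmi": 24.1},
    {"source": "A", "year": 2010, "bmi": None},
    {"source": "A", "year": 2011, "bmi": 27.3},
    {"source": "A", "year": 2011, "bmi": 22.0},
    {"source": "B", "year": 2010, "bmi": ""},
    {"source": "B", "year": 2010, "bmi": None},
    {"source": "B", "year": 2011, "bmi": 30.2},
]

completeness = completeness_by_source_and_year(records, "bmi")
low = flag_low_completeness(completeness)
print(low)
```

A stratified summary of this kind only indicates where a variable is poorly recorded; it does not show whether missingness is related to exposure or outcome, which needs to be assessed separately.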
The majority of the examples and methods covered in Chapter 5 are based on studies and methodologic developments in secondary data collection, since this is the most frequent approach used in pharmacoepidemiology.
Chapter 4.6 deals with models of studies conducted across multiple data sources.