4.1.1. General considerations
4.1.3. Randomised controlled trials
4.3. Patient registries
4.3.3. Methodological aspects
4.3.4. Population registries
4.4. Spontaneous reports
4.5. Social media
4.5.2. Use in pharmacovigilance
4.5.4. Data protection
4.6.1. General considerations
There are two main approaches for data collection: collection of data specifically for a particular study (‘primary data collection’) or use of data already collected for another purpose, e.g. as part of administrative records of patient health care (‘secondary use of data’). The distinction between primary data collection and secondary use of data is important for marketing authorisation holders as it implies different regulatory requirements for the collection and reporting of suspected adverse reactions, as described in Module VI of the Guideline on good pharmacovigilance practices (GVP) - Management and reporting of adverse reactions to medicinal products. It also has implications in terms of data privacy requirements.
Secondary use of data has become a common approach in pharmacoepidemiology due to the increasing availability of electronic healthcare records, administrative claims data and other already existing data sources (see Chapter 4.2 Secondary use of data) and due to its increased efficiency and lower cost. In addition, networking between centres active in the pharmacoepidemiology and pharmacovigilance fields is rapidly changing the landscape of drug safety and effectiveness research in Europe, both in terms of availability of networks of data sources and networks of researchers who can contribute to a particular study with a particular data source (see Chapter 4.6 Research Networks for multi-database studies).
The methodological aspects of primary data collection studies (also sometimes referred to as field studies) are well covered in the textbooks and guidelines referred to in the Introduction chapter. Annex 1 of Module VIII of the Guideline on good pharmacovigilance practices (GVP) provides examples of different study designs based on prospective primary data collection, such as cross-sectional studies, prospective cohort studies and active surveillance. Surveys and randomised controlled trials are also presented below as examples of primary data collection.
Studies using hospital or community-based primary data collection have allowed the evaluation of drug-disease associations for rare complex conditions that require very large source populations and in-depth case assessment by clinical experts. Classic examples are Appetite-Suppressant Drugs and the Risk of Primary Pulmonary Hypertension (N Engl J Med. 1996;335:609-16), The design of a study of the drug etiology of agranulocytosis and aplastic anemia (Eur J Clin Pharmacol. 1983;24:833-6) and Medication Use and the Risk of Stevens–Johnson Syndrome or Toxic Epidermal Necrolysis (N Engl J Med. 1995;333:1600-8). For some conditions, case-control surveillance networks have been developed and used for selected studies and for signal generation and evaluation, e.g. Signal generation and clarification: use of case-control data (Pharmacoepidemiol Drug Saf 2001;10:197-203).
Data can be collected using paper or electronic case report forms or, increasingly, smartphone or web applications provided to patients. Possibilities, Problems, and Perspectives of Data Collection by Mobile Apps in Longitudinal Epidemiological Studies: Scoping Review (J Med Internet Res. 2021;23(1):e17691) concludes that using mobile technologies can help to overcome challenges linked to data collection in epidemiological research, but the applicability and acceptance of these mobile apps in various subpopulations need to be further studied.
The book Research Methods in Education (J. Check, RK. Schutt, Sage Publications, 2011) defines survey research as "the collection of information from a sample of individuals through their responses to questions" (p. 160). This type of research allows for a variety of methods to recruit participants, collect data and utilise various instruments.
A survey is the collection of data on knowledge, attitudes, behaviour, practices, opinions, beliefs or feelings of selected groups of individuals from a specific sampling frame, by asking them questions in person or by post, phone or online. Surveys generally have a cross-sectional design, but repeated measures over time may be performed for the assessment of trends.
Surveys have long been used in fields such as market research, social sciences and epidemiology. General guidance on constructing and testing the survey questionnaire, modes of data collection, sampling frames and ways to achieve representativeness can be found in general texts such as Survey Sampling (L. Kish, Wiley, 1995) and Survey Methodology (R.M. Groves, F.J. Fowler, M.P. Couper et al., 2nd Edition, Wiley, 2009). The book Quality of Life: the assessment, analysis and interpretation of patient-reported outcomes (P.M. Fayers, D. Machin, 2nd Edition, Wiley, 2007) offers a comprehensive review of the theory and practice of developing, testing and analysing quality of life questionnaires in different settings.
Surveys have an important role in the evaluation of the effectiveness of risk minimisation measures (RMM) or of a risk evaluation and mitigation strategy (REMS) (see Chapter 14.4). The methods described in the aforementioned textbooks need to be adapted for surveys that evaluate the effectiveness of RMM or REMS. For example, the extensive methods for questionnaire development of quality of life scales (construct, criterion and content validity, inter-rater and test-retest reliability, sensitivity and responsiveness) are not appropriate for RMM questionnaires, which are often used only once. The EMA and FDA issued guidance documents on the conduct of surveys for risk minimisation (RM) which, together, encompass the selection of risk minimisation measures, study design, instrument development, data collection, processing and data analysis and presentation of results. This guidance includes the draft EMA Guideline on good pharmacovigilance practices (GVP) Module XVI – Risk minimisation measures: selection of tools and effectiveness indicators (Rev 3) (2021), the FDA draft guidance for industry REMS Assessment: Planning and Reporting on REMS (2019) and the FDA Guidance on Survey Methodologies to Assess REMS Goals That Relate to Knowledge (2019). A checklist to assess the quality of studies evaluating risk minimisation programmes is provided in The RIMES Statement: A Checklist to Assess the Quality of Studies Evaluating Risk Minimization Programs for Medicinal Products (Drug Saf. 2018;41(4): 389-401). The article Are Risk Minimization Measures for Approved Drugs in Europe Effective? A Systematic Review (Expert Opin Drug Saf. 2019;18(5):443-54) highlights the need for improvement in the methods and presentation of results and for more hybrid designs that link survey data with health and safety outcomes as requested by regulators. This article also reports low response rates in many studies, which leaves room for important bias.
The response rate should therefore be reported in a standardised way in surveys to allow comparisons. Standard Definitions. Final Dispositions of Case Codes and Outcome Rates for Surveys (2016) of the American Association for Public Opinion Research provides standard definitions which can be adapted to RM surveys and the FDA Guidance on Survey Methodologies to Assess REMS Goals That Relate to Knowledge (2019) provides guidance for RM surveys.
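As an illustration of standardised reporting, the following minimal sketch computes two of the outcome rates defined in the AAPOR Standard Definitions, Response Rate 1 (complete responses only) and Response Rate 2 (complete plus partial responses), both using all eligible and unknown-eligibility cases as the denominator. The class and function names are hypothetical, and how final dispositions of an RM survey map onto these categories is an assumption that would need to be specified in the study protocol.

```python
from dataclasses import dataclass


@dataclass
class Dispositions:
    """Final case dispositions of a survey sample, as counts (hypothetical container)."""
    complete: int     # I: complete questionnaires
    partial: int      # P: partial questionnaires
    refusal: int      # R: refusals and break-offs
    non_contact: int  # NC: eligible cases that were never reached
    other: int        # O: other eligible non-respondents
    unknown: int      # UH + UO: cases of unknown eligibility


def response_rates(d: Dispositions) -> dict:
    """Return AAPOR-style RR1 and RR2 for the given dispositions."""
    denominator = (
        d.complete + d.partial + d.refusal + d.non_contact + d.other + d.unknown
    )
    if denominator == 0:
        raise ValueError("the sample contains no cases")
    return {
        "RR1": d.complete / denominator,                # complete responses only
        "RR2": (d.complete + d.partial) / denominator,  # complete + partial
    }
```

For example, a sample of 1,000 contacted health care professionals with 180 complete and 20 partial questionnaires gives RR1 = 0.18 and RR2 = 0.20. Reporting which rate was used, together with the disposition counts, is what makes response rates comparable across surveys.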
An important aspect of surveys is sampling. A clustered random sample is often used in surveys. However, attention should be paid to the selection of the original list of the target population. For example, if the evaluation of the awareness of an educational material is part of the objectives, the lists that were used to distribute the educational material cannot be used as the sampling frame for the survey, otherwise selection bias cannot be excluded.
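A minimal sketch of a two-stage clustered random sample is given below: clusters (e.g. practices or hospitals) are drawn at random first, then individuals are drawn at random within each selected cluster. The function name, the structure of the sampling frame (a mapping from cluster identifier to its members) and the equal within-cluster sample size are illustrative assumptions; real surveys may use probability-proportional-to-size selection and require design weights in the analysis.

```python
import random


def two_stage_cluster_sample(frame, n_clusters, n_per_cluster, seed=None):
    """Draw a two-stage clustered random sample.

    frame: dict mapping a cluster id to the list of individuals in that cluster.
    Returns a dict mapping each selected cluster id to its sampled individuals.
    """
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    # Stage 1: simple random sample of clusters (sorted for a stable order).
    selected_clusters = rng.sample(sorted(frame), n_clusters)
    sample = {}
    for cluster in selected_clusters:
        members = frame[cluster]
        # Stage 2: simple random sample of individuals within the cluster,
        # taking everyone if the cluster is smaller than the requested size.
        k = min(n_per_cluster, len(members))
        sample[cluster] = rng.sample(members, k)
    return sample
```

In line with the caveat above, the `frame` passed to such a function should come from a source that is independent of the list used to distribute the educational material.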
The increasing use of online RMM requires that survey methods adapt, but they should not sacrifice representativeness by accessing only populations which visit these websites. Evidence should be provided that the results obtained using these sampling methods are not biased. Similarly, with the increasing use of health care professional and patient panels, survey methods should not sacrifice representativeness by accessing only self-selected participants in these panels, and evidence should be provided that the results are not biased by using these convenient sampling frames.
The issue of thresholds to assess the effectiveness of RMM remains a topic of debate. This topic is discussed in the aforementioned EMA and FDA documents and the article Are Risk Minimization Measures for Approved Drugs in Europe Effective? A Systematic Review (Expert Opin Drug Saf. 2019;18(5):443-54). The thresholds need to be viewed in the context of their potential impact on the benefit-risk balance. Composite thresholds covering all three aspects of RM effectiveness (awareness, knowledge and behaviour) are rarely achieved.
The draft EMA Guideline on good pharmacovigilance practices (GVP) Module XVI – Risk minimisation measures: selection of tools and effectiveness indicators (Rev 3) (2021) encourages linking the evaluation of process indicators to health outcomes. A holistic evaluation of non-targeted effects as well as product-specific targeted effects has so far been performed in only a minority of studies, as shown in Risk Minimisation Evaluation with Process Indicators and Behavioural or Health Outcomes in Europe: Systematic Review (Expert Opin Drug Saf. 2019;18(5):443-54).
Randomised controlled trials are an experimental design that involves primary data collection. There are numerous textbooks and publications on methodological and operational aspects of clinical trials which are not covered here. An essential guideline on clinical trials is the European Medicines Agency (EMA) Guideline for good clinical practice E6(R2), which specifies obligations for the conduct of clinical trials to ensure that the data generated in the trial are valid. From a legal perspective, Volume 10 of the Rules Governing Medicinal Products in the European Union contains all guidance and legislation relevant for the conduct of clinical trials. A number of documents are under revision.
The way clinical trials are conducted in the European Union (EU) will undergo a major change when the Clinical Trial Regulation (Regulation (EU) No 536/2014) comes fully into effect and replaces the existing Directive 2001/20/EC.
Hybrid data collection as used in pragmatic trials, large simple trials and randomised database studies is described in Chapter 5.3.1.
Secondary use of data refers to the utilisation of data already gathered for another purpose. These data can be further linked to prospectively collected medical and non-medical data. Electronic health care databases (e.g. claims databases, electronic medical records / electronic health records) and patient registries are examples of sources of data that can be leveraged as secondary data for pharmacoepidemiology studies.
The last decades have witnessed the development of key data resources, expertise and methodology that have allowed use of such data for pharmacoepidemiology. The ENCePP Inventory of Data Sources contains information on existing European databases that may be used for pharmacoepidemiology research. However, this field is continuously evolving. Multi-centre studies presenting lists of databases are regularly published.
A description of the main features and applications of frequently used electronic healthcare databases for pharmacoepidemiology research in the United States and in Europe is presented in the book Pharmacoepidemiology (B. Strom, S.E. Kimmel, S. Hennessy. 6th Edition, Wiley, 2019, Chapters 11-14). In general, the limitations of using electronic healthcare databases should be acknowledged, as detailed in A review of uses of healthcare utilisation databases for epidemiologic research on therapeutics (J Clin Epidemiol. 2005; 58(4): 323-37).
The primary purpose of the ISPE-endorsed Guidelines for Good Database Selection and use in Pharmacoepidemiology Research (Pharmacoepidemiol Drug Saf. 2012;21(1):1-10) is to assist in the selection and use of data sources for pharmacoepidemiology research by highlighting potential limitations and recommending correct procedures for data analysis and interpretation. This guideline refers to the secondary use of databases containing routinely collected healthcare information such as electronic medical records (from either primary or secondary care) and claims databases, and does not include spontaneous adverse drug reaction reporting databases. A section of the guideline is dedicated to multi-database studies. The document also contains references to data quality and validation procedures, data processing/transformation, and data privacy and security (see also Chapter 11.2 Data quality frameworks).
The FDA’s Best Practices for Conducting and Reporting Pharmacoepidemiologic Safety Studies Using Electronic Health Care Data Sets (2013) provides criteria for best practice that apply to design, analysis, conduct and documentation. It emphasises that investigators should understand the potential limitations of electronic healthcare data systems, make provisions for their appropriate use and refer to validation studies of outcomes of interest in the proposed study and captured in the database. Guidance for conducting studies within electronic healthcare databases can also be found in the International Society for Pharmacoepidemiology Guidelines for Good Pharmacoepidemiology Practices (ISPE GPP, 2015), in particular sections IV-B (Study conduct, Data collection). This guidance emphasises the importance of patient data protection.
The concepts of “Real-world data” (RWD) and “Real-world evidence” (RWE) are increasingly used in the regulatory setting to denote the secondary use of observational data and pharmacoepidemiological methods for regulatory decision-making, although these terms can also apply to primary data collection. The article Real-World Data for Regulatory Decision Making: Challenges and Possible Solutions for Europe (Clin Pharmacol Ther. 2019;106(1):36-9) describes the operational, technical and methodological challenges for the acceptability of real-world data for regulatory purposes and presents possible solutions to address these challenges. The FDA’s Real-World Evidence website also provides definitions and links to a set of useful guidelines on the submission and use of real-world data, including electronic health care databases, to support decision-making. The Joint ISPE-ISPOR Special Task Force Report on Good Practices for Real‐World Data Studies of Treatment and/or Comparative Effectiveness (2017) recommends good research practices for designing and analysing retrospective databases for comparative effectiveness research (CER) and reviews methodological issues and possible solutions for CER studies based on secondary data analysis (see also Chapter 14.1 on comparative effectiveness research). Many of the principles are applicable to studies with other objectives than CER, but some aspects of pharmacoepidemiological studies based on secondary use of data, such as data quality, ethical issues, data ownership and privacy, are not covered.
The majority of the examples and methods covered in Chapter 5 are based on studies and methodologic developments in secondary use of healthcare databases, since this is one of the most frequent approaches used in pharmacoepidemiology. Several potential issues need to be considered in the use of electronic healthcare data for pharmacoepidemiological studies as they may affect the validity of the results. They include completeness of data capture, bias in the assessment of exposure, outcome and covariates, variability between data sources and the impact of changes over time in the data (as has been noted in the pre- vs. post-COVID-19 period), access methodology and the healthcare system of the country or region covered by the database.
A patient registry is an organised system that collects data and information on a group of patients defined by a particular disease, condition or exposure, and that serves a pre-determined scientific, clinical and/or public health (policy) purpose. A registry-based study is the investigation of a research question using the data collection infrastructure or patient population of one or more existing or new patient registries. A registry-based study may be a non-interventional trial/study or a clinical trial/study.
A patient registry should be considered as an infrastructure for the standardised recording of data from routine clinical practice on individual patients identified by a characteristic or an event, for example the diagnosis of a disease (disease registry), the occurrence of a condition (e.g. pregnancy registry), a birth defect (e.g. birth defect registry), a molecular or a genomic feature or any other patient characteristics, or an encounter with a particular healthcare service. The term product registry is sometimes used for a system where data are collected on patients exposed to a particular medicinal product, single substance or therapeutic class in order to evaluate their use or their effects. Such a system should rather be considered a clinical trial or a non-interventional study, as data are collected for a specific pre-planned analysis in line with performing a trial/study, and it does not include specific aspects related to the use of patient registries as a source population and/or as an existing data collection and analysis system.
As illustrated in Imposed registries within the European postmarketing surveillance system (Pharmacoepidemiol Drug Saf. 2018;27(7):823-26) and the EMA’s Draft Guideline on registry-based studies (2020), there are methodological differences between registries and registry-based studies.
Patient registries are often integrated into routine clinical practice with systematic and sometimes automated data capture in electronic healthcare records. A registry-based study may only use the data relevant for the specific study objectives and may need to be enriched with additional information on outcomes, lifestyle data, immunisation or mortality information obtained from linkage to existing databases such as national cancer registries, prescription databases or mortality records.
To support better use of existing registries for the benefit-risk evaluation of medicines, the EU regulatory network developed the Patient registries initiative. As part of this initiative, the EMA organised several workshops on disease-specific registries. The reports of these workshops describe regulators’ expectations on common data elements to be collected and best practices on topics such as governance, data quality control, data sharing or reporting of safety data. The ENCePP Resource database of data sources is also used to support an inventory of existing disease registries.
The EMA’s Scientific Advice Working Party issued two Qualification Opinions for two registry platforms, the ECFSPR and the EBMT, with an evaluation of their potential use as data sources for registry-based studies. These opinions provide an indication of the key methodological components expected by regulators for using a disease registry for such studies.
The US Agency for Healthcare Research and Quality (AHRQ) published a comprehensive document on ‘good registry practices’ entitled Registries for Evaluating Patient Outcomes: A User's Guide, 3rd Edition (2018), which provides methodological guidance on planning, design, implementation, analysis, interpretation and evaluation of the quality of a registry. There is a dedicated section for linkage of registries to other data sources. The PARENT Joint Action (https://eunethta.eu/parent/) developed Methodological guidelines and recommendations for efficient and rational governance of patient registries (2015) to facilitate interoperability and cross-border use of registries.
Results obtained from analyses of registry data may be affected by the same biases as those of studies described in Chapter 5 of this Guide. Factors that may influence the enrolment of patients in a registry may be numerous (including clinical, demographic and socio-economic factors) and difficult to predict and identify, potentially resulting in a biased sample of the patient population if recruitment has not been exhaustive. Bias may also be introduced by differential completeness of follow-up and data collection.
As illustrated in The randomized registry trial--the next disruptive technology in clinical research? (N Engl J Med. 2013; 369(17): 1579-81) and Registry-based randomized controlled trials: what are the advantages, challenges and areas for future research? (J Clin Epidemiol. 2016;80:16-24), the randomised registry-based trial may support enhanced generalisability of findings, rapid consecutive enrollment, and the potential completeness of follow-up for the reference population, when compared with conventional randomized effectiveness trials. Defining key design elements of registry-based randomised controlled trials: a scoping review (Trials 2020;21(1):552) concludes that the low cost, reduced administrative burden and enhanced external validity make registries an attractive research methodology to be used to address questions of public health importance, but the issues of data integrity, completeness, timeliness, validation and adjudication of endpoints need to be carefully addressed.
In European Nordic countries, a comprehensive registration of data for nearly all of the population allows linkage between government-administered patient registries that may include hospital encounters, diagnoses and procedures, such as the Norwegian Patient Registry, the Danish National Patient Registry or the Swedish National Patient Register. Review of 103 Swedish Healthcare Quality Registries (J Intern Med. 2015; 277(1): 94–136) describes healthcare quality registries focusing on specific disorders, initiated in Sweden mostly by physicians, with data on aspects of disease management, self-reported quality of life, lifestyle and general health status, providing an important source for research.
Special populations can be identified based on age (e.g. paediatric or elderly), pregnancy status, renal or hepatic function, race, or genetic differences. Some registries are focused on these particular populations. Examples of these are the birth registries in Nordic countries and registries for rare diseases. The European Platform on Rare Diseases Registration (EU RD Platform) serves as a platform for information on registries for rare diseases and has developed a set of common data elements for the European Reference Network and other rare disease registries.
Pregnancy registries include pregnant women followed until the end of pregnancy and provide information on pregnancy outcomes. Besides the difficulties of recruitment and retention of pregnant women, specific challenges of using pregnancy registries for observational studies on adverse effects of vaccines administered during pregnancy include the identification of relevant control groups for comparisons and the completeness of information on pregnancy outcomes, as embryonic and early foetal loss are often not recognised or recorded and data on the gestational age at which these events occur are often missing. These studies may require linkage with data captured in birth defects registries, teratology information services or electronic health care records where mother-child linkage is possible. In addition, the likelihood of vaccination increases with gestational age whereas the likelihood of foetal death decreases. The EMA’s Draft Guideline on good pharmacovigilance practices. Product- or Population-Specific Considerations III: Pregnant and breastfeeding women (2019) provides methodological recommendations for use of a pregnancy registry for data collection in additional pharmacovigilance activities. The FDA’s Draft Postapproval Pregnancy Safety Studies Guidance for Industry (2019) includes recommendations for designing a pregnancy registry with a description of research methods and elements to be addressed. The Systematic overview of data sources for drug safety in pregnancy research (2016) provides an inventory of pregnancy exposure registries and alternative data sources on safety of prenatal drug exposure and discusses their strengths and limitations.
Examples of population-based registers that allow the assessment of outcomes of drug exposure during pregnancy are the European network of registries for the epidemiologic surveillance of congenital anomalies, EUROCAT, and the pan-Nordic registries which record drug use during pregnancy, as illustrated in Selective serotonin reuptake inhibitors and venlafaxine in early pregnancy and risk of birth defects: population based cohort study and sibling design (BMJ. 2015;350:h1798).
For paediatric populations, specific and detailed information such as neonatal age (e.g. in days), pharmacokinetic parameters and organ maturation needs to be considered and is usually missing from classical data sources; paediatric-specific registries are therefore important. The CHMP Guideline on Conduct of Pharmacovigilance for Medicines Used by the Paediatric Population (2005) provides further relevant information. An example of a registry which focuses on paediatric patients is Pharmachild, which captures children with juvenile idiopathic arthritis undergoing treatment with methotrexate or biologic agents.
The article Patient Registries: An Underused Resource for Medicines Evaluation: Operational proposals for increasing the use of patient registries in regulatory assessments (Drug Saf. 2019;42(11):1343-51) proposes sets of measures to improve use of registries in relation to: (1) nature of the data collected and registry quality assurance processes; (2) registry governance, informed consent, data protection and sharing; and (3) stakeholder communication and planning of benefit-risk assessments. The EMA’s Draft Guideline on registry-based studies (2020) discusses the use of registries for conducting registry-based studies. The use of registries to support the post-authorisation collection of data on effectiveness and safety of medicinal products in the routine treatment of diseases is also discussed in the EMA Scientific guidance on post-authorisation efficacy studies (2016).
Incorporating data from clinical practice into the drug development process is of growing interest to health technology assessment (HTA) bodies and payers, since reimbursement decisions can benefit from better estimation and prediction of effectiveness of treatments at the time of product launch. An example of where registries can provide clinical practice data is the building of predictive models that incorporate data from both RCTs and registries to generalise results observed in RCTs to a real-world setting. In this context, the EUnetHTA Joint Action 3 project has issued the Registry Evaluation and Quality Standards Tool (REQueST), which aims to guide the evaluation of registries for effective usage in HTA.
Spontaneous reports of adverse drug effects remain a cornerstone of pharmacovigilance and are collected from a variety of sources, including healthcare providers, national authorities, pharmaceutical companies, medical literature and more recently directly from patients. EudraVigilance is the European Union data processing network and management system for reporting and evaluating suspected adverse drug reactions (ADRs). Other major systems for collections of spontaneous reports are the FDA's Adverse Event Reporting System (FAERS) for adverse event, medication error and product quality complaints resulting in adverse events, and the WHO global database of individual case safety reports, VigiBase, that pools reports of adverse events and suspected ADRs from the members of the WHO programme for international drug monitoring. These systems deal with the electronic exchange of Individual Case Safety Reports (ICSRs), the early detection of possible safety signals and the continuous monitoring and evaluation of potential safety issues in relation to reported ADRs. Spontaneous case reports represent the first line of evidence and the majority of safety signals is still based on them as described in A description of signals during the first 18 months of the EMA pharmacovigilance risk assessment committee (Drug Saf. 2014;37(12):1059-66).
The strength of spontaneous reporting systems is that they cover all types of authorised drugs used in any setting (primary, secondary and specialised healthcare). In addition, the reporting systems are built to obtain information specifically on potential adverse drug reactions. The data collection concentrates on variables relevant to this objective and directs reporters towards careful coding and communication of the main aspects of an ADR. Finally, these systems are built to collect the information on suspected ADRs and make it rapidly available for analysis, within days. The application of knowledge discovery in databases to post-marketing drug safety: example of the WHO database (Fundam Clin Pharmacol. 2008;22(2):127-40) describes known limitations of spontaneous ADR reporting systems, which can be grouped into four main categories: i) underreporting, embedded in the concept of voluntary reporting whereby known or unknown external factors may influence the reporting rate and data quality, but also influenced by the fact that not all ADRs might be recognised by the reporter as drug induced; ii) limitation in the clinical information reported, not allowing a satisfactory case evaluation and/or the identification of possible risk factors; iii) overreporting following extensive media coverage and public awareness, such that an increased number of cases with similar symptoms are reported (misclassification of diagnosis); iv) lack of collection of control information so that the amount of drug use is not known and there is no direct information on disease incidence. For the above reasons, it is advised that the cases underlying a potential safety signal from spontaneous reports should be verified from a clinical perspective and preferably supported by pharmacological information before further communication. Anecdotes that provide definitive evidence (BMJ. 2006;333(7581):1267-9) describes examples where this is not necessary, because strong and well documented spontaneous reports can be convincing enough to support the existence of a signal.
Another challenge in spontaneous report databases is the quality of the information provided and adherence to reporting rules; for this reason, comprehensive and multi-faceted quality activities are often an integral part of these systems (see Detailed guide regarding the EudraVigilance data management activities by the European Medicines Agency Rev 1 for an example). One aspect of the data quality activities concerns report duplication. Duplicates are separate and unlinked records that refer to one and the same case of a suspected ADR and may mislead clinical assessment or distort statistical screening. They are generally detected by individual case review of all reports or by computerised duplicate detection algorithms. In Performance of probabilistic method to detect duplicate individual case safety reports (Drug Saf. 2014;37(4):249-58), a probabilistic method applied to VigiBase highlighted duplicates that had been missed by a rule-based method and also improved the accuracy of manual review. The study, however, does not demonstrate whether de-duplication methods improve signal detection. The EMA and FDA have also implemented probabilistic duplicate detection in their databases. A novel feature is the use of narrative text analysed with natural language processing (NLP) methods, as demonstrated in Using Probabilistic Record Linkage of Structured and Unstructured Data to Identify Duplicate Cases in Spontaneous Adverse Event Reporting Systems (Drug Saf. 2017;40(7):571–82).
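To make the idea of probabilistic duplicate detection concrete, the sketch below scores pairs of reports with a generic record-linkage approach (in the style of Fellegi and Sunter): each field contributes a log-likelihood ratio depending on whether the two reports agree or disagree, and pairs whose total score exceeds a threshold are flagged for manual review. It is not the algorithm used for VigiBase, EudraVigilance or FAERS; the field names, the agreement probabilities and the threshold are all hypothetical values chosen for illustration.

```python
import math
from itertools import combinations

# Hypothetical per-field probabilities, for illustration only:
#   m = P(fields agree | the two reports describe the same case)
#   u = P(fields agree | the two reports describe different cases)
FIELD_PROBABILITIES = {
    "drug": (0.95, 0.05),
    "reaction": (0.90, 0.02),
    "sex": (0.98, 0.50),
    "age": (0.90, 0.03),
    "onset_date": (0.85, 0.01),
    "country": (0.99, 0.20),
}


def match_score(report_a, report_b):
    """Sum of log2 likelihood ratios over all fields present in both reports."""
    score = 0.0
    for field, (m, u) in FIELD_PROBABILITIES.items():
        value_a, value_b = report_a.get(field), report_b.get(field)
        if value_a is None or value_b is None:
            continue  # a missing value is treated as uninformative
        if value_a == value_b:
            score += math.log2(m / u)              # agreement: evidence for a duplicate
        else:
            score += math.log2((1 - m) / (1 - u))  # disagreement: evidence against
    return score


def candidate_duplicates(reports, threshold=10.0):
    """Return (id_a, id_b, score) for report pairs scoring at or above the threshold.

    reports: dict mapping a report id to a dict of field values.
    The threshold is an arbitrary illustrative value, not a validated cut-off.
    """
    pairs = []
    for (id_a, a), (id_b, b) in combinations(reports.items(), 2):
        score = match_score(a, b)
        if score >= threshold:
            pairs.append((id_a, id_b, score))
    return sorted(pairs, key=lambda pair: -pair[2])
```

Agreement on a rare value (e.g. an exact onset date) carries more weight than agreement on a common one (e.g. sex), which is what allows such methods to find duplicates that exact-match rules miss. Comparing all pairs is quadratic in the number of reports, so a real system would first restrict comparisons to plausible candidate pairs.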
Patient reporting is an important source of suspected adverse reactions that can be reported directly through various methods such as online patient reporting forms hosted by national competent authorities or using a phone. Factors affecting patient reporting of adverse drug reactions: a systematic review (Br J Clin Pharmacol. 2017;83(4):875-83) describes the practical difficulties with patient reporting and highlights the patients’ motivation to make their ADRs known to prevent similar suffering in other patients. The value of patient reporting to the pharmacovigilance system: a systematic review (Br J Clin Pharmacol. 2017;83(2):227-46) concludes that patient reporting adds new information and perspective about ADRs in a way otherwise unavailable, and this can contribute to better decision-making processes in regulatory activities. Patient Reporting in the EU: Analysis of EudraVigilance Data (Drug Saf. 2017;40(7):629-45) also concludes that patient reporting complements reporting by health care professionals and that patients were motivated to report especially those ADRs that affected their quality of life.
The study Vaccine side-effects and SARS-CoV-2 infection after vaccination in users of the COVID Symptom Study app in the UK: a prospective observational study (Lancet Infect Dis. 2021;S1473) used an app to examine the proportion and probability of self-reported systemic and local side-effects within 8 days of vaccination in individuals who received one or two doses of the BNT162b2 vaccine or one dose of the ChAdOx1 nCoV-19 vaccine. These data do not represent spontaneous reports from a regulatory perspective. The authors note that such self-reported data can introduce information bias, that some participants might be more likely to report symptoms than others, and that users may drop out of reporting in the app. However, use of an app made it possible to recruit a large sample for the study.
The information collected in spontaneous reports is a reflection of a clinical event that has been attributed to the use of one or more suspected drugs. Although the majority of information provided in the ICSRs is coded, the description of the clinical event, as well as the interpretation of the reporter, contains valuable information for signal detection purposes. Examples are the description of timing and course of the reactions, of the presence or absence of additional risk factors and of the medical history of the patient involved. Since only part of this information is coded and can be used in statistical analysis, it remains important to review the underlying cases at all times for signal detection purposes. Knowledge of the local healthcare system, its corresponding guidelines and the possibilities to follow up for more detailed information are considered important during this review.
The increase in systematic collection of ICSRs in large electronic databases has allowed the application of data mining and statistical techniques for the detection of safety signals (see Chapter 9). Validation of statistical signal detection procedures in EudraVigilance post-authorisation data: a retrospective evaluation of the potential for earlier signalling (Drug Saf. 2010;33(6):475-87) has shown that the statistical methods applied in EudraVigilance can provide significantly earlier warning in a large proportion of drug safety problems. Nonetheless, this approach should supplement, rather than replace, other pharmacovigilance methods.
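As an illustration of what a statistical screening technique on ICSR data can look like, the sketch below computes a proportional reporting ratio (PRR), one widely used disproportionality measure, with an approximate 95% confidence interval from a 2x2 table of report counts. The text above does not specify which statistics or thresholds are applied in any given database, so the screening rule and the example counts here are assumptions for illustration only.

```python
import math


def prr_with_ci(a, b, c, d, z=1.96):
    """Proportional reporting ratio and approximate 95% CI from a 2x2 table.

    a: reports with the drug of interest and the event of interest
    b: reports with the drug of interest and any other event
    c: reports with all other drugs and the event of interest
    d: reports with all other drugs and any other event
    """
    prr = (a / (a + b)) / (c / (c + d))
    # Standard error of log(PRR) for a ratio of two proportions
    se_log = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(prr) - z * se_log)
    upper = math.exp(math.log(prr) + z * se_log)
    return prr, lower, upper


def flags_signal(a, b, c, d, min_cases=3):
    """Illustrative screening rule; the thresholds are assumptions."""
    _, lower, _ = prr_with_ci(a, b, c, d)
    return a >= min_cases and lower > 1.0


# Made-up counts: 20 of 1,000 reports for the drug mention the event,
# against 100 of 99,000 reports for all other drugs.
prr, lower, upper = prr_with_ci(a=20, b=980, c=100, d=98900)
```

A flagged drug-event pair is only a statistical prompt: as stated above, the underlying cases still need clinical review.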
The report Characterization of databases (DB) used for signal detection (SD) shows the heterogeneity of spontaneous databases and the lack of comparability of signal detection methods employed.
Chapters IV and V of the Report of the CIOMS Working Group VIII ‘Practical aspects of Signal detection in Pharmacovigilance’ present sources and limitations of spontaneously-reported drug-safety information and databases that support signal detection. Appendix 3 of the report provides a list of international and national spontaneous reporting system databases.
Finally, in EudraVigilance Medicines Safety Database: Publicly Accessible Data for Research and Public Health Protection (Drug Saf. 2018;41(7):665-75), the authors describe how these databases, focusing on EudraVigilance, have been made more easily accessible for external stakeholders. This has provided better access to information on suspected adverse reactions for healthcare professionals and patients, and opportunities for health research for academic institutions.
Technological advances have dramatically increased the range of data sources that can be used to complement traditional ones and may provide compelling insights into or relevant to effectiveness and safety of health interventions such as medicines and their risk minimisation measures, benefit-risk communications and related stakeholder engagement. Such data include those from digital media that exist in a computer-readable format and can be extracted from websites, web pages, blogs, vlogs, social networking sites, internet forums, chat rooms and health portals. A recent addition to the digital media data is biomedical data collected through wearable technology (e.g., heart rate, physical activity and sleep pattern, dietary patterns). These data are unsolicited and generated in real time.
A subset of digital media data are social media data. The European Commission’s Digital Single Market Glossary defines social media as “a group of Internet-based applications that build on the ideological and technological foundations of Web 2.0 and that allow the creation and exchange of user-generated content. It employs mobile and web-based technologies to create highly interactive platforms via which individuals and communities share, co-create, discuss, and modify user-generated content.”
Social media content analyses have been used to provide insights into patients’ perceptions of the effectiveness and safety of medicines and for the collection of patient reported outcomes, as discussed in Web-based patient-reported outcomes in drug safety and risk management: challenges and opportunities? (Drug Saf. 2012;35(6):437-46).
The IMI WEB-RADR European collaborative project explored different aspects related to the use of social media data for pharmacovigilance and summarised its recommendations in Recommendations for the Use of Social Media in Pharmacovigilance: Lessons From IMI WEB-RADR (Drug Saf 2019;42(12):1393-407). The French Vigi4Med project, which evaluated the use of social media, mainly web forums, for pharmacovigilance activities, published a set of recommendations in Use of Social Media for Pharmacovigilance Activities: Key Findings and Recommendations from the Vigi4Med Project (Drug Saf. 2020;43(9):835-51).
A further possible use of social media data would be as a source of information for signal detection or assessment. Studies including Using Social Media Data in Routine Pharmacovigilance: A Pilot Study to Identify Safety Signals and Patient Perspectives (Pharm Med. 2017;31(3): 167-74) and Assessment of the Utility of Social Media for Broad-Ranging Statistical Signal Detection in Pharmacovigilance: Results from the WEB-RADR Project (Drug Saf. 2018;41(12):1355–69) evaluated whether analysis of social media data (specifically Facebook and Twitter posts) could identify pharmacovigilance signals early, but in their respective settings, found that this was not the case.
The study Using Social Media Data in Routine Pharmacovigilance: A Pilot Study to Identify Safety Signals and Patient Perspectives (Pharm Med. 2017;31(3): 167-74) also tried to determine the quantity of posts with resemblance to adverse events and the types and characteristics of products that would benefit from social media content analysis. It concludes that, although analysis of data from social media did not identify new safety signals, it can provide unique insight into the patient perspective.
From a regulatory perspective, social media is a source of potential reports of suspected adverse reactions and marketing authorisation holders are legally obliged to screen websites under their management and assess whether reports of adverse reactions qualify for spontaneous reporting (see Good Pharmacovigilance Practice Module VI, section VI.B.1.1.4.). Principles for continuous monitoring of the safety of medicines without overburdening established pharmacovigilance systems and a regulatory framework on the use of social media in pharmacovigilance have been proposed in Establishing a Framework for the Use of Social Media in Pharmacovigilance in Europe (Drug Saf. 2019;42(8):921-30).
Sentiment analyses of social media content may offer regulators future opportunities to gain insight into public perceptions about the safety of medicines and the trustworthiness of regulatory bodies. This can inform and evaluate specific safety communication strategies aiming at effective and safe use of medicines. For example, a recent study provided insight into public sentiments about vaccination of pregnant women by stance, discourse and topic analysis of social media posts in “Vaccines for pregnant women?! Absurd” – Mapping maternal vaccination discourse and stance on social media over six months (Vaccine 2020;38(42):6627-38).
While offering the promise of new research models and approaches, the rapidly evolving social media environment presents many challenges including the need for strong and systematic processes for data selection and validation, and study implementation. Articles which detail associated challenges are: Evaluating Social Media Networks in Medicines Safety Surveillance: Two Case Studies (Drug Saf. 2015;38(10):921-30) and Social media and pharmacovigilance: A review of the opportunities and challenges (Br J Clin Pharmacol. 2015;80(4):910-20).
There is currently no defined strategy or framework in place to meet the standards around data selection and validity and methods for data analysis, and the regulatory acceptance of social media data may therefore be lower than for traditional sources. Nevertheless, more tools and methods for analysing unstructured data are becoming available, especially for pharmacoepidemiology and pharmacovigilance research, as in Deep learning for pharmacovigilance: recurrent neural network architectures for labeling adverse drug reactions in Twitter posts (J Am Med Inform Assoc. 2017 Feb 22), Social Media Listening for Routine Post-Marketing Safety Surveillance (Drug Saf. 2016;39(5):443-54) and Social Media Research (Chapter 11 in Communicating about Risks and Safe Use of Medicines, Adis Singapore, 2020, pp 307-332). However, the recognition and disambiguation of references to medicines and adverse events in free text remains a challenge and performance evaluations need to be critically assessed, as discussed in Prospective Evaluation of Adverse Event Recognition Systems in Twitter: Results from the Web-RADR Project (Drug Saf. 2020;43(8):797-808).
The EU General Data Protection Regulation (GDPR) introduces EU-wide legislation on personal data and security. It specifies that the impact on data protection should be assessed at the study design stage and reviewed periodically. Other technical documents may also be applicable, such as the Smartphone Secure Development Guidelines (2017) published by the European Network and Information Security Agency (ENISA), which advises on design and technical solutions. The principles of these security measures are found in the European Data Protection Supervisor (EDPS) opinion on mobile health (Opinion 1/2015 Mobile Health-Reconciling technological innovation with data protection).
Pooling data across different databases affords insight into the generalisability of the results and may improve precision. A growing number of studies use data from networks of databases, often from different countries. Some of these networks are based on long-term contracts with selected partners and are very well structured (such as Sentinel, the Vaccine Safety Datalink (VSD), or the Canadian Network for Observational Drug Effect Studies (CNODES)), while others are looser collaborations based on an open community principle (e.g. Observational Health Data Sciences and Informatics (OHDSI)). In Europe, collaborations for multi-database studies have been strongly encouraged by the drug safety research funded by the European Commission (EC) and public-private partnerships such as the Innovative Medicines Initiative (IMI). This funding resulted in the conduct of groundwork necessary to overcome the hurdles of data sharing across countries for specific projects (e.g. PROTECT, ADVANCE, EMIF, EHDEN, ConcePTION) or for specific post-authorisation studies.
The 2009 H1N1 influenza pandemic (see Safety monitoring of Influenza A/H1N1 pandemic vaccines in EudraVigilance, Vaccine 2011;29(26):4378-87) and more recently, the 2020 COVID-19 pandemic showed the importance of a formal established infrastructure that can rapidly and effectively monitor the safety of treatments and vaccines. In this context, EMA has established contracts with academic and private partners to support readiness of research networks to perform observational research. Three dedicated projects started in 2020: ACCESS (vACcine Covid-19 monitoring readinESS), CONSIGN (‘COVID-19 infectiOn aNd medicineS In preGNancy’) and E-CORE (Evidence for COVID-19 Observational Research Europe). Other initiatives have emerged to address specific COVID-19 related research questions, such as the CVD-COVID-UK consortium (Linked electronic health records for research on a nationwide cohort of more than 54 million people in England: data resource, BMJ. 2021;373:n826), providing a secure access to linked health data from primary and secondary care, registered deaths, COVID-19 laboratory and vaccination data, and cardiovascular specialist audits and covering almost the entire population of England (>54 million people); similar linked data have been made available in trusted research environments for Scotland and Wales (>8 million people).
In this chapter, the term networking is used to reflect collaboration between investigators for sharing expertise and resources. The ENCePP Database of Research Resources may facilitate such networking by providing an inventory of research centres and data sources that can collaborate on specific pharmacoepidemiology and pharmacovigilance studies in Europe. It allows the identification of centres and datasets by country, type of research and other relevant fields.
The use of research networks in drug safety research is well established and a significant body of practical experience exists. By contrast, no consensus exists on the use of such networks, or indeed of single sources of observational data, in estimating the effectiveness of medicinal products. In particular, the use in support of licensing applications will require evaluations of the reliability of results and the verifiability of research processes that are currently at an early stage. Specific advice on effectiveness can only be given once this work has been done and incorporated into regulatory guidelines. Hence this discussion currently relates only to product safety (see Assessing strength of evidence for regulatory decision making in licensing: What proof do we need for observational studies of effectiveness?, Pharmacoepidemiol. Drug Saf. 2020;29(10):1336-40).
From a methodological point of view, research networks that adopt a multi-database design have many advantages over single database studies:
The articles Approaches for combining primary care electronic health record data from multiple sources: a systematic review of observational studies (BMJ Open 2020;10(10): e037405) and Different strategies to execute multi-database studies for medicines surveillance in real world setting: a reflection on the European model (Clin Pharmacol Ther. 2020;108(2):228-35) describe key characteristics of studies using multiple data sources and different models applied for combining data or results from multiple databases. A common characteristic of all models is the fact that data partners maintain physical and operational control over electronic data in their existing environment and therefore the data extraction is always done locally. Differences however exist in the following areas: use of a common protocol; use of a common data model (CDM); and where and how the data analysis is conducted.
Use of a CDM implies that local formats are translated into a predefined, common data structure, which allows launching a similar data extraction and analysis script across several databases. Sometimes the CDM imposes a common terminology as well, as in the case of the OMOP CDM. The CDM can be systematically applied on the entire database (generalised CDM) or on the subset of data needed for a specific study (study specific CDM). The CDM is assumed to faithfully represent the source data both in terms of completeness and accuracy. Validation studies such as Can We Rely on Results From IQVIA Medical Research Data UK Converted to the Observational Medical Outcome Partnership Common Data Model? A Validation Study Based on Prescribing Codeine in Children (Clin Pharmacol Ther. 2020;107(4):915-25) are recommended and any deviations that might be found should be carefully monitored and recorded.
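The translation of local formats into a common structure can be sketched as follows. This is a minimal illustration in Python, not the schema of OMOP or of any other real CDM: the `DrugExposure` structure, the site field names and the local product codes are hypothetical. Two sites with different local formats are converted to the same structure, unmapped local codes are recorded rather than silently dropped, and one common function is then run unchanged on both converted datasets.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DrugExposure:
    """Hypothetical common structure; not the schema of any real CDM."""
    person_id: str
    drug_code: str   # code in the common terminology (ATC in this sketch)
    start_date: date


# Hypothetical local-to-common terminology map for site B.
SITE_B_TO_ATC = {"COD30": "R05DA04", "PARA500": "N02BE01"}


def transform_site_a(rows):
    # Site A (hypothetical) already records ATC codes and ISO dates.
    return [
        DrugExposure(str(r["patid"]), r["atc"], date.fromisoformat(r["rx_date"]))
        for r in rows
    ]


def transform_site_b(rows):
    # Site B (hypothetical) uses local product codes and dd/mm/yyyy dates.
    exposures, unmapped = [], []
    for r in rows:
        if r["product"] not in SITE_B_TO_ATC:
            unmapped.append(r["product"])  # keep a record of mapping gaps
            continue
        day, month, year = (int(part) for part in r["dispensed"].split("/"))
        exposures.append(
            DrugExposure(r["id"], SITE_B_TO_ATC[r["product"]], date(year, month, day))
        )
    return exposures, unmapped


def count_users(exposures, drug_code):
    """Common analysis step: number of distinct persons exposed to drug_code."""
    return len({e.person_id for e in exposures if e.drug_code == drug_code})


site_a_rows = [
    {"patid": 1, "atc": "R05DA04", "rx_date": "2019-05-02"},
    {"patid": 1, "atc": "R05DA04", "rx_date": "2019-06-02"},
    {"patid": 2, "atc": "N02BE01", "rx_date": "2019-05-10"},
]
site_b_rows = [
    {"id": "b-9", "product": "COD30", "dispensed": "03/07/2019"},
    {"id": "b-10", "product": "XYZ", "dispensed": "04/07/2019"},
]

site_a_cdm = transform_site_a(site_a_rows)
site_b_cdm, site_b_unmapped = transform_site_b(site_b_rows)
```

Returning the list of unmapped codes alongside the converted data is one simple way to support the monitoring and recording of deviations mentioned above.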
In the European Union, study specific CDMs have generated results in several projects and initial steps have been taken to create generalised CDMs. This work accelerated during the last year, partly owing to the role that observational research had in informing the response to the pandemic. Examples of the application of generalised CDMs are the studies conducted in the OHDSI community, such as Association of angiotensin converting enzyme (ACE) inhibitors and angiotensin II receptor blockers (ARB) on coronavirus disease (COVID-19) incidence and complications.
Five models of studies are presented, classified according to specific choices in the steps needed to execute a study: protocol development and agreement (whether separate or common); where the data are extracted and analysed (locally or centrally); how the data are extracted and analysed (using individual or common programs); and use of a CDM and which type (study specific or general). The key characteristics of the steps needed to execute each study model are presented in the following Figure and explained in this chapter:
Meta-analysis: separate protocols, local and individual data extraction and analysis, no CDM
The traditional model to combine data from multiple data sources happens when data extraction and analysis are performed independently at each centre based on separate protocols. This is usually followed by meta-analysis of the different estimates obtained (see Chapter 8 and Annex 1).
This type of model, when viewed as a prospective decision to combine results from multiple data sources on the same topic, may be considered as a baseline situation which a research network will try to improve. Moreover, since meta-analyses facilitate the evaluation of heterogeneity of results across different independent studies, they should be used retrospectively regardless of the study model used. If all the data sources can be accessed, explaining such variation should also be attempted.
This is coherent with the recommendations from Multi-centre, multi-database studies with common protocols: lessons learnt from the IMI PROTECT project (Pharmacoepidemiol Drug Saf. 2016;25(S1):156-165), stating that investigating heterogeneity may provide useful information on the issue under investigation. This approach eventually increases consistency in findings from observational drug effect studies or reveals causes of differential drug effects.
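The combination of independent estimates and the evaluation of their heterogeneity can be sketched with a fixed-effect inverse-variance meta-analysis, reporting Cochran's Q and the I² statistic. This is a minimal Python illustration of one standard approach, not a method prescribed by the text above; the input estimates are made up, and a random-effects model may be more appropriate when heterogeneity is substantial.

```python
import math


def pool_fixed_effect(estimates):
    """Pool per-database estimates by inverse-variance weighting.

    estimates: list of (log_effect, standard_error) tuples, one per database,
    for example log hazard ratios with their standard errors.
    Returns (pooled_log_effect, pooled_se, cochran_q, i_squared_percent).
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * y for w, (y, _) in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = sum(w * (y - pooled) ** 2 for w, (y, _) in zip(weights, estimates))
    df = len(estimates) - 1
    # I^2: share of total variation attributed to between-study heterogeneity
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, pooled_se, q, i2


# Made-up log effect estimates from two databases that disagree strongly.
pooled, pooled_se, q, i2 = pool_fixed_effect([(0.0, 0.1), (1.0, 0.1)])
```

A high I² such as the one produced by these made-up inputs is the kind of result that should prompt an attempt to explain the variation between data sources.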
Local analysis: common protocol, local and individual data extraction and analysis, no CDM
In this model, data are extracted and analysed locally, with site-specific programs that are developed by each centre, on the basis of a common protocol agreed by study partners that defines and standardises exposures, outcomes and covariates, analytical programmes and reporting formats. The results of each analysis, either at a patient level or in an aggregated format depending on the governance of the network, are shared and can be pooled together through meta-analysis.
This approach allows assessment of database or population characteristics and their impact on estimates, while reducing the variability of results caused by differences in design. An example of a research network that uses the common protocol approach is PROTECT (as described in Improving Consistency and Understanding of Discrepancies of Findings from Pharmacoepidemiological Studies: the IMI PROTECT Project, Pharmacoepidemiol Drug Saf. 2016;25(S1):1-165).
This approach requires very detailed common protocols and data specifications that reduce variability in interpretations by researchers.
Sharing of data: common protocol, local and individual data extraction, central analysis
In this approach, a common protocol is agreed by the study partners. Data intended to be used for the study are locally extracted with site-specific programs, transferred without analysis and conversion to a CDM, and pooled and analysed at the central partner receiving them. Data received at the central partner can be reformatted to a common structure to facilitate the analysis.
This approach is suited to situations where databases are very similar in structure and content, as is the case for some Nordic registries or the Italian regional databases. Examples of such models are Risks and benefits of psychotropic medication in pregnancy: cohort studies based on UK electronic primary care health records (Health Technol Assess. 2016;20(23):1–176) and All‐cause mortality and antipsychotic use among elderly persons with high baseline cardiovascular and cerebrovascular risk: a multi‐center retrospective cohort study in Italy (Expert Opin. Drug Metab. Toxicol. 2019;15(2):179-88).
The central analysis allows for assessment of pooled data adjusting for covariates on an individual patient level and removes an additional source of variability linked to the statistical programming and analysis. However, this model is becoming more difficult to implement, especially in Europe, due to the stronger privacy requirements when sharing patient level data.
Study specific CDM: common protocol, local and individual data extraction, local and common analysis, study specific CDM
In this approach, a common protocol is agreed by the study partners. Data intended to be used for the study are locally extracted and transformed into an agreed CDM; data in the CDM are then processed locally in all the sites with one common program. The output of the common program is transferred to a specific partner. The output to be shared may be an analytical dataset or study estimates, depending on the governance of the network.
Examples of research networks that used this approach by employing a study-specific CDM with transmission of anonymised patient-level data (allowing a detailed characterisation of each database) are EU-ADR (as explained in Combining multiple healthcare databases for postmarketing drug and vaccine safety surveillance: why and how?, J Intern Med 2014;275(6):551-61), SOS, ARITMO, SAFEGUARD, GRIP, EMIF, EUROmediCAT, ADVANCE, VAC4EU and ConcePTION. In all these projects, a CDM was utilised and R, SAS, STATA or Jerboa scripts were used to create and share common analytics. Diagnosis codes for case finding can be mapped across terminologies by using CodeMapper, developed in the ADVANCE project, as explained in CodeMapper: semiautomatic coding of case definitions (Pharmacoepidemiol Drug Saf. 2017;26(8):998-1005).
An example of a study performed using this model is Background rates of Adverse Events of Special Interest for monitoring COVID-19 vaccines, an ACCESS study.
General CDM: common protocol, local and common data extraction and analysis, general CDM
In this approach, the local databases are transformed into a CDM prior to and independent of any study protocol. When a study is required, a common protocol is developed and an analysis program is created centrally that runs locally on each database to extract and analyse the data. The output of the common programs that is shared may be an analytical dataset or study estimates, depending on the governance of the network.
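The flow described above, in which one centrally written program runs locally at each site and only its output is shared, can be sketched as follows. This is a hypothetical Python illustration: the row keys, the outcome code and the choice to share only aggregated counts are assumptions, and summing counts across sites is the simplest possible way of combining outputs rather than a recommended analysis.

```python
def common_program(cdm_rows, outcome_code):
    """Centrally written analysis that runs unchanged at every site.

    Each row is a dict with the hypothetical keys 'person_id',
    'followup_years' and 'outcomes' (a set of outcome codes observed
    during follow-up). Only aggregated counts are returned, so no
    patient-level data leaves the site in this sketch.
    """
    events = sum(1 for row in cdm_rows if outcome_code in row["outcomes"])
    person_years = sum(row["followup_years"] for row in cdm_rows)
    return {"events": events, "person_years": person_years}


def combine(site_outputs):
    """Coordinating centre: add up the aggregated outputs from all sites."""
    events = sum(out["events"] for out in site_outputs)
    person_years = sum(out["person_years"] for out in site_outputs)
    return {
        "events": events,
        "person_years": person_years,
        "rate_per_1000_py": 1000 * events / person_years,
    }


# Made-up CDM-formatted data held locally at two sites.
site_1 = [
    {"person_id": "1", "followup_years": 2.0, "outcomes": {"X"}},
    {"person_id": "2", "followup_years": 3.0, "outcomes": set()},
]
site_2 = [
    {"person_id": "9", "followup_years": 5.0, "outcomes": {"X", "Y"}},
]

outputs = [common_program(site, "X") for site in (site_1, site_2)]
result = combine(outputs)
```

Whether sites return aggregated counts, as here, or an analytical dataset depends on the governance of the network.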
Three examples of research networks which use a generalised CDM are the Sentinel Initiative (as described in The U.S. Food and Drug Administration's Mini-Sentinel Program, Pharmacoepidemiol Drug Saf 2012;21(S1):1–303), OHDSI – Observational Health Data Sciences and Informatics and the Canadian Network for Observational Drug Effect Studies (CNODES). The latter previously relied on the second model proposed in this chapter, but it has been converted to a CDM, with six provinces having already completed the transformation of their data, as explained in Building a framework for the evaluation of knowledge translation for the Canadian Network for Observational Drug Effect Studies (Pharmacoepidemiol. Drug Saf. 2020;29(S1):8-25).
The main advantage of a general CDM is that it can be used for virtually any study involving that database. OHDSI is based on the Observational Medical Outcomes Partnership (OMOP) CDM which is now used by many organisations and has been tested for its suitability for safety studies (see for example Validation of a common data model for active safety surveillance research, J Am Med Inform Assoc. 2012;19(1):54–60, and Can We Rely on Results From IQVIA Medical Research Data UK Converted to the Observational Medical Outcome Partnership Common Data Model?: A Validation Study Based on Prescribing Codeine in Children (Clin Pharmacol Ther. 2020;107(4):915-25)). Conversion into the OMOP CDM requires formal mapping of database items to standardised concepts. This is resource intensive and will need to be updated every time the databases are refreshed. Examples of studies performed with the OMOP CDM in Europe are the Large-scale evidence generation and evaluation across a network of databases (LEGEND): assessing validity using hypertension as a case study (J Am Med Inform Assoc. 2020;27(8):1268-77) and Safety of hydroxychloroquine, alone and in combination with azithromycin, in light of rapid wide-spread use for COVID-19: a multinational, network cohort and self-controlled case series study (Lancet Rheumatol. 2020;2: e698–711).
In A Comparative Assessment of Observational Medical Outcomes Partnership and Mini-Sentinel Common Data Models and Analytics: Implications for Active Drug Safety Surveillance (Drug Saf. 2015;38(8):749-65), it is suggested that slight conceptual differences between the Sentinel and the OMOP models do not significantly impact on identifying known safety associations. Differences in risk estimations can be primarily attributed to the choices and implementation of the analytic approach.
The different models described above present several challenges:
Related to the databases content:
Related to the organisation of the network:
Each model has strengths and weaknesses in facing the above challenges, as illustrated in Data Extraction and Management in Networks of Observational Health Care Databases for Scientific Research: A Comparison of EU-ADR, OMOP, Mini-Sentinel and MATRICE Strategies (eGEMs 2016;4(1):2). In particular, a central analysis or a CDM provides protection from problems related to variation in how protocols are implemented, as individual analysts might implement protocols differently (as described in Quantifying how small variations in design elements affect risk in an incident cohort study in claims; Pharmacoepidemiol Drug Saf. 2020;29(1):84-93). Experience has shown that many of these difficulties can be overcome by full involvement and good communication between partners, and a clear governance model defining roles and responsibilities and addressing issues of intellectual property and authorship. Several of the networks have made their code, products, data models and analytics software publicly available, such as OHDSI, Sentinel and ADVANCE/VAC4EU.
Timeliness or speed for running studies is important in order to meet short regulatory timelines in circumstances where prompt decisions are needed. Solutions therefore need to be further developed and introduced to be able to run multi-database studies with shorter timelines. Independently from the model used, a critical factor that should be considered in speeding up studies relates to having tasks completed that are independent of any particular study. This includes all activities associated with governance, such as having prespecified agreements on data access, processes for protocol development and study management, and identification and characterisation of a large set of databases. This also includes some activities related to the analysis, such as creating common definitions for frequently used variables, and creating common analytical systems for the most typical and routine analyses (this latter point is made easier with the use of CDMs with standardised analytics and tools that can be re-used to support faster analysis).