

ENCePP Guide on Methodological Standards in Pharmacoepidemiology


5. Study design and methods

5.1. Definition and validation of drug exposure, outcomes and covariates


Historically, pharmacoepidemiology studies relied on patient-supplied information or searches through paper-based health records. This reliance has been reduced by the rapid expansion of access to electronic healthcare records and the existence of large administrative databases. Nevertheless, these data sources have led to variation in the way exposures and outcomes are defined and measured, and each definition requires validation. Chapter 41 of Pharmacoepidemiology (B. Strom, S.E. Kimmel, S. Hennessy. 5th Edition, Wiley, 2012) includes a literature review of the studies that have evaluated the validity of drug, diagnosis and hospitalisation data and of the factors that influence the accuracy of these data. The book also presents primary data sources available for pharmacoepidemiology studies, including questionnaires and administrative databases. Further information on databases available for pharmacoepidemiology studies can be found in resources such as the ENCePP resource database and the Inventory of Drug Consumption Databases in Europe.


5.1.1. Assessment of exposure


In pharmacoepidemiology studies, exposure data originate mainly from four sources: data on prescribing (e.g. CPRD primary care data), data on dispensing (e.g. PHARMO outpatient pharmacy database), data on payment for medication (namely claims data, e.g. IMS LifeLink PharMetrics Plus) and data collected in surveys. The exposure captured by these sources is subject to attrition: drugs that are prescribed are not necessarily dispensed, and drugs that are dispensed are not necessarily ingested. In Primary non-adherence in general practice: a Danish register study (Eur J Clin Pharmacol 2014;70(6):757–63), 9.3% of all prescriptions for new therapies were never redeemed at the pharmacy, with some differences between therapeutic and patient groups. The attrition from dispensing to ingestion is even more difficult to measure, as it involves uncertainty about whether dispensed drugs are actually taken by patients and about patients' ability to report their intake accurately. Paediatric adherence, in particular, additionally depends on parents.


Exposure definitions can include simple dichotomous variables (e.g. ever exposed vs. never exposed) or they can be more detailed, including estimates of exposure windows (e.g. current vs. past exposure) or levels of exposure (e.g. current dosage, cumulative dosage over time). When evaluating the feasibility of constructing such variables, consideration should be given to the level of detail the data sources provide on the timing of exposure, including the quantity prescribed, dispensed or ingested and whether dosage instructions are captured. This will vary across data sources and exposures (e.g. estimating contraceptive pill ingestion is typically easier than estimating use of rescue medication for asthma attacks). Discussions with clinicians regarding sensible assumptions will inform variable definition.
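As an illustration of how exposure windows might be derived from prescription data, the sketch below estimates the days covered by each prescription from the quantity supplied and the prescribed daily dose, then classifies an index date as current, recent or past exposure. The field names, the 30-day "recent" window and the assumption that each prescription starts on its issue date and is taken as instructed are hypothetical choices for illustration only; in practice they should be agreed with clinicians and adapted to the data source.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Prescription:
    issue_date: date
    quantity: float    # units supplied, e.g. tablets
    daily_dose: float  # units per day taken from the dosage instructions


def days_supply(rx: Prescription) -> int:
    """Days covered by a prescription, assuming the dosage instructions are followed."""
    return max(1, round(rx.quantity / rx.daily_dose))


def exposure_status(rxs: list[Prescription], index: date, recent_days: int = 30) -> str:
    """Classify exposure on the index date as 'current', 'recent', 'past' or 'never'.

    Assumes each prescription covers [issue_date, issue_date + days_supply).
    """
    last_end = None
    for rx in rxs:
        if rx.issue_date > index:
            continue  # prescriptions issued after the index date are ignored
        end = rx.issue_date + timedelta(days=days_supply(rx))
        if index < end:
            return "current"
        last_end = end if last_end is None else max(last_end, end)
    if last_end is None:
        return "never"
    return "recent" if (index - last_end).days < recent_days else "past"


rxs = [Prescription(date(2020, 1, 1), 60, 2), Prescription(date(2020, 3, 1), 30, 1)]
for day in (date(2020, 1, 15), date(2020, 2, 15), date(2020, 6, 1)):
    print(day, exposure_status(rxs, day))
```

With these hypothetical prescriptions the three dates are classified as current, recent and past respectively. Changing `recent_days` or the days' supply rule is a simple way to test how sensitive results are to these assumptions.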


The Methodology chapter of the book Drug Utilization Research. Methods and Applications (M. Elseviers, B. Wettermark, A.B. Almarsdottir et al. Ed. Wiley Blackwell, 2016) discusses different methods for data collection on drug utilisation.


5.1.2. Assessment of outcomes

A case definition compatible with the observational database should be developed for each outcome of a study at the design stage. This description should include how events will be identified and classified as cases, whether cases will include prevalent as well as incident cases, exacerbations and second episodes (as differentiated from repeat codes) and all other inclusion or exclusion criteria. The reason for the data collection and the nature of the healthcare system that generated the data should also be described as they can impact on the quality of the available information and the presence of potential biases. Published case definitions of outcomes, such as those developed by the Brighton Collaboration in the context of vaccinations, are not necessarily compatible with the information available in a given observational data set. For example, information on the duration of symptoms may not be available, or additional codes may have been added to the data set following publication of the outcome definition.


Search criteria to identify outcomes should be defined and the list of codes should be provided. Generating code lists requires expertise in both the coding system and the disease area. Researchers should also consult clinicians who are familiar with the coding practice within the field studied. Suggested methodologies are available for some coding systems (see Creating medical and drug code lists to identify cases in primary care databases. Pharmacoepidemiol Drug Saf 2009;18(8):704-7). Coding systems used in some commonly used databases are updated regularly, so sustainability issues in prospective studies should be addressed at the protocol stage. Moreover, great care should be taken when re-using a code list from another study, as code lists depend on the study objective and methods. Public repositories of codes are available, and researchers are also encouraged to make their own code lists available.


In some circumstances, chart review or text entries in electronic format linked to coded entries can be useful for outcome identification. Such identification may involve an algorithm using multiple code lists (for example disease plus therapy codes) or an endpoint committee to adjudicate available information against a case definition. In some cases, initial plausibility checks or subsequent medical chart review will be necessary. When databases have prescription data only, drug exposure may be used as a proxy for an outcome, or linkage to other databases may be required.
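A case-identification algorithm combining two code lists might be sketched as follows: a patient is classified as a case if a diagnosis code is followed by a therapy code within a given number of days. The code lists, the prefix-matching rule and the 90-day window are hypothetical and only show the logic; real code lists should be developed with the clinical and coding expertise discussed above.

```python
from __future__ import annotations

from datetime import date, timedelta

# Hypothetical code lists; each entry is matched as a code prefix.
DIAGNOSIS_CODES = ("DX10", "DX11")
THERAPY_CODES = ("RX200",)


def first_case_date(records: list[tuple[date, str]], window_days: int = 90) -> date | None:
    """Date of the first diagnosis confirmed by a therapy code, or None.

    A diagnosis counts as a case if a therapy code is recorded on the same day
    or within `window_days` days after it.
    """
    dx_dates = sorted(d for d, code in records if code.startswith(DIAGNOSIS_CODES))
    rx_dates = [d for d, code in records if code.startswith(THERAPY_CODES)]
    for dx in dx_dates:
        if any(dx <= rx <= dx + timedelta(days=window_days) for rx in rx_dates):
            return dx
    return None


patient = [(date(2021, 1, 10), "DX10.2"), (date(2021, 2, 1), "RX200A")]
print(first_case_date(patient))  # 2021-01-10
```

Requiring the therapy code trades sensitivity for specificity; the window length is a design parameter that would itself need justification or sensitivity analysis.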


5.1.3. Assessment of covariates


In pharmacoepidemiology studies, covariates are often used for selecting and matching study subjects, comparing characteristics of the cohorts, developing propensity scores, creating stratification variables, evaluating effect modifiers and adjusting for confounders. Reliable assessment of covariates is therefore essential for the validity of results. Patient characteristics and other key covariates that could be confounding variables need to be evaluated using all available data. A given database may or may not be suitable for studying a research question depending on the availability of these covariates.

Some patient characteristics and covariates vary with time and accurate assessment is time dependent. The timing of assessment of the covariates is an important factor for the correct classification of the subjects and should be clearly specified in the protocol. Assessment of covariates can be done using different periods of time (look-back periods or run-in periods).


Fixed look-back periods (for example 6 months or 1 year) are sometimes used when there are changes in coding methods or in practices, or when it is not feasible to use the entire medical history of a patient. Estimation using all available covariates information versus a fixed look-back window for dichotomous covariates (Pharmacoepidemiol Drug Saf 2013;22(5):542-50) shows that defining covariates based on all available historical data, rather than on data observed over a commonly shared fixed historical window, results in estimates with less bias. However, this approach may not be applicable when data from paediatric and adult periods are combined, because covariates may differ significantly between paediatric and adult populations (e.g. height and weight).
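The difference between the two approaches can be made concrete by flagging a dichotomous covariate (e.g. a history of a given diagnosis) either from all available history before cohort entry or from a fixed look-back window only. The list-of-dates record structure and the 365-day window are assumptions for illustration.

```python
from datetime import date, timedelta


def has_covariate(event_dates, cohort_entry, lookback_days=None):
    """Flag a dichotomous covariate from dated records before cohort entry.

    lookback_days=None uses all available history; an integer restricts the
    search to the fixed window [cohort_entry - lookback_days, cohort_entry).
    """
    start = date.min if lookback_days is None else cohort_entry - timedelta(days=lookback_days)
    return any(start <= d < cohort_entry for d in event_dates)


history = [date(2015, 6, 1)]  # a single diagnosis recorded years before entry
entry = date(2020, 1, 1)
print(has_covariate(history, entry))                     # True: all available history
print(has_covariate(history, entry, lookback_days=365))  # False: missed by a 1-year window
```

The second call illustrates how a fixed window can misclassify patients with an older history as covariate-negative, one source of the bias discussed in the cited article.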


5.1.4. Validation


In healthcare databases, the correct assessment of drug exposure, outcomes and covariates is crucial to the validity of research. Validation of the electronic information on these variables is therefore essential for database studies, and the definitions used should be included in the technical handbook of every database, ideally with estimates of sensitivity, specificity, and positive and negative predictive values. Examples can be found in Validity of diagnostic coding within the General Practice Research Database: a systematic review (Br J Gen Pract 2010;60:e128-36), the book Pharmacoepidemiology (B. Strom, S.E. Kimmel, S. Hennessy. 5th Edition, Wiley, 2012) and Mini-Sentinel's systematic reviews of validated methods for identifying health outcomes using administrative and claims data: methods and lessons learned (Pharmacoepidemiol Drug Saf 2012;21 Suppl 1:82-9).
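The validity measures mentioned above come from a cross-tabulation of the database algorithm against a reference standard such as chart review. The sketch below shows the standard formulas; the counts are invented for illustration. Note that when only algorithm-positive records are reviewed, as is common in chart validation, only the positive predictive value can be estimated.

```python
from dataclasses import dataclass


@dataclass
class ValidationTable:
    tp: int  # algorithm positive, reference positive
    fp: int  # algorithm positive, reference negative
    fn: int  # algorithm negative, reference positive
    tn: int  # algorithm negative, reference negative

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


t = ValidationTable(tp=90, fp=10, fn=30, tn=870)  # illustrative counts
print(f"Se={t.sensitivity():.2f} Sp={t.specificity():.3f} "
      f"PPV={t.ppv():.2f} NPV={t.npv():.3f}")
```

With these counts the output is `Se=0.75 Sp=0.989 PPV=0.90 NPV=0.967`.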


Completeness and validity of all variables used as exposure, outcomes, potential confounders and effect modifiers should be considered. Assumptions included in case definitions or other algorithms may need to be confirmed. For databases routinely used in research, documented validation of key variables may have been done previously by the data provider or other researchers. Any extrapolation of previous validation should, however, consider the effect of differences in variables or analyses and of subsequent changes to health care, procedures and coding. A full understanding of both the healthcare system and the procedures that generated the data is required. This is particularly important for studies relying upon accurate timing of exposure, outcome and covariate recording, such as the self-controlled case series. External validation against chart review or physician/patient questionnaires is possible with some resources. However, questionnaires cannot always be considered a 'gold standard'.


Review of records against a case definition by experts may also be possible. While false positives are more easily measured than false negatives, the specificity of an outcome definition is more important than its sensitivity when considering bias in relative risk estimates (see A review of uses of health care utilization databases for epidemiologic research on therapeutics. J Clin Epidemiol 2005;58(4):323-37). Alternatively, internal logic checks can test for completeness and accuracy of variables. For example, one can investigate whether an outcome was preceded or followed by appropriate exposures or procedures.
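Why specificity matters more than sensitivity can be checked with a short calculation. Assuming outcome misclassification is non-differential (the same sensitivity Se and specificity Sp in both exposure groups), the recorded risk for a true risk p is Se·p + (1 − Sp)·(1 − p). The risks and accuracy values below are illustrative assumptions.

```python
def observed_risk(true_risk: float, se: float, sp: float) -> float:
    """Risk recorded in the database under non-differential outcome misclassification."""
    return se * true_risk + (1 - sp) * (1 - true_risk)


def observed_rr(risk_exposed: float, risk_unexposed: float, se: float, sp: float) -> float:
    """Risk ratio computed from the misclassified outcome."""
    return observed_risk(risk_exposed, se, sp) / observed_risk(risk_unexposed, se, sp)


# True RR = 0.02 / 0.01 = 2.0
print(round(observed_rr(0.02, 0.01, se=0.6, sp=1.0), 2))   # 2.0: imperfect sensitivity only
print(round(observed_rr(0.02, 0.01, se=1.0, sp=0.99), 2))  # 1.5: imperfect specificity only
```

With perfect specificity, missing 40% of true cases leaves the risk ratio unchanged, whereas a specificity of 99% already pulls it from 2.0 to about 1.5, because false positives from the large non-diseased population swamp the rare true cases.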


Concordance between datasets such as comparison of cancer or death registries with clinical or administrative records can validate individual records or overall incidence or prevalence rates.

Linkage validation can also be used, whereby another database, linked to the current one, serves to validate it (Using linked electronic data to validate algorithms for health outcomes in administrative databases. J Comp Eff Res 2015;4(4):359-66).


5.2. Bias and confounding


5.2.1. Selection bias


Selection bias entails the selective recruitment into the study of subjects that are not representative of the exposure or outcome pattern in the source population. Examples of selection bias are referral bias, self-selection bias, prevalence bias or protopathic bias (Strom BL, Kimmel SE, Hennessy S. Pharmacoepidemiology, 5th Edition, Wiley, 2012).


Protopathic bias


Protopathic bias arises when the initiation of a drug (exposure) occurs in response to a symptom of the (at this point undiagnosed) disease under study (outcome). For example, use of analgesics in response to pain caused by an undiagnosed tumour might lead to the erroneous conclusion that the analgesic caused the tumour. Protopathic bias thus reflects a reversal of cause and effect (Bias: Considerations for research practice. Am J Health Syst Pharm 2008;65:2159-68). This is particularly a problem in studies of drug-cancer associations and other outcomes with long latencies. It may be handled by including a time-lag, i.e. by disregarding all exposure during a specified period of time before the index date.
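In practice, the time-lag amounts to ignoring any exposure recorded in a window immediately before the index date when classifying exposure. A minimal sketch, assuming exposure is represented by prescription dates and using a hypothetical one-year lag; the appropriate lag depends on how long the undiagnosed disease may plausibly cause symptoms and could be varied in sensitivity analyses.

```python
from datetime import date, timedelta


def exposed_with_lag(rx_dates, index_date, lag_days=365):
    """True if any prescription occurs before index_date - lag_days.

    Prescriptions within the lag period are disregarded because they may have
    been triggered by early symptoms of the not-yet-diagnosed outcome.
    """
    cutoff = index_date - timedelta(days=lag_days)
    return any(d < cutoff for d in rx_dates)


index = date(2020, 6, 1)
print(exposed_with_lag([date(2020, 3, 1)], index))  # False: only use within the lag
print(exposed_with_lag([date(2018, 3, 1)], index))  # True: use predates the lag
```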


Prevalence bias


The practice of including prevalent users in observational studies, i.e. patients who had been taking a therapy for some time before study follow-up began, can cause two types of bias. Firstly, prevalent users are 'survivors' (healthy users) of the early period of pharmacotherapy, which can introduce substantial selection bias if risk varies with time, as seen in the association between contraceptive use and venous thrombosis, which was initially overestimated because of the healthy-user bias (The Transnational Study on Oral Contraceptives and the Health of Young Women. Methods, results, new analyses and the healthy user effect. Hum Reprod Update 1999;5(6)). Secondly, covariates for drug users at study entry are often plausibly affected by the drug itself.


5.2.2. Information bias


Information bias arises when incorrect information about exposure, outcome or any covariate is collected in the study. It can be non-differential, when it occurs randomly across exposed and non-exposed participants, or differential, when it is influenced by disease or exposure status.

Non-differential misclassification of a binary exposure or outcome generally drives the risk estimate towards the null value, while differential misclassification can drive it in either direction. Examples of differential misclassification bias are recall bias (e.g. in case-control studies, cases and controls may recall their past exposures differently) and surveillance or detection bias.
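The tendency of non-differential misclassification to bias towards the null can be illustrated with expected proportions from a case-control study in which exposure is recorded with the same sensitivity and specificity among cases and controls. The prevalences and accuracy values are illustrative assumptions, and "towards the null" describes the expected tendency for a binary exposure, not a guarantee for every individual study.

```python
def misclassified_prevalence(p: float, se: float, sp: float) -> float:
    """Expected proportion recorded as exposed when the true exposure prevalence is p."""
    return se * p + (1 - sp) * (1 - p)


def odds_ratio(p_cases: float, p_controls: float) -> float:
    """Exposure odds ratio from exposure prevalences in cases and controls."""
    return (p_cases / (1 - p_cases)) / (p_controls / (1 - p_controls))


p_cases, p_controls = 0.30, 0.15  # true exposure prevalence
se, sp = 0.8, 0.9                 # identical in cases and controls (non-differential)
true_or = odds_ratio(p_cases, p_controls)
obs_or = odds_ratio(misclassified_prevalence(p_cases, se, sp),
                    misclassified_prevalence(p_controls, se, sp))
print(round(true_or, 2), round(obs_or, 2))
```

Here the true odds ratio of about 2.4 is attenuated to about 1.7. If, instead, cases recalled exposure with higher sensitivity than controls (recall bias), the observed odds ratio could move away from the null.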


Surveillance bias (or detection bias)


Surveillance or detection bias arises when patients in one exposure group have a higher probability of having the study outcome detected, due to increased surveillance, screening or testing of the outcome itself, or an associated symptom. For example, post-menopausal exposure to estrogen is associated with an increased risk of bleeding that can trigger screening for endometrial cancers, leading to a higher probability of early stage endometrial cancers being detected. Any association between estrogen exposure and endometrial cancer potentially overestimates risk, because unexposed patients with sub-clinical cancers would have a lower probability of their cancer being diagnosed or recorded. This is discussed in Alternative analytic methods for case-control studies of estrogens and endometrial cancer (N Engl J Med 1978;299(20):1089-94).


This differential type of misclassification bias can be reduced by selecting an unexposed comparator group with a similar likelihood of screening or testing, by selecting outcomes that are likely to be diagnosed equally in both exposure groups, or by adjusting for the surveillance rate in the analysis. The issues and recommendations are outlined in Surveillance Bias in Outcomes Reporting (JAMA 2011;305(23):2462-3).


Time-related bias


Time-related bias is most often a form of differential misclassification bias and is triggered by inappropriate accounting of follow-up time and exposure status in the study design and analysis. 


The choice of the exposure risk window can influence risk comparisons because drug exposure may be misclassified when risks vary over time. A study of the effects of exposure misclassification due to the time-window design in pharmacoepidemiologic studies (J Clin Epidemiol 1994;47(2):183–89) considers the impact of the time-window design on the validity of risk estimates in record linkage studies. In adverse drug reaction studies, an exposure risk window is the number of exposure days assigned to each prescription. Ideally, each exposure risk window would cover only the period of potential excess risk. Estimating the period of drug-related risk is, however, complex, as it depends on the duration of drug use, the timing of ingestion and the onset and persistence of drug toxicity. With longer windows, substantial attenuation of incidence rates may be observed. Risk windows should therefore be validated, or sensitivity analyses should be conducted.
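The effect of the risk-window choice can be explored by assigning each prescription its days' supply plus an optional extension, merging overlapping windows, and checking whether an event falls inside any window. Representing prescriptions as (start date, days' supply) pairs and the extension parameter are assumptions; re-running the classification with several extensions is one simple form of the sensitivity analysis recommended above.

```python
from datetime import date, timedelta


def risk_windows(prescriptions, extension_days=0):
    """Return merged (start, end) risk windows with exclusive end dates.

    Each prescription is a (start_date, days_supply) pair; its window is
    extended by `extension_days` to allow for delayed onset or persistence of toxicity.
    """
    spans = sorted(
        (start, start + timedelta(days=supply + extension_days))
        for start, supply in prescriptions
    )
    merged = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))  # overlapping: extend
        else:
            merged.append((start, end))
    return merged


def in_risk_window(event_date, windows):
    return any(start <= event_date < end for start, end in windows)


rxs = [(date(2021, 1, 1), 28), (date(2021, 2, 15), 28)]
event = date(2021, 2, 5)
for ext in (0, 14):
    print(ext, in_risk_window(event, risk_windows(rxs, ext)))
```

The same event is unexposed with no extension and exposed with a 14-day extension, showing how the window definition alone can move events between exposure categories.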


Immortal time bias


Immortal time refers to a period of cohort follow-up during which death (or an outcome that determines the end of follow-up) cannot occur (K. Rothman, S. Greenland, T. Lash. Modern Epidemiology, 3rd Edition, Lippincott Williams & Wilkins, 2008, p. 106-7).


Immortal time bias can arise when the period between cohort entry and date of first exposure to a drug, during which the event of interest has not occurred, is either misclassified or simply excluded and not accounted for in the analysis. Immortal time bias in observational studies of drug effects (Pharmacoepidemiol Drug Saf 2007;16:241-9) demonstrates how several observational studies used a flawed approach to design and data analysis, leading to immortal time bias, which can generate an illusion of treatment effectiveness. This is frequently found in studies that compare groups of ‘users’ against ‘non-users’. Observational studies with surprisingly beneficial drug effects should therefore be re-assessed to account for this bias.
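The mechanism can be shown with a toy cohort and person-time arithmetic. Each subject may start the drug after cohort entry and may have the event; by construction there is no drug effect, since the event rate is the same before and after treatment start. Counting all follow-up of eventual users as exposed (the flawed "user vs non-user" approach) yields an apparently protective rate ratio, whereas splitting each user's follow-up at treatment start (a time-dependent exposure definition) does not. The cohort values are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Subject:
    follow_up: float           # days from cohort entry to end of follow-up
    rx_start: Optional[float]  # day of first prescription; None if never treated
    event: bool                # whether follow-up ended with the event


def rate_ratio(subjects, time_dependent):
    """Incidence rate ratio, treated vs untreated person-time."""
    person_time = {True: 0.0, False: 0.0}
    events = {True: 0, False: 0}
    for s in subjects:
        if s.rx_start is None:
            person_time[False] += s.follow_up
            events[False] += s.event
        elif time_dependent:
            # Pre-treatment time is unexposed; a user's event can only occur after
            # rx_start, because the user had to survive event-free until treatment began.
            person_time[False] += s.rx_start
            person_time[True] += s.follow_up - s.rx_start
            events[True] += s.event
        else:
            # Flawed 'ever user' definition: immortal pre-treatment time counted as exposed.
            person_time[True] += s.follow_up
            events[True] += s.event
    return (events[True] / person_time[True]) / (events[False] / person_time[False])


# Same event rate (1 per 200 days) before and after treatment start: no true effect.
cohort = [
    Subject(100, None, True),
    Subject(100, None, True),
    Subject(200, 100, True),
    Subject(200, 100, False),
]
print(round(rate_ratio(cohort, time_dependent=False), 2))  # 0.25: spurious protection
print(round(rate_ratio(cohort, time_dependent=True), 2))   # 1.0
```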


Immortal time bias in Pharmacoepidemiology (Am J Epidemiol 2008;167:492-9) describes various cohort study designs leading to this bias, quantifies its magnitude under different survival distributions and illustrates it with data from a cohort of lung cancer patients. For time-based, event-based and exposure-based cohort definitions, the bias in the rate ratio resulting from misclassified or excluded immortal time increases proportionately to the duration of immortal time.
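Under simplifying assumptions (not necessarily those of the cited article), the direction of the bias from misclassified immortal time can be derived directly. Let a and b be the numbers of events in truly exposed and truly unexposed person-time, T_1 and T_0 the exposed and unexposed person-time excluding immortal time, and I the immortal time of eventual users, which truly belongs to the unexposed category and contains no events by definition. Misclassifying I as exposed gives:

```latex
\begin{aligned}
\mathrm{RR}_{\text{true}} &= \frac{a/T_1}{b/(T_0+I)} = \frac{a\,(T_0+I)}{b\,T_1}, \\
\mathrm{RR}_{\text{obs}}  &= \frac{a/(T_1+I)}{b/T_0} = \frac{a\,T_0}{b\,(T_1+I)}, \\
\frac{\mathrm{RR}_{\text{obs}}}{\mathrm{RR}_{\text{true}}} &= \frac{T_0\,T_1}{(T_0+I)(T_1+I)} < 1 \quad \text{for } I > 0.
\end{aligned}
```

The ratio falls further below 1 as I grows, so the longer the misclassified immortal time, the more beneficial the drug appears.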


Survival bias associated with time-to-treatment initiation in drug effectiveness evaluation: a comparison of methods (Am J Epidemiol 2005;162:1016-23) describes five different approaches to deal with immortal time bias. The time-dependent approach had several advantages: no subjects were excluded from the analysis and the effect could be estimated at any point in time after discharge. However, changes in exposure might be predictive of the study endpoint and may require adjustment for time-varying confounders using complex methods. Problem of immortal time bias in cohort studies: example using statins for preventing progression of diabetes (BMJ 2010;340:b5087) describes how immortal time in observational studies can bias the results in favour of the treatment group and how it can be identified and avoided. It is recommended that all cohort studies be assessed for the presence of immortal time bias using appropriate validity criteria. However, Re: 'Immortal time bias in pharmacoepidemiology' (Am J Epidemiol 2009;170:667-8) argues that sound efforts to minimise the influence of more common biases should not be sacrificed for the sake of avoiding immortal time bias.


Other forms of time-related bias


Time-window Bias in Case-control Studies. Statins and Lung Cancer (Epidemiology 2011;22(2):228-31) describes a case-control study which reported a 45% reduction in the rate of lung cancer with any statin use. A differential misclassification bias arose from the methods used to select controls and measure their exposure: statin exposure was assessed over a shorter time span for cases than for controls, resulting in an over-representation of unexposed cases. Properly accounting for time produced a null association.


In many database studies, exposure status during hospitalisations is unknown. Exposure misclassification may then occur, and its direction depends on whether drugs prescribed before a hospitalisation are assumed to be continued or discontinued and on whether days of hospitalisation are treated as gaps in exposure, especially when several exposure categories (such as current, recent and past) are assigned. The differential bias arising from the lack of information on (or lack of consideration of) hospitalisations that occur during the observation period, called 'immeasurable time bias' (Immeasurable time bias in observational studies of drug effects on mortality. Am J Epidemiol 2008;168(3):329-35), can be particularly problematic when studying serious chronic diseases that require extensive medication use and multiple hospitalisations.


In case-control studies of chronic diseases with multiple hospitalisations and in-patient treatment (such as studies of inhaled corticosteroids and death in patients with chronic obstructive pulmonary disease), no clearly valid approach to data analysis can fully circumvent this bias. However, sensitivity analyses such as restricting the analysis to non-hospitalised patients or providing estimates weighted by exposable time may provide additional information on the potential impact of this bias (Immeasurable time bias in observational studies of drug effects on mortality. Am J Epidemiol 2008;168(3):329-35).


In cohort studies where a first-line therapy (such as metformin) has been compared with second- or third-line therapies, patients are unlikely to be at the same stage of the disease (e.g. diabetes), which can induce confounding of the association with an outcome (e.g. cancer incidence) by disease duration. An outcome related to the first-line therapy may also be attributed to the second-line therapy if it occurs after a long period of exposure. Such a situation requires matching on disease duration and consideration of latency time windows in the analysis (example drawn from Metformin and the Risk of Cancer. Time-related biases in observational studies. Diabetes Care 2012;35(12):2665-73).


5.2.3. Confounding

Confounding occurs when the estimate of a measure of association is distorted by the presence of another risk factor. For a variable to be a confounder, it must be associated with both the exposure and the outcome, without being on the causal pathway between them.


Confounding by indication


Confounding by indication refers to a determinant of the outcome parameter that is present in people at perceived high risk or poor prognosis and is an indication for intervention. This means that differences in care, for example, between cases and controls may partly originate from differences in indication for medical intervention such as the presence of risk factors for particular health problems. Other names for this type of confounding are ‘channelling’ or ‘confounding by severity’.


This type of confounding has frequently been reported in studies evaluating the efficacy of pharmaceutical interventions and is almost always encountered, to varying extents, in pharmacoepidemiological studies. A good example can be found in Confounding and indication for treatment in evaluation of drug treatment for hypertension (BMJ 1997;315:1151-4).


The article Confounding by indication: the case of the calcium channel blockers (Pharmacoepidemiol Drug Saf 2000;9:37-41) demonstrates that studies with potential confounding by indication can benefit from appropriate analytic methods, including separating the effects of a drug taken at different times, sensitivity analysis for unmeasured confounders, instrumental variables and G-estimation.


With the more recent application of pharmacoepidemiological methods to assess effectiveness, confounding by indication has become a greater challenge, and the article Approaches to combat with confounding by indication in observational studies of intended drug effects (Pharmacoepidemiol Drug Saf 2003;12:551-8) focuses on its possible reduction in studies of intended effects. An extensive review of these and other methodological approaches, discussing their strengths and limitations, is provided in Methods to assess intended effects of drug treatment in observational studies are reviewed (J Clin Epidemiol 2004;57:1223-31).


Unmeasured confounding


Complete adjustment for confounders would require detailed information on clinical parameters, lifestyle or over-the-counter medications, which are often not measured in electronic healthcare records, causing residual confounding bias. Using directed acyclic graphs to detect limitations of traditional regression in longitudinal studies (Int J Public Health 2010;55:701-3) reviews confounding and intermediate effects in longitudinal data and introduces causal graphs to understand the relationships between the variables in an epidemiological study.


Unmeasured confounding can only be fully controlled through randomisation. When randomisation is not possible, as is most often the case in pharmacoepidemiological studies, the potential impact of residual confounding on the results should be estimated and considered in the discussion.


Sensitivity analysis and external adjustment for unmeasured confounders in epidemiologic database studies of therapeutics (Pharmacoepidemiol Drug Saf 2006;15(5):291-303) provides a systematic approach to sensitivity analyses to investigate the impact of residual confounding in pharmacoepidemiological studies that use healthcare utilisation databases. The article identifies four basic approaches to sensitivity analysis: (1) sensitivity analyses based on an array of informed assumptions; (2) analyses to identify the strength of residual confounding that would be necessary to explain an observed drug-outcome association; (3) external adjustment of a drug-outcome association given additional information on single binary confounders from survey data using algebraic solutions; (4) external adjustment considering the joint distribution of multiple confounders of any distribution from external sources of information using propensity score calibration. The paper concludes that sensitivity analyses and external adjustments can improve our understanding of the effects of drugs in epidemiological database studies. With the availability of easy-to-apply spreadsheets, sensitivity analyses should be used more frequently, replacing qualitative discussions of residual confounding.
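Approach (3) can be illustrated with the classical bias formula for a single binary unmeasured confounder, which assumes the confounder does not modify the exposure effect. This textbook formula is shown for illustration and is not necessarily the exact procedure of the cited article; the prevalences and risk ratios are hypothetical values of the kind that might come from survey data. Looping over plausible confounder strengths, as in the last lines, is a simple version of approach (2).

```python
def bias_factor(p_exposed: float, p_unexposed: float, rr_cd: float) -> float:
    """Bias factor for one binary confounder (no effect modification assumed).

    p_exposed, p_unexposed: confounder prevalence among exposed and unexposed.
    rr_cd: risk ratio between confounder and outcome.
    """
    return (p_exposed * (rr_cd - 1) + 1) / (p_unexposed * (rr_cd - 1) + 1)


def externally_adjusted_rr(rr_obs: float, p_exposed: float, p_unexposed: float, rr_cd: float) -> float:
    """Observed risk ratio corrected for the hypothesised unmeasured confounder."""
    return rr_obs / bias_factor(p_exposed, p_unexposed, rr_cd)


# Hypothetical inputs: observed RR 1.5; confounder present in 40% of exposed
# and 20% of unexposed patients.
for rr_cd in (2.0, 4.0, 8.0):
    print(rr_cd, round(externally_adjusted_rr(1.5, 0.40, 0.20, rr_cd), 2))
```

Under these assumptions the adjusted estimates are about 1.29, 1.09 and 0.95, so the confounder would need a risk ratio for the outcome somewhere between 4 and 8 to fully explain the observed association.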


The amount of bias in exposure-effect estimates that can plausibly occur due to residual or unmeasured confounding has been debated. The impact of residual and unmeasured confounding in epidemiologic studies: a simulation study (Am J Epidemiol 2007;166:646–55) considers the extent and patterns of bias in estimates of exposure-outcome associations that can result from residual or unmeasured confounding, when there is no true association between the exposure and the outcome. With plausible assumptions about residual and unmeasured confounding, effect sizes of the magnitude frequently reported in observational epidemiological studies can be generated. This study also highlights the need to perform sensitivity analyses to assess whether unmeasured and residual confounding are likely problems. Another important finding of this study was that when confounding factors (measured or unmeasured) are interrelated (e.g. in situations of confounding by indication), adjustment for a few factors can almost completely eliminate confounding.


5.3. Methods to handle bias and confounding


5.3.1. New-user designs


New-user (incident user) designs can avoid prevalence bias by restricting the analysis to persons under observation at the start of the current course of treatment, so that all patients are followed from a comparable point in their treatment history (Evaluating medication effects outside of clinical trials: new-user designs. Am J Epidemiol 2003;158(9):915–20). In addition to defining new-user designs, the article explains how they can be implemented as case-control studies and describes the logistical and sample size limitations involved.
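In database terms, a new user is commonly identified by requiring a drug-free washout period of observation before the first recorded prescription. A minimal sketch, assuming each patient has an observation start date and a list of prescription dates for the study drug; the 365-day washout is a hypothetical choice.

```python
from __future__ import annotations

from datetime import date, timedelta


def new_user_index_date(obs_start: date, rx_dates: list[date], washout_days: int = 365) -> date | None:
    """Index date of a new user, or None if the patient cannot be classed as one.

    The first recorded prescription must be preceded by at least `washout_days`
    of observation; since it is the first, that period is drug-free by construction.
    """
    if not rx_dates:
        return None
    first_rx = min(rx_dates)
    if first_rx - obs_start < timedelta(days=washout_days):
        return None  # too little history to rule out prior (prevalent) use
    return first_rx


print(new_user_index_date(date(2018, 1, 1), [date(2019, 7, 1), date(2019, 6, 1)]))  # 2019-06-01
print(new_user_index_date(date(2019, 3, 1), [date(2019, 6, 1)]))                    # None
```

Longer washout periods make misclassification of prevalent users as new users less likely but exclude more patients, one of the sample size trade-offs mentioned above.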


5.3.2. Case-only designs


Case-only designs reduce confounding by using the exposure history of each case as its own control and thereby eliminate confounding by characteristics that are constant over time, such as demographics, socio-economic factors, genetics and chronic diseases.


A simple form of a case-only design is the symmetry analysis (initially described as prescription sequence symmetry analysis), introduced as a screening tool in Evidence of depression provoked by cardiovascular medication: a prescription sequence symmetry analysis (Epidemiology 1996;7(5):478-84). In this study, the risk of depression associated with cardiovascular drugs was estimated by analysing the non-symmetrical distribution of prescription orders for cardiovascular drugs and antidepressants.
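The core computation of a symmetry analysis is the crude sequence ratio: among patients who started both drugs, the number who started the index drug (here a cardiovascular drug) first, divided by the number who started the marker drug (here an antidepressant) first. A ratio above 1 suggests the index drug may provoke the condition treated by the marker drug. The sketch assumes one first-prescription date per drug and patient, excludes same-day starts and pairs more than a hypothetical 365 days apart, and omits the adjustment for prescribing trends used in full analyses.

```python
from datetime import date


def crude_sequence_ratio(first_dates, max_gap_days=365):
    """Crude sequence ratio from (index_drug_start, marker_drug_start) pairs.

    Pairs started on the same day, or more than `max_gap_days` apart, are excluded.
    Raises ZeroDivisionError if no pair has the marker drug first.
    """
    index_first = marker_first = 0
    for index_start, marker_start in first_dates:
        gap = (marker_start - index_start).days
        if gap == 0 or abs(gap) > max_gap_days:
            continue
        if gap > 0:
            index_first += 1
        else:
            marker_first += 1
    return index_first / marker_first


pairs = [
    (date(2020, 1, 1), date(2020, 3, 1)),
    (date(2020, 2, 1), date(2020, 5, 1)),
    (date(2020, 4, 1), date(2020, 4, 20)),
    (date(2020, 6, 1), date(2020, 1, 15)),
    (date(2020, 1, 1), date(2023, 1, 1)),  # excluded: too far apart
]
print(crude_sequence_ratio(pairs))  # 3.0
```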


The case-crossover design is suited to studying transient exposures with acute effects (The Case-Crossover Design: A Method for Studying Transient Effects on the Risk of Acute Events. Am J Epidemiol 1991;133:144-53). The case-time-control design (Epidemiology 1995;6(3):248-53) uses exposure history data from a traditional control group to estimate and adjust for bias from temporal changes in prescribing (Case-crossover and Case-Time-Control Designs as Alternatives in Pharmacoepidemiologic Research. Pharmacoepidemiol Drug Saf 1997;Suppl 3:S51-S59). However, if not well matched, the control group may reintroduce selection bias, as discussed in Confounding and exposure trends in case-crossover and case-time-control designs (Epidemiology 1996;7:231-9). In this situation, a 'case-case-time-control' method, in which future cases serve as controls, may be helpful, as explained in Future cases as present controls to adjust for exposure trend bias in case-only studies (Epidemiology 2011;22:568–74).
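With one control period per case, the case-crossover odds ratio reduces to a matched-pair odds ratio: the number of cases exposed only in the hazard period divided by the number exposed only in the control period; concordant cases do not contribute. In the case-time-control design, the same ratio computed in a control group estimates the exposure trend and is divided out. The sketch assumes exposure has already been classified for each period, and the data are illustrative.

```python
def case_crossover_or(periods):
    """Matched-pair odds ratio from (exposed_in_hazard, exposed_in_control) per subject."""
    hazard_only = sum(1 for h, c in periods if h and not c)
    control_only = sum(1 for h, c in periods if c and not h)
    return hazard_only / control_only


def case_time_control_or(case_periods, control_periods):
    """Case-crossover OR in cases divided by the exposure-trend OR in controls."""
    return case_crossover_or(case_periods) / case_crossover_or(control_periods)


cases = [(True, False)] * 12 + [(False, True)] * 4 + [(True, True)] * 10 + [(False, False)] * 30
controls = [(True, False)] * 6 + [(False, True)] * 4 + [(False, False)] * 40
print(case_crossover_or(cases))              # 3.0
print(case_time_control_or(cases, controls))  # 2.0 after removing the exposure trend
```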


The self-controlled case series (SCCS) design was originally developed to investigate the association between a vaccine and an adverse event but is increasingly used to study drug exposure. In this design, the observation period of each case is divided into risk period(s) (e.g. the number(s) of days immediately following each exposure) and a control period (e.g. the remaining observation period). Incidence rates within the risk period(s) after exposure are compared with incidence rates within the control period.
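In the simplest setting (one risk period per case, no adjustment for age or season), the SCCS relative incidence ρ maximises a conditional likelihood in which each case's events fall in the risk or control period with probability proportional to period length times relative incidence. Setting the score to zero gives the equation solved below by bisection on β = log ρ. The data layout and example values are illustrative assumptions; analyses with age adjustment should rely on established software, as described in the SCCS tutorial referenced in this section.

```python
import math


def sccs_relative_incidence(cases, lo=-10.0, hi=10.0, tol=1e-10):
    """Relative incidence (risk vs control period) for a simple SCCS without age effects.

    cases: list of (risk_days, control_days, events_in_risk, events_in_control).
    Solves sum_i n_i * r_i*rho / (r_i*rho + c_i) = sum_i k_i for rho = exp(beta).
    """
    observed = sum(k for _, _, k, _ in cases)
    total = sum(k + m for _, _, k, m in cases)
    if observed == 0 or observed == total:
        raise ValueError("estimate is not finite: all events fall in one period type")

    def expected_in_risk(beta):
        rho = math.exp(beta)
        return sum((k + m) * r * rho / (r * rho + c) for r, c, k, m in cases)

    # expected_in_risk increases with beta, so bisection converges to the root.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_in_risk(mid) < observed:
            lo = mid
        else:
            hi = mid
    return math.exp((lo + hi) / 2)


# 20 cases with a 42-day risk period and 323-day control period; 4 events in risk periods.
cases = [(42, 323, 1, 0)] * 4 + [(42, 323, 0, 1)] * 16
print(round(sccs_relative_incidence(cases), 3))
```

When all cases share the same period lengths, the estimate reduces to the ratio of rates (4/42)/(16/323), about 1.92, which is a useful check on the general solver.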


The Tutorial in biostatistics: the self-controlled case series method (Stat Med 2006; 25(10):1768-97) and the associated website explain how to fit SCCS models using standard statistical packages.


Like cohort and case-control studies, however, the SCCS method remains susceptible to confounding by indication, at least if the indication varies over time. Relevant time intervals for the risk and control periods also need to be defined, which may become complex, e.g. with primary vaccination involving several doses. The bias introduced by inaccurate specification of the risk window is discussed, and a data-based approach for identifying the optimal risk windows is proposed, in Identifying optimal risk windows for self-controlled case series studies of vaccine safety (Stat Med 2011;30(7):742-52).


The SCCS also assumes that the event itself does not affect the chance of being exposed. The pseudolikelihood method developed to address this possible issue is described in Case series analysis for censored, perturbed, or curtailed post-event exposures (Biostatistics 2009;10(1):3-16). Based on a review of 40 vaccine studies, Use of the self-controlled case-series method in vaccine safety studies: review and recommendations for best practice (Epidemiol Infect 2011;139(12):1805-17) assesses how the SCCS method has been used, highlights good practice and gives guidance on how the method should be used and reported. Using several methods of analysis is recommended, as this can reinforce conclusions or, when results differ between study designs, shed light on possible sources of bias.


Within-person study designs had lower precision and greater susceptibility to bias because of trends in exposure than cohort and nested case-control designs (J Clin Epidemiol 2012;65(4):384-93) compares cohort, case-control, case-crossover and SCCS designs to explore the association between thiazolidinediones and the risks of heart failure and fracture, and between anticonvulsants and the risk of fracture. The SCCS and case-crossover designs were more susceptible to bias, but this bias was removed when follow-up was sampled both before and after the outcome, or when a case-time-control design was used.


When should case-only designs be used for safety monitoring of medicinal products? (Pharmacoepidemiol Drug Saf 2012;21(Suppl. 1):50-61) compares the SCCS and case-crossover methods in terms of their use, strengths and main difference (directionality). It concludes that case-only analyses of intermittent users complement the cohort analyses of prolonged users because their different biases compensate for one another. It also provides recommendations on when case-only designs should and should not be used for drug safety monitoring. Empirical performance of the self-controlled case series design: lessons for developing a risk identification and analysis system (Drug Saf 2013;36(Suppl. 1):S83-S93) evaluates the performance of the SCCS design using 399 drug-health outcome pairs in 5 observational databases and 6 simulated datasets. Four outcomes and five design choices were assessed.


Persistent User Bias in Case-Crossover Studies in Pharmacoepidemiology (Am J Epidemiol 2016 Oct 25, Epub ahead of print) demonstrates that case-crossover studies of drugs that may be used indefinitely are biased upward. This bias is alleviated, but not removed completely, by using a control group.
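The basic case-crossover estimate and the control-group correction can be sketched as follows, assuming one hazard window and one earlier control window per person. The matched-pair odds ratio is the number of people exposed only in the hazard window divided by the number exposed only in the control window; in a case-time-control analysis, the same ratio computed in a control group of non-cases captures the general trend in exposure over time and is divided out. The function names and counts are hypothetical.

```python
def matched_pair_or(windows):
    """Case-crossover odds ratio for one hazard and one control window per person.

    windows: iterable of (exposed_in_hazard_window, exposed_in_control_window).
    Only discordant pairs carry information about the exposure effect.
    """
    hazard_only = sum(1 for h, c in windows if h and not c)
    control_only = sum(1 for h, c in windows if c and not h)
    if control_only == 0:
        raise ValueError("no person exposed only in the control window")
    return hazard_only / control_only


def case_time_control_or(case_windows, control_windows):
    """Case-crossover OR in cases divided by the same OR in non-case controls."""
    return matched_pair_or(case_windows) / matched_pair_or(control_windows)


# Hypothetical window-level exposure data.
cases = [(True, False)] * 30 + [(False, True)] * 10 + [(True, True)] * 50
controls = [(True, False)] * 15 + [(False, True)] * 10 + [(True, True)] * 60
print(matched_pair_or(cases))                  # 30 / 10
print(case_time_control_or(cases, controls))   # (30 / 10) / (15 / 10)
```

Here the controls also show more exposure in the later window, so part of the case-crossover odds ratio reflects the exposure trend rather than an effect of the drug.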


5.3.3. Disease risk scores


An approach to controlling for a large number of confounding variables is to summarise them in a single multivariable confounder score. Stratification by a multivariate confounder score (Am J Epidemiol 1976;104:609-20) shows how control for confounding may be based on stratification by the score. An example is a disease risk score (DRS) that estimates the probability or rate of disease occurrence conditional on being unexposed. The association between exposure and disease is then estimated with adjustment for the disease risk score in place of the individual covariates.
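A minimal sketch of DRS stratification on simulated data is shown below. It assumes a logistic outcome model fitted in the unexposed only, quintiles of the resulting score, and a Mantel-Haenszel risk ratio pooled across quintiles. All names, coefficients and the simulated cohort are hypothetical. Exposure has no true effect in the simulation, so the stratified estimate should move towards 1 compared with the crude estimate. The logistic regression is hand-written only to keep the example free of dependencies.

```python
import bisect
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def expit(v):
    return 1.0 / (1.0 + math.exp(-v))


def fit_logistic(X, y, n_iter=12):
    """Newton-Raphson logistic regression; each row of X starts with 1.0 (intercept)."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, y):
            p = expit(sum(b * v for b, v in zip(beta, xi)))
            for i in range(k):
                grad[i] += (yi - p) * xi[i]
                for j in range(k):
                    info[i][j] += p * (1.0 - p) * xi[i] * xi[j]
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    return beta


def mh_risk_ratio(strata):
    """Mantel-Haenszel risk ratio; strata are (cases_exp, n_exp, cases_unexp, n_unexp)."""
    num = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata)
    den = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata)
    return num / den


# Hypothetical cohort: two confounders raise both the chance of exposure and the
# outcome risk, while exposure itself has no effect on the outcome.
random.seed(1)
rows = []
for _ in range(10000):
    age = random.random()
    frail = 1.0 if random.random() < 0.3 else 0.0
    exposed = random.random() < expit(-1.0 + 2.0 * age + 1.0 * frail)
    outcome = random.random() < expit(-4.0 + 2.0 * age + 1.5 * frail)
    rows.append(([1.0, age, frail], exposed, outcome))

# 1. Fit the outcome model among the unexposed only.
unexp = [r for r in rows if not r[1]]
beta = fit_logistic([r[0] for r in unexp], [float(r[2]) for r in unexp])
# 2. The DRS is each subject's predicted risk had they been unexposed.
drs = [expit(sum(b * v for b, v in zip(beta, r[0]))) for r in rows]
# 3. Stratify the whole cohort on DRS quintiles and pool with Mantel-Haenszel.
ranked = sorted(drs)
cuts = [ranked[len(ranked) * q // 5] for q in range(1, 5)]
strata = [[0, 0, 0, 0] for _ in range(5)]
for (_, exposed, outcome), score in zip(rows, drs):
    s = strata[bisect.bisect_right(cuts, score)]
    if exposed:
        s[0] += outcome
        s[1] += 1
    else:
        s[2] += outcome
        s[3] += 1

crude_rr = mh_risk_ratio([tuple(map(sum, zip(*strata)))])
adjusted_rr = mh_risk_ratio(strata)
print(f"crude RR {crude_rr:.2f}, DRS-stratified RR {adjusted_rr:.2f}")
```

Coarse quintiles leave some residual confounding; finer strata or regression adjustment on the score are common alternatives.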


DRSs are, however, difficult to estimate if outcomes are rare. Use of disease risk scores in pharmacoepidemiologic studies (Stat Methods Med Res 2009;18:67-80) includes a detailed description of their construction and use, a summary of simulation studies comparing their performance to traditional models, a comparison of their utility with that of propensity scores, and some further topics for future research. Disease risk score as a confounder summary method: systematic review and recommendations (Pharmacoepidemiol Drug Saf 2013;22(2):122-29) examines trends in the use and application of DRS as a confounder summary method and shows that there is large variation, with differences in the terminology and methods used.


In Role of disease risk scores in comparative effectiveness research with emerging therapies (Pharmacoepidemiol Drug Saf. 2012 May;21 Suppl 2:138–47) it is argued that DRS may have its place when studying drugs that are recently introduced to the market. In such situations, as characteristics of users change rapidly, exposure propensity scores (see below) may prove highly unstable. DRSs based mostly on biological associations would be more stable. However, DRS models are still sensitive to misspecification as discussed in Adjusting for Confounding in Early Postlaunch Settings: Going Beyond Logistic Regression Models (Epidemiology. 2016;27:133-42).


5.3.4. Propensity scores


Databases used in pharmacoepidemiological studies often include records of prescribed medications and encounters with medical care providers, from which one can construct surrogate measures for both drug exposure and covariates that are potential confounders. It is often possible to track day-by-day changes in these variables. However, while this information can be critical for study success, its volume can pose challenges for statistical analysis.


A propensity score (PS) is analogous to the disease risk score in that it combines a large number of possible confounders into a single variable (the score). The exposure propensity score (EPS) is the conditional probability of exposure to a treatment given observed covariates. In a cohort study, matching or stratifying treated and comparison subjects on EPS tends to balance all of the observed covariates. However, unlike random assignment of treatments, the propensity score may not balance unobserved covariates. Invited Commentary: Propensity Scores (Am J Epidemiol 1999;150:327–33) reviews the uses and limitations of propensity scores and provides a brief outline of the associated statistical theory. The authors present results of adjustment by matching or stratification on the propensity score.


High-dimensional Propensity Score Adjustment in Studies of Treatment Effects Using Healthcare Claims Data (Epidemiology 2009;20(4):512-22) discusses the high-dimensional propensity score (hd-PS) approach. This approach attempts to identify empirically large numbers of potential confounders in healthcare databases and, by doing so, to extract more information on confounders and proxies. Covariate selection in high-dimensional propensity score analyses of treatment effects in small samples (Am J Epidemiol 2011;173:1404-13) evaluates the relative performance of hd-PS in smaller samples. Confounding adjustment via a semi-automated high-dimensional propensity score algorithm: an application to electronic medical records (Pharmacoepidemiol Drug Saf 2012;20:849-57) evaluates the use of hd-PS in a primary care electronic medical record database. In addition, the article Using high-dimensional propensity score to automate confounding control in a distributed medical product safety surveillance system (Pharmacoepidemiol Drug Saf 2012;21(S1):41-9) summarises the application of this method for automating confounding control in sequential cohort studies as applied to safety monitoring systems using healthcare databases, and also discusses the strengths and limitations of hd-PS.
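One step usually described for the hd-PS algorithm is ranking candidate covariates by their potential to confound, using the Bross formula for the multiplicative bias introduced by leaving a binary covariate unadjusted: (P_C1 × (RR_CD - 1) + 1) / (P_C0 × (RR_CD - 1) + 1), where P_C1 and P_C0 are the covariate prevalences among the exposed and unexposed, and RR_CD is the covariate-outcome relative risk. The sketch below implements only this ranking step, under our reading of the published description; the covariate names and numbers are hypothetical, and the other steps (data dimensions, recurrence assessment, the covariate selection cut-off and PS estimation) are omitted.

```python
import math


def bross_bias(p_c1, p_c0, rr_cd):
    """Multiplicative confounding bias from one unadjusted binary covariate (Bross)."""
    return (p_c1 * (rr_cd - 1) + 1) / (p_c0 * (rr_cd - 1) + 1)


def rank_covariates(candidates):
    """Order candidate covariates by decreasing |log(bias)|.

    candidates: dict name -> (prevalence_exposed, prevalence_unexposed, rr_covariate_outcome).
    """
    return sorted(
        candidates,
        key=lambda name: abs(math.log(bross_bias(*candidates[name]))),
        reverse=True,
    )


candidates = {  # hypothetical codes derived from claims data
    "dx_heart_failure": (0.20, 0.10, 3.0),
    "rx_statin": (0.40, 0.38, 1.5),
    "proc_endoscopy": (0.05, 0.15, 2.0),
}
print(rank_covariates(candidates))
```

A covariate that is equally prevalent in both exposure groups has a bias term of exactly 1, however strongly it predicts the outcome, and therefore ranks last.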


Most cohort studies match patients 1:1 on the propensity score. Increasing the matching ratio may increase precision but also bias. One-to-many propensity score matching in cohort studies (Pharmacoepidemiol Drug Saf. 2012;21(S2):69-80) tests several methods for 1:n propensity score matching in simulation and empirical studies and recommends using a variable ratio that increases precision at a small cost of bias. Matching by propensity score in cohort studies with three treatment groups (Epidemiology 2013;24(3):401-9) develops and tests a 1:1:1 propensity score matching approach offering a way to compare three treatment options.
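The mechanics of greedy nearest-neighbour matching on the propensity score can be sketched as follows. This is a generic illustration, not the specific variable-ratio algorithm evaluated in the cited paper. Matching proceeds in rounds so that every treated subject receives a first match before any receives a second. Controls are used at most once, and a treated subject keeps fewer than `ratio` controls if none remain within the caliper. The caliper value and the scores are hypothetical.

```python
def greedy_match(treated, controls, ratio=1, caliper=0.05):
    """Greedy nearest-neighbour matching on the propensity score, without replacement.

    treated, controls: dicts mapping subject id -> propensity score.
    Returns a dict mapping each treated id to its list of matched control ids.
    """
    available = dict(controls)
    matches = {t: [] for t in treated}
    # Round k gives each treated subject its k-th control, so nobody gets a second
    # control before everyone has had the chance of a first one.
    for _ in range(ratio):
        # Highest scores first: they often have the fewest comparable controls.
        for t in sorted(treated, key=treated.get, reverse=True):
            if not available:
                return matches
            best = min(available, key=lambda c: abs(available[c] - treated[t]))
            if abs(available[best] - treated[t]) <= caliper:
                matches[t].append(best)
                del available[best]
    return matches


# Hypothetical propensity scores.
treated = {"t1": 0.80, "t2": 0.55, "t3": 0.30}
controls = {"c1": 0.78, "c2": 0.52, "c3": 0.57, "c4": 0.31, "c5": 0.10, "c6": 0.27}
matches = greedy_match(treated, controls, ratio=2, caliper=0.05)
print(matches)
```

In this example t1 finds only one control within the caliper, so the final matched set has a variable ratio; analyses of such sets usually weight each control by the inverse of its matched-set size.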

The use of several measures of balance for developing an optimal propensity score model is described in Measuring balance and model selection in propensity score methods (Pharmacoepidemiol Drug Saf 2011;20:1115-29) and further evaluated in Propensity score balance measures in pharmacoepidemiology: a simulation study (Pharmacoepidemiol Drug Saf 2014; Epub 2014 Jan 29). In most situations, the standardised difference performs best and is easy to calculate (see Balance measures for propensity score methods: a clinical example on beta-agonist use and the risk of myocardial infarction (Pharmacoepidemiol Drug Saf 2011;20(11):1130-7) and Reporting of covariate selection and balance assessment in propensity score analysis is suboptimal: a systematic review (J Clin Epidemiol. 2015;68(2):112-21)). Metrics for covariate balance in cohort studies of causal effects (Stat Med 2013;33:1685-99) shows in a simulation study that the c-statistic of the PS model after matching and the general weighted difference perform as well as the standardised difference and are preferred when an overall summary measure of balance is required.
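The standardised difference divides the difference in covariate means (or prevalences) between treatment groups by a pooled standard deviation, so, unlike a significance test, it does not depend on sample size. A minimal sketch with hypothetical data is shown below; the 0.1 threshold in the comment is a commonly used convention rather than a rule taken from the cited papers.

```python
import math
import statistics


def std_diff_continuous(treated, comparator):
    """Standardised difference of a continuous covariate (pooled sample SD)."""
    diff = statistics.mean(treated) - statistics.mean(comparator)
    pooled_sd = math.sqrt((statistics.variance(treated) + statistics.variance(comparator)) / 2)
    return diff / pooled_sd


def std_diff_binary(p_treated, p_comparator):
    """Standardised difference of a binary covariate given its two prevalences."""
    pooled_sd = math.sqrt((p_treated * (1 - p_treated) + p_comparator * (1 - p_comparator)) / 2)
    return (p_treated - p_comparator) / pooled_sd


# Hypothetical matched cohort: age in years and prevalence of diabetes.
age_treated = [64, 71, 58, 69, 75, 62]
age_comparator = [63, 70, 60, 66, 74, 61]
d_age = std_diff_continuous(age_treated, age_comparator)
d_diabetes = std_diff_binary(0.22, 0.20)
for name, d in [("age", d_age), ("diabetes", d_diabetes)]:
    # |d| < 0.1 is a commonly used, but arbitrary, threshold for adequate balance.
    print(name, round(d, 3), "balanced" if abs(d) < 0.1 else "imbalanced")
```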


Performance of propensity score calibration – a simulation study (Am J Epidemiol 2007;165(10):1110-8) introduces ‘propensity score calibration’ (PSC). This technique combines propensity score matching methods with measurement error regression models to address confounding by variables unobserved in the main study. This is done by using additional covariate measurements observed in a validation study, which is often a subset of the main study.


Treatment effects in the presence of unmeasured confounding: dealing with observations in the tails of the propensity score distribution--a simulation study (Am J Epidemiol 2010;172(7):843–54) demonstrates how ‘trimming’ the propensity score distribution eliminates subjects who are treated contrary to prediction, together with their exposed/unexposed counterparts, thereby reducing bias from unmeasured confounders.
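The trimming idea can be sketched as follows, assuming an asymmetric rule in which subjects are excluded if their score is below the q-th percentile of the propensity score among the treated, or above the (100 - q)-th percentile among the untreated. The percentile q, the record layout and the example scores are hypothetical choices for illustration, and results may be sensitive to the cut-off chosen.

```python
def percentile(values, q):
    """Percentile with linear interpolation between order statistics (0 <= q <= 100)."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def trim_by_ps(subjects, q=5):
    """Keep subjects whose score lies between the q-th percentile among the treated
    and the (100 - q)-th percentile among the untreated.

    subjects: list of (id, treated_flag, propensity_score).
    """
    lower = percentile([ps for _, t, ps in subjects if t], q)
    upper = percentile([ps for _, t, ps in subjects if not t], 100 - q)
    return [s for s in subjects if lower <= s[2] <= upper]


# Hypothetical scores; t1 (treated despite a low score) and u10 (untreated despite
# a high score) are the kind of subjects that trimming removes.
treated_ps = [0.05, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.97, 0.99]
untreated_ps = [0.01, 0.05, 0.1, 0.2, 0.3, 0.35, 0.4, 0.5, 0.6, 0.98]
subjects = [(f"t{i}", True, ps) for i, ps in enumerate(treated_ps, start=1)]
subjects += [(f"u{i}", False, ps) for i, ps in enumerate(untreated_ps, start=1)]
kept = trim_by_ps(subjects, q=10)
print(sorted(s[0] for s in kept))
```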


Although in most situations propensity score models, with the exception of hd-PS, do not have any advantage over conventional multivariable modelling in terms of adjustment for identified confounders, they may offer several other benefits. Propensity score methods may help to gain insight into determinants of treatment, including age, frailty and comorbidity, and to identify individuals treated against expectation. A statistical advantage of PS analyses is that, if exposure is not infrequent, it is possible to adjust for a large number of covariates even if outcomes are rare, a situation often encountered in drug safety research. Furthermore, assessment of the PS distribution may reveal non-positivity. An important limitation of PS is that it is not directly amenable to case-control studies.


5.3.5. Instrumental variables


Instrumental variable (IV) methods were invented over 70 years ago but have only recently been used by epidemiologists. Over the past decade or so, non-parametric versions of IV methods have appeared that connect IV methods to causal and measurement-error models important in epidemiological applications. An introduction to instrumental variables for epidemiologists (Int J Epidemiol 2000;29:722-9) presents those developments, illustrated by an application of IV methods to non-parametric adjustment for non-compliance in randomised trials. The author mentions a number of caveats but concludes that IV corrections can be valuable in many situations. Where IV assumptions are questionable, the corrections can still serve as part of a sensitivity analysis or external adjustment. Where the assumptions are more defensible, as in field trials and in studies that obtain validation or reliability data, IV methods can form an integral part of the analysis. A review of IV analyses in observational comparative effectiveness studies suggested that, in the large majority of studies in which IV analysis was applied, at least one of the IV assumptions could be violated (Potential bias of instrumental variable analyses for observational comparative effectiveness research, Ann Intern Med. 2014;161(2):131-8).
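For a binary instrument, the simplest IV estimator is the Wald (ratio) estimator: the difference in mean outcome between instrument levels divided by the difference in exposure prevalence between those levels. The sketch below assumes a binary exposure and outcome, so the estimate is on the risk-difference scale; the threshold used to flag a weak first stage and the example data are hypothetical.

```python
import statistics


def wald_iv_estimate(z, x, y, min_first_stage=0.05):
    """Wald (ratio) IV estimate with binary instrument z, exposure x and outcome y.

    min_first_stage is an illustrative cut-off for flagging a weak instrument.
    """
    def mean_at(values, level):
        return statistics.fmean([v for v, zi in zip(values, z) if zi == level])

    # First stage: how much the instrument shifts exposure prevalence.
    first_stage = mean_at(x, 1) - mean_at(x, 0)
    if abs(first_stage) < min_first_stage:
        raise ValueError(f"weak instrument: first-stage difference {first_stage:.3f}")
    # Reduced form: how much the instrument shifts the outcome risk.
    reduced_form = mean_at(y, 1) - mean_at(y, 0)
    return reduced_form / first_stage


# Hypothetical data: 100 subjects at each instrument level.
z = [1] * 100 + [0] * 100
x = [1] * 80 + [0] * 20 + [1] * 30 + [0] * 70
y = [1] * 20 + [0] * 80 + [1] * 15 + [0] * 85
print(round(wald_iv_estimate(z, x, y), 3))  # risk difference attributed to exposure
```

Under the monotonicity condition the ratio is interpreted as a local average treatment effect among those whose exposure is shifted by the instrument; under the homogeneity condition it estimates an average treatment effect. A small first-stage difference inflates both the estimate's variance and its sensitivity to assumption violations.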


Recommendations for reporting instrumental variable analyses are proposed in Commentary: how to report instrumental variable analyses (suggestions welcome) (Epidemiology. 2013;24(3):370-4). In particular, the type of treatment effect (average treatment effect/homogeneity condition or local average treatment effect/monotonicity condition) and the testing of critical assumptions for valid IV analyses should be reported. In support of these recommendations, the standardised difference has been proposed to falsify the assumption that confounders are not related to the instrumental variable (Quantitative falsification of instrumental variables assumption using balance measures, Epidemiology. 2014;25(5):770-2).


The complexity of the issues associated with confounding by indication, channelling and selective prescribing is explored in Evaluating short-term drug effects using a physician-specific prescribing preference as an instrumental variable (Epidemiology 2006;17(3):268-75). Using a conventional, adjusted multivariable analysis, the authors showed a higher risk of gastrointestinal toxicity for selective COX-2 inhibitors than for traditional NSAIDs, which is at odds with results from clinical trials. However, a physician-level instrumental variable approach (a time-varying estimate of a physician’s relative preference for a given drug, where at least two therapeutic alternatives exist) yielded evidence of a protective effect of COX-2 inhibitor exposure, particularly for shorter-term drug exposures. Despite the potential benefits of physician-level IVs, their performance can vary across databases and depends strongly on the definition of the IV used, as discussed in Evaluating different physician's prescribing preference based instrumental variables in two primary care databases: a study of inhaled long-acting beta2-agonist use and the risk of myocardial infarction (Pharmacoepidemiol Drug Saf. 2016;25 Suppl 1:132-41).
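One simple way of operationalising a physician's prescribing preference is to use the drug that the same physician prescribed to their previous patient as a binary instrument for the current patient. The sketch below follows that idea; the record layout and drug labels are hypothetical, and the definitions used in the cited studies may differ, which, as noted above, can change how well the instrument performs.

```python
def preference_instrument(prescriptions, preferred="cox2"):
    """Binary preference instrument from each physician's previous prescription.

    prescriptions: iterable of (physician_id, iso_date, patient_id, drug).
    Returns {patient_id: 1 if the physician's previous prescription was
    `preferred`, else 0}; each physician's first patient gets no value.
    """
    last_drug = {}
    instrument = {}
    # ISO dates sort chronologically as strings.
    for physician, _, patient, drug in sorted(prescriptions, key=lambda r: (r[0], r[1])):
        if physician in last_drug:
            instrument[patient] = int(last_drug[physician] == preferred)
        last_drug[physician] = drug
    return instrument


rx = [  # hypothetical new-user prescriptions
    ("dr_a", "2004-01-05", "p1", "nsaid"),
    ("dr_a", "2004-02-10", "p2", "cox2"),
    ("dr_b", "2004-01-20", "p4", "cox2"),
    ("dr_a", "2004-03-01", "p3", "cox2"),
    ("dr_b", "2004-02-02", "p5", "nsaid"),
]
print(preference_instrument(rx))
```

The resulting instrument values can then be used in an IV estimator alongside each patient's actual treatment and outcome.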


Instrumental variable methods in comparative safety and effectiveness research (Pharmacoepidemiol Drug Saf 2010;19:537–54) provides practical guidance on IV analyses in pharmacoepidemiology. Instrumental variable methods for causal inference (Stat Med. 2014;33(13):2297-340) is a tutorial that includes statistical code for performing IV analyses.


An important limitation of IV analysis is that weak instruments (a small association between the IV and exposure) lead to decreased statistical efficiency and biased IV estimates, as detailed in Instrumental variables: application and limitations (Epidemiology 2006;17:260-7). For example, in the above-mentioned study on non-selective NSAIDs and COX-2 inhibitors, the confidence intervals for IV estimates were of the order of five times wider than with conventional analysis. Performance of instrumental variable methods in cohort and nested case-control studies: a simulation study (Pharmacoepidemiol Drug Saf 2014;23(2):165-77) demonstrated that a stronger IV-exposure association is needed in nested case-control studies than in cohort studies to achieve the same bias reduction. Increasing the number of controls reduces the bias of IV analyses with relatively weak instruments.


Selecting on treatment: a pervasive form of bias in instrumental variable analyses (Am J Epidemiol. 2015;181(3):191-7) warns against the bias introduced in IV analyses by including only a subset of the possible treatment options.


5.3.6. Prior event rate ratios


Another method proposed to control for unmeasured confounding is the Prior Event Rate Ratio (PERR) adjustment method, in which the effect of exposure is estimated using the ratio of rate ratios (RRs) from the periods before and after initiation of a drug exposure, as discussed in Replicated studies of two randomized trials of angiotensin converting enzyme inhibitors: further empiric validation of the ‘prior event rate ratio’ to adjust for unmeasured confounding by indication (Pharmacoepidemiol Drug Saf. 2008;17:671-685). For example, when a new drug is launched, direct estimation of the drug's effect in the period after launch is potentially confounded. Differences in event rates between future users and future non-users in the period before launch may provide a measure of the amount of confounding present. By dividing the effect estimate from the period after launch by the estimate obtained in the period before launch, the confounding in the second period can be adjusted for. This method requires that confounding effects are constant over time, that there is no confounder-by-treatment interaction, and that outcomes are non-lethal events.
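The PERR calculation itself is simple arithmetic once events and person-time are available for both periods. The sketch below assumes incidence rate ratios and uses hypothetical counts; confidence intervals, which require additional methods such as bootstrapping, are not shown.

```python
def rate_ratio(events_a, pt_a, events_b, pt_b):
    """Incidence rate ratio of group a versus group b."""
    return (events_a / pt_a) / (events_b / pt_b)


def perr(prior, post):
    """Prior event rate ratio adjustment: RR after initiation divided by RR before.

    prior, post: (events_users, person_years_users, events_nonusers, person_years_nonusers);
    in the prior period 'users' are the future users of the drug.
    """
    return rate_ratio(*post) / rate_ratio(*prior)


# Hypothetical counts: future users already had 1.5 times the event rate of
# non-users before starting the drug, and 2.25 times the rate afterwards.
prior = (30, 1000.0, 40, 2000.0)
post = (45, 1000.0, 40, 2000.0)
estimate = perr(prior, post)
print(round(estimate, 2))
```

The adjusted estimate attributes the pre-existing 1.5-fold difference to confounding, which is only valid under the assumptions listed above.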


Performance of prior event rate ratio adjustment method in pharmacoepidemiology: a simulation study (Pharmacoepidemiol Drug Saf. 2015;24:468-477) shows that the PERR adjustment method can help to reduce bias resulting from unmeasured confounding in certain situations, but that theoretical justification of its assumptions should be provided.


5.3.7. Handling time-dependent confounding in the analysis


Methods for dealing with time-dependent confounding (Stat Med. 2013;32(9):1584-618) provides an overview of how time-dependent confounding can be handled in the analysis of a study. It provides an in-depth discussion of marginal structural models and g-computation.


G-estimation

G-estimation is a method for estimating the joint effects of time-varying treatments using ideas from instrumental variables methods. G-estimation of Causal Effects: Isolated Systolic Hypertension and Cardiovascular Death in the Framingham Heart Study (Am J Epidemiol 1998;148(4):390-401) demonstrates how the G-estimation procedure allows for appropriate adjustment of the effect of a time-varying exposure in the presence of time-dependent confounders that are themselves influenced by the exposure.


Marginal Structural Models (MSM)


The use of Marginal Structural Models can be an alternative to G-estimation. Marginal Structural Models and Causal Inference in Epidemiology (Epidemiology 2000;11:550-60) introduces MSM, a class of causal models that allow for improved adjustment for confounding in situations of time-dependent confounding.
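To show how inverse probability weighting for an MSM handles a time-dependent confounder that is affected by earlier treatment, the sketch below uses a small hypothetical dataset with two treatment times (A0, A1), a confounder L1 measured between them, and a continuous outcome Y. It computes the mean outcome under each treatment regime in three ways: the unweighted (naive) regime-specific mean, the g-formula (g-computation), and the inverse-probability-weighted mean, which is the estimate from a saturated MSM. The counts are chosen so that, under the usual identifiability assumptions, treatment has no effect on Y.

```python
# Hypothetical aggregated data: (count, A0, L1, A1, mean outcome Y).
# A0 is assigned at random, L1 depends on A0, and A1 depends on L1.
ROWS = [
    (2400, 0, 0, 0, 84), (1600, 0, 0, 1, 84),
    (2400, 0, 1, 0, 52), (9600, 0, 1, 1, 52),
    (4800, 1, 0, 0, 76), (3200, 1, 0, 1, 76),
    (1600, 1, 1, 0, 44), (6400, 1, 1, 1, 44),
]


def naive_mean(rows, a0, a1):
    """Unweighted mean outcome among those who actually followed regime (a0, a1)."""
    sel = [(n, y) for n, x0, _, x1, y in rows if (x0, x1) == (a0, a1)]
    return sum(n * y for n, y in sel) / sum(n for n, _ in sel)


def g_formula(rows, a0, a1):
    """Standardise the outcome means over the distribution of L1 given A0 = a0."""
    n_a0 = sum(n for n, x0, _, _, _ in rows if x0 == a0)
    total = 0.0
    for level in (0, 1):
        n_level = sum(n for n, x0, l1, _, _ in rows if x0 == a0 and l1 == level)
        sel = [(n, y) for n, x0, l1, x1, y in rows if (x0, l1, x1) == (a0, level, a1)]
        mean_y = sum(n * y for n, y in sel) / sum(n for n, _ in sel)
        total += mean_y * n_level / n_a0
    return total


def iptw_mean(rows, a0, a1):
    """Weighted mean under regime (a0, a1) with weights 1 / [P(A0) P(A1 | A0, L1)]."""
    n_total = sum(r[0] for r in rows)
    num = den = 0.0
    for n, x0, l1, x1, y in rows:
        if (x0, x1) != (a0, a1):
            continue
        p_a0 = sum(r[0] for r in rows if r[1] == x0) / n_total
        n_hist = sum(r[0] for r in rows if r[1] == x0 and r[2] == l1)
        p_a1 = sum(r[0] for r in rows if r[1] == x0 and r[2] == l1 and r[3] == x1) / n_hist
        w = 1.0 / (p_a0 * p_a1)
        num += n * w * y
        den += n * w
    return num / den


for a0 in (0, 1):
    for a1 in (0, 1):
        print((a0, a1), round(naive_mean(ROWS, a0, a1), 2),
              round(g_formula(ROWS, a0, a1), 2), round(iptw_mean(ROWS, a0, a1), 2))
```

The naive means differ across regimes because L1 both predicts the outcome and is affected by A0, while the g-formula and the weighted MSM estimates agree. Conventional regression that conditions on L1 would not recover this result, which is the situation the cited papers address.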


MSMs have two major advantages over G-estimation. First, although G-estimation is useful for survival time outcomes, continuous outcomes and Poisson count outcomes, logistic G-estimation cannot conveniently be used to estimate the effect of treatment on dichotomous outcomes unless the outcome is rare. Second, MSMs resemble standard models, whereas G-estimation does not (see Marginal Structural Models to Estimate the Causal Effect of Zidovudine on the Survival of HIV-Positive Men. Epidemiology 2000;11:561–70).


Effect of highly active antiretroviral therapy on time to acquired immunodeficiency syndrome or death using marginal structural models (Am J Epidemiol 2003;158:687-94) provides a clear example in which standard Cox analysis failed to detect a clinically meaningful net benefit of treatment because it does not appropriately adjust for time-dependent covariates that are simultaneously confounders and intermediate variables. This net benefit was shown using a marginal structural survival model. In Time-dependent propensity score and collider-stratification bias: an example of beta(2)-agonist use and the risk of coronary heart disease (Eur J Epidemiol 2013;28(4):291-9), various methods to control for time-dependent confounding are compared in an empirical study on the association between inhaled beta-2-agonists and the risk of coronary heart disease. MSMs resulted in slightly reduced associations compared to standard Cox-regression.

Beyond the approaches proposed above, traditional and efficient approaches to dealing with time-dependent variables should be considered in the design of the study, such as nested case-control studies with assessment of time-varying exposure windows.


5.4. Effect measure modification and interaction


Effect measure modification and interaction are often encountered in epidemiological research and it is important to recognize their occurrence. The difference between them is rather subtle and has been described in On the distinction between interaction and effect modification (Epidemiology. 2009;20:863–71). Effect measure modification occurs when the measure of an effect changes over values of some other variable (which does not necessarily need to be a causal factor). Interaction occurs when two exposures contribute to the causal effect of interest, and they are both causal factors. Interaction is generally studied in order to clarify aetiology while effect modification is used to identify populations that are particularly susceptible to the exposure of interest.


To check for the presence of an effect measure modifier, one can stratify the study population by a certain variable, e.g. by gender, and compare the effects in these subgroups. It is recommended to perform a formal statistical test to assess whether the effects differ significantly between subgroups; see CONSORT 2010 Explanation and Elaboration: Updated guidelines for reporting parallel group randomised trials (J Clin Epidemiol 2010;63(8):e1-37) and Interaction revisited: the difference between two estimates (BMJ 2003;326:219). The study report should explain which method was used to examine these differences and specify which subgroup analyses were predefined in the study protocol and which were performed while analysing the data (Strengthening the Reporting of Observational Studies in Epidemiology (STROBE): explanation and elaboration. Epidemiology 2007;18:805-35).
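The test described in Interaction revisited compares two subgroup estimates on the log scale, recovering each standard error from its confidence interval. A minimal sketch is shown below, assuming ratio measures from independent subgroups with confidence intervals that are symmetric on the log scale; the subgroup results are hypothetical.

```python
import math
from statistics import NormalDist


def interaction_test(rr1, ci1, rr2, ci2, level=0.95):
    """Compare two ratio estimates from independent subgroups on the log scale.

    ci1, ci2: (lower, upper) confidence limits at the given level.
    Returns the ratio rr1 / rr2, its confidence interval and a two-sided p-value.
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    # Standard errors of the log estimates, recovered from the CI widths.
    se1 = (math.log(ci1[1]) - math.log(ci1[0])) / (2 * z_crit)
    se2 = (math.log(ci2[1]) - math.log(ci2[0])) / (2 * z_crit)
    diff = math.log(rr1) - math.log(rr2)
    se = math.sqrt(se1 ** 2 + se2 ** 2)
    p = 2 * (1 - NormalDist().cdf(abs(diff) / se))
    ci = (math.exp(diff - z_crit * se), math.exp(diff + z_crit * se))
    return math.exp(diff), ci, p


# Hypothetical subgroup results, e.g. women versus men.
ratio, ci, p = interaction_test(3.0, (2.5, 3.6), 1.2, (1.0, 1.44))
print(f"ratio of RRs {ratio:.2f} (95% CI {ci[0]:.2f}-{ci[1]:.2f}), p = {p:.2g}")
```

Reporting the ratio of the two estimates with its confidence interval, rather than only the p-value, shows the size of the difference between subgroups.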


The presence of effect measure modification depends on which measure is used in the study (absolute or relative), and it can be assessed on two scales: an additive scale (based on risk differences [RD]) or a multiplicative scale (based on relative risks [RR]). From the perspective of public health and clinical decision making, the additive scale is usually considered the most appropriate. An example of a potential effect modifier in studies assessing the risk of events associated with recent drug use is past use of the same drug. This is shown in Evidence of the depletion of susceptibles effect in non-experimental pharmacoepidemiologic research (J Clin Epidemiol 1994;47(7):731-7) in the context of a hospital-based case-control study on NSAIDs and the risk of upper gastrointestinal bleeding.


For the evaluation of interaction, the standard measure is the relative excess risk due to interaction (RERI), as explained in the textbook Modern Epidemiology (K. Rothman, S. Greenland, T. Lash. 3rd Edition, Lippincott Williams & Wilkins, 2008). Other measures of interaction include the attributable proportion (A) and the synergy index (S). With sufficient sample size, most interaction tests perform similarly with regard to type 1 error rates and power according to Exploring interaction effects in small samples increases rates of false-positive and false-negative findings: results from a systematic review and simulation study (J Clin Epidemiol 2014; 67(7):821-9).
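With relative risks estimated against a single reference category (both factors absent), RERI, the attributable proportion due to interaction and the synergy index follow from simple formulas, and the multiplicative-scale measure is the ratio of the joint relative risk to the product of the separate ones. The sketch below computes point estimates only, from hypothetical relative risks; confidence intervals would require, for example, the delta method or bootstrapping.

```python
def additive_interaction(rr11, rr10, rr01):
    """RERI, attributable proportion (AP) and synergy index (S).

    rr11: both factors present; rr10, rr01: one factor present; reference rr00 = 1.
    No additive interaction corresponds to RERI = 0, AP = 0 and S = 1.
    """
    reri = rr11 - rr10 - rr01 + 1
    ap = reri / rr11
    s = (rr11 - 1) / ((rr10 - 1) + (rr01 - 1))
    return reri, ap, s


def multiplicative_interaction(rr11, rr10, rr01):
    """Ratio of the joint RR to the product of the separate RRs (1 = no interaction)."""
    return rr11 / (rr10 * rr01)


# Hypothetical RRs: drug only 2.0, effect modifier only 3.0, both 8.0.
reri, ap, s = additive_interaction(8.0, 2.0, 3.0)
print(reri, ap, round(s, 3), round(multiplicative_interaction(8.0, 2.0, 3.0), 3))
```

In this example there is positive interaction on both scales, but the two scales can disagree in direction, which is one reason for reporting both.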


Because of the confusion surrounding these terms, it is important that effect measure modification and interaction analyses are presented in a way that is easy to interpret and allows readers to reproduce the analysis. For recommendations regarding reporting, Strengthening the Reporting of Observational Studies in Epidemiology (STROBE): explanation and elaboration (Epidemiology 2007;18:805-35) and Recommendations for presenting analyses of effect modification and interaction (Int J Epidemiol 2012;41:514-20) are useful resources; their recommendations for the presentation of results are summarised below:


  1. Separate effects (rate ratios, odds ratios or risk differences, with confidence intervals) of the exposure of interest (e.g. drug), of the effect modifier (e.g. gender) and of their joint presence, all estimated against one single reference category (preferably the stratum with the lowest risk of the outcome), as suggested in Estimating measures of interaction on an additive scale for preventive exposures (Eur J Epidemiol 2011;26(6):433-8), since this gives readers enough information to calculate effect modification on both the additive and multiplicative scales;


  2. Effects of the exposure within strata of the potential effect modifier;


  3. Measures of effect modification on both additive (e.g. RERI) and multiplicative (e.g. ratio of relative risks) scales, including confidence intervals;


  4. List of the confounders for which the association between exposure and outcome was adjusted.


5.5. Ecological analyses and case-population studies


Ecological analyses should not be considered hypothesis-testing studies. As illustrated in Control without separate controls: evaluation of vaccine safety using case-only methods (Vaccine 2004; 22(15-16):2064-70), they assume that a strong correlation between the trend in an indicator of an exposure (vaccine coverage in this example) and the trend in incidence of a disease (trends calculated over time or across geographical regions) is consistent with a causal relationship. Such comparisons at the population level may only generate hypotheses, as they do not allow control for time-related confounding variables such as age and seasonal factors. Moreover, they do not establish that the vaccine effect occurred in the vaccinated individuals.


Case-population studies are a form of ecological study in which cases are compared with an aggregated comparator consisting of population data. The case-population study design: an analysis of its application in pharmacovigilance (Drug Saf 2011;34(10):861-8) explains its design and its application in pharmacovigilance for signal generation and drug surveillance. More recently, the design was explained in the textbook ‘Drug Utilization Research - Methods and Applications’ (Wettermark B, Di Martino M, Elsevier M. Study designs in drug utilization research). An example is a multinational case-population study aiming to estimate population rates of a suspected adverse event using national sales data (Transplantation for Acute Liver Failure in Patients Exposed to NSAIDs or Paracetamol (Acetaminophen) Drug Saf 2013;36:135–44). Based on the same study, Choice of the denominator in case population studies: event rates for registration for liver transplantation after exposure to NSAIDs in the SALT study in France (Pharmacoepidemiol Drug Saf 2013;22(2):160-7) compares sales data and healthcare insurance data as denominators to estimate population exposure and finds large differences in the event rates. Choosing the wrong denominator in case-population studies might, therefore, give erroneous results. The choice of the right denominator will depend on the hazard function of the adverse event.
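As a simplified illustration of how sales data can serve as the denominator, the sketch below converts units sold, expressed as defined daily doses (DDDs), into treatment-years under the assumption that one DDD corresponds to one day of treatment, and then computes event rates and their ratio. This assumption, the function names and the figures are hypothetical; the approximate confidence interval ignores uncertainty in the denominator, and, as noted above, the choice of denominator itself can change the results substantially.

```python
import math

DAYS_PER_YEAR = 365.25  # assumption used to convert treatment-days to years


def treatment_years(total_ddds):
    """Convert DDDs sold into treatment-years, assuming 1 DDD = 1 treatment-day."""
    return total_ddds / DAYS_PER_YEAR


def rate_per_million(cases, years):
    """Events per million treatment-years."""
    return cases / years * 1e6


def rate_ratio_ci(cases_a, years_a, cases_b, years_b, z=1.96):
    """Rate ratio with an approximate CI treating case counts as Poisson (log scale)."""
    rr = (cases_a / years_a) / (cases_b / years_b)
    se = math.sqrt(1 / cases_a + 1 / cases_b)
    return rr, (rr * math.exp(-z * se), rr * math.exp(z * se))


# Hypothetical figures: 40 cases with drug A (200 million DDDs sold) and
# 10 cases with drug B (150 million DDDs sold).
years_a, years_b = treatment_years(200e6), treatment_years(150e6)
rate_a = rate_per_million(40, years_a)
rate_b = rate_per_million(10, years_b)
rr, ci = rate_ratio_ci(40, years_a, 10, years_b)
print(f"{rate_a:.1f} vs {rate_b:.1f} per million treatment-years; RR {rr:.2f} ({ci[0]:.2f}-{ci[1]:.2f})")
```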


A pragmatic attitude towards case-population studies is recommended: in situations where nationwide or region-wide electronic healthcare records (EHRs) are available that allow the outcomes and confounders to be assessed with sufficient validity, a case-population approach is neither necessary nor desirable, as one can perform a population-based cohort or case-control study with adequate control for confounding. In situations where outcomes are difficult to ascertain in EHRs or where such databases do not exist, the case-population design might give an approximation of the absolute and relative risk when both events and exposures are rare. This is limited by the ecological nature of the reference data, which restricts the ability to control for confounding to a few basic variables, such as sex and age, and precludes exhaustive control for confounding.


5.6. Pragmatic trials and large simple trials


5.6.1. Pragmatic trials


RCTs are considered the gold standard for demonstrating the efficacy of medicinal products and for obtaining an initial estimate of the risk of adverse outcomes. However, these data are often not indicative of the benefits, risks or comparative effectiveness of an intervention when used in clinical practice populations. The IMI GetReal Glossary defines a pragmatic clinical trial (PCT) as ‘a study comparing several health interventions among a randomised, diverse population representing clinical practice, and measuring a broad range of health outcomes.’ PCTs are focused on evaluating benefits and risks of treatments in patient populations and settings more representative of routine clinical practice. To ensure generalisability, pragmatic trials should represent the patients to whom the treatment will be applied: for instance, inclusion criteria would be broad (e.g. allowing co-morbidity, co-medication, a wider age range), study-specific follow-up procedures would be minimised, and treatment switching would be allowed. Monitoring safety in a phase III real-world effectiveness trial: use of novel methodology in the Salford Lung Study (Pharmacoepidemiol Drug Saf. 2017;26(3):344-352) describes the model of a phase III pragmatic RCT in which patients were enrolled through primary care practices using minimal exclusion criteria and without extensive diagnostic testing, and in which potential safety events were captured through patients’ electronic health records and in turn triggered review by the specialist safety team.


Pragmatic explanatory continuum summary (PRECIS): a tool to help trial designers (CMAJ 2009; 180: E45-57) is a tool to support pragmatic trial design that helps define and evaluate the degree of pragmatism. The PRECIS tool has been further refined and now comprises nine domains, each scored on a 5-point Likert scale ranging from very explanatory to very pragmatic, with an exclusive focus on the issue of applicability (The PRECIS-2 tool: designing trials that are fit for purpose. BMJ 2015; 350: h2147). A checklist and additional guidance are also provided in Improving the reporting of pragmatic trials: an extension of the CONSORT statement (BMJ 2008; 337 (a2390): 1-8).


5.6.2. Large simple trials


Large simple trials (LSTs) are pragmatic RCTs with minimal data collection protocols that are narrowly focused on clearly defined outcomes important to patients as well as clinicians. Their large sample size provides adequate statistical power to detect even small differences in effects. Additionally, LSTs include a follow-up time that mimics normal clinical practice.
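To give a sense of the sample sizes involved, the usual normal-approximation formula for comparing two proportions can be sketched as follows. The event risks, significance level and power are hypothetical inputs, and the formula ignores attrition, clustering and multiple outcomes.

```python
import math
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Subjects per arm to compare two risks with a two-sided test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    term = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
            + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return math.ceil(term ** 2 / (p1 - p2) ** 2)


# Hypothetical rare adverse event: 1.0% risk versus 1.5% risk.
print(n_per_group(0.010, 0.015))
# A smaller difference (1.0% versus 1.1%) requires a far larger trial.
print(n_per_group(0.010, 0.011))
```

Because the required size grows roughly with the inverse square of the risk difference, detecting small differences in rare outcomes quickly calls for tens or hundreds of thousands of patients, which is where minimal data collection becomes essential.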


LSTs are particularly suited when an adverse event is very rare or has a long latency (with a large expected attrition rate), when the population exposed to the risk is heterogeneous (e.g. different indications and age groups), when several risks need to be assessed in the same trial or when many confounding factors need to be balanced between treatment groups. In these circumstances, the cost and complexity of a traditional RCT may outweigh its advantages, and LSTs can help keep the volume and complexity of data collection to a minimum.


Outcomes that are simple and objective can also be measured from the routine process of care using epidemiological follow-up methods, for example by using questionnaires or hospital discharge records. LST methodology is discussed in Chapters 36 and 37 of the book Pharmacoepidemiology (Strom BL, Kimmel SE, Hennessy S. 5th Edition, Wiley, 2012), which includes a list of conditions appropriate for the conduct of an LST and a list of conditions that make an LST feasible. Examples of published LSTs are Assessment of the safety of paediatric ibuprofen: a practitioner-based randomised clinical trial (JAMA 1995;273:929-33) and Comparative mortality associated with ziprasidone and olanzapine in real-world use among 18,154 patients with schizophrenia: The Zodiac Observational Study of Cardiac Outcomes (ZODIAC) (Am J Psychiatry 2011;168(2):193-201).


Note that the use of the term ‘simple’ in the expression ‘LST’ refers to data structure and not data collection. It is used in relation to situations in which a small number of outcomes are measured. The term may therefore not adequately reflect the complexity of the studies undertaken.


5.6.3. Randomised database studies


Randomised database studies can be considered a special form of an LST where patients included in the trial are enrolled in a healthcare system with electronic records. Eligible patients may be identified and flagged automatically by the software, with the advantage of allowing comparison of included and non-included patients. Database screening or record linkage can be used to detect and measure outcomes of interest otherwise assessed through the normal process of care. Patient recruitment, informed consent and proper documentation of patient information are hurdles that still need to be addressed in accordance with the applicable legislation for RCTs. Randomised database studies attempt to combine the advantages of randomisation and observational database studies. These and other aspects of randomised database studies are discussed in The opportunities and challenges of pragmatic point-of-care randomised trials using routinely collected electronic records: evaluations of two exemplar trials (Health Technol Assess. 2014;18(43):1-146) which illustrates the practical implementation of randomised studies in general practice databases.


There are few published examples of randomised database studies, but this design could become more common in the near future with the increasing computerisation of medical records. Pragmatic randomised trials using routine electronic health records: putting them to the test (BMJ 2012;344:e55) describes a project to implement randomised trials in the everyday clinical work of general practitioners, comparing treatments that are already in common use, and using routinely collected electronic healthcare records both to identify participants and to gather results.


A particular form of randomised database study is the registry-based randomised trial, which uses an existing registry as a platform for the identification of cases, randomisation and follow-up. The editorial Randomized Registry Trial — The Next Disruptive Technology in Clinical Research? (N Engl J Med 2013;369:1579-81) introduces the concept. This hybrid design aims to achieve both internal and external validity by using a robust design (RCT) in a data source with higher generalisability (registries). Other examples are the TASTE trial, which followed patients long-term using data from a Scandinavian registry (Thrombus aspiration during ST-segment elevation myocardial infarction. N Engl J Med 2013;369(17):1587-97), and A registry-based randomized trial comparing radial and femoral approaches in women undergoing percutaneous coronary intervention: the SAFE-PCI for Women (Study of Access Site for Enhancement of PCI for Women) trial (JACC Cardiovasc Interv 2014 Aug). A potential limitation of randomised registry trials is that the registry may not routinely collect all data on the outcomes needed for the trial, such as information on surrogate markers and adverse events.


5.7. Systematic reviews and meta-analysis


There may be results from more than one study with the same or a similar research objective, and identifying and integrating this evidence can extend our understanding of the issue. The focus of this activity may be to learn from the diversity of designs, results and associated gaps in knowledge, as well as to obtain overall risk estimates. An example is the meta-analysis of individual studies with potentially different designs in Variability in risk of gastrointestinal complications with individual NSAIDs: results of a collaborative meta-analysis (BMJ 1996;312:1563-6), which compared the relative risks of serious gastrointestinal complications reported with individual NSAIDs by conducting a systematic review of twelve hospital- and community-based case-control and cohort studies, and found a relation between use of the drugs and admission to hospital for haemorrhage or perforation.


A systematic literature review aims to collect all empirical evidence that fits pre-specified eligibility criteria to answer a specific research question. These reviews use systematic and explicit methods to identify and critically appraise relevant research, and to analyse the data included in the review. A meta-analysis involves the use of statistical techniques to integrate and summarise the results of identified studies.
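One common statistical technique for integrating study results is fixed-effect inverse-variance pooling of log relative risks. The sketch below is illustrative only: the function name and study values are hypothetical, and the standard error of each log relative risk is back-calculated from its 95% confidence interval, assuming approximate normality on the log scale. Observational studies often show between-study heterogeneity, in which case a random-effects model is usually more appropriate than this fixed-effect version.

```python
import math

Z95 = 1.959964  # two-sided 95% standard normal quantile


def pool_fixed_effect(studies):
    """Pool (rr, lower, upper) tuples by inverse-variance weighting on the log scale.

    Each tuple is a study's relative risk with its 95% confidence interval.
    Returns the pooled relative risk and its 95% confidence interval.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for rr, lower, upper in studies:
        # Standard error recovered from the CI width on the log scale
        se = (math.log(upper) - math.log(lower)) / (2 * Z95)
        weight = 1.0 / se ** 2  # inverse-variance weight
        weighted_sum += weight * math.log(rr)
        total_weight += weight
    pooled_log = weighted_sum / total_weight
    se_pooled = math.sqrt(1.0 / total_weight)
    return (math.exp(pooled_log),
            math.exp(pooled_log - Z95 * se_pooled),
            math.exp(pooled_log + Z95 * se_pooled))


# Hypothetical study results: (RR, lower 95% limit, upper 95% limit)
studies = [(2.0, 1.2, 3.3), (3.1, 1.8, 5.3), (1.6, 0.9, 2.8)]
rr, lo, hi = pool_fixed_effect(studies)
print(f"Pooled RR {rr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Because each study is weighted by the inverse of its variance, larger and more precise studies dominate the pooled estimate.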


Systematic reviews and meta-analyses of observational studies and other epidemiological sources are becoming as common as those of RCTs. Challenges in systematic reviews that assess treatment harms (Ann Intern Med 2005;142:1090-9) explains the different reasons why both are important in providing relevant information and knowledge for pharmacovigilance.


Detailed guidance on the methodological conduct of systematic reviews and meta-analyses is provided in Annex 1 of this guide, which also includes links to other relevant resources.


It should be noted that meta-analysis, even of randomised controlled trials, shares characteristics with observational research: the studies are often produced according to an unplanned process, and subjective processes are involved in the selection of studies to include. Careful planning of the design of a meta-analysis and pre-specification of selection criteria, outcomes and analytical methods before review of any study results may thus add appreciably to the confidence placed in the results. A further useful reference is the CIOMS Working Group X Guideline on Evidence Synthesis and Meta-Analysis for Drug Safety (Geneva 2016).


5.8. Signal detection methodology and application


A general overview of methods for signal detection and recommendations for their application are provided in the report of the CIOMS Working Group VIII Practical Aspects of Signal Detection in Pharmacovigilance and empirical results on various aspects of signal detection obtained from the IMI PROTECT project have been summarised in Good Signal Detection Practices: Evidence from IMI PROTECT (Drug Saf. 2016;39:469-90).


Quantitative analysis of spontaneous adverse drug reaction reports is increasingly used in drug safety research. The role of data mining in pharmacovigilance (Expert Opin Drug Saf 2005;4(5):929-48) explains how signal detection algorithms work and addresses questions regarding their validation, comparative performance, limitations and potential for use and misuse in pharmacovigilance. Quantitative signal detection using spontaneous ADR reporting (Pharmacoepidemiol Drug Saf 2009;18:427-36) describes the core concepts behind the most common methods: the proportional reporting ratio (PRR), reporting odds ratio (ROR), information component (IC) and empirical Bayes geometric mean (EBGM). The authors also discuss the role of Bayesian shrinkage in screening spontaneous reports and the importance of changes over time in screening the properties of the measures. Additionally, they discuss major areas of controversy (such as stratification, and the evaluation and implementation of methods) and suggest where emerging research is likely to lead. Data mining for signals in spontaneous reporting databases: proceed with caution (Pharmacoepidemiol Drug Saf 2007;16:359-65) reviews data mining methodologies and their limitations and provides useful points to consider before incorporating data mining as a routine component of any pharmacovigilance programme. An empirical evaluation of several disproportionality methods in a number of different spontaneous reporting databases is given in Comparison of Statistical Detection Methods within and across Spontaneous Reporting Databases (Drug Saf 2015;38(6):577-87).
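The frequentist measures named above are computed from a 2×2 contingency table of report counts for one drug-event pair. The sketch below computes the PRR and ROR with log-normal 95% confidence intervals, and a simplified shrunk IC of the form log2((O + 0.5)/(E + 0.5)), where O is the observed and E the expected count under independence. The counts and function names are hypothetical; EBGM is omitted because it requires fitting a Bayesian prior (as in the Multi-item Gamma Poisson Shrinker) to the whole database, and the thresholds used to flag a signal vary between organisations and are not shown.

```python
import math
from dataclasses import dataclass

Z95 = 1.959964  # two-sided 95% standard normal quantile


@dataclass
class Table2x2:
    a: int  # reports with the drug of interest and the event of interest
    b: int  # reports with the drug of interest and any other event
    c: int  # reports with any other drug and the event of interest
    d: int  # reports with any other drug and any other event


def prr(t):
    """Proportional reporting ratio with a 95% CI (log-normal approximation)."""
    est = (t.a / (t.a + t.b)) / (t.c / (t.c + t.d))
    se = math.sqrt(1 / t.a - 1 / (t.a + t.b) + 1 / t.c - 1 / (t.c + t.d))
    return est, est * math.exp(-Z95 * se), est * math.exp(Z95 * se)


def ror(t):
    """Reporting odds ratio with a 95% CI (Woolf log-normal approximation)."""
    est = (t.a * t.d) / (t.b * t.c)
    se = math.sqrt(1 / t.a + 1 / t.b + 1 / t.c + 1 / t.d)
    return est, est * math.exp(-Z95 * se), est * math.exp(Z95 * se)


def information_component(t):
    """Simplified shrunk IC: log2((observed + 0.5) / (expected + 0.5))."""
    n = t.a + t.b + t.c + t.d
    expected = (t.a + t.b) * (t.a + t.c) / n  # expected count under independence
    return math.log2((t.a + 0.5) / (expected + 0.5))


# Hypothetical counts for one drug-event pair
table = Table2x2(a=20, b=980, c=200, d=98800)
print("PRR %.2f (%.2f-%.2f)" % prr(table))
print("ROR %.2f (%.2f-%.2f)" % ror(table))
print("IC  %.2f" % information_component(table))
```

Adding 0.5 to both O and E pulls the IC towards zero when counts are small, which is a simple form of the shrinkage discussed above.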


Methods such as multiple logistic regression (which may use propensity score adjustment) have the theoretical capability to reduce masking and confounding by co-medication and underlying disease.


Performance of Pharmacovigilance Signal Detection Algorithms for the FDA Adverse Event Reporting System (Clin Pharmacol Ther 2013;93(6):539-46) describes the performance of signal detection algorithms for spontaneous reports in the US FDA Adverse Event Reporting System against a benchmark constructed by the Observational Medical Outcomes Partnership (OMOP), and concludes that logistic regression performs better than traditional disproportionality analysis. Other studies have addressed similar or related questions, for example Large scale regression-based pattern discovery: the example of screening the WHO global drug safety database (Stat Anal Data Min 2010;3:197-208), Are all quantitative postmarketing signal detection methods equal? Performance characteristics of logistic regression and Multi-item Gamma Poisson Shrinker (Pharmacoepidemiol Drug Saf 2012;21:622-30) and Data-driven prediction of drug effects and interactions (Sci Transl Med 2012;4:125ra31). The letter Logistic regression in signal detection: Another Piece added to the Puzzle (Clin Pharmacol Ther 2013;94(3):312) highlights the variability of results obtained in different studies based on this method and the daunting computational task it requires. More work is needed on its value for pharmacovigilance in real-world settings.


A more recent proposal broadens the basis for computational screening of individual case safety reports by considering multiple aspects of the strength of evidence in a predictive model. This approach combines disproportionality analysis with features such as the number of well-documented reports, the number of recent reports and the geographical spread of the case series (Improved statistical signal detection in pharmacovigilance by combining multiple strength-of-evidence aspects in vigiRank. Drug Saf 2014;37(8):617-28). In a similar spirit, logistic regression has been proposed to combine a disproportionality measure with a measure of unexpectedness of the time-to-onset distribution (Use of logistic regression to combine two causality criteria for signal detection in vaccine spontaneous report data. Drug Saf 2014;37(12):1047-57).


Many statistical signal detection algorithms disregard the underlying diversity of the reporting population and give equal weight to reports on all patients when computing the expected number of reports for a drug-event pair. This may make them vulnerable to confounding and to distortions due to effect modification, and could result in true signals being masked or false associations being flagged as potential signals. Stratification and/or subgroup analyses might address these issues; whereas stratification is implemented in some standard software packages, routine use of subgroup analyses is less common. Performance of Stratified and Subgrouped Disproportionality Analyses in Spontaneous Databases (Drug Saf 2016;39(4):355-64) compared both approaches across a range of spontaneous report databases and covariates and found that subgroup analyses improved first-pass signal detection whereas stratification did not; subgroup analyses by patient age and country of origin were found to bring the greatest value.
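The difference between the approaches lies in how the expected count E for a drug-event pair is computed. Crude screening computes E once from database-wide totals; a stratified analysis sums E over strata (for example age groups) as the sum of n_drug,s × n_event,s / N_s; a subgroup analysis restricts the whole computation to one stratum. The sketch below uses hypothetical counts, class and function names to compare the three observed-to-expected (O/E) ratios; it illustrates the idea rather than any specific software implementation.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    observed: int  # reports with both the drug and the event in this stratum
    n_drug: int    # all reports with the drug in this stratum
    n_event: int   # all reports with the event in this stratum
    n_total: int   # all reports in this stratum


def expected(n_drug, n_event, n_total):
    """Expected count if drug and event were reported independently."""
    return n_drug * n_event / n_total


def crude_oe(strata):
    """O/E ratio ignoring strata, so all reports are weighted equally."""
    obs = sum(s.observed for s in strata)
    exp_count = expected(sum(s.n_drug for s in strata),
                         sum(s.n_event for s in strata),
                         sum(s.n_total for s in strata))
    return obs / exp_count


def stratified_oe(strata):
    """O/E ratio with the expected count summed over strata."""
    obs = sum(s.observed for s in strata)
    exp_count = sum(expected(s.n_drug, s.n_event, s.n_total) for s in strata)
    return obs / exp_count


def subgroup_oe(stratum):
    """O/E ratio computed within a single subgroup only."""
    return stratum.observed / expected(stratum.n_drug, stratum.n_event, stratum.n_total)


# Hypothetical counts: drug and event are both reported mostly in older patients,
# while the drug-event association is confined to younger patients.
young = Stratum(observed=15, n_drug=500, n_event=500, n_total=50000)
old = Stratum(observed=405, n_drug=4500, n_event=4500, n_total=50000)

print(f"crude O/E      {crude_oe([young, old]):.2f}")       # 1.68
print(f"stratified O/E {stratified_oe([young, old]):.2f}")  # 1.02
print(f"young subgroup {subgroup_oe(young):.2f}")           # 3.00
```

In this made-up example the crude ratio is inflated by age, stratification removes that inflation but also dilutes the association confined to younger patients, and the subgroup analysis makes it visible.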


Masking is a statistical issue whereby true signals of disproportionate reporting are hidden by the presence of other products in the database. Although masking is not yet fully understood, several publications describe methods for assessing the extent and impact of the masking effect of measures of disproportionality. They include A conceptual approach to the masking effect of measures of disproportionality (Pharmacoepidemiol Drug Saf 2014;23(2):208-17), with an application described in Assessing the extent and impact of the masking effect of disproportionality analyses on two spontaneous reporting systems databases (Pharmacoepidemiol Drug Saf 2014;23(2):195-207), Outlier removal to uncover patterns in adverse drug reaction surveillance - a simple unmasking strategy (Pharmacoepidemiol Drug Saf 2013;22(10):1119-29) and A potential event-competition bias in safety signal detection: results from a spontaneous reporting research database in France (Drug Saf 2013;36(7):565-72). The value of these methods in practice needs further investigation.
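The masking mechanism, and the basic idea behind unmasking by removing reports, can be illustrated with a reporting odds ratio computed against different backgrounds. The sketch below is a simplification under stated assumptions: each report is assumed to involve a single drug, the counts and drug names are hypothetical, and the product to remove is chosen by hand. It is not the selection algorithm of the outlier-removal paper cited above.

```python
def ror_against_background(counts, drug, exclude=()):
    """Reporting odds ratio for `drug` and a single event of interest.

    counts maps each drug to (reports with the event, all reports for the drug);
    every report is assumed to involve exactly one drug. Drugs named in
    `exclude` are removed from the background (comparator) reports.
    """
    a, n_drug = counts[drug]
    b = n_drug - a  # reports for the drug with other events
    background = [v for name, v in counts.items()
                  if name != drug and name not in exclude]
    c = sum(ev for ev, _ in background)    # background reports with the event
    d = sum(n for _, n in background) - c  # background reports with other events
    return (a * d) / (b * c)


# Hypothetical counts for one event of interest
counts = {
    "drug_x": (30, 1000),     # drug under surveillance
    "drug_m": (3000, 10000),  # product very strongly associated with the event
    "others": (300, 89000),   # all remaining products
}
masked = ror_against_background(counts, "drug_x")
unmasked = ror_against_background(counts, "drug_x", exclude=("drug_m",))
print(f"ROR, full background:  {masked:.2f}")    # 0.90
print(f"ROR, drug_m removed:   {unmasked:.2f}")  # 9.14
```

Here the many event reports for drug_m inflate the background reporting rate of the event, so drug_x appears unremarkable until those reports are removed.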


The Guideline on the use of statistical signal detection methods in the EudraVigilance data analysis system describes the quantitative methods of disproportionality implemented in signal detection by the European Medicines Agency (EMA), together with the elements for their interpretation and their potential limitations in the context of pharmacovigilance. It covers the use of quantitative methods in EudraVigilance applied to the evaluation of Individual Case Safety Reports (ICSRs) originating from healthcare professionals and involving authorised medicinal products.


A time-consuming step in signal detection of adverse reactions is determining whether an effect is already recorded in the product information. A searchable database of this information allows reports to be filtered or flagged for signals related to unlisted reactions. This can considerably improve the efficiency of the signal detection process, and it also allows the comparison to be restricted to drugs for which the adverse event is not considered to be causally related. In research, it permits evaluation of the effect of such background restriction on the performance of statistical signal detection. An example of such a database is the PROTECT Database of adverse drug reactions (EU SPC ADR database), a structured Excel database of all adverse drug reactions (ADRs) listed in section 4.8 of the Summary of Product Characteristics (SPC) of medicinal products authorised in the European Union (EU) through the centralised procedure, based exclusively on the Medical Dictionary for Regulatory Activities (MedDRA) terminology.
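The two uses of a listed-ADR database described above, flagging unlisted reactions and restricting the background, reduce to simple set lookups once labelled reactions are stored as (product, event term) pairs. The sketch below is hypothetical throughout: the product names, event terms, `LISTED` set and helper functions are illustrative and do not reproduce the structure of the PROTECT database.

```python
from typing import NamedTuple


class Report(NamedTuple):
    drug: str
    event: str  # assumed to be coded as a MedDRA term


# Hypothetical extract of labelled reactions: (product, event term) from SPC section 4.8
LISTED = {
    ("drug_a", "Nausea"),
    ("drug_b", "Hepatotoxicity"),
}


def unlisted_pairs(pairs, listed=LISTED):
    """Keep drug-event pairs whose event is not already in the product information."""
    return [p for p in pairs if p not in listed]


def restricted_background(reports, event, listed=LISTED):
    """Drop reports for products that list `event` as a known reaction, so the
    comparator only contains products for which the event is not labelled."""
    return [r for r in reports if (r.drug, event) not in listed]


candidates = [("drug_a", "Nausea"), ("drug_a", "Rash"), ("drug_c", "Hepatotoxicity")]
print(unlisted_pairs(candidates))  # [('drug_a', 'Rash'), ('drug_c', 'Hepatotoxicity')]

reports = [
    Report("drug_a", "Nausea"),
    Report("drug_b", "Hepatotoxicity"),
    Report("drug_b", "Rash"),
    Report("drug_c", "Hepatotoxicity"),
]
# Background for assessing Hepatotoxicity: drug_b reports are excluded
print(restricted_background(reports, "Hepatotoxicity"))
```

Using a set gives constant-time lookups, which matters when screening many thousands of drug-event pairs.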


Other large observational databases, such as claims and electronic medical records databases, are potentially useful as part of a larger signal detection and refinement strategy. Modern methods of pharmacovigilance: detecting adverse effects of drugs (Clin Med 2009;9(5):486-9) describes the strengths and weaknesses of different data sources for signal detection (spontaneous reports, electronic patient records and cohort-event monitoring). A number of initiatives, e.g. the PROTECT, OHDSI and Sentinel projects, have considered the use of observational data in electronic systems that complement existing methods of safety surveillance.


The EU Guideline on good pharmacovigilance practices (GVP) Module IX - Signal Management defines signal management as the set of activities performed to determine whether, based on an examination of individual case safety reports (ICSRs), aggregated data from active surveillance systems or studies, literature information or other data sources, there are new risks associated with an active substance or a medicinal product or whether risks have changed. Signal management covers all steps from detecting signals (signal detection), through their validation and confirmation, analysis, prioritisation and assessment to recommending action, as well as the tracking of the steps taken and of any recommendations made.

The FDA’s Guidance for Industry: Good Pharmacovigilance Practices and Pharmacoepidemiologic Assessment provides best practice for documenting, assessing and reporting individual case safety reports and case series, and for identifying, evaluating, investigating and interpreting safety signals, including recommendations on data mining techniques and the use of pharmacoepidemiological studies.


Individual Chapters:


1. Introduction

2. Formulating the research question

3. Development of the study protocol

4. Approaches to data collection

4.1. Primary data collection

4.1.1. Surveys

4.1.2. Randomised clinical trials

4.2. Secondary data collection

4.3. Patient registries

4.3.1. Definition

4.3.2. Conceptual differences between a registry and a study

4.3.3. Methodological guidance

4.3.4. Registries which capture special populations

4.3.5. Disease registries in regulatory practice and health technology assessment

4.4. Spontaneous report database

4.5. Social media and electronic devices

4.6. Research networks

4.6.1. General considerations

4.6.2. Models of studies using multiple data sources

4.6.3. Challenges of different models

5. Study design and methods

5.1. Definition and validation of drug exposure, outcomes and covariates

5.1.1. Assessment of exposure

5.1.2. Assessment of outcomes

5.1.3. Assessment of covariates

5.1.4. Validation

5.2. Bias and confounding

5.2.1. Selection bias

5.2.2. Information bias

5.2.3. Confounding

5.3. Methods to handle bias and confounding

5.3.1. New-user designs

5.3.2. Case-only designs

5.3.3. Disease risk scores

5.3.4. Propensity scores

5.3.5. Instrumental variables

5.3.6. Prior event rate ratios

5.3.7. Handling time-dependent confounding in the analysis

5.4. Effect measure modification and interaction

5.5. Ecological analyses and case-population studies

5.6. Pragmatic trials and large simple trials

5.6.1. Pragmatic trials

5.6.2. Large simple trials

5.6.3. Randomised database studies

5.7. Systematic reviews and meta-analysis

5.8. Signal detection methodology and application

6. The statistical analysis plan

6.1. General considerations

6.2. Statistical analysis plan structure

6.3. Handling of missing data

7. Quality management

8. Dissemination and reporting

8.1. Principles of communication

8.2. Communication of study results

9. Data protection and ethical aspects

9.1. Patient and data protection

9.2. Scientific integrity and ethical conduct

10. Specific topics

10.1. Comparative effectiveness research

10.1.1. Introduction

10.1.2. General aspects

10.1.3. Prominent issues in CER

10.2. Vaccine safety and effectiveness

10.2.1. Vaccine safety

10.2.2. Vaccine effectiveness

10.3. Design and analysis of pharmacogenetic studies

10.3.1. Introduction

10.3.2. Identification of genetic variants

10.3.3. Study designs

10.3.4. Data collection

10.3.5. Data analysis

10.3.6. Reporting

10.3.7. Clinical practice guidelines

10.3.8. Resources

Annex 1. Guidance on conducting systematic reviews and meta-analyses of completed comparative pharmacoepidemiological studies of safety outcomes