5.1.1. Assessment of exposure
5.1.2. Assessment of outcomes
5.1.3. Assessment of covariates
5.2.1. Selection bias
5.2.2. Information bias
5.6.1. Pragmatic trials
5.6.2. Large simple trials
5.6.3. Randomised database studies
Historically, pharmacoepidemiology studies relied on patient-supplied information or searches through paper-based health records. The rapid increase in access to electronic healthcare records and large administrative databases has changed the way exposures and outcomes are defined, measured and validated. Chapter 41 of Pharmacoepidemiology (B. Strom, S.E. Kimmel, S. Hennessy. 5th Edition, Wiley, 2012) includes a literature review of the studies that have evaluated the validity of drug, diagnosis and hospitalisation data and the factors that influence the accuracy of these data. This book also presents information on data sources available for pharmacoepidemiology studies including questionnaires and administrative databases. Further information on databases available for pharmacoepidemiology studies is available in resources such as the ENCePP resource database and the Inventory of Drug Consumption Databases in Europe. Studies to evaluate outcome and exposure identification are however resource intensive. An example of this is Mini‐Sentinel's systematic reviews of validated methods for identifying health outcomes using administrative data (Pharmacoepidemiol Drug Saf 2012; 21(S1): 82-9).
In pharmacoepidemiology studies, exposure data originate mainly from four data sources: data on prescribing (e.g. CPRD primary care data), data on dispensing (e.g. PHARMO outpatient pharmacy database), data on payment for medication (namely claims data, e.g. IMS LifeLink PharMetrics Plus) and data collected in surveys. The population included in these data sources follows a process of attrition: drugs that are prescribed are not necessarily dispensed, and drugs that are dispensed are not necessarily ingested. In Primary non-adherence in general practice: a Danish register study (Eur J Clin Pharmacol 2014;70(6):757-63), 9.3% of all prescriptions for new therapies were never redeemed at the pharmacy, with different percentages per therapeutic and patient groups. The attrition from dispensing to ingestion is even more difficult to measure, as it is compounded by uncertainties about which dispensed drugs are actually taken by the patients and the patients’ ability to provide an accurate account of their intake. In addition, paediatric adherence is dependent on parents’ accurate recollection and recording.
Exposure definitions can include simple dichotomous variables (e.g. ever exposed vs. never exposed) or be more detailed, including estimates of duration, exposure windows (e.g. current vs. past exposure) or dosage (e.g. current dosage, cumulative dosage over time). Consideration should be given to the level of detail available from the data sources on the timing of exposure, including the quantity prescribed, dispensed or ingested and the capture of dosage instructions. This will vary across data sources and exposures (e.g. estimating anticonvulsant ingestion is typically easier than estimating rescue medication for asthma attacks). Discussions with clinicians regarding sensible assumptions will be informative for the variable definition.
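As an illustration of how a more detailed exposure definition might be derived from dispensing data, the following minimal Python sketch merges dispensings into continuous exposure episodes and classifies exposure status on an index date. The record layout, the 30-day grace period and the 'current', 'past' and 'never' categories are assumptions made for illustration only. They are not recommendations and should be adapted to the data source and the drug studied.

```python
from datetime import date, timedelta

def build_episodes(dispensings, grace_days=30):
    """Merge dispensings into continuous exposure episodes.

    dispensings: list of (dispensing_date, days_supply) tuples (hypothetical layout).
    A refill starting within grace_days after the end of the previous supply
    extends the current episode; otherwise a new episode starts.
    """
    episodes = []
    for start, days_supply in sorted(dispensings):
        end = start + timedelta(days=days_supply)
        if episodes and start <= episodes[-1][1] + timedelta(days=grace_days):
            # Refill within the assumed grace period: extend the episode.
            episodes[-1] = (episodes[-1][0], max(episodes[-1][1], end))
        else:
            episodes.append((start, end))
    return episodes

def exposure_status(episodes, index_date):
    """Classify exposure on index_date as 'current', 'past' or 'never'."""
    if any(start <= index_date <= end for start, end in episodes):
        return "current"
    if any(end < index_date for _, end in episodes):
        return "past"
    return "never"

# Hypothetical dispensing history for one patient.
records = [(date(2020, 1, 1), 30), (date(2020, 2, 5), 30), (date(2020, 6, 1), 30)]
episodes = build_episodes(records)
print(episodes)
print(exposure_status(episodes, date(2020, 2, 20)))
print(exposure_status(episodes, date(2020, 4, 15)))
```

Changing the grace period or the assumed days of supply is a natural sensitivity analysis, since both choices directly move patients between exposure categories.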
The Methodology chapter of the book Drug Utilization Research. Methods and Applications (M. Elseviers, B. Wettermark, A.B. Almarsdottir et al. Ed. Wiley Blackwell, 2016) discusses different methods for data collection on drug utilisation.
A case definition compatible with the data source should be developed for each outcome of a study at the design stage. This description should include how events will be identified and classified as cases, whether cases will include prevalent as well as incident cases, exacerbations and second episodes (as differentiated from repeat codes) and all other inclusion or exclusion criteria. The reason for the data collection and the nature of the healthcare system that generated the data should also be described as they can impact on the quality of the available information and the presence of potential biases. Published case definitions of outcomes, such as those developed by the Brighton Collaboration in the context of vaccinations, are useful but are not necessarily compatible with the information available in the observational data source used. For example, information on the duration of symptoms may not be available, or additional codes may have been added to the data set following publication of the outcome definition.
Search criteria to identify outcomes should be defined and the list of codes and any algorithm used should be provided. Generation of code lists requires expertise in both the coding system and the disease area. Researchers should consult clinicians who are familiar with the coding practice within the studied field. Suggested methodologies are available for some coding systems (see Creating medical and drug code lists to identify cases in primary care databases. Pharmacoepidemiol Drug Saf 2009;18(8):704-7). Coding systems used in some commonly used databases are updated regularly, so sustainability issues in prospective studies should be addressed at the protocol stage. Moreover, great care should be taken when re-using a code list from another study, as code lists depend on the study objective and methods. Public repositories of codes, such as Clinicalcodes.org, are available, and researchers are also encouraged to make their own code lists available.
In some circumstances, chart review or free text entries in electronic format linked to coded entries can be useful for outcome identification. Such identification may involve an algorithm with use of multiple code lists (for example disease plus therapy codes) or an endpoint committee to adjudicate available information against a case definition. In some cases, initial plausibility checks or subsequent medical chart review will be necessary. When databases contain prescription data only, drug exposure may be used as a proxy for an outcome, or linkage to different databases is required.
In pharmacoepidemiology studies, covariates are often used for selecting and matching study subjects, comparing characteristics of the cohorts, developing propensity scores, creating stratification variables, evaluating effect modifiers and adjusting for confounders. Reliable assessment of covariates is therefore essential for the validity of results. Patient characteristics and other key covariates that could be confounding variables need to be evaluated using all available data. A given database may or may not be suitable for studying a research question depending on the availability of information on these covariates.
Some patient characteristics and covariates vary with time and accurate assessment is therefore time dependent. The timing of assessment of the covariates is an important factor for the correct classification of the subjects and should be clearly specified in the protocol. Capturing covariates can be done at one or multiple points during the study period. In the latter scenario, the variable will be modelled as a time-dependent variable.
Assessment of covariates can be done using different periods of time (look-back periods or run-in periods). Fixed look-back periods (for example 6 months or 1 year) are sometimes used when there are changes in coding methods or in practices, or when using the entire medical history of a patient is not feasible. Estimation using all available covariate information versus a fixed look-back window for dichotomous covariates (Pharmacoepidemiol Drug Saf 2013; 22(5):542-50) establishes that defining covariates based on all available historical data, rather than on data observed over a commonly shared fixed historical window, will result in estimates with less bias. However, this approach may not be applicable when data from paediatric and adult periods are combined because covariates may significantly differ between paediatric and adult populations (e.g. height and weight).
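The difference between a fixed look-back window and the use of all available history can be sketched as follows. This is a minimal Python illustration under assumed inputs: the function name, the list of diagnosis dates and the 365-day window are hypothetical and chosen only to show how the same patient may be classified differently by the two approaches.

```python
from datetime import date

def has_covariate(diagnosis_dates, index_date, lookback_days=None):
    """Return True if a diagnosis is recorded before index_date within the
    look-back window. lookback_days=None uses all available history."""
    for d in diagnosis_dates:
        if d >= index_date:
            continue  # only information recorded before the index date counts
        if lookback_days is None or (index_date - d).days <= lookback_days:
            return True
    return False

index_date = date(2015, 6, 1)
# Hypothetical patient with a single diabetes code three years before index.
diabetes_codes = [date(2012, 3, 10)]

fixed_window = has_covariate(diabetes_codes, index_date, lookback_days=365)
all_history = has_covariate(diabetes_codes, index_date)
print(fixed_window, all_history)
```

In this example the chronic condition is missed by the one-year window but captured when all available history is used, which is the kind of covariate misclassification the cited paper examines.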
In healthcare databases, the correct assessment of drug exposure, outcome and covariate is crucial to avoid misclassification. Validity of diagnostic coding within the General Practice Research Database: a systematic review (Br J Gen Pract 2010;60:e128-36), the book Pharmacoepidemiology (B. Strom, S.E. Kimmel, S. Hennessy. 5th Edition, Wiley, 2012) and Mini-Sentinel's systematic reviews of validated methods for identifying health outcomes using administrative and claims data: methods and lessons learned (Pharmacoepidemiol Drug Saf 2012; Suppl 1:82-9) provide examples.
Potential misclassification of exposure, outcome and other variables should be measured and removed or reduced. External validation against chart review or physician/patient questionnaire is possible in some instances but the questionnaires cannot always be considered as ‘gold standard’. While the positive predictive value is more easily measured than the negative predictive value, the specificity of an outcome is more important than its sensitivity when considering bias in relative risk estimates (see A review of uses of health care utilization databases for epidemiologic research on therapeutics, J Clin Epidemiol 2005;58(4):323-37). When validation of the variable is complete, the study point estimate should be adjusted accordingly (see Use of the Positive Predictive Value to Correct for Disease Misclassification in Epidemiologic Studies, Am J Epidemiol 1993;138(11):1007-15 and Sentinel Quantitative Bias Analysis Methodology Development: Sequential Bias Adjustment for Outcome Misclassification, 2017).
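Why specificity matters more than sensitivity for relative risk estimates can be shown with a short calculation. The sketch below assumes non-differential outcome misclassification and a rare outcome. The true risks (2% and 1%) and the sensitivity and specificity values are hypothetical numbers chosen for illustration. The expected observed risk is computed as sensitivity × true risk + (1 − specificity) × (1 − true risk).

```python
def observed_risk(true_risk, sensitivity, specificity):
    """Expected proportion classified as cases, given imperfect classification."""
    true_positives = sensitivity * true_risk
    false_positives = (1.0 - specificity) * (1.0 - true_risk)
    return true_positives + false_positives

def observed_rr(risk_exposed, risk_unexposed, sensitivity, specificity):
    """Expected risk ratio under non-differential outcome misclassification."""
    return (observed_risk(risk_exposed, sensitivity, specificity)
            / observed_risk(risk_unexposed, sensitivity, specificity))

# Hypothetical true risks: 2% in the exposed, 1% in the unexposed (true RR = 2).
rr_low_sensitivity = observed_rr(0.02, 0.01, sensitivity=0.60, specificity=1.00)
rr_low_specificity = observed_rr(0.02, 0.01, sensitivity=1.00, specificity=0.98)
print(round(rr_low_sensitivity, 2), round(rr_low_specificity, 2))
```

With perfect specificity, a sensitivity of only 60% leaves the risk ratio unchanged, whereas a specificity of 98% with perfect sensitivity pulls it markedly towards the null, because false positives are drawn from the much larger pool of non-cases.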
Differential misclassification should be measured by validating each comparison group.
For databases routinely used in research, documented validation of key variables may have been done previously by the data provider or other researchers. Any extrapolation of a previous validation study should however consider the effect of any differences in prevalence and inclusion and exclusion criteria, the distribution and analysis of risk factors as well as subsequent changes to health care, procedures and coding, as illustrated in Basic Methods for Sensitivity Analysis of Biases (Int J Epidemiol 1996; 25(6): 1107-16). The accurate date of onset is particularly important for studies relying upon timing of exposure and outcome such as in the self-controlled case series. A comparison of data from registries with clinical or administrative records can also validate individual records on a specific outcome.
Linkage validation can be used when another database is used for the validation through linkage methods (see Using linked electronic data to validate algorithms for health outcomes in administrative databases, J Comp Eff Res 2015; 4:359-66). In some situations there is no access to a resource to provide data for comparison. In this case, indirect validation may be an option, as explained in the book Applying quantitative bias analysis to epidemiologic data (Lash T, Fox MP, Fink AK Springer-Verlag, New-York, 2009).
Structural validation of the database with internal logic checks can also be performed to verify the completeness and accuracy of variables. For example, one can investigate whether an outcome was followed by (or proceeded from) appropriate exposure or procedures or if a certain variable has values within a known reasonable range.
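Internal logic checks of this kind are straightforward to automate. The following Python sketch flags records with implausible date sequences or out-of-range values. The field names (birth_date, event_date, death_date, weight_kg) and the plausible weight range are hypothetical and would need to be replaced by the variables and limits relevant to the database at hand.

```python
from datetime import date

def structural_checks(record):
    """Return a list of internal-consistency problems found in one record."""
    problems = []
    if record["event_date"] < record["birth_date"]:
        problems.append("event before birth")
    death = record.get("death_date")
    if death is not None and record["event_date"] > death:
        problems.append("event after death")
    weight = record.get("weight_kg")
    # Assumed plausible range; adapt to the study population.
    if weight is not None and not (0.3 <= weight <= 400):
        problems.append("weight out of plausible range")
    return problems

# Hypothetical records: one consistent, one with two problems.
good = {"birth_date": date(1950, 1, 1), "event_date": date(2010, 5, 1), "weight_kg": 72.0}
bad = {"birth_date": date(1950, 1, 1), "event_date": date(2012, 5, 1),
       "death_date": date(2011, 1, 1), "weight_kg": 720.0}
print(structural_checks(good))
print(structural_checks(bad))
```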
Selection bias means the selective recruitment into the study of subjects that are not representative of the exposure or outcome pattern in the source population. Examples of common selection bias are referral bias and self-selection bias (Strom BL, Kimmel SE, Hennessy S. Pharmacoepidemiology, 5th Edition, Wiley, 2012). Other forms of selection biases are presented below.
Protopathic bias arises when the initiation of a drug (exposure) occurs in response to a symptom of the (at this point undiagnosed) disease under study (outcome). For example, use of analgesics in response to pain caused by an undiagnosed tumour might lead to the erroneous conclusion that the analgesic caused the tumour. Protopathic bias thus reflects a reversal of cause and effect (see Bias: Considerations for research practice. Am J Health Syst Pharm 2008;65:2159-68). This is particularly a problem in studies of drug-cancer associations and other outcomes with long latencies. It may be handled by including a time-lag (i.e. by disregarding all exposure during a specified period of time before the index date).
The practice of including prevalent users in observational studies, i.e. patients already taking a therapy for some time before study follow-up began, can cause two types of bias. Firstly, prevalent users are ‘survivors’ (healthy users) of the early period of pharmacotherapy, which can introduce substantial selection bias if the risk varies with time, as seen in the association between contraceptive intake and venous thrombosis, which was initially overestimated due to the healthy-user bias (see The Transnational Study on Oral Contraceptives and the Health of Young Women. Methods, results, new analyses and the healthy user effect, Hum Reprod Update 1999;5(6)). Secondly, covariates for drug use at study entry are often influenced by the intake of the drug.
Information bias arises when incorrect information about either exposure or outcome or any covariates is collected in the study. It can be either non-differential, when it occurs randomly across exposed/non-exposed participants, or differential, when it is influenced by the disease or exposure status.
Non-differential misclassification bias generally drives the risk estimate towards the null value, while differential bias can drive the risk estimate in either direction. Examples of differential misclassification bias are recall bias (e.g. in case-control studies, cases and controls can have different recall of their past exposures) and surveillance or detection bias.
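The time-lag approach can be sketched in a few lines of Python. The function below defines ever-exposure while disregarding prescriptions issued during the lag period before the index date. The function name, the one-year default lag and the example dates are assumptions for illustration. The appropriate lag depends on the latency of the outcome studied.

```python
from datetime import date, timedelta

def exposed_with_lag(prescription_dates, index_date, lag_days=365):
    """Ever-exposed status, disregarding prescriptions issued during the
    lag period (the lag_days immediately before index_date)."""
    cutoff = index_date - timedelta(days=lag_days)
    return any(d < cutoff for d in prescription_dates)

# Hypothetical case: analgesic first prescribed three months before a cancer
# diagnosis, plausibly for pain caused by the still undiagnosed tumour.
index_date = date(2018, 6, 1)
analgesic_dates = [date(2018, 3, 1)]

without_lag = exposed_with_lag(analgesic_dates, index_date, lag_days=0)
with_lag = exposed_with_lag(analgesic_dates, index_date, lag_days=365)
print(without_lag, with_lag)
```

Repeating the analysis over several lag lengths shows how sensitive the association is to exposure starting shortly before diagnosis.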
Surveillance bias (or detection bias)
Surveillance or detection bias arises when patients in one exposure group have a higher probability of having the study outcome detected, due to increased surveillance, screening or testing of the outcome itself, or an associated symptom. For example, post-menopausal exposure to estrogen is associated with an increased risk of bleeding that can trigger screening for endometrial cancers, leading to a higher probability of early stage endometrial cancers being detected. Any association between estrogen exposure and endometrial cancer potentially overestimates risk, because unexposed patients with sub-clinical cancers would have a lower probability of their cancer being diagnosed or recorded. This is discussed in Alternative analytic methods for case-control studies of estrogens and endometrial cancer (N Engl J Med 1978;299(20):1089-94).
This non-random type of misclassification bias can be reduced by selecting an unexposed comparator group with a similar likelihood of screening or testing, selecting outcomes that are likely to be diagnosed equally in both exposure groups, or by adjusting for the surveillance rate in the analysis. These issues and recommendations are outlined in Surveillance Bias in Outcomes Reporting (JAMA 2011;305(23):2462-3).
Time-related bias is most often a form of differential misclassification bias and is triggered by inappropriate accounting of follow-up time and exposure status in the study design and analysis.
The choice of the exposure risk window can influence risk comparisons due to misclassification of drug exposure possibly associated with risks that vary over time. A study of the effects of exposure misclassification due to the time-window design in pharmacoepidemiologic studies (J Clin Epidemiol 1994;47(2):183-9) considers the impact of the time-window design on the validity of risk estimates in record linkage studies. In adverse drug reaction studies, an exposure risk-window constitutes the number of exposure days assigned to each prescription. The ideal design situation would occur when each exposure risk-window covers only the period of potential excess risk. The estimation of the time of drug-related risk is however complex as it depends on the duration of drug use, timing of ingestion and the onset and persistence of drug toxicity. With longer windows, a substantive attenuation of incidence rates may be observed. Risk windows should be validated or sensitivity analyses should be conducted.
Immortal time bias
Immortal time bias refers to a period of cohort follow-up time during which death (or an outcome that determines end of follow-up) cannot occur (K. Rothman, S. Greenland, T. Lash. Modern Epidemiology, 3rd Edition, Lippincott Williams & Wilkins, 2008 p. 106-7).
Immortal time bias can arise when the period between cohort entry and date of first exposure to a drug, during which the event of interest has not occurred, is either misclassified or simply excluded and not accounted for in the analysis. Immortal time bias in observational studies of drug effects (Pharmacoepidemiol Drug Saf 2007;16:241-9) demonstrates how several observational studies used a flawed approach to design and data analysis, leading to immortal time bias, which can generate an illusion of treatment effectiveness. This is frequently found in studies that compare groups of ‘users’ against ‘non-users’. Observational studies with surprisingly beneficial drug effects should therefore be re-assessed to account for this bias.
Immortal Time Bias in Pharmacoepidemiology (Am J Epidemiol 2008;167:492-9) describes various cohort study designs leading to this bias, quantifies its magnitude under different survival distributions and illustrates it with data from a cohort of lung cancer patients. For time-based, event-based and exposure-based cohort definitions, the bias in the rate ratio resulting from misclassified or excluded immortal time increases proportionately to the duration of immortal time.
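The mechanism can be illustrated with a small person-time calculation. In the Python sketch below, all numbers are hypothetical and constructed so that the true rate ratio is 1.0: the event rate is 0.1 per person-year in both treated and untreated time, and patients who are eventually treated contribute 200 event-free person-years between cohort entry and their first prescription.

```python
def rate_ratio(events_exposed, py_exposed, events_unexposed, py_unexposed):
    """Incidence rate ratio from event counts and person-years (py)."""
    return (events_exposed / py_exposed) / (events_unexposed / py_unexposed)

# Hypothetical cohort with no true drug effect.
never_treated = {"events": 100, "py": 800.0}
pre_treatment = {"events": 0, "py": 200.0}   # immortal time of future users
post_treatment = {"events": 80, "py": 800.0}

# Immortal time misclassified as treated time.
rr_misclassified = rate_ratio(
    post_treatment["events"], post_treatment["py"] + pre_treatment["py"],
    never_treated["events"], never_treated["py"])

# Immortal time excluded from the analysis.
rr_excluded = rate_ratio(
    post_treatment["events"], post_treatment["py"],
    never_treated["events"], never_treated["py"])

# Time-dependent classification: pre-treatment time counted as unexposed.
rr_time_dependent = rate_ratio(
    post_treatment["events"], post_treatment["py"],
    never_treated["events"] + pre_treatment["events"],
    never_treated["py"] + pre_treatment["py"])

print(round(rr_misclassified, 2), round(rr_excluded, 2), round(rr_time_dependent, 2))
```

Both flawed approaches produce an apparent protective effect (0.64 and 0.80 in this example) although none exists, while assigning the pre-treatment person-time to the unexposed category recovers the null.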
Survival bias associated with time-to-treatment initiation in drug effectiveness evaluation: a comparison of methods (Am J Epidemiol 2005;162:1016-23) describes five different approaches to deal with immortal time bias. The use of a time-dependent approach had several advantages: no subjects are excluded from the analysis and the study allows effect estimation at any point in time after discharge. However, changes of exposure might be predictive of the study endpoint and need adjustment for time-varying confounders using complex methods. Problem of immortal time bias in cohort studies: example using statins for preventing progression of diabetes (BMJ 2010; 340:b5087) describes how immortal time in observational studies can bias the results in favour of the treatment group and how it can be identified and avoided. It is recommended that all cohort studies should be assessed for the presence of immortal time bias using appropriate validity criteria. However, Re: ‘Immortal time bias in pharmacoepidemiology’ (Am J Epidemiol 2009; 170: 667-8) argues that sound efforts at minimising the influence of more common biases should not be sacrificed to that of avoiding immortal time bias.
Other forms of time-related bias
Time-window Bias in Case-control Studies: Statins and Lung Cancer (Epidemiology 2011; 22 (2):228-31) describes a case-control study which reported a 45% reduction in the rate of lung cancer with any statin use. A differential misclassification bias arose from the methods used to select controls and measure their exposure, which resulted in exposure assessment to statins being based on a shorter time-span for cases than controls and an over-representation of unexposed cases. Properly accounting for time produced a null association.
In many database studies, exposure status during hospitalisations is unknown. Exposure misclassification bias may occur with a direction depending on whether exposure to drugs prescribed preceding hospitalisations is continued or discontinued and if days of hospitalisation are considered as gaps of exposure, especially when several exposure categories are assigned, such as current, recent and past. The differential bias arising from the lack of information on (or lack of consideration of) hospitalisations that occur during the observation period (called ‘immeasurable time bias’ in Immeasurable time bias in observational studies of drug effects on mortality. Am J Epidemiol 2008;168 (3):329-35) can be particularly problematic when studying serious chronic diseases that require extensive medication use and multiple hospitalisations.
In case-control studies assessing chronic diseases with multiple hospitalisations and in-patient treatment (such as the use of inhaled corticosteroids and death in chronic obstructive pulmonary disease patients), no clearly valid approach to data analysis can fully circumvent this bias. However, sensitivity analyses such as restricting the analysis to non-hospitalised patients or providing estimates weighted by exposable time may provide additional information on the potential impact of this bias, as shown in Immeasurable time bias in observational studies of drug effects on mortality (Am J Epidemiol 2008;168 (3):329-35).
In cohort studies where a first-line therapy (such as metformin) has been compared with second- or third-line therapies, patients are unlikely to be at the same stage of the disease (e.g. diabetes), which can induce confounding of the association with an outcome (e.g. cancer incidence) by disease duration. An outcome related to the first-line therapy may also be attributed to the second-line therapy if it occurs after a long period of exposure. Such a situation requires matching on disease duration and consideration of latency time windows in the analysis (example drawn from Metformin and the Risk of Cancer. Time-related biases in observational studies. Diabetes Care 2012;35(12):2665-73).
Confounding occurs when the estimate of a measure of association is distorted by the presence of another risk factor. For a variable to be a confounder, it must be associated with both the exposure and the outcome, without being in the causal pathway.
Confounding by indication
Confounding by indication refers to a determinant of the outcome parameter that is present in people at perceived high risk or poor prognosis and is an indication for intervention. This means that differences in care between the exposed and non-exposed, for example, may partly originate from differences in indication for medical intervention such as the presence of risk factors for particular health problems. Other names for this type of confounding are ‘channelling’ or ‘confounding by severity’.
This type of confounding has frequently been reported in studies evaluating the efficacy of pharmaceutical interventions and is almost always encountered to varying extents in pharmacoepidemiological studies. A good example can be found in Confounding and indication for treatment in evaluation of drug treatment for hypertension (BMJ 1997;315:1151-4).
The article Confounding by indication: the case of the calcium channel blockers (Pharmacoepidemiol Drug Saf 2000;9(1):37-41) demonstrates that studies with potential confounding by indication can benefit from appropriate analytic methods, including separating the effects of a drug taken at different times, sensitivity analysis for unmeasured confounders, instrumental variables and G-estimation (see Chapter 5.3).
With the more recent application of pharmacoepidemiological methods to assess effectiveness, confounding by indication is a greater challenge and the article Approaches to combat with confounding by indication in observational studies of intended drug effects (Pharmacoepidemiol Drug Saf 2003;12(7):551-8) focusses on its possible reduction in studies of intended effects. An extensive review of these and other methodological approaches, including their strengths and limitations, is provided in Methods to assess intended effects of drug treatment in observational studies are reviewed (J Clin Epidemiol 2004;57(12):1223-31).
Complete adjustment for confounders would require detailed information on clinical parameters, lifestyle or over-the-counter medications, which are often not measured in electronic healthcare records, causing residual confounding bias. Using directed acyclic graphs to detect limitations of traditional regression in longitudinal studies (Int J Public Health 2010;55(6):701-3) reviews confounding and intermediate effects in longitudinal data and introduces causal graphs to understand the relationships between the variables in an epidemiological study.
Unmeasured confounding can be adjusted for only through randomisation. When this is not possible, as is most often the case in pharmacoepidemiological studies, the potential impact of residual confounding on the results should be estimated and considered in the discussion.
Sensitivity analysis and external adjustment for unmeasured confounders in epidemiologic database studies of therapeutics (Pharmacoepidemiol Drug Saf 2006;15(5):291-303) provides a systematic approach to sensitivity analyses to investigate the impact of residual confounding in pharmacoepidemiological studies that use healthcare utilisation databases. In this article, four basic approaches to sensitivity analysis were identified: (1) sensitivity analyses based on an array of informed assumptions; (2) analyses to identify the strength of residual confounding that would be necessary to explain an observed drug-outcome association; (3) external adjustment of a drug-outcome association given additional information on single binary confounders from survey data using algebraic solutions; (4) external adjustment considering the joint distribution of multiple confounders of any distribution from external sources of information using propensity score calibration. The paper concludes that sensitivity analyses and external adjustments can improve our understanding of the effects of drugs in epidemiological database studies. With the availability of easy-to-apply spreadsheets (e.g. at http://www.drugepi.org/dope-downloads/), sensitivity analyses should be used more frequently, substituting qualitative discussions of residual confounding.
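Approach (3) can be sketched with the commonly used algebraic expression for a single binary confounder, in which the observed relative risk is divided by a bias factor computed from the confounder prevalence in each exposure group and the confounder-outcome relative risk. The Python sketch below is a minimal illustration under the assumptions of a single binary confounder and no effect modification. It is not claimed to reproduce the cited spreadsheets, and all input values are hypothetical.

```python
def confounding_bias_factor(p_conf_exposed, p_conf_unexposed, rr_conf_outcome):
    """Bias factor for one unmeasured binary confounder.

    p_conf_exposed / p_conf_unexposed: confounder prevalence among the
    exposed / unexposed (e.g. taken from external survey data).
    rr_conf_outcome: relative risk of the outcome associated with the confounder.
    """
    numerator = p_conf_exposed * (rr_conf_outcome - 1.0) + 1.0
    denominator = p_conf_unexposed * (rr_conf_outcome - 1.0) + 1.0
    return numerator / denominator

def externally_adjusted_rr(rr_observed, p_conf_exposed, p_conf_unexposed, rr_conf_outcome):
    """Observed relative risk divided by the confounding bias factor."""
    return rr_observed / confounding_bias_factor(
        p_conf_exposed, p_conf_unexposed, rr_conf_outcome)

# Hypothetical inputs: observed RR 1.5; confounder prevalence 40% in the
# exposed and 20% in the unexposed; confounder doubles the outcome risk.
rr_adjusted = externally_adjusted_rr(1.5, 0.4, 0.2, 2.0)
print(round(rr_adjusted, 3))

# Array approach: repeat the adjustment over a range of assumed strengths.
for rr_cd in (1.5, 2.0, 3.0):
    print(rr_cd, round(externally_adjusted_rr(1.5, 0.4, 0.2, rr_cd), 3))
```

Looping over a grid of assumed prevalences and confounder strengths, as in the last lines, corresponds to the array type of sensitivity analysis described under approach (1).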
The amount of bias in exposure-effect estimates that can plausibly occur due to residual or unmeasured confounding has been debated. The impact of residual and unmeasured confounding in epidemiologic studies: a simulation study (Am J Epidemiol 2007;166(6):646–55) considers the extent and patterns of bias in estimates of exposure-outcome associations that can result from residual or unmeasured confounding, when there is no true association between the exposure and the outcome. With plausible assumptions about residual and unmeasured confounding, effect sizes of the magnitude frequently reported in observational epidemiological studies can be generated. This study also highlights the need to perform sensitivity analyses to assess whether unmeasured and residual confounding are likely problems. Another important finding of this study was that when confounding factors (measured or unmeasured) are interrelated (e.g. in situations of confounding by indication), adjustment for a few factors can almost completely eliminate confounding.
New user (incident user) designs restrict the study population to persons who are observed at the start of treatment. New user design prevents ‘depletion of susceptibles’ (the unwanted exclusion from a safety assessment of persons discontinuing treatments following early adverse reactions) and helps alleviate healthy user bias for preventive treatments in some circumstances. The article Evaluating medication effects outside of clinical trials: new-user designs (Am J Epidemiol 2003;158 (9):915-20) defines new-user designs and explains how they can be implemented as case-control studies. New user design helps mitigate confounding by indication, severity or frailty, as described in The active comparator, new user study design in pharmacoepidemiology: historical foundations and contemporary application (Curr Epidemiol Rep 2015;2(4):221-8).
The use of case only study designs can reduce selection bias where the statistical assumptions of the method are fulfilled (see Chapter 22.214.171.124).
The active comparator approach includes only populations who have received treatment (see Chapter 126.96.36.199). These comparisons are less likely to be biased by unmeasured patient characteristics than studies where one group received no therapy at all (see Healthy User and Related Biases in Observational Studies of Preventive Interventions: A Primer for Physicians. J Gen Intern Med 2011, 26(5):546-50).
Misclassification can occur in exposure, outcome, or covariate variables. Outcome misclassification occurs when a non-case is classified as a case (false positive error) or a case is classified as a non-case (false negative error). Errors are quantified as estimates of positive predictive value (PPV), negative predictive value, sensitivity and specificity. Most database studies will be subject to outcome misclassification to some degree, unless cases have been adjudicated against a case definition, so the point estimate should always be adjusted accordingly. One should avoid the epidemiologic ‘mantra’ about non-differential misclassification of exposure producing conservative estimates, because the logic does not necessarily apply. Good practices for quantitative bias analysis (Int J Epidemiol 2014;43(6):1969-85) advocates explicit and quantitative assessment of misclassification bias, including decision guidance on which biases to assess in a given situation, what level of sophistication to use, and how to present the results. Use of the Positive Predictive Value to Correct for Disease Misclassification in Epidemiologic Studies (Am J Epidemiol 1993;138(11):1007-15) proposes a method based on estimates of the PPV which requires validation of a sample of those with the outcome only. By addressing misclassification of confounding variables, for example, by external adjustment, one alleviates the issue of residual confounding (see Adjustments for unmeasured confounders in pharmacoepidemiologic database studies using external information. Med Care 2007;45(10 Suppl 2):S158-65).
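A PPV-based correction of a risk ratio can be sketched as follows. The relation used here follows from the observation that the true number of cases in a group equals the observed number × PPV / sensitivity, so that, if sensitivity is assumed to be the same in both exposure groups, the corrected risk ratio is the observed risk ratio multiplied by the ratio of the group-specific PPVs. This is a simplified illustration under that stated assumption, with hypothetical numbers. It does not account for the sampling uncertainty of the PPV estimates.

```python
def ppv_corrected_rr(rr_observed, ppv_exposed, ppv_unexposed):
    """Risk ratio corrected for outcome misclassification using group-specific
    PPVs, assuming equal sensitivity in exposed and unexposed groups."""
    if not (0.0 < ppv_exposed <= 1.0 and 0.0 < ppv_unexposed <= 1.0):
        raise ValueError("PPVs must lie in (0, 1]")
    return rr_observed * ppv_exposed / ppv_unexposed

# Hypothetical validation results: 90% of recorded cases confirmed among the
# exposed and 75% among the unexposed; observed risk ratio of 2.0.
rr_corrected = ppv_corrected_rr(2.0, ppv_exposed=0.90, ppv_unexposed=0.75)
print(round(rr_corrected, 2))
```

When the PPV is the same in both groups the observed risk ratio is unchanged, which illustrates why validating each comparison group separately is needed to detect differential misclassification.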
Case-only designs reduce confounding by using the exposure history of each case as its own control and thereby eliminate confounding by characteristics that are constant over time, such as sex, socio-economic factors, genetics and chronic diseases. A review of case only designs is available in Use of self-controlled designs in pharmacoepidemiology (J Intern Med 2014; 275(6): 581-9).
A simple form of a case-only design is the symmetry analysis (initially described as prescription sequence symmetry analysis), introduced as a screening tool in Evidence of depression provoked by cardiovascular medication: a prescription sequence symmetry analysis (Epidemiology 1996;7(5):478-84).
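The core quantity of a symmetry analysis, the crude sequence ratio, can be computed in a few lines. The Python sketch below counts, among patients who started both drugs, those who started the index drug first and those who started the marker drug first. The data layout and example dates are hypothetical. Applied analyses typically also restrict the interval between the two initiations and adjust for temporal prescribing trends, which this sketch does not do.

```python
from datetime import date

def crude_sequence_ratio(first_dates):
    """first_dates: list of (first_index_drug_date, first_marker_drug_date),
    one tuple per patient who started both drugs (hypothetical layout).
    Patients starting both drugs on the same day are not counted."""
    index_first = sum(1 for idx, marker in first_dates if idx < marker)
    marker_first = sum(1 for idx, marker in first_dates if marker < idx)
    if marker_first == 0:
        raise ValueError("no patients started the marker drug first")
    return index_first / marker_first

# Hypothetical patients: index drug (e.g. a cardiovascular drug) and marker
# drug (e.g. an antidepressant) initiation dates.
patients = [
    (date(2019, 1, 10), date(2019, 4, 2)),   # index drug first
    (date(2019, 2, 1), date(2019, 3, 15)),   # index drug first
    (date(2019, 5, 20), date(2019, 9, 1)),   # index drug first
    (date(2019, 8, 5), date(2019, 6, 30)),   # marker drug first
    (date(2019, 7, 1), date(2019, 3, 12)),   # marker drug first
    (date(2019, 4, 4), date(2019, 4, 4)),    # same day, not counted
]
sequence_ratio = crude_sequence_ratio(patients)
print(sequence_ratio)
```

A ratio above 1 indicates that the marker drug is started more often after the index drug than before it, which is the asymmetry the method screens for.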
The case-crossover design compares the risk of exposure in a time period prior to an outcome with that in an earlier reference time-period, or set of time periods, to examine the effect of transient exposures on acute events (see The Case-Crossover Design: A Method for Studying Transient Effects on the Risk of Acute Events, Am J Epidemiol 1991;133(2):144-53). The case-time-control designs are a modification of case-crossover designs which use exposure history data from a traditional control group to estimate and adjust for the bias from temporal changes in prescribing (The case-time-control design, Epidemiology 1995;6(3):248-53). However, if not well matched, the case-time-control group may reintroduce selection bias (Confounding and exposure trends in case-crossover and case-time-control designs (Epidemiology 1996;7(3):231-9). Methods have been suggested to overcome the exposure-trend bias while controlling for time-invariant confounders (see Future cases as present controls to adjust for exposure trend bias in case-only studies, Epidemiology 2011;22(4):568-74, and "First-wave" bias when conducting active safety monitoring of newly marketed medications with outcome-indexed self-controlled designs, Am J Epidemiol 2014;180(6):636-44). Persistent User Bias in Case-Crossover Studies in Pharmacoepidemiology (Am J Epidemiol 2016; 184(10):761-9) demonstrates that case-crossover studies of drugs that may be used indefinitely are biased upward. This bias is alleviated, but not removed completely, by using a control group.
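In its simplest form, with one hazard window and one reference window of equal length per case, the case-crossover odds ratio reduces to the matched-pair estimator: the number of cases exposed only in the hazard window divided by the number exposed only in the reference window. The Python sketch below illustrates this under those assumptions, with hypothetical data. Cases with the same exposure status in both windows do not contribute, and no adjustment for exposure time trends is made.

```python
def case_crossover_or(cases):
    """cases: list of (exposed_in_hazard_window, exposed_in_reference_window)
    booleans, one tuple per case. Only discordant cases are informative."""
    hazard_only = sum(1 for hazard, ref in cases if hazard and not ref)
    reference_only = sum(1 for hazard, ref in cases if ref and not hazard)
    if reference_only == 0:
        raise ValueError("no cases exposed in the reference window only")
    return hazard_only / reference_only

# Hypothetical exposure histories of ten cases.
cases = (
    [(True, False)] * 4     # exposed just before the event only
    + [(False, True)] * 2   # exposed in the earlier reference window only
    + [(True, True)] * 1    # concordant, uninformative
    + [(False, False)] * 3  # concordant, uninformative
)
odds_ratio = case_crossover_or(cases)
print(odds_ratio)
```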
In the self-controlled case series (SCCS) design, the observation period following each exposure for each case is divided into risk period(s) (e.g. number of days immediately following each exposure) and a control period (observed time outside this risk period). Incidence rates within the risk period after exposure are compared with incidence rates within the control period. The Tutorial in biostatistics: the self-controlled case series method (Stat Med 2006; 25(10):1768-97) and the associated website http://statistics.open.ac.uk/sccs explain how to fit SCCS models using standard statistical packages. The bias introduced by inaccurate specification of the risk window is discussed and a data-based approach for identifying the optimal risk windows is proposed in Identifying optimal risk windows for self-controlled case series studies of vaccine safety (Stat Med 2011; 30(7):742-52). The SCCS also assumes that the event itself does not affect the chance of being exposed. The pseudo-likelihood method developed to address this possible issue is described in Case series analysis for censored, perturbed, or curtailed post-event exposures (Biostatistics 2009;10(1):3-16). Use of the self-controlled case-series method in vaccine safety studies: review and recommendations for best practice (Epidemiol Infect 2011;139(12):1805-17) assesses how the SCCS method has been used across 40 vaccine studies, highlights good practice and gives guidance on how the method should be used and reported. Using several methods of analysis is recommended, as it can reinforce conclusions or shed light on possible sources of bias when these differ for different study designs.
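The comparison of incidence rates within cases can be sketched as follows. This is a deliberately simplified version of the SCCS conditional likelihood with one risk period and one control period per case and no adjustment for age or season, which real SCCS models usually include. The function name, the bisection bounds and the example data are assumptions made for illustration.

```python
import math


def sccs_relative_incidence(cases, lo=1e-6, hi=1e6, iterations=200):
    """Maximum conditional-likelihood estimate of the relative incidence.

    cases: iterable of (risk_days, control_days, events_in_risk, events_in_control),
           one tuple per case; only people who had at least one event are included.
    Conditional on a case's total number of events, each event falls in the risk
    period with probability rho*risk_days / (rho*risk_days + control_days).
    """
    cases = list(cases)
    observed_in_risk = sum(n_r for _, _, n_r, _ in cases)
    observed_in_control = sum(n_c for _, _, _, n_c in cases)
    if observed_in_risk == 0 or observed_in_control == 0:
        raise ValueError("events are needed in both periods for a finite estimate")

    def score(rho):
        # Expected minus observed number of events in the risk periods.
        expected = sum((n_r + n_c) * rho * r / (rho * r + c)
                       for r, c, n_r, n_c in cases)
        return expected - observed_in_risk

    # The score increases with rho, so bisect on the log scale.
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    return math.sqrt(lo * hi)


# Hypothetical data: 30-day risk period, 335-day control period,
# 10 cases with the event in the risk period and 20 in the control period.
data = [(30, 335, 1, 0)] * 10 + [(30, 335, 0, 1)] * 20
print("relative incidence:", round(sccs_relative_incidence(data), 3))
```

When every case has the same window lengths, as here, the estimate reduces to the simple ratio of the two incidence rates, (10/30)/(20/335).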
When should case-only designs be used for safety monitoring of medical products? (Pharmacoepidemiol Drug Saf 2012;21(Suppl. 1):50-61) compares the SCCS and case-crossover methods with respect to their use, strengths and major difference (directionality). It concludes that case-only analyses of intermittent users complement the cohort analyses of prolonged users because their different biases compensate for one another. It also provides recommendations on when case-only designs should and should not be used for drug safety monitoring. Empirical performance of the self-controlled case series design: lessons for developing a risk identification and analysis system (Drug Saf 2013;36(Suppl. 1):S83-S93) evaluates the performance of the SCCS design using 399 drug-health outcome pairs in 5 observational databases and 6 simulated datasets. Four outcomes and five design choices were assessed. Within-person study designs had lower precision and greater susceptibility to bias because of trends in exposure than cohort and nested case-control designs (J Clin Epidemiol 2012;65(4):384-93) compares cohort, case-control, case-crossover and SCCS designs to explore the association between thiazolidinediones and the risks of heart failure and fracture and anticonvulsants and the risk of fracture. Bias was removed when follow-up was sampled both before and after the outcome, or when a case-time-control design was used.
The main purpose of using an active comparator is to reduce confounding by indication or by severity, at least in relation to the contrasts “treated diseased vs. untreated undiseased” or “treated diseased vs. untreated diseased”. It is optimal to use the active comparator in the context of the new user design, whereby comparison is between patients with the same indication initiating different treatments (see The active comparator, new user study design in pharmacoepidemiology: historical foundations and contemporary application (Curr Epidemiol Rep 2015;2(4):221-8)). An active comparator should be chosen to represent the counterfactual risk of a given outcome in the absence of the treatment of interest, i.e. it should have a known and favourable safety profile with respect to the events of interest and ideally represent the background risk in the diseased but untreated (for example, safety of newer antibiotics in pregnancy in relation to risk of congenital malformations could be compared against that of penicillin, which is not known to be teratogenic). For newly marketed medicines in particular, no active comparator with ideal comparability may be available, because prescribing of newly marketed medicines may be driven to a greater extent by patients’ prognostic characteristics than prescribing of established medicines (early users may be either sicker or healthier than all patients with the indication). This also applies to comparative effectiveness studies as described in Assessing the comparative effectiveness of newly marketed medications: methodological challenges and implications for drug development (Clin Pharmacol Ther 2011;90(6):777-90) and in Newly marketed medications present unique challenges for nonrandomized comparative effectiveness analyses (J Comp Eff Res 2012;1(2):109-11). Other challenges include treatment effect heterogeneity as patient characteristics of users evolve over time, and low precision owing to slow drug uptake.
An approach to controlling for a large number of confounding variables is to summarise them in a single multivariable confounder score. Stratification by a multivariate confounder score (Am J Epidemiol 1976;104(6):609-20) shows how control for confounding may be based on stratification by the score. An example is a disease risk score (DRS) that estimates the probability or rate of disease occurrence conditional on being unexposed. The association between exposure and disease is then estimated with adjustment for the disease risk score in place of the individual covariates.
DRSs are, however, difficult to estimate if outcomes are rare. Use of disease risk scores in pharmacoepidemiologic studies (Stat Methods Med Res 2009;18(1):67-80) includes a detailed description of their construction and use, a summary of simulation studies comparing their performance to traditional models, a comparison of their utility with that of propensity scores, and some further topics for future research. Disease risk score as a confounder summary method: systematic review and recommendations (Pharmacoepidemiol Drug Saf 2013;22(2):122-9) examines trends in the use and application of DRS as a confounder summary method and shows that large variation exists with differences in terminology and methods used.
In Role of disease risk scores in comparative effectiveness research with emerging therapies (Pharmacoepidemiol Drug Saf 2012;21 Suppl 2:138–47) it is argued that DRS may have a place when studying drugs that are recently introduced to the market. In such situations, as characteristics of users change rapidly, exposure propensity scores may prove highly unstable. DRSs based mostly on biological associations would be more stable. However, DRS models are still sensitive to misspecification as discussed in Adjusting for Confounding in Early Postlaunch Settings: Going Beyond Logistic Regression Models (Epidemiology 2016;27(1):133-42).
Databases used in pharmacoepidemiological studies often include records of prescribed medications and encounters with medical care providers, from which one can construct surrogate measures for both drug exposure and covariates that are potential confounders. It is often possible to track day-by-day changes in these variables. However, while this information can be critical for study success, its volume can pose challenges for statistical analysis.
A propensity score (PS) is analogous to the disease risk score in that it combines a large number of possible confounders into a single variable (the score). The exposure propensity score (EPS) is the conditional probability of exposure to a treatment given observed covariates. In a cohort study, matching or stratifying treated and comparison subjects on EPS tends to balance all of the observed covariates. However, unlike random assignment of treatments, the propensity score may not balance unobserved covariates. Invited Commentary: Propensity Scores (Am J Epidemiol 1999;150(4):327–33) reviews the uses and limitations of propensity scores and provides a brief outline of the associated statistical theory. The authors present results of adjustment by matching or stratification on the propensity score.
High-dimensional Propensity Score Adjustment in Studies of Treatment Effects Using Healthcare Claims Data (Epidemiology 2009; 20(4):512-22) discusses the high dimensional propensity score (hd-PS) model approach. It attempts to empirically identify large numbers of potential confounders in healthcare databases and, by doing so, to extract more information on confounders and proxies. Covariate selection in high-dimensional propensity score analyses of treatment effects in small samples (Am J Epidemiol 2011;173(12):1404-13) evaluates the relative performance of hd-PS in smaller samples. Confounding adjustment via a semi-automated high-dimensional propensity score algorithm: an application to electronic medical records (Pharmacoepidemiol Drug Saf 2011;20(8):849-57) evaluates the use of hd-PS in a primary care electronic medical record database. In addition, the article Using high-dimensional propensity scores to automate confounding control in a distributed medical product safety surveillance system (Pharmacoepidemiol Drug Saf 2012;21(S1):41-9) summarises the application of this method for automating confounding control in sequential cohort studies as applied to safety monitoring systems using healthcare databases and also discusses the strengths and limitations of hd-PS.
Most cohort studies match patients 1:1 on the propensity score. Increasing the matching ratio may increase precision but also bias. One-to-many propensity score matching in cohort studies (Pharmacoepidemiol Drug Saf 2012;21(S2):69-80) tests several methods for 1:n propensity score matching in simulation and empirical studies and recommends using a variable ratio that increases precision at a small cost of bias. Matching by propensity score in cohort studies with three treatment groups (Epidemiology 2013;24(3):401-9) develops and tests a 1:1:1 propensity score matching approach offering a way to compare three treatment options.
The use of several measures of balance for developing an optimal propensity score model is described in Measuring balance and model selection in propensity score methods (Pharmacoepidemiol Drug Saf 2011;20(11):1115-29) and further evaluated in Propensity score balance measures in pharmacoepidemiology: a simulation study (Pharmacoepidemiol Drug Saf 2014;23(8):802-11). In most situations, the standardised difference performs best and is easy to calculate (see Balance measures for propensity score methods: a clinical example on beta-agonist use and the risk of myocardial infarction (Pharmacoepidemiol Drug Saf 2011;20(11):1130-7) and Reporting of covariate selection and balance assessment in propensity score analysis is suboptimal: a systematic review (J Clin Epidemiol 2015;68(2):112-21)). Metrics for covariate balance in cohort studies of causal effects (Stat Med 2014;33:1685-99) shows in a simulation study that the c-statistic of the PS model after matching and the general weighted difference perform as well as the standardised difference and are preferred when an overall summary measure of balance is requested. Treatment effects in the presence of unmeasured confounding: dealing with observations in the tails of the propensity score distribution--a simulation study (Am J Epidemiol 2010; 172(7):843-54) demonstrates how ‘trimming’ of the propensity score eliminates subjects who are treated contrary to prediction and their exposed/unexposed counterparts, thereby reducing bias by unmeasured confounders.
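The standardised difference discussed above can be computed in a few lines. The sketch below uses the common definition for a continuous covariate (difference in means divided by the square root of the average of the two group variances); the function name and the example ages are hypothetical.

```python
import math
from statistics import mean, variance


def standardised_difference(treated, comparison):
    """Standardised difference of a continuous covariate between two groups.

    Difference in group means divided by the square root of the average of
    the two sample variances. Values close to 0 indicate good balance.
    """
    pooled_sd = math.sqrt((variance(treated) + variance(comparison)) / 2)
    if pooled_sd == 0:
        raise ValueError("covariate has no variation in either group")
    return (mean(treated) - mean(comparison)) / pooled_sd


# Hypothetical ages before and after propensity score matching.
age_treated = [70, 72, 68, 75, 71]
age_comparison_unmatched = [60, 65, 63, 62, 66]
age_comparison_matched = [69, 73, 67, 74, 72]

print("before matching:", standardised_difference(age_treated, age_comparison_unmatched))
print("after matching: ", standardised_difference(age_treated, age_comparison_matched))
```

Unlike a significance test, this measure does not depend on sample size, which makes it comparable before and after matching.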
Performance of propensity score calibration--a simulation study (Am J Epidemiol 2007;165(10):1110-8) introduces ‘propensity score calibration’ (PSC). This technique combines propensity score matching methods with measurement error regression models to address confounding by variables unobserved in the main study. This is done by using additional covariate measurements observed in a validation study, which is often a subset of the main study.
Although in most situations propensity score models, with the exception of hd-PS, do not have any advantages over conventional multivariate modelling in terms of adjustment for identified confounders, several other benefits may be derived. Propensity score methods may help to gain insight into determinants of treatment including age, frailty and comorbidity and to identify individuals treated against expectation. A statistical advantage of PS analyses is that if exposure is not infrequent it is possible to adjust for a large number of covariates even if outcomes are rare, a situation often encountered in drug safety research. Furthermore, assessment of the PS distribution may reveal non-positivity. An important limitation of PS is that it is not directly amenable to case-control studies. A critical assessment of propensity scores is provided in Propensity scores: from naive enthusiasm to intuitive understanding (Stat Methods Med Res 2012;21(3):273-93).
Instrumental variable (IV) analysis is an approach to address uncontrolled confounding in comparative studies. An introduction to instrumental variables for epidemiologists (Int J Epidemiol 2000;29(4):722-9) introduces IV methods, illustrated by an application to non-parametric adjustment for non-compliance in randomised trials. The author mentions a number of caveats but concludes that IV corrections can be valuable in many situations. IV analysis in comparative safety and effectiveness research is reviewed in Instrumental variable methods in comparative safety and effectiveness research (Pharmacoepidemiol Drug Saf 2010; 19(6):537-54). A review of IV analysis for observational comparative effectiveness studies suggested that, in the large majority of studies in which IV analysis was applied, one of the assumptions could be violated (Potential bias of instrumental variable analyses for observational comparative effectiveness research, Ann Intern Med. 2014;161(2):131-8).
A proposal for reporting instrumental variable analyses has been suggested in Commentary: how to report instrumental variable analyses (suggestions welcome) (Epidemiology 2013;24(3):370-4). In particular the type of treatment effect (average treatment effect/homogeneity condition or local average treatment effect/monotonicity condition) and the testing of critical assumptions for valid IV analyses should be reported. In support of these guidelines, the standardized difference has been proposed to falsify the assumption that confounders are not related to the instrumental variable (Quantitative falsification of instrumental variables assumption using balance measures, Epidemiology 2014;25(5):770-2).
The complexity of the issues associated with confounding by indication, channelling and selective prescribing is explored in Evaluating short-term drug effects using a physician-specific prescribing preference as an instrumental variable (Epidemiology 2006;17(3):268-75). A conventional, adjusted multivariable analysis showed a higher risk of gastrointestinal toxicity for selective COX-2-inhibitors than for traditional NSAIDs, which was at odds with results from clinical trials. However, a physician-level instrumental variable approach (a time-varying estimate of a physician’s relative preference for a given drug, where at least two therapeutic alternatives exist) yielded evidence of a protective effect due to COX-2-inhibitor exposure, particularly for shorter term drug exposures. Despite the potential benefits of physician-level IVs, their performance can vary across databases and strongly depends on the definition of IV used, as discussed in Evaluating different physician's prescribing preference based instrumental variables in two primary care databases: a study of inhaled long-acting beta2-agonist use and the risk of myocardial infarction (Pharmacoepidemiol Drug Saf 2016;25 Suppl 1:132-41).
Instrumental variable methods in comparative safety and effectiveness research (Pharmacoepidemiol Drug Saf 2010;19(6):537–54) provides practical guidance on IV analyses in pharmacoepidemiology. Instrumental variable methods for causal inference (Stat Med 2014;33(13):2297-340) is a tutorial, including statistical code for performing IV analysis.
An important limitation of IV analysis is that weak instruments (small association between IV and exposure) lead to decreased statistical efficiency and biased IV estimates as detailed in Instrumental variables: application and limitations (Epidemiology 2006;17:260-7). For example, in the above mentioned study on non-selective NSAIDs and COX-2-inhibitors, the confidence intervals for IV estimates were in the order of five times wider than with conventional analysis. Performance of instrumental variable methods in cohort and nested case-control studies: a simulation study (Pharmacoepidemiol Drug Saf 2014;23(2):165-77) demonstrated that a stronger IV-exposure association is needed in nested case-control studies compared to cohort studies in order to achieve the same bias reduction. Increasing the number of controls reduces this bias from IV analysis with relatively weak instruments.
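The role of instrument strength can be seen in the simplest IV estimator, the Wald estimator for a binary instrument, in which the instrument-outcome association is divided by the instrument-exposure association. The sketch below is an illustration under that assumption (for example, a dichotomised prescribing preference); the function name and the data are hypothetical, and the cited studies use more elaborate estimators.

```python
from statistics import mean


def wald_iv_estimate(instrument, exposure, outcome):
    """Wald IV estimate for a binary instrument.

    Returns (effect, strength), where
      strength = E[exposure | Z=1] - E[exposure | Z=0]
      effect   = (E[outcome | Z=1] - E[outcome | Z=0]) / strength
    A strength close to 0 (weak instrument) makes the effect estimate unstable.
    """
    z1 = [i for i, z in enumerate(instrument) if z == 1]
    z0 = [i for i, z in enumerate(instrument) if z == 0]
    if not z1 or not z0:
        raise ValueError("both instrument levels must be present")
    strength = mean(exposure[i] for i in z1) - mean(exposure[i] for i in z0)
    if strength == 0:
        raise ValueError("instrument is not associated with exposure")
    reduced_form = mean(outcome[i] for i in z1) - mean(outcome[i] for i in z0)
    return reduced_form / strength, strength


# Hypothetical data: Z = physician prefers drug A, X = patient received drug A,
# Y = patient had the outcome.
Z = [1, 1, 1, 1, 0, 0, 0, 0]
X = [1, 1, 1, 0, 1, 0, 0, 0]
Y = [1, 1, 0, 0, 1, 0, 0, 0]
effect, strength = wald_iv_estimate(Z, X, Y)
print("instrument strength:", strength)
print("IV risk difference:", effect)
```

Because the denominator is the instrument strength, any small bias or sampling error in the numerator is magnified when the instrument is weak.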
Selecting on treatment: a pervasive form of bias in instrumental variable analyses (Am J Epidemiol 2015;181(3):191-7) warns against bias in IV analysis by including only a subset of possible treatment options.
Another method proposed to control for unmeasured confounding is the Prior Event Rate Ratio (PERR) adjustment method, in which the effect of exposure is estimated using the ratio of rate ratios (RRs) from periods before and after initiation of a drug exposure, as discussed in Replicated studies of two randomized trials of angiotensin converting enzyme inhibitors: further empiric validation of the ‘prior event rate ratio’ to adjust for unmeasured confounding by indication (Pharmacoepidemiol Drug Saf 2008;17(7):671-685). For example, when a new drug is launched, direct estimation of the drug's effect observed in the period after launch is potentially confounded. Differences in event rates in the period before the launch between future users and future non-users may provide a measure of the amount of confounding present. By dividing the effect estimate from the period after launch by the effect obtained in the period before launch, the confounding in the second period can be adjusted for. This method requires that confounding effects are constant over time, that there is no confounder-by-treatment interaction, and that outcomes are non-lethal events.
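The arithmetic of the PERR adjustment described above can be sketched as follows. The function names and the event counts are hypothetical; the sketch computes crude rate ratios only and does not provide confidence intervals.

```python
def rate(events, person_years):
    """Crude incidence rate."""
    return events / person_years


def perr_adjusted_rate_ratio(prior_users, prior_non_users, post_users, post_non_users):
    """PERR-adjusted rate ratio.

    Each argument is a tuple (events, person_years).
    'prior' refers to the period before drug initiation (or launch) and
    'post' to the period after it. 'users' are (future) users of the drug.
    """
    rr_prior = rate(*prior_users) / rate(*prior_non_users)  # confounding only
    rr_post = rate(*post_users) / rate(*post_non_users)     # confounding and drug effect
    return rr_post / rr_prior


# Hypothetical data: future users already had twice the event rate of
# future non-users before the drug was started.
adjusted = perr_adjusted_rate_ratio(
    prior_users=(40, 1000), prior_non_users=(20, 1000),
    post_users=(30, 1000), post_non_users=(20, 1000),
)
print("PERR-adjusted rate ratio:", round(adjusted, 2))
```

In this example the crude post-initiation rate ratio of 1.5 would suggest harm, whereas dividing by the prior rate ratio of 2.0 gives an adjusted ratio below 1.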
Performance of prior event rate ratio adjustment method in pharmacoepidemiology: a simulation study (Pharmacoepidemiol Drug Saf 2015;24(5):468-77) discusses that the PERR adjustment method can help to reduce bias as a result of unmeasured confounding in certain situations but that theoretical justification of assumptions should be provided.
Methods for dealing with time-dependent confounding (Stat Med. 2013;32(9):1584-618) provides an overview of how time-dependent confounding can be handled in the analysis of a study. It provides an in-depth discussion of marginal structural models and g-computation.
Beyond G-estimation and the Marginal Structural Model (MSM) described below, traditional and efficient approaches to deal with time-dependent variables should be considered in the design of the study, such as nested case-control studies with assessment of time-varying exposure windows.
G-estimation is a method for estimating the joint effects of time-varying treatments using ideas from instrumental variables methods. G-estimation of Causal Effects: Isolated Systolic Hypertension and Cardiovascular Death in the Framingham Heart Study (Am J Epidemiol 1998;148(4):390-401) demonstrates how the G-estimation procedure allows for appropriate adjustment of the effect of a time-varying exposure in the presence of time-dependent confounders that are themselves influenced by the exposure.
The use of Marginal Structural Models can be an alternative to G-estimation. Marginal Structural Models and Causal Inference in Epidemiology (Epidemiology 2000;11(5):550-60) introduces a class of causal models that allow for improved adjustment for confounding in situations of time-dependent confounding.
MSMs have two major advantages over G-estimation. First, although G-estimation is useful for survival time outcomes, continuous outcomes and Poisson count outcomes, logistic G-estimation cannot be conveniently used to estimate the effect of treatment on dichotomous outcomes unless the outcome is rare. Second, MSMs resemble standard models, whereas G-estimation does not (see Marginal Structural Models to Estimate the Causal Effect of Zidovudine on the Survival of HIV-Positive Men. Epidemiology 2000;11(5):561-70).
Effect of highly active antiretroviral therapy on time to acquired immunodeficiency syndrome or death using marginal structural models (Am J Epidemiol 2003;158(7):687-94) provides a clear example in which standard Cox analysis failed to detect a clinically meaningful net benefit of treatment because it does not appropriately adjust for time-dependent covariates that are simultaneously confounders and intermediate variables. This net benefit was shown using a marginal structural survival model. In Time-dependent propensity score and collider-stratification bias: an example of beta2-agonist use and the risk of coronary heart disease (Eur J Epidemiol 2013;28(4):291-9), various methods to control for time-dependent confounding are compared in an empirical study on the association between inhaled beta-2-agonists and the risk of coronary heart disease. MSMs resulted in slightly reduced associations compared to standard Cox-regression.
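Marginal structural models are commonly fitted by inverse probability of treatment weighting. The sketch below shows how stabilised weights accumulate over visits for one subject. It assumes that the two sets of probabilities have already been estimated from separate treatment models (for example pooled logistic regressions, not shown here); the function name and the probabilities are hypothetical.

```python
def stabilised_weights(numerator_probs, denominator_probs):
    """Cumulative stabilised inverse-probability-of-treatment weights.

    numerator_probs[t]:   probability of the treatment actually received at
                          visit t given past treatment only
    denominator_probs[t]: the same probability given past treatment and
                          time-dependent confounders
    Returns the weight at each visit (product of the ratios up to that visit).
    """
    if len(numerator_probs) != len(denominator_probs):
        raise ValueError("probability lists must have the same length")
    weights = []
    w = 1.0
    for p_num, p_den in zip(numerator_probs, denominator_probs):
        if not (0 < p_num <= 1 and 0 < p_den <= 1):
            raise ValueError("probabilities must lie in (0, 1]")
        w *= p_num / p_den
        weights.append(w)
    return weights


# Hypothetical subject with two visits: at visit 1 the observed treatment was
# unlikely given the covariates, so this subject is up-weighted.
print(stabilised_weights([0.5, 0.8], [0.25, 0.8]))
```

The weighted population mimics one in which treatment at each visit is unrelated to the measured time-dependent confounders, so that a standard model fitted to it estimates the marginal structural model parameters.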
One may test the validity of putative causal associations by using control exposures or outcomes. Well-chosen positive and negative controls help convince the investigator that the data at hand correctly detect existing associations or correctly demonstrate lack of association when none is expected. Positive controls that turn out negative, or negative controls that turn out positive, may signal the presence of bias, as illustrated in a study demonstrating healthy adherer bias by showing that adherence to statins was associated with decreased risks of biologically implausible outcomes (Statin adherence and risk of accidents: a cautionary tale, Circulation 2009;119(15):2051-7). The general principle, with additional examples, is described in Control Outcomes and Exposures for Improving Internal Validity of Nonrandomized Studies (Health Serv Res 2015;50(5):1432-51).
Selecting drug-event combinations as reliable controls poses a challenge: it is difficult to establish for negative controls proof of absence of an association, and it is still more problematic to select positive controls because it is desirable not only to establish an association but also an accurate estimate of the effect size. This has led to attempts to establish libraries of controls that can be used to characterise the performance of different observational datasets in detecting various types of association using a number of different study designs. Although this kind of control may be questioned according to Evidence of Misclassification of Drug-Event Associations Classified as Gold Standard 'Negative Controls' by the Observational Medical Outcomes Partnership (OMOP) (Drug Saf 2016;39(5):421-32), the approach of calibrating the performance of epidemiological methods prior to performing a study holds the promise of providing a trustworthy framework for interpretation of the results, as shown by Interpreting observational studies: Why empirical calibration is needed to correct p-values (Stat Med. 2014;33(2):209-18), Robust empirical calibration of p-values using observational data (Stat Med 2016;35(22):3883-8) and Empirical confidence interval calibration for population-level effect estimation studies in observational healthcare data (Proc Natl Acad Sci USA 2018;115(11):571-7).
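The idea of empirical calibration can be sketched in a much simplified form: the estimates obtained for negative controls describe an empirical null distribution, and a new estimate is judged against that distribution rather than against a theoretical null centred on no effect. The sketch below is an assumption-laden simplification and not the published algorithm: it fits a normal distribution to the negative-control log rate ratios using their plain mean and standard deviation and ignores the sampling error of each negative-control estimate, which the cited methods account for. All names and numbers are hypothetical.

```python
import math
from statistics import mean, stdev


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def calibrated_p_value(log_rr, se, negative_control_log_rrs):
    """Two-sided p-value against an empirical null (simplified sketch).

    The empirical null is a normal distribution with the mean and standard
    deviation of the negative-control estimates; the standard error of the
    estimate of interest is added to the null variance.
    """
    mu = mean(negative_control_log_rrs)
    sigma = stdev(negative_control_log_rrs)
    z = (log_rr - mu) / math.sqrt(sigma ** 2 + se ** 2)
    return 2 * (1 - normal_cdf(abs(z)))


def uncalibrated_p_value(log_rr, se):
    """Conventional two-sided p-value against a null of log(RR) = 0."""
    return 2 * (1 - normal_cdf(abs(log_rr / se)))


# Hypothetical negative-control estimates that are shifted away from 0,
# suggesting systematic error in the study design or data.
negative_controls = [0.1, 0.3, 0.2, 0.4, 0.0]
print("uncalibrated p:", uncalibrated_p_value(0.5, 0.1))
print("calibrated p:  ", calibrated_p_value(0.5, 0.1, negative_controls))
```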
Triangulation is not a separate methodological approach, but rather a framework, formally described in Triangulation in aetiological epidemiology (Int J Epidemiol 2016;45(6):1866-86). Triangulation is defined as “the practice of obtaining more reliable answers to research questions through integrating results from several different approaches, where each approach has different key sources of potential bias that are unrelated to each other.” In some ways, the paper formalises approaches already used in many nonrandomised pharmacoepidemiologic studies, including control exposures and outcomes, sensitivity analyses, and comparing results from different populations and different study designs, all within the same study and while explicitly specifying the direction of bias in each approach. Triangulation was used (without using the explicit term) in Associations of maternal antidepressant use during the first trimester of pregnancy with preterm birth, small for gestational age, autism spectrum disorder, and attention-deficit/hyperactivity disorder in offspring (JAMA 2017;317(15):1553-62), whereby, within the same study, the authors used negative controls (paternal exposure to antidepressants) and assessed the association using a different study design and study population (sibling design).
Effect measure modification and interaction are often encountered in epidemiological research and it is important to recognize their occurrence. The difference between these terms is rather subtle and has been described in On the distinction between interaction and effect modification (Epidemiology 2009;20(6):863–71). Effect measure modification occurs when the measure of an effect changes over values of some other variable (which does not necessarily need to be a causal factor). Interaction occurs when two exposures contribute to the causal effect of interest, and they are both causal factors. Interaction is generally studied in order to clarify etiology while effect modification is used to identify populations that are particularly susceptible to the exposure of interest.
To check the presence of an effect measure modifier, one can stratify the study population by a certain variable, e.g. by gender, and compare the effects in these subgroups. It is recommended to perform a formal statistical test to assess if there are statistically significant differences between subgroups for the effects (see CONSORT 2010 Explanation and Elaboration: Updated guidelines for reporting parallel group randomised trials, J Clin Epidemiol 2010;63(8):e1-37) and Interaction revisited: the difference between two estimates (BMJ 2003;326(7382):219). The study report should explain which method was used to examine these differences and specify which subgroup analyses were predefined in the study protocol and which ones were performed while analysing the data (Strengthening the Reporting of Observational Studies in Epidemiology (STROBE): explanation and elaboration. Epidemiology 2007;18(6):805-35).
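A formal comparison of two subgroup estimates can be sketched as a z-test on the difference between the log estimates, with standard errors recovered from the reported 95% confidence intervals. The sketch assumes that the two subgroups are independent (non-overlapping strata) and that the estimates are ratio measures; the function names and the numbers are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def se_from_ci(lower, upper, z=1.96):
    """Standard error of a log ratio estimate, from its 95% confidence limits."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def compare_ratio_estimates(rr1, ci1, rr2, ci2):
    """Test whether two independent ratio estimates differ.

    Returns (ratio of the two estimates, z statistic, two-sided p-value).
    """
    diff = math.log(rr1) - math.log(rr2)
    se_diff = math.sqrt(se_from_ci(*ci1) ** 2 + se_from_ci(*ci2) ** 2)
    z = diff / se_diff
    p = 2 * (1 - normal_cdf(abs(z)))
    return math.exp(diff), z, p


# Hypothetical subgroup results: RR 2.0 (1.0-4.0) in women, RR 1.0 (0.5-2.0) in men.
ratio, z, p = compare_ratio_estimates(2.0, (1.0, 4.0), 1.0, (0.5, 2.0))
print(f"ratio of RRs = {ratio:.2f}, z = {z:.2f}, p = {p:.3f}")
```

In this example one subgroup estimate is statistically significant and the other is not, yet the test gives little evidence that the two effects differ, which is why comparing the significance of subgroup results is not a substitute for a direct test.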
The presence of effect measure modification depends on which measure is used in the study (absolute or relative) and can be measured in two ways: on an additive scale (based on risk differences [RD]), or on a multiplicative scale (based on relative risks [RR]). From the perspective of public health and clinical decision making, the additive scale is usually considered the most appropriate. An example of potential effect modifier in studies assessing the risk of occurrence of events associated with recent drug use is the past use of the same drug. This is shown in Evidence of the depletion of susceptibles effect in non-experimental pharmacoepidemiologic research (J Clin Epidemiol 1994;47(7):731-7) in the context of a hospital-based case-control study on NSAIDs and the risk of upper gastrointestinal bleeding.
For the evaluation of interaction, the standard measure is the relative excess risk due to interaction (RERI), as explained in the textbook Modern Epidemiology (K. Rothman, S. Greenland, T. Lash. 3rd Edition, Lippincott Williams & Wilkins, 2008). Other measures of interaction include the attributable proportion (A) and the synergy index (S). According to Exploring interaction effects in small samples increases rates of false-positive and false-negative findings: results from a systematic review and simulation study (J Clin Epidemiol 2014; 67(7):821-9), with sufficient sample size, most interaction tests perform similarly with regard to type 1 error rates and power.
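The three measures named above follow directly from the relative risks for each exposure alone and for both together, all expressed against the doubly unexposed group. The sketch below uses the standard formulas; the function name, the dictionary keys and the example relative risks are hypothetical, and confidence intervals are not computed.

```python
def interaction_measures(rr10, rr01, rr11):
    """Interaction measures from relative risks with a single reference category.

    rr10: RR for exposure A only, rr01: RR for exposure B only,
    rr11: RR for both exposures; reference = neither exposure.
    """
    reri = rr11 - rr10 - rr01 + 1                # relative excess risk due to interaction
    ap = reri / rr11                             # attributable proportion
    excess_separate = (rr10 - 1) + (rr01 - 1)
    synergy = (rr11 - 1) / excess_separate if excess_separate != 0 else float("nan")
    multiplicative = rr11 / (rr10 * rr01)        # ratio of RRs (multiplicative scale)
    return {
        "RERI": reri,
        "AP": ap,
        "S": synergy,
        "ratio_of_RRs": multiplicative,
    }


# Hypothetical relative risks.
for name, value in interaction_measures(rr10=2.0, rr01=3.0, rr11=7.0).items():
    print(f"{name}: {value:.3f}")
```

No additive interaction corresponds to RERI = 0, AP = 0 and S = 1, and no multiplicative interaction to a ratio of RRs of 1. With these example values the joint effect is more than additive and also slightly more than multiplicative.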
Due to confusion about these terms, it is important that effect measure modification and interaction analyses are presented in a way that is easy to interpret and allows readers to reproduce the analysis. For recommendations regarding reporting, Strengthening the Reporting of Observational Studies in Epidemiology (STROBE): explanation and elaboration (Epidemiology 2007;18(6):805-35) and Recommendations for presenting analyses of effect modification and interaction (Int J Epidemiol 2012;41(2):514-20) are useful resources. They recommend presenting the results as follows:
Separate effects (rate ratios, odds ratios or risk differences, with confidence intervals) of the exposure of interest (e.g. drug), of the effect modifier (e.g. gender) and of their joint effect using one single reference category (preferably the stratum with the lowest risk of the outcome) as suggested in Estimating measures of interaction on an additive scale for preventive exposures (Eur J Epidemiol 2011;26(6):433-8), as this gives enough information to the reader to calculate effect modification on an additive and multiplicative scale;
Effects of the exposure within strata of the potential effect modifier;
Measures of effect modification on both additive (e.g. RERI) and multiplicative (e.g. ratio of relative risks) scales, including confidence intervals;
List of the confounders for which the association between exposure and outcome was adjusted.
Ecological analyses are not hypothesis testing but hypothesis generating studies. As illustrated in Control without separate controls: evaluation of vaccine safety using case-only methods (Vaccine 2004; 22(15-16):2064-70), they assume that a strong correlation between the trend in an indicator of an exposure (vaccine coverage in this example) and the trend in incidence of a disease (trends calculated over time or across geographical regions) is consistent with a causal relationship. Such comparisons at the population level may only generate hypotheses as they do not allow controlling for time-related confounding variables, such as age and seasonal factors. Moreover, they do not establish that the vaccine effect occurred in the vaccinated individuals.
Case-population studies are a form of ecological studies where cases are compared to an aggregated comparator consisting of population data. The case-population study design: an analysis of its application in pharmacovigilance (Drug Saf 2011;34(10):861-8) explains its design and its application in pharmacovigilance for signal generation and drug surveillance. The design is also explained in Chapter 2: Study designs in drug utilization research of the textbook Drug Utilization Research - Methods and Applications (M Elseviers, B Wettermark, AB Almarsdóttir, et al. Editors. Wiley Blackwell, 2016). An example is a multinational case-population study aiming to estimate population rates of a suspected adverse event using national sales data (see Transplantation for Acute Liver Failure in Patients Exposed to NSAIDs or Paracetamol (Acetaminophen), Drug Saf 2013;36(2):135–44). Based on the same study, Choice of the denominator in case population studies: event rates for registration for liver transplantation after exposure to NSAIDs in the SALT study in France (Pharmacoepidemiol Drug Saf 2013;22(2):160-7) compared sales data and healthcare insurance data as denominators to estimate population exposure and found large differences in the event rates. Choosing the wrong denominator in case-population studies might generate erroneous results. The choice of the right denominator depends not only on a valid data source but also on the hazard function of the adverse event.
A pragmatic attitude towards case-population studies is recommended: in situations where nation-wide or region-wide electronic health records (EHRs) are available and allow assessing the outcomes and confounders with sufficient validity, a case-population approach is neither necessary nor desirable, as one can perform a population-based cohort or case-control study with adequate control for confounding. In situations where outcomes are difficult to ascertain in EHRs or where such databases do not exist, the case-population design might give an approximation of the absolute and relative risk when both events and exposures are rare. This approximation is limited by the ecological nature of the reference data, which restricts the ability to control for confounding.
RCTs are considered the gold standard for demonstrating the efficacy of medicinal products and for obtaining an initial estimate of the risk of adverse outcomes. However, as is well understood, these data are not necessarily indicative of the benefits, risks or comparative effectiveness of an intervention when used in clinical practice populations. The IMI GetReal Glossary defines a pragmatic clinical trial as ‘a study comparing several health interventions among a randomised, diverse population representing clinical practice, and measuring a broad range of health outcomes’. Pragmatic clinical trials focus on evaluating the benefits and risks of treatments in patient populations and settings that are more representative of routine clinical practice. To ensure generalisability, pragmatic trials should represent the patients to whom the treatment will be applied. For instance, inclusion criteria would be broad (e.g. allowing co-morbidity, co-medication, wider age range) and the follow-up would be minimised and allow for treatment switching. Monitoring safety in a phase III real-world effectiveness trial: use of novel methodology in the Salford Lung Study (Pharmacoepidemiol Drug Saf 2017;26(3):344-352) describes the model of a phase III pragmatic clinical trial where patients were enrolled through primary care practices using minimal exclusion criteria and without extensive diagnostic testing, and where potential safety events were captured through patients’ electronic health records and in turn triggered review by the specialist safety team.
Pragmatic explanatory continuum summary (PRECIS): a tool to help trial designers (CMAJ 2009; 180(10): E45-E57) is a tool to support pragmatic trial designs and helps define and evaluate the degree of pragmatism. The PRECIS tool has been further refined and now comprises nine domains, each scored on a 5-point Likert scale ranging from very explanatory to very pragmatic, with an exclusive focus on the issue of applicability (The PRECIS-2 tool: designing trials that are fit for purpose. BMJ 2015;350: h2147). A checklist and additional guidance are also provided in Improving the reporting of pragmatic trials: an extension of the CONSORT statement (BMJ 2008; 337 (a2390): 1-8).
Large simple trials are pragmatic clinical trials with minimal data collection protocols that are narrowly focused on clearly defined outcomes important to patients as well as clinicians. Their large sample size provides adequate statistical power to detect even small differences in effects. Additionally, large simple trials include a follow-up time that mimics routine clinical practice.
Large simple trials are particularly suited when an adverse event is very rare or has a delayed latency (with a large expected attrition rate), when the population exposed to the risk is heterogeneous (e.g. different indications and age groups), when several risks need to be assessed in the same trial or when many confounding factors need to be balanced between treatment groups. In these circumstances, the cost and complexity of a traditional RCT may outweigh its advantages and large simple trials can help keep the volume and complexity of data collection to a minimum.
Outcomes that are simple and objective can also be measured from the routine process of care using epidemiological follow-up methods, for example by using questionnaires or hospital discharge records. Large simple trial methodology is discussed in Chapters 36 and 37 of the book Pharmacoepidemiology (Strom BL, Kimmel SE, Hennessy S. 5th Edition, Wiley, 2012), which includes a list of conditions appropriate for their conduct and a list of conditions which make them feasible. Examples of published large simple trials are An assessment of the safety of pediatric ibuprofen: a practitioner-based randomized clinical trial (JAMA 1995;273(12):929-33) and Comparative mortality associated with ziprasidone and olanzapine in real-world use among 18,154 patients with schizophrenia: The Ziprasidone Observational Study of Cardiac Outcomes (ZODIAC) (Am J Psychiatry 2011;168(2):193-201).
Note that the use of the term ‘simple’ in the expression ‘Large simple trials’ refers to data structure and not to data collection. It is used in relation to situations in which a small number of outcomes are measured. The term may therefore not adequately reflect the complexity of the studies undertaken.
Randomised database studies can be considered a special form of a large simple trial where patients included in the trial are enrolled in a healthcare system with electronic records. Eligible patients may be identified and flagged automatically by the software, with the advantage of allowing comparison of included and non-included patients. Database screening or record linkage can be used to detect and measure outcomes of interest otherwise assessed through the normal process of care. Patient recruitment, informed consent and proper documentation of patient information are hurdles that still need to be addressed in accordance with the applicable legislation for RCTs. Randomised database studies attempt to combine the advantages of randomisation and observational database studies. These and other aspects of randomised database studies are discussed in The opportunities and challenges of pragmatic point-of-care randomised trials using routinely collected electronic records: evaluations of two exemplar trials (Health Technol Assess. 2014;18(43):1-146) which illustrates the practical implementation of randomised studies in general practice databases.
There are few published examples of randomised database studies, but this design could become more common in the near future with the increasing computerisation of medical records. Pragmatic randomised trials using routine electronic health records: putting them to the test (BMJ 2012;344:e55) describes a project to implement randomised trials in the everyday clinical work of general practitioners, comparing treatments that are already in common use, and using routinely collected electronic healthcare records both to identify participants and to gather results.
A particular form of randomised database studies is the registry-based randomised trial, which uses an existing registry as a platform for the identification of cases, their randomisation and their follow-up. The editorial The randomized registry trial - the next disruptive technology in clinical research? (N Engl J Med 2013;369(17):1579-1581) introduces the concept. This hybrid design tries to achieve both internal and external validity by using a robust design (an RCT) in a data source with higher generalisability (registries). Examples are the TASTE trial that followed patients in the long-term using data from a Scandinavian registry (Thrombus aspiration during ST-segment elevation myocardial infarction. N Engl J Med 2013;369(17):1587-97) and A registry-based randomized trial comparing radial and femoral approaches in women undergoing percutaneous coronary intervention: the SAFE-PCI for Women (Study of Access Site for Enhancement of PCI for Women) trial (JACC Cardiovasc Interv 2014;7(8):857-67). A potential limitation of randomised registry trials is that they depend on the registry routinely collecting the outcome data needed for the trial, such as information on surrogate markers and adverse events.
Identification and integration of evidence derived from results from several studies with the same or similar research objective can extend our understanding of the research question. A systematic literature review aims to collect in a systematic and explicit manner all empirical evidence that fits pre-specified eligibility criteria to answer a specific research question and to critically appraise relevant results. A meta-analysis involves the use of statistical techniques to integrate and summarise the results of identified studies. The focus of this activity may be to learn from the diversity of designs, results and associated gaps in knowledge as well as to obtain overall risk estimates. An example of a systematic review and meta-analysis of results of individual studies with potentially different design is given in Variability in risk of gastrointestinal complications with individual NSAIDs: results of a collaborative meta-analysis (BMJ 1996;312(7046):1563-6), which compared the relative risks of serious gastrointestinal complications reported with individual NSAIDs by conducting a systematic review of twelve hospital and community based case-control and cohort studies, and found a relation between use of the drugs and admission to hospital for haemorrhage or perforation.
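As an illustration of how a meta-analysis can integrate study-level results into an overall risk estimate, the following minimal sketch pools relative risks with a fixed-effect inverse-variance model. The choice of a fixed-effect model, the function name pool_fixed_effect and the input figures are assumptions made for illustration only; they are not taken from the cited meta-analysis, and a real analysis would also need to examine heterogeneity between studies.

```python
import math

Z_95 = 1.96  # normal quantile used for two-sided 95% confidence intervals


def pool_fixed_effect(studies):
    """Pool relative risks with inverse-variance (fixed-effect) weights.

    studies: list of (rr, ci_low, ci_high) tuples with 95% confidence limits.
    Returns (pooled_rr, pooled_ci_low, pooled_ci_high).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for rr, ci_low, ci_high in studies:
        # Recover the standard error of log(RR) from the width of the 95% CI.
        se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
        weight = 1.0 / se ** 2
        weighted_sum += weight * math.log(rr)
        total_weight += weight
    pooled_log = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - Z_95 * pooled_se),
        math.exp(pooled_log + Z_95 * pooled_se),
    )


if __name__ == "__main__":
    # Hypothetical study results, not real data.
    example = [(2.0, 1.2, 3.3), (1.6, 1.1, 2.3), (2.4, 1.0, 5.8)]
    rr, low, high = pool_fixed_effect(example)
    print(f"Pooled RR {rr:.2f} (95% CI {low:.2f} to {high:.2f})")
```

Studies with narrower confidence intervals receive more weight, so the pooled estimate lies closest to the most precise studies.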
Systematic reviews and meta-analyses of observational studies and other epidemiological sources are becoming as common as those of randomised clinical trials (RCTs). Challenges in systematic reviews that assess treatment harms (Ann Intern Med 2005;142:1090-9) explains the different reasons why both are important in providing relevant information and knowledge for pharmacovigilance.
Detailed guidance on the methodological conduct of systematic reviews and meta-analyses is provided in Annex 1 of this guide. This guidance includes links to other relevant resources.
It should be noted that meta-analysis, even of RCTs, shares characteristics with observational research as subjective criteria are often involved in the selection of studies to include. Careful planning in design of a meta-analysis and pre-specification of selection criteria, outcomes and analytical methods before review of any study results may thus contribute to the confidence placed in the results. A further useful reference is the CIOMS Working Group X Guideline on Evidence Synthesis and Meta-Analysis for Drug Safety (Geneva 2016).
A general overview of methods for signal detection and recommendations for their application are provided in the report of the CIOMS Working Group VIII Practical aspects of signal detection in pharmacovigilance. Empirical results on various aspects of signal detection obtained from the IMI PROTECT project have been summarised in Good signal detection practices: evidence from IMI PROTECT (Drug Saf 2016;39:469-90).
The EU Guideline on good pharmacovigilance practices (GVP) Module IX - Signal Management defines signal management as the set of activities performed to determine whether, based on an examination of individual case safety reports (ICSRs), aggregated data from active surveillance systems or studies, literature information or other data sources, there are new risks associated with an active substance or a medicinal product or whether risks have changed. Signal management covers all steps from detecting signals (signal detection), through their validation and confirmation, analysis, prioritisation and assessment to recommending action, as well as the tracking of the steps taken and of any recommendations made.
The FDA’s Guidance for Industry-Good Pharmacovigilance Practices and Pharmacoepidemiologic Assessment provides best practice for documenting, assessing and reporting individual case safety reports and case series and for identifying, evaluating, investigating and interpreting safety signals, including recommendations on data mining techniques and use of pharmacoepidemiological studies.
Quantitative analysis of spontaneous adverse drug reaction reports is routinely used in drug safety research. Quantitative signal detection using spontaneous ADR reporting (Pharmacoepidemiol Drug Saf 2009;18:427-36) describes the core concepts behind the most common methods: the proportional reporting ratio (PRR), reporting odds ratio (ROR), information component (IC) and empirical Bayes geometric mean (EBGM). The authors also discuss the role of Bayesian shrinkage in screening spontaneous reports and the importance of changes over time in screening the properties of the measures. Additionally, they discuss major areas of controversy (such as stratification and evaluation and implementation of methods) and give some suggestions as to where emerging research is likely to lead. Data mining for signals in spontaneous reporting databases: proceed with caution (Pharmacoepidemiol Drug Saf 2007;16(4):359–65) reviews data mining methodologies and their limitations and provides useful points to consider before incorporating data mining as a routine component of any pharmacovigilance program.
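To make these measures concrete, the sketch below computes the PRR, the ROR with a 95% confidence interval, and a shrunk IC from the usual 2x2 table of report counts. The function name, the example counts and the specific shrinkage form log2((observed + 0.5) / (expected + 0.5)) are assumptions chosen for illustration; operational systems may use different shrinkage priors and thresholds. The EBGM is omitted because it requires fitting a prior distribution to the whole database.

```python
import math


def disproportionality(a, b, c, d):
    """Disproportionality measures from a 2x2 table of spontaneous reports.

    a: reports with the drug of interest and the event of interest
    b: reports with the drug of interest and any other event
    c: reports with all other drugs and the event of interest
    d: reports with all other drugs and any other event
    All four counts must be greater than zero in this simple sketch.
    """
    n = a + b + c + d
    # PRR: proportion of the drug's reports that mention the event, relative
    # to the same proportion among all other drugs.
    prr = (a / (a + b)) / (c / (c + d))
    # ROR: odds of the event among the drug's reports versus other drugs.
    ror = (a * d) / (b * c)
    se_log_ror = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ror_ci95 = (
        math.exp(math.log(ror) - 1.96 * se_log_ror),
        math.exp(math.log(ror) + 1.96 * se_log_ror),
    )
    # IC: log2 of observed over expected count, with +0.5 added to both as a
    # simple shrinkage that pulls estimates based on few reports towards zero.
    expected = (a + b) * (a + c) / n
    ic = math.log2((a + 0.5) / (expected + 0.5))
    return {"prr": prr, "ror": ror, "ror_ci95": ror_ci95, "ic": ic}


if __name__ == "__main__":
    # Hypothetical counts, not taken from any real database.
    for name, value in disproportionality(20, 80, 100, 9800).items():
        print(name, value)
```

With small counts the shrunk IC stays closer to zero than the PRR or ROR would suggest, which is the practical effect of Bayesian shrinkage described above.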
The revised guidance on Routine signal detection methods in EudraVigilance describes methods (statistical and clinical information based) for screening adverse reactions that are used by the European Medicines Agency, national competent authorities and Marketing Authorisation Holders. For the methods recommended, it addresses elements of their interpretation, their potential advantages and limitations and the evidence behind them. Areas of uncertainty that require resolution before firm recommendations can be made are also mentioned.
Methods such as multiple logistic regression (that may use propensity score-adjustment) have the theoretical capability to reduce masking and confounding by co-medication and underlying disease. The letter Logistic regression in signal detection: another piece added to the puzzle (Clin Pharmacol Ther 2013;94 (3):312) highlights the variability of results obtained in different studies based on this method and the daunting computational task it requires. More work is needed on its value for pharmacovigilance in the real world setting.
A more recent proposal involves a broadening of the basis for computational screening of individual case safety reports, by considering multiple aspects of the strength of evidence in a predictive model. This approach combines disproportionality analysis with features such as the number of well-documented reports, the number of recent reports and geographical spread of the case series (Improved statistical signal detection in pharmacovigilance by combining multiple strength-of-evidence aspects in vigiRank. Drug Saf 2014;37(8):617–28). In a similar spirit, logistic regression has been proposed to combine a disproportionality measure with a measure of unexpectedness for the time-to-onset distribution (Use of logistic regression to combine two causality criteria for signal detection in vaccine spontaneous report data, Drug Saf 2014;37(12):1047-57).
Disproportionality methods are usually calculated on the cumulative data and therefore do not provide a direct insight into temporal changes in frequency of reports. Methodologies to monitor changes in the frequency of reporting over time have been developed with the focus to enhance pharmacovigilance when databases are small, when drugs have established safety profiles and/or when product quality defects, medication errors and cases of abuse or misuse are of concern.
Automated method for detecting increases in frequency of spontaneous adverse event reports over time (J Biopharm Stat. 2013; 23(1):161-77) presents a regression method with both smooth trend and seasonal components, while in An algorithm to detect unexpected increases in frequency of reports of adverse events in EudraVigilance (Pharmacoepidemiol Drug Saf 2018;27(1):38-45) a negative binomial time-series regression model was tested on thirteen historical concerns. Additionally, a modification of the Information Component to screen for spatial-temporal disproportionality is described in Using VigiBase to Identify Substandard Medicines: Detection Capacity and Key Prerequisites (Drug Saf 2015; 38(4): 373–382). Although these methods are theoretically appealing and have shown promising results, limited work has been performed to assess their effectiveness. Thus, these methods might be implemented with ongoing quality control measures to ensure acceptable performance.
As understanding increases regarding the mechanisms at a molecular level that are involved in adverse effects of drugs, it would be expected that this information will inform efforts to predict and detect drug safety problems. Such modelling is currently at an early stage, as presented in Data-driven prediction of drug effects and interactions (Sci Transl Med 2012;4(125):125ra31), but should be a major focus of drug safety research activity. An example of an application of this concept is illustrated in the paper Cheminformatics-aided pharmacovigilance: application to Stevens-Johnson Syndrome (J Am Med Inform Assoc 2016; 23(5): 968–78) where the authors apply a Quantitative Structure-Activity Relationship (QSAR) model to predict the drugs associated with Stevens-Johnson syndrome in a pharmacovigilance database.
The role of data mining in pharmacovigilance (Expert Opin Drug Saf 2005;4(5):929-48) explains how signal detection algorithms work and addresses questions regarding their validation, comparative performance, limitations and potential for use and misuse in pharmacovigilance.
An empirical evaluation of several disproportionality methods in a number of different spontaneous reporting databases is given in Comparison of statistical detection methods within and across spontaneous reporting databases (Drug Saf 2015;38(6):577-87).
Performance of pharmacovigilance signal detection algorithms for the FDA adverse event reporting system (Clin Pharmacol Ther 2013;93(6):539-46) describes the performance of signal-detection algorithms for spontaneous reports in the US FDA adverse event reporting system against a benchmark constructed by the Observational Medical Outcomes Partnership (OMOP). It concludes that logistic regression performs better than traditional disproportionality analysis. Other studies have addressed similar or related questions, for example Large-scale regression-based pattern discovery: The example of screening the WHO global drug safety database (Stat Anal Data Min 2010;3(4):197–208), Are all quantitative postmarketing signal detection methods equal? Performance characteristics of logistic regression and Multi-item Gamma Poisson Shrinker (Pharmacoepidemiol Drug Saf 2012;21(6):622–630) and Data-driven prediction of drug effects and interactions (Sci Transl Med 2012;4(125):125ra31).
Many statistical signal detection algorithms disregard the underlying diversity and give equal weight to reports on all patients when computing the expected number of reports for a drug-event pair. This may render them vulnerable to confounding and distortions due to effect modification, and could result in true signals being masked or false associations being flagged as potential signals. Stratification and/or subgroup analyses might address these issues, and whereas stratification is implemented in some standard software packages, routine use of subgroup analyses is less common. Performance of stratified and subgrouped disproportionality analyses in spontaneous databases (Drug Saf 2016; 39(4):355-364) performed a comparison across a range of spontaneous report databases and covariates and found subgroup analyses to improve first pass signal detection, whereas stratification did not; subgroup analyses by patient age and country of origin were found to bring greatest value.
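The difference between the two approaches can be sketched as follows. A stratified analysis produces a single adjusted estimate across strata (here a Mantel-Haenszel pooled reporting odds ratio, which is an assumption chosen for illustration and not necessarily the adjustment used in the cited study), whereas a subgroup analysis computes a separate estimate within each stratum. The function names and the counts by age group are hypothetical.

```python
def ror(a, b, c, d):
    """Reporting odds ratio for one 2x2 table (a, b, c, d all above zero)."""
    return (a * d) / (b * c)


def mantel_haenszel_ror(strata):
    """Stratified analysis: one pooled ROR adjusted for the stratifying factor.

    strata: iterable of (a, b, c, d) tables, one per stratum.
    """
    strata = list(strata)
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


def subgroup_rors(tables_by_label):
    """Subgroup analysis: a separate ROR within each stratum."""
    return {label: ror(*table) for label, table in tables_by_label.items()}


if __name__ == "__main__":
    # Hypothetical counts (a, b, c, d) for one drug-event pair by age group.
    tables = {
        "children": (10, 90, 20, 1880),
        "adults": (10, 490, 200, 9300),
    }
    print("Stratified (Mantel-Haenszel):", mantel_haenszel_ror(tables.values()))
    for label, value in subgroup_rors(tables).items():
        print(f"Subgroup {label}:", value)
```

In this made-up example the pooled estimate is modest, while the subgroup analysis shows strongly disproportionate reporting in children only, which is the kind of effect modification that a single adjusted figure can hide.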
Masking is a statistical issue by which true signals of disproportionate reporting are hidden by the presence of other products in the database. While it is not currently perfectly understood, publications have described methods assessing the extent and impact of the masking effect of measures of disproportionality. They include A conceptual approach to the masking effect of measures of disproportionality (Pharmacoepidemiol Drug Saf 2014;23(2):208-17), with an application described in Assessing the extent and impact of the masking effect of disproportionality analyses on two spontaneous reporting systems databases (Pharmacoepidemiol Drug Saf 2014;23(2):195-207), Outlier removal to uncover patterns in adverse drug reaction surveillance - a simple unmasking strategy (Pharmacoepidemiol Drug Saf 2013;22(10):1119-29) and A potential event-competition bias in safety signal detection: results from a spontaneous reporting research database in France (Drug Saf 2013;36(7):565-72). The value of these methods in practice needs to be further investigated.
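The masking effect can be illustrated with a small numerical sketch. Here the PRR for a drug of interest is computed twice: once against the full background, and once after excluding a product that accounts for a very large share of the reports of the event. Simple exclusion of a suspected masking product is an assumption made for illustration and is not the algorithm of any of the cited publications; the product names and counts are hypothetical.

```python
def prr(reports, drug, exclude=()):
    """PRR for `drug`, optionally excluding products from the background.

    reports: dict mapping product name to {"event": n, "other": n}, i.e. the
             number of reports with the event of interest and with any other
             event.
    exclude: product names removed from the comparator before computing PRR.
    """
    a = reports[drug]["event"]
    b = reports[drug]["other"]
    background = [
        counts
        for name, counts in reports.items()
        if name != drug and name not in exclude
    ]
    c = sum(counts["event"] for counts in background)
    d = sum(counts["other"] for counts in background)
    return (a / (a + b)) / (c / (c + d))


if __name__ == "__main__":
    # Hypothetical database: product_m dominates the reports of the event.
    database = {
        "drug_x": {"event": 15, "other": 485},
        "product_m": {"event": 900, "other": 100},
        "all_other_products": {"event": 100, "other": 9900},
    }
    print("PRR with full background:", prr(database, "drug_x"))
    print("PRR excluding product_m:",
          prr(database, "drug_x", exclude=("product_m",)))
```

With the full background the PRR for drug_x falls below 1, whereas it rises to about 3 once product_m is removed from the comparator, showing how a single heavily reported product can hide disproportionate reporting for another.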
A time-consuming step in signal detection of adverse reactions is the determination of whether an effect is already recorded in the product information. A database which can be searched for this information allows filtering or flagging reaction monitoring reports for signals related to unlisted reactions, thus considerably improving the efficiency of the signal detection process by restricting attention to those drugs and adverse events not already considered causally related. In research, it permits an evaluation of the effect of background restriction on the performance of statistical signal detection. An example of such a database is the PROTECT Database of adverse drug reactions (EU SPC ADR database), a structured Excel database of all adverse drug reactions (ADRs) listed in section 4.8 of the Summary of Product Characteristics (SPC) of medicinal products authorised in the European Union (EU) according to the centralised procedure, based exclusively on the Medical Dictionary for Regulatory Activities (MedDRA) terminology.
Other large observational databases such as claims and electronic medical records databases are potentially useful as part of a larger signal detection and refinement strategy. Modern methods of pharmacovigilance: detecting adverse effects of drugs (Clin Med 2009;9(5):486-9) describes the strengths and weaknesses of different data sources for signal detection (spontaneous reports, electronic patient records and cohort-event monitoring). A number of studies have considered the use of observational data in electronic systems that complement existing methods of safety surveillance e.g. the PROTECT, OHDSI and Sentinel projects.
Assessment of the impact of pharmacovigilance actions at the population level is an area currently under-investigated but of increasing importance for regulators. Impact research identifies the net impact of a regulatory intervention by measuring both its intended outcomes and its unintended consequences, such as stopping a useful medication or switching to alternatives. Detailed guidance on the methodological conduct of impact studies is provided in Annex 2 of this Guide, together with a comprehensive reference list.
Although it uses existing data sources and methods, the area of impact research has some distinctive characteristics that are worth discussing.
To measure the impact of pharmacovigilance activities, process indicators or outcome indicators can be used depending on the type of intervention, target population, drug or disease characteristics. Determining and measuring the right outcomes can be challenging. It may be further complicated by unavailability of data and may therefore require use of surrogate outcomes. Data sources for the analysis include both primary and secondary data, the latter being used more frequently as they reflect routine clinical practice (real world population). However, secondary data are often originally collected for other purposes and as such present limitations, especially in terms of missing relevant data.
If the date or time period of the intervention is known, a before/after time series is a frequently used design that allows changes in trends in the incidence or prevalence of an outcome to be analysed before and after the intervention. Trend changes may be affected by simultaneously occurring interventions or events, and the use of comparator groups that did not receive the intervention may facilitate the interpretation of any associations found.
The analytical methods will depend on the study design and type of data collection. Interrupted time series (ITS) regression is a strong analytical tool for before/after time series, especially if autocorrelation and seasonality are taken into account and the time point (or period) of the intervention is known. For adequate power, sufficient time points before and after the intervention are required. Joinpoint regression models, which estimate the time points at which the trend line changes, offer an alternative if the date of the intervention is unknown.
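A basic ITS analysis can be sketched as a segmented linear regression with four terms: baseline level, baseline trend, change in level at the intervention and change in trend after it. The example below fits this model by ordinary least squares using only the standard library. It is a simplified sketch under stated assumptions: it does not adjust for autocorrelation or seasonality, which a real analysis should address, and the function names and the synthetic monthly series are hypothetical.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def segmented_regression(y, intervention_index):
    """Fit y_t = b0 + b1*t + b2*post_t + b3*post_t*(t - t0) by least squares.

    y: outcome per equally spaced time point (e.g. monthly prescribing rate).
    intervention_index: index t0 of the first post-intervention time point.
    """
    design = []
    for t in range(len(y)):
        post = 1.0 if t >= intervention_index else 0.0
        design.append([1.0, float(t), post, post * (t - intervention_index)])
    p = 4
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * value for row, value in zip(design, y))
           for i in range(p)]
    b0, b1, b2, b3 = solve_linear_system(xtx, xty)
    return {
        "baseline_level": b0,
        "baseline_trend": b1,
        "level_change": b2,
        "trend_change": b3,
    }


if __name__ == "__main__":
    # Synthetic series: 12 points before and 12 after a hypothetical action,
    # built with a level drop of 20 and a trend change of -1 per time point.
    t0 = 12
    series = [100 + 2 * t for t in range(t0)]
    series += [100 + 2 * t - 20 - (t - t0) for t in range(t0, 24)]
    print(segmented_regression(series, t0))
```

The level_change and trend_change coefficients are the quantities usually interpreted as the immediate and the sustained effect of the intervention, respectively.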
Specific analytical approaches are needed to measure unintended effects of pharmacovigilance activities which may not be expected at the design stage, for example switching to alternative medicines following product withdrawal or restriction, and determine the net attributable impact on patient outcomes.
Future challenges include the identification of long-term consequences of regulatory actions and the definition of thresholds for successful risk minimisation activities.