ENCePP Guide on Methodological Standards in Pharmacoepidemiology


Chapter 6: Methods to address bias and confounding


6.1. Bias

      6.1.1. Selection bias

      6.1.2. Information bias

      6.1.3. Time-related bias

6.2. Confounding

      6.2.1. Confounding by indication

      6.2.2. Unmeasured confounding

      6.2.3. Methods to address confounding

6.3. Missing data

      6.3.1. Impact of missing data

      6.3.2. Patterns of missing data

      6.3.3. Handling missing data

      6.3.4. Statistical software

6.4. Triangulation



6.1. Bias


6.1.1. Selection bias


Selection bias arises from the selective inclusion in the study of subjects who are not representative of the exposure or outcome pattern in the source population. Of note, for selection bias to occur, the selection must distort the exposure-outcome relation; lack of representativeness of the exposure or outcome pattern alone is not sufficient. Common examples of selection bias are referral bias, self-selection bias and prevalence bias.


Prevalence bias may occur when prevalent drug users are included in an observational study, i.e. patients already taking a therapy for some time before study follow-up began. This can cause two types of bias. Firstly, prevalent users are ‘survivors’ (healthy users) of the early period of pharmacotherapy, which can introduce substantial selection bias if the risk varies with time, as seen in safety studies with unwanted exclusion from a safety assessment of persons discontinuing treatments following early adverse reactions (‘depletion of susceptibles’). An illustrative example is the comparison between users of third and older generations of oral contraceptives regarding the risk of venous thrombosis, where the association for the third generation was initially overestimated due to the healthy-user bias in persons taking older contraceptives (see The Transnational Study on Oral Contraceptives and the Health of Young Women. Methods, results, new analyses and the healthy user effect, Hum Reprod Update 1999;5(6):707-20). Secondly, covariates for drug use at study entry are often influenced by the previous intake of the drug.


The article Collider bias undermines our understanding of COVID-19 disease risk and severity (Nat Commun. 2020;11(1):5749) describes a selection bias where a variable (a collider) is influenced by two other variables, for example when an exposure (being a healthcare worker) and an outcome (severity of COVID-19 infection) both affect the variable determining the likelihood of being sampled (presence of PCR testing or hospitalisation). A bias would arise when the analysis includes only those people who have experienced an event such as hospitalisation with COVID-19, been tested for active infection or who have volunteered their participation. Among hospitalised patients, the relationship between any exposure that relates to hospitalisation and the severity of infection would be distorted compared to the general population. The article proposes methods for detecting and minimising the effects of collider bias. Vaccine side-effects and SARS-CoV-2 infection after vaccination in users of the COVID Symptom Study app in the UK: a prospective observational study (Lancet Infect Dis. 2021;S1473) discusses that collider bias would occur in the study if both vaccination status and COVID-19 positivity influenced the probability of participation in the study. However, the authors consider that collider bias was unlikely to underlie the reduction in infections following vaccination seen in the data, given that strong reductions in COVID-19 hospitalisations after vaccination were observed in other nationwide studies.
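
The collider mechanism described above can be illustrated with a small simulation. This is a minimal sketch and is not taken from the cited article: the prevalences and selection probabilities are arbitrary assumptions, and the function names are hypothetical. Exposure and outcome are generated independently, so any association seen among the sampled individuals is induced purely by selection.

```python
import random


def simulate_collider_bias(n=200_000, seed=1):
    """Generate independent exposure and outcome, then select on a collider."""
    rng = random.Random(seed)
    everyone, selected = [], []
    for _ in range(n):
        exposed = rng.random() < 0.3   # assumed exposure prevalence
        severe = rng.random() < 0.2    # outcome drawn independently of exposure
        # Assumed sampling probability: rises with both exposure and outcome.
        p_select = 0.05 + 0.40 * exposed + 0.40 * severe
        row = (exposed, severe)
        everyone.append(row)
        if rng.random() < p_select:
            selected.append(row)
    return everyone, selected


def risk_ratio(rows):
    """Risk of the outcome in the exposed divided by risk in the unexposed."""
    exposed_events = sum(1 for e, s in rows if e and s)
    exposed_total = sum(1 for e, s in rows if e)
    unexposed_events = sum(1 for e, s in rows if not e and s)
    unexposed_total = sum(1 for e, s in rows if not e)
    return (exposed_events / exposed_total) / (unexposed_events / unexposed_total)


everyone, selected = simulate_collider_bias()
rr_all = risk_ratio(everyone)        # close to 1: no true association
rr_selected = risk_ratio(selected)   # well below 1: induced by selection alone
```

Under these assumed parameters the risk ratio in the full population stays near the null, while the risk ratio among selected individuals is markedly below 1, even though the exposure has no effect on the outcome.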


Biases in evaluating the safety and effectiveness of drugs for covid-19: designing real-world evidence studies (Am J Epidemiol. 2021;kwab028) illustrates the selection bias present in several studies evaluating the effects of drugs on SARS-CoV-2 infection and how to address it at the analysis and design stages.


Mitigating selection bias at the analysis stage


Once they have occurred, selection biases cannot be removed at the analysis stage if the factors responsible for the selection are not known or not measured. In some circumstances, it may be possible to restrict the study population by including only groups where the selection did not operate. For example, a prevalence bias may be removed by restricting the analysis to incident drug users, i.e. patients enter the study cohort only at the start of the first course of the treatment of interest (or of different treatment groups) during the study period. Consequences may include reduced precision of estimates due to lower sample size and likely reduction in the number of patients with long-term exposure. In circumstances where the factors influencing the selection are known and have been accurately measured, they can be treated as confounding factors and adjusted for at the analysis stage.


Mitigating selection bias at the design stage


The impact of selection biases is therefore best avoided or minimised through proper consideration at the study design stage. The new user (incident user) design helps mitigate selection bias by alleviating healthy user bias for preventive treatments in some circumstances (see Healthy User and Related Biases in Observational Studies of Preventive Interventions: A Primer for Physicians. J Gen Intern Med 2011;26(5):546-50). The article Evaluating medication effects outside of clinical trials: new-user designs (Am J Epidemiol. 2003;158 (9):915–20) defines new user designs in cohort and case-control settings. The articles The active comparator, new user study design in pharmacoepidemiology: historical foundations and contemporary application (Curr Epidemiol Rep. 2015;2(4):221-28) and New-user designs with conditional propensity scores: a unified complement to the traditional active comparator new-user approach (Pharmacoepidemiol Drug Saf. 2017;26(4):469-7) extend the discussion to studies with active comparators. One should be aware of the difference between a new user (which requires absence of prior use of a given drug/drug class during a prespecified washout period) and a treatment-naïve user (which requires absence of prior treatment for a given indication). A treatment-naïve status may not be ascertainable in left-truncated data.
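
The new user definition based on a washout period can be sketched in code. This is a minimal illustration under stated assumptions, not a reference implementation: the function name, the 365-day default washout and the input layout (a list of dispensing dates plus the date from which the patient is observable) are all hypothetical choices.

```python
from datetime import date, timedelta


def first_new_use(dispensings, observation_start, washout_days=365):
    """Return the first dispensing preceded by a fully observed washout
    without any use of the drug (or drug class), or None if there is none.
    """
    washout = timedelta(days=washout_days)
    previous = None
    for current in sorted(dispensings):
        # The washout window must lie entirely within observable time.
        observable = current - observation_start >= washout
        # No earlier dispensing may fall inside the washout window.
        free_of_use = previous is None or current - previous > washout
        if observable and free_of_use:
            return current
        previous = current
    return None


# Observed for more than a year before the first dispensing: a new user.
a = first_new_use([date(2020, 3, 1), date(2020, 4, 1)], date(2019, 1, 1))
# Only 60 days of look-back: prior use cannot be ruled out, so not a new user.
b = first_new_use([date(2020, 3, 1)], date(2020, 1, 1))
# Early use during the look-back, then re-initiation after a long gap.
c = first_new_use(
    [date(2019, 2, 1), date(2019, 3, 1), date(2021, 1, 1)], date(2019, 1, 1)
)
```

Note that the third patient qualifies as a new user under this definition but would not be treatment-naïve, which is the distinction drawn in the paragraph above.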


The active comparator new user design (see Chapter 5.4.5) would ideally compare two treatments that are marketed contemporaneously. However, a more common situation is where a recently marketed drug is compared with an older established alternative. For such situations, the article Prevalent new-user cohort designs for comparative drug effect studies by time-conditional propensity scores (Pharmacoepidemiol Drug Saf. 2017;26(4):459-68) introduces a cohort design allowing identification of matched subjects using the comparator drug at the same point in the course of disease as the (newly marketed) drug of interest. The design utilises time-based and prescription-based exposure sets to compute time-dependent propensity scores of initiating the new drug.


Observational studies of treatment effectiveness: worthwhile or worthless? (Clin Epidemiol. 2018;11:35-42) discusses how researchers can mitigate the risk of bias in the cohort design and presents a case study of the comparative effectiveness of two antidiabetic treatments using data collected during routine clinical practice.


The use of case-only designs can also reduce selection bias if the statistical assumptions of the method are fulfilled (see Chapters 5.2.3 and 5.4.3).


6.1.2. Information bias


Information bias (misclassification) arises when incorrect information about either exposure or outcome or any covariates is collected in the study or if variables are incorrectly categorized. Different factors may cause information bias. Chapter 5.1. describes errors in definition, measurement and classification of variables and how to address them. Errors may also occur in the study design and method for data collection. Examples are the recall bias occurring in case-control studies, where cases and controls can have different recall of their past exposures, as well as the protopathic bias and surveillance or detection bias which are described below.


Protopathic bias


Protopathic bias arises when the initiation of a drug (exposure) occurs in response to a symptom of the (at this point undiagnosed) disease under study (outcome). For example, use of analgesics in response to pain caused by an undiagnosed tumour might lead to the erroneous conclusion that the analgesic caused the tumour. Protopathic bias thus reflects a reversal of cause and effect (see Bias: Considerations for research practice. Am J Health Syst Pharm 2008;65(22):2159-68). This is particularly a problem in studies of drug-cancer associations and other outcomes with long latencies. It may be handled by including a time-lag, i.e. by disregarding all exposure during a specified period of time before the index date.
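
The time-lag approach can be sketched as follows. This is a minimal illustration, assuming exposure is represented as a list of prescription dates; the function name and the default lag of 365 days are hypothetical, and the appropriate lag depends on the latency of the outcome under study.

```python
from datetime import date, timedelta


def exposed_with_lag(prescription_dates, index_date, lag_days=365):
    """Classify a subject as exposed only if a prescription occurred on or
    before the start of the lag window preceding the index date.
    Prescriptions inside the lag window (or after the index date) are ignored.
    """
    cutoff = index_date - timedelta(days=lag_days)
    return any(d <= cutoff for d in prescription_dates)


index = date(2020, 6, 1)
# Only prescription falls one month before diagnosis: disregarded.
recent_only = exposed_with_lag([date(2020, 5, 1)], index, lag_days=180)
# An earlier prescription outside the lag window still counts.
earlier_use = exposed_with_lag([date(2019, 1, 1), date(2020, 5, 1)], index, lag_days=180)
```

In a sensitivity analysis, the lag length would typically be varied to check how strongly the estimate depends on it.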

Protopathic bias has also been described as a selection bias and it should not be confused with confounding by indication (see Confounding by Indication: An Example of Variation in the Use of Epidemiologic Terminology, Am J Epidemiol. 1999;149(11):981-3).


Surveillance bias (or detection bias)


Surveillance or detection bias arises when patients in one exposure group have a higher probability of having the study outcome detected, due to increased surveillance, screening or testing of the outcome itself, or of an associated symptom. For example, post-menopausal exposure to estrogen is associated with an increased risk of bleeding that can trigger screening for endometrial cancers, leading to a higher probability of early stage endometrial cancers being detected. Any association between estrogen exposure and endometrial cancer potentially overestimates risk, because unexposed patients with sub-clinical cancers would have a lower probability of their cancer being diagnosed or recorded. This is discussed in Alternative analytic methods for case-control studies of estrogens and endometrial cancer (N Engl J Med 1978;299(20):1089-94).


This non-random type of misclassification bias can be reduced by selecting an unexposed comparator group with a similar likelihood of screening or testing, selecting outcomes that are likely to be diagnosed equally in both exposure groups, or by adjusting for the surveillance rate in the analysis. These issues and recommendations are outlined in Surveillance Bias in Outcomes Reporting (JAMA 2011;305(23):2462-3).


6.1.3. Time-related bias


Immortal time bias


Immortal time refers to a period of cohort follow-up time during which death (or an outcome that determines end of follow-up) cannot occur (K. Rothman, S. Greenland, T. Lash. Modern Epidemiology, 3rd Edition, Lippincott Williams & Wilkins, 2008).


Immortal time bias can arise when the period between cohort entry and date of first exposure to a drug, during which the event of interest has not occurred, is either misclassified or simply excluded and not accounted for in the analysis. Immortal time bias in observational studies of drug effects (Pharmacoepidemiol Drug Saf. 2007;16(3):241-9) demonstrates how several observational studies used a flawed approach to design and data analysis, leading to immortal time bias, which can generate an illusion of treatment effectiveness. This is frequently found in studies that compare groups of ‘users’ against ‘non-users’. Observational studies with surprisingly beneficial drug effects should therefore be re-assessed to account for this bias.


Immortal Time Bias in Pharmacoepidemiology (Am J Epidemiol 2008;167(4):492-9) describes various cohort study designs leading to this bias, quantifies its magnitude under different survival distributions and illustrates it with data from a cohort of lung cancer patients. For time-based, event-based and exposure-based cohort definitions, the bias in the rate ratio resulting from misclassified or excluded immortal time increases proportionately to the duration of immortal time. It is asserted that immortal time bias arises by conditioning on future exposure and that it can be avoided by analysing the data as if the exposures and outcomes were included as they developed, without ever looking into the future. Biases in evaluating the safety and effectiveness of drugs for covid-19: designing real-world evidence studies (Am J Epidemiol. 2021;kwab028) illustrates immortal time bias present in several studies evaluating the effects of drugs on SARS-CoV-2 infection.


Survival bias associated with time-to-treatment initiation in drug effectiveness evaluation: a comparison of methods (Am J Epidemiol 2005;162(10):1016-23) describes five different approaches to deal with immortal time bias. The use of a time-dependent approach had several advantages: no subjects are excluded from the analysis and the study allows effect estimation at any point in time after discharge. However, changes of exposure might be predictive of the study endpoint and need adjustment for time-varying confounders using complex methods. Problem of immortal time bias in cohort studies: example using statins for preventing progression of diabetes (BMJ. 2010; 340:b5087) describes how immortal time in observational studies can bias the results in favour of the treatment group and how it can be identified and avoided. It is recommended that all cohort studies should be assessed for the presence of immortal time bias using appropriate validity criteria. However, Re. ‘Immortal time bias in pharmacoepidemiology’ (Am J Epidemiol 2009;170(5):667-8) argues that sound efforts at minimising the influence of more common biases should not be sacrificed to that of avoiding immortal time bias.
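
The difference between misclassifying immortal time and treating exposure as time-dependent can be shown with a toy calculation. This is a minimal sketch with invented numbers: the four-patient cohort, the function names and the tuple layout are assumptions for illustration only, and time-varying confounding is ignored.

```python
def split_person_time(entry, first_exposure, end):
    """Return (unexposed_days, exposed_days) for one patient.
    Times are day numbers on a common scale; first_exposure is None
    for patients who never start treatment during follow-up.
    """
    if first_exposure is None or first_exposure >= end:
        return end - entry, 0
    start_exposed = max(first_exposure, entry)
    return start_exposed - entry, end - start_exposed


def rate_ratio(patients, time_dependent=True):
    """patients: list of (entry, first_exposure, end, had_event) tuples.
    Events occur at the end of follow-up, so they are counted as exposed
    whenever treatment started before that date.
    """
    events = {True: 0, False: 0}
    days = {True: 0, False: 0}
    for entry, first_exposure, end, had_event in patients:
        unexposed_days, exposed_days = split_person_time(entry, first_exposure, end)
        ever_exposed = exposed_days > 0
        if time_dependent:
            # Time before the first prescription is counted as unexposed.
            days[False] += unexposed_days
            days[True] += exposed_days
        else:
            # Flawed: all follow-up of ever-users is counted as exposed,
            # including the immortal period before the first prescription.
            days[ever_exposed] += unexposed_days + exposed_days
        if had_event:
            events[ever_exposed] += 1
    return (events[True] / days[True]) / (events[False] / days[False])


cohort = [
    (0, 100, 200, True),    # starts treatment on day 100, event on day 200
    (0, 100, 200, False),   # starts treatment on day 100, no event
    (0, None, 100, True),   # never treated, event on day 100
    (0, None, 200, False),  # never treated, no event
]
rr_time_dependent = rate_ratio(cohort, time_dependent=True)
rr_misclassified = rate_ratio(cohort, time_dependent=False)
```

With these invented data, the misclassified analysis gives a rate ratio of 0.75, suggesting a benefit, whereas the time-dependent analysis of the same patients gives 2.5.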


Other forms of time-related bias


In many database studies, drugs administered during hospitalisations are unknown. Exposure misclassification bias may occur, with a direction depending on whether drugs prescribed before hospitalisation are continued or discontinued and on whether days of hospitalisation are considered as gaps in exposure, especially when several exposure categories are assigned, such as current, recent and past. The differential bias arising from the lack of information on (or lack of consideration of) hospitalisations that occur during the observation period (called ‘immeasurable time bias’ in Immeasurable time bias in observational studies of drug effects on mortality. Am J Epidemiol. 2008;168(3):329-35) can be particularly problematic when studying serious chronic diseases that require extensive medication use and multiple hospitalisations.


In case-control studies assessing chronic diseases with multiple hospitalisations and in-patient treatment (such as the use of inhaled corticosteroids and death in chronic obstructive pulmonary disease patients), no clearly valid approach to data analysis can fully circumvent this bias. However, sensitivity analyses such as restricting the analysis to non-hospitalised patients or providing estimates weighted by exposable time may provide additional information on the potential impact of this bias, as shown in Immeasurable time bias in observational studies of drug effects on mortality. (Am J Epidemiol. 2008;168(3):329-35).


In cohort studies where a first-line therapy (such as metformin) has been compared with second- or third-line therapies, patients are unlikely to be at the same stage of the disease (e.g. diabetes), which can induce confounding of the association with an outcome (e.g. cancer incidence) by disease duration. An outcome related to the first-line therapy may also be attributed to the second-line therapy if it occurs after a long period of exposure. Such a situation requires matching on disease duration and consideration of latency time windows in the analysis (example drawn from Metformin and the Risk of Cancer. Time-related biases in observational studies. Diabetes Care 2012;35(12):2665-73).


Time-related biases in pharmacoepidemiology (Drug Saf. 2020;29(9):1101-1110) further discusses several time-related biases and illustrates their impact on the effects of different COPD treatments on lung cancer, acute myocardial infarction and mortality outcomes, in studies using electronic healthcare databases. Protopathic, latency, immortal time, time-window, depletion of susceptibles, and immeasurable time biases were shown to significantly impact the effects of the study drugs on the outcomes.


6.2. Confounding


Confounding occurs when the estimate of a measure of association is distorted by the presence of another risk factor. For a variable to be a confounder, it must be associated with both the exposure and the outcome, without being on the causal pathway between them.


6.2.1. Confounding by indication


Confounding by indication refers to a determinant of the outcome parameter that is present in people at perceived high risk or poor prognosis and is an indication for intervention. This means that differences in care between the exposed and non-exposed, for example, may partly originate from differences in indication for medical intervention such as the presence of specific risk factors for health problems. Another name for this type of confounding is ‘channeling’. Confounding by severity is a type of confounding by indication, where not only the disease but its severity acts as confounder (see Confounding by Indication: An Example of Variation in the Use of Epidemiologic Terminology, Am J Epidemiol. 1999;149(11):981-3).


This type of confounding has frequently been reported in studies evaluating the efficacy of pharmaceutical interventions and is almost always encountered to various extents in pharmacoepidemiological studies. A good example can be found in Confounding and indication for treatment in evaluation of drug treatment for hypertension (BMJ. 1997;315:1151-4).


With the more recent application of pharmacoepidemiological methods to assess effectiveness, confounding by indication is a greater challenge and the article Approaches to combat with confounding by indication in observational studies of intended drug effects (Pharmacoepidemiol Drug Saf. 2003;12(7):551-8) focusses on its possible reduction in studies of intended effects. An extensive review of these and other methodological approaches, with their strengths and limitations, is provided in Methods to assess intended effects of drug treatment in observational studies are reviewed (J Clin Epidemiol. 2004;57(12):1223-31).


6.2.2. Unmeasured confounding


Complete adjustment for confounders would require detailed information on clinical parameters, lifestyle or over-the-counter medications, which are often not measured in electronic healthcare records, causing residual confounding bias. Using directed acyclic graphs to detect limitations of traditional regression in longitudinal studies (Int J Public Health 2010;55(6):701-3) reviews confounding and intermediate effects in longitudinal data and introduces causal graphs to understand the relationships between the variables in an epidemiological study.


Unmeasured confounding can be adjusted for only through randomisation. When this is not possible, as most often in pharmacoepidemiological studies, the potential impact of residual confounding on the results should be estimated and considered in the discussion.


Sensitivity analysis and external adjustment for unmeasured confounders in epidemiologic database studies of therapeutics (Pharmacoepidemiol Drug Saf. 2006;15(5):291-303) provides a systematic approach to sensitivity analyses to investigate the impact of residual confounding in pharmacoepidemiological studies that use healthcare utilisation databases. In this article, four basic approaches to sensitivity analysis were identified: (1) sensitivity analyses based on an array of informed assumptions; (2) analyses to identify the strength of residual confounding that would be necessary to explain an observed drug-outcome association; (3) external adjustment of a drug-outcome association given additional information on single binary confounders from survey data using algebraic solutions; (4) external adjustment considering the joint distribution of multiple confounders of any distribution from external sources of information using propensity score calibration. The paper concludes that sensitivity analyses and external adjustments can improve our understanding of the effects of drugs in epidemiological database studies. With the availability of easy-to-apply spreadsheets, sensitivity analyses should be used more frequently, in place of purely qualitative discussions of residual confounding.
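
External adjustment for a single binary confounder, approach (3) above, can be sketched with the standard algebraic bias formula for a relative risk. The code below is an illustration under stated assumptions rather than the cited article's own tool: the function name and example values are hypothetical, and the formula assumes a binary confounder whose effect on the outcome is the same in exposed and unexposed subjects.

```python
def externally_adjusted_rr(rr_observed, p_conf_exposed, p_conf_unexposed, rr_conf_outcome):
    """Adjust an observed relative risk for one unmeasured binary confounder.

    p_conf_exposed / p_conf_unexposed: prevalence of the confounder among
        exposed and unexposed subjects (e.g. taken from external survey data).
    rr_conf_outcome: relative risk linking the confounder to the outcome.
    """
    bias = (p_conf_exposed * (rr_conf_outcome - 1) + 1) / (
        p_conf_unexposed * (rr_conf_outcome - 1) + 1
    )
    return rr_observed / bias


# Assumed inputs: observed RR 1.5, confounder present in 40% of exposed and
# 20% of unexposed subjects, and doubling the risk of the outcome.
adjusted = externally_adjusted_rr(1.5, 0.4, 0.2, 2.0)
```

With these assumed inputs the bias factor is 1.4/1.2, so the observed relative risk of 1.5 is adjusted to about 1.29. Repeating the calculation over a grid of plausible prevalences and confounder-outcome associations gives an array-type sensitivity analysis in the sense of approach (1).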


The impact of residual and unmeasured confounding in epidemiologic studies: a simulation study (Am J Epidemiol. 2007;166(6):646–55) considers the extent and patterns of bias in estimates of exposure-outcome associations that can result from residual or unmeasured confounding, when there is no true association between the exposure and the outcome. Another important finding of this study was that when confounding factors (measured or unmeasured) are interrelated (e.g. in situations of confounding by indication), adjustment for a few factors can almost completely eliminate confounding.


6.2.3. Methods to address confounding


Methods to address confounding include case-only designs (see Chapters 5.2.3 and 5.4.3) and use of an active comparator (see Chapter 5.4.5). Other methods are detailed hereafter.


Disease risk scores


An approach to controlling for a large number of confounding variables is to summarise them in a single multivariable confounder score. Stratification by a multivariate confounder score (Am J Epidemiol. 1976;104(6):609-20) shows how control for confounding may be based on stratification by the score. An example is a disease risk score (DRS) that estimates the probability or rate of disease occurrence conditional on being unexposed. The association between exposure and disease is then estimated with adjustment for the disease risk score in place of the individual covariates.
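
Stratification by a summary score can be sketched as follows. This is a minimal illustration under stated assumptions: the score is taken as already estimated (for a DRS, from an outcome model fitted under no exposure), strata are formed by rank, and the stratum-specific results are pooled with a Mantel-Haenszel risk ratio, which is a choice made here for simplicity and not one prescribed by the cited article. All names and data are hypothetical.

```python
def mantel_haenszel_rr(records, n_strata=5):
    """Pool the exposure-outcome risk ratio across score strata.

    records: list of (score, exposed, outcome), with exposed/outcome as 0 or 1.
    Strata are equal-sized groups of records ranked by score.
    """
    ranked = sorted(records, key=lambda r: r[0])
    n = len(ranked)
    strata = {}
    for rank, (_, exposed, outcome) in enumerate(ranked):
        s = rank * n_strata // n
        cell = strata.setdefault(s, {"a": 0, "n1": 0, "c": 0, "n0": 0})
        if exposed:
            cell["n1"] += 1        # exposed subjects in the stratum
            cell["a"] += outcome   # exposed subjects with the outcome
        else:
            cell["n0"] += 1        # unexposed subjects in the stratum
            cell["c"] += outcome   # unexposed subjects with the outcome
    numerator = denominator = 0.0
    for cell in strata.values():
        total = cell["n1"] + cell["n0"]
        numerator += cell["a"] * cell["n0"] / total
        denominator += cell["c"] * cell["n1"] / total
    return numerator / denominator


def make_group(score, exposed, n, events):
    """Build n toy records sharing one score, the first `events` with the outcome."""
    return [(score, exposed, 1 if i < events else 0) for i in range(n)]


# Invented data: exposure is more common in the high-risk stratum.
toy = (
    make_group(0.1, 1, 10, 2) + make_group(0.1, 0, 40, 4)
    + make_group(0.9, 1, 40, 16) + make_group(0.9, 0, 10, 2)
)
crude_rr = mantel_haenszel_rr(toy, n_strata=1)     # single stratum = crude estimate
stratified_rr = mantel_haenszel_rr(toy, n_strata=2)
```

In this invented example the crude risk ratio is 3.0, while the risk ratio within each score stratum, and hence the pooled estimate, is 2.0.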


DRSs are however difficult to estimate if outcomes are rare. Use of disease risk scores in pharmacoepidemiologic studies (Stat Methods Med Res. 2009;18(1):67-80) includes a detailed description of their construction and use, a summary of simulation studies comparing their performance to traditional models, a comparison of their utility with that of propensity scores, and some further topics for future research. Disease risk score as a confounder summary method: systematic review and recommendations (Pharmacoepidemiol Drug Saf. 2013;22(2):122-29) examines trends in the use and application of DRS as a confounder summary method and shows that large variation exists, with differences in terminology and methods used.


In Role of disease risk scores in comparative effectiveness research with emerging therapies (Pharmacoepidemiol Drug Saf. 2012;21 Suppl 2:138–47), it is argued that DRS may have a place when studying drugs that are recently introduced to the market. In such situations, as characteristics of users change rapidly, exposure propensity scores may prove highly unstable. DRSs based mostly on biological associations would be more stable. However, DRS models are still sensitive to misspecification as discussed in Adjusting for Confounding in Early Postlaunch Settings: Going Beyond Logistic Regression Models (Epidemiology 2016;27(1):133-42).


Propensity scores


Databases used in pharmacoepidemiological studies often include records of prescribed medications and encounters with medical care providers, from which one can construct surrogate measures for both drug exposure and covariates that are potential confounders. It is often possible to track day-by-day changes in these variables. However, while this information can be critical for study success, its volume can pose challenges for statistical analysis.


A propensity score (PS) is analogous to the disease risk score in that it combines a large number of possible confounders into a single variable (the score). The exposure propensity score (EPS) is the conditional probability of exposure to a treatment given observed covariates. In a cohort study, matching or stratifying treated and comparison subjects on EPS tends to balance all of the observed covariates. However, unlike random assignment of treatments, the propensity score may not balance unobserved covariates. Invited Commentary: Propensity Scores (Am J Epidemiol. 1999;150(4):327–33) reviews the uses and limitations of propensity scores and provides a brief outline of the associated statistical theory. The authors present results of adjustment by matching or stratification on the propensity score.


The estimated EPS summarises all measured confounders in a single variable and thus can be used in the analysis, as any other confounder, for matching, stratification, weighting or as a covariate in a regression model to adjust for the measured confounding. A description of these methods can be found in the following articles: An Introduction to Propensity Score Methods for Reducing the Effects of Confounding in Observational Studies (Multivariate Behav Res. 2011;46(3):399-424), Tutorial and Case Study in Propensity Score Analysis: An Application to Estimating the Effect of In-Hospital Smoking Cessation Counseling on Mortality (Multivariate Behav Res. 2011;46(1):119-51) and Moving towards best practice when using inverse probability of treatment weighting (IPTW) using the propensity score to estimate causal treatment effects in observational studies (Stat Med. 2015;34(28):3661-79).
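
Of the uses listed above, weighting is the simplest to show compactly. The following is a minimal sketch of inverse probability of treatment weights, assuming the propensity scores have already been estimated and lie strictly between 0 and 1; the function name and the option for stabilised weights are illustrative choices.

```python
def iptw_weights(treated, propensity, stabilised=True):
    """Compute inverse probability of treatment weights.

    treated: list of 0/1 treatment indicators.
    propensity: estimated probability of treatment for each subject.
    Stabilised weights multiply by the marginal probability of the
    treatment actually received, which reduces weight variability.
    """
    p_treated = sum(treated) / len(treated)
    weights = []
    for t, ps in zip(treated, propensity):
        weight = 1 / ps if t else 1 / (1 - ps)
        if stabilised:
            weight *= p_treated if t else 1 - p_treated
        weights.append(weight)
    return weights


# A treated subject with a low propensity (0.25) receives a large weight.
raw = iptw_weights([1, 0], [0.25, 0.25], stabilised=False)
stable = iptw_weights([1, 0], [0.25, 0.25], stabilised=True)
```

Subjects treated contrary to prediction receive the largest weights, which is why extreme propensity scores are usually inspected before a weighted analysis is run.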


Propensity score matching in cohort studies is frequently done 1:1, which, while allowing for selection of the best match for each member of the exposed cohort, may lead to severe depletion of the study population and the associated lower precision, especially when coupled with trimming. Increasing the matching ratio may increase precision but also negatively affect confounding control. One-to-many propensity score matching in cohort studies (Pharmacoepidemiol Drug Saf. 2012;21(S2):69-80) tests several methods for 1:n propensity score matching in simulation and empirical studies and recommends using a variable ratio that increases precision at a small cost of bias. Matching by propensity score in cohort studies with three treatment groups (Epidemiology 2013;24(3):401-9) develops and tests a 1:1:1 propensity score matching approach offering a way to compare three treatment options.
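
The depletion of the study population under 1:1 matching can be seen in a small sketch. This is an illustrative greedy nearest-neighbour matcher without replacement, not the algorithm of any cited article; the function name, the caliper of 0.05 on the propensity score scale and the example values are assumptions.

```python
def greedy_match(treated_ps, control_ps, caliper=0.05):
    """Match each treated subject to the closest unused control within the caliper.

    Returns a list of (treated_index, control_index) pairs. Treated subjects
    without a control inside the caliper are left unmatched and dropped.
    """
    available = dict(enumerate(control_ps))
    pairs = []
    for t_idx, t_ps in enumerate(treated_ps):
        if not available:
            break
        c_idx = min(available, key=lambda i: abs(available[i] - t_ps))
        if abs(available[c_idx] - t_ps) <= caliper:
            pairs.append((t_idx, c_idx))
            del available[c_idx]  # matching without replacement
    return pairs


# The second treated subject (score 0.80) has no control within the caliper.
pairs = greedy_match([0.30, 0.80], [0.31, 0.33, 0.60])
```

Here only one of the two treated subjects is retained, and two of the three controls go unused. A wider caliper or a higher matching ratio would keep more subjects, at the possible cost of poorer confounding control.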


Use of EPS for stratification or weighting overcomes the precision-related limitation of matching-based methods, allowing use of a larger proportion of the study population in the analysis. The fine stratification approach is based on defining a large number (50 or 100) of EPS strata, as described in A Propensity-score-based Fine Stratification Approach for Confounding Adjustment When Exposure Is Infrequent (Epidemiology 2017;28(2):249-57).


High-dimensional Propensity Score Adjustment in Studies of Treatment Effects Using Healthcare Claims Data (Epidemiology 2009;20(4):512-22) discusses the high dimensional propensity score (hd-PS) model approach. It attempts to empirically identify large numbers of potential confounders in healthcare databases and, by doing so, to extract more information on confounders and proxies. Covariate selection in high-dimensional propensity score analyses of treatment effects in small samples (Am J Epidemiol. 2011;173(12):1404-13) evaluates the relative performance of hd-PS in smaller samples. Confounding adjustment via a semi-automated high-dimensional propensity score algorithm: an application to electronic medical records (Pharmacoepidemiol Drug Saf. 2012;20(8):849-57) evaluates the use of hd-PS in a primary care electronic medical record database. In addition, the article Using high-dimensional propensity scores to automate confounding control in a distributed medical product safety surveillance system (Pharmacoepidemiol Drug Saf. 2012;21(S1):41-9) summarises the application of this method for automating confounding control in sequential cohort studies as applied to safety monitoring systems using healthcare databases and also discusses the strengths and limitations of hd-PS.


The use of several measures of balance for developing an optimal propensity score model is described in Measuring balance and model selection in propensity score methods (Pharmacoepidemiol Drug Saf. 2011;20(11):1115-29) and further evaluated in Propensity score balance measures in pharmacoepidemiology: a simulation study (Pharmacoepidemiol Drug Saf. 2014;23(8):802-11). In most situations, the standardised difference performs best and is easy to calculate (see Balance measures for propensity score methods: a clinical example on beta-agonist use and the risk of myocardial infarction (Pharmacoepidemiol Drug Saf. 2011;20(11):1130-7) and Reporting of covariate selection and balance assessment in propensity score analysis is suboptimal: a systematic review (J Clin Epidemiol 2015;68(2):112-21)). Metrics for covariate balance in cohort studies of causal effects (Stat Med 2013;33:1685-99) shows in a simulation study that the c-statistics of the PS model after matching and the general weighted difference perform as well as the standardized difference and are preferred when an overall summary measure of balance is requested. Treatment effects in the presence of unmeasured confounding: dealing with observations in the tails of the propensity score distribution--a simulation study (Am J Epidemiol. 2010;172(7):843-54) demonstrates how ‘trimming’ of the propensity score eliminates subjects who are treated contrary to prediction and their exposed/unexposed counterparts, thereby reducing bias by unmeasured confounders.
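
The standardised difference for a continuous covariate can be computed in a few lines. This is a minimal sketch using the usual definition (difference in means divided by the square root of the average of the two group variances); the function name and the example values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def standardised_difference(treated_values, control_values):
    """Standardised difference in means of one continuous covariate."""
    pooled_sd = sqrt((variance(treated_values) + variance(control_values)) / 2)
    return (mean(treated_values) - mean(control_values)) / pooled_sd


# Invented covariate values (e.g. a comorbidity score) in each group.
d = standardised_difference([1, 2, 3], [2, 3, 4])
```

Because the measure does not depend on sample size, it can be compared before and after matching or weighting. An absolute value below 0.1 is an often-quoted rule of thumb for acceptable balance, although that threshold is a convention and not a formal test.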


Performance of propensity score calibration--a simulation study (Am J Epidemiol. 2007;165(10):1110-8) introduces ‘propensity score calibration’ (PSC). This technique combines propensity score matching methods with measurement error regression models to address confounding by variables unobserved in the main study. This is done by using additional covariate measurements observed in a validation study, which is often a subset of the main study.


Principles of variable selection for inclusion in EPS are described, for example, in Variable selection for propensity score models (Am J Epidemiol. 2006;163(12):1149-56) and in Variable selection for propensity score models when estimating treatment effects on multiple outcomes: a simulation study (Pharmacoepidemiol Drug Saf. 2013;22(1):77-85).


Although in most situations, propensity score models, with the possible exception of hd-PS, do not have any advantages over conventional multivariate modelling in terms of adjustment for identified confounders, several other benefits may be derived. Propensity score methods may help to gain insight into determinants of treatment including age, frailty and comorbidity and to identify individuals treated against expectation. A statistical advantage of PS analyses is that if exposure is not infrequent it is possible to adjust for a large number of covariates even if outcomes are rare, a situation often encountered in drug safety research.

An important limitation of PS is that it is not directly amenable to case-control studies. A critical assessment of propensity scores is provided in Propensity scores: from naive enthusiasm to intuitive understanding (Stat Methods Med Res. 2012;21(3):273-93). Semiautomated and machine-learning based approaches to propensity score methods are currently being developed (see Automated data-adaptive analytics for electronic healthcare data to study causal treatment effects, Clin Epidemiol 2018;10:771-88).


Instrumental variables


An instrumental variable (IV) is defined in Instrumental variable methods in comparative safety and effectiveness research (Pharmacoepidemiol Drug Saf. 2010; 19(6):537-54) as a factor that is assumed to be related to treatment but is neither directly nor indirectly related to the study outcome. An IV should fulfil three assumptions: (1) it should affect treatment or be associated with treatment by sharing a common cause; (2) it should be a factor that is as good as randomly assigned so that it is unrelated to patient characteristics, and (3) it should be related to the outcome only through its association with treatment. This article also presents practical guidance on IV analyses in pharmacoepidemiology. The article Instrumental variable methods for causal inference (Stat Med. 2014;33(13):2297-340) is a tutorial, including statistical code for performing IV analysis.
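
For a binary instrument, the simplest IV estimator is the Wald (ratio) estimator: the difference in mean outcome between instrument levels divided by the difference in mean treatment between instrument levels. The sketch below is an illustration only, with a hypothetical function name and invented data; it is not the code from the cited tutorial, and it is only meaningful if the three assumptions listed above hold.

```python
def wald_estimate(instrument, treatment, outcome):
    """Ratio IV estimate for a binary instrument (0/1 lists of equal length)."""

    def mean_where(values, level):
        picked = [v for v, z in zip(values, instrument) if z == level]
        return sum(picked) / len(picked)

    outcome_difference = mean_where(outcome, 1) - mean_where(outcome, 0)
    treatment_difference = mean_where(treatment, 1) - mean_where(treatment, 0)
    return outcome_difference / treatment_difference


# Invented data: the instrument raises the treatment probability from 0.25 to 0.75.
z = [1, 1, 1, 1, 0, 0, 0, 0]
x = [1, 1, 1, 0, 1, 0, 0, 0]
y = [1, 1, 0, 0, 1, 0, 0, 0]
iv_risk_difference = wald_estimate(z, x, y)
```

With these invented data the estimate is 0.25 / 0.5 = 0.5. The denominator is the strength of the instrument-treatment association, so a weak instrument makes the ratio unstable.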


IV analysis is an approach to address uncontrolled confounding in comparative studies. An introduction to instrumental variables for epidemiologists (Int J Epidemiol. 2000;29(4):722-9) introduces IV methods, illustrated by an application to non-parametric adjustment for non-compliance in randomised trials. The author mentions a number of caveats but concludes that IV corrections can be valuable in many situations. A review of IV analysis for observational comparative effectiveness studies suggested that, in the large majority of studies in which IV analysis was applied, one of the assumptions could be violated (Potential bias of instrumental variable analyses for observational comparative effectiveness research, Ann Intern Med. 2014;161(2):131-8).


The complexity of the issues associated with confounding by indication, channeling and selective prescribing is explored in Evaluating short-term drug effects using a physician-specific prescribing preference as an instrumental variable (Epidemiology 2006;17(3):268-75). A conventional, adjusted multivariable analysis showed a higher risk of gastrointestinal toxicity for selective COX-2-inhibitors than for traditional NSAIDs, which was at odds with results from clinical trials. However, a physician-level instrumental variable approach (a time-varying estimate of a physician’s relative preference for a given drug, where at least two therapeutic alternatives exist) yielded evidence of a protective effect due to COX-2-inhibitor exposure, particularly for shorter-term drug exposures. Despite the potential benefits of physician-level IVs, their performance can vary across databases and strongly depends on the definition of IV used, as discussed in Evaluating different physician's prescribing preference based instrumental variables in two primary care databases: a study of inhaled long-acting beta2-agonist use and the risk of myocardial infarction (Pharmacoepidemiol Drug Saf. 2016;25 Suppl 1:132-41).


An important limitation of IV analysis is that weak instruments (small association between IV and exposure) lead to decreased statistical efficiency and biased IV estimates, as detailed in Instrumental variables: application and limitations (Epidemiology 2006;17:260-7). For example, in the above-mentioned study on non-selective NSAIDs and COX-2-inhibitors, the confidence intervals for IV estimates were in the order of five times wider than with conventional analysis. Performance of instrumental variable methods in cohort and nested case-control studies: a simulation study (Pharmacoepidemiol Drug Saf. 2014;23(2):165-77) demonstrates that a stronger IV-exposure association is needed in nested case-control studies compared to cohort studies in order to achieve the same bias reduction. Increasing the number of controls reduces this bias from IV analysis with relatively weak instruments.
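
Instrument strength is often summarised by the first-stage F statistic, i.e. the F statistic from regressing exposure on the instrument. The sketch below computes it for a single instrument on simulated data; the data-generating model and the coefficients 1.0 and 0.02 are illustrative assumptions only.

```python
import numpy as np


def first_stage_f(z, x):
    """F statistic for regressing exposure x on a single instrument z."""
    z = np.asarray(z, dtype=float)
    x = np.asarray(x, dtype=float)
    r = np.corrcoef(z, x)[0, 1]
    n = len(x)
    # With one instrument, F equals the squared t statistic of the slope.
    return (n - 2) * r**2 / (1.0 - r**2)


rng = np.random.default_rng(1)
n = 5_000
z = rng.integers(0, 2, size=n)
noise = rng.normal(size=n)
x_strong = 1.0 * z + noise   # instrument strongly shifts exposure
x_weak = 0.02 * z + noise    # instrument barely shifts exposure

f_strong = first_stage_f(z, x_strong)
f_weak = first_stage_f(z, x_weak)
print(f"strong instrument F = {f_strong:.1f}, weak instrument F = {f_weak:.1f}")
```

A small F statistic indicates that the instrument explains little of the variation in exposure, which is the situation in which IV estimates become imprecise and prone to bias.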


Selecting on treatment: a pervasive form of bias in instrumental variable analyses (Am J Epidemiol. 2015;181(3):191-7) warns against bias in IV analysis by including only a subset of possible treatment options.


Prior event rate ratios


Another method proposed to control for unmeasured confounding is the Prior Event Rate Ratio (PERR) adjustment method, in which the effect of exposure is estimated using the ratio of rate ratios (RRs) between the exposed and unexposed from periods before and after initiation of a drug exposure, as discussed in Replicated studies of two randomized trials of angiotensin converting enzyme inhibitors: further empiric validation of the ‘prior event rate ratio’ to adjust for unmeasured confounding by indication (Pharmacoepidemiol Drug Saf. 2008;17(7):671-685). For example, when a new drug is launched, direct estimation of the drug's effect observed in the period after launch is potentially confounded. Differences in event rates in the period before the launch between future users and future non-users may provide a measure of the amount of confounding present. By dividing the effect estimate from the period after launch by the effect obtained in the period before launch, the confounding in the second period can be adjusted for. This method requires that confounding effects are constant over time, that there is no confounder-by-treatment interaction, and that outcomes are non-lethal events.
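
The arithmetic of the PERR adjustment can be sketched as follows. The event counts and person-time below are hypothetical and the function names are illustrative; the sketch only shows the division of the post-initiation rate ratio by the prior-period rate ratio and does not address variance estimation or the assumptions listed above.

```python
def rate_ratio(events_exposed, time_exposed, events_unexposed, time_unexposed):
    """Rate ratio between exposed and unexposed groups."""
    return (events_exposed / time_exposed) / (events_unexposed / time_unexposed)


def perr_adjusted_ratio(prior, post):
    """Divide the post-initiation rate ratio by the prior-period rate ratio.

    `prior` and `post` are tuples:
    (events_exposed, time_exposed, events_unexposed, time_unexposed).
    In the prior period, "exposed" means future users of the drug.
    """
    return rate_ratio(*post) / rate_ratio(*prior)


# Hypothetical counts (events, person-years); not taken from any study.
prior_period = (30, 1000.0, 20, 1000.0)  # before initiation: RR of about 1.5
post_period = (45, 1000.0, 20, 1000.0)   # after initiation: RR of about 2.25

rr_prior = rate_ratio(*prior_period)
rr_post = rate_ratio(*post_period)
perr = perr_adjusted_ratio(prior_period, post_period)
print(f"RR prior = {rr_prior:.2f}, RR post = {rr_post:.2f}, PERR-adjusted = {perr:.2f}")
```

With these hypothetical numbers, part of the crude post-initiation rate ratio is attributed to a pre-existing difference between future users and non-users, and the adjusted ratio is correspondingly smaller.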


Performance of prior event rate ratio adjustment method in pharmacoepidemiology: a simulation study (Pharmacoepidemiol Drug Saf. 2015;24(5):468-77) shows that the PERR adjustment method can help to reduce bias as a result of unmeasured confounding in certain situations, but stresses that theoretical justification of its assumptions should be provided.


Handling time-dependent confounding in the analysis


In longitudinal studies, the value of covariates may change and be measured over time. These covariates are time-dependent confounders if they are affected by prior treatment and predict the future treatment decision and future outcome conditional on the past treatment exposure (see Comparison of Statistical Approaches Dealing with Time-dependent Confounding in Drug Effectiveness Studies, Stat Methods Med Res. 2016). Methods for dealing with time-dependent confounding (Stat Med. 2013;32(9):1584-618) provides an overview of how time-dependent confounding can be handled in the analysis of a study. It provides an in-depth discussion of marginal structural models and g-computation.


G-estimation is a method for estimating the joint effects of time-varying treatments using ideas from instrumental variables methods. G-estimation of Causal Effects: Isolated Systolic Hypertension and Cardiovascular Death in the Framingham Heart Study (Am J Epidemiol. 1998;148(4):390-401) demonstrates how the G-estimation procedure allows for appropriate adjustment of the effect of a time-varying exposure in the presence of time-dependent confounders that are themselves influenced by the exposure.


The use of Marginal Structural Models (MSMs) can be an alternative to G-estimation. Marginal Structural Models and Causal Inference in Epidemiology (Epidemiology 2000;11(5):550-60) introduces a class of causal models that allow for improved adjustment for confounding in situations of time-dependent confounding. MSMs have two major advantages over G-estimation. The first is that, although G-estimation is useful for survival time outcomes, continuous measured outcomes and Poisson count outcomes, logistic G-estimation cannot be conveniently used to estimate the effect of treatment on dichotomous outcomes unless the outcome is rare. The second major advantage of MSMs is that they resemble standard models, whereas G-estimation does not (see Marginal Structural Models to Estimate the Causal Effect of Zidovudine on the Survival of HIV-Positive Men, Epidemiology 2000;11(5):561-70).
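
The parameters of an MSM are commonly estimated by inverse-probability-of-treatment weighting. The sketch below shows only the weight calculation, under the assumption that the probabilities of the treatment actually received have already been estimated from two treatment models (for example pooled logistic regressions); the numbers and the function name are hypothetical.

```python
import numpy as np


def stabilized_weights(p_numerator, p_denominator):
    """Cumulative stabilized inverse-probability-of-treatment weights.

    Both inputs have shape (n_subjects, n_visits) and hold the probability of
    the treatment each subject actually received at each visit:
      p_numerator   - conditional on past treatment only
      p_denominator - conditional on past treatment and time-dependent confounders
    """
    num = np.asarray(p_numerator, dtype=float)
    den = np.asarray(p_denominator, dtype=float)
    if num.shape != den.shape:
        raise ValueError("numerator and denominator must have the same shape")
    if np.any(den <= 0) or np.any(den > 1) or np.any(num <= 0) or np.any(num > 1):
        raise ValueError("probabilities must lie in (0, 1]")
    # Weight at visit t is the product of the visit-specific ratios up to t.
    return np.cumprod(num / den, axis=1)


# Hypothetical fitted probabilities for two subjects over two visits.
p_num = [[0.5, 0.5], [0.5, 0.5]]
p_den = [[0.25, 0.5], [0.8, 0.5]]
weights = stabilized_weights(p_num, p_den)
print(weights)
```

The weighted outcome model (for example a weighted Cox or pooled logistic model) would then be fitted with these weights; that step, and robust variance estimation, are outside this sketch.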


Effect of highly active antiretroviral therapy on time to acquired immunodeficiency syndrome or death using marginal structural models (Am J Epidemiol. 2003;158(7):687-94) provides a clear example in which standard Cox analysis failed to detect a clinically meaningful net benefit of treatment because it does not appropriately adjust for time-dependent covariates that are simultaneously confounders and intermediate variables. This net benefit was shown using a marginal structural survival model. In Time-dependent propensity score and collider-stratification bias: an example of beta2-agonist use and the risk of coronary heart disease (Eur J Epidemiol. 2013;28(4):291-9), various methods to control for time-dependent confounding are compared in an empirical study on the association between inhaled beta-2-agonists and the risk of coronary heart disease. MSMs resulted in slightly reduced associations compared to standard Cox-regression.


The trend-in-trend design


The Trend-in-trend Research Design for Causal Inference (Epidemiology 2017;28(4):529-36) presents a semi-ecological design, whereby trends in exposure and in outcome rates are compared in subsets of the population that have different rates of uptake for the drug in question. These subsets are identified through PS modelling. There is a formal framework for transforming the observed trends into an effect estimate. Simulation and empirical studies showed the design to be less statistically efficient than a cohort study, but more resistant to confounding. The trend-in-trend method may be useful in settings where there is a strong time trend in exposure, such as a newly approved drug.


6.3. Missing data


6.3.1. Impact of missing data


Missing data (or missing values) are defined as data values that are not stored for a variable in the observation of interest. Missing data are a common problem in all datasets and can have a significant effect on the conclusions that can be drawn from the data for the following reasons: 1) the absence of data reduces statistical power, i.e. the probability that a test will reject the null hypothesis when it is false; 2) the lost data can cause bias in the estimation of parameters; 3) missing data can reduce the representativeness of the samples; 4) missing data may complicate the analysis of the study. Each of these elements can lead to invalid conclusions.


6.3.2. Patterns of missing data


There are different patterns of missing data:

  • Missing completely at random (MCAR): there are no systematic differences between the missing values and the observed values.

  • Missing at random (MAR): any systematic difference between the missing values and the observed values can be explained by differences in observed data.

  • Missing not at random (MNAR): even after the observed data are taken into account, systematic differences remain between the missing values and the observed values.
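
The practical consequences of the three patterns can be illustrated with a small simulation. The sketch below is based entirely on assumed, simulated data: a fully observed binary covariate `g`, a variable `y` whose true mean is 1.0, and arbitrary missingness probabilities chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
g = rng.integers(0, 2, size=n)        # fully observed binary covariate
y = 2.0 * g + rng.normal(size=n)      # variable subject to missingness; true mean is 1.0


def complete_case_mean(y, observed):
    """Mean of y among records where y is observed."""
    return y[observed].mean()


def stratified_mean(y, g, observed):
    """Mean of y standardised to the full-sample distribution of g."""
    return sum(y[observed & (g == k)].mean() * np.mean(g == k) for k in (0, 1))


# Probability that y is missing under each (assumed) mechanism.
p_missing = {
    "MCAR": np.full(n, 0.3),                 # unrelated to any data
    "MAR": np.where(g == 1, 0.6, 0.1),       # depends only on observed g
    "MNAR": np.where(y > 1.0, 0.6, 0.1),     # depends on the unobserved value itself
}

results = {}
for name, p in p_missing.items():
    observed = rng.random(n) >= p
    results[name] = (complete_case_mean(y, observed), stratified_mean(y, g, observed))
    print(f"{name}: complete-case mean = {results[name][0]:.2f}, "
          f"mean adjusted for g = {results[name][1]:.2f}")
```

In this simulated setting the complete-case mean is expected to be close to the truth only under MCAR; under MAR the bias can be removed by using the observed covariate, whereas under MNAR it remains even after adjustment.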

6.3.3. Handling missing data


Complete case analysis, which removes the records with missing data, is only valid in certain circumstances (i.e. if the missing data are MCAR). Therefore, it is advised to use statistical methods to impute missing data. The choice of statistical method will depend on the pattern of missing data. In general, it is desirable to show that conclusions drawn from the data are not sensitive to the particular method used to handle missing values. To investigate this, it may be helpful to repeat the analysis with a variety of statistical approaches.


A concise review of methods to handle missing data is provided in the section ‘Missing data’ of the Encyclopedia of Epidemiologic Methods (Gail MH, Benichou J, Editors. Wiley 2000) and in the book Statistical analysis with missing data (Little RJA, Rubin DB. 2nd ed., Wiley 2002). The section ‘Handling of missing values’ in Modern Epidemiology, 3rd ed. (K. Rothman, S. Greenland, T. Lash. Lippincott Williams & Wilkins, 2008) is a summary of the state of the art, focused on practical issues for epidemiologists. Other useful references on handling missing data include the books Multiple Imputation for Nonresponse in Surveys (Rubin DB, Wiley, 2004) and Analysis of Incomplete Multivariate Data (Schafer JL, Chapman & Hall/CRC, 1997), and the articles Using the outcome for imputation of missing predictor values was preferred (J Clin Epidemiol. 2006;59(10):1092-101), Recovery of information from multiple imputation: a simulation study (Emerg Themes Epidemiol. 2012;9(1):3) and Evaluation of two-fold fully conditional specification multiple imputation for longitudinal electronic health record data (Stat Med. 2014;33(21):3725-37).


Another method commonly used in epidemiology is to create a category of the variable, or an indicator, for the missing values. This practice can be invalid even if the data are missing completely at random and should be avoided (see Indicator and Stratification Methods for Missing Explanatory Variables in Multiple Linear Regression. J Am Stat Assoc. 1996;91(433):222-30).


6.3.4. Statistical software


A wide range of statistical software is available to impute missing data, mainly focusing on Multiple Imputation (MI) when missing data are assumed to be MAR. Examples include The MI Procedure of the SAS Institute, Multiple imputation of missing values (Stata J. 2004;4:227-41) and mice: Multivariate Imputation by Chained Equations in R (J Stat Soft. 2011;45(3)).


A good overview of available software packages is provided in Missing data: A statistical framework for practice (Biom J. 2021;63(5): 915-47).
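
Whichever package is used for multiple imputation, the estimates obtained from the imputed datasets are usually combined with Rubin's rules. The sketch below shows this pooling step only; the estimates and variances are hypothetical and the function name is illustrative, not part of any of the packages mentioned above.

```python
import statistics


def pool_rubin(estimates, variances):
    """Pool results from m imputed datasets with Rubin's rules.

    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching estimates and variances from at least 2 imputations")
    pooled_estimate = statistics.mean(estimates)
    within = statistics.mean(variances)        # average within-imputation variance
    between = statistics.variance(estimates)   # between-imputation (sample) variance
    total_variance = within + (1 + 1 / m) * between
    return pooled_estimate, total_variance


# Hypothetical log hazard ratios and their variances from three imputed datasets.
estimates = [1.0, 1.2, 0.8]
variances = [0.04, 0.05, 0.03]
pooled, total_var = pool_rubin(estimates, variances)
print(f"pooled estimate = {pooled:.3f}, standard error = {total_var ** 0.5:.3f}")
```

The between-imputation component reflects the additional uncertainty due to the missing values, which is ignored when a single imputed dataset is analysed as if it were complete.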


6.4. Triangulation


Triangulation is not a separate methodological approach, but rather a framework, formally described in Triangulation in aetiological epidemiology (Int J Epidemiol. 2016;45(6):1866-86). Triangulation is defined as “the practice of obtaining more reliable answers to research questions through integrating results from several different approaches, where each approach has different key sources of potential bias that are unrelated to each other.” In some ways, the paper formalises approaches already used in many nonrandomised pharmacoepidemiologic studies, including control exposures and outcomes, sensitivity analyses, and comparison of results from different populations and different study designs, all within the same study and while explicitly specifying the direction of bias in each approach. Triangulation was used (without using the explicit term) in Associations of maternal antidepressant use during the first trimester of pregnancy with preterm birth, small for gestational age, autism spectrum disorder, and attention-deficit/hyperactivity disorder in offspring (JAMA. 2017;317(15):1553-62), whereby, within the same study, the authors used negative controls (paternal exposure to antidepressants) and assessed the association using a different study design and study population (sibling design).


