Selection bias arises from the selective inclusion in the study of subjects who are not representative of the exposure or outcome pattern in the source population. Of note, selection bias requires a distortion of the exposure-outcome relation: lack of representativeness of the exposure or outcome pattern alone is not sufficient to cause selection bias. Common examples of selection bias are referral bias, self-selection bias and prevalence bias.
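The requirement that selection must distort the exposure-outcome relation can be made concrete with a short calculation. In the minimal Python sketch below (a hypothetical illustration; the function name and selection probabilities are invented for the example), s_eo is the probability that a subject with exposure e and outcome o is included. Because each cell count is multiplied by its selection probability, the odds ratio among included subjects equals the true odds ratio multiplied by (s11 * s00) / (s10 * s01). If selection depends only on exposure, or only on outcome, this factor equals 1: the sample is unrepresentative but the odds ratio is not distorted.

```python
def observed_odds_ratio(true_or, s11, s10, s01, s00):
    """Odds ratio among included subjects.

    s_eo is the (hypothetical) probability that a subject with exposure e
    (1/0) and outcome o (1/0) is included in the study. Each cell count is
    multiplied by its selection probability, so the true odds ratio is
    multiplied by the selection odds ratio (s11 * s00) / (s10 * s01).
    """
    return true_or * (s11 * s00) / (s10 * s01)


# Selection depends only on exposure: unrepresentative sample, OR unchanged.
print(observed_odds_ratio(2.0, s11=0.8, s10=0.8, s01=0.2, s00=0.2))

# Selection depends only on outcome (as in case-control sampling): OR unchanged.
print(observed_odds_ratio(2.0, s11=0.5, s10=0.1, s01=0.5, s00=0.1))

# Selection depends jointly on exposure and outcome: OR distorted.
print(observed_odds_ratio(2.0, s11=0.9, s10=0.3, s01=0.3, s00=0.3))
```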
Prevalence bias may occur when prevalent drug users, i.e. patients who had already been taking a therapy for some time before study follow-up began, are included in an observational study. This can cause two types of bias. Firstly, prevalent users are ‘survivors’ (healthy users) of the early period of pharmacotherapy, which can introduce substantial selection bias if the risk varies with time. This is seen in safety studies where persons who discontinue treatment following early adverse reactions are unintentionally excluded from the safety assessment (‘depletion of susceptibles’). An illustrative example is the comparison between users of third-generation and older oral contraceptives regarding the risk of venous thrombosis, where the association for the third generation was initially overestimated due to healthy-user bias in persons taking older contraceptives (see The Transnational Study on Oral Contraceptives and the Health of Young Women. Methods, results, new analyses and the healthy user effect, Hum Reprod Update 1999;5(6):707-20). Secondly, covariates measured at study entry are often influenced by previous intake of the drug.
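The ‘depletion of susceptibles’ mechanism can be illustrated numerically. The Python sketch below assumes a hypothetical mixture of initiators: 10% are susceptible, with a constant hazard of 0.5 per year, and the rest have a constant hazard of 0.02 per year (all values and the function name are invented for the example). Among users still event-free at time t, the average hazard is the survival-weighted mean of the two hazards, so a cohort of prevalent users assembled several years after initiation observes a much lower hazard than a cohort followed from the start of treatment.

```python
import math


def hazard_among_continuing_users(t, frac_susceptible=0.10,
                                  h_susceptible=0.50, h_other=0.02):
    """Average event hazard (per year) at t years after treatment start,
    among users still event-free at t.

    Hypothetical mixture: a fraction of initiators is 'susceptible' with a
    high constant hazard, the rest have a low constant hazard. Each group
    is weighted by its share of initiators times its survival exp(-h * t).
    """
    w_s = frac_susceptible * math.exp(-h_susceptible * t)
    w_o = (1 - frac_susceptible) * math.exp(-h_other * t)
    return (w_s * h_susceptible + w_o * h_other) / (w_s + w_o)


# Incident users are observed from t = 0; a prevalent-user cohort assembled
# 5 years after initiation only sees the lower, depleted hazard.
print(round(hazard_among_continuing_users(0), 3))
print(round(hazard_among_continuing_users(5), 3))
```

With these assumed values the hazard is 0.068 per year at initiation and falls to about 0.025 per year after 5 years, although no individual's risk has changed.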
The article Collider bias undermines our understanding of COVID-19 disease risk and severity (Nat Commun. 2020;11(1):5749) describes a selection bias in which a variable (a collider) is influenced by two other variables, for example when an exposure (being a healthcare worker) and an outcome (severity of COVID-19 infection) both affect the variable determining the likelihood of being sampled (receiving a PCR test or being hospitalised). Bias arises when the analysis includes only people who have experienced such an event, for example hospitalisation with COVID-19, testing for active infection or volunteering to participate. Among hospitalised patients, the relationship between any exposure that relates to hospitalisation and the severity of infection would be distorted compared with the general population. The article proposes methods for detecting and minimising the effects of collider bias. Vaccine side-effects and SARS-CoV-2 infection after vaccination in users of the COVID Symptom Study app in the UK: a prospective observational study (Lancet Infect Dis. 2021;S1473) notes that collider bias would occur in that study if both vaccination status and COVID-19 positivity influenced the probability of participation. However, the authors considered collider bias an unlikely explanation for the reduction in infections following vaccination seen in their data, given that strong reductions in COVID-19 hospitalisations after vaccination were observed in other nationwide studies.
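Collider bias can be reproduced with a small simulation. In the Python sketch below (hypothetical prevalences and sampling probabilities, standard library only), exposure and disease severity are generated independently, and both increase the probability of being tested or hospitalised and therefore of entering the study sample. The odds ratio is close to 1 in the whole population but clearly below 1 among sampled individuals.

```python
import random


def odds_ratio(c):
    """Odds ratio from a dict of counts keyed by (exposure, outcome)."""
    return (c[(1, 1)] * c[(0, 0)]) / (c[(1, 0)] * c[(0, 1)])


def simulate_collider(n=200_000, seed=1):
    rng = random.Random(seed)
    everyone = {(e, s): 0 for e in (0, 1) for s in (0, 1)}
    sampled = dict(everyone)
    for _ in range(n):
        e = int(rng.random() < 0.10)  # exposure, e.g. healthcare worker (assumed 10%)
        s = int(rng.random() < 0.20)  # severe infection (assumed 20%), independent of e
        # Both exposure and severity raise the chance of being tested or
        # hospitalised, i.e. of entering the study sample (the collider).
        p_sampled = 0.05 + 0.40 * e + 0.50 * s
        everyone[(e, s)] += 1
        if rng.random() < p_sampled:
            sampled[(e, s)] += 1
    return odds_ratio(everyone), odds_ratio(sampled)


or_population, or_sample = simulate_collider()
print(f"population OR = {or_population:.2f}, sampled OR = {or_sample:.2f}")
```

With these assumed probabilities, the expected odds ratio among sampled individuals is 0.95 * 0.05 / (0.45 * 0.55), about 0.19, even though exposure and severity are unrelated in the population.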
Biases in evaluating the safety and effectiveness of drugs for covid-19: designing real-world evidence studies (Am J Epidemiol. 2021;kwab028) illustrates selection biases present in several studies evaluating the effects of drugs on SARS-CoV-2 infection and how to address them at the analysis and design stages.
Mitigating selection bias at the analysis stage
Once they have occurred, selection biases cannot be removed at the analysis stage if the factors responsible for the selection are not known or not measured. In some circumstances, it may be possible to restrict the study population by including only groups where the selection did not operate. For example, a prevalence bias may be removed by restricting the analysis to incident drug users, i.e. patients enter the study cohort only at the start of the first course of the treatment of interest (or of different treatment groups) during the study period. Consequences may include reduced precision of estimates due to the smaller sample size and a likely reduction in the number of patients with long-term exposure. In circumstances where the factors influencing the selection are known and have been accurately measured, they can be treated as confounding factors and adjusted for at the analysis stage.
Mitigating selection bias at the design stage
Selection biases should therefore be avoided or minimised through proper consideration at the design stage. The new user (incident user) design can mitigate selection bias by alleviating healthy user bias for preventive treatments in some circumstances (see Healthy User and Related Biases in Observational Studies of Preventive Interventions: A Primer for Physicians. J Gen Intern Med 2011;26(5):546-50). The article Evaluating medication effects outside of clinical trials: new-user designs (Am J Epidemiol. 2003;158(9):915–20) defines new user designs in cohort and case-control settings. The articles The active comparator, new user study design in pharmacoepidemiology: historical foundations and contemporary application (Curr Epidemiol Rep. 2015;2(4):221-28) and New-user designs with conditional propensity scores: a unified complement to the traditional active comparator new-user approach (Pharmacoepidemiol Drug Saf. 2017;26(4):469-7) extend the discussion to studies with active comparators. One should be aware of the difference between a new user (which requires absence of prior use of a given drug or drug class during a prespecified washout period) and a treatment-naïve user (which requires absence of prior treatment for a given indication). Treatment-naïve status may not be ascertainable in left-truncated data, i.e. when patients' history before entry into the data source is not captured.
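A minimal sketch of how new users might be identified from dispensing records, assuming a 365-day washout, a fixed study period and a known date from which each patient's records are available (all hypothetical choices, as are the function and argument names). Only the first dispensing in the study period is considered as a candidate index date. A patient whose look-back is shorter than the washout cannot be classified, which mirrors the problem of left-truncated data.

```python
from datetime import date, timedelta


def new_user_index_date(dispensings, history_start, study_start, study_end,
                        washout_days=365):
    """Index date of new use for one patient, or None if not a new user.

    dispensings: dates on which the drug (class) of interest was dispensed.
    history_start: first date from which the patient's records are available.
    """
    in_period = sorted(d for d in dispensings if study_start <= d <= study_end)
    if not in_period:
        return None  # no use during the study period
    index = in_period[0]  # first dispensing in the study period
    washout_start = index - timedelta(days=washout_days)
    if washout_start < history_start:
        return None  # look-back shorter than the washout: status not ascertainable
    if any(washout_start <= d < index for d in dispensings):
        return None  # use during the washout: prevalent user
    return index


history, start, end = date(2015, 1, 1), date(2020, 1, 1), date(2022, 12, 31)
print(new_user_index_date([date(2019, 3, 1), date(2020, 6, 1)], history, start, end))   # 2020-06-01
print(new_user_index_date([date(2019, 12, 1), date(2020, 1, 15)], history, start, end))  # None
print(new_user_index_date([date(2020, 2, 1)], date(2019, 10, 1), start, end))            # None
```

In the first call the patient had used the drug in 2019, but not during the washout, so they qualify as a new user without being treatment-naïve.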
The active comparator new user design (see Chapter 5.4.5) would ideally compare two treatments that are marketed contemporaneously. However, a more common situation is one in which a recently marketed drug is compared with an older established alternative. For such situations, the article Prevalent new-user cohort designs for comparative drug effect studies by time-conditional propensity scores (Pharmacoepidemiol Drug Saf. 2017;26(4):459-68) introduces a cohort design that allows identification of matched subjects using the comparator drug at the same point in the course of disease as initiators of the (newly marketed) drug of interest. The design uses time-based and prescription-based exposure sets to compute time-dependent propensity scores of initiating the new drug.
Observational studies of treatment effectiveness: worthwhile or worthless? (Clin Epidemiol. 2018;11:35-42) discusses how researchers can mitigate the risk of bias in the cohort design and presents a case study of the comparative effectiveness of two antidiabetic treatments using data collected during routine clinical practice.
The use of case-only designs can also reduce selection bias if the statistical assumptions of the method are fulfilled (see Chapters 5.2.3 and 5.4.3).
Information bias (misclassification) arises when incorrect information about the exposure, the outcome or any covariate is collected in the study, or when variables are incorrectly categorised. Different factors may cause information bias. Chapter 5.1 describes errors in the definition, measurement and classification of variables and how to address them. Errors may also arise from the study design and the method of data collection. Examples include recall bias in case-control studies, where cases and controls may recall their past exposures differently, as well as protopathic bias and surveillance or detection bias, which are described below.
Protopathic bias arises when the initiation of a drug (exposure) occurs in response to a symptom of the (at this point undiagnosed) disease under study (outcome). For example, use of analgesics in response to pain caused by an undiagnosed tumour might lead to the erroneous conclusion that the analgesic caused the tumour. Protopathic bias thus reflects a reversal of cause and effect (see Bias: Considerations for research practice. Am J Health Syst Pharm 2008;65(22):2159-68). It is a particular problem in studies of drug-cancer associations and of other outcomes with long latencies. It may be handled by introducing a time lag, i.e. by disregarding all exposure during a specified period of time before the index date.
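The time-lag approach can be sketched as follows. In this minimal Python example (hypothetical function name and dates; the lag length is an assumption that would need to be justified for the disease under study), dispensings falling within the lag window before the index date are ignored when classifying a patient as exposed.

```python
from datetime import date, timedelta


def exposed_before_index(dispensing_dates, index_date, lag_days=0):
    """True if any dispensing occurs before index_date minus the lag.

    Dispensings inside the lag window [index_date - lag_days, index_date)
    are disregarded, because they may have been prompted by early symptoms
    of the not-yet-diagnosed outcome.
    """
    cutoff = index_date - timedelta(days=lag_days)
    return any(d < cutoff for d in dispensing_dates)


diagnosis = date(2020, 6, 1)
analgesics = [date(2020, 3, 15), date(2020, 4, 20)]  # started shortly before diagnosis
print(exposed_before_index(analgesics, diagnosis))                # True: no lag
print(exposed_before_index(analgesics, diagnosis, lag_days=365))  # False: 1-year lag
```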
Protopathic bias has also been described as a selection bias and it should not be confused with confounding by indication (see Confounding by Indication: An Example of Variation in the Use of Epidemiologic Terminology, Am J Epidemiol. 1999;149(11):981-3).
Surveillance bias (or detection bias)
Surveillance or detection bias arises when patients in one exposure group have a higher probability of having the study outcome detected, due to increased surveillance, screening or testing of the outcome itself or of an associated symptom. For example, post-menopausal exposure to estrogen is associated with an increased risk of bleeding that can trigger screening for endometrial cancers, leading to a higher probability of early-stage endometrial cancers being detected. Any association between estrogen exposure and endometrial cancer may therefore overestimate the risk, because unexposed patients with sub-clinical cancers have a lower probability of their cancer being diagnosed or recorded. This is discussed in Alternative analytic methods for case-control studies of estrogens and endometrial cancer (N Engl J Med 1978;299(20):1089-94).
This non-random (differential) type of misclassification can be reduced by selecting an unexposed comparator group with a similar likelihood of screening or testing, by selecting outcomes that are likely to be diagnosed equally in both exposure groups, or by adjusting for the surveillance rate in the analysis. These issues and recommendations are outlined in Surveillance Bias in Outcomes Reporting (JAMA 2011;305(23):2462-3).
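A simple calculation shows how differential detection alone can create an apparent association. The Python sketch below assumes (hypothetically) the same true risk in both groups and invented detection probabilities. It also shows that an unexposed comparator with a similar likelihood of screening removes the distortion, and that dividing by the ratio of detection probabilities corrects it when those probabilities are known, which is rarely the case in practice.

```python
def recorded_risk_ratio(risk_exposed, risk_unexposed,
                        detect_exposed, detect_unexposed):
    """Risk ratio based on recorded outcomes when only a fraction of true
    cases is detected in each exposure group."""
    return (risk_exposed * detect_exposed) / (risk_unexposed * detect_unexposed)


# Hypothetical: equal true risk (1%), but exposed women are screened more.
rr_biased = recorded_risk_ratio(0.01, 0.01, detect_exposed=0.9, detect_unexposed=0.5)
# Unexposed comparator group with a similar likelihood of screening.
rr_similar = recorded_risk_ratio(0.01, 0.01, detect_exposed=0.9, detect_unexposed=0.9)
# Adjusting for (assumed known) detection probabilities.
rr_corrected = rr_biased / (0.9 / 0.5)

print(round(rr_biased, 2), round(rr_similar, 2), round(rr_corrected, 2))
```

With these assumed values the recorded risk ratio is about 1.8 although the true risk ratio is 1.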
Immortal time bias
Immortal time refers to a period of cohort follow-up during which death (or an outcome that determines the end of follow-up) cannot occur (K. Rothman, S. Greenland, T. Lash. Modern Epidemiology, 3rd Edition, Lippincott Williams & Wilkins, 2008).
Immortal time bias can arise when the period between cohort entry and the date of first exposure to a drug, during which the event of interest has not occurred, is either misclassified or simply excluded and not accounted for in the analysis. Immortal time bias in observational studies of drug effects (Pharmacoepidemiol Drug Saf. 2007;16(3):241-9) demonstrates how several observational studies used a flawed approach to design and data analysis, leading to immortal time bias, which can generate an illusion of treatment effectiveness. This bias is frequently found in studies that compare groups of ‘users’ against ‘non-users’. Observational studies reporting surprisingly beneficial drug effects should therefore be re-assessed to account for this bias.
Immortal Time Bias in Pharmacoepidemiology (Am J Epidemiol 2008;167(4):492-9) describes various cohort study designs leading to this bias, quantifies its magnitude under different survival distributions and illustrates it with data from a cohort of lung cancer patients. For time-based, event-based and exposure-based cohort definitions, the bias in the rate ratio resulting from misclassified or excluded immortal time increases proportionately to the duration of immortal time. The article argues that immortal time bias arises from conditioning on future exposure and that it can be avoided by analysing the data as if exposures and outcomes were recorded as they developed, without ever looking into the future. Biases in evaluating the safety and effectiveness of drugs for covid-19: designing real-world evidence studies (Am J Epidemiol. 2021;kwab028) illustrates immortal time bias present in several studies evaluating the effects of drugs on SARS-CoV-2 infection.
Survival bias associated with time-to-treatment initiation in drug effectiveness evaluation: a comparison of methods (Am J Epidemiol 2005;162(10):1016-23) describes five different approaches to deal with immortal time bias. The time-dependent approach had several advantages: no subjects are excluded from the analysis, and the effect can be estimated at any point in time after discharge. However, changes in exposure might be predictive of the study endpoint and require adjustment for time-varying confounders using complex methods. Problem of immortal time bias in cohort studies: example using statins for preventing progression of diabetes (BMJ. 2010;340:b5087) describes how immortal time in observational studies can bias the results in favour of the treatment group and how it can be identified and avoided. The authors recommend that all cohort studies be assessed for the presence of immortal time bias using appropriate validity criteria. However, Re. ‘Immortal time bias in pharmacoepidemiology’ (Am J Epidemiol 2009;170(5):667-8) argues that sound efforts to minimise the influence of more common biases should not be sacrificed to the aim of avoiding immortal time bias.
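The difference between a ‘users versus non-users’ classification and the time-dependent approach can be shown with a toy cohort. The minimal Python sketch below uses four invented patients (hypothetical data, not from any cited study) and assumes that each outcome occurs at the end of follow-up and that treated patients start treatment during follow-up. The flawed analysis counts the whole follow-up of ever-users, including the immortal time before the first dispensing, as exposed; the time-dependent analysis counts that pre-treatment time as unexposed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Patient:
    follow_up: float           # days from cohort entry to end of follow-up
    event: bool                # outcome occurred (assumed at end of follow-up)
    first_rx: Optional[float]  # day of first dispensing; None if never treated


def rate_ratio_ever_user(cohort: List[Patient]) -> float:
    """Flawed 'users vs non-users' analysis: all follow-up of ever-users,
    including the immortal time before their first dispensing, is exposed."""
    users = [p for p in cohort if p.first_rx is not None]
    non_users = [p for p in cohort if p.first_rx is None]
    rate_users = sum(p.event for p in users) / sum(p.follow_up for p in users)
    rate_non = sum(p.event for p in non_users) / sum(p.follow_up for p in non_users)
    return rate_users / rate_non


def rate_ratio_time_dependent(cohort: List[Patient]) -> float:
    """Time-dependent analysis: follow-up before the first dispensing is
    unexposed person-time; follow-up from the first dispensing is exposed."""
    exp_time = exp_events = unexp_time = unexp_events = 0.0
    for p in cohort:
        if p.first_rx is None or p.first_rx >= p.follow_up:
            unexp_time += p.follow_up
            unexp_events += p.event
        else:
            unexp_time += p.first_rx
            exp_time += p.follow_up - p.first_rx
            exp_events += p.event  # event at end of follow-up, i.e. while exposed
    return (exp_events / exp_time) / (unexp_events / unexp_time)


# Four invented patients: (follow-up days, event, day of first dispensing)
cohort = [
    Patient(100, True, None),
    Patient(200, False, None),
    Patient(300, True, 100),
    Patient(400, False, 200),
]
print(round(rate_ratio_ever_user(cohort), 2))       # 0.43
print(round(rate_ratio_time_dependent(cohort), 2))  # 1.5
```

With these invented data the ever-user analysis suggests a protective effect (rate ratio 3/7), whereas the time-dependent classification gives 1.5.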
Other forms of time-related bias
In many database studies, drugs administered during hospitalisations are unknown. Exposure misclassification may then occur, with a direction that depends on whether drugs prescribed before a hospitalisation are assumed to be continued or discontinued and whether days of hospitalisation are considered gaps in exposure, especially when several exposure categories are assigned, such as current, recent and past. The differential bias arising from the lack of information on (or lack of consideration of) hospitalisations occurring during the observation period (called ‘immeasurable time bias’ in Immeasurable time bias in observational studies of drug effects on mortality. Am J Epidemiol. 2008;168(3):329-35) can be particularly problematic when studying serious chronic diseases that require extensive medication use and multiple hospitalisations.
In case-control studies assessing chronic diseases with multiple hospitalisations and in-patient treatment (such as the use of inhaled corticosteroids and death in chronic obstructive pulmonary disease patients), no clearly valid approach to data analysis can fully circumvent this bias. However, sensitivity analyses such as restricting the analysis to non-hospitalised patients or providing estimates weighted by exposable time may provide additional information on the potential impact of this bias, as shown in Immeasurable time bias in observational studies of drug effects on mortality (Am J Epidemiol. 2008;168(3):329-35).
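One way to explore the impact of unobservable in-hospital time is to express exposure per unit of exposable (out-of-hospital) time in the look-back window. The Python sketch below is a hypothetical illustration of that idea, not the method of the cited article; the function names, the 360-day window and the assumption that hospital stays do not overlap are all choices made for the example.

```python
from datetime import date, timedelta


def overlap_days(a_start, a_end, b_start, b_end):
    """Days shared by half-open date intervals [a_start, a_end) and [b_start, b_end)."""
    return max((min(a_end, b_end) - max(a_start, b_start)).days, 0)


def dispensings_per_100_days(dispensings, hospital_stays, index_date,
                             window_days=360, exposable_only=True):
    """Outpatient dispensings per 100 days in the look-back window.

    hospital_stays: list of (admission, discharge) dates, assumed not to overlap.
    With exposable_only=True, in-hospital days (when outpatient dispensings
    cannot be observed) are removed from the denominator.
    """
    window_start = index_date - timedelta(days=window_days)
    n_rx = sum(window_start <= d < index_date for d in dispensings)
    denominator = window_days
    if exposable_only:
        denominator -= sum(overlap_days(adm, dis, window_start, index_date)
                           for adm, dis in hospital_stays)
    if denominator <= 0:
        return None  # whole window spent in hospital: no exposable time
    return 100 * n_rx / denominator


index = date(2021, 1, 1)
start = index - timedelta(days=360)
monthly = [start + timedelta(days=30 * k) for k in range(12)]
stay = [(start + timedelta(days=180), index)]  # in hospital for the last 180 days

a = dispensings_per_100_days(monthly, [], index)  # never hospitalised
b_naive = dispensings_per_100_days(monthly[:6], stay, index, exposable_only=False)
b_exposable = dispensings_per_100_days(monthly[:6], stay, index)
print(round(a, 2), round(b_naive, 2), round(b_exposable, 2))
```

Both hypothetical patients fill a prescription every 30 days while out of hospital and therefore have the same rate per 100 exposable days, but the patient hospitalised for half the window appears half as exposed when the full window is used as the denominator.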
In cohort studies where a first-line therapy (such as metformin) is compared with second- or third-line therapies, patients are unlikely to be at the same stage of the disease (e.g. diabetes), which can induce confounding of the association with an outcome (e.g. cancer incidence) by disease duration. An outcome related to the first-line therapy may also be wrongly attributed to the second-line therapy if it occurs after a long period of exposure to the first-line therapy. Such a situation requires matching on disease duration and consideration of latency time windows in the analysis (example drawn from Metformin and the Risk of Cancer. Time-related biases in observational studies. Diabetes Care 2012;35(12):2665-73).
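Matching on disease duration can be sketched as a simple greedy nearest-neighbour match. In the hypothetical Python example below, each initiator of a second-line therapy is matched to the first-line user whose disease duration (in years) is closest, without replacement and within an assumed caliper of 0.5 years; the identifiers, durations and caliper are invented, and real studies may use other matching algorithms.

```python
def match_on_duration(treated, comparators, caliper=0.5):
    """Greedy 1:1 nearest-neighbour matching on disease duration.

    treated, comparators: dicts mapping a patient id to disease duration
    (years). Comparators are used at most once; pairs whose durations
    differ by more than the caliper are not formed.
    """
    available = dict(comparators)
    pairs = []
    for tid, duration in sorted(treated.items(), key=lambda kv: kv[1]):
        best = min(available, key=lambda cid: abs(available[cid] - duration),
                   default=None)
        if best is not None and abs(available[best] - duration) <= caliper:
            pairs.append((tid, best))
            del available[best]
    return pairs


second_line = {"t1": 5.0, "t2": 8.0}             # hypothetical durations
first_line = {"c1": 0.5, "c2": 4.8, "c3": 7.7}
print(match_on_duration(second_line, first_line))  # [('t1', 'c2'), ('t2', 'c3')]
```

The recently diagnosed first-line user ("c1") is left unmatched, which is the intended effect: second-line initiators are compared with first-line users at a similar stage of the disease.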
Time-related biases in pharmacoepidemiology (Pharmacoepidemiol Drug Saf. 2020;29(9):1101-1110) further discusses several time-related biases and illustrates their impact on the effects of different COPD treatments on lung cancer, acute myocardial infarction and mortality outcomes in studies using electronic healthcare databases. Protopathic, latency, immortal time, time-window, depletion of susceptibles and immeasurable time biases were shown to have a substantial impact on the estimated effects of the study drugs on the outcomes.