This chapter briefly describes the main types of study design. Specific aspects or applications of these designs are presented in Chapter 5.3. These designs are fully described in several textbooks cited in the Introduction, for example, Modern Epidemiology (K. Rothman, S. Greenland, T. Lash. 3rd Ed. Lippincott Williams & Wilkins, 2008).
In a cohort study, the investigator identifies a population at risk for the outcome of interest, defines two or more groups of people (referred to as study cohorts) who are free of disease and differ according to their extent of exposure, and follows them over time to observe the occurrence of the disease in the exposed and unexposed cohorts. A cohort study may also include a single cohort that is heterogeneous with respect to exposure history, in which case the occurrence of disease is measured and compared between exposure groups within the cohort. The person-time of observation of each member of the cohorts is counted, and the total person-time experience serves as the denominator for the calculation of the incidence rate of the outcome of interest. Cohorts are called fixed when individuals may not move from one exposure group to the other. They are called closed when no loss to follow-up is allowed. The population of a cohort may also be called dynamic (or open) if it can gain and lose members who contribute to the person-time experience for the duration of their presence in the cohort. The main advantages of a cohort study are the possibility of calculating directly interpretable incidence rates of an outcome and of investigating multiple outcomes for a given exposure. Disadvantages are the need for a large sample size and possibly a long study duration to study rare outcomes, although the use of existing electronic health records databases allows large cohorts to be recruited and analysed retrospectively (see Chapter 4).
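The person-time calculation described above can be sketched as follows. This is a minimal illustration only: the `Member` record, the `incidence_rate` helper and the six-person cohort are hypothetical and are not taken from any study cited in this chapter.

```python
from dataclasses import dataclass


@dataclass
class Member:
    exposed: bool         # exposure group of the cohort member
    years_at_risk: float  # person-time contributed while under observation
    had_event: bool       # whether the outcome occurred during that time


def incidence_rate(members, exposed):
    """Events divided by total person-time for one exposure group."""
    group = [m for m in members if m.exposed == exposed]
    events = sum(m.had_event for m in group)
    person_time = sum(m.years_at_risk for m in group)
    return events / person_time


# Hypothetical cohort, for illustration only.
cohort = [
    Member(True, 2.0, True),
    Member(True, 3.0, False),
    Member(True, 5.0, True),
    Member(False, 4.0, False),
    Member(False, 6.0, True),
    Member(False, 10.0, False),
]

rate_exposed = incidence_rate(cohort, exposed=True)     # 2 events / 10 person-years
rate_unexposed = incidence_rate(cohort, exposed=False)  # 1 event / 20 person-years
rate_ratio = rate_exposed / rate_unexposed
```

In a dynamic cohort the same calculation applies, with each member's `years_at_risk` covering only the time during which that member was present in the cohort.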
Cohort studies are commonly used in pharmacoepidemiology to study the utilisation and effects of drugs. During the COVID-19 pandemic, they were the design of choice to compare the risk and severity of SARS-CoV-2 infection between users and non-users of certain types of medicinal products. An example is Renin-angiotensin system blockers and susceptibility to COVID-19: an international, open science, cohort analysis (Lancet Digit Health 2021;3(2):e98-e114), where electronic health record databases were used to identify and follow patients aged 18 years or older with at least one prescription for RAS blockers, calcium channel blockers, thiazide or thiazide-like diuretics. Four outcomes were assessed: COVID-19 diagnosis, hospital admission with COVID-19, hospital admission with pneumonia, and hospital admission with pneumonia, acute respiratory distress syndrome, acute kidney injury, or sepsis.
In a case-control study, the investigator first identifies cases of the outcome of interest and their exposure status, but the denominators (person-time of observation) to calculate their incidence rates are not measured. A referent (traditionally called “control”) group is then sampled to estimate the relative distribution of the exposed and unexposed denominators in the source population from which the cases originate. Only the relative size of the incidence rates can therefore be calculated. Advantages of a case-control study are the possibility of initiating a study based on a set of cases already identified (e.g. in a hospital) and the possibility of studying rare outcomes and their association with multiple exposures or risk factors. One of the main difficulties of case-control studies is the appropriate selection of controls independently of exposure or other relevant risk factors, in order to ensure that the distribution of exposure categories among controls is a valid representation of the distribution in the source population. Another disadvantage is the difficulty of studying rare exposures, as a large sample of cases and controls would be needed to identify exposed groups large enough for the planned statistical analysis.
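The reason why only the relative size of the incidence rates can be calculated may be written out as below. The symbols are introduced here for illustration, and the approximation assumes that controls are sampled independently of exposure and in proportion to the person-time of the source population; under other sampling schemes the odds ratio may estimate a different measure.

```latex
% a, b: exposed and unexposed cases; c, d: exposed and unexposed controls
% T_1, T_0: (unmeasured) exposed and unexposed person-time in the source population
% I_1, I_0: incidence rates in the exposed and unexposed
\[
\frac{c}{d} \approx \frac{T_1}{T_0}
\quad\Longrightarrow\quad
\mathrm{OR} = \frac{a/c}{b/d} = \frac{a\,d}{b\,c}
\approx \frac{a/T_1}{b/T_0} = \frac{I_1}{I_0}
\]
```

The individual rates cannot be recovered because the sampling fraction of the controls is unknown, but it cancels in the ratio.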
In order to increase the efficiency of exposure assessment in case-control studies, an alternative approach is a design in which the source population is a cohort. The nested case-control design includes all cases occurring in the cohort and a pre-specified number of controls randomly chosen from the population at risk at each time a case (or other relevant event) occurs. A case-cohort study includes all cases and a randomly selected sub-cohort from the population at risk. Advantages of such designs are that they allow a set of case-control studies to be conducted from a single cohort and make efficient use of electronic health care records databases where data on exposures and outcomes are already available.
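The selection of controls from the population at risk at the time of each case can be sketched as follows. This is a simplified illustration: the `sample_risk_sets` function and the five-person cohort are hypothetical, everyone is assumed to enter follow-up at time 0, and no matching factors are applied.

```python
import random


def sample_risk_sets(cohort, controls_per_case, seed=0):
    """Risk-set sampling of controls for a nested case-control study.

    cohort maps a subject id to (exit_time, is_case); everyone is assumed
    to enter follow-up at time 0 (a simplifying assumption).
    """
    rng = random.Random(seed)
    risk_sets = []
    cases = sorted((t, sid) for sid, (t, is_case) in cohort.items() if is_case)
    for event_time, case_id in cases:
        # Subjects still under follow-up at the event time are at risk;
        # this includes people who become cases later on.
        at_risk = [
            sid for sid, (exit_time, _) in cohort.items()
            if sid != case_id and exit_time >= event_time
        ]
        k = min(controls_per_case, len(at_risk))
        risk_sets.append((case_id, event_time, rng.sample(at_risk, k)))
    return risk_sets


# Hypothetical cohort: id -> (time of event or censoring, became a case?)
cohort = {
    "a": (2.0, True),
    "b": (5.0, False),
    "c": (3.5, True),
    "d": (1.0, False),
    "e": (6.0, False),
}
matched = sample_risk_sets(cohort, controls_per_case=2)
```

Subject "d" leaves follow-up before the first case occurs and can therefore never be selected, whereas subject "c" is eligible as a control for case "a" before becoming a case itself.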
The study Impact of vaccination on household transmission of SARS-COV-2 in England (Public Health England, 2021) is a nested case-control study where the cohort was defined by the occurrence of a laboratory-confirmed COVID-19 case in a household between 4 January 2021 and 28 February 2021. A “case” was defined as a secondary case occurring in the same household as a COVID-19 case and a “control” was identified as a person without infection. Exposure was defined by the presence of a vaccinated COVID-19 case vs. an unvaccinated COVID-19 case in the same household, with the restriction that the vaccinated COVID-19 case had to have been vaccinated at least 21 days prior to being diagnosed. The statistical analysis calculated the odds ratios and 95% confidence intervals for household members becoming “cases” if the COVID-19 case was vaccinated 21 days or more before testing positive, vs. household members where the COVID-19 case was not vaccinated.
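As an illustration of the kind of estimate reported, an unadjusted odds ratio and its 95% confidence interval can be computed from a two-by-two table as sketched below. The counts are hypothetical and are not taken from the study, and the Woolf (log-scale) interval is used here as an assumption for simplicity; it is not presented as the method the authors applied.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Woolf (log-scale) confidence interval.

    a: cases exposed, b: cases unexposed,
    c: controls exposed, d: controls unexposed.
    z=1.96 corresponds to a 95% interval.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study. "Exposed" here means
# living in a household where the COVID-19 case was vaccinated.
or_hat, ci_low, ci_high = odds_ratio_ci(a=20, b=80, c=40, d=60)
```

An odds ratio below 1 with an interval excluding 1 would be compatible with a lower odds of secondary infection in households of vaccinated cases, subject to the confounding issues discussed in this chapter.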
In A plea to stop using the case-control design in retrospective database studies (Stat Med. 2019;38(22):4199-208), the authors argue, based on examples, that the case-control design may lead to bias due to residual confounding that stems from unadjusted differences between exposure groups or from accidental inclusion of intermediary variables in propensity scores or disease-risk scores. It is therefore recommended to use negative control exposures (see Chapter 5.4.4) to evaluate the presence of confounding, or alternative designs such as a cohort or a self-controlled design. This is illustrated in the nested case-control study First-dose ChAdOx1 and BNT162b2 COVID-19 vaccines and thrombocytopenic, thromboembolic and hemorrhagic events in Scotland (Nat Med. 2021), where the authors highlight the possibility of residual confounding by indication and performed a post-hoc self-controlled case series (SCCS, see below) analysis to adjust for time-invariant confounders.
Although case-only (self-controlled) designs are not considered traditional study designs, they are increasingly used, and a large amount of methodological research has been published over the last decade. They are therefore presented separately.
Case-only designs are designs in which cases are the only subjects. These designs reduce confounding by using the exposure and outcome history of each case as its own control, and thereby eliminate confounding by characteristics that are constant over time, such as sex, socio-economic factors, genetic factors or chronic diseases. The article Control yourself: ISPE-endorsed guidance in the application of self-controlled study designs in pharmacoepidemiology (Pharmacoepidemiol Drug Saf. 2021;30(6):671–84) proposes a common terminology to facilitate critical thinking in the design, analysis and review of studies, which the authors call Self-controlled Crossover Observational PharmacoEpidemiologic (SCOPE) studies. These are split into outcome-anchored designs (case-crossover, case-time-control and case-case-time-control) and exposure-anchored designs (self-controlled case series), which are suitable for slightly different research questions. The article concludes that these designs are best suited to studying transient exposures in relation to abrupt outcomes.
A simple form of a self-controlled design is the sequence symmetry analysis (initially described as prescription sequence symmetry analysis), introduced as a screening tool in Evidence of depression provoked by cardiovascular medication: a prescription sequence symmetry analysis (Epidemiology 1996;7(5):478-84).
The case-crossover (CCO) design compares the probability of exposure in a time period prior to an outcome with that in an earlier reference time period, or set of time periods, to examine the effect of transient exposures on acute events (see The Case-Crossover Design: A Method for Studying Transient Effects on the Risk of Acute Events, Am J Epidemiol 1991;133(2):144-53). The case-time-control design is a modification of the case-crossover design which uses exposure history data from a traditional control group to estimate and adjust for the bias from temporal changes in prescribing (The case-time-control design, Epidemiology 1995;6(3):248-53). However, if not well matched, the control group may reintroduce selection bias (see Confounding and exposure trends in case-crossover and case-time-control designs, Epidemiology 1996;7(3):231-9). Methods have been suggested to overcome the exposure-trend bias while controlling for time-invariant confounders (see Future cases as present controls to adjust for exposure trend bias in case-only studies, Epidemiology 2011;22(4):568-74). Persistent User Bias in Case-Crossover Studies in Pharmacoepidemiology (Am J Epidemiol. 2016;184(10):761-9) demonstrates that case-crossover studies of drugs that may be used indefinitely are biased upward. This bias is alleviated, but not removed completely, by using a control group. Evaluation of the Case-Crossover (CCO) Study Design for Adverse Drug Event Detection (Drug Saf. 2017;40(9):789-98) showed that the CCO design performs adequately in studies of acute outcomes with abrupt onsets and exposures characterised as transient with immediate effects.
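In its simplest form, with one hazard window and one reference window per case, the case-crossover comparison reduces to a matched-pair odds ratio based on discordant cases. The sketch below assumes this 1:1 setting; the function name and the exposure histories are hypothetical, and it does not include the adjustment for exposure trends that the case-time-control design adds.

```python
def case_crossover_odds_ratio(cases):
    """Matched-pair odds ratio for a 1:1 case-crossover comparison.

    cases is a list of (exposed_in_hazard_window, exposed_in_reference_window)
    booleans, one pair per case. Only discordant cases are informative.
    """
    hazard_only = sum(1 for h, r in cases if h and not r)
    reference_only = sum(1 for h, r in cases if r and not h)
    if reference_only == 0:
        raise ValueError("no cases exposed only in the reference window")
    return hazard_only / reference_only


# Hypothetical exposure histories, one tuple per case.
histories = (
    [(True, False)] * 30     # exposed just before the outcome only
    + [(False, True)] * 12   # exposed in the earlier reference window only
    + [(True, True)] * 8     # exposed in both windows (uninformative)
    + [(False, False)] * 50  # exposed in neither window (uninformative)
)
odds_ratio = case_crossover_odds_ratio(histories)
```

Cases exposed in both windows, as is typical of persistent users, contribute nothing to the estimate, which is one reason why the design is better suited to transient exposures.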
The self-controlled case-series design (SCCS) and the self-controlled risk interval (SCRI) method were initially developed more specifically for vaccine studies and include only exposed cases. The observation period for each exposure for each case is divided into risk period(s) (e.g. number of days immediately following each exposure) and a control period (observed time outside this risk period). A good overview is provided in Tutorial in biostatistics: the self-controlled case series method (Stat Med. 2006;25(10):1768-97) and Investigating the assumptions of the self-controlled case series method (Stat Med. 2018;37(4):643-58). These designs are further discussed in Chapter 5.4.3, and their application to vaccine safety studies is presented in Chapter 14.2.1.
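The comparison of risk and control periods can be illustrated with a deliberately simplified calculation. The sketch assumes that every case has the same risk period and control period lengths and that there are no time-varying covariates such as age or season, in which case the estimate reduces to a ratio of event rates; actual SCCS analyses generally fit a conditional Poisson model. The function name and the counts are hypothetical.

```python
def sccs_relative_incidence(events_risk, events_control, days_risk, days_control):
    """Relative incidence in the risk period versus the control period.

    Assumes every case has the same risk and control period lengths and no
    time-varying covariates, so the estimate is a simple ratio of rates.
    """
    rate_risk = events_risk / days_risk
    rate_control = events_control / days_control
    return rate_risk / rate_control


# Hypothetical: a 42-day risk window within a 365-day observation period,
# with 60 events observed among the cases in total.
relative_incidence = sccs_relative_incidence(
    events_risk=15, events_control=45, days_risk=42, days_control=323
)
```

Because each case is compared with itself, characteristics that are constant over the observation period cancel out of this ratio.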
Cross-sectional studies are descriptive studies that seek to collect information on a study population at a specified time point. Cross-Sectional Studies: Strengths, Weaknesses, and Recommendations (Chest 2020;158(1S):S65-S71) provides further background and recommendations for the conduct of cross-sectional studies as well as use cases.
The data collected at the time point may include both exposure and outcome data. In studies looking at the association between drug use and a clinical outcome, use of prevalent drug users (i.e. patients already treated for some time before study follow-up begins) can introduce two types of bias. Firstly, prevalent drug users are “survivors” of the early period of treatment, which can introduce substantial (selection) bias if the risk varies with time. Secondly, covariates relevant for drug use at the time of entry (e.g. disease severity) may be affected by previous drug utilisation, or patients may differ regarding health-related behaviours (healthy user effect). No firm inference on a causal relationship can therefore be made from the results.
The study The incidence of cerebral venous thrombosis: a cross-sectional study (Stroke 2012;43(12):3375-7) was used to provide an estimate of the background incidence of cerebral sinus venous thrombosis (CSVT) in the context of the safety assessment of COVID-19 vaccines. Patients were identified from all 19 hospitals from two Dutch provinces using specific code lists. Review of medical records and case ascertainment were conducted to include only confirmed cases. Incidence was calculated using population figures from census data as the denominator.
Ecological analyses are hypothesis-generating rather than hypothesis-testing studies. Fundamentals of the ecological design are described in Ecologic studies in epidemiology: concepts, principles, and methods (Annu Rev Public Health 1995;16:61-81) and a ‘tool box’ is presented in Study design VI - Ecological studies (Evid Based Dent. 2006;7(4):108).
As illustrated in Control without separate controls: evaluation of vaccine safety using case-only methods (Vaccine 2004;22(15-16):2064-70), ecological analyses assume that a strong correlation between the trend in an indicator of an exposure (vaccine coverage in this example) and the trend in incidence of a disease (trends calculated over time or across geographical regions) is consistent with a causal relationship. Such comparisons at the population level may only generate hypotheses as they do not allow controlling for time-related confounding variables, such as age and seasonal factors. Moreover, they do not establish that the effect occurred in the exposed individuals.
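The population-level comparison described above often amounts to a correlation between two aggregate series, as in the sketch below. The yearly coverage and incidence figures are hypothetical, and the `pearson_r` helper is written out only for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical yearly population-level series, for illustration only.
coverage_pct = [10, 25, 40, 60, 75, 85]
incidence_per_100k = [4.1, 4.4, 5.0, 5.9, 6.3, 6.8]
r = pearson_r(coverage_pct, incidence_per_100k)
```

A coefficient close to 1 in such data shows only that the two trends move together. Any factor that also changed over the same years could produce the same result, and nothing in the calculation links the disease to the vaccinated individuals.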
Case-population studies and interrupted time series analyses are forms of ecological studies and are presented in Chapter 5.3. The case-coverage (ecological) design is mainly used for vaccine monitoring and is presented in Chapter 18.104.22.168.