There is a considerable body of literature explaining statistical methods for observational studies but very little addressing the statistical analysis plan (SAP). A clear guide to general principles and the need for a SAP is provided in Design of Observational Studies (P.R. Rosenbaum, Springer Series in Statistics, 2010. Chapter 18), which also gives useful advice on how to test complex hypotheses in a way that minimises the chances of drawing incorrect conclusions.
A study is generally designed to address a set of research questions. A main component of a study is an initial raw dataset of numerical and categorical observations, which does not usually answer those questions directly. The SAP details the mathematical calculations that will be performed on the observed data and the patterns of results that will be interpreted as supporting answers to the questions. An important part of the SAP should explain how issues with the data, for example missing or incomplete data, will be handled in these calculations.
Planning analyses for randomised clinical trials is covered in a number of publications. These often give checklists of the different components of the SAP and much of this applies equally to non-randomised designs. Good references in this respect are the ICH E9 Statistical Principles for Clinical Trials and its addendum on estimands and sensitivity analysis in clinical trials (E9-R1), and the Guide to the statistical analysis plan (Paediatr Anaesth. 2019;29(3):237-242).
The following objectives of a SAP apply to most studies, including observational studies.
Pre-specification of statistical and epidemiological analyses can be challenging for data that were not collected specifically to answer the study questions. This is often the case in observational studies, where secondary data are used. However, thoughtful specification of how missing values will be handled, or the use of a small part of the data as a pilot set to guide the analysis, can help to overcome such problems. In most studies, some analyses that were not pre-specified will be performed in response to observations in the data, to help interpret the results. It is important to distinguish such data-driven analyses from the pre-specified ones. Post-hoc modifications to the analysis strategy should be noted and explained. The SAP provides the documented reference against which this distinction can be confirmed.
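As an illustration of the pilot-set idea mentioned above, the following minimal sketch shows one way a small part of the data could be set aside reproducibly before the SAP is finalised. The function name `split_pilot`, the 10% fraction and the seed value are hypothetical choices made for this example only; they are not recommendations from this guide, and an actual SAP would state and justify its own values.

```python
import random


def split_pilot(record_ids, pilot_fraction=0.1, seed=2024):
    """Split record identifiers into a pilot set and a confirmatory set.

    pilot_fraction and seed are illustrative assumptions; a SAP would
    pre-specify both so that the split can be reproduced and audited.
    """
    if not 0 < pilot_fraction < 1:
        raise ValueError("pilot_fraction must be strictly between 0 and 1")
    ids = sorted(record_ids)       # sort first so the split ignores input order
    rng = random.Random(seed)      # local generator: reproducible, no global state
    rng.shuffle(ids)
    n_pilot = max(1, round(len(ids) * pilot_fraction))
    pilot = set(ids[:n_pilot])          # used to explore and refine the analysis
    confirmatory = set(ids[n_pilot:])   # reserved for the pre-specified analyses
    return pilot, confirmatory


# Hypothetical dataset of 1000 records identified by the integers 1..1000.
pilot_ids, confirmatory_ids = split_pilot(range(1, 1001))
```

Fixing the seed and sorting the identifiers before shuffling means that the same records fall into the pilot set each time the split is rerun, so exploratory work on the pilot set can be kept separate from the pre-specified analyses on the remaining data.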
Specific to observational studies, strong emphasis will be given to the measures used to control and to quantify bias. Factors that may bias the results of observational studies are described in Chapter 6.1. Avoiding bias in observational studies: part 8 in a series of articles on evaluation of scientific publications (Dtsch Arztebl Int. 2009;106(41):664-8) explains how these main methodological problems can be avoided by careful planning. Part of the SAP will be devoted to converting the scientific understanding of the causal relationships between the exposures and outcomes that are the primary focus of the study, and the other variables available in the dataset, into a credible mathematical model. It is also advisable to consider appropriate negative controls in the analysis, that is, (exposure, outcome) pairs that are strongly believed not to be causally related and for which a similar model is considered reasonable, as these may indicate bias, or unknown or unmeasured confounding (see Chapter 5.3.4).
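The negative-control check described above can be sketched as follows. This is a deliberately simplified example: it uses a crude (unadjusted) risk ratio with a Wald confidence interval on the log scale, whereas in practice the negative-control pair would be analysed with a model similar to that of the primary analysis. The function names and the counts in the usage lines are hypothetical.

```python
import math


def risk_ratio_ci(exposed_events, exposed_total,
                  unexposed_events, unexposed_total, z=1.96):
    """Crude risk ratio with an approximate 95% Wald interval (log scale)."""
    if exposed_events <= 0 or unexposed_events <= 0:
        raise ValueError("both groups need at least one event for this formula")
    risk_exposed = exposed_events / exposed_total
    risk_unexposed = unexposed_events / unexposed_total
    rr = risk_exposed / risk_unexposed
    # Standard error of log(RR) for two independent binomial proportions.
    se_log_rr = math.sqrt(
        1 / exposed_events - 1 / exposed_total
        + 1 / unexposed_events - 1 / unexposed_total
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


def negative_control_suggests_bias(exposed_events, exposed_total,
                                   unexposed_events, unexposed_total):
    """Return True when the interval for a negative-control pair excludes 1.

    An apparent association for a pair believed not to be causally related
    is a warning sign of bias or residual confounding, not proof of it.
    """
    _, lower, upper = risk_ratio_ci(
        exposed_events, exposed_total, unexposed_events, unexposed_total
    )
    return not (lower <= 1.0 <= upper)


# Hypothetical counts: equal risks give no signal, a doubled risk gives one.
no_signal = negative_control_suggests_bias(30, 1000, 30, 1000)
signal = negative_control_suggests_bias(60, 1000, 30, 1000)
```

An interval that includes 1 for a negative-control pair does not demonstrate the absence of bias; it only means that this particular check did not detect any.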