

ENCePP Guide on Methodological Standards in Pharmacoepidemiology


14.3. Design, implementation and analysis of pharmacogenomic studies

14.3.1. Introduction


Individual differences in the response to medicines encompass variation in both efficacy and safety, including the risk of severe adverse drug reactions. Clinical factors influencing response include disease severity, age, gender, and concomitant drug use. However, natural genetic variation that influences the expression or activity of proteins involved in drug disposition (absorption, distribution, metabolism, and excretion), or of the protein targets of drug action (such as enzymes and receptors), may be an important additional source of inter-individual variability in both the beneficial and adverse effects of drugs (see Pharmacogenomics: translating functional genomics into rational therapeutics. Science 1999;286(5439):487-91).


Pharmacogenomics is defined as the study of genetic variation as a determinant of drug response. Drug response may vary as a result of differences in the DNA sequence present in the germline or, in the case of cancer treatments, of somatic variation in the DNA of cancer cells (see The Roles of Common Variation and Somatic Mutation in Cancer Pharmacogenomics, Oncol Ther. 2019;7(1):1-32) or, in the case of treatment or prevention of infectious diseases, of variation in the pathogen's genome (see Pharmacogenomics and infectious diseases: impact on drug response and applications to disease management, Am J Health Syst Pharm. 2002;59(17):1626-31). When incorporated into research and care, the study of genetic variation underlying drug response can complement information on clinical factors and disease sub-phenotypes to improve the prediction of treatment response and reduce the risk of adverse reactions. Identifying variation in genes that modify drug response provides an opportunity to optimise the safety and effectiveness of currently available drugs and to develop new drugs for paediatric and adult populations (see Drug discovery: a historical perspective, Science 2000;287(5460):1960-4).


It is important to note that pharmacogenomics is one of several approaches available to identify useful biomarkers of drug effects. Other approaches include, but are not limited to, epigenomics (the study of gene expression changes not attributable to changes in the DNA sequence), transcriptomics, proteomics (protein function and levels; see Precision medicine: from pharmacogenomics to pharmacoproteomics, Clin Proteomics 2016;13:25), and metabolomics.


14.3.2. Identification of genetic variants influencing drug response




Genetic variation associated with important drug- or therapy-related outcomes can be identified with three main technologies. The choice among them may depend on whether the aim is research and discovery or clinical application, and on whether the genetic variants sought occur at high or low frequency in the population or patient group being evaluated. The strategy to identify genetic variants will depend on the aim and design of the pharmacogenetic study or the clinical application (see Methodological and statistical issues in pharmacogenomics, J Pharm Pharmacol. 2010;62(2):161-6). For example, to assess clinical applications, technologies might be used to identify genetic variants for which there is already prior knowledge about the gene or the variant (candidate gene studies). These studies require prior information about the likelihood that the polymorphism, gene, or gene product interacts with a drug or drug pathway; resources can thus be directed to several important polymorphisms with a higher a priori chance of a relevant drug-gene interaction. As explained in Moving towards individualized medicine with pharmacogenomics (Nature 2004;429(6990):464-8), lacking or incomplete information on genes from previous studies may result in a failure to identify every important genetic determinant in the genome.


In contrast, genome-wide scan approaches are discovery-oriented and use technologies that identify genetic variants across the genome without prior information or a gene/variant hypothesis (a hypothesis-generating or hypothesis-agnostic approach). Genome-wide approaches are widely used to discover the genetic basis of common complex diseases, in which multiple genetic variants contribute to disease risk. The same study design is applicable to the identification of genetic variants that influence treatment response. However, common variants in the genome, if functional, generally have small effect sizes, so large sample sizes are needed, for example by pooling different studies as done by the CHARGE Consortium with its focus on cardiovascular diseases (see The Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium as a model of collaborative science, Epidemiology 2013;24(3):346-8). By comparing the frequency of genetic variants between drug responders and non-responders, or between those with and without drug toxicity, genome-wide approaches can identify important genetic determinants. They may detect variants in genes that were not previously considered candidate genes, or even variants outside genes. However, because of linkage disequilibrium, whereby alleles at nearby loci tend to be co-inherited, a variant associated in a genome-wide analysis may not itself be biologically functional: it may simply be a marker in linkage with another variant that is the true biologically relevant determinant. This approach is therefore considered discovery in nature. Furthermore, failure to cover all relevant genetic risk factors can still be a problem, though less so than with the candidate gene approach.
It is therefore essential to conduct replication studies in independent cohorts and validation studies (in vivo and in vitro) to ascertain the generalisability of findings to other populations, to characterise the mechanistic basis of the effect of these genes on drug action, and to identify the true biological genetic determinants. Importantly, allele frequencies differ across populations; these differences should be accounted for to reduce bias when designing and analysing pharmacogenetic studies, and to ensure equity when implementing pharmacogenomics in the healthcare setting (see Preventing the exacerbation of health disparities by iatrogenic pharmacogenomic applications: lessons from warfarin, Pharmacogenomics 2018;19(11):875-81).
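The comparison of variant frequencies between responders and non-responders can be sketched, for a single biallelic variant, as an allele-based 2x2 test. This is a minimal illustration under stated assumptions: the genotype counts are invented, the allelic test assumes Hardy-Weinberg proportions and a per-allele (multiplicative) effect, and a real genome-wide analysis would repeat a test like this, more often a covariate-adjusted logistic regression, for every genotyped variant.

```python
import math
from dataclasses import dataclass


@dataclass
class GenotypeCounts:
    """Number of individuals carrying 0, 1 or 2 copies of the variant allele."""
    hom_ref: int
    het: int
    hom_var: int

    def allele_counts(self) -> tuple[int, int]:
        """Return (variant alleles, reference alleles); each person carries two."""
        variant = self.het + 2 * self.hom_var
        reference = self.het + 2 * self.hom_ref
        return variant, reference


def allelic_test(group1: GenotypeCounts, group2: GenotypeCounts) -> dict[str, float]:
    """Per-allele odds ratio and 1-df Pearson chi-square test comparing two groups."""
    a, b = group1.allele_counts()  # variant / reference alleles, e.g. non-responders
    c, d = group2.allele_counts()  # variant / reference alleles, e.g. responders
    odds_ratio = (a * d) / (b * c)
    n = a + b + c + d
    # Pearson chi-square for a 2x2 table, without continuity correction.
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return {"odds_ratio": odds_ratio, "chi2": chi2, "p_value": p_value}


# Hypothetical counts for 100 non-responders and 100 responders.
non_responders = GenotypeCounts(hom_ref=60, het=30, hom_var=10)
responders = GenotypeCounts(hom_ref=80, het=18, hom_var=2)
result = allelic_test(non_responders, responders)
print(result)  # odds ratio about 2.7, chi-square about 13.3
```

Because each variant is tested separately, the same calculation applied across hundreds of thousands of variants is what drives the large sample sizes and the multiple-testing corrections discussed in this chapter.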


More recently, pharmacogenomic studies have also been undertaken in large national biobanks that link genetic data to healthcare data for cohorts of hundreds of thousands of participants, for example the UK Biobank (see UK Biobank: An Open Access Resource for Identifying the Causes of a Wide Range of Complex Diseases of Middle and Old Age, PLoS Med. 2015;12(3):e1001779) and the Estonian Biobank (see Cohort Profile: Estonian Biobank of the Estonian Genome Center, University of Tartu, Int J Epidemiol. 2015;44(4):1137-47). Translating genotype data of 44,000 biobank participants into clinical pharmacogenetic recommendations: challenges and solutions (Genet Med. 2019;21(6):1345-54) and other studies have shown that these large-scale resources represent unique opportunities to discover novel and rare variants.


Technologies used for detection of genetic variants


The main technologies are:

  • Genotyping and array-based technologies, which are the most feasible and cost-effective approach for most large-scale clinical utility studies and for clinical implementation, using either commercial or customised arrays. They can genotype up to hundreds of thousands of predefined genetic variants, in one or several genes or across the genome, including the most common form of variation, single nucleotide polymorphisms (SNPs). The identification of genetic determinants is limited to the variants included in the array; arrays therefore cannot be used to discover novel variants. The variants included are generally chosen on the grounds of biological plausibility, which may have been established in previous studies, or of knowledge of functional genes known to be involved in pharmacokinetic and pharmacodynamic pathways or related to the disease or intermediate phenotype.

  • Sanger sequencing has been the gold standard for confirming genetic variants in clinical settings since it was first commercialised in 1986. More recently, it has been replaced by other sequencing methods that increase speed and reduce the cost of DNA sequencing, especially for automated analysis of large numbers of samples.

  • Next generation sequencing (NGS) is a high-throughput sequencing technology that identifies genetic variants across the genome (whole genome sequencing; WGS) or the exome (whole exome sequencing; WES) without requiring prior knowledge of genetic biomarkers. These techniques may prove valuable in early research settings for the discovery of novel or rare variants, and for the detection of structural variants and copy number variation, which are common in pharmacogenes such as CYP2D6 (see A Review of the Important Role of CYP2D6 in Pharmacogenomics. Genes (Basel) 2020;11(11):1295). As the use of clinical WGS testing increases, the return of secondary pharmacogenomic findings will benefit from a greater understanding of rare and novel variants.

Variant curation and annotation


Lastly, the identification of genetic variants requires careful curation and annotation to ensure that their description and allelic designation are standardised. Common pharmacogenomic variants and haplotypes (combinations of sequence variants on the same chromosome copy) are catalogued by the Pharmacogene Variation Consortium (PharmVar) using a ‘star allele’ nomenclature. This nomenclature is historical; in human disease genetics, the reference SNP identifier (rs-id) is more commonly used to designate a genetic variant unambiguously. Although the star allele nomenclature remains the most widely used classification in pharmacogenomic research, it is recognised to have several limitations. Pharmacogenomic haplotypes and star alleles can lack accurate definition and validation, and annotation of their phenotypic effects may be limited. In addition, current classifications exclude many rare variants, which are increasingly recognised as having important effects, as described in Pharmacogenetics at Scale: An Analysis of the UK Biobank (Clin Pharmacol Ther. 2021;109(6):1528-37). Some authors have called for an effort to standardise the annotation of sequence variants (see The Star-Allele Nomenclature: Retooling for Translational Genomics. Clin Pharmacol Ther. 2007;82(3):244-8).
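To illustrate how haplotype definitions translate into star-allele calls, and why rare variants outside a catalogue are a problem, the sketch below matches phased haplotypes against a small catalogue. The gene, allele names and variant labels are hypothetical placeholders, not PharmVar definitions; real allele callers also handle unphased genotypes, partial matches, structural variants and copy number, all of which this exact-match sketch ignores.

```python
# Hypothetical catalogue for a made-up gene; the variant labels are placeholders
# and do NOT correspond to real PharmVar star-allele definitions.
CATALOGUE: dict[str, frozenset[str]] = {
    "*1": frozenset(),                    # reference haplotype: no defining variants
    "*2": frozenset({"var_A"}),
    "*3": frozenset({"var_A", "var_B"}),
    "*4": frozenset({"var_C"}),
}


def call_star_allele(haplotype: set[str],
                     catalogue: dict[str, frozenset[str]] = CATALOGUE) -> str:
    """Return the allele whose definition exactly matches one phased haplotype.

    A haplotype carrying any variant not covered by a definition (for example a
    rare variant) is reported as 'unassigned' instead of being forced into a category.
    """
    observed = frozenset(haplotype)
    for name, definition in catalogue.items():
        if observed == definition:
            return name
    return "unassigned"


def call_diplotype(hap1: set[str], hap2: set[str]) -> str:
    """Combine the two haplotype calls into a diplotype string such as '*1/*3'."""
    calls = sorted([call_star_allele(hap1), call_star_allele(hap2)])  # lexicographic order
    return "/".join(calls)


print(call_diplotype({"var_A", "var_B"}, set()))          # *1/*3
print(call_diplotype({"var_A", "var_rare"}, {"var_C"}))   # *4/unassigned
```

The second call shows the limitation noted above: a haplotype carrying an uncatalogued rare variant cannot be assigned a star allele, even though it may have a functional effect.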


14.3.3. Study designs


Several designs are available for pharmacogenetic studies that aim to ascertain the effect, and importantly the clinical relevance and utility, of using pharmacogenetic information to guide prescribing decisions on the choice and dose of an agent for a particular condition (see Prognosis research strategy (PROGRESS) 4: Stratified medicine research, BMJ. 2013;346:e5793).


Firstly, RCTs, both pre- and post-authorisation, provide the opportunity to address several pharmacogenetic questions. Pharmacogenetics in randomized controlled trials: considerations for trial design (Pharmacogenomics 2011;12(10):1485-92) describes three trial designs that differ in the timing of randomisation and genotyping, and Promises and challenges of pharmacogenetics: an overview of study design, methodological and statistical issues (JRSM Cardiovasc Dis. 2012;1(1)) discusses outstanding methodological and statistical issues that may lead to heterogeneity among reported pharmacogenetic studies, and how they may be addressed. Pharmacogenetic trials can be designed (or analysed post hoc) to study whether a subgroup of patients defined by certain genetic characteristics responds differently to the treatment under study. Alternatively, a trial can verify whether genotype-guided treatment is beneficial compared with standard care. For rare adverse drug events or low-prevalence genetic variants, the obvious limitations are the large sample size required and the associated high costs. To make a trial as efficient as possible in terms of time, money and/or sample size, an adaptive trial design may be chosen, which allows prospectively planned modifications to the design after patients have been enrolled. Such a design uses accumulating data to decide how to modify aspects of the study as it progresses, without undermining the validity and integrity of the trial. An additional benefit is that the expected number of patients exposed to an inferior or harmful treatment can be reduced (see Potential of adaptive clinical trial designs in pharmacogenetic research, Pharmacogenomics 2012;13(5):571-8).


Observational studies are an alternative and can be family-based (using twins or siblings) or population-based (using unrelated individuals). The main advantage of family-based studies is the avoidance of bias due to population stratification. A clear practical disadvantage for pharmacogenetic studies is the requirement to study families where patients have been treated with the same drugs (see Methodological quality of pharmacogenetic studies: issues of concern, Stat Med. 2008;27(30):6547-69).


Population-based studies may be designed to assess drug-gene interactions as cohort (including exposure-only), case-cohort or case-control studies (including case-only, as described in Nontraditional epidemiologic approaches in the analysis of gene-environment interaction: case-control studies with no controls! Am J Epidemiol. 1996;144(3):207-13). Sound pharmacoepidemiological principles as described in the current Guide also apply to observational pharmacogenetic studies. A specific type of confounding, due to population stratification, needs to be considered in pharmacogenetic studies and, if present, dealt with. Its presence may be obvious where the study population includes more than one immediately recognisable ethnic group; in other studies, however, stratification may be more subtle. Population stratification can be detected with Pritchard and Rosenberg's method, which involves genotyping additional SNPs in other areas of the genome and testing for association between them and the outcome (see Association mapping in structured populations, Am J Hum Genet. 2000;67(1):170-81). In genome-wide association studies, the data from the many SNPs genotyped can be used to assess population stratification without any further genotyping. Several methods have been suggested to control for population stratification, such as genomic control, structured association and EIGENSTRAT. These methods are discussed in Methodological quality of pharmacogenetic studies: issues of concern (Stat Med. 2008;27(30):6547-69), Softwares and methods for estimating genetic ancestry in human populations (Hum Genomics 2013;7(1):1) and Population Stratification in Genetic Association Studies (Curr Protoc Hum Genet. 2017;95:1.22.1-1.22.23).
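Of the methods listed above, genomic control is the simplest to illustrate. A minimal sketch, assuming 1-df association chi-square statistics are available for many variants, most of which are not truly associated with the outcome: the inflation factor lambda is the median observed statistic divided by the median of a chi-square distribution with 1 df (about 0.455), and each statistic is divided by lambda when lambda exceeds 1. The five statistics below are invented for illustration; in practice lambda is estimated from thousands of variants.

```python
import math
import statistics

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 degree of freedom


def genomic_control(chi2_stats: list[float]) -> tuple[float, list[float], list[float]]:
    """Estimate the genomic inflation factor and return adjusted statistics and p-values."""
    lam = statistics.median(chi2_stats) / CHI2_1DF_MEDIAN
    divisor = max(lam, 1.0)  # statistics are not inflated when lambda is below 1
    adjusted = [x / divisor for x in chi2_stats]
    p_values = [math.erfc(math.sqrt(x / 2)) for x in adjusted]  # 1-df upper tail
    return lam, adjusted, p_values


lam, adjusted, p_values = genomic_control([0.2, 0.5, 0.6, 0.9, 12.0])
print(round(lam, 2))  # 1.32: the median of 0.6 exceeds the null expectation of about 0.455
```

Genomic control applies one uniform correction to every variant; methods such as EIGENSTRAT instead adjust each association for estimated ancestry, which can be preferable when stratification affects variants unequally.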


The main advantage of exposure-only and case-only designs is the smaller sample size required, at the cost of not being able to study the main effects of drug exposure (case-only) or genetic variant (exposure-only) on the outcome. Furthermore, interaction can be assessed only on a multiplicative scale, whereas from a public health perspective additive interactions are very relevant. Until now, genome-wide association studies (GWAS) of gene-drug interactions have not been very rewarding because of the very large sample sizes needed for adequate power. However, this is likely to improve as genetic data are linked to longitudinal clinical data in large biobanks, as described in Drug Response Pharmacogenetics for 200,000 UK Biobank Participants (Biocomputing 2021;184-95). An important condition for case-only studies is that exposure is independent of the genetic variant, e.g. prescribers are not aware of a patient's genotype and do not take it into account, directly or indirectly (by observing clinical characteristics associated with the genetic variant). In the exposure-only design, the genetic variant should not be associated with the outcome in the absence of the drug, as may be the case for variants in genes coding for cytochrome P450 enzymes. When these conditions are fulfilled and the main interest is in the drug-gene interaction, these designs may be an efficient option. In practice, case-control and case-only analyses usually yield similar interaction estimates, as empirically assessed in Bias in the case-only design applied to studies of gene-environment and gene-gene interaction: a systematic review and meta-analysis (Int J Epidemiol. 2011;40(5):1329-41). The assumption of independence between genetic and exposure factors can be verified among controls before proceeding to the case-only analysis. Further development of the case-only design for assessing gene-environment interaction: evaluation of and adjustment for bias (Int J Epidemiol. 2004;33(5):1014-24) conducted sensitivity analyses to describe the circumstances in which controls can be used as a proxy for the source population when evaluating gene-environment independence. The gene-environment association in controls will reasonably reflect that in the source population if the baseline risk of disease is small (<1%) and the interaction and independent effects are moderate (i.e. risk ratio <2), or if the disease risk is low (e.g. <5%) in all strata of genotype and exposure. Furthermore, gene-environment non-independence can be adjusted for in multivariable models if it can be measured in controls. Further methodological considerations and assumptions of study designs in pharmacogenomics research are discussed in A critical appraisal of pharmacogenetic inference (Clin Genet. 2018;93(3):498-507).
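The logic of the case-only design, and the check of gene-exposure independence among controls, can be made concrete with stratified counts. A minimal sketch with hypothetical counts: genotype and exposure are each coded 0/1, the case-only interaction is the genotype-exposure odds ratio among cases, and the case-control interaction is the ratio of that odds ratio in cases to the same odds ratio in controls, so the two coincide when genotype and exposure are independent in controls. The relative excess risk due to interaction (RERI), a common additive-scale measure not named in the text above, is included to show what only a design with controls can provide.

```python
from collections.abc import Mapping

Stratum = tuple[int, int]  # (variant carrier 0/1, drug exposed 0/1)


def ge_odds_ratio(counts: Mapping[Stratum, int]) -> float:
    """Genotype-exposure odds ratio within one group (cases or controls)."""
    return (counts[(1, 1)] * counts[(0, 0)]) / (counts[(1, 0)] * counts[(0, 1)])


def case_only_interaction(cases: Mapping[Stratum, int]) -> float:
    """Multiplicative interaction estimate, valid under gene-exposure independence."""
    return ge_odds_ratio(cases)


def case_control_interaction(cases: Mapping[Stratum, int],
                             controls: Mapping[Stratum, int]) -> float:
    """Multiplicative interaction: ratio of the genotype-exposure odds ratios."""
    return ge_odds_ratio(cases) / ge_odds_ratio(controls)


def reri(cases: Mapping[Stratum, int], controls: Mapping[Stratum, int]) -> float:
    """Relative excess risk due to interaction (additive scale); requires controls.

    Odds ratios are taken relative to unexposed non-carriers.
    """
    def or_vs_reference(stratum: Stratum) -> float:
        return (cases[stratum] / controls[stratum]) / (cases[(0, 0)] / controls[(0, 0)])
    return or_vs_reference((1, 1)) - or_vs_reference((1, 0)) - or_vs_reference((0, 1)) + 1


# Hypothetical counts; genotype and exposure are independent among controls.
controls = {(0, 0): 400, (1, 0): 100, (0, 1): 400, (1, 1): 100}
cases = {(0, 0): 50, (1, 0): 15, (0, 1): 80, (1, 1): 60}

print(ge_odds_ratio(controls))                    # 1.0: independence holds in controls
print(case_only_interaction(cases))               # 2.5
print(case_control_interaction(cases, controls))  # 2.5
print(round(reri(cases, controls), 2))            # 3.0
```

If the controls showed a genotype-exposure odds ratio different from 1 (for example because prescribers knew the genotype), the case-only estimate would be biased by that factor, which is why the independence check among controls is recommended.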


Lastly, variation in the prevalence and effect of pharmacogenetic variants across ethnicities is an important consideration for study design and, ultimately, for the clinical utility, cost-effectiveness and implementation of testing. International research collaborations, as demonstrated in several studies (see HLA-B*5701 Screening for Hypersensitivity to Abacavir, N Engl J Med. 2008;358(6):568-79; and Effect of Genotype-Guided Oral P2Y12 Inhibitor Selection vs Conventional Clopidogrel Therapy on Ischemic Outcomes After Percutaneous Coronary Intervention: The TAILOR-PCI Randomized Clinical Trial. JAMA. 2020;324(8):761-71), encourage greater representation of different populations and broader applicability of pharmacogenomic study results. Diverse ethnic representation in study recruitment is important to detect the range of variant alleles of importance across ethnic groups and to reduce inequity in the clinical impact of pharmacogenomic testing once implemented.


14.3.4. Data collection


The same principles and approaches to data collection as for other pharmacoepidemiological studies can be followed (see Chapter 4 of this Guide on Approaches to Data Collection). An efficient approach to data collection for pharmacogenetic studies is to combine secondary use of electronic health records with primary data collection (e.g. biological samples to extract DNA).


Examples are given in SLCO1B1 genetic variant associated with statin-induced myopathy: a proof-of-concept study using the clinical practice research datalink (Clin Pharmacol Ther. 2013;94(6):695-701), Diuretic therapy, the alpha-adducin gene variant, and the risk of myocardial infarction or stroke in persons with treated hypertension (JAMA. 2002;287(13):1680-9) and Interaction between the Gly460Trp alpha-adducin gene variant and diuretics on the risk of myocardial infarction (J Hypertens. 2009;27(1):61-8). Another approach to enriching electronic health records with biological samples is record linkage to biobanks, as illustrated in Genetic variation in the renin-angiotensin system modifies the beneficial effects of ACE inhibitors on the risk of diabetes mellitus among hypertensives (J Hum Hypertens. 2008;22(11):774-80). A third approach is to use active surveillance methods to fully characterise drug effects so that a rigorous phenotype can be developed before genetic analysis. This approach was followed in Adverse drug reaction active surveillance: developing a national network in Canada's children's hospitals (Pharmacoepidemiol Drug Saf. 2009;18(8):713-21) and EUDRAGENE: European collaboration to establish a case-control DNA collection for studying the genetic basis of adverse drug reactions (Pharmacogenomics 2006;7(4):633-8).


14.3.5. Data analysis


The focus of data analysis should be on the measure of effect modification (see Chapter 4.2.4 of this Guide on Effect Modification). Attention should be given to whether the mode of inheritance (e.g. dominant, recessive or additive) is defined a priori based on prior knowledge from functional studies. However, investigators are usually naïve regarding the underlying mode of inheritance. A solution might be to undertake several analyses, each under a different assumption, though this approach raises the problem of multiple testing (see Methodological quality of pharmacogenetic studies: issues of concern, Stat Med. 2008;27(30):6547-69). Multiple testing, with its increased risk of type I error, is a general problem in pharmacogenetic studies evaluating multiple SNPs, multiple exposures and multiple interactions. The most common approach to correcting for multiple testing is the Bonferroni correction. This correction may be too conservative and risks producing many pharmacogenetic studies with false-negative (null) results. Other, less conservative approaches to adjusting for multiple testing include permutation testing and false discovery rate (FDR) control. The FDR approach described in Statistical significance for genome-wide studies (Proc Natl Acad Sci USA. 2003;100(16):9440-5) estimates the expected proportion of false positives among associations declared significant, expressed for each test as a q-value.
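The coding of genotypes under different modes of inheritance, and two of the multiple-testing adjustments mentioned above, can be sketched as follows. Assumptions to note: the genotype is the number of copies (0, 1 or 2) of the variant allele; the FDR adjustment shown is the Benjamini-Hochberg step-up procedure, which corresponds to Storey's q-value with the proportion of true null hypotheses set to 1 and is therefore at least as conservative; the p-values are invented.

```python
def code_genotype(n_variant_alleles: int, model: str) -> int:
    """Recode a genotype (0, 1 or 2 variant alleles) under a mode of inheritance."""
    if n_variant_alleles not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1 or 2 copies of the variant allele")
    if model == "additive":
        return n_variant_alleles            # effect per additional copy
    if model == "dominant":
        return int(n_variant_alleles >= 1)  # one copy is enough
    if model == "recessive":
        return int(n_variant_alleles == 2)  # two copies are required
    raise ValueError(f"unknown model: {model}")


def bonferroni(p_values: list[float]) -> list[float]:
    """Bonferroni-adjusted p-values (family-wise error rate control)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (FDR control), in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Step up from the largest p-value, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print([code_genotype(g, "recessive") for g in (0, 1, 2)])  # [0, 0, 1]
p = [0.01, 0.04, 0.03, 0.005]
print(bonferroni(p))
print(benjamini_hochberg(p))
```

Testing each SNP under three inheritance models triples the number of tests: for 1,000 SNPs, a Bonferroni threshold of 0.05 becomes 0.05/3,000, about 1.7 × 10^-5 per test, which illustrates why the correction is considered conservative.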


Alternative innovative methods are under development and may be used in the future, such as Mendelian Randomization (see Mendelian Randomization: New Applications in the Coming Age of Hypothesis-Free Causality, Annu Rev Genomics Hum Genet. 2015;16:327-50), systems biology, Bayesian approaches, or data mining (see Methodological and statistical issues in pharmacogenomics, J Pharm Pharmacol. 2010;62(2):161-6).

Important complementary approaches include individual patient data meta-analyses and/or replication studies to reduce the risk of false-positive findings.
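Pooling a discovery estimate with replication estimates can be sketched with a fixed-effect inverse-variance meta-analysis of log odds ratios, which is also the second stage of a two-stage individual patient data meta-analysis once each study's estimate has been computed from its own data. Assumptions: each study supplies a log odds ratio and its standard error (or a 95% confidence interval from which the standard error is back-calculated); Cochran's Q is reported as a simple heterogeneity check; the numbers are hypothetical. A random-effects model may be preferable if heterogeneity between populations is expected.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def se_from_ci(lower: float, upper: float) -> float:
    """Standard error of a log odds ratio back-calculated from its 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_975)


def fixed_effect_meta(estimates: list[tuple[float, float]]) -> dict[str, float]:
    """Inverse-variance pooling of (log OR, SE) pairs with Cochran's Q."""
    weights = [1 / se ** 2 for _, se in estimates]
    pooled = sum(w * y for w, (y, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = 1 / math.sqrt(sum(weights))
    q = sum(w * (y - pooled) ** 2 for w, (y, _) in zip(weights, estimates))
    return {
        "pooled_or": math.exp(pooled),
        "ci_lower": math.exp(pooled - Z_975 * pooled_se),
        "ci_upper": math.exp(pooled + Z_975 * pooled_se),
        "cochran_q": q,  # compare with a chi-square distribution on k - 1 df
    }


# Hypothetical discovery cohort followed by two replication cohorts.
studies = [(math.log(2.0), 0.20), (math.log(1.6), 0.25), (math.log(1.3), 0.30)]
print(fixed_effect_meta(studies))
```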


An important step in the analysis of genome-wide association study data is the conduct of rigorous quality control procedures before the final association analyses. This becomes particularly important when phenotypic data were originally collected for a different purpose (“secondary use of data”). Relevant guidelines include Guideline for data analysis of genomewide association studies (Cancer Genomics Proteomics 2007;4(1):27-34) and Statistical Optimization of Pharmacogenomics Association Studies: Key Considerations from Study Design to Analysis (Curr Pharmacogenomics Person Med. 2011;9(1):41-66).
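Per-variant quality control typically includes checks such as call rate, minor allele frequency and deviation from Hardy-Weinberg equilibrium (usually assessed in controls), alongside per-sample checks not shown here. A minimal sketch for one biallelic variant; the thresholds are illustrative assumptions rather than values recommended by the guidelines cited above, and exact Hardy-Weinberg tests are often preferred over the 1-df chi-square approximation used here when variants are rare.

```python
import math


def hwe_p_value(n_aa: int, n_ab: int, n_bb: int) -> float:
    """1-df chi-square test of Hardy-Weinberg equilibrium for a biallelic variant."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    if p == 0 or q == 0:
        return 1.0  # monomorphic variant: nothing to test
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return math.erfc(math.sqrt(chi2 / 2))


def variant_passes_qc(n_aa: int, n_ab: int, n_bb: int, n_missing: int,
                      min_call_rate: float = 0.98, min_maf: float = 0.01,
                      min_hwe_p: float = 1e-6) -> bool:
    """Apply illustrative call-rate, minor allele frequency and HWE filters."""
    n_called = n_aa + n_ab + n_bb
    call_rate = n_called / (n_called + n_missing)
    p = (2 * n_aa + n_ab) / (2 * n_called)
    maf = min(p, 1 - p)
    return (call_rate >= min_call_rate
            and maf >= min_maf
            and hwe_p_value(n_aa, n_ab, n_bb) >= min_hwe_p)


print(variant_passes_qc(25, 50, 25, n_missing=1))  # True: in HWE, call rate 0.99
print(variant_passes_qc(40, 20, 40, n_missing=0))  # False: strong excess of homozygotes
print(variant_passes_qc(198, 2, 0, n_missing=0))   # False: minor allele frequency 0.005
```

A marked departure from Hardy-Weinberg proportions among controls is more often a sign of genotyping error than of biology, which is why it is commonly used as a filter before association testing.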


14.3.6. Reporting


The STROPS guideline (STrengthening the Reporting Of Pharmacogenetic Studies: Development of the STROPS guideline, PLOS Medicine 2020;17(9):e1003344) should be followed for reporting the findings of pharmacogenetic studies. Essential Characteristics of Pharmacogenomics Study Publications (Clin Pharmacol Ther. 2019;105(1):86-91) also provides recommendations to ensure that all relevant information is reported in pharmacogenetic studies. As pharmacogenetic information is increasingly included in drug labels, as described in Pharmacogenomic information in drug labels: European Medicines Agency perspective (Pharmacogenomics J. 2015;15(3):201-10), it is essential to ensure consistency in the reporting of pharmacogenetic studies. Additional efforts by regulatory agencies, international organisations or boards to standardise the reporting and use of pharmacogenetic studies are discussed in the next section.


14.3.7. Clinical Implementation and Resources


An important step towards using genotype information to guide pharmacotherapy is the development of clinical practice guidelines. An important pharmacogenomics knowledge resource is PharmGKB, which curates and disseminates clinical information about the impact of human genetic variation on drug response, including genotype-phenotype relationships, potentially clinically actionable gene-drug associations, clinical guidelines, and drug labels. The development and publication of clinical practice guidelines for pharmacogenomics has been driven by international initiatives including the Clinical Pharmacogenetics Implementation Consortium, the European Medicines Agency Pharmacogenomics Working Party, the Dutch Pharmacogenetics Working Group (see Pharmacogenetics: From Bench to Byte - An Update of Guidelines, Clin Pharmacol Ther. 2011;89(5):662-73; Use of Pharmacogenetic Drugs by the Dutch Population, Front Genet. 2019;10:567) and the Canadian Pharmacogenomics Network for Drug Safety. Evidence of the clinical utility and cost-effectiveness of pharmacogenomic tests is important to support the translation of clinical guidelines into policies for implementation across health services, for example pharmacogenomic testing for DPYD polymorphisms before fluoropyrimidine therapy (see EMA recommendations on DPD testing prior to treatment with fluorouracil, capecitabine, tegafur and flucytosine).


The clinical implementation of pharmacogenomic testing requires consideration of complex clinical pathways and the multifactorial nature of drug response. Translational research and clinical utility studies can identify issues arising from the translation of pharmacokinetic or retrospective studies into real-world implementation of pharmacogenomic testing (see Carbamazepine-induced toxic effects and HLA-B*1502 screening in Taiwan, N Engl J Med. 2011;364(12):1126-33). Careful consideration is required in the interpretation of gene variants which cause a spectrum of effects. Binary interpretation or thresholds for phenotypic categorisation within clinical guidelines may result in different treatment recommendations for patients who would ultimately have the same drug response. In addition, the safety, efficacy and cost-effectiveness of alternative treatments are important factors in assessing the overall health benefit to patients from pharmacogenomic testing.


Within clinical practice, the choice of technology for testing must be mapped to the clinical pathway to ensure that test results are available at an appropriate time to guide decision-making. Other key factors for clinical implementation include workforce education in pharmacogenomics, multidisciplinary pathway design, digital integration and tools to aid shared decision making (see Attitudes of clinicians following large-scale pharmacogenomics implementation, Pharmacogenomics J. 2016;16(4):393-8; Pharmacogenomics Implementation at the National Institutes of Health Clinical Center, J Clin Pharmacol. 2017;57 (Suppl 10):S67-S77; The implementation of pharmacogenomics into UK general practice: a qualitative study exploring barriers, challenges and opportunities, J Community Genet. 2020;11(3):269-277; Implementation of a multidisciplinary pharmacogenomics clinic in a community health system, Am J Health Syst Pharm. 2016;73(23):1956-66).


Large-scale international population studies of clinical utility in pharmacogenomics will contribute to understanding these real-world implementation factors, including studies underway by the U-PGx consortium (see Implementing Pharmacogenomics in Europe: Design and Implementation Strategy of the Ubiquitous Pharmacogenomics Consortium, Clin Pharmacol Ther. 2017;101(3):341-58) and the IGNITE network (see The IGNITE Pharmacogenetics Working Group: An Opportunity for Building Evidence with Pharmacogenetic Implementation in a Real-World Setting, Clin Transl Sci. 2017;10(3):143-6).

