Replication of LDL GWAS hits in PROSPER/PHASE as validation for future (pharmaco)genetic analyses

Background: The PHArmacogenetic study of Statins in the Elderly at risk (PHASE) is a genome-wide association study in the PROspective Study of Pravastatin in the Elderly at risk for vascular disease (PROSPER) that investigates the genetic variation responsible for the individual variation in drug response to pravastatin. Statins lower LDL cholesterol by about 30% on average, but not in all subjects. Moreover, clinical response is highly variable and adverse effects occur in a minority of patients. In this report we first describe the rationale of the PROSPER/PHASE project and second show that the PROSPER/PHASE study can be used to study pharmacogenetics in the elderly.

Methods: The genome-wide association study (GWAS) was conducted using the Illumina 660K-Quad beadchips following the manufacturer's instructions. After stringent quality control, 557,192 SNPs in 5,244 subjects were available for analysis. To maximize the availability of genetic data and coverage of the genome, imputation up to 2.5 million autosomal CEPH HapMap SNPs was performed with MACH imputation software. The association with LDL cholesterol was assessed with an additive linear regression model in PROBABEL software, adjusted for age, sex, and country of origin to account for population stratification.

Results: Forty-two SNPs reached the genome-wide significance threshold of p = 5.0e-08 in 5 genomic loci (APOE/APOC1; LDLR; FADS2/FEN1; HMGCR; PSRC1/CELSR5). The top SNP (rs445925, chromosome 19), with a p-value of 2.8e-30, is located within the APOC1 gene and near the APOE gene. The second top SNP (rs6511720, chromosome 19), with a p-value of 5.22e-15, is located within the LDLR gene. All 5 genomic loci were previously associated with LDL cholesterol levels; no novel loci were identified. Replication in WOSCOPS and CARE confirmed our results.

Conclusion: With the GWAS in the PROSPER/PHASE study we confirm the previously found genetic associations with LDL cholesterol levels. With this proof-of-principle study we show that the PROSPER/PHASE study can be used to investigate genetic associations in a similar way to population-based studies. The next step of the PROSPER/PHASE study is to identify the genetic variation responsible for the variation in LDL cholesterol lowering in response to statin treatment in collaboration with other large trials.


Background
Cardiovascular disease is the leading cause of death in old age in industrialized countries. Advancing age is one of the most important risk factors for cardiovascular disease [1]. With the rising number of elderly people in our society, cardiovascular disease has a major impact on healthcare [2]. The prevention of cardiovascular disease is critically dependent on lipid-lowering therapy, including the 3-hydroxy-3-methylglutaryl coenzyme A (HMG-CoA) reductase inhibitors (statins). Statins are the most prescribed class of drugs worldwide, and therapy is generally associated with a reduction of cardiovascular events by 20-30%. However, clinical response is highly variable and adverse effects occur in a minority of patients [3]. Recent research provides evidence that genetic variation contributes importantly to this variable drug response [4].
Pharmacogenomics focuses on unraveling the genetic determinants of such variable drug responses, both in intended, beneficial effects and in unintended, adverse effects [5]. We therefore present the PHArmacogenetic study of Statins in the Elderly at risk (PHASE), a genome-wide association study (GWAS) in the PROspective Study of Pravastatin in the Elderly at Risk for vascular disease (PROSPER) [6], which investigates the genetic variation responsible for the individual variation in drug response. PHASE is funded by the European Union's Seventh Framework Programme. To validate the GWAS performed in the PHASE study, we executed a proof-of-principle study to investigate the underlying genetic variation in LDL cholesterol levels.
Recent GWA studies have identified several new loci that influence circulating levels of blood lipids, with around 95 loci showing statistical associations with circulating total cholesterol, HDL cholesterol, LDL cholesterol, and triglycerides [7]. These GWA studies were executed in population-based studies with various age groups; however, the elderly (age > 75 years) are rarely represented in these studies. With this proof-of-principle study we provide a testing frame to show that the PROSPER/PHASE study has sufficient statistical power to find genome-wide statistically significant associations for quantitative traits such as LDL cholesterol in an elderly population. We replicated our findings from the PROSPER/PHASE study in two independent cohorts to validate that our results contain no false-positive findings.

Study population
PROSPER was an investigator-driven, prospective, multinational, randomized, placebo-controlled trial to assess whether treatment with pravastatin diminishes the risk of major vascular events in the elderly [6;8]. Between December 1997 and May 1999, we screened and enrolled subjects in Scotland, Ireland, and the Netherlands. Men and women aged 70-82 years were recruited if they had pre-existing vascular disease or were at increased risk of such disease because of smoking, hypertension, or diabetes. A total of 5804 subjects, of whom more than 50% were female, were randomly assigned to pravastatin or placebo. Various clinical laboratory measurements, including inflammatory markers (CRP and various cytokines) and other biochemical substrates (e.g. glucose, leptin), were carried out at baseline and during follow-up. The protocol of the PROSPER study meets the criteria of the Declaration of Helsinki and was approved by the Medical Ethics Committees of each participating institution. Written informed consent was obtained from all participating subjects.

LDL cholesterol
Plasma lipids and lipoproteins were measured twice during the screening phase, i.e. at the beginning and at the end of the single-blind, placebo "run-in" phase, according to the standardized Lipid Research Clinics protocol. Baseline LDL cholesterol levels were taken as the average of these 2 determinations prior to randomization to statin treatment. Total cholesterol (TC), HDL cholesterol, and triglycerides were assessed after an overnight fast; LDL cholesterol was calculated by the Friedewald formula, as previously described [8].
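The Friedewald calculation mentioned above can be illustrated with a minimal sketch. The function name, the unit handling, and the example values are assumptions made for illustration; the text does not state which units were used, and the example values are not PROSPER data. The formula is generally not considered reliable at high triglyceride levels.

```python
def friedewald_ldl(total_chol, hdl, triglycerides, units="mmol/L"):
    """Estimate LDL cholesterol with the Friedewald formula.

    LDL = TC - HDL - TG / k, where k = 2.2 for mmol/L and k = 5 for mg/dL.
    """
    divisors = {"mmol/L": 2.2, "mg/dL": 5.0}
    if units not in divisors:
        raise ValueError(f"unsupported units: {units}")
    return total_chol - hdl - triglycerides / divisors[units]


# Illustrative values only (hypothetical subject, not study data).
ldl_mmol = friedewald_ldl(total_chol=5.7, hdl=1.3, triglycerides=1.5)
ldl_mgdl = friedewald_ldl(200.0, 50.0, 150.0, units="mg/dL")
print(round(ldl_mmol, 2), ldl_mgdl)
```

In the study, the baseline value would then be the average of the two pre-randomization determinations.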

Genotyping
Genotyping was conducted using the Illumina 660K-Quad beadchips following the manufacturer's instructions. These beadchips contain 657,366 single nucleotide polymorphism (SNP) and copy number variant (CNV) probes. After genotyping, samples and genetic markers were subjected to a stringent quality control protocol. Of the 5763 samples with DNA available that underwent genotyping, 519 samples (9%) were excluded during quality control (Figure 1). Excluded were 18 duplicated samples, 219 samples with a call rate < 97.5%, 11 samples with excess heterozygosity, 40 samples of non-Caucasian origin, 170 samples with familial relationships (IBD > 0.35), and 61 samples with a gender mismatch. Of the 657,366 probes on the beadchips, 95,876 probes were filtered out based on CNV intensity. Moreover, 4,298 SNPs with a call rate < 95% were excluded, leaving 557,192 SNPs for analysis. To maximize the availability of genetic data and coverage of the genome, imputation up to 2.5 million autosomal CEPH HapMap SNPs was performed with MACH imputation software based on the HapMap build II release 23. To assess the accuracy of the imputed genotypes, we compared the imputation output with SNPs that had been previously genotyped on other platforms.
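The quality-control counts reported above can be checked with simple arithmetic. The sketch below only re-adds the numbers given in this section; the variable names are illustrative.

```python
# Sample-level exclusions as reported in the Genotyping section.
sample_exclusions = {
    "duplicated": 18,
    "call rate < 97.5%": 219,
    "excess heterozygosity": 11,
    "non-Caucasian origin": 40,
    "familial relationship (IBD > 0.35)": 170,
    "gender mismatch": 61,
}
genotyped_samples = 5763
excluded_samples = sum(sample_exclusions.values())
remaining_samples = genotyped_samples - excluded_samples
excluded_fraction = excluded_samples / genotyped_samples

# Probe-level filters as reported in the same section.
total_probes = 657_366
cnv_filtered = 95_876
low_call_rate_snps = 4_298
remaining_snps = total_probes - cnv_filtered - low_call_rate_snps

print(excluded_samples, remaining_samples, remaining_snps)
```

The totals agree with the 519 excluded samples (about 9%), the 5,244 subjects, and the 557,192 SNPs reported for analysis.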

Statistical Analysis
Genome-wide association analysis was performed with PROBABEL software, which is specialized in genetic association analysis with imputed data and takes the probability of the genotype into account (http://www.genabel.org/). When analyzing imputed genotypes, the observed allele count is replaced by the estimated dosage from the imputation. For the continuous trait, baseline LDL cholesterol levels, an additive linear regression model was used to obtain estimates and standard errors. The model was adjusted for sex, age, and country to correct for the within-study population structure. Standard errors for the regression estimates were calculated with model-robust methods.
Analyzing 2.5 million SNPs at once poses a multiple-testing problem. After Bonferroni correction, the threshold for genome-wide significance was set at 5.0e-08.
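As a rough illustration of how an allele dosage replaces an observed allele count in an additive model, the sketch below computes the expected dosage from genotype probabilities and an unadjusted least-squares slope. It is not the PROBABEL implementation: the function names and data are hypothetical, and the covariates (sex, age, country) and the model-robust standard errors used in the study are omitted for brevity.

```python
def expected_dosage(p_ab, p_bb):
    """Expected count of the coded allele B: 1 * P(AB) + 2 * P(BB)."""
    return p_ab + 2.0 * p_bb


def additive_slope(dosages, trait):
    """Ordinary least-squares slope of the trait on allele dosage."""
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(trait) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, trait))
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    return sxy / sxx


# Hypothetical imputation output: (P(AB), P(BB)) for four subjects.
genotype_probs = [(0.0, 0.0), (0.9, 0.05), (0.1, 0.9), (0.5, 0.0)]
dosages = [expected_dosage(p_ab, p_bb) for p_ab, p_bb in genotype_probs]

# Hypothetical LDL values (mmol/L), constructed as 4.0 - 0.3 * dosage,
# so the recovered per-allele effect should be close to -0.3.
ldl = [4.0 - 0.3 * d for d in dosages]
beta = additive_slope(dosages, ldl)
print(dosages, beta)
```

Because the dosage is a continuous value between 0 and 2, uncertain genotypes contribute proportionally to the estimate rather than being forced to a hard call.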
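The arithmetic behind a Bonferroni threshold can be sketched as follows. The reading that 5.0e-08 corresponds to a nominal alpha of 0.05 divided by one million tests is an assumption, as the text does not state the number of tests used; dividing by all 2.5 million imputed SNPs would give a stricter value.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


alpha = 0.05  # assumed nominal significance level

# Assumption: one million tests reproduces the threshold used here.
threshold_one_million = bonferroni_threshold(alpha, 1_000_000)

# For comparison: correcting for all 2.5 million imputed SNPs.
threshold_all_snps = bonferroni_threshold(alpha, 2_500_000)

print(threshold_one_million, threshold_all_snps)
```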

Replication
Associations reaching the genome-wide significance threshold of 5.0e-08 were tested for replication in two independent cohorts, the West of Scotland Coronary Prevention Study (WOSCOPS) [9] and the Cholesterol and Recurrent Events (CARE) trial [10]. The WOSCOPS study was a double-blind, randomized, placebo-controlled clinical trial in which 6595 men (age range 45-64 years) with hypercholesterolemia and no history of myocardial infarction were treated with 40 mg pravastatin (N = 3302) or placebo (N = 3293). GWAS data and baseline LDL cholesterol levels were available for 431 subjects. The CARE study was a double-blind, randomized, placebo-controlled clinical trial in which 4159 patients (age range 21-75 years) were treated with 40 mg pravastatin (N = 2081) or placebo (N = 2078). GWAS data and baseline LDL cholesterol levels were available for 751 subjects. The significance level for the replication SNPs was set at a p-value < 0.05.

Table 1 shows the baseline characteristics of the subjects participating in the PROSPER and the PROSPER/PHASE study. The genotyped subjects in the PROSPER/PHASE study are representative of the total study population of the PROSPER study, since no major discrepancies exist between the two study sets. The mean age of all subjects at study entry was 75.3 years and about 50% of the participants were female.

Results
Figure 2 shows the QQ-plot of the genome-wide association study with baseline LDL cholesterol levels within the PROSPER/PHASE study. The plot indicates that no substantial genomic inflation occurred in this analysis (lambda = 1.077) and that population stratification is sufficiently controlled for. Figure 3 depicts the results of the genome-wide association study with baseline LDL cholesterol levels within the PROSPER/PHASE study in a Manhattan plot. Forty-two SNPs in five genomic loci, APOE/APOC1, LDLR, FADS2/FEN1, HMGCR, and PSRC1/CELSR5, reached the genome-wide significance threshold of 5.0e-08. Table 2 summarizes the five genomic loci and their corresponding SNPs. The top SNP (rs445925, Chr. 19), with a p-value of 2.8e-30, is located within the APOC1 gene and near the APOE gene. Sixteen other SNPs in the same genomic region were also found to be associated with LDL cholesterol levels. The second top SNP (rs6511720, Chr. 19), with a p-value of 5.22e-15, is located within the LDLR gene. The three other genomic regions included the HMGCR (Chr. 5), FADS2/FEN1 (Chr. 11), and PSRC1/CELSR5 (Chr. 1) genes. All 5 genomic loci were previously found to be associated with LDL cholesterol levels, and no novel loci were identified.
We tested the genome-wide significant associations for replication in two independent cohorts, the WOSCOPS study and the CARE trial (Table 3). For each of the five genomic loci that were significantly associated with baseline LDL cholesterol levels, we selected the top SNP for replication in both replication cohorts. If the SNP was not genotyped in a replication cohort, we chose a proxy in high linkage disequilibrium (r² > 0.5) with that SNP. In both studies, these SNPs were tested for association with baseline LDL cholesterol levels before randomisation to statin treatment. Three out of the five loci (APOE/APOC1; HMGCR; PSRC1/CELSR5) replicated in at least one of the two replication cohorts (p < 0.05). The two other loci (LDLR and FADS2/FEN1) showed trends similar to those in the discovery cohort, although they did not reach statistical significance (Table 3).
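The genomic inflation factor (lambda) reported with the QQ-plot is commonly computed as the median of the observed 1-df chi-square statistics divided by the expected median under the null hypothesis. The sketch below assumes that definition, since the text does not describe how lambda was obtained, and uses simulated z-scores rather than study results.

```python
import random
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (about 0.455),
# obtained as the squared 75th percentile of the standard normal.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(z_scores):
    """Lambda = median observed chi-square / expected median under the null."""
    return median(z * z for z in z_scores) / CHI2_1DF_MEDIAN


# Simulated null test statistics (hypothetical, not PROSPER/PHASE results).
rng = random.Random(0)
null_z = [rng.gauss(0.0, 1.0) for _ in range(20_000)]

# Scaling every chi-square statistic by 1.077 mimics uniform inflation.
inflated_z = [z * 1.077 ** 0.5 for z in null_z]

lambda_null = genomic_inflation(null_z)
lambda_inflated = genomic_inflation(inflated_z)
print(lambda_null, lambda_inflated)
```

A value near 1 indicates that the bulk of the test statistics follows the null distribution; values well above 1 suggest residual stratification or other systematic bias.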

Discussion
With this first proof-of-principle study we show that the PROSPER/PHASE GWAS can confirm previously found genetic associations with LDL cholesterol levels. This indicates that the PROSPER/PHASE study is likely to be capable of detecting genomic regions responsible for the variation in various other quantitative traits. With 5,244 genotyped samples in the PROSPER/PHASE study and access to various replication studies, the PROSPER/PHASE study can provide a good testing frame to identify the genetic variation responsible for the variation in LDL cholesterol lowering in response to statin treatment.
The main locus responsible for the person-to-person variation in LDL cholesterol levels is the chromosome 19 locus, which contains the APOE, APOC1, and LDLR genes. Other important loci included the HMGCR locus on chromosome 5, the FADS2/FEN1 locus on chromosome 11, and the PSRC1/CELSR5 locus on chromosome 1. The five genomic loci that were associated with variation in LDL cholesterol levels in the PHASE GWAS were all genomic regions that were previously associated with LDL cholesterol levels.

Conclusions
With this proof-of-principle study we show that the PROSPER/PHASE study can be used to investigate genetic associations in a similar way to population-based studies. Moreover, we can also assume from these results that the PROSPER/PHASE study is likely to have sufficient power to detect genome-wide significant hits with large effects for other quantitative traits. The next step of the PROSPER/PHASE study is to identify the genetic variation responsible for the variation in LDL-cholesterol lowering in response to statin treatment.

* A proxy for this SNP was used in both replication cohorts: for 1 the proxy SNP was rs7715806 with an r² of 0.93, for 2 the proxy SNP was rs174545 with an r² of 0.90, and for 3 the proxy SNP was rs660240 with an r² of 0.88.
Abbreviations: SNP, Single Nucleotide Polymorphism; Chr, Chromosome.