A genome-wide search for common SNP x SNP interactions on the risk of venous thrombosis

Background: Venous Thrombosis (VT) is a common multifactorial disease with an estimated heritability between 35% and 60%. The genetic polymorphisms identified so far explain only ~5% of the genetic variance of the disease. This study aimed to investigate whether pairwise interactions between common single nucleotide polymorphisms (SNPs) exist and modulate the risk of VT. Methods: A genome-wide SNP x SNP interaction analysis on VT risk was conducted in a French case–control study, and the most significant findings were tested for replication in a second independent French case–control sample. The results obtained in the two studies, totaling 1,953 cases and 2,338 healthy subjects, were combined in a meta-analysis. Results: The smallest observed p-value for interaction was p = 6.00 × 10⁻¹¹, but it did not pass the Bonferroni significance threshold of 1.69 × 10⁻¹², which corrects for the 2.96 × 10¹⁰ interactions investigated. Among the 37 suggestive pairwise interactions with a p-value less than 10⁻⁸, one involved two SNPs, rs9804128 (IGSF21 locus) and rs4784379 (IRX3 locus), that were further shown to have significant interactive effects (p = 4.83 × 10⁻⁵) on the variability of plasma Factor VIII levels, a quantitative biomarker of VT risk, in a sample of 1,091 VT patients. Conclusion: This study, the first genome-wide SNP interaction analysis conducted so far on VT risk, suggests that common SNPs are unlikely to exert strong interactive effects on the risk of disease.


Background
Venous Thrombosis (VT) is a common complex disease affecting ~0.2% of individuals per year. VT includes deep vein thrombosis and pulmonary embolism, the latter being characterized by a one-year mortality rate of 10%, excluding patients with malignancies [1]. As a complex trait, VT is considered to result from the interplay between environmental and genetic factors, which may interact with each other to modulate VT risk [2,3]. The Genome Wide Association Studies (GWAS) strategy raised great hopes of identifying novel susceptibility loci for human diseases, and some true successes were obtained in the field of VT genetics. Novel genes recently identified as harboring common susceptibility alleles (i.e. with allele frequency > 0.05) for VT include GP6, HIVEP1, KNG1, STAB2, STXBP5 and VWF (reviewed in [4]). However, none of the identified risk alleles demonstrated genetic effects stronger than those of the established VT-associated genes known before the GWAS era: ABO, F2, F5 and FGG [5]. As for most multifactorial diseases, the risk alleles for VT identified so far explain only a small proportion of the familial risk of disease [6]. Alternative strategies are needed to identify the many sources that could contribute to the unexplained heritability, and these include gene-gene and gene-environment interactions, deep sequencing, transcriptomic analyses and epigenomics [7-10].
In this work, we were interested in assessing whether interactions between common polymorphisms could contribute to VT risk. To our knowledge, studies that have investigated this hypothesis were mainly dedicated to known candidate genes [11,12], and no attempt has been made to address it without any a priori hypothesis. We therefore take advantage of the large amount of genetic information collected through two French GWAS on VT [6,13] to conduct the first genome-wide search for SNP x SNP interactions with respect to VT risk.

Methods
This work was based on two French GWAS on VT, the Early-Onset Venous Thrombosis (EOVT) and the Marseille Thrombosis Association (MARTHA) studies. These two studies have already been extensively described in [5,6,14] for EOVT and in [6,15-17] for MARTHA.

Ethical approval
Each individual study was approved by its institutional ethics committee and informed written consent was obtained in accordance with the Declaration of Helsinki. Ethics approval was obtained from the "Departement santé de la direction générale de la recherche et de l'innovation du ministère" (Projects DC: 2008-880 & 09.576) and from the institutional ethics committees of the Kremlin-Bicetre Hospital.

Studied populations and phenotype measurements
Briefly, in both studies, cases were VT patients with a documented history of VT and free of well-known strong genetic risk factors, including antithrombin (AT), protein C (PC) or protein S (PS) deficiency, homozygosity for FV Leiden or F2 20210A mutations, and lupus anticoagulant. In EOVT, patients were selected for having experienced idiopathic VT before the age of 50. Controls were French individuals selected from two healthy populations, SUVIMAX [18] and the Three City Study [19], for EOVT and MARTHA, respectively. The EOVT case-control study included 411 patients and 1,228 healthy subjects, while MARTHA was composed of 1,542 patients and 1,110 healthy subjects, all individuals being of European origin and the majority of French descent. A summary of the population characteristics is provided in Additional file 1.
Several key quantitative biomarkers of VT risk have been measured in MARTHA patients. The corresponding measurements have been described in detail in [15] for AT, PC, PS and the Agkistrodon contortrix venom (ACV) test that explores the PC pathway, in [17] for Factor VIII (FVIII) and von Willebrand Factor (VWF), and in [16] for Activated Partial Thromboplastin Time (aPTT) and Prothrombin Time (PT).

Genotyping
Individuals participating in the EOVT study were genotyped for 317,139 SNPs using the Illumina Sentrix HumanHap300 Beadchip. Applying the quality control criteria described in [5] led to the final selection of 291,872 autosomal SNPs for analysis. As detailed in [6], individuals participating in the MARTHA GWAS were typed with the Illumina Human 610-Quad and Human660W-Quad Beadchips. After quality control, 481,002 autosomal SNPs remained for analysis.

Statistical analysis
Our search for genome-wide interactions was conducted in two steps. A first screening for pairwise SNP interactions was carried out in the EOVT study. The first part of this discovery screening consisted in reducing redundancy between SNPs by keeping only one SNP out of all SNPs in strong pairwise linkage disequilibrium (r² > 0.90) within a window of 50 kb. Pairwise SNP interactions were then tested by logistic regression analysis in which both SNPs were coded under an additive model (0, 1 and 2 according to the number of rare alleles) and an interaction term was added to the model. For this, we used the PLINK software [20]. All interactions significant at p < 10⁻⁴ were further assessed in a second step in the larger MARTHA study. When SNPs were not available in the latter sample, the best available proxy in terms of r², according to the SNAP database [21], was used. The same logistic regression model was applied in the MARTHA study. Results obtained in the two GWAS were then meta-analyzed through a fixed-effect model relying on inverse-variance weighting as implemented in the METAL software (http://www.sph.umich.edu/csg/abecasis/metal). Homogeneity of associations across the two GWAS was tested using the Mantel-Haenszel method [22].
The most significant interactions were then further assessed in relation to quantitative biomarkers of VT risk in MARTHA patients. For this, standard linear regression analyses were conducted with the same additive allele coding as for the binary trait analysis. Analyses were adjusted for age, sex and ABO blood group. For AT, PC, PS and ACV, individuals under anticoagulant therapy were excluded. The THESIAS software [23] was used to illustrate the detected pairwise SNP interactions.
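The inverse-variance weighted fixed-effect pooling mentioned above can be sketched as follows. This is a minimal illustration, not the METAL implementation: the function name `fixed_effect_meta` and the input values are hypothetical, and the p-value relies on a normal approximation for the pooled estimate.

```python
import math

def fixed_effect_meta(betas, ses):
    """Pool per-study estimates with inverse-variance weights (fixed-effect model)."""
    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    # Two-sided p-value under a normal approximation (assumption of this sketch).
    z = pooled_beta / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, p_value

# Hypothetical interaction log-odds ratios and standard errors for two studies
# (illustrative values only, not taken from EOVT or MARTHA).
beta, se, p = fixed_effect_meta([0.60, 0.40], [0.20, 0.10])
print(f"pooled beta = {beta:.3f}, SE = {se:.4f}, p = {p:.2e}")
```

The more precise study (smaller standard error) dominates the pooled estimate, which is why the larger sample carries most of the weight in such a two-study meta-analysis.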
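The two regression models described in this section can be written out as below. The symbols are notational assumptions introduced here for illustration: G_1 and G_2 are the rare-allele counts (0, 1 or 2) at the two SNPs, Y is a quantitative biomarker, and Z collects the adjustment covariates (age, sex and ABO blood group).

```latex
% Logistic model for VT status (binary trait)
\operatorname{logit} P(\mathrm{VT} = 1)
  = \beta_0 + \beta_1 G_1 + \beta_2 G_2 + \beta_{12}\, G_1 G_2

% Linear model for a quantitative biomarker in MARTHA patients
\mathbb{E}[Y]
  = \gamma_0 + \gamma_1 G_1 + \gamma_2 G_2 + \gamma_{12}\, G_1 G_2
    + \boldsymbol{\delta}^{\top} \mathbf{Z}
```

In this notation, the interaction test corresponds to the null hypothesis \beta_{12} = 0 (respectively \gamma_{12} = 0), and \exp(\beta_{12}) is the interactive odds ratio.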

Results and discussion
We first applied a pairwise tagging approach to discard redundant SNPs using an r² threshold of 0.90, which led to the final selection of 243,189 SNPs from the EOVT study.
A total of 2.96 × 10¹⁰ pairwise SNP interactions were then tested in EOVT, but none of them reached the Bonferroni-corrected p-value threshold of 1.69 × 10⁻¹². Nevertheless, all interactions with a p-value less than 10⁻⁴ (n = 2,126,084) were further assessed in MARTHA. The smallest p-value observed there was 6.73 × 10⁻⁷, which did not pass the Bonferroni threshold (p < 2.35 × 10⁻⁸) for the number of interactions tested at this second step. The meta-analysis of the results obtained in EOVT and MARTHA led to 37 suggestive interactions with p-values lower than 10⁻⁸ and with consistent effects in both studies (Table 1). The smallest one, p = 6.00 × 10⁻¹¹, was observed for two SNPs in the vicinity of the SURF6 gene, which lies ~40 kb from the ABO locus. After adjusting for ABO blood group, this interaction vanished (p = 0.37), suggesting that it had captured the ABO effect through the linkage disequilibrium extending across this locus.
Despite the lack of study-wise significant interactions, we could not exclude that some genuine interaction phenomena hide in the list of suggestive interactions (Table 1). We hypothesized that additional biological information on quantitative biomarkers of VT risk could help in exploring this list. We therefore investigated whether the identified interactive SNPs could exert their effect on the VT biomarkers available in MARTHA: ACV, aPTT, AT, Fibrinogen, FVIII, PC, PS, PT and VWF. At the Bonferroni threshold of 1.50 × 10⁻⁴ for the number of performed tests (i.e. 333 = 37 SNP pairs x 9 phenotypes), one interaction was statistically significant (p = 4.82 × 10⁻⁵). It involved rs9804128, lying in the promoter region of the IGSF21 gene, and rs4784379, mapping 130 kb downstream of the IRX3 locus, the two SNPs interacting to modulate plasma FVIII levels. As shown in Table 2, the combination of the rs9804128-G and rs4784379-A alleles was associated with the highest plasma FVIII levels compared to the three other allele combinations. In contrast, this combination was associated with a ~2-fold decrease in VT risk, the frequency of the GA combination being 8.3% in controls and 4.6% in patients (Table 2). A closer look at the diplotypes formed by these two SNPs revealed that patients carrying the GA combination without any ambiguity, i.e. those carrying either the rs9804128-GG genotype and the rs4784379-A allele or the rs9804128-GA genotype and the rs4784379-AA genotype, exhibited the highest plasma FVIII levels (Table 3). Individuals ambiguous for the GA combination, namely those heterozygous at both rs9804128 and rs4784379, had intermediate FVIII levels (Table 3).
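The multiple-testing thresholds quoted above can be reproduced with a few lines of arithmetic. The sketch below assumes a family-wise error rate of 0.05, which is not stated explicitly in the text but is consistent with the reported values; the function name `bonferroni_threshold` is introduced here for illustration.

```python
from math import comb

def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under a Bonferroni correction."""
    # alpha = 0.05 is an assumption of this sketch.
    return alpha / n_tests

n_snps = 243_189                 # SNPs retained after LD-based pruning in EOVT
n_pairs = comb(n_snps, 2)        # number of distinct SNP pairs, n(n-1)/2
stage1 = bonferroni_threshold(n_pairs)      # discovery step (EOVT)
stage2 = bonferroni_threshold(2_126_084)    # pairs carried forward to MARTHA

print(f"pairs tested: {n_pairs:.2e}")
print(f"stage 1 threshold: {stage1:.2e}")
print(f"stage 2 threshold: {stage2:.2e}")
```

The quadratic growth of the number of pairs with the number of SNPs is what drives the discovery threshold down to the order of 10⁻¹².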
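The distinction between unambiguous and ambiguous carriers of the GA combination follows from enumerating the possible phasings of a two-SNP genotype. The sketch below is an illustration of that reasoning only, not the procedure used by THESIAS; the function name `ga_carrier_status` and its return labels are hypothetical.

```python
def ga_carrier_status(n_g_snp1, n_a_snp2):
    """Classify a two-SNP genotype as a certain, ambiguous or non-carrier of the
    haplotype with G at rs9804128 and A at rs4784379.

    n_g_snp1: number of G alleles at rs9804128 (0, 1 or 2)
    n_a_snp2: number of A alleles at rs4784379 (0, 1 or 2)
    """
    # 1 marks the allele of interest, 0 marks the other allele at that SNP.
    snp1 = [1] * n_g_snp1 + [0] * (2 - n_g_snp1)
    snp2 = [1] * n_a_snp2 + [0] * (2 - n_a_snp2)
    # An unphased genotype at two SNPs is compatible with two phasings.
    phasings = [
        [(snp1[0], snp2[0]), (snp1[1], snp2[1])],
        [(snp1[0], snp2[1]), (snp1[1], snp2[0])],
    ]
    carries = [(1, 1) in haplotypes for haplotypes in phasings]
    if all(carries):
        return "certain"
    if any(carries):
        return "ambiguous"
    return "none"

for g1 in (2, 1, 0):
    for g2 in (2, 1, 0):
        print(g1, g2, ga_carrier_status(g1, g2))
```

Only the double heterozygote is classified as ambiguous, because one of its two phasings contains the GA haplotype and the other does not.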
To our knowledge, this work is the first attempt in the field of VT genetics to investigate, at the genome-wide scale, the presence of interactive effects derived from common SNPs. This study did not detect interactions that passed the Bonferroni correction for the number of investigated interactions. The absence of such interactions could of course be due to low power. According to the distributions of the minor allele frequencies and the marginal allelic effects observed in the EOVT study, we computed the minimum OR for interaction that could be detected with 80% power [24,25]. These calculations suggest that our discovery cohort was only well powered to detect interactive ORs greater than 2.8 at the genome-wide statistical level of 1.69 × 10⁻¹² and ORs greater than 1.8 at the p < 10⁻⁴ threshold [Additional file 2]. The power to detect the most significant observed interactions in our second sample was about 50% [24,25]. As a consequence, despite the use of two large GWAS datasets on VT, this study was not powerful enough to detect interactions between common SNPs characterized by interactive ORs smaller than ~2.
Table footnotes: p (2) = 2.73 × 10⁻⁵; p (2) = 9.45 × 10⁻⁶; p (3) = 1.90 × 10⁻⁹; p (4) = 6.89 × 10⁻⁵. (1) In MARTHA, 1,091 patients were measured for FVIII levels. (2) p-value of the interaction term between the two SNPs in the logistic regression analysis under the assumption of additive allele effects. (3) p-value obtained from the meta-analysis of the EOVT and MARTHA samples using a fixed-effect model. (4) p-value of the interaction term between the two SNPs in the linear regression analysis, adjusted for age, sex, ABO blood group and F5/F2 mutation carrier status.
There is still no consensus about the most efficient way to perform a genome-wide search for SNP x SNP interactions. A plethora of statistical methods are applicable to the detection of such interactions, e.g. [8,26-29], and none of them can be considered a panacea.
Comparing the performance of different methodologies is of great importance but beyond the scope of this manuscript. We rather focused in the present work on the application of a standard methodology, the logistic regression model, which has been shown to be valid for detecting interactions between SNPs [8]. Different strategies can still be adopted within the logistic regression framework. Some advocate restricting the search for interactions to the set of most "significant" SNPs observed in single-locus analysis. However, in that case, which statistical threshold should be used for selecting SNPs with significant marginal associations? Nevertheless, we further confined our search for interactions to SNPs with statistical evidence for association in univariate analysis at thresholds of p < 10⁻³ or p < 0.05. We did not identify any significant pairwise interaction that was homogeneous between EOVT and MARTHA and that satisfied the relevant Bonferroni correction (data not shown). Others suggest using external biological information to refine the search strategy. Pathway-based analysis focusing only on the pairwise interactions between candidate gene SNPs could be such a strategy. By focusing only on SNPs mapping to the VT candidate genes listed in Supplementary Table 1 of [6], we did not detect any interaction that was significant after Bonferroni correction and replicated in the EOVT and MARTHA studies (data not shown). Another possibility is to assess whether the most promising interactive effects can also be observed on quantitative traits known to be associated with the disease. Doing so, we observed that rs9804128 and rs4784379 could interact to modulate both the risk of VT and the variability of FVIII levels. The rs9804128 SNP lies in the proximal promoter of the IGSF21 gene and, according to the SNAP database [21], it is not in strong LD (r² > 0.8) with any other SNP.
Conversely, rs4784379 is in strong LD with several SNPs, all located at least 100 kb away from the IRX3 locus. However, the observed interaction could be considered counterintuitive, since the allele combination associated with increased FVIII levels was found to be less frequent in cases than in controls. This phenomenon could nevertheless be observed in the presence of a mortality bias, if patients with high FVIII levels are at a higher risk of VT-associated mortality (e.g. pulmonary embolism) and are therefore under-represented in the case sample. Further investigations are needed to replicate this association, which involves SNPs at genes about which very little is known with respect to VT.