A genome-wide search for common SNP x SNP interactions on the risk of venous thrombosis
BMC Medical Genetics volume 14, Article number: 36 (2013)
Venous Thrombosis (VT) is a common multifactorial disease with an estimated heritability between 35% and 60%. The genetic polymorphisms identified so far explain only ~5% of the genetic variance of the disease. This study aimed to investigate whether pair-wise interactions between common single nucleotide polymorphisms (SNPs) exist and modulate the risk of VT.
A genome-wide SNP x SNP interaction analysis on VT risk was conducted in a French case–control study and the most significant findings were tested for replication in a second independent French case–control sample. The results obtained in the two studies totaling 1,953 cases and 2,338 healthy subjects were combined into a meta-analysis.
The smallest observed p-value for interaction was p = 6.00 × 10⁻¹¹, but it did not pass the Bonferroni significance threshold of 1.69 × 10⁻¹², which corrects for the 2.96 × 10¹⁰ interactions investigated. Among the 37 suggestive pair-wise interactions with a p-value below 10⁻⁸, one involved two SNPs, rs9804128 (IGSF21 locus) and rs4784379 (IRX3 locus), that showed significant interactive effects (p = 4.83 × 10⁻⁵) on the variability of plasma Factor VIII levels, a quantitative biomarker of VT risk, in a sample of 1,091 VT patients.
This study, the first genome-wide SNP interaction analysis conducted so far on VT risk, suggests that common SNPs are unlikely to exert strong interactive effects on the risk of disease.
Venous Thrombosis (VT) is a common complex disease affecting ~0.2% of individuals a year. VT includes deep vein thrombosis and pulmonary embolism, the latter being characterized by a one-year mortality rate of ~10% excluding patients with malignancies . As a complex trait, VT is considered to result from the interplay between environmental and genetic factors, which may interact with each other to modulate VT risk [2, 3]. The Genome Wide Association Studies (GWAS) strategy raised great hopes of identifying novel susceptibility loci for human diseases, and some true successes were obtained in the field of VT genetics. Novel genes recently found to harbor common susceptibility alleles (i.e. with allele frequency > 0.05) for VT include GP6, HIVEP1, KNG1, STAB2, STXBP5 and VWF (reviewed in ). However, none of the identified risk alleles showed genetic effects stronger than those of the established VT-associated genes known before the GWAS era: ABO, F2, F5 and FGG. As for most multifactorial diseases, the risk alleles for VT identified so far explain only a small proportion of the familial risk of disease . Alternative strategies are needed to identify the many sources that could contribute to the unexplained heritability; these include gene-gene and gene-environment interactions, deep sequencing, transcriptomic analyses and epigenomics [7–10].
In this work, we assessed whether interactions between common polymorphisms could contribute to VT risk. To our knowledge, studies that have investigated this hypothesis were mainly dedicated to known candidate genes [11, 12], and no attempt has been made to address it without any a priori hypothesis. We therefore took advantage of the large amount of genetic information collected through two French GWAS on VT [6, 13] to conduct the first genome-wide search for SNP x SNP interaction with respect to VT risk.
This work was based on two French GWAS on VT, the Early-Onset Venous Thrombosis (EOVT) and the Marseille Thrombosis Association (MARTHA) studies. These two studies have already been extensively described in [5, 6, 14] for EOVT and in [6, 15–17] for MARTHA.
Each individual study was approved by its institutional ethics committee and informed written consent was obtained in accordance with the Declaration of Helsinki. Ethics approval was obtained from the “Departement santé de la direction générale de la recherche et de l’innovation du ministère” (Projects DC: 2008-880 & 09.576) and from the institutional ethics committees of the Kremlin-Bicetre Hospital.
Studied populations and phenotype measurements
Briefly, in both studies, cases were patients with a documented history of VT and free of well-known strong genetic risk factors, including antithrombin (AT), protein C (PC) or protein S (PS) deficiency, homozygosity for FV Leiden or F2 20210A mutations, and lupus anticoagulant. In EOVT, patients were selected for having experienced idiopathic VT before the age of 50. Controls were French individuals selected from two healthy populations, SUVIMAX  and the Three City Study , for EOVT and MARTHA, respectively. The EOVT case–control study included 411 patients and 1,228 healthy subjects, while MARTHA was composed of 1,542 patients and 1,110 healthy subjects. All individuals were of European origin, the majority being of French descent. A summary of the population characteristics is provided in Additional file 1.
Several key quantitative biomarkers of VT risk were measured in MARTHA patients. The corresponding measurements have been described previously in  for AT, PC, PS and the Agkistrodon contortrix venom (ACV) test that explores the PC pathway, in  for Factor VIII (FVIII) and von Willebrand Factor (VWF), and in  for Activated Partial Thromboplastin Time (aPTT) and Prothrombin Time (PT).
Individuals participating in the EOVT study were genotyped for 317,139 SNPs using the Illumina Sentrix HumanHap300 Beadchip. The application of the quality control criteria described in  led to the final selection of 291,872 autosomal SNPs for analysis. As detailed in , individuals participating in the MARTHA GWAS were typed with the Illumina Human 610-Quad and Human660W-Quad Beadchips. After quality control, 481,002 autosomal SNPs remained for analysis.
Our search for genome-wide interactions was conducted in two steps. A first screening for pairwise SNP interactions was carried out in the EOVT study. The first part of this discovery screening consisted in reducing redundancy between SNPs by keeping only one SNP out of all SNPs in strong pairwise linkage disequilibrium (r² > 0.90) within a window of 50 kb. Pairwise SNP interactions were then tested by a logistic regression analysis in which both SNPs were coded under an additive model (0, 1 and 2 according to the number of rare alleles) and an interaction term was added to the model. For this, we used the PLINK software . All interactions significant at p < 10⁻⁴ were further assessed in a second step in the larger MARTHA study. When SNPs were not available in the latter sample, the best available proxy in terms of r², according to the SNAP database , was used. The same logistic regression model was applied in the MARTHA study. Results obtained in the two GWAS were then meta-analyzed through a fixed-effect model relying on inverse-variance weighting as implemented in the METAL software (http://www.sph.umich.edu/csg/abecasis/metal). Homogeneity of associations across the two GWAS was tested using the Mantel-Haenszel method .
The most significant interactions were then further assessed in relation to quantitative biomarkers of VT risk in MARTHA patients. For this, standard linear regression analyses were conducted with the same additive allele coding as for the binary trait analysis. Analyses were adjusted for age, sex and ABO blood group. For AT, PC, PS and ACV, individuals on anticoagulant therapy were excluded. The THESIAS software  was used to illustrate the detected pairwise SNP interactions.
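The logistic model with an interaction term described above can be illustrated with a minimal sketch. It is not the PLINK implementation: the Newton-Raphson fit, the helper names (`solve`, `fit_logistic`, `interaction_test`) and the simulated genotypes and effect sizes are all assumptions made for illustration. Each SNP is coded 0, 1 or 2, the model is logit P(case) = b0 + b1·g1 + b2·g2 + b3·g1·g2, and the interaction is assessed with a Wald test on b3.

```python
import math
import random

def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_logistic(X, y, n_iter=50):
    """Newton-Raphson fit; returns coefficients and their standard errors."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]
        for xi, yi in zip(X, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, xi))))
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    info[j][k] += mu * (1.0 - mu) * xi[j] * xi[k]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    # Variances are the diagonal of the inverse information matrix
    # (taken from the last iteration, i.e. at convergence).
    unit = lambda j: [1.0 if i == j else 0.0 for i in range(p)]
    se = [math.sqrt(solve(info, unit(j))[j]) for j in range(p)]
    return beta, se

def interaction_test(g1, g2, y):
    """Wald test of the g1 x g2 term under additive 0/1/2 coding."""
    X = [[1.0, a, b, a * b] for a, b in zip(g1, g2)]
    beta, se = fit_logistic(X, y)
    z = beta[3] / se[3]
    return beta[3], se[3], math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p

# Simulated data (hypothetical allele frequencies and effects).
random.seed(1)
n = 2000
g1 = [sum(random.random() < 0.3 for _ in range(2)) for _ in range(n)]
g2 = [sum(random.random() < 0.3 for _ in range(2)) for _ in range(n)]
y = []
for a, b in zip(g1, g2):
    eta = -1.0 + 0.1 * a + 0.1 * b + 0.8 * a * b
    y.append(1 if random.random() < 1.0 / (1.0 + math.exp(-eta)) else 0)

beta_int, se_int, p_int = interaction_test(g1, g2, y)
print(f"interaction log-OR = {beta_int:.3f}, SE = {se_int:.3f}, p = {p_int:.2e}")
```

In a genome-wide scan this test would be repeated for every SNP pair, which is why dedicated software is used in practice.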
Results and discussion
We first applied a pairwise tagging approach to discard redundant SNPs using an r² threshold of 0.90, which led to the final selection of 243,189 SNPs from the EOVT study.
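The pairwise tagging step can be sketched as follows. The text does not say which tool or exact rule was used, so the greedy rule below, the function names (`r_squared`, `prune`) and the toy genotypes are assumptions. Here r² is approximated by the squared Pearson correlation between allele counts, and a SNP is dropped when it exceeds the r² threshold with an already retained SNP lying within the 50 kb window.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two 0/1/2 genotype vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treat as 0
    return sxy * sxy / (sxx * syy)

def prune(snps, r2_max=0.90, window_bp=50_000):
    """Greedy pruning of (name, position, genotypes) tuples.

    `snps` must come from one chromosome and be sorted by position.
    """
    kept = []
    for name, pos, geno in snps:
        redundant = any(
            pos - kept_pos <= window_bp and r_squared(geno, kept_geno) > r2_max
            for _, kept_pos, kept_geno in kept
        )
        if not redundant:
            kept.append((name, pos, geno))
    return [name for name, _, _ in kept]

# Toy example with hypothetical SNP names and genotypes.
toy = [
    ("snpA", 1_000, [0, 1, 2, 0, 1, 2, 0, 1]),
    ("snpB", 2_000, [0, 1, 2, 0, 1, 2, 0, 1]),    # identical to snpA, same window
    ("snpC", 3_000, [2, 0, 1, 1, 0, 2, 0, 0]),    # weakly correlated with snpA
    ("snpD", 100_000, [0, 1, 2, 0, 1, 2, 0, 1]),  # identical, but > 50 kb away
]
print(prune(toy))
```

In this toy input only snpB is discarded, because snpD, although identical to snpA, lies outside the window.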
A total of 2.96 × 10¹⁰ pairwise SNP interactions were then tested in EOVT, but none reached the Bonferroni-corrected threshold of 1.69 × 10⁻¹². Nevertheless, all interactions with a p-value below 10⁻⁴ (n = 2,126,084) were further assessed in MARTHA. The smallest observed p-value was 6.73 × 10⁻⁷, which did not pass the Bonferroni correction (p < 2.35 × 10⁻⁸) for the number of interactions tested at this second step.
The meta-analysis of the results obtained in EOVT and MARTHA yielded 37 suggestive interactions with p-values lower than 10⁻⁸ and with consistent effects in both studies (Table 1). The smallest one, p = 6.00 × 10⁻¹¹, was observed for two SNPs in the vicinity of the SURF6 gene, which is ~40 kb from the ABO locus. After adjusting for ABO blood group, this interaction vanished (p = 0.37), suggesting that it had captured the ABO effect through the linkage disequilibrium extending over this locus.
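The number of tests and the two Bonferroni thresholds quoted here can be reproduced with a few lines of arithmetic. The family-wise error rate of 0.05 is an assumption: it is not stated explicitly in the text, but it reproduces the reported thresholds.

```python
from math import comb

alpha = 0.05               # assumed family-wise error rate
n_snps = 243_189           # SNPs retained after pairwise tagging in EOVT
n_pairs = comb(n_snps, 2)  # all unordered SNP pairs
threshold_discovery = alpha / n_pairs

n_followed_up = 2_126_084  # interactions with p < 1e-4 carried to MARTHA
threshold_replication = alpha / n_followed_up

print(f"pairs tested: {n_pairs:.3g}")
print(f"discovery threshold: {threshold_discovery:.2e}")
print(f"replication threshold: {threshold_replication:.2e}")
```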
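For readers unfamiliar with inverse-variance weighting, the fixed-effect combination of two study-specific interaction estimates can be sketched as below. This is a generic illustration rather than the METAL code, and the log-odds ratios and standard errors in the example are hypothetical, not values from Table 1.

```python
import math

def fixed_effect_meta(estimates):
    """Inverse-variance fixed-effect pooling of (beta, se) pairs.

    Returns the pooled beta, its standard error, the z statistic and
    the two-sided p-value under a normal approximation.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p

# Hypothetical interaction log-ORs (beta, SE) from a discovery and a
# replication study.
pooled_beta, pooled_se, z, p = fixed_effect_meta([(0.90, 0.25), (0.60, 0.15)])
print(f"pooled log-OR = {pooled_beta:.3f} (SE {pooled_se:.3f}), p = {p:.2e}")
```

The larger study receives the larger weight because its standard error is smaller, so the pooled estimate sits closer to its value.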
Despite the lack of study-wide significant interactions, we could not exclude that some genuine interactions hide in the list of suggestive interactions (Table 1). We hypothesized that additional biological information on quantitative biomarkers of VT risk could help in mining this list. We therefore investigated whether the identified interacting SNPs could exert their effect on VT biomarkers available in MARTHA: ACV, aPTT, AT, Fibrinogen, FVIII, PC, PS, PT and VWF. At the Bonferroni threshold of 1.50 × 10⁻⁴ for the number of tests performed (i.e. 333 = 37 interactions x 9 phenotypes), one interaction was statistically significant (p = 4.82 × 10⁻⁵). It involved rs9804128, lying in the promoter region of the IGSF21 gene, and rs4784379, mapping 130 kb downstream of the IRX3 locus, the two SNPs interacting to modulate plasma FVIII levels. As shown in Table 2, carriers of the rs9804128-G and rs4784379-A alleles had the highest plasma FVIII levels compared with the three other allele combinations. In contrast, these individuals had a ~2-fold decrease in VT risk, the frequency of the GA combination being 8.3% in controls and 4.6% in patients (Table 2). A closer look at the diplotypes formed by these two SNPs revealed that patients unambiguously carrying the GA combination, i.e. those carrying either the rs9804128-GG genotype and the rs4784379-A allele or the rs9804128-GA genotype and the rs4784379-AA genotype, exhibited the highest plasma FVIII levels (Table 3). Individuals ambiguous for the GA combination, i.e. those heterozygous at both rs9804128 and rs4784379, had intermediate FVIII levels (Table 3).
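Two figures in this paragraph can be checked by simple arithmetic: the Bonferroni threshold for 333 tests and the approximately two-fold risk reduction implied by the GA frequencies. The sketch below assumes a family-wise error rate of 0.05 and computes a crude, unadjusted odds ratio from the two quoted frequencies only; it is not the estimate produced by the haplotype model behind Table 2.

```python
alpha = 0.05                     # assumed family-wise error rate
n_interactions, n_traits = 37, 9
n_tests = n_interactions * n_traits
threshold = alpha / n_tests

def odds_ratio(freq_cases, freq_controls):
    """Odds ratio comparing a combination's frequency in cases vs controls."""
    odds_cases = freq_cases / (1.0 - freq_cases)
    odds_controls = freq_controls / (1.0 - freq_controls)
    return odds_cases / odds_controls

# GA combination: 4.6% in patients, 8.3% in controls (from the text).
or_ga = odds_ratio(0.046, 0.083)

print(f"{n_tests} tests, threshold = {threshold:.2e}")
print(f"crude OR for the GA combination = {or_ga:.2f}")
```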
To our knowledge, this work is the first attempt in the field of VT genetics to investigate, at the genome-wide scale, the presence of interactive effects derived from common SNPs. This study did not detect interactions that reached the Bonferroni-corrected threshold for the number of investigated interactions. The absence of such interactions could of course be due to low power. According to the distributions of the minor allele frequencies and the marginal allelic effects observed in the EOVT study, we computed the minimum OR for interaction that could be detected with 80% power [24, 25]. These calculations suggest that our discovery cohort was well powered only to detect interactive ORs greater than 2.8 at the genome-wide statistical level of 1.69 × 10⁻¹² and ORs greater than 1.8 at the p < 10⁻⁴ threshold [Additional file 2]. The power to detect the most significant observed interactions in our second sample was about 50% [24, 25]. As a consequence, despite the use of two large GWAS datasets on VT, this study was not sufficiently powered to detect interactions between common SNPs characterized by interactive ORs smaller than ~2.
There is still no consensus about the most efficient way to perform a genome-wide search for SNP x SNP interaction. A plethora of statistical methods are applicable to the detection of such interactions (e.g. [8, 26–29]) and none of them can be considered a panacea. Comparing the performance of different methodologies is of great importance but beyond the scope of this manuscript. We instead focused on the application of a standard methodology, the logistic regression model, which has been shown to be valid for detecting interaction between SNPs . Different strategies can still be adopted within the logistic regression framework. Some advocate restricting the search for interaction to the set of most “significant” SNPs observed in single-locus analysis. However, it is then unclear which statistical threshold should be used for selecting SNPs with significant marginal associations. Nevertheless, we also confined our search for interaction to SNPs with statistical evidence for association in univariate analysis at p < 10⁻³ or p < 0.05. We did not identify any pair-wise interaction that was homogeneous between EOVT and MARTHA and that satisfied the relevant Bonferroni correction (data not shown). Others suggest using external biological information to refine the search strategy. Pathway-based analysis focusing only on pairwise interactions between candidate gene SNPs could be such a strategy. By focusing only on SNPs mapping to the VT candidate genes listed in Supplementary Table 1 in , we did not detect any Bonferroni-corrected significant interaction that replicated in the EOVT and MARTHA studies (data not shown). Another possibility consists in assessing whether the most promising interactive effects can also be observed on quantitative traits known to be associated with the disease.
Doing so, we observed that rs9804128 and rs4784379 could interact to modulate both the risk of VT and the variability of FVIII levels. rs9804128 lies in the proximal promoter of the IGSF21 gene and, according to the SNAP database , it is not in strong LD (r² > 0.8) with any other SNP. Conversely, rs4784379 is in strong LD with several SNPs, all located at least 100 kb away from the IRX3 locus. However, the observed interaction could be considered counterintuitive, since the allele combination associated with increased FVIII levels was less frequent in cases than in controls. This phenomenon could nevertheless arise in the presence of a mortality bias, if patients with high FVIII levels are at higher risk of VT-associated mortality (e.g. pulmonary embolism) and are therefore under-represented in the case sample. Further investigations are needed to replicate this association, which involves SNPs at genes about which very little is known with respect to VT.
In conclusion, our work suggests that strong interactive phenomena (OR > ~2) between common SNPs are unlikely to contribute much to the risk of VT.
Written informed consent was obtained from the patient for publication of this report and any accompanying images.
White RH: The epidemiology of venous thromboembolism. Circulation. 2003, 107: I4-I8.
Rosendaal FR: Venous thrombosis: a multicausal disease. Lancet. 1999, 353: 1167-1173. 10.1016/S0140-6736(98)10266-0.
Souto JC, Almasy L, Borrell M, Blanco-Vaca F, Mateo J, Soria JM, Coll I, Felices R, Stone W, Fontcuberta J, Blangero J: Genetic susceptibility to thrombosis and its relationship to physiological risk factors: the GAIT study. Genetic Analysis of Idiopathic Thrombophilia. Am J Hum Genet. 2000, 67: 1452-1459. 10.1086/316903.
Morange PE, Tregouet DA: Lessons from genome-wide association studies in venous thrombosis. J Thromb Haemost. 2011, 9 (Suppl 1): 258-264.
Tregouet DA, Heath S, Saut N, Biron-Andreani C, Schved JF, Pernod G, Galan P, Drouet L, Zelenika D, Juhan-Vague I: Common susceptibility alleles are unlikely to contribute as strongly as the FV and ABO loci to VTE risk: results from a GWAS approach. Blood. 2009, 113: 5298-5303. 10.1182/blood-2008-11-190389.
Germain M, Saut N, Greliche N, Dina C, Lambert JC, Perret C, Cohen W, Oudot-Mellakh T, Antoni G, Alessi MC: Genetics of venous thrombosis: insights from a new genome wide association study. PLoS One. 2011, 6: e25581. 10.1371/journal.pone.0025581.
Morange PE, Tregouet DA: Deciphering the molecular basis of venous thromboembolism: where are we and where should we go?. Br J Haematol. 2010, 148: 495-506. 10.1111/j.1365-2141.2009.07975.x.
Cordell HJ: Detecting gene-gene interactions that underlie human diseases. Nat Rev Genet. 2009, 10: 392-404.
Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, McCarthy MI, Ramos EM, Cardon LR, Chakravarti A: Finding the missing heritability of complex diseases. Nature. 2009, 461: 747-753. 10.1038/nature08494.
Eichler EE, Flint J, Gibson G, Kong A, Leal SM, Moore JH, Nadeau JH: Missing heritability and strategies for finding the underlying causes of complex disease. Nat Rev Genet. 2011, 11: 446-450.
Auro K, Alanne M, Kristiansson K, Silander K, Kuulasmaa K, Salomaa V, Peltonen L, Perola M: Combined effects of thrombosis pathway gene variants predict cardiovascular events. PLoS Genet. 2007, 3: e120. 10.1371/journal.pgen.0030120.
Pomp ER, Doggen CJ, Vos HL, Reitsma PH, Rosendaal FR: Polymorphisms in the protein C gene as risk factor for venous thrombosis. Thromb Haemost. 2009, 101: 62-67.
Tregouet DA, Konig IR, Erdmann J, Munteanu A, Braund PS, Hall AS, Grosshennig A, Linsel-Nitschke P, Perret C, DeSuremain M: Genome-wide haplotype association study identifies the SLC22A3-LPAL2-LPA gene cluster as a risk locus for coronary artery disease. Nat Genet. 2009, 41: 283-285. 10.1038/ng.314.
Smith NL, Heit JA, Tang W, Teichert M, Chasman DI, Morange PE: Genetic variation in F3 (tissue factor) and the risk of incident venous thrombosis: meta-analysis of eight studies. J Thromb Haemost. 2012, 10: 719-722. 10.1111/j.1538-7836.2012.04665.x.
Oudot-Mellakh T, Cohen W, Germain M, Saut N, Kallel C, Zelenika D, Lathrop M, Tregouet DA, Morange PE: Genome wide association study for plasma levels of natural anticoagulant inhibitors and protein C anticoagulant pathway: the MARTHA project. Br J Haematol. 2012, 157: 230-239. 10.1111/j.1365-2141.2011.09025.x.
Tang W, Schwienbacher C, Lopez LM, Ben-Shlomo Y, Oudot-Mellakh T, Johnson AD, Samani NJ, Basu S, Gogele M, Davies G: Genetic Associations for Activated Partial Thromboplastin Time and Prothrombin Time, their Gene Expression Profiles, and Risk of Coronary Artery Disease. Am J Hum Genet. 2012, 91: 152-162.
Antoni G, Oudot-Mellakh T, Dimitromanolakis A, Germain M, Cohen W, Wells P, Lathrop M, Gagnon F, Morange PE, Tregouet DA: Combined analysis of three genome-wide association studies on vWF and FVIII plasma levels. BMC Med Genet. 2011, 12: 102.
Hercberg S, Galan P, Preziosi P, Bertrais S, Mennen L, Malvy D, Roussel AM, Favier A, Briancon S: The SU.VI.MAX Study: a randomized, placebo-controlled trial of the health effects of antioxidant vitamins and minerals. Arch Intern Med. 2004, 164: 2335-2342. 10.1001/archinte.164.21.2335.
3C Study Group: Vascular factors and risk of dementia: design of the Three-City Study and baseline characteristics of the study population. Neuroepidemiology. 2003, 22: 316-325.
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PI, Daly M, Sham PC: PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007, 81: 559-575. 10.1086/519795.
Johnson AD, Handsaker RE, Pulit SL, Nizzari MM, O’Donnell CJ, de Bakker PI: SNAP: a web-based tool for identification and annotation of proxy SNPs using HapMap. Bioinformatics. 2008, 24: 2938-2939. 10.1093/bioinformatics/btn564.
Mantel N, Haenszel W: Statistical aspects of the analysis of data from retrospective studies of disease. J Natl Cancer Inst. 1959, 22: 719-748.
Tregouet DA, Garelle V: A new JAVA interface implementation of THESIAS: testing haplotype effects in association studies. Bioinformatics. 2007, 23: 1038-1039. 10.1093/bioinformatics/btm058.
Gauderman WJ: Sample size requirements for association studies of gene-gene interaction. Am J Epidemiol. 2002, 155: 478-484. 10.1093/aje/155.5.478.
Demidenko E: Sample size and optimal design for logistic regression with binary interaction. Stat Med. 2008, 27: 36-46. 10.1002/sim.2980.
Gyebesei A, Moody J, Semple CAM, Haley CS, Wei WH: High-throughput analysis of epistasis in genome-wide association studies with BiForce. Bioinformatics. 2012, 28: 1957-1964. 10.1093/bioinformatics/bts304.
Ueki M, Cordell HJ: Improved statistics for genome-wide interaction analysis. PLoS Genet. 2012, 8: e1002625. 10.1371/journal.pgen.1002625.
Hsu L, Jiao S, Dai JY, Hutter C, Peter U, Kooperberg C: Powerful cocktail methods for detecting genome-wide gene-environment interaction. Genet Epidemiol. 2012, 36: 183-194. 10.1002/gepi.21610.
Van Steen K: Travelling the world of gene-gene interactions. Brief Bioinform. 2012, 13: 1-19. 10.1093/bib/bbr012.
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2350/14/36/prepub
Statistical analyses benefited from the C2BIG computing centre funded by the Fondation pour la Recherche Médicale, La Région Ile de France (CODDIM) and the Genomic Network of the Pierre and Marie Curie University (Paris 06).
The authors declare they have no competing interests.
NG and DAT carried out statistical analyses. MG, JCL and WC were responsible for data collection and database management. AMD, DAT, MB, ML, PA and PEM contributed to the study design whose direct implementation was coordinated by DAT and PEM. All authors read and approved the final manuscript.