Genome-wide association and linkage analyses of hemostatic factors and hematological phenotypes in the Framingham Heart Study

Background: Increased circulating levels of hemostatic factors as well as anemia have been associated with increased risk of cardiovascular disease (CVD). Known associations between hemostatic factors and sequence variants at genes encoding these factors explain only a small proportion of total phenotypic variation. We sought to confirm known putative loci and identify novel loci that may influence either trait in genome-wide association and linkage analyses using the Affymetrix GeneChip 100K single nucleotide polymorphism (SNP) set.

Methods: Plasma levels of circulating hemostatic factors (fibrinogen, factor VII, plasminogen activator inhibitor-1, von Willebrand factor, tissue plasminogen activator, D-dimer) and hematological phenotypes (platelet aggregation, viscosity, hemoglobin, red blood cell count, mean corpuscular volume, mean corpuscular hemoglobin concentration) were obtained in approximately 1000 Framingham Heart Study (FHS) participants from 310 families. Population-based association analyses using generalized estimating equations (GEE), family-based association tests (FBAT), and multipoint variance components linkage analyses were performed on the multivariable adjusted residuals of hemostatic and hematological phenotypes.

Results: In association analysis, the lowest GEE p-value for hemostatic factors was p = 4.5 × 10^-16 for factor VII at SNP rs561241, a variant located near the F7 gene and in complete linkage disequilibrium (LD) (r^2 = 1) with the Arg353Gln F7 SNP previously shown to account for 9% of total phenotypic variance. The lowest GEE p-value for hematological phenotypes was 7 × 10^-8 at SNP rs2412522 on chromosome 4 for mean corpuscular hemoglobin concentration. We present the 25 most significant GEE results, with p-values in the range of 10^-6 to 10^-5, for hemostatic and hematological phenotypes. In relating 100K SNPs to known candidate genes, we identified two SNPs (rs1582055, rs4897475) in erythrocyte membrane protein band 4.1-like 2 (EPB41L2) associated with hematological phenotypes (GEE p < 10^-3). In linkage analyses, the highest LOD score for hemostatic factors was 3.3 for factor VII on chromosome 10 around 15 Mb, and for hematological phenotypes, LOD 3.4 for hemoglobin on chromosome 4 around 55 Mb. All GEE and FBAT association and variance components linkage results can be found at http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?id=phs000007

Conclusion: Using genome-wide association methodology, we have successfully identified a SNP in complete LD with a sequence variant previously shown to be strongly associated with factor VII, providing proof of principle for this approach. Further study of additional strongly associated SNPs and linked regions may identify novel variants that influence the inter-individual variability in hemostatic factors and hematological phenotypes.


Systematic searches for novel genes beyond the known genetic determinants influencing these phenotypes have been carried out using genome-wide linkage analyses with microsatellite markers. Chromosomal regions that may harbor novel loci influencing fibrinogen and PAI-1 [21,22], and hematocrit, Hgb, RBCC, MCV and MCH [23,24], have been identified. However, linkage scans with microsatellite markers generally had low power to detect loci with small effects and lacked precision in localizing the loci; thus, few novel loci have been identified.
The recent completion of a genome-wide scan using the Affymetrix GeneChip Human Mapping 100K single nucleotide polymorphism (SNP) set on participants in the Framingham Heart Study offered the opportunity to conduct a genome-wide association study (GWAS) and linkage scan for variants that influence hemostatic factors and hematological phenotypes.

Study participants and genotyping methods
The Framingham Heart Study design and the genotyping of the Affymetrix GeneChip Human Mapping 100K SNP set on Framingham Heart Study participants are detailed in the overview of this project [25]. To avoid potential bias due to genotyping artifacts, we limited the association analyses to 70987 SNPs on autosomes with minor allele frequency (MAF) ≥ 10%, genotyping call rate ≥ 80%, and Hardy-Weinberg equilibrium test p-value ≥ 0.001.
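The three filtering thresholds above can be illustrated with a minimal sketch. The `SnpCounts` container, the function names, and the example SNPs are hypothetical and are not part of the study's pipeline; the Hardy-Weinberg p-value is assumed to have been computed elsewhere and is simply passed in.

```python
from dataclasses import dataclass


@dataclass
class SnpCounts:
    """Genotype counts for one autosomal SNP (hypothetical container)."""
    name: str
    n_aa: int        # homozygotes for allele A
    n_ab: int        # heterozygotes
    n_bb: int        # homozygotes for allele B
    n_missing: int   # samples without a genotype call
    hwe_p: float     # Hardy-Weinberg test p-value, computed elsewhere


def call_rate(s):
    """Fraction of samples with a genotype call."""
    called = s.n_aa + s.n_ab + s.n_bb
    total = called + s.n_missing
    return called / total if total else 0.0


def minor_allele_freq(s):
    """Frequency of the less common allele among called genotypes."""
    called = s.n_aa + s.n_ab + s.n_bb
    if called == 0:
        return 0.0
    freq_a = (2 * s.n_aa + s.n_ab) / (2 * called)
    return min(freq_a, 1.0 - freq_a)


def passes_qc(s, min_maf=0.10, min_call_rate=0.80, min_hwe_p=0.001):
    """Apply the MAF, call-rate, and HWE thresholds quoted in the text."""
    return (minor_allele_freq(s) >= min_maf
            and call_rate(s) >= min_call_rate
            and s.hwe_p >= min_hwe_p)


# Invented example SNPs, one per failure mode.
snps = [
    SnpCounts("snp_common", 400, 450, 150, 0, 0.40),
    SnpCounts("snp_rare", 950, 48, 2, 0, 0.30),
    SnpCounts("snp_low_call", 300, 300, 100, 300, 0.50),
    SnpCounts("snp_hwe_fail", 500, 100, 400, 0, 1e-6),
]
kept = [s.name for s in snps if passes_qc(s)]
print(kept)
```

In this toy input only `snp_common` satisfies all three criteria; the other three each fail exactly one threshold.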

Measurements of hemostatic factors and hematological phenotypes
Venous blood samples of Framingham Heart Study Offspring Cohort taken at the first and second examination cycles (1971-1975, and 1979-1983) were used to measure Hgb, RBCC, MCV and MCH, and samples taken at the fifth examination cycle (1991-1995) were used to measure all the hemostatic factors, platelet aggregation, D-dimer, and viscosity. Fibrinogen was additionally measured at the sixth (1995-1998) and seventh (1998-2001) examination cycles, and PAI-1 antigen levels at the sixth exam. Details of the assessment of hemostatic factor levels have been described previously [17,26]. Plasma fibrinogen levels were measured using the Clauss method [27]. Plasma PAI-1 antigen, tPA antigen, von Willebrand factor and FVII antigen were assessed using enzyme-linked immunosorbent assays.
The determination of hematological phenotypes has been detailed previously. Platelet aggregation was performed according to the method of Born [28]. The reagents used were epinephrine, ADP and collagen. The percent extent of aggregation in duplicate to epinephrine and ADP was determined in varying concentrations (0.01 to 15 mmol/L). For each subject, the aggregation response (yes/no) was also tested to a fixed concentration of arachidonic acid (5 mg/mL). The collagen lag time was measured in response to 1.9 mmol/L collagen. Participants who were taking aspirin were excluded from the analyses for platelet aggregation phenotypes as well as PAI-1 and tPA.
HCT was measured by the Wintrobe method [29]. Blood was collected and spun at 5000 rpm for 20 minutes in a balanced oxalate tube. The percent of total blood volume that was due to red blood cells was determined visually against a calibrated scale. MCV is the average volume of an individual's red blood cells determined as the ratio of HCT to RBCC. MCH is the average amount of hemoglobin of an individual's red cell determined as the ratio of Hgb to RBCC.
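The two ratios can be written out as a small worked example. The text does not state measurement units, so the sketch assumes the conventional ones (HCT in percent, Hgb in g/dL, RBCC in millions of cells per microliter), in which case a factor of 10 yields MCV in femtoliters and MCH in picograms. The function names and input values are illustrative only.

```python
def mean_corpuscular_volume(hct_percent, rbcc_millions_per_ul):
    """MCV in femtoliters: ratio of hematocrit to red blood cell count.

    Assumes HCT in percent and RBCC in 10^6 cells per microliter.
    """
    return 10.0 * hct_percent / rbcc_millions_per_ul


def mean_corpuscular_hemoglobin(hgb_g_per_dl, rbcc_millions_per_ul):
    """MCH in picograms: ratio of hemoglobin to red blood cell count.

    Assumes Hgb in g/dL and RBCC in 10^6 cells per microliter.
    """
    return 10.0 * hgb_g_per_dl / rbcc_millions_per_ul


# Illustrative values for a single hypothetical participant.
mcv = mean_corpuscular_volume(hct_percent=45.0, rbcc_millions_per_ul=5.0)
mch = mean_corpuscular_hemoglobin(hgb_g_per_dl=15.0, rbcc_millions_per_ul=5.0)
print(mcv, mch)
```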

Statistical methods
Standardized multivariable adjusted residuals of the hemostatic and hematological phenotypes were computed and used in all the linkage and association analyses. Covariates used in the adjustments were determined based upon what has been reported in the literature as potential risk factors for hemostatic factors or hematological phenotypes. Hardy-Weinberg equilibrium was examined using an exact chi-square test statistic [30]. Association between each SNP and each hemostatic or hematological phenotype was examined using a population-based association method via generalized estimating equations (GEE) [31] and family-based association test (FBAT) [32], assuming an additive genetic model. Variance components linkage analyses were conducted using a subset of SNPs with pairwise r^2 < 0.5. Details of both association and linkage methods are described in the overview of this project [25].
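As a rough illustration of what a standardized, covariate-adjusted residual is, the sketch below regresses a phenotype on covariates by ordinary least squares and rescales the residuals to mean 0 and standard deviation 1. This is an assumption about the general technique, not the study's actual code: the helper names, the pure-Python solver, and the toy age/sex data are all invented for the example, and the study's covariate sets are those listed in its Table 1.

```python
import statistics


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def standardized_residuals(y, covariate_rows):
    """Regress y on the covariates (plus intercept) and standardize residuals."""
    design = [[1.0] + list(row) for row in covariate_rows]
    k = len(design[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(design, y)]
    mean = statistics.fmean(resid)
    sd = statistics.stdev(resid)
    return [(e - mean) / sd for e in resid]


# Toy data: phenotype adjusted for age (years) and sex (0/1).
phenotype = [2.9, 3.4, 3.1, 3.9, 3.0, 3.6]
covariates = [(30, 0), (40, 1), (50, 0), (60, 1), (35, 1), (55, 0)]
z = standardized_residuals(phenotype, covariates)
print([round(v, 3) for v in z])
```

The resulting values are uncorrelated with the covariates by construction, which is why downstream association tests on them can be read as covariate-adjusted.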
In secondary analyses, we combined the GEE association test results across multiple phenotypes that may share a common pathway, in order to reduce the type I error rate and possibly detect SNPs with smaller effect sizes. We ranked SNPs first by the number of GEE test p-values less than 0.01, and then by the geometric mean of the GEE test p-values. We also examined the β coefficient from the GEE regression, which is the change in the phenotype, in standard deviation units, per additional copy of the alphabetically second allele (for example, allele G for a SNP with alleles A and G). This analysis was conducted for a phenotype assessed using multiple measurement methods, such as platelet aggregation induced by ADP, collagen, and epinephrine (Epi), and for a phenotype with serial measurements, such as fibrinogen level measured at examination cycles 5, 6 and 7.
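The two-level ranking rule can be sketched as follows. The function name, the dictionary layout, and the p-values are made up for illustration; the sketch assumes every p-value is strictly positive so that the logarithm is defined.

```python
import math


def rank_snps(pvalues_by_snp, threshold=0.01):
    """Rank SNPs by count of p-values below threshold, then by geometric mean.

    pvalues_by_snp maps a SNP name to its list of p-values, one per
    phenotype (or per examination cycle). All p-values must be > 0.
    """
    def sort_key(item):
        _, pvals = item
        n_hits = sum(p < threshold for p in pvals)
        geo_mean = math.exp(sum(math.log(p) for p in pvals) / len(pvals))
        # More hits first (hence the minus sign), then smaller geometric mean.
        return (-n_hits, geo_mean)

    return [name for name, _ in sorted(pvalues_by_snp.items(), key=sort_key)]


# Invented p-values for three related phenotypes per SNP.
example = {
    "snp_a": [0.004, 0.008, 0.009],  # three p-values below 0.01
    "snp_b": [0.0001, 0.02, 0.5],    # one below 0.01
    "snp_c": [0.001, 0.002, 0.003],  # three below 0.01, smaller geometric mean
    "snp_d": [0.005, 0.006, 0.03],   # two below 0.01
}
ranking = rank_snps(example)
print(ranking)
```

Note that `snp_b` has the single smallest p-value but ranks last, which reflects the intent of the rule: consistency across related phenotypes is rewarded over one extreme result.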
We attempted to identify association of 100K SNPs in or within 60 kilobase pairs (kbp) of selected candidate genes previously reported to be associated with hemostatic factors or hematological phenotypes. For hemostatic factors and platelet aggregation phenotypes, we included the fol-

Table 1 displays the hemostatic and hematological phenotypes analyzed in this study, as well as the number of individuals, examination cycles, and covariates used in multivariable models. The sample size ranged from 702 to 1073. Traits measured at multiple examinations were analyzed using multivariable adjusted residuals from each examination measure, and also the average of all the multivariable adjusted residuals from individual examination cycles.

Results
Among individuals who were included in the genotyping and had at least one hemostatic factor or platelet aggregation phenotype measured at examination cycle five, 52% were women, mean age was 52 years, and 6% had prevalent CVD. Among individuals who were included in the genotyping and had at least one hematological phenotype measured at examination cycle one or two, 52% were women, with a mean age over the two examinations of 36 years, and 2% had prevalent CVD.

Association between SNPs and hemostatic and hematological phenotypes
We report the 25 SNPs with the lowest GEE association test p-values in Table 2 for hemostatic factors, and in Table 3 for hematological phenotypes. The lowest GEE p-value (4.5 × 10^-16) for hemostatic factors was obtained from the test of association between circulating levels of FVII and rs561241; this SNP resides near the F7 gene on chromosome 13 and is in complete linkage disequilibrium (LD) (r^2 = 1) with the Arg353Gln F7 SNP (rs6046), which we previously reported to account for 9% of total phenotypic variance [16]. The lowest GEE p-value (6.9 × 10^-8) for hematological phenotypes was obtained in the test of association between MCH and rs1397048 on chromosome 11 near the olfactory receptors, olfactory receptor, family 5, subfamily AP, member 2 (OR5AP2), olfactory receptor, family 5, subfamily AR, member 1 (OR5AR1), olfactory receptor, family 9, subfamily G, member 1 (OR9G1) and olfactory receptor, family 9, subfamily G,  Table A1 and Table A2, respectively.

Linkage results
Maximum multipoint LOD scores greater than 2 and the 1.5-LOD support intervals around the maximum LOD scores are presented in Table 4. The highest LOD score for hemostatic factors was 3.3 for factor VII at approximately 15 Mb on chromosome 10. The highest LOD for hematological phenotypes was 3.4 for Hgb at approximately 55 Mb on chromosome 4.
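For readers who want to relate a LOD score to a p-value, the following sketch uses a common convention for variance components linkage: 2 ln(10) × LOD is treated as a likelihood-ratio statistic whose null distribution is a 50:50 mixture of a point mass at zero and a chi-square with one degree of freedom. That convention is an assumption here, not something this section states, and the function name is illustrative.

```python
import math


def lod_to_pvalue(lod):
    """Approximate pointwise p-value for a variance components LOD score.

    Assumes the null distribution of 2*ln(10)*LOD is a 50:50 mixture of a
    point mass at 0 and a 1-df chi-square, so the p-value is half the
    1-df chi-square upper tail. Defined here only for lod > 0.
    """
    if lod <= 0:
        raise ValueError("lod must be positive")
    chi2 = 2.0 * math.log(10.0) * lod
    # Upper tail of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2)).
    upper_tail = math.erfc(math.sqrt(chi2 / 2.0))
    return 0.5 * upper_tail


for lod in (1.8, 2.0, 3.3, 3.4):
    print(lod, lod_to_pvalue(lod))
```

Under this convention a LOD of 1.8 corresponds to a p-value of roughly 0.002, and the two peak LOD scores of 3.3 and 3.4 both correspond to pointwise p-values below 10^-4.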

Combining association tests across multiple phenotypes
The top 10 SNPs with the largest number of p-values < 0.01 and the lowest mean p-values are reported in Tables 5 and 6 for platelet aggregation phenotypes and fibrinogen levels, respectively. The top-ranked SNP for platelet aggregation was rs10500631 on chromosome 11, located near an olfactory gene cluster. The p-values of the GEE association test for ADP-, collagen- and epinephrine-induced platelet aggregation levels with this SNP were all less than 0.01, with an average p-value of 0.007 over the three tests. The regression coefficients ranged from 0.19 to 0.24, indicating that the effect size was consistently estimated across the three phenotypes.
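To give a feel for what a regression coefficient of about 0.2 standard deviation units per allele means, the sketch below applies the standard approximation for an additive SNP acting on a unit-variance phenotype under Hardy-Weinberg proportions: the proportion of variance explained is 2p(1 - p)β². The allele frequency of 0.3 is purely hypothetical, since this section does not report the frequency for rs10500631, and the function name is illustrative.

```python
def variance_explained(beta, allele_freq):
    """Approximate proportion of phenotypic variance explained by one SNP.

    Assumes an additive model, a phenotype standardized to unit variance,
    and genotype frequencies in Hardy-Weinberg proportions.
    """
    return 2.0 * allele_freq * (1.0 - allele_freq) * beta ** 2


# Hypothetical allele frequency; the betas span the range quoted in the text.
hypothetical_freq = 0.3
for beta in (0.19, 0.24):
    share = variance_explained(beta, hypothetical_freq)
    print(f"beta={beta}: about {100 * share:.1f}% of variance")
```

Under these assumptions the SNP would account for only a few percent of the variance at most, consistent with the small effects a study of this size has limited power to detect.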
For fibrinogen, the top-ranked SNP was rs4861952 on chromosome 4, which is also listed in Table 2 as one of the 25 SNPs most significantly associated with hemostatic factors. This SNP was consistently associated with fibrinogen levels across three examination cycles, with effect sizes ranging from -0.28 to -0.17.

HBD, HBG1, HBG2, HBE1), and HEBP2 are presented in Table 8. The most significant associations were SNP rs1582055 near EPB41L2 with hematocrit (p = 7.7 × 10^-5), Hgb (p = 2.9 × 10^-4), and RBCC (p = 3.9 × 10^-4); SNP rs4897475 with hematocrit (p = 1.6 × 10^-4) and Hgb (p = 6.0 × 10^-4).

* For Hgb, MCH and RBCC reported here, multivariable adjusted residuals from average of measurements over exam cycles 1 and 2 were used; for all platelet aggregation phenotypes, and viscosity reported here, multivariable adjusted residuals from measurements at examination cycle 5 were used.
† Physical position is in base pair (bp) and based on the May 2004 human reference sequence (NCBI Build 35).
†† P-value from GEE genotype association test and rank of the GEE p-values in ascending order.
††† P-value from family-based association test using the FBAT program.

* Physical position is in base pair (bp) and based on the May 2004 human reference sequence (NCBI Build 35).
† β coefficient is the change in the phenotype in one standardized deviation unit with an increment of a copy of the alphabetically higher allele.

Discussion
We conducted a GWAS and a genome-wide linkage analysis for hemostatic factors and hematological phenotypes measured in Framingham Heart Study Offspring participants. We identified a highly significant association between factor VII level and SNP rs561241, which is in complete LD with the F7 SNP rs6046 (Arg353Gln) previously demonstrated to explain about 9% of total phenotypic variation. This association is significant after Bonferroni correction for multiple testing (we used a conservative α = 5 × 10^-8), and confirms the strong association at this locus that has previously been reported by us and others. This SNP was also significant at a nominal α level of 0.05 in the FBAT (p-value = 3.4 × 10^-4) and in the linkage test (LOD = 1.8, p-value = 0.002), but not after Bonferroni correction. This may be explained by the well-known fact that FBAT and linkage tests are less powerful than population-based association tests.
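The arithmetic behind these thresholds can be checked with a short sketch. The function names are illustrative; the SNP count of 70987 is the number of SNPs retained for association analysis, and the calculation deliberately ignores the additional multiplicity from testing many phenotypes.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise rate at alpha."""
    return alpha / n_tests


def is_significant(p_value, threshold):
    return p_value < threshold


n_snps = 70987                      # SNPs passing quality filters
per_phenotype = bonferroni_threshold(0.05, n_snps)
conservative = 5e-8                 # threshold used in the text
implied_tests = 0.05 / conservative  # number of tests this threshold allows for

print(per_phenotype, implied_tests)
print(is_significant(4.5e-16, conservative))  # GEE result for rs561241
print(is_significant(3.4e-4, conservative))   # FBAT result for the same SNP
```

A single-phenotype Bonferroni threshold for 70987 SNPs is about 7 × 10^-7, so the 5 × 10^-8 threshold is equivalent to correcting for about one million tests; the GEE p-value for rs561241 clears either bar, while the FBAT p-value clears neither.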
FBAT lacks power in this study to detect variants that explain a small proportion of variance. Because few 100K SNPs appear to explain a large proportion of variance for hemostatic factors or hematological phenotypes, it is difficult to distinguish true positives from false positives among the FBAT results. Given that there is no evidence for major population substructure in FHS [33] and that GEE testing offers greater power, we emphasize our population-based GEE analysis results in this report. Linkage analyses share the problem of low power to detect small effects. However, a linkage peak can be caused by loci that are in linkage but not in LD with the SNPs, or by several loci of small effect in the region. Thus, linkage peaks deserve additional attention. For example, we identified a linkage peak on chromosome 10 for multivariable adjusted factor VII. The SNP underneath the peak is rs2400107, yet its GEE association p-value was 0.52. This could occur because rs2400107 is linked but not in LD with the disease locus (or loci) under the peak, because the linkage peak was caused by several loci of small effect, or because the peak was a false positive. Therefore, a more careful examination of the association results of SNPs under the linkage peak, along with potentially additional genotyping, may be needed to confirm the linkage results.
Among the SNPs with the lowest GEE p-values in single-phenotype or multiple-phenotype analyses, only a few resided near genes with a known or likely role in hemostasis, thrombosis, or hematological biology. For hemostatic factors, the cis-acting SNP rs561241 near the F7 gene was associated with factor VII. For hematological phenotypes, we identified rs6811964 near PDGFC, platelet derived growth factor-C. It has been shown that PDGFC is highly expressed in vascular smooth muscle cells, renal mesangial cells and platelets, and is likely involved in platelet biology [34]. This SNP was associated with Epi-induced platelet aggregation (P = 10^-5, Table 3), with ADP-induced platelet aggregation at nominal significance (P = 0.02), and with collagen-induced platelet aggregation at borderline nominal significance (P = 0.08). Other associations were found with SNPs in genes not clearly related to the phenotypes, or with SNPs that are not in known genes. These associations, together with other findings from this GWAS, must be viewed as hypotheses that warrant further testing in other cohorts.
Although we summarized results only for multivariable adjusted phenotypes, we also conducted linkage and association analyses for age- and sex-adjusted phenotypes. It is possible that the effects of some loci are mediated through the covariates included in the multivariable adjustment, and are thus associated only with age- and sex-adjusted phenotypes. Among the 52 SNPs that were associated with age- and sex-adjusted hemostatic factors or hematological phenotypes with a GEE p-value less than or equal to 10^-5, 28 SNPs had a GEE p-value greater than 10^-5 with multivariable adjusted phenotypes. However, no age- and sex-adjusted GEE p-value for the 28 SNPs reached genome-wide significance (p-value < 5 × 10^-8), and no new highly plausible candidate genes resided within 60 Kb of these SNPs. The full disclosure results of all analyses, including the age- and sex-adjusted analyses, can be viewed at http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?id=phs000007.
There are some limitations to this study. The participants are Caucasian, and thus the results may not be generalizable to other racial groups. The study sample size was relatively small, and as such, we may have had insufficient power to detect small effects. To avoid worsening the multiple testing problem, we performed only sex-pooled and not sex-specific analyses; SNPs that are associated with a phenotype only in women or only in men may therefore have gone undetected in the current study. The advantages of this study are that we had family data, which enabled us to also apply family-based association tests that are robust to population admixture, and linkage analyses that can detect loci not in LD but in linkage with any 100K SNP. The study subjects were recruited without regard to their phenotypic values, which makes the analyses of multiple phenotypes possible without the need to correct for ascertainment bias.
Finally, compared with studies focused only on SNPs within candidate genes, GWAS approaches are unbiased and as such they have the advantage of detecting novel genes or confirming genes that are not well-known to have an influence on a phenotype. However, since the current GWAS uses only a subset of all the SNPs in HapMap [35], it may miss some genes due to lack of coverage. For the same reason, GWAS data usually are not enough to study a candidate gene comprehensively. To understand the roles played by each SNP in a candidate gene, additional genotyping, and single-SNP and haplotype analyses are needed. A large GWAS involving more than 550,000 SNPs in more than 9000 participants of FHS will be avail-

Conclusion
In summary, we have tested for association and linkage using the Affymetrix 100K SNPs and a set of hemostatic factor and hematological phenotypes. We have confirmed a previously reported association, providing proof of principle (a "positive control") for the GWAS approach. Our results provide a set of hypotheses that warrant testing in additional studies.

SNP = single nucleotide polymorphism; tPA = tissue plasminogen activator; vWF = von Willebrand factor; WBC = white blood cell.