Genome-wide association with select biomarker traits in the Framingham Heart Study

Background: Systemic biomarkers provide insights into disease pathogenesis, diagnosis, and risk stratification. Many systemic biomarker concentrations are heritable phenotypes. Genome-wide association studies (GWAS) provide mechanisms to investigate the genetic contributions to biomarker variability unconstrained by current knowledge of physiological relations.
Methods: We examined the association of Affymetrix 100K GeneChip single nucleotide polymorphisms (SNPs) to 22 systemic biomarker concentrations in 4 biological domains: inflammation/oxidative stress; natriuretic peptides; liver function; and vitamins. Related members of the Framingham Offspring cohort (n = 1012; mean age 59 ± 10 years, 51% women) had both phenotype and genotype data (minimum-maximum per phenotype n = 507–1008). We used Generalized Estimating Equations (GEE), Family Based Association Tests (FBAT) and variance components linkage to relate SNPs to multivariable-adjusted biomarker residuals. Autosomal SNPs (n = 70,987) meeting the following criteria were studied: minor allele frequency ≥ 10%, call rate ≥ 80% and Hardy-Weinberg equilibrium p ≥ 0.001.
Results: With GEE, 58 SNPs had p < 10⁻⁶: the top SNPs were rs2494250 (p = 1.00 × 10⁻¹⁴) and rs4128725 (p = 3.68 × 10⁻¹²) for monocyte chemoattractant protein-1 (MCP1), and rs2794520 (p = 2.83 × 10⁻⁸) and rs2808629 (p = 3.19 × 10⁻⁸) for C-reactive protein (CRP) averaged from 3 examinations (over about 20 years). With FBAT, 11 SNPs had p < 10⁻⁶: the top SNPs were the same for MCP1 (rs4128725, p = 3.28 × 10⁻⁸, and rs2494250, p = 3.55 × 10⁻⁸), and also included B-type natriuretic peptide (rs437021, p = 1.01 × 10⁻⁶) and Vitamin K percent undercarboxylated osteocalcin (rs2052028, p = 1.07 × 10⁻⁶). The peak LOD (logarithm of the odds) scores were for MCP1 (4.38, chromosome 1) and CRP (3.28, chromosome 1; previously described) concentrations; of note, the 1.5-LOD support interval included the MCP1 and CRP SNPs reported above (GEE model). Previous candidate SNP associations with circulating CRP concentrations were replicated at p < 0.05; the SNPs rs2794520 and rs2808629 are in linkage disequilibrium with previously reported SNPs. GEE, FBAT and linkage results are posted at http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?id=phs000007.
Conclusion: The Framingham GWAS represents a resource to describe potentially novel genetic influences on systemic biomarker variability. The newly described associations will need to be replicated in other studies.



Background
There is intense clinical and research interest in blood and urinary biomarkers to diagnose disease, to risk stratify individuals for prognosis and potential intervention, and to provide insights into disease pathogenesis [1]. Hence, it has been proposed that biomarkers may prove useful in the goal of developing what has been referred to as "predictive, preemptive, personalized medicine" [2].
Because of their prognostic importance, there has been interest in understanding the environmental and genetic factors contributing to interindividual variability in systemic biomarker concentrations. Prior reports support the heritability of systemic biomarker concentrations reflecting inflammatory processes [14,15], natriuretic peptide activation [16], hepatic function [17,18], and vitamin metabolism [19]. Most prior studies of the genetic contribution to biomarker concentrations have examined genetic linkage or variation in selected candidate genes. Although there have been some successes with both approaches [20], the specific genes contributing to variability of most circulating biomarkers are incompletely understood. We examined the relation of single nucleotide polymorphisms (SNPs) on the Affymetrix 100K chip to variation in systemic biomarker concentrations. The GWAS approach has the advantage that it is not constrained by known physiologic associations.

Study sample
The biomarkers were assessed in the Framingham Offspring sample, which is described in the Framingham 100K Overview [21]. Briefly, the Framingham Offspring were recruited in 1971-1974 from the children (and children's spouses) of the Framingham Original Cohort [22]. The examinations and the number of participants in which the biomarkers were assessed vary by analyte, as noted in Table 1.

Phenotype definitions and methods
Biomarkers were measured on fasting morning specimens collected between 7:30 and 9:00 am after an overnight fast (typically 10 hours). EDTA and citrated blood collection tubes were centrifuged in a refrigerated centrifuge immediately after venipuncture. Serum blood collection tubes sat for 30 minutes after venipuncture to allow for complete clotting. Specimens were processed immediately after centrifugation. Blood samples were centrifuged and frozen at -20°C (examinations 2 through 4) and -80°C (examinations 5 through 7). The measurement of the inflammatory markers is detailed in the inflammatory marker manual at the National Center for Biotechnology Information http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?id=phs000007.

Genotyping methods
Details of the genotyping methods are available in the Framingham Heart Study 100K Overview [21]. Framingham staff extracted genomic DNA from immortalized lymphoblasts with a Qiagen Blood and Cell Culture Maxi Kit. Briefly, SNPs on the Affymetrix 100K chip were genotyped (n = 112,990 autosomal SNPs) in a sample of family members of the Original and Offspring cohorts of the Framingham Heart Study [33]. SNPs were excluded for the following reasons: minor allele frequency < 10% (n = 38,062); call rate < 80% (n = 2,346); Hardy-Weinberg equilibrium p-value < 0.001 (n = 1,595), leaving 70,987 SNPs available for analysis.
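The quality-control criteria above can be expressed as a simple per-SNP filter, and the reported exclusion counts can be checked by subtraction. The following is a minimal sketch only: the `SnpQc` record and `passes_qc` function are hypothetical names introduced for illustration, and the check assumes each excluded SNP is counted under exactly one exclusion reason, which the source does not state explicitly.

```python
from dataclasses import dataclass

# Thresholds as stated in the text.
MIN_MAF = 0.10        # minor allele frequency >= 10%
MIN_CALL_RATE = 0.80  # call rate >= 80%
MIN_HWE_P = 0.001     # Hardy-Weinberg equilibrium p >= 0.001


@dataclass
class SnpQc:
    """Hypothetical per-SNP quality-control summary (not from the source)."""
    rsid: str
    maf: float
    call_rate: float
    hwe_p: float


def passes_qc(snp: SnpQc) -> bool:
    """Return True when the SNP meets all three inclusion criteria."""
    return (
        snp.maf >= MIN_MAF
        and snp.call_rate >= MIN_CALL_RATE
        and snp.hwe_p >= MIN_HWE_P
    )


# Counts reported in the text; assumes the exclusion categories do not overlap.
genotyped = 112_990
excluded = {"maf": 38_062, "call_rate": 2_346, "hwe": 1_595}
remaining = genotyped - sum(excluded.values())
```

Under the non-overlap assumption, `remaining` agrees with the 70,987 SNPs reported as available for analysis.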

Statistical analysis methods
We created standardized multivariable-adjusted natural log transformed biomarker residuals adjusted for the covariates listed in Table 1. The CRP average residuals were constructed as follows: (1) create age- and sex-adjusted or multivariable-adjusted residuals at each of exams 2, 6 and 7; (2) take the average of the residuals across exams; (3) exclude the averaged residual if fewer than 2 exams contributed to its calculation. In some instances we performed additional transformation (e.g. Winsorized models). Tobit models were used to generate residuals for the natriuretic peptides, because 2% of N-ANP levels and 30% of BNP levels were below the respective assay detection limits. Association and linkage results examining age- and sex-adjusted residuals are posted at the web site. As described in the Overview [21], we examined generalized estimating equations (GEE) and family based association testing (FBAT), assuming an additive genetic effect, to account for correlation among related individuals within nuclear families. We also used Merlin software [34] (splitting the largest families) to compute exact identity by descent linkage, with variance component analysis in SOLAR using 11,200 SNPs and short tandem repeats [35]. Traits with extreme values, defined as more than 4 standard deviations away from the mean, were Winsorized at 4.0 in secondary linkage analyses to determine the sensitivity of the logarithm of the odds (LOD) score to the presence of outlier values.
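Two of the steps described above, averaging residuals across examinations and Winsorizing extreme values, can be sketched as follows. This is an illustrative sketch rather than the study's code: the function names are hypothetical, missing examinations are represented by `None`, and Winsorizing is interpreted here as clipping values to the mean ± 4 sample standard deviations.

```python
from statistics import mean, stdev


def average_residual(exam_residuals, min_exams=2):
    """Average one participant's per-exam residuals.

    `exam_residuals` holds one value per examination (e.g. exams 2, 6, 7),
    with None marking an exam without a residual. Returns None when fewer
    than `min_exams` residuals are available.
    """
    observed = [r for r in exam_residuals if r is not None]
    if len(observed) < min_exams:
        return None
    return mean(observed)


def winsorize_sd(values, k=4.0):
    """Clip values lying more than k sample SDs from the mean to that bound."""
    mu = mean(values)
    sd = stdev(values)
    lower, upper = mu - k * sd, mu + k * sd
    return [min(max(v, lower), upper) for v in values]


# Hypothetical data: three participants, residuals at three exams.
averages = [
    average_residual([1.0, None, 3.0]),   # two exams available: averaged
    average_residual([None, None, 2.0]),  # one exam only: excluded
    average_residual([0.5, 0.5, 0.5]),
]
```

For residuals that have already been standardized, clipping at mean ± 4 SD is close to clipping at ±4.0, which is how "Winsorized at 4.0" is read in this sketch.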

Results
Twenty-two biomarker traits (plus 4 additional CRP traits) were analyzed in 1012 Offspring participants, on log-transformed multivariable-adjusted residuals as outlined in Table 1 (minimum-maximum per phenotype n = 507-1008). The phenotypes were collected at various Framingham Offspring examinations from cycles 2 to 7. At examination cycles 2 and 7 the mean age of the participants with both phenotype and genotype data was 41 ± 10 and 59 ± 10 years, and 51.2% and 51.1% were women, respectively. For details of biomarker phenotype-genotype association refer to http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?id=phs000007.
We estimated the amount of variability in biomarker concentrations explained by the 4 most statistically significant SNPs in the GEE model using a pseudo measure of R² based on log-likelihood estimates [36]. The two most statistically significant GEE SNPs explained about 7% and 4% of the variability in MCP1 concentrations (R² = 0.070 for rs2494250 and R² = 0.043 for rs4128725); for CRP concentrations averaged over examinations 2, 6, and 7 the two most statistically significant GEE SNPs each explained 2.3% of the variability (R² = 0.023 for both rs2794520 and rs2808629) [36]. We also examined the linkage disequilibrium between the most statistically significant GEE SNPs: rs2494250 and rs4128725 had a D' = 0.724 and an r² = 0.196, whereas rs2794520 and rs2808629 served as perfect proxies for each other (D' = 1; r² = 1).
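The two linkage disequilibrium measures quoted above answer different questions, which is why a pair of SNPs can show a fairly high D' together with a low r². The sketch below uses the standard definitions for two biallelic loci; the function name and the haplotype frequencies are hypothetical and are not taken from the Framingham data.

```python
def ld_measures(p_a, p_b, p_ab):
    """Return (D', r^2) for two biallelic loci.

    p_a, p_b: frequencies of allele A at locus 1 and allele B at locus 2
              (each strictly between 0 and 1).
    p_ab:     frequency of the haplotype carrying both A and B.
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


# Hypothetical case 1: allele frequencies differ, so D' reaches 1 while
# r^2 stays well below 1.
unequal = ld_measures(p_a=0.5, p_b=0.2, p_ab=0.2)

# Hypothetical case 2: identical allele frequencies and complete
# co-occurrence, so the SNPs are perfect proxies (D' = 1 and r^2 = 1).
proxies = ld_measures(p_a=0.3, p_b=0.3, p_ab=0.3)
```

D' measures how much of the maximum possible disequilibrium is realized given the allele frequencies, whereas r² measures how well one SNP predicts the other, so only r² = 1 implies that the two SNPs carry identical association information.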
With FBAT, 11 SNPs were associated with biomarker concentrations with a p < 10⁻⁶. The two most statistically sig-
In Table 2c we list the magnitude and location of LOD scores > 2.5 observed for the circulating biomarker traits. Because we were concerned that some of the LOD scores might be inflated by individuals with extreme marker concentrations, we reanalyzed the LOD scores on Winsorized residuals. The peak Winsorized LOD scores observed were for the biomarkers MCP1 (4.38, chromosome 1), and CRP (3.23, chromosome 10; 3.28, chromosome 1). Of note the 1.5 LOD support intervals for the linkage peaks on chromosome 1 included the SNPs significantly associated with MCP1 and CRP reported above (GEE model).
In an effort to uncover potential genetic pleiotropy, we display in Table 3 two ways to synthesize findings across phenotypes. We examined 3 correlated inflammatory biomarker phenotypes, interleukin-6, CRP and fibrinogen, and report SNPs that were significantly associated with all 3 phenotypes by GEE or FBAT at p < 0.01 (Table 3a). We also examined phenotypes within a specific biomarker category including CRP over multiple examinations, liver function tests and vitamin concentrations (nutrients involved in bone health [37,38]), and display in Table 3b SNPs significant by either FBAT or GEE at a p < 0.01 for all of the phenotypes in a given phenotype cluster.
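The cross-phenotype screen described above amounts to keeping SNPs whose p-value is below 0.01 for every phenotype in a cluster. A minimal sketch follows, assuming that a SNP qualifies for a phenotype when either its GEE or its FBAT p-value is below the cutoff; the source does not say whether the same method had to be significant across all phenotypes. The data layout, SNP labels, and p-values are hypothetical.

```python
def shared_snps(pvalues, cluster, alpha=0.01):
    """Return SNPs with GEE or FBAT p < alpha for every phenotype in `cluster`.

    `pvalues` maps SNP id -> phenotype -> {"gee": p, "fbat": p}.
    A SNP lacking results for any phenotype in the cluster is not reported.
    """
    hits = []
    for snp, by_phenotype in pvalues.items():
        significant_everywhere = all(
            phenotype in by_phenotype
            and min(by_phenotype[phenotype].values()) < alpha
            for phenotype in cluster
        )
        if significant_everywhere:
            hits.append(snp)
    return sorted(hits)


# Hypothetical results for three made-up SNPs.
example = {
    "snp_a": {
        "IL6": {"gee": 0.004, "fbat": 0.20},
        "CRP": {"gee": 0.030, "fbat": 0.008},
        "fibrinogen": {"gee": 0.002, "fbat": 0.009},
    },
    "snp_b": {
        "IL6": {"gee": 0.001, "fbat": 0.001},
        "CRP": {"gee": 0.500, "fbat": 0.400},  # fails for CRP
        "fibrinogen": {"gee": 0.003, "fbat": 0.002},
    },
    "snp_c": {
        "IL6": {"gee": 0.005, "fbat": 0.006},  # no CRP or fibrinogen results
    },
}
inflammatory_hits = shared_snps(example, ["IL6", "CRP", "fibrinogen"])
```

Requiring significance in every phenotype of a cluster is a stringent screen; with correlated phenotypes it favors shared signals but does not by itself establish pleiotropy.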

Discussion
In collaboration with NCBI we have web-posted our unfiltered biomarker-genotype associations and linkage results to provide a resource to investigators seeking to understand and replicate their biomarker-genotype associations. We submit that the findings of highest priority for follow-up are associations that were detected by several statistical approaches. MCP1 was associated with 2 SNPs on chromosome 1 (rs4128725 and rs2494250) with p-values on the order of 10⁻⁸ by FBAT and ≤ 10⁻¹² by GEE. Acknowledging that linkage is less powerful and accurate, we note that the 1.5-LOD support interval for the MCP1 linkage peak (Winsorized maximum LOD 4.38) on chromosome 1 supports the GEE and FBAT analyses. Findings for CRP (chromosome 1), brain natriuretic peptide (chromosome 1) and Vitamin K % undercarboxylated osteocalcin (chromosome 7) are also of potential priority for follow-up. We acknowledge that the ultimate validation of our findings will require replication in other cohorts and functional studies.
A fundamental challenge of GWAS tests is sorting through associations and prioritizing SNPs for follow-up. In the absence of external replication, one approach to synthesizing findings is to examine associations across similar biological domains, which may capture pleiotropy. We presented the exploratory analyses in Tables 3a and 3b, but reiterate that the findings will need to be examined in other cohorts.

Do the findings represent true positive genetic associations?
It is notable that some of the associations with the strongest statistical support were for associations between a gene and its protein product (e.g. CRP gene and CRP concentration). Cis-acting regulatory variants have been shown to influence mRNA and protein levels for many genes [65]. Studies involving additional biomarker phenotypes and variants (e.g. Affymetrix 500K chip) should clarify whether cis- or trans-acting regulatory variants explain the greatest proportion of phenotypic variation.
With GWAS, which typically test for the association of thousands of SNPs with multiple traits, it is difficult for any specific association to achieve genome-wide significance. For instance, a strict Bonferroni correction for the 30 traits tested in the present study with both age/sex- and multivariable-adjusted models and 2 statistical methods (0.05/(70,987 × 30 × 2 × 2)) would require p < 5.9 × 10⁻⁹. We submit that the most significant association in the selected biomarker group, the FCER1A rs2494250 SNP with MCP1 concentrations, achieved genome-wide significance with a GEE p = 1.0 × 10⁻¹⁴ and an FBAT p = 3.5 × 10⁻⁸. It should be noted that rs2494250 and rs4128725 are in modest linkage disequilibrium (D' = 0.724 and r² = 0.196) and hence may be serving as proxies for the same causal SNP.
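The Bonferroni arithmetic in the preceding paragraph can be reproduced directly from the counts given in the text. The variable and function names below are illustrative only.

```python
# Counts taken from the text.
alpha = 0.05
n_snps = 70_987   # SNPs passing quality control
n_traits = 30     # traits tested
n_models = 2      # age/sex-adjusted and multivariable-adjusted
n_methods = 2     # GEE and FBAT

n_tests = n_snps * n_traits * n_models * n_methods
bonferroni_threshold = alpha / n_tests


def meets_threshold(p_value, threshold=bonferroni_threshold):
    """Return True when a p-value falls below the Bonferroni threshold."""
    return p_value < threshold


# p-values quoted in the text for rs2494250 and MCP1.
gee_passes = meets_threshold(1.0e-14)
fbat_passes = meets_threshold(3.5e-8)
```

The product gives 8,518,440 tests and a threshold of about 5.9 × 10⁻⁹. With the values quoted in the text, the GEE p-value falls below this strict threshold, whereas the FBAT p-value of 3.5 × 10⁻⁸ does not.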
Several human and experimental studies suggest that the association between FCER1A and MCP1 concentrations is biologically plausible. FCER1A codes for the high affinity Fc receptor fragment for IgE. In vitro experiments with rat mast cells demonstrated that aggregation of the high affinity receptor for IgE (FcεRI) increased gene transcription and secretion of MCP1 [66]. Similarly, in mouse mast cells, MCP1 mRNA increased significantly when FcεRI was occupied by small amounts of IgE/antigen [67]. In humans, IgE and MCP1 concentrations are both increased in occupational asthma [68,69]. Similar to the animal data, human mast cells exposed to anti-IgE antibody or to IgE released MCP1 [70][71][72].

Comparison with prior literature
Our efforts to compare our findings with associations previously reported in the literature underscore some of the challenges in genetic association studies. The ICAM1 gene did not have any markers within 60 kb on the Affymetrix 100K chip. Of the 4 genes that did have SNPs in the genomic region coding for the marker, only the CRP association was replicated in our cohort; however, as noted above, we [32], as well as others [20], have previously reported this association. For bilirubin concentrations we previously reported significant linkage to the chromosome 2q telomere [39] and a significant association with a TA repeat in UGT1A1, under this linkage peak [40], in Framingham unrelated participants. However, there was no association between bilirubin concentrations and the 3 SNPs within 60 kb of UGT1A1. The previously reported interleukin-6-IL6 and MCP1-CCL2 associations were not replicated. Of note, our group previously reported that rs1024611 (in CCL2) was associated with MCP1 concentrations in unrelated participants [63]; the association was far from significant in the present report (FBAT p = 0.78; GEE p = 0.35). The failure to confirm the previously reported Framingham MCP1-CCL2 association may stem from the current report having a smaller sample size (n = 989), using different genetic markers, and using an additive genetic model in related participants, whereas the prior study used unrelated participants (n = 1602) with recessive and dominant models. In a recent meta-analysis of phenotype-genotype association studies, only about one third (8 of 25) of the associations examined were replicated [73]. There are many plausible explanations why we did not replicate previously reported phenotype-genotype associations. Previous reports could represent false positive findings; the present and prior study cohorts may differ on key factors that modify the phenotype-genotype associations; or our lack of replication may represent a false negative report because of inadequate statistical power [73,74].

Strengths and limitations
The strengths of the present study include a comprehensively characterized community-based cohort, with biomarker phenotypes routinely assessed with careful attention to quality control. However, the cohort was largely middle-aged to elderly, and white of European descent, so the findings may not be generalizable to individuals who are younger or of other ethnicity/racial descent. DNA was collected at the 5th and 6th examinations, which may have introduced a survival bias. In addition, our study was susceptible to false negative findings because of the moderate size of the cohort; we lacked power to detect modest associations. Conversely, similar to most GWAS, the reported associations and linkage may represent false positive findings from multiple statistical testing.

Conclusions and future directions
The Framingham GWAS and the web posting of the unfiltered results represent a unique resource to discover potentially novel genetic influences on systemic biomarker variability. We acknowledge that the newly described associations will need to be replicated in other studies.

Competing interests
The authors declare that they have no competing interests.

Authors' contributions
EJB conceived of the FHS inflammation project, secured funding, planned the analyses, drafted and critically revised the manuscript. JD assisted in planning and conducting the analyses, and in writing and critically revising the manuscript. MGL planned the FHS inflammation project including assisting in securing funding, and planned and conducted analyses. KLL assisted in planning and conducting the analyses. SLB measured the vitamin data, assisted in planning the analyses and critically revising the manuscript. DRG participated in the study design and reviewed the manuscript. SK contributed to analyses of C-reactive protein and osteoprotegerin, and reviewed the manuscript. JFK assisted in securing the funding, supervised and organized the performance of the assays and reviewed the manuscript. MJK contributed to collecting the data base and revising the manuscript. JPL provided insights into the liver function test analyses and reviewed and approved the manuscript. JBM secured funding for and oversaw measurement of high-sensitivity TNFα concentrations and reviewed and approved the manuscript. SJR contributed to acquisition of the inflammation data, reviewing, revising and giving final approval to the manuscript. JR provided critical assistance in organizing the inflammatory marker data set, conducted quality control analyses and reviewed and gave final approval to the manuscript. RS was involved in revising the manuscript critically for important intellectual content and gave final approval of the version to be published. JAV assisted in securing funding for the inflammation project and revising the manuscript. TJW contributed to the analysis and interpretation of the data, and revision of the manuscript for important intellectual content. PWFW contributed to data acquisition, revision of the manuscript and final approval of the version submitted. PAW participated in 100K study design and reviewed and approved the manuscript. 
RSV provided critical input in conceiving the project, securing the funding, planning the analyses and critically revising the manuscript.