Framingham Heart Study genome-wide association: results for pulmonary function measures

Background Pulmonary function measures obtained by spirometry are used to diagnose chronic obstructive pulmonary disease (COPD) and are highly heritable. We conducted genome-wide association (GWA) analyses (Affymetrix 100K SNP GeneChip) for measures of lung function in the Framingham Heart Study.
Methods Ten spirometry phenotypes including percent of predicted measures, mean spirometry measures over two examinations, and rates of change based on forced expiratory volume in one second (FEV1), forced vital capacity (FVC), forced expiratory flow from the 25th to 75th percentile (FEF25–75), the FEV1/FVC ratio, and the FEF25–75/FVC ratio were examined. Percent predicted phenotypes were created using each participant's latest exam with spirometry. Predicted lung function was estimated using models defined in the set of healthy never-smokers, and standardized residuals of percent predicted measures were created adjusting for smoking status, pack-years, and body mass index (BMI). All modeling was performed stratified by sex and cohort. Mean spirometry phenotypes were created using data from two examinations and adjusting for age, BMI, height, smoking and pack-years. Change in pulmonary function over time was studied using two to four examinations with spirometry to calculate slopes, which were then adjusted for age, height, smoking and pack-years.
Results Analyses were restricted to 70,987 autosomal SNPs with minor allele frequency ≥ 10%, genotype call rate ≥ 80%, and Hardy-Weinberg equilibrium p-value ≥ 0.001. A SNP in the interleukin 6 receptor (IL6R) on chromosome 1 was among the best results for percent predicted FEF25–75. A non-synonymous coding SNP in glutathione S-transferase omega 2 (GSTO2) on chromosome 10 had top-ranked results for the mean FEV1 and FVC measurements from two examinations. SNPs near the SOD3 and vitamin D binding protein genes, candidate genes for COPD, exhibited association to percent predicted phenotypes.
Conclusion GSTO2 and IL6R are credible candidate genes for association to pulmonary function identified by GWA. These and other observed associations warrant replication studies. This resource of GWA results for pulmonary function measures is publicly available at http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?id=phs000007.


Background
Chronic obstructive pulmonary disease (COPD), which affects approximately six percent of the adult US population [1] and is the fourth most common cause of death in the US [2], has been defined as "airflow limitation that is not fully reversible" [3], a definition based on spirometry. Environmental factors, most notably tobacco smoking, are associated with accelerated longitudinal decline of pulmonary function and are important causes of COPD. Several lines of evidence indicate that genetic factors also contribute to the development of this condition. First, family studies have revealed increased risk of lung function impairment in smoking first-degree relatives of COPD cases [4] and substantial heritability of spirometry measures in population-based studies [5]. Second, severe alpha-1-antitrypsin deficiency due to homozygous mutations of the SERPINA1 (AAT) gene is a documented cause of COPD, although this condition explains only a small proportion of COPD in the population. Finally, family-based and case-control studies are beginning to reveal genetic variants other than those of the SERPINA1 gene that are associated with chronic airflow obstruction [6]. Despite this evidence of a genetic basis for susceptibility to COPD, the specific genetic risk factors underlying most cases of COPD remain uncertain.
The Framingham Heart Study offers the opportunity to conduct family-based linkage and association studies seeking potential genetic factors that influence obstructive (COPD, asthma), restrictive, or developmentally related lung function impairment. Using spirometry measurements as quantitative phenotypes, we have previously reported results of genome-wide linkage analyses among these same families using microsatellite markers spaced approximately 10 centiMorgans apart [7] and fine-mapping of a promising region on chromosome 6q [8,9]. The availability of data on over 100,000 single nucleotide polymorphisms (SNPs) throughout the genome now permits the application of genome-wide association (GWA) testing to the search for genetic risk factors for chronic airflow limitation. The longitudinal pulmonary function data that have been obtained over the years of the Framingham Heart Study, in combination with the 100 K SNP data, make this a unique resource for the discovery of novel genetic risk factors for chronic airflow obstruction.

Methods
Three types of spirometry phenotypes were evaluated for GWA: 1) measurements taken at a participant's most recent available examination and expressed as a percent of predicted; 2) the mean of measurements taken at two specified examinations; and 3) the annual rate of decline of spirometry measurements derived by calculating the slope of measurements across multiple examinations. All spirometry was performed without bronchodilator testing.

Measurements as percent of predicted at latest examination
The spirometry measurements from each participant's latest examination with acceptable pulmonary function data [10] were used; eligible examinations included Cohort exams 19, 17, and 13 and Offspring exams 7, 6, 5, and 3. Predicted values for each lung function measurement were calculated using cohort- and gender-specific regression models predicting spirometry measurements on the basis of age, age squared, and height squared [11] among Framingham subjects who were lifetime nonsmokers and had no history of chronic bronchitis, pulmonary disease, COPD/emphysema, asthma, or wheezing. The percent of predicted value was calculated by dividing the observed by the predicted value. Standardized residuals were then created by regressing the percent predicted on current smoking (y/n), former smoking (y/n), pack-years, and body mass index (BMI: kg/m²), in cohort- and gender-specific models. Forced expiratory volume in one second (FEV1), forced vital capacity (FVC), forced expiratory flow between the 25th and 75th percentile (FEF25–75), the FEV1/FVC ratio, and the FEF25–75/FVC ratio were examined as cross-sectional percent of predicted measures. These phenotypes are referenced in tables preceded by the letters "pp" (for percent predicted); see Table 1 for an explanation of phenotype abbreviations.
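The percent-of-predicted construction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's code: the function names (`solve_linear`, `fit_ols`, `design_row`, `percent_predicted`, `standardized_residuals`) are hypothetical, age is centered at an arbitrary 50 years purely for numerical stability (this leaves fitted values unchanged), the observed/predicted ratio is multiplied by 100 to express it as a percentage, and each model would in practice be fitted separately within each cohort and sex stratum.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear(xtx, xty)


AGE_CENTER = 50.0  # assumption: arbitrary centering constant, not from the paper


def design_row(age, height_m):
    """Predictors of the reference model: intercept, age, age squared, height squared."""
    a = age - AGE_CENTER
    return [1.0, a, a * a, height_m ** 2]


def percent_predicted(observed, age, height_m, coefs):
    """Observed value as a percentage of the value predicted by the reference model."""
    predicted = sum(c * x for c, x in zip(coefs, design_row(age, height_m)))
    return 100.0 * observed / predicted


def standardized_residuals(rows, y):
    """Regress y on the covariate rows and scale the residuals to mean 0, SD 1."""
    coefs = fit_ols(rows, y)
    resid = [yi - sum(c * x for c, x in zip(coefs, r)) for r, yi in zip(rows, y)]
    mean = sum(resid) / len(resid)
    sd = (sum((e - mean) ** 2 for e in resid) / (len(resid) - 1)) ** 0.5
    return [(e - mean) / sd for e in resid]
```

In this sketch, `fit_ols` would first be applied to the healthy never-smoker reference group to obtain `coefs`; `standardized_residuals` would then be called with covariate rows of the form `[1, current_smoker, former_smoker, pack_years, bmi]`.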

Mean of measurements at two specified examinations
In previous analyses of the genetics of lung function in the Framingham families [7,8], we have used the mean of the values of each spirometry measure from two specified examinations. For Cohort participants, spirometry data from exam cycles 5 or 6 and cycle 13 were used to generate the mean value. In Offspring participants, spirometry data from cycle 3 and cycle 5 were used to generate the mean value. The mean FEV1, FVC, and FEV1/FVC ratio were adjusted for the effects of age, age squared, BMI, height, dummy variables indicating never, former, or current smoking status, and, for former and current smokers, pack-years. Standardized residuals were generated separately by sex and within Cohort or Offspring samples. These phenotypes are referenced in tables as meanfev1, meanfvc, and meanratio.

Annual rate of decline of measurements
Rate of decline phenotypes were defined by fitting a slope to the spirometry data from different time points. The examinations incorporated were the same as those eligible for percent of predicted phenotypes described above. Slopes were calculated by ordinary least-squares using all available data. At least two eligible exams were needed to calculate a slope. Slopes were adjusted for the covariates age, age squared, height, and height squared, using mean ages and heights from included exams. Slopes were also adjusted for pack-years at first exam, interim pack-years, and sustained smoking (y/n). FEV1 and FEF25–75, the slope phenotypes with the highest heritability, were studied in the GWA. These phenotypes are referenced in tables as fev1slope and fefslope.
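The per-participant slope calculation can be illustrated with a short Python sketch. It assumes that age at examination serves as the time axis (the text does not specify this), the function name `ols_slope` and the example values are hypothetical, and the subsequent covariate adjustment of the slopes is not shown.

```python
def ols_slope(times, values):
    """Ordinary least-squares slope of values on times (e.g. litres per year)."""
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")
    if len(times) < 2:
        raise ValueError("at least two examinations are required")
    t_mean = sum(times) / len(times)
    v_mean = sum(values) / len(values)
    sxx = sum((t - t_mean) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("examinations must occur at distinct times")
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    return sxy / sxx


# Hypothetical participant: age at each exam and FEV1 in litres.
ages = [48.0, 56.0, 60.0, 64.0]
fev1 = [3.40, 3.15, 3.05, 2.90]
slope = ols_slope(ages, fev1)  # a negative slope indicates decline over time
```

With exactly two examinations the least-squares slope reduces to the simple difference quotient between the two measurements.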

Statistical analysis methods
All SNPs were studied using family-based association tests (FBAT) and generalized estimating equations (GEE) [12] (see 100 K Overview). SNP results reported met the criteria of having a minor allele frequency ≥10%, a Hardy-Weinberg p-value ≥ 0.001, and a call rate ≥ 80%. All reported FBAT tests also required a minimum of ten informative families.
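The SNP reporting criteria can be expressed as a simple filter on genotype counts. The sketch below is illustrative only: the function names are hypothetical, and the Hardy-Weinberg p-value is computed with a one-degree-of-freedom chi-square goodness-of-fit test, which is an assumption, since the text does not state which Hardy-Weinberg test was used. The requirement of ten informative families for FBAT is not modeled.

```python
import math


def hwe_chisq_p(n_aa, n_ab, n_bb):
    """Chi-square (1 df) p-value for Hardy-Weinberg equilibrium from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic SNP: no departure can be tested
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(n_aa, n_ab, n_bb, n_missing,
              min_maf=0.10, min_call_rate=0.80, min_hwe_p=0.001):
    """Apply the three thresholds: MAF, genotype call rate, and HWE p-value."""
    called = n_aa + n_ab + n_bb
    if called == 0:
        return False
    call_rate = called / (called + n_missing)
    p = (2 * n_aa + n_ab) / (2 * called)
    maf = min(p, 1.0 - p)
    return (maf >= min_maf
            and call_rate >= min_call_rate
            and hwe_chisq_p(n_aa, n_ab, n_bb) >= min_hwe_p)
```

The default thresholds mirror those stated above; they are exposed as parameters so that alternative cut-offs can be explored.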
Multipoint variance component linkage analysis was implemented with a subset of 10,588 SNPs and all 612 available microsatellites studied in previous linkage analyses [12]. Multipoint identity-by-descent estimates were generated using the software Merlin [13]. Heritability estimates, estimating the proportion of the total phenotypic variance due to genetic effects, and variance component linkage analysis were performed using the software SOLAR [14].
In addition to evaluating all SNP associations with each phenotype individually, we developed a method to identify SNPs in or near genes that exhibited the strongest associations (as assessed by p-value) to multiple spirometry phenotypes. For each phenotype, we identified the 200 lowest p-values that met the criteria above and were localized within 60,000 base pairs of the transcription start or stop of a gene. All gene annotations are derived from the UCSC genome browser May 2004 assembly, build 125 (http://genome.ucsc.edu/) [15,16]. We evaluated how often a SNP appeared among the 200 lowest p-values in gene regions across the ten phenotypes. This strategy was based on the hypothesis that SNPs associated with multiple spirometry measures are more likely to reflect a true association with lung function than SNPs associated with only a single measurement. SNPs in gene regions that appeared among the lowest 200 p-values in five or more of the phenotypes studied are reported.
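The multi-phenotype screening strategy can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the function names and the input layout (a dictionary mapping each phenotype to a dictionary of SNP identifiers and p-values, already restricted to SNPs that pass the reporting criteria and fall in gene regions) are assumptions, as is the reading of the 60,000 base pair window as extending from 60 kb before the transcription start to 60 kb after the transcription stop.

```python
import heapq
from collections import Counter


def in_gene_region(snp_pos, tx_start, tx_end, flank=60_000):
    """True if the SNP lies within the gene or within `flank` bp of either end.

    Assumes tx_start <= tx_end on the reference coordinate system.
    """
    return tx_start - flank <= snp_pos <= tx_end + flank


def recurrent_top_snps(pvalues_by_phenotype, top_n=200, min_phenotypes=5):
    """Count how often each SNP is among the top_n lowest p-values per phenotype.

    Returns a dict of SNP -> number of phenotypes, limited to SNPs that reach
    the top_n list in at least min_phenotypes phenotypes.
    """
    counts = Counter()
    for pvalues in pvalues_by_phenotype.values():
        top = heapq.nsmallest(top_n, pvalues.items(), key=lambda item: item[1])
        counts.update(snp for snp, _ in top)
    return {snp: k for snp, k in counts.items() if k >= min_phenotypes}
```

The defaults (`top_n=200`, `min_phenotypes=5`) follow the thresholds in the text; smaller values are convenient when trying the function on toy data.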

Candidate genes
Genes previously reported in the literature to be associated with spirometry measures or pulmonary disease were examined to determine whether any available 100 K SNPs in or near the genes were associated with spirometry phenotypes. Twelve COPD candidate genes studied in the Boston Early-Onset COPD cohort [17], and the SERPINE2 gene, a novel gene identified through linkage and association with COPD in the same cohort [6], were reviewed. The previously established COPD gene alpha-1-antitrypsin (SERPINA1) and the cystic fibrosis transmembrane conductance regulator (CFTR) as well as additional genes in the class of Glutathione S-transferases (O1, O2, M2, T1, T2) and surfactant proteins (SFTPA1, SFTPC) were reviewed. In addition, extracellular superoxide dismutase (SOD3) [18,19], interleukin-8 receptor alpha (IL8RA) [20], interleukin-10 (IL10) [21], beta-2 adrenergic receptor (ADRB2) [22], and transforming growth factor beta-1 (TGFB1) [23] were examined as COPD candidates. The GEE and FBAT SNP association results in or within 60 kilobase pairs (kb) of these 27 genes were reviewed. By specifying a 60 kb distance around the gene to screen results, we were able to identify SNPs near most candidate genes.

Results
Table 1 reports the heritability estimates for each of the ten phenotypes presented. The slope phenotypes have lower heritability than their corresponding cross-sectional phenotypes. FVC has the highest heritability among phenotypes defined using the same method (percent predicted or mean). The percent predicted FEF25–75/FVC ratio had a higher heritability estimate than either FEV1/FVC ratio phenotype. Linkage analysis was performed for all autosomes and the X chromosome. Table 2c reports all LOD scores above 2.0 with the 1.5-LOD support interval. The best LOD score observed is in a region of linkage on chromosome 6q that was reported previously using microsatellites in the Framingham families [7,8].
The original LOD score of 2.4 for mean FEV1 using genome-wide microsatellites [7] was increased to a LOD of 2.89 with the addition of SNP data, and a LOD of 2.65 was observed in the same region for the percent predicted FEV1 phenotype. The second highest LOD score observed was 2.86 for the longitudinal FEF25–75 phenotype, which was located on the X chromosome. The percent predicted FEV1 and FVC phenotypes both had LOD scores over 2.0 on chromosome 4, with overlapping LOD support intervals centered around 166-170 Mb.

Discussion
This is the first GWA of quantitative lung function measures to be reported, and it provides an opportunity for both hypothesis generation and hypothesis testing. We have identified a number of novel gene regions associated with pulmonary function. Associations with these SNPs and gene regions require replication in other study samples as well as functional studies before any statement about causality is warranted. Many of the best p-values are likely to reflect false positive results, and GEE results exhibited elevated Type I error [12] (see 100 K Overview). Additional studies of these data will be useful, including smoking-stratified analyses and more sophisticated approaches to creating multivariate phenotypes. However, several of the observed associations involve genes for which there are plausible biologic rationales for a relation to lung function phenotypes.
[Table footnotes referring to Tables 2a and 2b: a) SNP associated to meanfvc and meanfev1 in Table 2a under linkage peak for meanfev1 on chromosome 10. b) SNP associated to ppfef in Table 2a under linkage peak for meanratio on chromosome 1. c) SNP associated to meanfvc in Table 2b under linkage peak for ppfef on chromosome 5. d) SNP associated to ppfvc in Table 2b under linkage peak for meanfvc on chromosome 21. e) X chromosome linkage results are not available online.]
The Glutathione S-Transferase (GST) superfamily genes are of interest because of their role in metabolism of xenobiotics, such as cigarette smoke. Recently, Hersh et al. studied GSTP1 and GSTM1 in two independent analyses of COPD and reported null findings. In contrast, a study of annual change in lung function measures in a population-based cohort reported that the GSTT1 deletion alone or in combination with the GSTM1 deletion influenced decline in FEV1 in men [24]. Using the Affymetrix 100 K SNP GeneChip, we have limited ability to directly confirm or refute the aforementioned findings because no SNPs were genotyped within 10 kb of the genes.
Here, we show that a non-synonymous SNP in exon 5 of GSTO2 encoding an Asn142Asp amino acid change is among the most striking GWA results for mean FEV1 and FVC phenotypes. Both the non-synonymous SNP, rs156697, and a second SNP, rs156699, located in an intron exhibited strong association using both FBAT and GEE tests (r² between SNPs = 0.8 in HapMap CEU). Linkage results also support the evidence for GSTO2, as the gene's position lies within the confidence interval around the LOD of 2.12 observed on chromosome 10 for mean FEV1 (Table 2c). GSTO2 is involved in the biotransformation of arsenic, which is a component of cigarette smoke, and may exhibit modest expression in bronchial epithelial cells. Gene expression studies in COS-1 cells demonstrated that the Asp142 variant exhibited 76% of the level of expressed protein occurring in the wild-type, and expression levels were further reduced to 15% when the Asp142 occurred in conjunction with an Ile158 variant [25]. The observation of strong association to a non-synonymous polymorphism with demonstrated effects on gene expression is compelling. Moreover, this finding in conjunction with a growing literature on GST gene association with pulmonary phenotypes suggests that a complete evaluation of functional variants in this gene family may be warranted.
The IL6R SNP was not only among the top 25 p-values for percent predicted FEF25–75, but also among the top 200 p-values in gene regions for six of the ten phenotypes presented. IL6R is thought to be expressed in lung, and may play a role in the immune response. Recently, we have shown that IL6 levels in blood were associated with impaired lung function in the Framingham offspring cohort [26]. The IL6 pathway, as a mediator of the inflammatory process, is of interest as it relates to lung function phenotypes.
The SNP identified in the SOD3 region lies within a hypothetical protein 3' of the SOD3 gene. The non-synonymous SNP in SOD3 that has been reported for association with COPD (rs1799895) was not included in the HapMap, so we could not determine the extent of LD between it and the SNPs genotyped in this study. Another exon 3 SNP (rs2536512) located 519 base pairs away from rs1799895 is present in the HapMap. The associated SNP identified in this study, rs10489030, is in very low LD with the SOD3 exon 3 HapMap SNP (D' = 0.32 and r² = …).
The SNP identified in the region of SERPINE2 (rs717610) was not in LD with the six reported SNPs replicating significant associations to COPD [6] that were also available in HapMap, as the r² values ranged from 0.002 to 0.008. Two of the top SNPs (rs3820928, GEE rank #7; rs10498137, FBAT rank #9) lie within the linkage region identified for FEV1/FVC in severe early-onset COPD cases [27] that subsequently led to the discovery of the SERPINE2 associated SNP. SNP rs3820928 was among those identified with association to five of the phenotypes studied and lies in the gene RHBDD1 (alias DKFZp547E052). Little is known about the function of this gene, but the associated SNP is located in a region with LD extending to the adjacent COL4A4 gene. SNP rs3820928 exhibited an r² of 0.81 with two non-synonymous coding SNPs in COL4A4 in the HapMap CEU data. Defects in Type IV collagen genes have been shown to influence Goodpasture's syndrome, an autoimmune disease affecting the lung [28], and both COL4A4 and COL4A3 lie in this region, sharing a common promoter. These results do not provide a strong replication of the original SERPINE2 SNP associations due to the low LD between SNPs reported in the literature and the SNPs from the 100 K data with association. However, the results suggest that chromosome 2q is of continued interest and may harbor multiple genes influencing lung function.
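The two LD statistics quoted in this discussion, D' and r², can both be derived from haplotype and allele frequencies. The sketch below uses the standard textbook definitions for two biallelic loci; the function name `ld_measures` and the example frequencies are hypothetical and are not taken from the HapMap data. It shows why a pair of SNPs can have a high D' but a low r² when their allele frequencies differ.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    r2 = d * d / denom
    # D' rescales D by its maximum attainable magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return d, d_prime, r2


# Hypothetical frequencies: allele B (20%) occurs only on haplotypes carrying A (50%).
d, d_prime, r2 = ld_measures(p_ab=0.2, p_a=0.5, p_b=0.2)
```

Because r² measures how well one SNP predicts the other, it is the more relevant statistic when judging whether a genotyped SNP can serve as a proxy for a previously reported variant.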
Studying lung function measures in a community-based sample may identify genetic variants associated with lung growth and development, susceptibility to obstructive ventilatory impairment related to asthma, emphysema and COPD, or susceptibility to restrictive ventilatory impairment due to pulmonary fibrosis or other processes. The relevance to disease pathogenesis of associations between SNPs and lung function must be interpreted with caution, and some of the observed associations may reflect polymorphisms that protect against ventilatory impairment by leading to better lung function in early life or protection against the adverse effects of cigarette smoking. All of the GWA results are publicly available at http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?id=phs000007. Replication of novel results identified by GWA will be the true test of the value of the GWA approach to gene discovery.

Conclusion
These publicly available results provide a resource for investigators to assess whether their findings of association to pulmonary function phenotypes replicate in the Framingham population. In addition, we have identified novel results that warrant replication studies in other populations.