Genome-wide association study of lung function and clinical implication in heavy smokers

Background: The aim of this study is to identify genetic loci associated with post-bronchodilator FEV1/FVC and FEV1, and to develop a multi-gene predictive model for lung function in COPD.
Methods: A genome-wide association study (GWAS) of post-bronchodilator FEV1/FVC and FEV1 was performed in 1645 non-Hispanic White smokers of European descent.
Results: A functional rare variant in SERPINA1 (rs28929474: Glu342Lys) was significantly associated with post-bronchodilator FEV1/FVC (p = 1.2 × 10^-8) and FEV1 (p = 2.1 × 10^-9). In addition, this variant was associated with COPD (OR = 2.3; p = 7.8 × 10^-4) and severity (OR = 4.1; p = 0.0036). Heterozygous subjects (CT genotype) had significantly lower lung function and a higher percentage of COPD and more severe COPD than subjects with the CC genotype. 8.6% of the variance of post-bronchodilator FEV1/FVC can be explained by SNPs in 10 genes together with age, sex, and pack-years of cigarette smoking (p < 2.2 × 10^-16).
Conclusions: This study is the first to show genome-wide significant association of rs28929474 in SERPINA1 with lung function. Of clinical importance, heterozygotes of rs28929474 (4.7% of subjects) have significantly reduced pulmonary function, demonstrating a major impact in smokers. The multi-gene model is significantly associated with CT-based emphysema and clinical outcome measures of severity. Combining genetic information with demographic and environmental factors will further increase the predictive power for assessing reduced lung function and COPD severity.
Electronic supplementary material: The online version of this article (10.1186/s12881-018-0656-z) contains supplementary material, which is available to authorized users.

Twenty-eight genomic loci associated with baseline FEV1/FVC or FEV1 have been identified by meta-analyses of genome-wide association studies (GWAS) in general populations of European descent [2][3][4]. A recent GWAS comparing extremes of high and low baseline FEV1 in subjects of European ancestry from the UK Biobank identified five loci (KANSL1, HLA-DQ, NPNT, TET2, and TSEN54) in never smokers and RBM19-TBX5 in heavy smokers [5]. HHIP, FAM13A1, CHRNA3, RIN3, MMP12, and TGFB2 have been associated with COPD at genome-wide significant levels [6]. To our knowledge, no GWAS has been performed in smokers on post-bronchodilator FEV1/FVC and FEV1, which define a diagnosis of COPD and determine COPD severity, respectively.
GWAS of post-bronchodilator FEV1/FVC and percent predicted FEV1 were performed in non-Hispanic White smokers (n = 1645, GOLD stage 0-4, smoking ≥ 20 pack-years) from the NHLBI-sponsored SubPopulations and InteRmediate Outcome Measures In COPD Study (SPIROMICS). In addition to evaluating previously reported loci associated with baseline lung function in general populations, we aimed to identify novel genes associated with abnormal post-bronchodilator lung function in smokers enriched for COPD and to develop a model to predict lung function using multiple genes and demographic/environmental factors.

Study subjects
SPIROMICS is a prospective cohort study that enrolled 2981 participants with the goals of identifying new COPD subgroups and intermediate markers of disease progression [7,8]. SPIROMICS is a well-characterized longitudinal cohort with comprehensive phenotyping, including measurements of lung function and quantitative CT scans. Spirometry was performed before and after four inhalations with 90 μg albuterol and 18 μg ipratropium per inhalation according to ATS recommendations. Non-Hispanic White smokers (ever or current smokers with ≥ 20 pack-years) with genotyping information available were included in this analysis. Smokers with COPD were defined as smokers (≥ 20 pack-years) with post-bronchodilator FEV1/FVC < 0.7 (GOLD stage 1-4), and 'healthy' smoking controls were defined as smokers (≥ 20 pack-years) with post-bronchodilator FEV1/FVC ≥ 0.7 (GOLD stage 0). DNA was isolated using standard protocols, and SNP genotyping was performed using the Illumina HumanOmniExpressExome BeadChip and BeadStudio (Illumina, Inc., San Diego, CA).
Participants were recruited at each center through physician referral, advertisement in clinical areas or self-referral using the SPIROMICS study website (www.spiromics.com). The research protocol was approved by the institutional review boards of all participating institutions with written informed consent from all participants.

Statistical analysis
For quality control, subjects were removed if they 1) had genotyping call rates < 95%, 2) were discrepant for genetic sex, 3) failed the check for family relatedness, or 4) were detected as outliers. After subjects meeting these criteria were excluded, SNPs were removed if they 1) had call rates < 95%, 2) were inconsistent with Hardy-Weinberg Equilibrium (HWE) (p < 10^-6), or 3) had a minor allele frequency (MAF) < 0.01. A linear additive model was used for analysis of pre-/post-bronchodilator FEV1/FVC, percent predicted FEV1, FVC, and % change in FEV1 bronchodilator response using PLINK software (URL: zzz.bwh.harvard.edu/plink/) [9], adjusted for age, sex, current smoking status, pack-years of cigarette smoking, and the first two principal components from the multidimensional scaling analysis of genotypes on the chip. Association analyses of pre-/post-bronchodilator FEV1 and FVC in ml were performed using linear regression adjusted for sex, age, age^2, height, height^2, weight, current smoking status, pack-years of cigarette smoking, and the first two principal components. Association analyses of COPD and COPD severity were performed using logistic regression adjusted for age, sex, current smoking status, pack-years of cigarette smoking, and the first two principal components. P values ≤ 5 × 10^-8 were considered genome-wide significant. P values ≤ 0.05 were considered significant for SNP-level evaluation of previously reported candidate SNPs associated with baseline lung function. SNAP software (URL: http://www.broad.mit.edu/mpg/snap/) was used to generate the association plots [10]. Joint analysis of 10 confirmed candidate SNPs was performed; the eight subjects with the homozygous TT genotype of rs28929474 in SERPINA1 (PiZZ genotype) were excluded from the joint analysis to avoid bias. Genetic scores were defined as the number of risk alleles present at these 10 SNPs.
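The SNP-level quality-control filters described above can be sketched as a simple threshold check. This is a minimal illustration only: the three thresholds come from the text, while the `SnpSummary` record, its field names, and the example values are hypothetical and do not reproduce the PLINK workflow used in the study.

```python
from dataclasses import dataclass

# Thresholds stated in the text.
MIN_CALL_RATE = 0.95   # SNPs with call rate < 95% are removed
MIN_HWE_P = 1e-6       # SNPs with HWE p < 10^-6 are removed
MIN_MAF = 0.01         # SNPs with MAF < 0.01 are removed


@dataclass
class SnpSummary:
    """Hypothetical per-SNP summary record (not a PLINK data structure)."""
    snp_id: str
    call_rate: float  # fraction of subjects with a non-missing genotype
    hwe_p: float      # p-value of a Hardy-Weinberg equilibrium test
    maf: float        # minor allele frequency


def passes_qc(snp: SnpSummary) -> bool:
    """Return True if the SNP survives all three filters."""
    return (
        snp.call_rate >= MIN_CALL_RATE
        and snp.hwe_p >= MIN_HWE_P
        and snp.maf >= MIN_MAF
    )


# Made-up example records, one per failure mode.
snps = [
    SnpSummary("snp_a", call_rate=0.99, hwe_p=0.40, maf=0.25),   # kept
    SnpSummary("snp_b", call_rate=0.93, hwe_p=0.40, maf=0.25),   # low call rate
    SnpSummary("snp_c", call_rate=0.99, hwe_p=1e-9, maf=0.25),   # HWE failure
    SnpSummary("snp_d", call_rate=0.99, hwe_p=0.40, maf=0.005),  # too rare
]
kept = [s.snp_id for s in snps if passes_qc(s)]
```

Subject-level filters (call rate, sex check, relatedness, outliers) would be applied before this step, as the text states.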
A linear model was used for analysis of post-bronchodilator FEV1/FVC and percent predicted FEV1 with genetic scores in 1632 current or former smokers. Joint analysis of these 10 candidate SNPs was also performed for post-bronchodilator percent predicted FEV1 and percentage of subjects with severe COPD (GOLD stage 3-4) in 1077 smokers with COPD.
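As an illustration of the genetic score defined above (the number of risk alleles across the 10 candidate SNPs), a minimal sketch follows. The SNP identifiers are those named in the paper; the risk allele at each SNP is not listed in the text, so the input is assumed to be a precomputed risk-allele dosage (0, 1, or 2) per SNP. The function names are hypothetical.

```python
from typing import Mapping

# The 10 candidate SNPs named in the paper (gene in the trailing comment).
CANDIDATE_SNPS = (
    "rs28929474",  # SERPINA1
    "rs1980057",   # HHIP
    "rs2869967",   # FAM13A1
    "rs2070600",   # AGER
    "rs1435867",   # PID1
    "rs12477314",  # HDAC4
    "rs1529672",   # RARB
    "rs12914385",  # CHRNA3
    "rs10498635",  # RIN3
    "rs615098",    # MMP12
)


def genetic_score(dosages: Mapping[str, int]) -> int:
    """Sum risk-allele dosages (0, 1, or 2 per SNP) over the candidate SNPs."""
    total = 0
    for snp in CANDIDATE_SNPS:
        dosage = dosages[snp]  # raises KeyError if a SNP is missing
        if dosage not in (0, 1, 2):
            raise ValueError(f"{snp}: dosage must be 0, 1, or 2, got {dosage}")
        total += dosage
    return total


def include_in_joint_analysis(dosages: Mapping[str, int]) -> bool:
    """Exclude rs28929474 TT homozygotes (PiZZ), as in the joint analysis.

    Assumes dosage 2 at rs28929474 means two copies of the T (Z) allele.
    """
    return dosages["rs28929474"] != 2


# Hypothetical subject: one risk allele at every SNP, except CC at
# rs28929474 (dosage 0) and two risk alleles at rs1980057.
subject = dict.fromkeys(CANDIDATE_SNPS, 1)
subject["rs28929474"] = 0
subject["rs1980057"] = 2
score = genetic_score(subject)
```

With 10 biallelic SNPs the score can range from 0 to 20. The score would then enter the linear model as a single numeric predictor.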

GWAS of post-bronchodilator pulmonary function
After quality control analysis, 1645 non-Hispanic White subjects (1086 subjects with COPD and 559 current and former smokers with preserved lung function [8]) were included in the analysis (Table 1). GWAS of post-bronchodilator FEV1/FVC and percent predicted FEV1 were performed for 635,970 single nucleotide polymorphisms (SNPs) with MAF ≥ 0.01 in 1645 non-Hispanic White smokers, with age, sex, current smoking status, pack-years of cigarette smoking, and the first two principal components as covariates in the linear additive model. Genomic inflation factors were 1.013 and 1.017 for GWAS of post-bronchodilator FEV1/FVC and percent predicted FEV1, respectively, indicating limited genomic inflation.
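For readers unfamiliar with the genomic inflation factor reported above, the sketch below shows one common way to compute it from a list of association p-values: convert each p-value to a 1-degree-of-freedom chi-square statistic and divide the median by the theoretical median under the null (about 0.455). The text does not state how the reported values were calculated, so this is an assumption about the method; the function name is hypothetical.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()

# Median of a chi-square distribution with 1 degree of freedom:
# P(Z^2 <= m) = 0.5  =>  m = (Phi^-1(0.75))^2, about 0.4549.
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """Genomic inflation factor (lambda) from two-sided p-values in (0, 1]."""
    # A two-sided p-value p corresponds to |z| = -Phi^-1(p / 2), and the
    # 1-df chi-square statistic is z^2. Using the lower tail keeps
    # precision for very small p-values.
    chi2 = [_STD_NORMAL.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Evenly spaced p-values mimic a null (uniform) distribution.
null_like = [i / 1000 for i in range(1, 1000)]
lam_null = genomic_inflation(null_like)

# Squaring shifts p-values toward zero, mimicking inflated test statistics.
inflated = [p * p for p in null_like]
lam_inflated = genomic_inflation(inflated)
```

A value close to 1, such as the 1.013 and 1.017 reported here, suggests little systematic inflation of test statistics, for example from population stratification.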

Association of SERPINA1 with lung function, COPD, and COPD severity
Pre-/post-bronchodilator lung function was stratified by genotypes of rs28929474 (Table 3). rs28929474 was associated in a stepwise fashion with pre-/post-bronchodilator FEV1/FVC ratio (0.39, 0.54, and 0.61 for genotype TT, CT, and CC, respectively; p = 1.2 × 10^-8). rs28929474 was associated with pre-/post-bronchodilator FEV1 (33.5, 61.3, and 72.5 percent predicted, or 1210, 1841, and 2115 ml, for genotype TT, CT, and CC, respectively; p = 2.1 × 10^-9). Pre-/post-bronchodilator lung function was significantly different between the CT and CC or the TT and CC genotype groups; however, differences between the TT and CT genotype groups were not as marked. rs28929474 was associated with post-bronchodilator FEV1/FVC (β = -0.060, p = 1.1 × 10^-5) and percent predicted FEV1 (β = -8.73; p = 2.6 × 10^-4) in subjects with COPD (GOLD stage 1-4), but not in subjects without COPD (GOLD stage 0; data not shown). Thus, the association of rs28929474 with lung function was driven by subjects with COPD. Additional COPD-related phenotypes were analyzed for association with rs28929474 (Table 3). rs28929474 was also associated with COPD status (odds ratio = 2.3, p = 7.8 × 10^-4) and COPD severity (odds ratio = 4.1, p = 0.0036) (Table 3). The percentage of subjects with COPD or severe COPD was significantly higher in subjects with the CT genotype than in those with the CC genotype. rs28929474 was a less common SNP, with a minor allele frequency (MAF) of 0.029 in SPIROMICS (Additional file 1: Table S3). The homozygous risk genotype TT was present only in subjects (n = 8) with severe COPD (GOLD stage 3-4).

Prediction of post-bronchodilator pulmonary function
A joint analysis of the 10 most consistently associated SNPs, based on our analyses and previous findings, was performed. Genetic scores (the number of risk alleles) and pack-years of cigarette smoking were significantly associated with post-bronchodilator FEV1/FVC and percent predicted FEV1 (Table 4). Age at enrollment and sex were significantly associated with post-bronchodilator FEV1/FVC but not with percent predicted FEV1. In 1632 SPIROMICS non-Hispanic White smokers (GOLD stage 0-4), genetic score, age, sex, and pack-years of cigarette smoking explained 3.6%, 1.5%, 1.9%, and 3.0%, respectively, and together 8.6%, of the variance of post-bronchodilator FEV1/FVC (Table 4). Genetic score and pack-years of cigarette smoking explained 3.0% and 2.9%, respectively, and together 5.8%, of the variance of post-bronchodilator percent predicted FEV1 (Table 4). In 1077 SPIROMICS non-Hispanic White smokers with COPD (GOLD stage 1-4), post-bronchodilator percent predicted FEV1 decreased significantly as the number of risk alleles increased, from 65.4 to 54.0 (p = 1.2 × 10^-5), and the percentage of subjects with severe COPD (GOLD stage 3-4) increased significantly from 25.6 to 48.3% (p = 5.5 × 10^-5) (Fig. 3).
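The individual shares reported above for FEV1/FVC sum to 10.0% (3.6 + 1.5 + 1.9 + 3.0), slightly more than the joint 8.6%. One possible reading is that the predictors share some variance, although the text does not say how each share was computed. The sketch below illustrates the general point for two predictors, using the standard identity that expresses the joint R^2 in terms of pairwise Pearson correlations. All data values are made up for illustration and the function names are hypothetical; the study itself used a multi-predictor linear model.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def r2_one(x, y):
    """Variance in y explained by a single predictor x (simple regression)."""
    return pearson(x, y) ** 2


def r2_two(x1, x2, y):
    """Variance in y explained jointly by x1 and x2 (with an intercept).

    R^2 = (r1^2 + r2^2 - 2*r1*r2*r12) / (1 - r12^2), valid when x1 and x2
    are not perfectly collinear.
    """
    r1, r2, r12 = pearson(x1, y), pearson(x2, y), pearson(x1, x2)
    return (r1 * r1 + r2 * r2 - 2.0 * r1 * r2 * r12) / (1.0 - r12 * r12)


# Hypothetical toy data: genetic score, pack-years, and FEV1/FVC.
score = [9, 11, 12, 13, 14, 15, 16, 17]
pack_years = [25, 60, 30, 45, 80, 35, 70, 55]
fev1_fvc = [0.72, 0.63, 0.68, 0.62, 0.52, 0.61, 0.50, 0.55]

r2_score = r2_one(score, fev1_fvc)
r2_smoking = r2_one(pack_years, fev1_fvc)
r2_joint = r2_two(score, pack_years, fev1_fvc)
```

The joint value is never smaller than either single-predictor value, but it need not equal their sum: when the two predictors are positively correlated and act in the same direction, it is smaller than the sum.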

Discussion
In this study, we performed GWAS of post-bronchodilator FEV1/FVC and percent predicted FEV1, and identified rs28929474 in SERPINA1. In 1963, Laurell and Eriksson identified the connection between alpha 1-antitrypsin (A1AT) deficiency and degenerative pulmonary disease [11]. The SERPINA1 gene on chr14q32 encodes the A1AT protein. The most common variant of SERPINA1 causing A1AT deficiency is the Z allele (rs28929474: Glu342Lys), which is a missense mutation of glutamic acid to lysine at position 342 of the A1AT protein. The homozygous TT genotype of rs28929474 (PiZZ genotype) is consistently associated with emphysema, decreased lung function, and COPD [12,13].
Table footnote: Association analyses of age or sex were performed using linear or logistic regression without adjustment. Association analyses of pre-/post-bronchodilator FEV1 and FVC in ml were performed using linear regression adjusted for sex, age, age^2, height, height^2, weight, current smoking status, pack-years of cigarette smoking, and the first two principal components. Association analyses of pre-/post-bronchodilator FEV1, FVC, and FEV1/FVC, and % change in FEV1 were performed using linear regression adjusted for age, sex, current smoking status, pack-years of cigarette smoking, and the first two principal components. Association analyses of COPD and COPD severity were performed using logistic regression adjusted for age, sex, current smoking status, pack-years of cigarette smoking, and the first two principal components.
Table footnote: Genetic scores (the number of risk alleles) of 10 candidate SNPs (rs28929474 in SERPINA1, rs1980057 in HHIP, rs2869967 in FAM13A1, rs2070600 in AGER, rs1435867 in PID1, rs12477314 in HDAC4, rs1529672 in RARB, rs12914385 in CHRNA3, rs10498635 in RIN3, and rs615098 in MMP12). 1632 SPIROMICS non-Hispanic White smokers (GOLD stage 0-4) were included. Eight subjects with the TT genotype of rs28929474 in SERPINA1 (PiZZ genotype) were excluded.
Previous GWAS of COPD, emphysema, and lung function did not identify rs28929474 in SERPINA1 [2][3][4][5][6]14]. There are several potential reasons for missing this association. rs28929474 is relatively rare in the general population; for example, approximately 2% and 0.01% of the population in the United States are heterozygous or homozygous for the T allele, respectively [15]. The largest meta-analyses of GWAS of baseline lung function in general populations of European descent [2][3][4][5] have included tens of thousands of subjects; however, very few subjects may have been homozygous for the T allele and, more importantly, these studies did not ascertain subjects with a significant history of cigarette smoking, a necessary environmental exposure. Thus, these studies in general populations have limited power to identify the association between rs28929474 and lung function. In this study, we performed GWAS of post-bronchodilator lung function in heavy smokers enriched for COPD. As expected, the homozygous TT genotype was rare (n = 8 in 1645, or 0.49%), but the heterozygous CT genotype was more common (n = 78, or 4.74%). In addition, rs28929474 is not included in the previously designed GWAS chips, nor are there other SNPs in strong LD (r^2 > 0.5) with rs28929474, preventing the identification of association with COPD and emphysema [6,14]. The Illumina OmniExpressExome BeadChip used in this study includes exonic markers identified from exome and whole genome sequencing projects. rs28929474 (exm1124179) was directly genotyped. This study found rs28929474 in SERPINA1 to be associated with pre- and post-bronchodilator FEV1/FVC and FEV1 at a genome-wide significant level (Table 3).
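The genotype percentages quoted above, and the minor allele frequency of 0.029 reported for SPIROMICS, can be checked with a few lines of arithmetic. The TT and CT counts and the total of 1645 come from the text; the CC count is derived by subtraction, which assumes that all 1645 subjects had a called genotype at rs28929474.

```python
# Genotype counts at rs28929474 as reported in the text.
N_TOTAL = 1645
N_TT = 8
N_CT = 78
# Assumption: no missing genotypes, so CC is the remainder.
N_CC = N_TOTAL - N_TT - N_CT

# Each subject carries two alleles; TT contributes two T alleles, CT one.
t_alleles = 2 * N_TT + N_CT
maf = t_alleles / (2 * N_TOTAL)

hom_pct = 100 * N_TT / N_TOTAL  # percentage of TT homozygotes
het_pct = 100 * N_CT / N_TOTAL  # percentage of CT heterozygotes

print(f"CC count: {N_CC}")
print(f"MAF: {maf:.3f}")
print(f"TT: {hom_pct:.2f}%  CT: {het_pct:.2f}%")
```

Under that assumption the T allele count is 94 of 3290 alleles, which rounds to the reported MAF of 0.029, and the genotype percentages round to 0.49% and 4.74%.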
Although the effect of the homozygous TT genotype has long been known, the effect of the heterozygous CT genotype is more controversial and has been questioned in past candidate-gene studies [16][17][18]. For example, in a general population (n = 4600), baseline FEV1/FVC and FEV1 were not significantly different between PiMM and PiMZ [17]. In a case-control study (834 COPD cases and 835 controls), post-bronchodilator FEV1/FVC and FEV1 were not significantly different between PiMM and PiMZ [16]. In a small study composed mainly of healthy subjects, post-bronchodilator FEV1/FVC (0.77 or 0.71 for PiMM or PiMZ) and percent predicted FEV1 (96.4 or 84.6 for PiMM or PiMZ) were significantly different in ever-smokers but not in never-smokers [18]. In a recent candidate-gene study (5518 non-Hispanic Whites and 2753 African Americans with ≥ 10 pack-years of smoking), subjects with PiMZ had significantly lower lung function than subjects with PiMM in both Whites and African Americans [19]. In the current study, subjects with the CT genotype had lung function values intermediate between those of subjects with the TT and CC genotypes (Table 3). Subjects with the CT genotype had significantly lower post-bronchodilator FEV1/FVC and percent predicted FEV1, and a higher percentage of COPD and more severe COPD, than subjects with the CC genotype. Thus, SERPINA1 CT heterozygosity has important functional effects on COPD and lung function. All subjects included in our study had a history of tobacco smoking of at least 20 pack-years. Association results were unaffected by the number of pack-years of cigarette smoking in our study. Compared with the COPDGene study [19], this study included heavier smokers, who thus had lower lung function. More importantly, this study is a hypothesis-free GWAS, which identified association of rs28929474 with lung function at a genome-wide significant level for the first time. More than a hundred common and rare variants exist in the SERPINA1 gene.
Thun et al. have identified synthetic association between common variants in SERPINA1 and serum A1AT levels, suggesting that A1AT levels are causally determined by rare variants such as the Z allele and the S allele (rs17580) [20]. Cho et al. have identified rs45505795 in SERPINA10, with a MAF of 0.04 (not in strong LD with rs28929474: r^2 = 0.295), as associated with emphysema [14]. We found no SNP other than rs28929474 in the SERPINA1 region to be strongly associated with lung function (Figs. 1 and 2).
To develop a multi-gene predictive model for lung function, genes associated with lung function and COPD in previously published studies were evaluated. We identified the association of HHIP, FAM13A1, AGER, PID1, HDAC4, RARB, CHRNA3, RIN3, and MMP12 with post-bronchodilator lung function at the SNP level (Table 2). In a previous study, we showed that HHIP, FAM13A1, AGER, and RARB were associated with pre-bronchodilator lung function in subjects with asthma [21]. Lung expression quantitative trait locus (eQTL) analysis has identified cis-eQTL SNPs in HHIP, FAM13A1, and AGER [22]. Together, this evidence indicates that rs1980057 in HHIP, rs2869967 in FAM13A1, and rs2070600 in AGER are functionally relevant SNPs important for lung function in the general population and in subjects with COPD or asthma. rs4537555 in HHAT was strongly associated with post-bronchodilator FEV1/FVC (Table 2). HHAT is a hedgehog acyltransferase that catalyzes N-terminal palmitoylation of sonic hedgehog (SHH). Hedgehog interacting protein (HHIP) and patched homolog 1 (PTCH1) are two other genes involved in the hedgehog signaling pathway and associated with lung function [2][3][4]21], indicating the importance of this pathway in lung development and function. Independent replication and functional study of HHAT are warranted.
Since each of these variants alone had a small effect, we performed a joint analysis of 10 confirmed candidate SNPs. This analysis explained 3.63% and 2.96% of the variance of post-bronchodilator FEV1/FVC and percent predicted FEV1, respectively (Table 4). By comparison, pack-years of cigarette smoking explained 3.04% and 2.94% of the variance of post-bronchodilator FEV1/FVC and percent predicted FEV1. A genetic score using these 10 candidate SNPs, together with age, sex, and pack-years of cigarette smoking, explained 8.59% and 5.83% of the variance of post-bronchodilator FEV1/FVC and percent predicted FEV1. In addition, joint analysis of the 10 confirmed candidate SNPs (with Z allele homozygotes removed) was performed on CT evidence of emphysema and air trapping, BODE index, COPD, CAT score, SGRQ total score, 6MWD, and exacerbations (Table 5) in all heavy smokers (GOLD stage 0-4). Statistically and clinically significant differences were shown between the two extreme genetic score groups (8-11 vs. 16-18) for emphysema, air trapping, BODE index, SGRQ total score, 6MWD, and exacerbations, indicating the potential usefulness of genetic information to distinguish clinical subgroups of heavy smokers. It will be important to evaluate the power of this model to predict decline in lung function and progression of COPD severity longitudinally in clinical settings.
In summary, rs28929474 in SERPINA1 is clearly associated with post-bronchodilator FEV1/FVC and FEV1 among heavy smokers. This study is the first to show genome-wide significant association of rs28929474 with lung function. In addition, rs28929474 is associated with COPD and COPD severity. While it is well established that rare ZZ homozygotes have severe COPD and emphysema, this study establishes that the more common heterozygous genotype at this locus (4.7% of subjects) leads to pulmonary abnormality in smokers and to COPD. Thus, in future clinical studies, this largely ignored heterozygote group should be carefully examined. A joint genetic model combined with environmental factors is associated with reduced lung function, emphysema, exacerbation, and clinical symptoms. The models should be tested in other populations, as well as longitudinally, to evaluate their potential value for predicting COPD progression and severity.

Additional file
Additional file 1: Table S1. Association Results of the Top SNPs (P < 10^-4) with Post-bronchodilator FEV1/FVC. Table S2. Association Results of the Top SNPs (P < 10^-4) with Post-bronchodilator % Predicted FEV1. Table S3. Genotype Frequency of rs28929474 in SERPINA1 Stratified by GOLD Stages. Table S4. Prediction Models for Post-bronchodilator Lung Function Using Top 10 SNPs for Post-bronchodilator % Predicted FEV1. Figure S1
The funders had no role in the study design, data collection, data analysis, decision to publish, or preparation of the manuscript.

Availability of data and materials
All data generated or analyzed during this study are included in this published article and its supplementary information files. Raw genotype data may be accessible by contacting SPIROMICS (https://www.spiromics.org).

Ethics approval and consent to participate
Participants were recruited at each center through physician referral, advertisement in clinical areas or self-referral using the SPIROMICS study website (www.spiromics.com). The research protocol was approved by the institutional review boards of all participating institutions (Wake Forest School of Medicine, Columbia University, University of California at San Francisco, University of California at Los Angeles, University of North Carolina at Chapel Hill, University of Alabama at Birmingham, University of Michigan, Johns Hopkins University School of Medicine, University of Iowa, University of Utah, Weill Cornell Medical College of Cornell University) with written informed consent from all participants.

Consent for publication
Not applicable.