Defining the contribution of SNPs identified in asthma GWAS to clinical variables in asthmatic children

Background Asthma genome-wide association studies (GWAS) have identified several asthma susceptibility genes with confidence; however the relative contribution of these genetic variants or single nucleotide polymorphisms (SNPs) to clinical endpoints (as opposed to disease diagnosis) remains largely unknown. Thus the aim of this study was to firstly bridge this gap in knowledge and secondly investigate whether these SNPs or those that are in linkage disequilibrium are likely to be functional candidates with respect to regulation of gene expression, using reported data from the ENCODE project. Methods Eleven of the key SNPs identified in eight loci from recent asthma GWAS were evaluated for association with asthma and clinical outcomes, including percent predicted FEV1, bronchial hyperresponsiveness (BHR) to methacholine, severity defined by British Thoracic Society steps and positive response to skin prick test, using the family based association test additive model in a well characterised UK cohort consisting of 370 families with at least two asthmatic children. Results GSDMB SNP rs2305480 (Ser311Pro) was associated with asthma diagnosis (p = 8.9×10-4), BHR (p = 8.2×10-4) and severity (p = 1.5×10-4) with supporting evidence from a second GSDMB SNP rs11078927 (intronic). SNPs evaluated in IL33, IL18R1, IL1RL1, SMAD3, IL2RB, PDE4D, CRB1 and RAD50 did not show association with any phenotype tested when corrected for multiple testing. Analysis using ENCODE data provides further insight into the functional relevance of these SNPs. Conclusions Our results provide further support for the role of GSDMB SNPs in determining multiple asthma related phenotypes in childhood asthma including associations with lung function and disease severity.


Background
Asthma is a complex respiratory disease with susceptibility involving the interplay between genetic and environmental factors [1]. The chronic inflammation of the airways is characterised by reversible airflow obstruction and increased hyperresponsiveness of the airways. Genome-wide association studies (GWAS) are currently the preferred method for the identification of genetic determinants in complex multifactorial diseases such as asthma. These studies have the advantage of being able to identify polymorphisms with small effect sizes and localise smaller susceptibility regions due to linkage disequilibrium (LD). The first GWAS for asthma was completed involving 994 childhood-onset asthma patients and 1,243 non-asthma controls and identified single nucleotide polymorphisms (SNPs) within chromosome 17q21 spanning the ORMDL3/ GSDMB genes [2]. These SNPs showed highly significant association with childhood asthma and were also replicated in two independent cohorts [2]. A study in an Icelandic cohort showed significant association with asthma and a SNP in IL1RL1 [3] and SNPs in PDE4D have also shown association with genome-wide significance with asthma [4]. IL-13 and RAD50 on 5q31.1 and the HLA-DR/ DQ locus at 6p21.3 have been implicated by Li et al. [5].
Other polymorphisms spanning IL33, IL18R1, SMAD3, IL2RB and CRB1 have been proposed by other GWAS using large scale discovery and replication approaches [6,7]. However, due to the phenotype used for case definition, which is often a physician diagnosis or self-report of asthma, there remains a need to determine the contribution of these polymorphisms to clinically relevant endpoints in asthma. This will provide a greater insight into the mechanisms altered in individuals carrying the relevant risk alleles.
One problem that arises from GWAS is that LD patterns can result in multiple polymorphisms at a locus being associated with disease phenotype, even if only one of them is causal. A problem therefore exists in the identification of potential causal variants. This can be simply analysed for polymorphisms within coding regions, as they could potentially affect the polypeptide structure and function and those SNPs in regulatory or intronic regions could affect splice site motifs, transcriptional regulation or RNA stability [8,9]. However, many GWAS have implicated polymorphisms in non transcribed regions far away from annotated genes, where their role is likely to be regulatory [10]. The ENCODE project was established to identify all the functional elements such as protein binding sites, transcripts (coding and non coding) and chromatin marks in the human genome. This represents a valuable resource to provide additional meaning to association data provided from GWAS [11,12]. Taken in consideration with LD data from the 1000 Genomes project [13], computational analysis will enable assignment of more meaningful potential function to SNPs identified from GWAS or those SNPs which are in LD with a GWAS associated SNP [10], to help better prioritise SNPs for functional analysis.
The aim of the present study was to complete a SNP for SNP analysis of the key variants identified in the Caucasian population meeting genome-wide significance and/or replication identified 2007-2011 on clinical variables underlying asthma (Table 1). As these SNPs have previously shown genome-wide association, we also sought to investigate the potential functional significance of these eleven SNPs and those in LD (with these SNPs) by mining the ENCODE dataset [14] and also investigating whether they show association with expression quantitative trait loci (eQTL).

Asthma cohort characteristics
The asthma family cohort was made up of United Kingdom families from the Southampton and Nottingham areas (n = 370 families were used in this study) ( Table 2). From the Southampton area, 341 Caucasian families (n = 1508) with at least two biological siblings with physician diagnosed asthma were recruited. This cohort has been described in detail previously [15]. Baseline FEV 1 (forced expiratory volume in one second) was measured as best of three values within 5% performed using Vitalograph drywedge bellows spirometer (Vitalograph Ltd, Buckingham, UK). This was determined 14 days after respiratory tract infection or use of bronchodilator or anti-allergic medication. 46 Caucasian families (n = 184) with at least two biological siblings with physician diagnosed asthma were recruited from the Nottingham area [15]. FEV 1 was defined as the best of three values. All subjects were classified according to British Thoracic Society Step Guidelines (BTS steps, ranging from step 1 to step 5) based on physician prescribed medication [16]. Ethical approval for the Previous GWAS associations include; a chromosome 17 locus (orm1-like protein3 (ORMDL3)/gasdermin B (GSDMB)), interleukin 33 (IL33), a chromosome 2 locus (interleukin 18 receptor (IL18R)/interleukin 1 receptor like 1 (IL1RL1)), mothers against decaptentaplegic drosophila homolog 3 (SMAD3), and IL2 receptor beta (IL2RB), phosphodiestase 4D, cAMP specific (PDE4D), a chromosome 1 locus (Crumbs homolog 1 precursor (CRB1)) and RAD50, S. Cerevisiae, homolog of (RAD50). MAF minor allele frequency, *a single MAF is shown where the original paper only reported the MAF in all subjects. The p-value stated is for asthma association in the original study (referenced).
Southampton subjects was obtained from the Southampton and South West Hampshire and the Portsmouth and South East Hampshire Local Research Ethics Committees and for the Nottingham subjects, the Nottingham University Medical School Ethics Committee. Written informed consent for study participation was obtained from participants (from the parent/guardian for the children).

SNP selection and genotyping
Eleven SNPs were chosen that have been identified from asthma GWAS identified 2007-2011 (Table 1). These were in eight loci; ORMDL3/GSDMB, IL33, IL18R/ IL1RL1, SMAD3, IL2RB, PDE4D, CRB1 and RAD50. SNPs were genotyped using KASPar technology by KBiosciences (Hertfordshire, UK). For quality control of genotyping data, Chi square was used to test for any deviation between the observed genotype frequencies and the expected, under the Hardy Weinberg Equilibrium.

Association analyses
Association analysis in the family cohorts between the GWAS SNPs and asthma-related traits was conducted using the family based association test (FBAT) software (version 1.5.1) [17] in the additive model. Clinically relevant endpoints including; asthma diagnosis, Forced Expiratory Volume in one second (FEV 1 % Predicted), bronchial hyperresponsiveness to methacholine (BHR), British Thoracic Society (BTS) defined severity [18] and atopy (positive response to skin prick test to any one of a panel of allergens as outlined [15]). We completed power calculations based on the method of Risch & Merikangas [19]. There was between 0.93 and 0.98 power to detect an association with asthma with a significance level of p = 0.05 and a relative risk of 1.5. There was between 0.51 and 0.69 power to detect an association with p = 0.05 and a relative risk of 1.25 (calculations based on minor allele frequency (min 0.17 to max 0.48 observed). Power curves for these MAF are provided in Additional file 1. To address multiple testing, we used Bonferonni correction, which considered p < 9.1×10 -4 significant. This was based on our eleven genotyped SNPs analysed for five phenotypes (0.05/55 total tests = 9.1×10 -4 as a threshold p-value).

Bioinformatic analysis of genotyped SNPs
HaploReg [14] (a web based tool) was used to provide information about the sequence surrounding a SNP. It has been developed to mine chromatin state information from nine human cell lines; GM12878, H1, HMEC, HSMM, HepG2, Huvec, K562, NHEK and NHLF. The tool provides information on SNPs that are in LD with the queried SNP (r 2 was set to 1 for these analyses and the population chosen for analyses was of European descent) and also gives information on which of these SNPs lie in enhancer histone marks, DNase sites and protein binding regions. It can also inform on the conservation of the relevant sequence and any changes to regulatory motifs based on the SNP allele changes. Each of the eleven genotyped SNPs were entered into HaploReg and analysis was completed with r 2 = 1. A more detailed analysis was completed for any SNP which showed significant association in the population study. We used the program mRNA by SNP browser (version 1.0.1) to investigate whether any of the GWAS SNPs genotyped as part of this study were associated with regions of known asthma expression quantitative trait loci (eQTL). This program incorporates a database of eQTL from asthma studies and provides a graphical output to browse data from GWAS. It provides association data between 54,675 transcripts and 406,912 SNPs. Statistics are provided for those SNPs with association p < 0.001. We used the program to search for each SNP in turn by inputting either the gene name or the SNP rs number, looking to see whether the association reached genomewide significance (p < 10 -8 ) [20]. We also used the eQTL browser from the University of Chicago (http://eqtl.uchicago. edu/cgi-bin/gbrowse/eqtl/) which considers eQTLs from different tissue sources. A second stage of this analysis was to investigate whether these same SNPs showed genomewide association with eQTL from a genome-wide search in 1,111 human lung samples [21].

Clinical characteristics and genotyping
Clinical characteristics for the asthma families are shown in Table 2. Mean age was 13.3 years and 10.3 years old, The recruitment and clinical characterisation of these subjects has been extensively described elsewhere [15]. PC 20 (4 mg/ml, %) only available in Southampton families.
mean FEV 1 (% Predicted) was 95.4 ± 15.6 and 96.2 ± 14.9 and % atopy (SPT) was 73.9% and 65.1% for sibling 1 and 2 respectively. These asthmatic children had predominantly mild-moderate disease as outlined by BTS guidelines. Genotyping was completed using KASPar (KBioscience) and quality control completed as outlined in previous studies [15]. All SNPs were in Hardy Weinberg equilibrium and allele frequencies in the UK families were similar to those previously reported (Table 1).

Bioinformatic analysis of SNPs using data from ENCODE
We sought to investigate whether data collected from the ENCODE Consortium [12] could enrich the potential functional significance of our genotyped GWAS associated SNPs and also those SNPs which were in LD ( Table 5). The two GSDMB SNPs were found to be in the same LD block (r 2 = 1), with 24 other SNPs. rs2244012 (RAD50/intronic) was found to be in an LD block with 50 other SNPs. The other genotyped SNPs were in smaller LD blocks. Most of our genotyped SNPs analysed were found to have at least one SNP (in their LD block) existing in an enhancer histone mark site, DNase site or a region where protein binding consensus sequences exist, highlighting a potential functional effect of these particular SNPs. There was evidence for LD SNPs altering regulatory motifs, with 14 SNPs in the GSDMB LD block and 25 SNPs in the RAD50 LD block the highest numbers ( Table 5).
Analyses of asthma associated GSDMB SNPs using ENCODE data As the two GSDMB SNPs (rs2305480/Ser311Pro and rs11078927/intronic) showed significant association in our study, we sought to further investigate potential functional implications using the ENCODE dataset. These two GSDMB SNPs themselves were not predicted to be important from a regulatory role, however they were in a LD block with 24 other SNPs. Almost all of the SNPs in this LD block were in non-coding/intronic regions, with the exception of rs10852935, which is a synonymous polymorphism in ZPBP2 and the genotyped rs2305480/Ser311Pro (Table 6). Twenty four of these SNPs were not conserved (according to the GERP FBAT was used to test for association using the additive model [17]. SNP single nucleotide polymorphism, MAF minor allele frequency, Fam number of families in analysis (≥10 families). Asthma = Doctor diagnosed asthma, FEV 1 = FEV 1 percent predicted, BHR = PC 20 (methacholine 4 mg/ml) only available in Southampton Families, Bonferroni correction, P < 9.1×10 -4 considered significant (bold).
conservation score). Three SNPs were located in enhancer histone marks (rs12709365 and rs13380815 both ZPBP2/intronic and identified in cell line GM12878), rs4795399 (GSDMB/intronic) was identified in a HepG2 cell line and also located in a DNase site identified in a human hepatocyte cell line (Huh7.5) and a human leukaemic cell line (CMK). While no SNPs were predicted to occur in a protein binding regions, 14 SNPs were predicted to alter regulatory binding motifs ( Table 6).

Analysis of asthma eQTL data
We searched for the eleven SNPs genotyped as part of this study in turn using the mRNA by SNP browser resource, looking to see whether the SNP was associated with the effect meeting genome-wide significance (p < 10 -8 ). Information for three of the SNPs (rs230548/GSDMB, rs2886098/CRB1 and rs2244012/RAD50) was not covered by the database and there was no information for SNPs in LD with these. Another two SNPs were also not covered by the database, however there was information for a SNP in high LD. For rs11078927/GSDMB, rs10445308 was in LD (r 2 = 1.0) and this SNP showed genome-wide significance with multiple ORMDL3 probes (all C-allele); 223259_at (p = 4.4×10 -13 , LOD = 11.39, H 2 = 14.75) and probe 235136_at (p = 8.0×10 -12 , LOD = 10.16, H 2 = 13.3) with average ORMDL3 (p = 3.9×10 -22 , LOD = 20.32, H 2 = 26.36). For rs3771166/IL18R1, rs54988956 was in LD (r 2 = 0.96), but this was not associated with genome-wide significance with any probe. For the other six SNPs, there was no genome-wide significance. Using the University of Chicago eQTL resource, data was available for the two GSDMB SNPs (rs2305480 and rs11078927) which provided some evidence that these were eQTLs and exon-QTLs in lymphoblastoid cell lines, monocytes, liver, brain and T-cells. More specifically, both rs2305480 and rs11078927 were eQTL for IKZF3, ORMDL3 and an exon-QTL for KRT222P, NR1D1 and ORMDL3. Only three of our eleven genotyped SNPs (rs2305480/GSDMB, rs3939286/IL33 and rs744910/SMAD3) were included in the genotyping platform for the genome-wide search for eQTL in 1,111 human lung samples [21]. None of these SNPs were found in the reported data meeting 10% False Discovery Rate (FDR). As a further analysis we also searched this resource for any of the SNPs in high LD (r 2 = 1) with our genotyped SNPs. rs1337167/CRB1 met the 10% FDR. This intron SNP was predicted to alter an AP-1 transcription factor binding site according to HaploReg.

Discussion
GWAS have been successful in identifying susceptibility genes for asthma through the use of large scale population cohorts and replication approaches [2][3][4][5][6][7]. However, there remains a need to determine the contribution of these polymorphisms to clinically relevant endpoints in asthma as opposed to disease diagnosis. We adopted a two step process where we selected SNPs from key genes in the Caucasian population which have met genomewide significance and/or with replication, namely eleven SNPs in the following genes; ORMDL3/GSDMB, IL33, IL18R/IL1RL1, SMAD3, IL2RB, PDE4D, CRB1 and RAD50. These SNPs were genotyped in a UK asthma family cohort with at least two affected siblings, to test for association with clinically relevant endpoints in asthma. Our data provide support for the original GSDMB association; however SNPs in the other genes did not show association after correction for multiple testing. Secondly, as these SNPs have previously shown association at a genome-wide level, we also investigated the potential for these SNPs and FBAT was used to test for association using the additive model [17]. SNP single nucleotide polymorphism, MAF minor allele frequency, Fam number of families in analysis (≥10 families). Severity = BTS classification (1-5) [18], SPT = positive skin prick test to one or more allergen. Bonferroni correction, P < 9.1×10 -4 considered significant (bold).  The 11 SNPs genotyped as part of this study were entered into the HaploReg tool to investigate whether SNPs in the same LD block were likely to affect chromatin structure and regulatory element binding motifs.
those in LD to affect chromatin states and alter regulatory motifs/binding sites to try to assign potential function to these associations. This analysis used recently reported data from the ENCODE project. These data support the hypothesis that SNPs associated in GWAS may be tagging SNPs for the actual causative variant, but also provide tentative evidence that these genotyped SNPs may themselves play a functional role e.g. GSDMB rs2305480 (Ser311Pro). GWAS have the advantage over linkage studies in that they can identify polymorphisms with small effect sizes and localise smaller susceptibility loci as LD only generally spans <500 kb [22]. Since the identification of childhood asthma associated variants on chromosome 17q21 [2], many other GWAS have been completed in different populations implicating variants spanning different genes (see above). One problem with these studies lies in defining the contribution of these associated variants to clinical endpoints in asthma. To this end we have completed a SNP for SNP analysis of key variants identified in the Caucasian population meeting genome-wide significance or where there was replication, focussing on GWAS from 2007-2011 (Table 1). We utilised a well characterised UK family based asthma cohort [15] to test for association with asthma diagnosis and the following clinical endpoints: FEV 1 (% predicted), BHR (to methacholine), BTS defined severity and positive SPT to one or more allergens, see [15]. These data show significant association for the two GSDMB SNPs; rs2305480 (Ser311Pro) and rs11078927 (intron) across more than one phenotype, with other phenotypes showing association which does not reach significance The two GSDMB SNPs genotyped as part of this study (rs2305480 and rs110788927) were part of a wider LD block containing 26 SNPs in total spanning 46,568 bp. These SNPs were analysed using HaploReg to see if they atlered any binding sites for regulatory elements or other proteins and also whether they affected chromatin states and their conservation according to GERP. The CEU frequency from 1000 Genomes Project =0.48 for all SNPs. CMK: human leukaemic cell line; GERP genomic evolutionary rate profiling score; GM12878: lymphoblastoid cell line; GSDMB gasdermin-domain containing protein family B; HepG2: human hepatocellular liver carcinoma cell line; Huh7.5: human hepatocellular cell line; ZPBP2 zona pellucid binding protein 2 gene. SNPs genotyped in the current study are shown in bold.
after correction for multiple testing. These data both replicate and extend previous findings [2,6] and identify this chromosome 17 locus as containing genetic determinants underlying multiple clinical features of asthma, including FEV 1 (% Predicted) and disease severity defined by BTS. These two polymorphisms are in high LD suggesting these associations represent a single locus. The BHR association (rs2305480) is supported by different SNPs in this gene also showing association with this phenotype, albeit in different ethnic populations [23,24]. Interestingly, a recent study has found association between rs2305480 and wheezing phenotypes in asthmatic children, but did not find association with intermediate phenotypes such as atopy or lung function [25]. When interpreting the results of GWAS, the associated variant could be tagging another variant in a larger region of LD [26]. This is particularly important as it is common that only a few variants in a haplotype block are present on genotyping platforms. Methods such as imputation and using genotyping data from the 1000 Genomes project can help to further characterise these patterns [27]. However, once these variants are identified it is still necessary to assign potential function and with many associated variants from GWAS residing in nontranscribed regions, they are likely to play a regulatory role in gene expression. We used HaploReg [14] to screen chromatin data, conservation data and regulatory motif databases which was obtained from the ENCODE project to see if any biological plausible role could be assigned to the SNPs genotyped in this study and those SNPs which were in LD with r 2 = 1. Our results show that these SNPs in LD can potentially lead to changes in regulatory element binding motifs affecting the transcription of genes positively or negatively and there is also the suggestion that alterations in enhancer histone marks or DNase sites could be affected. The nonsynonymous SNP (Ser 311 Pro) is predicted to be deleterious to protein structure (Polyphen) in Gasdermin B suggesting a potential functional mechanism altering protein function. This may be occurring at the mRNA level (altering the efficiency of splicing) as this polymorphism is predicted to be located in an exonic splicing enchancer site (FuncPred). Both GSDMB SNPs are in high LD with several SNPs spanning ZPBPZ and GSDMB which result in alterations in enhancer and transcription factor binding sites. However the rs2305480 and rs11078927 SNPs themselves are not predicted to have any regulatory role (ENCODE data). Gasderminfamily proteins have been implicated in TGFβ1 signalling and epithelial cell apoptosis [28], with increased airway cell apoptosis reported in severe asthma [29]. The extended chromosome 17 locus has also been implicated previously as containing SNPs associated with asthma spanning three main genes; ZPBP2/GSDMB/ORMDL3 and it is still unclear which genes/polymorphisms underlie these associations. The finding that the GSDMB SNPs including a non-synonymous SNP are particularly relevant to clinical endpoints including lung function, BHR and severity in our childhood asthma cohort is interesting as few recent studies have genotyped this GWAS SNP or investigated clinical outcomes with the same criteria such as BTS used in this study [24,30]. However, a recent study has investigated the effect of rs7216389 (which is in linkage disequilibrium with both GSDMB SNPs in this study) with respect to severe asthma subjects classified with poorly controlled disease whilst taking high doses of inhaled corticosteroids, long-acting bronchodilators and short acting β 2 agonists and showed association [31]. Similarly, while not meeting genome-wide association significance, we have previously provided evidence that rs2305480 is associated with severe asthma (p = 5.5×10 -5 , [32]), although we do acknowledge that the criteria for BTS severity is defined in part by lung function and so this association with severe asthma may be related to a synergistic effect. These data support previous studies that identified the chromosome 17 locus as containing markers of childhood onset disease [2,6,24,30], BHR [24] and elevated total serum IgE [24] in diverse ethnic populations.
Interestingly, no other SNP survived correction for multiple testing for any phenotype analysed, including asthma diagnosis and there was no indication of any other SNP showing a trend towards significance. CRB1 rs2786098 (p = 0.089 in the current study) has shown replicated association in multiple childhood cohorts, but did not reach statistical significance in our study. Perhaps this lack of association may be due to power (MAF 0.22, see earlier). One potential explanation for the lack of association for asthma and related phenotypes for the majority of SNPs is that many of the SNPs/genes were identified in adult asthma populations and the relative contribution to childhood asthma remains unclear at this time. However, in the results not meeting conventional correction for multiple testing, IL33 SNP rs1342326G (25.7 kb from gene transcription start site (TSS)) showed a level of association with atopic asthma defined by positive skin prick test and clinical diagnosis (p = 0.0097, Table 4). These data are in line with the accumulating evidence suggesting a role of IL33 in allergic mechanisms in asthma (Reviewed in [33]) although we cannot exclude that this is a false positive. Similarly, while not surviving correction, suggestive evidence for SNP associations for CRB1/BHR, IL18R/severity and IL33/SPT were apparent (p < 0.05), which may represent true associations.
Studies have shown the potential for GWAS SNPs to be tagging SNPs associated with eQTL [10]. Our analyses did provide some support for this particularly for the two GSDMB SNPs (rs2305480 and rs11078927) when eQTL analyses was completed across multiple cell types and tissues. These eQTL analyses identified regulation of multiple genes in the region, demonstrating the need for further work to define the biology underlying these genetic signals.
It is important to note the limitations of our study, while a SNP for SNP approach focussed to the pivotal SNPs from the GWAS has strengths to directly replicate and extend previous findings we acknowledge the assumption that the linkage disequilibrium pattern between the discovery Caucasian population and our UK population will be the same, which may at least in part explain the inability to detect associations with all variants. Similarly, we did not genotype all GWA significant SNPs identified in each of the regions. We acknowledge the reduced power of the current study using 370 families is a limitation potentially explaining while several SNPs showed nominal significance and did not meet stringent statistical criteria. A cohort of 602 families would be required to provide a power of 0.99 for the lowest MAF observed in the current study (relative risk of 1.5 and significance level of 0.05). We also acknowledge that the design of this study compared to discovery cohorts using case/control design also has reduced power, however we consider the use of an alternative association design i.e. family based to those previously published a strength. Furthermore the lack of assessing gene-environment interactions in GWAS and the current study may also account for not replicating previous observed effects due to differential environmental exposures between study populations. Finally, the potential for winners curse bias is also possible for direct replication of e.g. association with asthma diagnosis.

Conclusions
In summary, this study has provided a greater insight into the relative contribution of asthma GWAS SNPs to clinically relevant endpoints in asthmatic children and further defines the diverse role of GSDMB SNPs particularly in lung function and disease severity.