Genetic modifiers of Hb E/β0 thalassemia identified by a two-stage genome-wide association study

Background Patients with Hb E/β0 thalassemia display remarkable variability in disease severity. To identify genetic modifiers influencing disease severity, we conducted a two-stage genome scan in groups of 207 mild and 305 severe unrelated patients from Thailand with Hb E/β0 thalassemia and normal α-globin genes. Methods First, we estimated and compared the allele frequencies of approximately 110,000 gene-based single nucleotide polymorphisms (SNPs) in pooled DNAs from different severity groups. The 756 SNPs that showed reproducible allelic differences at P < 0.02 by pooling were selected for individual genotyping. Results After adjustment for age, gender and geographic region, logistic regression models showed 50 SNPs significantly associated with disease severity (P < 0.05) after Bonferroni adjustment for multiple testing. Forty-one SNPs in a large LD block within the β-globin gene cluster had major alleles associated with severe disease. The most significant was bthal_bg200 (odds ratio (OR) = 5.56, P = 2.6 × 10-13). Seven SNPs in two distinct LD blocks within a region centromeric to the β-globin gene cluster that contains many olfactory receptor genes were also associated with disease severity; rs3886223 had the strongest association (OR = 3.03, P = 3.7 × 10-11). Several previously unreported SNPs were also significantly associated with disease severity. Conclusions These results suggest that there may be an additional regulatory region centromeric to the β-globin gene cluster that affects disease severity by modulating fetal hemoglobin expression.

Background β-Thalassemia is a genetic disease in which an abnormal β-globin gene results in decreased (β+ thalassemia) or completely absent (β 0 thalassemia) production of the normal β-globin chain [1]. The hemoglobin E (HbE) allele, point mutation (G A) in codon 26 (Glu Lys) of the β-globin gene, can induce alternative splicing and thus result in decreased β-globin E chains [2,3]. Homozygosity for the HbE allele results in hypochromic microcytosis and minimal anemia, but the compound heterozygous condition, HbE/β 0 thalassemia causes a surprisingly variable anemia, ranging from nearly asymptomatic states to severe anemia and transfusiondependency [4][5][6]. Hb E/β 0 thalassemia is most prevalent in and around Thailand, and with large numbers affected in other countries in Southeast Asian [7].
Several disease modifying hematological parameters have been identified, including fetal Hb (HbF), [8] red blood cell count (RBCC), Hb, mean corpuscular volume (MCV), mean corpuscular Hb (MCH), hemocrit, HbE, [9] and genetic factors contributing to the variability of some these parameters have been identified. HbF is the primary modifier of disease severity, because it can compensate for the lack of β-globin chains and HbA. There are now three known HbF QTLs, accounting for~50% of the HbF heritability, Xmn1, HBS1L-MYB, and BCL11A [10][11][12][13][14][15]. There are other QTLs that appear to impact hematological parameters, including lymphotoxin alpha and tumor necrosis factor alpha associated with monocyte chemotactic protein-1 (MCP-1), [16] and a recent GWAS found SNPs on chromosomes 5 and 10 associated with MCV, SNPs on chromosomes 1, 4, 6, and 20 associated with RBCC, and numerous SNPs in the β-globin gene cluster and in erythrocyte membrane protein band 4.1-like 2 associated with hematocrit, HgB, MCH, MCV, and RBCC [17]. These QTLs for disease severity are biologically plausible but require confirmation.
Here, we attempt to identify variants that affect severity of HbE/β 0 thalassemia. An earlier study of SNPs in five genes known to influence globin gene expression and erythropoiesis failed to find significant associations with disease severity [18]. Subsequently, we observed strong association with 45 SNPs in a large linkage disequilibrium (LD) block containing the β-globin gene [19]. This association signal was ascribed to an effect of the XmnI polymorphism. To expand the search for severity modifying variants, we conducted a two-stage (pooled genome wide genotyping followed by individual genotyping in the top signals from stage one) association study on a sample of Thai Hb E/β 0 thalassemia patients.

Subjects
Sample collection, measurement of hematological data, and categorization of patients in this study were described elsewhere [18]. Briefly, a total of 1060 patients with Hb E/β 0 -thalassemia were ascertained from geographic regions across Thailand and a scoring system based on seven clinical criteria [18] was used to classify patients as mild (score = 0-3.5), moderate (score = 4-7) or severe (score = 7.5-10). Table 1 lists the clinical criteria and describes how they contribute to the severity score. Patients with a clinically intermediate form of the disease, detectable normal adult hemoglobin (HbA), other βor α-globin mutation that known to affect disease severity such as (δβ) 0 -thalassemia or hereditary persistence of fetal hemoglobin, or α-thalassemia were excluded. Our analyses were based on the remaining patients who were classified as mild (n = 207) or severe (n = 305) disease. The study protocol was approved by the Institutional Review Board of Mahidol University and all subjects provided informed consent.

Hemoglobin Assays
Hemoglobin type and quantity were determined by automated hemoglobin cation exchange chromatography (Bio-Rad Variant, β-Thalassemia Short Program; Hercules, CA).

Stage 1: Pooled Genotyping
Initially, two regionally matched pools were created, one representing participants with severe disease and another containing those with mild disease. For the severe disease pool, samples from individuals with the highest severity scores, calculated using the scoring system, were preferentially selected. The 395 unrelated samples were divided to create one DNA pool representing cases with mild symptoms (197 DNA samples) and one pool representing severe symptom (198 DNA samples). DNA was amplified with multiple displacement amplification (MDA) technology performed by Molecular Staging, Inc. (New Haven, CT). MDA allows for uniform genome amplification that results in minimal amplification bias across the genome (data not shown). To create equimolar pools of DNA, we used a stringent protocol for DNA concentration measurement using PicoGreen™ fluorescent dye in a Fluoroskan Ascent™ fluorometer.
The pooled genotyping was conducted by Sequenom, Inc. (San Diego, CA) using the MassARRAY platform. The MassARRAY platform for SNP scoring was complemented by a homogeneous, single-tube assay method (hME™ or homogeneous MassEXTEND™) in which two oligonucleotide primers anneal to and amplify a genomic target surrounding the polymorphism. A third primer (EXTEND), complementary to the amplified target up to but not including the polymorphism, was enzymatically extended one or a few bases through the polymorphic site by a DNA polymerase and then terminated by sequence-determined incorporation of dideoxynucleotides. PCR and Mas-sEXTEND reactions for all assays were performed on each DNA pool. Organized in sets of 380 assays, two primer plates were combined to perform a PCR with each DNA pool. The PCR product plates were combined with the EXTEND primers, and the extension reaction was conducted. The obtained extension products were spotted four times onto a SpectroChip and measured with multiple rasters with a mass spectrometer. The measurements made on each chip were quantified using Sequenom's peak recognition and interpretation software. Areas under the peaks were calculated using optimized algorithms, and relative allele frequencies are estimated at 110,000 genebased SNPs. The assayed SNPs had a median spacing of 10.4 kilobases and covered approximately 99% of all known and predicted human genes.

Stage 2: Verification and Individual Genotyping
SNPs with allele frequencies suggesting significant differences (P < 0.02) across pools and were selected for verification by repeated pooled DNA analysis. This P-value threshold was chosen to limit the number of SNPs for individual genotyping and likewise the number of false positive association signals and also to provide the number of SNPs for individual genotyping specified in the study design. At this threshold, 808 SNPs showed reproducible allele frequency differences between disease severity pools and were individually genotyped using the MassARRAY™ system (Sequenom, Inc., San Diego, CA). For quality control purposes, 29 subjects were genotyped twice and one subject three times. Individuals and SNPs (N = 17) with substantial missing data (< 90%) were not considered for further analyses because these features usually indicate poor DNA quality and problems with genotype calls, respectively. Also, five X chromosomal SNPs with high heterozygosity in males (>1%) and nine SNPs showing inconsistencies in duplicated subjects or multiple Mendelian errors (>5%) were discarded for further analyses. Twenty-one other SNPs were also excluded because they had a low minor allele frequency, MAF, (<0.02 in both the case and control samples). Since all subjects in our study population have Hb E/β 0 -thalassemia and tests for Hardy-Weinberg equilibrium (HWE) in case-only populations are subject to many caveats [20], SNPs with significant deviation from HWE were not excluded. In total, 52 SNPs were dropped, leaving 756 for analysis.

Statistical Analysis
The allele frequencies of the initial set of 110,000 SNPs were compared between pools containing individuals with mild disease vs. severe disease with Z tests with variances adjusted for variation in source DNA quantitation, diluting, and pipetting, as well as PCR/mass extension and chip dispensing/mass spectrometry measurement.
The 756 individually genotyped SNPs which survived quality control were tested for association with the disease severity classification variable and also with HbF%, a quantitative measure known to affect disease severity. Associations with the binary disease severity outcome and HbF% were tested using logistic and linear regression models, respectively, adjusted for gender, age, and geographic region. Since HbF% measurements are not valid in regularly transfused patients, we only analyzed this trait in the mild disease group, who are not regularly transfused. The SNPs were modeled as the linear effect of the number of minor alleles. To correct for multiple testing, we employed a simple Bonferroni adjustment based on the 765 SNPs that survived quality control and were tested for association. We also computed a false discovery rate (FDR) for each P-value [21]. We present both the Bonferroni and FDR adjusted P-values to better illustrate the different levels of significance of the results. The FDR is an accepted measure of the significance of multiple association results, while we also state which findings pass the overly conservative Bonferroni threshold to highlight the highly significant findings. Finally, the linkage disequilibrium (LD) structure of the SNPs on chromosome 11, which included the β-globin gene cluster, was analyzed using Haploview [22].

Results
The sample of 512 genotyped individuals was 51% female and represented five geographical regions in Thailand. Participants with mild disease were significantly older than those with severe disease (P = 0.0002), and males had significantly greater odds of severe disease than females (OR = 1.62, P = 0.01). The mean disease severity score was 2.1 among participants classified as having mild disease and 8.4 in those with severe disease. The ratio of HbE to HbF increases in regularly transfused patients, and likewise we observe a higher HbE to HbF ratio in the severe disease group who require more frequent transfusions. Table 2 shows the characteristics of the study sample by disease severity status, including demographic information and hematological measures. The mean age in the severe group was lower than the mild group, mostly because severe patients are identified at younger age. The difference in severity between males and females has no known biological explanation and likely represents random variation, an issue which we correct by adjustment. It should be noted that the hematological measures in the severe group are skewed from their genetically determined values because of the frequent transfusions in that group. Thus, no further analyses of these measures were performed in the severe group.
A total of 808 of the 110,000 SNPs showed reproducible allele frequency differences (P < 0.02) between the samples of pooled DNA from mild and severe cases, and these SNPs were genotyped in individual samples. Fifty-two of the genotyped SNPs were excluded for further analysis after quality control of the genotype data. Prior to correction for multiple testing, 377 of the 756 analyzed SNPs showed evidence for association with disease severity at P < 0.05, and 252 of these had a corresponding FDR = 5%. These included SNPs in the βglobin gene cluster on chromosome 11 previously reported to be associated with HbF levels in other samples, including the Xmn1 polymorphism 158 bp upstream from the Gγ gene and the HincII site in the ψβ-globin gene. Five of these SNPs were previously reported to be associated with disease severity in this sample by by Ma et al. [19] (see table 3). We observed an association with rs4376364 HBS1L (P = 4.3 × 10 -4 ), although it was not significant after Bonferroni adjustment. Table 3 shows the allele frequencies, locations, and association results for the top 81 SNPs which had a FDR = 1%. Of these 81 SNPs, 50 remained significant after Bonferroni correction. LD analysis of the region on chromosome 11 encompassing 52 of the SNPs listed in table 3 revealed ten distinct haplotype blocks, the largest spanning 56.5 Kb and containing most of the significant SNPs (see Figure 1).
In addition to the previously reported associations with SNPs in the β-globin cluster, we identified association signals with seven SNPs in the region upstream (centromeric) from the β-globin cluster that were not in the same haplotype block. Rs3886223 was the most significantly associated SNP in this group, with the common allele contributing to increased risk of severe disease in an additive fashion (OR = 3.03, P = 2.82 × 10 -8 ).
The association with this SNP remained significant after adjustment for SNPs in the β-globin cluster, including Xmn1, suggesting that the biological variant tagged by this SNP is distinct from the one in the β-globin cluster. Xmn1 is correlated with rs3759074 at r 2 = .92 and with rs3886223 at r 2 = .46. This region contains many olfactory receptor genes and rs3886223 is less than 500 bp upstream from OR51B2.
Only one result among SNPs on other chromosomes remained significant after Bonferroni adjustment; rs6972505 (OR = 2.13, P = 7.6 × 10 -6 ) is in an intron of solute carrier family 26 member 5 (SLC26A5) located on chromosome 7. A second SNP, rs234915 in the GDPmannose 4,6-dehydratase gene (GMDS) on chromosome 6, was not significant after Bonferroni adjustment but had a significant FDR (OR = 1.92, P = 1.3 × 10 -4 ), with the major allele associated with severe disease. Several other SNPs had a FDR less than 0.01 (see Table 3). Tests for genetic association with HbF% within the mild disease group revealed nominally significant results with 14 SNPs in the β-globin gene cluster (Table 3).

Discussion
A previous study of a Hong Kong Chinese population showed HbF to be 80-90% heritable, [12] and previous studies have shown that Xmn1 accounts for approximately 13% of the variation in HbF in normal Europeans [11] and for a similar proportion in a Chinese sample heterozygous for the β-thalassemia mutation [12]. In addition, about 15% of the HbF variance was explained by a polymorphism in BCL11A on 2p15, 19% by an intergenic variant in HBS1L-MYB on 6q23, and 10% by Xmn1 in Europeans without hemoglobinopathies [10]. Assuming a maximum of 50% of the total trait variance has been explained, a substantial number of QTL contributing to the heritability of HbF remain undiscovered. By studying a disease severity phenotype we have the opportunity to identify variants that affect disease through HbF as well as other pathways. In this study, we replicated association findings from previous studies of several disease-modifying SNPs in a Thai population with Hb E/β 0 thalassemia and report associations with SNPs that may represent additional β-globin gene regulatory regions or that alter disease severity through novel pathways.
This study is the first whole genome association scan for modifiers of Hb E/β 0 thalassemia. The two-stage study design allowed for cost effective identification of a targeted set of SNPs for individual genotyping. The fact that most of the SNPs identified as having significant differences in allele frequency between severity groups in the analysis of pooled DNA samples also showed nominally significant associations with severity after individual genotyping demonstrates the utility of a twostage design.
Our study has several limitations. Since the pooled genotyping was done on a marker panel that is less dense than ones used in many contemporary genome wide association studies, we do not have the same level of genome coverage provided by other genotyping platforms. While our use of a gene-based SNP panel likely reduced the number of false negative results due to low marker density, the fact that no SNPs were typed in the BCL11A region, a well-confirmed modifier of HbF level, [11][12][13][14][15] underscores the gain in information afforded by the high density SNP arrays. In any case, the low marker density does not negate the highly significant associations we did identify, particularly those independent of the β-globin gene region. The SNPs allelotyped in the discovery phase are the set of marker assays developed at Sequenom and demonstrated to be polymorphic in Caucasians. Thus, prior to this study, the MAFs of these SNPs were unknown in the Thai population. Similar to all GWA studies of non-Caucasians using SNP chips developed based on marker allele frequencies in Caucasians, some SNPs were not polymorphic and hence a true association in this circumstance would not be detected. The observation that several SNPs within the β-globin gene cluster were more significantly associated than Xmn1 with disease severity does not imply that one of these variants rather than XmnI has a causal effect on disease severity. There is a substantial body of literature suggesting Xmn1 has a functional role in determining HbF% and F-cell number [11,12,19] and there is no such evidence for any of these other SNPs Alternately, this observations may be due to Xmn1 allele frequency differences across populations. For example, the frequency of the Xmn1 minor allele is much higher in this sample (42%) than in most other populations studied, including a sample of unaffected Hong Kong Chinese parents of offspring with β-thalassemia (17%), [12] an unselected group of 720 UK twins (33%), [10] the African American Cooperative Study of Sickle Cell Disease (7%), [23] a Brazilian sickle cell disease study sample (0%), [23] and a group of 58 Chinese β-thalassemia patients (5%) [24]. Although there are small differences in significance among SNPs across the β-globin gene cluster, all of these SNPs have similar minor allele frequencies and are highly correlated, and it is not possible to identify definitively all causal variants in the β-globin cluster by genetic association analysis. Functional studies are needed to delineate the spectrum of independent variants in the β-globin region with a causal effect on disease severity and HbF level.
The associations observed with SNPs in the olfactory receptor cluster, which is adjacent to but distinct from the β-globin cluster, are novel and may represent a new disease modifying region. The biological connection between olfactory receptors, which interact with odorant molecules in the nose to initiate a neuronal response that triggers the perception of smell, and severity of Hb E/β 0 thalassemia is not obvious. However, these SNPs (nearly all of which are located in regulatory regions), or variants strongly associated with them, may be acting as cis-regulatory elements [25] and thus down-regulate transcription of HbF or impact disease severity by altering expression of other β-globin genes. This hypothesis is consistent with the observation that at least one of these genes is expressed in human erythroid cells at all stages of development [26]. High-level transcription of the globin genes through enhancement by a remote cis element, the locus control region (LCR), may involve chromatin remodeling [27] including the region containing the olfactory receptor genes. As these findings are novel, they have not been replicated in an independent sample. Replication of these associations for either disease severity or HbF is necessary.
The findings with SNPs in SLC26A5 and GDMS are also novel and need to be replicated in independent samples. SLC26A5 acts as the primary motor molecule in cochlear outer hair cells due to its ability to undergo voltage-induced conformational change [28]. Although no pathway through which this gene might affect thalassemia severity is immediately apparent, the SLC26A5 gene footprint partially overlaps the proteasome 26S subumit gene (PSMC2), which immunoprecipitates with transcription factors for RNA polymerase II in rats, indicating a role in transcriptional regulation [29]. GDMS has functions related to embryonic development and in regulation of the immune response [30]. Finally, of note there was only one individually typed SNP in the region containing BCL11A, and it was 30 Kb from the gene. The SNP was removed due to low call rate, but it was nominally associated with HbF% in the mild group (P = 0.01).

Conclusions
These results suggest that there may be an additional regulatory region centromeric to the β-globin gene cluster that affects disease severity by modulating fetal hemoglobin expression. Also, the replication of previous findings related to HbF highlight the utility of pooled genotyping as a cost effective first step to identify a subset of SNPs for individual genotyping.