- Research article
Genetic association of fetal-hemoglobin levels in individuals with sickle cell disease in Tanzania maps to conserved regulatory elements within the MYB core enhancer
© Mtatiro et al.; licensee BioMed Central. 2015
- Received: 21 January 2015
- Accepted: 23 January 2015
- Published: 10 February 2015
Common genetic variants residing near upstream regulatory elements for MYB, the gene encoding transcription factor cMYB, promote the persistence of fetal hemoglobin (HbF) into adulthood. While they have no consequences in healthy individuals, high HbF levels have major clinical benefits in patients with sickle cell disease (SCD) or β thalassemia. Here, we present our detailed investigation of HBS1L-MYB intergenic polymorphism block 2 (HMIP-2), the central component of the complex quantitative-trait locus upstream of MYB, in 1,022 individuals with SCD in Tanzania.
We studied 1,022 individuals with HbSS or HbS/β0 thalassemia in Tanzania. For a detailed analysis of HMIP-2, we performed targeted genotyping of 10 SNPs and extracted genotypes for an additional 528 SNPs from a genome-wide scan of the same population. Using MACH, we imputed 54 further SNPs situated within HMIP-2 from existing YRI data of the 1000 Genomes project.
Seven HbF-increasing, low-frequency variants (β > 0.3, p < 10−5, f ≤ 0.05) were located in two partially-independent sub-loci, HMIP-2A and HMIP-2B. The spectrum of haplotypes carrying such alleles was diverse when compared to European and West African reference populations: we detected one such haplotype at sub-locus HMIP-2A, two at HMIP-2B, and a fourth including high-HbF alleles at both sub-loci (‘Eurasian’ haplotype clade). In the region of HMIP-2A a putative functional variant (a 3-bp indel) has been described previously, but no such candidate causative variant exists at HMIP-2B. Extending our dataset through imputation with 1000 Genomes whole-genome-sequence data, we have mapped peak association at HMIP-2B to an 11-kb region around rs9494145 and rs9483788, flanked by two conserved regulatory elements for MYB.
Studies in populations from the African continent provide distinct opportunities for mapping disease-modifying genetic loci, especially for conditions that are highly prevalent there, such as SCD. Population-genetic characteristics of our cohort, such as ethnic diversity and the predominance of shorter, African-type haplotypes, can add to the power of such studies.
- Sickle Cell Disease
- Sickle Cell Disease Patient
- Trait Association
- Fetal Hemoglobin
Sickle cell disease (SCD) is a hemoglobin disorder caused by the Glu6Val mutation in the β chain of adult hemoglobin. The resulting hemoglobin variant, HbS, is prone to polymerization, disrupting red blood cell shape, function and life span. SCD is prevalent in Sub-Saharan Africa, where it is a significant contributor to childhood mortality . In Tanzania, 8,000-11,000 affected children are born annually . The most common and severe forms of the disease are due to homozygosity for the mutation (HbSS) or compound heterozygosity with β0 thalassemia (HbS/β0 thalassemia). Where newborn screening and prophylactic penicillin are available, childhood mortality due to SCD is significantly reduced, but patients nevertheless remain at risk of chronic complications and premature death. The disease is milder in patients who carry significant amounts of fetal hemoglobin (HbF) in their circulating red blood cells . As in healthy populations, HbF persistence in patients with SCD is partially genetically controlled, and three HbF quantitative-trait loci (QTLs) - HBG2 [4,5], BCL11A [6,7] and HBS1L-MYB  - have been identified. Knowledge of the genetic factors underlying HbF persistence is helping to interpret the clinical variability of SCD and has led to the identification of novel molecular targets for the therapeutic reactivation of HbF.
HBS1L-MYB is unique among the HbF modifier loci because it has marked pleiotropic effects, i.e., in healthy individuals it affects general hematological parameters  as well as HbF. It has been postulated that changes in HbF levels caused by this locus are secondary to altered kinetics of erythropoiesis . The locus consists of several linkage disequilibrium (LD) blocks of common variants, which affect erythroid traits independently . The most effective of these, termed HMIP-2 (HBS1L-MYB intergenic polymorphism, block 2), has been shown to influence disease severity in patients with SCD  and β thalassemia [8,12]. HMIP-2 variants reside within the core enhancer for MYB , a key hematopoietic regulator gene . HMIP-2 is divided further into sub-loci HMIP-2A and -2B, which provide independent HbF association in African populations, including SCD patients [11,15-18]. A 3-bp deletion (rs66650371) at HMIP-2A is suspected to directly cause HbF variability , but is independent of the trait association seen at HMIP-2B. Therefore, causative variants acting at HMIP-2B are still to be discovered.
To better define the HbF association signal at HMIP-2B, and to identify candidate variants for trait causation, we dissected HMIP-2 and its effect on HbF persistence in a large SCD patient cohort from Tanzania. The Tanzanian population is well-suited to genetic fine-mapping studies, with a marked ethnic diversity [20,21] and the increased mapping resolution that is characteristic of African chromosomes [22,23].
Study subjects, sample collection and phenotyping
Only patients with HbSS or HbS/β0 thalassemia genotype were included in this study. Enrollment of patients, diagnosis and confirmation of sickle phenotypes as well as the quantification of hemoglobin subtypes were performed as previously described . Informed consent was obtained for each patient and ethical approval given by the Muhimbili University Research and Publications Committee (MU/RP/AEC/VOLX1/33). During follow-up clinics, a 2-ml blood sample was collected from non-transfused SCD patients (confirmed HbSS genotype) who were not on hydroxyurea treatment. This study includes 1,022 individuals with HbF measured (by HPLC, Variant I, Biorad, Hercules, CA, USA) at the age of 5 years or older. The median age of the SCD population is 11 years; males and females are represented equally. HbF values vary significantly, with a median of 5.4% (of total hemoglobin).
DNA was extracted from archived buffy coat using the Nucleon BACC II system (GE Healthcare, Little Chalfont, UK). Genotypes for 528 regional SNPs were extracted from a genome-wide SNP set generated at the Wellcome Trust Sanger Institute on the Human Omnichip 2.5 platform (Illumina, La Jolla, CA, USA), as described elsewhere . Targeted genotyping was performed, adding ten markers with known trait association: rs9376090, rs9399137, rs9402686, rs9389269 and rs9494142 by TaqMan procedure , rs9389268 and rs9376091 by PCR product sequencing (amplification and sequencing: F: 5’-TGCTTCTGGCAGTGAATTAACCTTGT-3’, R: 5’-AGTTTGGTGCCAAAGGTAGCAGAT-3’), indels rs66650371 and rs11321816 by multiplex PCR fragment sizing (F1: 5’-GTTTGATGTTGCAGAAGAACAAAGC-3’ R1: 5’-VIC-TAAGTGTCTTCTGAGGGAACC-3’, F2: 5’-FAM-TCACCTTAAAAGGCGGTATTG-3’, R2: 5’-GTTT-AAGCACTTTGGCAAGCAT-3’) and rs35786788 by SNaPshot procedure (F:5’-FAM-TCACCTTAAAAGGCGGTATTG-3’, R:5’-GTTT-AAGCACTTTGGCAAGCAT-3’, extension: 5’-ACTATATCTGTGCACAGAAATACAG-3’). All assays were performed under supplier-recommended conditions (Applied Biosystems, Foster City, CA, US), including the fragment sizing, which used the Taq Gold (Applied Biosystems) microsatellite genotyping protocol. Fragment sizes and SNaPshot products were evaluated by capillary electrophoresis (3130 Genetic analyzer, Applied Biosystems), with subsequent allele scoring using GeneMarker v1.95 (SoftGenetics, State College, PA, USA). Marker quality control consisted of Hardy-Weinberg equilibrium testing and call rate evaluation (cut off >80%). Imputation with MACH 1.0 [26,27] was used to fill in missing genotypes.
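The marker quality control described above can be illustrated with a minimal sketch. It assumes genotypes coded as 0/1/2 copies of one allele with `None` for a missing call, and it uses a simple chi-square Hardy-Weinberg test with one degree of freedom. The call-rate cut-off (>80%) is taken from the text; the Hardy-Weinberg p-value threshold `min_hwe_p` and all function names are hypothetical, since the article does not state them.

```python
import math

def call_rate(genotypes):
    """Fraction of samples with a non-missing genotype (None = missing)."""
    called = sum(g is not None for g in genotypes)
    return called / len(genotypes)

def hwe_chi2_p(genotypes):
    """Chi-square (1 d.f.) Hardy-Weinberg test on genotypes coded 0/1/2."""
    obs = [sum(1 for g in genotypes if g == k) for k in (0, 1, 2)]
    n = sum(obs)
    p = (2 * obs[2] + obs[1]) / (2 * n)  # frequency of the counted allele
    q = 1 - p
    exp = [n * q * q, 2 * n * p * q, n * p * p]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(obs, exp) if e > 0)
    # Upper-tail probability of a chi-square variable with 1 d.f.
    return math.erfc(math.sqrt(chi2 / 2))

def passes_qc(genotypes, min_call_rate=0.80, min_hwe_p=1e-4):
    """Call rate above the cut-off and no strong Hardy-Weinberg deviation.

    min_hwe_p is an assumed threshold, not a value from the article.
    """
    if call_rate(genotypes) <= min_call_rate:
        return False
    return hwe_chi2_p(genotypes) >= min_hwe_p
```

A marker with genotype counts 25/50/25 passes both checks, whereas a marker typed as heterozygous in every sample fails the Hardy-Weinberg test.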
Phased variant call files from the 1000 Genomes project  for the YRI population sample were accessed on 24/4/2013 (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/ALL.chr6.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz) using the ‘Data Slicer’ tool at http://browser.1000genomes.org. Haplotype files were derived, purged of non-informative variants (monomorphic and singletons) and used to impute 54 non-genotyped variants in the target area, using MACH 1.0.16 [29,30].
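The purging of non-informative variants from the reference haplotypes can be sketched as follows. This is an illustration only, assuming biallelic variants stored as lists of 0/1 alleles across the reference chromosomes; the data layout, the function name and the variant identifiers in the usage note are hypothetical and are not MACH input formats.

```python
def informative_variants(haplotypes):
    """Keep variants whose minor allele occurs on at least two chromosomes.

    haplotypes: dict mapping a variant ID to a list of 0/1 alleles,
    one entry per phased reference chromosome.
    """
    kept = {}
    for variant_id, alleles in haplotypes.items():
        alt_count = sum(alleles)
        minor_count = min(alt_count, len(alleles) - alt_count)
        # minor_count 0 = monomorphic, 1 = singleton; both are dropped
        if minor_count >= 2:
            kept[variant_id] = alleles
    return kept
```

For example, with `{"var_mono": [0, 0, 0, 0], "var_single": [0, 1, 0, 0], "var_common": [0, 1, 1, 0]}` only `var_common` is retained.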
GWAS data were processed with the PLINK software package (http://pngu.mgh.harvard.edu/purcell/plink/). Tests for genetic association with ln[%HbF], including conditional analysis, were performed with STATA v12 (Stata Corp, College Station, TX) using multiple linear regression with age and sex as covariates. Haplotype relative effects were estimated using multifactor ANOVA in R (http://www.r-project.org/), including age and sex as covariates and correcting for pair-wise comparisons using Tukey’s method.
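The regression model behind the association test can be written out as a short sketch. The analysis itself was run in STATA; the code below is not that implementation but a plain ordinary-least-squares fit of ln[%HbF] on an additive genotype (0/1/2 allele copies), age and sex, solved through the normal equations. The function names and the additive coding are assumptions made for illustration, and standard errors and p-values are omitted. Conditional analysis corresponds to passing the conditioning SNPs as extra covariates.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_additive_model(hbf_percent, genotype, age, sex, extra_covariates=()):
    """OLS fit of ln(%HbF) ~ intercept + genotype + age + sex (+ covariates).

    Returns [intercept, beta_genotype, beta_age, beta_sex, ...].
    extra_covariates: iterable of additional columns, e.g. conditioning SNPs.
    """
    y = [math.log(v) for v in hbf_percent]
    cols = [[1.0] * len(y), genotype, age, sex, *extra_covariates]
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    return solve(xtx, xty)
```

The second element of the returned list is the per-allele effect on ln[%HbF], which is the quantity reported as β in the abstract.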
Genetic association with HbF in the HBS1L-MYB intergenic region
Association of HMIP-2 variants with fetal-hemoglobin levels (ln[HbF%]) in Tanzanian patients with SCA
[Table body not recovered. Surviving fragments: a ‘Chr 6 position’ column, allele changes per variant (T → C, G → A, C → T, G → T, A → G, C → A, I → D), and four p-values (4.06 x 10−8, 3.65 x 10−7, 5.07 x 10−8, 4.97 x 10−7) whose variant assignment is lost.]
African-specific trait association at HMIP-2B
To dissect the haplotype architecture underlying the trait association pattern at HMIP-2, we phase-aligned genotypes for the seven strongly-associated markers. To relate our data to findings in other populations, we also included rs9376090, which tags European and Asian high-HbF alleles, and rs4895441, which is part of the HMIP-2B sub-locus in other SCD patient populations.
Relative effects of HMIP-2 haplotypes on the ln[HbF%] trait
[Table body not recovered. Pairwise haplotype comparisons listed: a-B1, a-B2, a-B3, A-b and A-B, each vs. a-b; a-B1 vs. a-B2; a-B1 vs. a-B3; a-B2 vs. a-B3; a-B1, a-B2 and a-B3, each vs. A-b; A-B vs. A-b; a-B1, a-B2 and a-B3, each vs. A-B. One value survives, 7 x 10−3, following the a-B2 vs. a-b row.]
In a trait-association analysis of SNPs across the HBS1L-MYB intergenic region on chromosome 6q24.3 in SCD patients from Tanzania, we detected significant association with HbF levels at HMIP-2, a globally-prevalent HbF QTL [7-9,11,15,17,19,31-35] residing within the MYB enhancer region . Some of these variants have also been associated with white blood cell counts, mean cell volume and mean cell hemoglobin in our population . Our interest was focused on sub-locus HMIP-2B, where a causative variant has not yet been identified. After excluding patients with longer, ‘Eurasian’-type  high-HbF haplotypes and including imputed variants from the YRI (Yoruba, 1000 Genomes sequence data) population, we determined the most likely map location of HMIP-2B as an 11-kb segment including the enhancer core element −71 and the interval between elements −71 and −63 (Figure 2), where peak association (rs9494145, rs9483788) was detected.
The two HbF-boosting haplotypes underlying this association peak, ‘a-B2’ and ‘a-B3’, share rs9494145-C and rs9483788-C (Figure 3). ‘a-B2’, which contains all four HbF-boosting alleles (rs9389269-C, rs9402686-A, rs9494145-C and rs9483788-C), has the stronger effect of the two. This means that none of the four strongly trait-associated SNPs detected at HMIP-2B in Tanzanian patients appears to fulfill the conditions for being the single causative variant, i.e. being both necessary to show a significant effect and sufficient to produce the maximum genetic effect originating from this sub-locus. Thus, additional variants, not present in the 1000 Genomes dataset, might contribute to trait variability.
Long, ‘Eurasian-type’ (with high-HbF associated alleles across all of HMIP-2 ), high-HbF haplotypes were present in the patient cohort at a low frequency. These haplotypes are tagged by the ancestry-informative allele rs9376090-C (Figure 3). 24% of individuals with such haplotypes reported Arabic parental ethnicity, compared to 2% in the general cohort. The high HbF levels we observed in such patients (a median of 9.3% in ‘A-B’/’a-b’ heterozygotes, Figure 4) are likely due to the presence of the 3-bp deleted allele at HMIP-2A and possibly another functional allele at HMIP-2B. Population stratification might also contribute to higher levels of HbF: Arab/Indian sickle mutation haplotypes on chromosome 11 are known to result in milder disease and high HbF levels .
We also observed a residual association after conditioning on HMIP-2A and HMIP-2B. We suspect that the variants involved are part of a group of linked SNPs that overlaps the physical location of HMIP-1, an HbF QTL detected upstream of HMIP-2 (Figure 1A) in the European population . However, we do not have sufficient statistical power to investigate this further with the present dataset.
We have localized HMIP-2B, a QTL for fetal-hemoglobin persistence, to an 11-kb region within the core enhancer for MYB. So far, we have not identified a likely functional variant within or at this locus. Further studies will involve extended sequence analysis in groups of patients carrying a-B2 and a-B3 haplotypes.
The authors thank the patients and staff of Muhimbili National Hospital, Muhimbili University of Health and Allied Sciences, Tanzania, Hematology Outpatient Unit and staff of King’s College Hospital, London, and members of Professor Thein’s Molecular Hematology group, King’s College London.
This work was supported by the Wellcome Trust (grant nos. 095009, 093727, 080025 and 084538) and a Commonwealth Split-site fellowship (TZCN-2012-361).
The sponsors of this study are nonprofit organizations that support science in general. They had no role in gathering, analyzing, or interpreting the data.
- Weatherall D, Akinyanju O, Fucharoen S, Olivieri N, Musgrove P. Inherited disorders of hemoglobin. In: Jamison DT, Breman JG, Measham AR, Alleyne G, Claeson M, Evans DB, Jha P, Mills A, Musgrove P, editors. Disease control priorities in developing countries. 2nd ed. Washington (DC): World Bank; 2006. p. 663–80.
- Piel FB, Patil AP, Howes RE, Nyangiri OA, Gething PW, Dewi M, et al. Global epidemiology of sickle haemoglobin in neonates: a contemporary geostatistical model-based map and population estimates. Lancet. 2013;381(9861):142–51.
- Platt OS, Brambilla DJ, Rosse WF, Milner PF, Castro O, Steinberg MH, et al. Mortality in sickle cell disease. Life expectancy and risk factors for early death. N Engl J Med. 1994;330(23):1639–44.
- Gilman JG, Huisman TH. DNA sequence variation associated with elevated fetal G gamma globin production. Blood. 1985;66(4):783–7.
- Sampietro M, Thein SL, Contreras M, Pazmany L. Variation of HbF and F-cell number with the G-gamma Xmn I (C-T) polymorphism in normal individuals. Blood. 1992;79(3):832–3.
- Menzel S, Garner C, Gut I, Matsuda F, Yamaguchi M, Heath S, et al. A QTL influencing F cell production maps to a gene encoding a zinc-finger protein on chromosome 2p15. Nat Genet. 2007;39(10):1197–9.
- Uda M, Galanello R, Sanna S, Lettre G, Sankaran VG, Chen W, et al. Genome-wide association study shows BCL11A associated with persistent fetal hemoglobin and amelioration of the phenotype of beta-thalassemia. Proc Natl Acad Sci U S A. 2008;105(5):1620–5.
- Thein SL, Menzel S, Peng X, Best S, Jiang J, Close J, et al. Intergenic variants of HBS1L-MYB are responsible for a major quantitative trait locus on chromosome 6q23 influencing fetal hemoglobin levels in adults. Proc Natl Acad Sci U S A. 2007;104(27):11346–51.
- Menzel S, Jiang J, Silver N, Gallagher J, Cunningham J, Surdulescu G, et al. The HBS1L-MYB intergenic region on chromosome 6q23.3 influences erythrocyte, platelet, and monocyte counts in humans. Blood. 2007;110(10):3624–6.
- Thein SL, Menzel S, Lathrop M, Garner C. Control of fetal hemoglobin: new insights emerging from genomics and clinical implications. Hum Mol Genet. 2009;18(R2):R216–23.
- Lettre G, Sankaran VG, Bezerra MA, Araujo AS, Uda M, Sanna S, et al. DNA polymorphisms at the BCL11A, HBS1L-MYB, and beta-globin loci associate with fetal hemoglobin levels and pain crises in sickle cell disease. Proc Natl Acad Sci U S A. 2008;105(33):11869–74.
- Galanello R, Sanna S, Perseu L, Sollaino MC, Satta S, Lai ME, et al. Amelioration of Sardinian beta0 thalassemia by genetic modifiers. Blood. 2009;114(18):3935–7.
- Stadhouders R, Aktuna S, Thongjuea S, Aghajanirefah A, Pourfarzad F, van Ijcken W, et al. HBS1L-MYB intergenic variants modulate fetal hemoglobin via long-range MYB enhancers. J Clin Invest. 2014;124(4):1699–710.
- Mucenski ML, McLain K, Kier AB, Swerdlow SH, Schreiner CM, Miller TA, et al. A functional c-myb gene is required for normal murine fetal hepatic hematopoiesis. Cell. 1991;65(4):677–89.
- Makani J, Menzel S, Nkya S, Cox SE, Drasar E, Soka D, et al. Genetics of fetal hemoglobin in Tanzanian and British patients with sickle cell anemia. Blood. 2011;117(4):1390–2.
- Creary LE, Ulug P, Menzel S, McKenzie CA, Hanchard NA, Taylor V, et al. Genetic variation on chromosome 6 influences F cell levels in healthy individuals of African descent and HbF levels in sickle cell patients. PLoS One. 2009;4(1):e4218.
- Galarneau G, Palmer CD, Sankaran VG, Orkin SH, Hirschhorn JN, Lettre G. Fine-mapping at three loci known to affect fetal hemoglobin levels explains additional genetic variation. Nat Genet. 2010;42(12):1049–51.
- Menzel S, Rooks H, Zelenika D, Mtatiro SN, Gnanakulasekaran A, Drasar E, et al. Global Genetic Architecture of an Erythroid Quantitative Trait Locus, HMIP-2. Ann Hum Genet. 2014;78(6):434–51.
- Farrell JJ, Sherva RM, Chen ZY, Luo HY, Chu BF, Ha SY, et al. A 3-bp deletion in the HBS1L-MYB intergenic region on chromosome 6q23 is associated with HbF expression. Blood. 2011;117(18):4935–45.
- Muzale HRT, Rugemalira JM. Researching and documenting the languages of Tanzania. Lang Do Conserv. 2008;2:68–108.
- Tishkoff SA, Reed FA, Friedlaender FR, Ehret C, Ranciaro A, Froment A, et al. The genetic structure and history of Africans and African Americans. Science. 2009;324(5930):1035–44.
- Gabriel SB, Schaffner SF, Nguyen H, Moore JM, Roy J, Blumenstiel B, et al. The structure of haplotype blocks in the human genome. Science. 2002;296(5576):2225–9.
- Jorde LB, Watkins WS, Bamshad MJ. Population genomics: a bridge from evolutionary history to genetic medicine. Hum Mol Genet. 2001;10(20):2199–207.
- Makani J, Cox SE, Soka D, Komba AN, Oruo J, Mwamtemi H, et al. Mortality in sickle cell anemia in Africa: a prospective cohort study in Tanzania. PLoS One. 2011;6(2):e14699.
- Mtatiro SN, Singh T, Rooks H, Mgaya J, Mariki H, Soka D, et al. Genome wide association study of fetal hemoglobin in sickle cell anemia in Tanzania. PLoS One. 2014;9(11):e111464.
- Howie BN, Donnelly P, Marchini J. A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet. 2009;5(6):e1000529.
- Tekola Ayele F, Hailu E, Finan C, Aseffa A, Davey G, Newport MJ, et al. Prediction of HLA class II alleles using SNPs in an African population. PLoS One. 2012;7(6):e40206.
- Abecasis GR, Altshuler D, Auton A, Brooks LD, Durbin RM, Gibbs RA, et al. A map of human genome variation from population-scale sequencing. Nature. 2010;467(7319):1061–73.
- Li Y, Willer C, Sanna S, Abecasis G. Genotype imputation. Annu Rev Genomics Hum Genet. 2009;10:387–406.
- Li Y, Willer CJ, Ding J, Scheet P, Abecasis GR. MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes. Genet Epidemiol. 2010;34(8):816–34.
- Solovieff N, Milton JN, Hartley SW, Sherva R, Sebastiani P, Dworkis DA, et al. Fetal hemoglobin in sickle cell anemia: genome-wide association studies suggest a regulatory region in the 5’ olfactory receptor gene cluster. Blood. 2010;115(9):1815–22.
- Wonkam A, Ngo Bitoungui VJ, Vorster AA, Ramesar R, Cooper RS, Tayo B, et al. Association of variants at BCL11A and HBS1L-MYB with hemoglobin F and hospitalization rates among sickle cell patients in Cameroon. PLoS One. 2014;9(3):e92506.
- Wahlberg K, Jiang J, Rooks H, Jawaid K, Matsuda F, Yamaguchi M, et al. The HBS1L-MYB intergenic interval associated with elevated HbF levels shows characteristics of a distal regulatory region in erythroid cells. Blood. 2009;114(6):1254–62.
- Buchanan GR. “Packaging” of fetal hemoglobin in sickle cell anemia. Blood. 2014;123(4):464–5.
- So CC, Song YQ, Tsang ST, Tang LF, Chan AY, Ma ES, et al. The HBS1L-MYB intergenic region on chromosome 6q23 is a quantitative trait locus controlling fetal haemoglobin level in carriers of beta-thalassaemia. J Med Genet. 2008;45(11):745–51.
- Mtatiro SN, Makani J, Mmbando B, Thein SL, Menzel S, Cox SE. Genetic variants at HbF-modifier loci moderate anemia and leukocytosis in sickle cell disease in Tanzania. Am J Hematol. 2015;90(1):E1–4.
- Padmos MA, Roberts GT, Sackey K, Kulozik A, Bail S, Morris JS, et al. Two different forms of homozygous sickle cell disease occur in Saudi Arabia. Br J Haematol. 1991;79(1):93–8.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.