- Research article
- Open Access
- Open Peer Review
Transferability and Fine Mapping of genome-wide associated loci for lipids in African Americans
BMC Medical Genetics volume 13, Article number: 88 (2012)
A recent, large genome-wide association study (GWAS) of European ancestry individuals has identified multiple genetic variants influencing serum lipids. Studies of the transferability of these associations to African Americans remain few, an important limitation given interethnic differences in serum lipids and the disproportionate burden of lipid-associated metabolic diseases among African Americans.
We attempted to evaluate the transferability of 95 lipid-associated loci recently identified in European ancestry individuals to 887 non-diabetic, unrelated African Americans from a population-based sample in the Washington, DC area. Additionally, we took advantage of the generally reduced linkage disequilibrium among African ancestry populations in comparison to European ancestry populations to fine-map replicated GWAS signals.
We successfully replicated reported associations for 10 loci (CILP2/SF4, STARD3, LPL, CYP7A1, DOCK7/ANGPTL3, APOE, SORT1, IRS1, CETP, and UBASH3B). Through trans-ethnic fine-mapping, we were able to narrow the associated regions for 75% of the replicated loci.
Between this study and previous work in African Americans, 40 of the 95 loci reported in a large GWAS of European ancestry individuals also influence lipid levels in African Americans. While there is now evidence that the lipid-influencing role of a number of genetic variants is observed in both European and African ancestry populations, the still considerable lack of concordance highlights the importance of continued ancestry-specific studies to elucidate the genetic underpinnings of these traits.
The distribution of serum lipid concentrations has well-established clinical utility as a risk factor for a range of metabolic diseases, including cardiovascular disease (CVD) and type 2 diabetes (T2D). As such, great effort has gone into uncovering the genetic epidemiology of serum lipids, including a recent meta-analysis of 46 genome-wide studies comprising more than 100,000 individuals of European ancestry [1]. This study yielded 95 loci significantly associated with at least one of four serum lipid traits: high-density lipoprotein cholesterol (HDL), low-density lipoprotein cholesterol (LDL), triglycerides (TG), or total cholesterol (TC). Understanding the genetic underpinnings of risk factors for metabolic disorders is particularly relevant for African Americans, who experience a disproportionate burden of CVD mortality [2, 3] and T2D [4]; this disparity is projected to continue [5]. While replication of some of the lipid-associated loci identified among European ancestry individuals has been reported for African Americans of the NHLBI’s CARe consortium [1, 6] and the PAGE study [7], several limitations of these analyses warrant further evaluation. First, 20 of the lead associations in the 95 loci have not yet been investigated in African Americans (primarily because TC was not available in these cohorts). Second, for all but a few of the associations that have been investigated, only exact replication (i.e., look-up of the reported index SNP alone) was attempted. Given the generally greater linkage disequilibrium (LD) among European ancestry individuals, index SNPs are expected to tag larger regions than they would among African ancestry individuals. Therefore, the functional variant tagged by the index SNP in European ancestry individuals might not be in LD with the same SNP in African ancestry individuals, motivating different analytical strategies (i.e., “local” replication, described below).
In the present study, we used more robust analytic strategies to investigate the transferability of reported genetic associations for serum lipids to African Americans. We also exploited these interethnic differences in LD to conduct fine-mapping of the replicated loci.
Ethical approval for this study was obtained from the Howard University Institutional Review Board. Written informed consent was provided by all participants. The Howard University Family Study (HUFS) is a population-based study of African Americans in the Washington, DC metropolitan area [8]. Unrelated, non-diabetic participants were included if they were not using lipid-lowering medication. Serum lipids were assayed in blood samples drawn after a fast of more than 8 hours, and concentrations were determined enzymatically on a Cobas Integra 400 Analyzer (Roche Diagnostics, Indianapolis, IN). The intra-assay coefficients of variation (CV) indicate consistent assay performance (TC, LDL, and HDL, CV < 1.5%; TG, CV < 3.0%).
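For reference, the intra-assay CV is the standard deviation of repeated measurements of the same sample within a single run, expressed as a percentage of their mean. The minimal Python sketch below computes it; the function name `intra_assay_cv` and the replicate values are hypothetical and are not data from this study.

```python
from statistics import mean, stdev


def intra_assay_cv(replicates):
    """Percent CV of repeated measurements of one sample within a single run."""
    if len(replicates) < 2:
        raise ValueError("need at least two replicate measurements")
    return 100.0 * stdev(replicates) / mean(replicates)


# Hypothetical replicate measurements of one control sample
tc_replicates = [190.0, 192.0, 189.0, 191.0, 188.0]
cv = intra_assay_cv(tc_replicates)
print(f"Intra-assay CV: {cv:.2f}%")  # Intra-assay CV: 0.83%
```

A CV of this size would fall under the < 1.5% figure reported for TC, LDL, and HDL.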
Genome-wide genotyping was performed using the Affymetrix® Genome-Wide Human SNP Array 6.0. Genotype calls were made using Birdseed, version 2 [9]. SNPs were excluded if they had a call rate <95% (n = 41,885), a minor allele frequency <0.01 (n = 19,154), or a Hardy-Weinberg equilibrium (HWE) test p-value <1 × 10⁻³ (n = 6,317). A total of 808,465 autosomal SNPs passed these filters. The average call rate for this set of SNPs was 99.55%, and the concordance of blind duplicates was 99.74%. Population stratification was assessed using non-parametric clustering of genotypes, as previously described [10].
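The three SNP-level filters above amount to a simple per-SNP check. The Python sketch below is illustrative only: the input format (`SnpCounts`) and all function names are hypothetical, and the HWE test shown is the 1-df chi-square version, chosen for brevity. The text does not state which HWE test was applied, and exact tests are often preferred for low-frequency variants.

```python
import math
from dataclasses import dataclass


@dataclass
class SnpCounts:
    """Genotype counts for one biallelic SNP (hypothetical input format)."""
    name: str
    n_aa: int       # homozygous for allele A
    n_ab: int       # heterozygous
    n_bb: int       # homozygous for allele B
    n_missing: int  # samples without a genotype call


def call_rate(s):
    called = s.n_aa + s.n_ab + s.n_bb
    return called / (called + s.n_missing)


def minor_allele_frequency(s):
    called = s.n_aa + s.n_ab + s.n_bb
    freq_a = (2 * s.n_aa + s.n_ab) / (2 * called)
    return min(freq_a, 1.0 - freq_a)


def hwe_chisq_p(s):
    """P-value of the 1-df chi-square test of Hardy-Weinberg equilibrium."""
    n = s.n_aa + s.n_ab + s.n_bb
    p = (2 * s.n_aa + s.n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (s.n_aa, s.n_ab, s.n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(s, min_call_rate=0.95, min_maf=0.01, min_hwe_p=1e-3):
    """Keep a SNP only if it clears all three thresholds used in the text."""
    return (call_rate(s) >= min_call_rate
            and minor_allele_frequency(s) >= min_maf
            and hwe_chisq_p(s) >= min_hwe_p)


snps = [
    SnpCounts("snp_ok", 300, 420, 150, 17),
    SnpCounts("snp_low_call", 300, 420, 150, 60),
    SnpCounts("snp_hwe_fail", 400, 200, 270, 17),
]
kept = [s.name for s in snps if passes_qc(s)]
print(kept)  # ['snp_ok']
```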
Power calculations were performed using QUANTO [11]. When the MAF was at least 0.05, this study was adequately powered (power > 90%) to detect associations of the magnitude observed in the prior publication for all traits except HDL (power < 50% to detect the minimum effect size observed previously). With a MAF of 0.01, power remained sufficient across the range of effect sizes observed for TG, but was low (≤50%) to detect the minimum effect sizes observed for each of the other traits. Imputation was performed using the MACH algorithm as previously described (with 1000 Genomes reference data, http://www.1000genomes.org/). Imputed SNPs were excluded if they had a missingness rate ≥10%, a Hardy-Weinberg test p-value <1 × 10⁻³, or a minor allele frequency <0.01. Imputed SNPs for which no rsID is currently available are described with “chromosome:position” nomenclature (positions refer to NCBI build 36). After quality control, 5,396,780 markers were included in the analysis.
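The power statements above depend on allele frequency and effect size in a predictable way. As a rough illustration (this is not the QUANTO algorithm), the Python sketch below uses a common normal approximation for a 1-df additive test of a quantitative trait. A SNP with minor allele frequency p and per-allele effect β, in trait standard-deviation units, explains roughly R² = 2p(1 − p)β² of the trait variance, and the test's noncentrality is approximately N·R²/(1 − R²). Covariate adjustment is ignored, and the effect sizes looped over are hypothetical.

```python
from statistics import NormalDist


def additive_qt_power(n, maf, beta_sd, alpha=0.05):
    """Approximate power of a two-sided 1-df additive test for a quantitative trait.

    beta_sd is the per-allele effect in trait standard-deviation units.
    Normal approximation; covariates are ignored (a simplification).
    """
    r2 = 2.0 * maf * (1.0 - maf) * beta_sd ** 2   # trait variance explained by the SNP
    ncp = n * r2 / (1.0 - r2)                      # noncentrality of the 1-df test
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    shift = ncp ** 0.5
    # Probability that |Z| exceeds z_crit when Z ~ N(shift, 1)
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Hypothetical effect sizes, with n = 887 as in the HUFS sample
for maf in (0.05, 0.01):
    for beta in (0.10, 0.20, 0.30):
        power = additive_qt_power(887, maf, beta)
        print(f"MAF = {maf:.2f}, beta = {beta:.2f} SD: power = {power:.2f}")
```

The printed grid shows why the same effect size can be well powered at MAF 0.05 yet underpowered at MAF 0.01: variance explained scales with 2p(1 − p).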
Association analysis of the log-transformed lipid variables was performed using PLINK v1.07 [12] under an additive genetic model with adjustment for age, sex, body mass index (BMI), and the first 2 principal components (PCs) of the genotypes (computed using EIGENSTRAT [13]). The appropriate number of PCs needed to adjust for population substructure in HUFS has been previously determined [14]. Replication was attempted using two strategies. First, we investigated the exact SNPs that were previously identified [1]. A SNP was considered replicated if the direction of effect was consistent and the association p-value was ≤0.05 . Second, we examined all SNPs that were in LD with the reported SNPs in the CEU population, using a search window of ±250 kb around the index SNP and a threshold of r² ≥ 0.30 (“local replication”; for further discussion, see ). P-values obtained in the local replication were corrected for the effective degrees of freedom within an LD block containing the reported SNP , and an adjusted p-value ≤0.05 was considered statistically significant.
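The local replication step above can be sketched as follows. The window (±250 kb) and the CEU r² threshold (0.30) come from the text; everything else is an assumption. In particular, the text does not specify how the effective degrees of freedom were estimated, so this Python/NumPy sketch uses Nyholt's eigenvalue-based M_eff and a Šidák-style adjustment of the smallest local p-value as one plausible choice. The direction-of-effect check used for exact replication is omitted here, because for local SNPs it depends on LD phase. All positions, LD values, and p-values are hypothetical.

```python
import numpy as np

WINDOW_BP = 250_000  # ±250 kb around the index SNP (from the text)
MIN_R2_CEU = 0.30    # CEU r2 threshold for local SNPs (from the text)


def select_local_snps(index_pos, positions, r2_with_index_ceu):
    """Indices of SNPs inside the window that are in LD with the index SNP in CEU."""
    positions = np.asarray(positions)
    r2 = np.asarray(r2_with_index_ceu)
    in_window = np.abs(positions - index_pos) <= WINDOW_BP
    return np.flatnonzero(in_window & (r2 >= MIN_R2_CEU))


def nyholt_meff(corr):
    """Effective number of independent tests from a SNP correlation matrix
    (Nyholt's eigenvalue-variance estimator; an assumed choice)."""
    corr = np.asarray(corr, dtype=float)
    m = corr.shape[0]
    if m == 1:
        return 1.0
    eigenvalues = np.linalg.eigvalsh(corr)
    return 1.0 + (m - 1) * (1.0 - np.var(eigenvalues, ddof=1) / m)


def local_replication_p(pvalues, corr):
    """Sidak-style adjustment of the smallest local p-value for M_eff tests."""
    meff = nyholt_meff(corr)
    p_min = float(np.min(pvalues))
    return 1.0 - (1.0 - p_min) ** meff, meff


# Hypothetical region: index SNP at 200,000 bp, five nearby SNPs
positions = [100_000, 180_000, 210_000, 260_000, 480_000]
r2_ceu = [0.50, 0.90, 0.60, 0.20, 0.80]
selected = select_local_snps(200_000, positions, r2_ceu)
print(selected)  # [0 1 2]

hufs_p = [0.03, 0.004, 0.02]            # hypothetical HUFS p-values for the selected SNPs
hufs_corr = [[1.0, 0.2, 0.1],
             [0.2, 1.0, 0.3],
             [0.1, 0.3, 1.0]]           # hypothetical genotype correlations in HUFS
p_adj, meff = local_replication_p(hufs_p, hufs_corr)
print(f"M_eff = {meff:.2f}, adjusted p = {p_adj:.4f}, replicated: {p_adj <= 0.05}")
```

Nyholt's estimator returns M when the SNPs are uncorrelated and 1 when they are perfectly correlated, so the penalty shrinks as LD within the block increases.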
To take advantage of the generally smaller haplotype blocks in African ancestry populations, we attempted to fine-map replicated signals using two strategies: inspection of regional association plots to identify SNPs with a stronger signal than the index SNP in HUFS (LocusZoom 1.1, http://csg.sph.umich.edu/locuszoom/), and comparison of haplotype block structure between the CEU and YRI for SNPs of interest (Haploview 4.2, http://www.broadinstitute.org/haploview/haploview). Finally, we examined the other genotyped and imputed SNPs for any association reaching genome-wide significance (p < 2.5 × 10⁻⁸).
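Haploview's block definitions (its default method is based on D′ confidence intervals, following Gabriel et al.) are more involved than can be shown here. As a simplified, hypothetical proxy for how far a signal can be localized, the Python sketch below reports the genomic span covered by the index SNP and all SNPs whose r² with it reaches a threshold (0.8, an arbitrary choice) in a given reference panel. Comparing the span computed from CEU and from YRI LD values mimics the narrowing that fine-mapping exploits. The function `ld_interval` and all data are invented for illustration.

```python
def ld_interval(index_pos, positions, r2_with_index, min_r2=0.8):
    """Return (start, end, length_bp) spanned by the index SNP and all SNPs with
    r2 >= min_r2 to it. A crude proxy for a haplotype block, not Haploview's method."""
    linked = [pos for pos, r2 in zip(positions, r2_with_index) if r2 >= min_r2]
    linked.append(index_pos)  # the index SNP always belongs to its own interval
    start, end = min(linked), max(linked)
    return start, end, end - start


# Hypothetical LD of four nearby SNPs with an index SNP at 40,000 bp
positions = [12_000, 25_000, 46_000, 61_000]
r2_ceu = [0.85, 0.92, 0.95, 0.81]
r2_yri = [0.10, 0.30, 0.88, 0.20]
print("CEU:", ld_interval(40_000, positions, r2_ceu))  # CEU: (12000, 61000, 49000)
print("YRI:", ld_interval(40_000, positions, r2_yri))  # YRI: (40000, 46000, 6000)
```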
The study sample comprised 887 African Americans (374 men, 513 women), with a mean age of 46 years and a mean BMI of 28 kg/m² in men and 31.5 kg/m² in women (Table 1). Of the 95 previously identified lipid-associated index SNPs, 86 were successfully genotyped or imputed in HUFS (Figure 1). After quality control, 51 SNPs were included in the exact replication analysis. We replicated 7 of these 51 previously identified lipid-associated loci: CILP2/SF4, STARD3, LPL, CYP7A1, DOCK7/ANGPTL3, APOE, and SORT1 (Table 2). A comparison of allele frequencies between African and European ancestry populations is provided for the SNPs that did not replicate (Additional file 1).
We replicated additional loci using the LD-based local replication strategy. We identified 569 SNPs that, in the CEU, were in LD with the index SNPs at the 88 loci that did not replicate exactly (Figure 1). Of these, 530 were genotyped or imputed in HUFS. After quality control, 389 SNPs representing 62 loci were included in the analysis. An additional 3 loci were replicated: IRS1, CETP, and UBASH3B (Table 3). In total, we were able to evaluate 82 of the 95 reported loci by exact or local replication, and 10 of these (12%) showed significant association in HUFS.
For many of the 10 transferable loci, the generally reduced LD across the genomes of African ancestry individuals allowed finer mapping of the signals observed in European ancestry populations (Figure 1, Table 4). For the reported SNP rs12678919 (downstream of LPL), a stronger association was observed for the LPL intronic SNP rs12679834 (p = 0.001). While these two SNPs lie in the same 53 kb haplotype block among the CEU, rs12678919 does not fall within a haplotype block in the YRI, and rs12679834 lies in a much smaller (8 kb) block. This result suggests that the causal SNP is more closely linked with rs12679834 than with rs12678919, and it reduces the region for further investigation from 53 kb to 8 kb (Figure 2). Similarly, rs7941030 (UBASH3B) was not associated with TC in HUFS (p = 0.10), but rs6589939, which lies in the same 40 kb haplotype block in the CEU, was (p = 0.005). Among the YRI, neither rs6589939 nor rs7941030 lies in a haplotype block. In this study sample, rs10088541, an intronic SNP in NSMAF, had a lower p-value than the index SNP rs2081687 (nearest gene CYP7A1; p = 7.5 × 10⁻⁵ vs. 0.04). While these SNPs are correlated among the CEU (r² = 0.75), they are not correlated in the HUFS sample (r² = 0.03), suggesting that the causal SNP may be more closely linked to rs10088541 (Figure 3). The replication of rs629301 (SORT1) in HUFS substantially narrows the region of interest for this signal: while this SNP lies in a 16 kb haplotype block in the CEU, the block shrinks to less than 500 bp among the YRI (Figure 4).
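The contrast between r² = 0.75 in the CEU and r² = 0.03 in HUFS concerns pairwise LD between two SNPs. For unphased genotype data, a common approximation is the squared Pearson correlation of allele dosages (0, 1, 2) measured in the same individuals. The text does not say exactly how the HUFS value was computed, so the Python sketch below, with the hypothetical function `genotype_r2` and invented genotypes, is only illustrative.

```python
from math import fsum


def genotype_r2(dosages_a, dosages_b):
    """Squared Pearson correlation of allele dosages (0, 1, 2) at two SNPs,
    measured in the same individuals."""
    n = len(dosages_a)
    if n != len(dosages_b) or n < 2:
        raise ValueError("need two equal-length dosage vectors with at least 2 samples")
    mean_a = fsum(dosages_a) / n
    mean_b = fsum(dosages_b) / n
    cov = fsum((a - mean_a) * (b - mean_b) for a, b in zip(dosages_a, dosages_b))
    var_a = fsum((a - mean_a) ** 2 for a in dosages_a)
    var_b = fsum((b - mean_b) ** 2 for b in dosages_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("r2 is undefined for a monomorphic SNP")
    return cov * cov / (var_a * var_b)


# Hypothetical dosages for six individuals
snp_x = [0, 0, 1, 1, 2, 2]
snp_y = [0, 2, 0, 2, 0, 2]
snp_z = [0, 0, 1, 1, 2, 2]
print(genotype_r2(snp_x, snp_z))  # 1.0
print(genotype_r2(snp_x, snp_y))  # 0.0
```

Because r² is squared, perfectly anti-correlated dosages (for example, coding the other allele) also give r² = 1.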
A search of the full set of genotyped and imputed SNPs for other signals showed that none reached genome-wide significance (Additional files 2, 3, 4, 5). The top SNPs were chr16:50157331 near HEATR3 (associated with increased TG, p = 5.9 × 10⁻⁸), rs711794 near ZAK (associated with decreased LDL, p = 6.3 × 10⁻⁸), and rs1047163, a 3′ UTR variant near HS1BP3 (associated with decreased HDL, p = 6.4 × 10⁻⁸).
We identified 10 loci that influence lipid levels in this cohort of African Americans. Of these, 7 were identified by testing the reported SNP, while an additional 3 loci were identified using an LD-based strategy designed to account for the potential non-transfer of association signals across populations with different ancestral backgrounds [19]. Teslovich et al. assessed the generalizability of their findings by attempting replication in ~8,000 African Americans in the CARe consortium [1]. Of the 75 of 95 loci for which the index SNP-trait association was investigated, 29 replicated (see Supplementary Table 11 of that paper). A subset of these loci, along with replication of other lipid GWAS signals in the CARe African Americans, was also reported by Lettre et al. [6]. The PAGE study, which included ~9,000 African Americans, investigated 9 of the 95 loci (all also included in CARe) and replicated 6 [7]. Of note, these were not independent samples: both CARe and PAGE drew participants from the ARIC and CARDIA cohorts. Of the 20 associations for which replication had not yet been attempted in an African American cohort, we were able to evaluate 16 in HUFS. One of these, the association of the UBASH3B locus (index SNP rs7941030) with TC, was replicated in HUFS through local replication. Additionally, four associations that did not replicate in CARe were replicated in HUFS: rs10401969 (CILP2/SF4) with TG, rs2081687 (CYP7A1) with LDL, rs2972146 (IRS1) with HDL, and rs4420638 (APOE) with HDL (this association also replicated in PAGE). CARe, PAGE, and HUFS all support the association of two loci with HDL in African Americans: rs3764261 (CETP) and rs4420638 (APOE).
Possible explanations for the lack of transferability include differences in allele frequencies (see Additional file 1) and in effect sizes between populations. Wide variability between populations in the frequency of risk alleles associated with a range of traits in GWAS has been demonstrated [20]. In an evaluation of 24 SNPs with GWAS results for both ancestral groups, the correlation of effect sizes between European and African ancestry populations was only 0.27 (p = 0.2). Moreover, for 79% of the associations investigated, the point estimates were in opposite directions or differed by more than twofold between European and African ancestry populations [21]. Both results favor ancestry-specific analyses.
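The two summaries of cross-ancestry consistency described above, the correlation of effect sizes and the share of associations whose point estimates differ in sign or by more than twofold, are simple to compute. The Python sketch below applies them to hypothetical European and African ancestry effect estimates. It is not the analysis of the cited evaluation, the function names are illustrative, and counting a zero estimate as discordant is an assumption.

```python
from math import fsum, sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of effect estimates."""
    n = len(x)
    mx, my = fsum(x) / n, fsum(y) / n
    sxy = fsum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = fsum((a - mx) ** 2 for a in x)
    syy = fsum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def discordant(beta_eur, beta_afr, fold=2.0):
    """True if the estimates point in opposite directions or differ by more than `fold`-fold."""
    if beta_eur * beta_afr <= 0:  # opposite signs; a zero estimate also counts (assumption)
        return True
    ratio = abs(beta_eur / beta_afr)
    return ratio > fold or ratio < 1.0 / fold


# Hypothetical per-allele effect estimates for four SNPs
beta_eur = [0.10, 0.05, -0.08, 0.12]
beta_afr = [0.09, -0.02, -0.30, 0.07]
n_discordant = sum(discordant(e, a) for e, a in zip(beta_eur, beta_afr))
print(f"r = {pearson_r(beta_eur, beta_afr):.2f}; discordant: {n_discordant}/{len(beta_eur)}")
```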
Some of the loci highlighted in this work have known biological functions relevant to serum lipids. STARD3, associated with HDL, is a lipid-trafficking protein. LPL, associated with TG, encodes lipoprotein lipase, which acts both as a triglyceride hydrolase and as a ligand/bridging factor for receptor-mediated lipoprotein uptake; mutations causing LPL deficiency have been implicated in type I hyperlipoproteinemia (NCBI: LPL, 2011). ApoE, associated with HDL, is a major apolipoprotein of the chylomicron and is involved in the catabolism of triglyceride-rich lipoprotein constituents; defects in the gene encoding this protein result in familial dysbetalipoproteinemia (NCBI: APOE, 2011). CETP, associated with HDL, plays multiple roles in HDL metabolism and in the reverse cholesterol transport pathway [22]. A CETP SNP (rs247617) that is not in LD with the replicated SNP was one of the top hits for HDL in our discovery GWAS (Additional file 3), suggesting the presence of multiple functional variants at this locus. Based on searches of both the GWAS catalog [23] and PubMed, only one of the top SNPs from our discovery GWAS had been previously reported: rs247617, a variant 5 kb upstream of CETP, was also associated with HDL among Finns [24] and African Americans of the CARe consortium [6], with a consistent direction of effect. This variant appears to be an important determinant of HDL concentration across ethnicities.
Our study has two main strengths and one main limitation. First, HUFS represents the general population of African Americans in the Washington, DC area; because participants were not selected on disease status, the sample is well suited to drawing conclusions about transferability to the broader population of African Americans. Second, a local replication strategy was employed to evaluate transferability of the reported associations, in recognition of the well-known differences in LD structure across the genome between African and European ancestry individuals. The main limitation of this study is its modest sample size. In some instances, the failure to replicate was probably a result of limited power. For instance, rs9987289 (PPP1R3B), whose association with HDL was replicated in the CARe consortium analysis, was not genotyped or imputed in this sample; a local SNP, rs6601299 (r² = 0.86), showed an association in the same direction that fell just short of significance (p = 0.07). Because the previous publication was a meta-analysis with a very large sample size, it was able to detect small effects, which would be difficult to replicate in a GWAS of more limited size. More accurate estimates of transferability will therefore await the aggregation of African ancestry GWAS into a suitably large meta-analysis. Of the 10 replicated loci in this study, only 1 had previously been identified in an individual GWAS (rs3764261 with HDL, in GWAS of Indian Asian men [25, 26], Finns [27], and Japanese [28]).
Overall, this study of African Americans replicated 10 of the 95 loci identified in a large GWAS of lipids in European ancestry populations. Together with results from previous work, there is now support for the transferability of 42% (40/95) of the European ancestry-identified loci to African Americans. Notably, conclusive inferences about the transferability of all of the previous findings are precluded by the limitations of replication attempts in African Americans thus far, in terms of both relative sample size and the coverage of African ancestry genetic diversity by currently available GWAS chips. Further work in African ancestry populations will be necessary to fully evaluate these loci.
Teslovich TM, Musunuru K, Smith AV, Edmondson AC, Stylianou IM, Koseki M, Pirruccello JP, Ripatti S, Chasman DI, Willer CJ, et al: Biological, clinical and population relevance of 95 loci for blood lipids. Nature. 2010, 466 (7307): 707-713. 10.1038/nature09270.
Gu Q, Burt VL, Paulose-Ram R, Yoon S, Gillum RF: High blood pressure and cardiovascular disease mortality risk among U.S. adults: the third National Health and Nutrition Examination Survey mortality follow-up study. Ann Epidemiol. 2008, 18 (4): 302-309. 10.1016/j.annepidem.2007.11.013.
Hurley LP, Dickinson LM, Estacio RO, Steiner JF, Havranek EP: Prediction of cardiovascular death in racial/ethnic minorities using Framingham risk factors. Circ Cardiovasc Qual Outcomes. 2010, 3 (2): 181-187. 10.1161/CIRCOUTCOMES.108.831073.
Cowie CC, Rust KF, Byrd-Holt DD, Eberhardt MS, Flegal KM, Engelgau MM, Saydah SH, Williams DE, Geiss LS, Gregg EW: Prevalence of diabetes and impaired fasting glucose in adults in the U.S. population: National Health And Nutrition Examination Survey 1999–2002. Diabetes Care. 2006, 29 (6): 1263-1268. 10.2337/dc06-0062.
Heidenreich PA, Trogdon JG, Khavjou OA, Butler J, Dracup K, Ezekowitz MD, Finkelstein EA, Hong Y, Johnston SC, Khera A, et al: Forecasting the Future of Cardiovascular Disease in the United States. Circulation. 2011, 123 (8): 933-944. 10.1161/CIR.0b013e31820a55f5.
Lettre G, Palmer CD, Young T, Ejebe KG, Allayee H, Benjamin EJ, Bennett F, Bowden DW, Chakravarti A, Dreisbach A, et al: Genome-Wide Association Study of Coronary Heart Disease and Its Risk Factors in 8,090 African Americans: The NHLBI CARe Project. PLoS Genet. 2011, 7 (2): e1001300-10.1371/journal.pgen.1001300.
Dumitrescu L, Carty CL, Taylor K, Schumacher FR, Hindorff LA, Ambite JL, Anderson G, Best LG, Brown-Gentry K, Bůžková P, et al: Genetic Determinants of Lipid Traits in Diverse Populations from the Population Architecture using Genomics and Epidemiology (PAGE) Study. PLoS Genet. 2011, 7 (6): e1002138-10.1371/journal.pgen.1002138.
Shriner D, Adeyemo A, Gerry NP, Herbert A, Chen G, Doumatey A, Huang H, Zhou J, Christman MF, Rotimi CN: Transferability and fine-mapping of genome-wide associated loci for adult height across human populations. PLoS One. 2009, 4 (12): e8398-10.1371/journal.pone.0008398.
McCarroll SA, Kuruvilla FG, Korn JM, Cawley S, Nemesh J, Wysoker A, Shapero MH, de Bakker PIW, Maller JB, Kirby A, et al: Integrated detection and population-genetic analysis of SNPs and copy number variation. Nat Genet. 2008, 40 (10): 1166-1174. 10.1038/ng.238.
Gao X, Starmer J: AWclust: point-and-click software for non-parametric population structure analysis. BMC Bioinformatics. 2008, 9: 77.
Gauderman W, Morrison J: QUANTO 1.1: A computer program for power and sample size calculations for genetic-epidemiology studies. 2006, [http://hydra.usc.edu/gxe]
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ, et al: PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses. The American Journal of Human Genetics. 2007, 81 (3): 559-575. 10.1086/519795.
Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D: Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 2006, 38 (8): 904-909. 10.1038/ng1847.
Adeyemo A, Gerry N, Chen G, Herbert A, Doumatey A, Huang H, Zhou J, Lashley K, Chen Y, Christman M, et al: A genome-wide association study of hypertension and blood pressure in African Americans. PLoS Genet. 2009, 5 (7): e1000564-10.1371/journal.pgen.1000564.
Chanock S, Manolio TA, Boehnke M, Boerwinkle E, Hunter D, Thomas G, Hirschhorn J, Abecasis G, Altshuler D, Bailey-Wilson J, et al: Replicating genotype-phenotype associations. Nature. 2007, 447 (7145): 655-660. 10.1038/447655a.
Carlson CS, Eberle MA, Rieder MJ, Yi Q, Kruglyak L, Nickerson DA: Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium. Am J Hum Genet. 2004, 74 (1): 106-120. 10.1086/381000.
Ramos E, Chen G, Shriner D, Doumatey A, Gerry NP, Herbert A, Huang H, Zhou J, Christman MF, Adeyemo A, et al: Replication of genome-wide association studies (GWAS) loci for fasting plasma glucose in African-Americans. Diabetologia. 2011, 54 (4): 783-788. 10.1007/s00125-010-2002-7.
Pe'er I, Yelensky R, Altshuler D, Daly MJ: Estimation of the multiple testing burden for genomewide association studies of nearly all common variants. Genetic Epidemiology. 2008, 32 (4): 381-385. 10.1002/gepi.20303.
Teo YY, Sim X: Patterns of linkage disequilibrium in different populations: implications and opportunities for lipid-associated loci identified from genome-wide association studies. Curr Opin Lipidol. 2010, 21 (2): 104-115. 10.1097/MOL.0b013e3283369e5b.
Adeyemo A, Rotimi C: Genetic variants associated with complex human diseases show wide variation across multiple populations. Public Health Genomics. 2010, 13 (2): 72-79. 10.1159/000218711.
Ntzani EE, Liberopoulos G, Manolio TA, Ioannidis JP: Consistency of genome-wide associations across major ancestral groups. Hum Genet. 2012, 131 (7): 1057-1071. 10.1007/s00439-011-1124-4.
Ridker PM, Pare G, Parker AN, Zee RY, Miletich JP, Chasman DI: Polymorphism in the CETP gene region, HDL cholesterol, and risk of future myocardial infarction: Genomewide analysis among 18 245 initially healthy women from the Women's Genome Health Study. Circ Cardiovasc Genet. 2009, 2 (1): 26-33. 10.1161/CIRCGENETICS.108.817304.
A Catalog of Published Genome-Wide Association Studies. [http://www.genome.gov/26525384]
Kristiansson K, Perola M, Tikkanen E, Kettunen J, Surakka I, Havulinna AS, Stančáková A, Barnes C, Widen E, Kajantie E, et al: Genome-Wide Screen for Metabolic Syndrome Susceptibility Loci Reveals Strong Lipid Gene Contribution But No Evidence for Common Genetic Basis for Clustering of Metabolic Syndrome Traits. Circulation: Cardiovascular Genetics. 2012, 5 (2): 242-249. 10.1161/CIRCGENETICS.111.961482.
Chambers JC, Elliott P, Zabaneh D, Zhang W, Li Y, Froguel P, Balding D, Scott J, Kooner JS: Common genetic variation near MC4R is associated with waist circumference and insulin resistance. Nat Genet. 2008, 40 (6): 716-718. 10.1038/ng.156.
Zabaneh D, Balding DJ: A Genome-Wide Association Study of the Metabolic Syndrome in Indian Asian Men. PLoS One. 2010, 5 (8): e11961-10.1371/journal.pone.0011961.
Sabatti C, Service SK, Hartikainen A-L, Pouta A, Ripatti S, Brodsky J, Jones CG, Zaitlen NA, Varilo T, Kaakinen M, et al: Genome-wide association analysis of metabolic traits in a birth cohort from a founder population. Nat Genet. 2009, 41 (1): 35-46. 10.1038/ng.271.
Hiura Y, Shen C-S, Kokubo Y, Okamura T, Morisaki T, Tomoike H, Yoshida T, Sakamoto H, Goto Y, Nonogi H, et al: Identification of Genetic Markers Associated With High-Density Lipoprotein-Cholesterol by Genome-Wide Screening in a Japanese Population: The Suita Study. Circulation Journal. 2009, 73 (6): 1119-1126. 10.1253/circj.CJ-08-1101.
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2350/13/88/prepub
The Howard University Family Study was supported by National Institutes of Health grants S06GM008016-320107 to CNR and S06GM008016-380111 to AA. We thank the participants of the study, for which enrollment was carried out at the Howard University General Clinical Research Center, supported by National Institutes of Health grant 2M01RR010284. The contents of this publication are solely the responsibility of the authors and do not necessarily represent the official view of the National Institutes of Health. This research was supported in part by the Intramural Research Program of the Center for Research on Genomics and Global Health (CRGGH). The CRGGH is supported by the National Human Genome Research Institute, the National Institute of Diabetes and Digestive and Kidney Diseases, the Center for Information Technology, and the Office of the Director at the National Institutes of Health (Z01HG200362). Genotyping support was provided by the Coriell Institute for Medical Research.
The authors declare that they have no competing interests.
AA and ARB analyzed the data, prepared figures and tables, and drafted the manuscript. KGM also conducted analyses. APD and HH assayed serum lipids. GC prepared the genome-wide genotype data. GC, DS, and AA conducted the genome-wide imputation of genotypes. AH, NPG, and MFC genotyped the samples. CNR, AA, ARB, and KGM contributed to the conception and design of the study. All authors read, provided important feedback, and approved the final manuscript.
Adebowale Adeyemo, Amy R Bentley contributed equally to this work.